perception
30 articles about perception in AI news
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
Anthropic Survey of 80,508 Users Reveals AI's Dual Perception: Hope for Work & Growth, Fear of Unreliability & Job Loss
Anthropic's global study of 80,508 users finds people simultaneously hold hope and fear about AI. Top hopes center on work improvement and personal growth, while top concerns are unreliability, job loss, and reduced autonomy.
Digital Fruit Fly Brain Achieves First Full Perception-Action Loop in Simulation
Startup Eon Systems has demonstrated what appears to be the first complete whole-brain emulation controlling a simulated body. Their digital model of a fruit fly brain, with 125,000 neurons and 50 million synapses, successfully drives realistic behaviors in a physics-simulated fly body.
AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions
Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.
The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human
A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.
Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces
Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.
Beyond the Black Box: New Framework Tests AI's True Clinical Reasoning on Heart Signals
Researchers have developed a novel framework to evaluate how well multimodal AI models truly reason about ECG signals, separating perception from deduction. This addresses critical gaps in validating AI's clinical logic beyond superficial metrics.
EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation
Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.
Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens
NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as being as essential as CAD tools are for chip design. He claims that spending at this level removes the perception that development is constrained by difficulty, time, or resources.
NVIDIA Spotlights Physical AI Tools for Robotics Week 2026
NVIDIA is highlighting its platforms for robot simulation, synthetic data, and AI-powered learning during National Robotics Week 2026, aiming to accelerate the transition from virtual training to physical deployment.
AI Forecasters Revise AGI Timeline: Key Milestones Pulled Forward to 2029-2030 After Recent Model Progress
A significant update from AI forecasters indicates key AGI milestones have been pulled forward, with the median prediction for AGI arrival shifting from 2032 to 2029-2030. This revision follows rapid progress in recent model capabilities, particularly in reasoning and tool use.
Generative World Renderer: 4M+ RGB/G-Buffer Frames from Cyberpunk 2077 & Black Myth: Wukong Released for Inverse Graphics
A new framework and dataset extract over 4 million synchronized RGB and G-buffer frames from Cyberpunk 2077 and Black Myth: Wukong, enabling AI models to learn inverse material decomposition and controllable game environment editing.
26 Humanoid Robot Brands to Field 300+ Units in Beijing's E-Town Half Marathon on April 19
On April 19, Beijing's E-Town will host a half marathon where 300+ humanoid robots from 26 brands will run 21 km. This is the largest public endurance and locomotion stress test for commercial humanoid platforms.
DeepSeek V4 to Run on Huawei Ascend 950PR Chips, Sparking 20% Price Surge
DeepSeek's anticipated V4 model will be powered by Huawei's Ascend 950PR chips, with Alibaba, ByteDance, and Tencent stockpiling hundreds of thousands of units ahead of launch. This has driven chip prices up approximately 20% in recent weeks.
AI-2027 Authors Accelerate AGI Timelines, Citing Rapid Progress in Agentic Coding
The AI-2027 forecasting group has pulled forward its forecast for when AI could replace human software engineers by 1.5 years, from late 2029 to mid-2028. This revision is based on observed rapid progress in agentic coding systems over the last 3-5 months.
OpenAI Acquires Tech Podcast TBPN in First Media Deal, Signaling Strategic Content Shift
OpenAI has acquired the online technology talk show TBPN, marking its first foray into media ownership. The move signals a strategic shift toward controlling narrative channels around AI development and adoption.
mmAnomaly: New Multi-Modal Framework Uses Conditional Latent Diffusion to Achieve 94% F1 Score for mmWave Anomaly Detection
Researchers introduced mmAnomaly, a multi-modal anomaly detection system that uses a conditional latent diffusion model to synthesize expected mmWave spectra from visual context, achieving up to a 94% F1 score for detecting concealed weapons and through-wall anomalies.
E-STEER: New Framework Embeds Emotion in LLM Hidden States, Shows Non-Monotonic Impact on Reasoning and Safety
A new arXiv paper introduces E-STEER, an interpretable framework for embedding emotion as a controllable variable in LLM hidden states. Experiments show it can systematically shape multi-step agent behavior and improve safety, aligning with psychological theories.
Google DeepMind Maps Six 'AI Agent Traps' That Can Hijack Autonomous Systems in the Wild
Google DeepMind has published a framework identifying six categories of 'traps'—from hidden web instructions to poisoned memory—that can exploit autonomous AI agents. This research provides the first systematic taxonomy for a growing attack surface as agents gain web access and tool-use capabilities.
CARLA-Air Unifies CARLA and AirSim Simulators in Single Unreal Engine Process for Embodied AI
CARLA-Air merges the CARLA autonomous driving and AirSim drone simulators into one Unreal Engine process, enabling zero-latency air-ground sensor synchronization with 18 sensor types for embodied AI training.
OpenAI Internal Model Reportedly Solves Three New Erdős Problems, Marking AI Advance in Pure Mathematics
An internal AI model at OpenAI has reportedly solved three previously unsolved mathematical problems from the Erdős collection. This development signals a potential leap in AI's capacity for abstract reasoning and formal theorem proving.
LimX's Oli Robot Demonstrates Autonomous Unboxing and Boot-Up via 31-DoF System
LimX's Oli robot autonomously exited its shipping container, powered up its 31-degree-of-freedom system, and began moving. The demo highlights progress in self-contained robotic deployment without human setup.
Aldi Partners with Instacart to Power U.S. E-commerce Platform
Aldi U.S. has launched a new website and app powered by Instacart's white-label Storefront Pro platform, shifting away from in-house development. The move aims to enhance product recommendations, discovery, and meal planning while leveraging Instacart's fulfillment network.
Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation
Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. This port unlocks new on-device visual analysis applications for robotics and augmented vision-language models.
The AI Agent Production Gap: Why 86% of Agent Pilots Never Reach Production
A Medium article highlights the stark reality that most AI agent demonstrations fail to transition to production systems, citing a critical gap between prototype and deployment. This follows recent industry analysis revealing similar failure rates.
Apple Removes AI Coding Apps Replit & Vibecode from App Store, Coinciding with Xcode AI Integration
Apple has removed AI-powered coding apps Replit and Vibecode from the App Store, reportedly for enabling app creation outside Apple's approval system. This coincides with Apple's recent integration of its own AI coding assistant into Xcode.
Buying the Dip? This AI Agent Will Do It for You - WSJ
The Wall Street Journal reports on a new AI agent designed to autonomously execute 'buy the dip' investment strategies. This represents a significant step in the evolution of AI agents from assistants to autonomous decision-makers with financial agency.
Maker 'Sword Man' Builds 5,000 kg Real-Time Motion-Tracking Robotic Hand
A Chinese maker known as Sword Man has constructed a massive 5,000 kg robotic hand from scratch. It uses a motion-tracking glove to mimic the operator's hand movements in real time.
Atlanta Startup Deploys AI-Powered Robot Dogs for Nighttime Neighborhood Security
A U.S. startup based in Atlanta is deploying quadrupedal robots for autonomous nighttime neighborhood patrols. The units are designed to detect intruders and alert residents, representing a commercial pivot for legged robotics.
Uber Acquires Luxury Chauffeur Service Blacklane to Expand Executive Travel Business
Uber has acquired the luxury chauffeur booking platform Blacklane, which operates in over 500 cities across 60+ countries. This strategic move directly expands Uber's footprint in the high-end, executive travel segment.