perception

30 articles about perception in AI news

mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup

The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.

85% relevant

Anthropic Survey of 80,508 Users Reveals AI's Dual Perception: Hope for Work & Growth, Fear of Unreliability & Job Loss

Anthropic's global study of 80,508 users finds people simultaneously hold hope and fear about AI. Top hopes center on work improvement and personal growth, while top concerns are unreliability, job loss, and reduced autonomy.

87% relevant

Digital Fruit Fly Brain Achieves First Full Perception-Action Loop in Simulation

Startup Eon Systems has demonstrated what appears to be the first complete whole-brain emulation controlling a simulated body. Their digital model of a fruit fly brain, with 125,000 neurons and 50 million synapses, successfully drives realistic behaviors in a physics-simulated fly body.

95% relevant

AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions

Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.

100% relevant

The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human

A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.

81% relevant

Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces

Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.

85% relevant

Beyond the Black Box: New Framework Tests AI's True Clinical Reasoning on Heart Signals

Researchers have developed a novel framework to evaluate how well multimodal AI models truly reason about ECG signals, separating perception from deduction. This addresses critical gaps in validating AI's clinical logic beyond superficial metrics.

75% relevant

EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation

Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.

70% relevant

Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens

NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as essential as CAD tools for chip design. He claims this investment eliminates perceptions of difficulty, time, and resource constraints in development.

85% relevant

NVIDIA Spotlights Physical AI Tools for Robotics Week 2026

NVIDIA is highlighting its platforms for robot simulation, synthetic data, and AI-powered learning during National Robotics Week 2026, aiming to accelerate the transition from virtual training to physical deployment.

85% relevant

AI Forecasters Revise AGI Timeline: Key Milestones Pulled Forward to 2029-2030 After Recent Model Progress

A significant update from AI forecasters indicates key AGI milestones have been pulled forward, with the median prediction for AGI arrival shifting from 2032 to 2029-2030. This revision follows rapid progress in recent model capabilities, particularly in reasoning and tool use.

85% relevant

Generative World Renderer: 4M+ RGB/G-Buffer Frames from Cyberpunk 2077 & Black Myth: Wukong Released for Inverse Graphics

A new framework and dataset extracts over 4 million synchronized RGB and G-buffer frames from Cyberpunk 2077 and Black Myth: Wukong, enabling AI models to learn inverse material decomposition and controllable game environment editing.

85% relevant

26 Humanoid Robot Brands to Field 300+ Units in Beijing's E-Town Half Marathon on April 19

On April 19, Beijing's E-Town will host a half marathon where 300+ humanoid robots from 26 brands will run 21 km. This is the largest public endurance and locomotion stress test for commercial humanoid platforms.

87% relevant

DeepSeek V4 to Run on Huawei Ascend 950PR Chips, Sparking 20% Price Surge

DeepSeek's anticipated V4 model will be powered by Huawei's Ascend 950PR chips, with Alibaba, ByteDance, and Tencent stockpiling hundreds of thousands of units ahead of launch. This has driven chip prices up approximately 20% in recent weeks.

91% relevant

AI-2027 Authors Accelerate AGI Timelines, Citing Rapid Progress in Agentic Coding

The AI-2027 forecasting group has moved up its estimate of when AI could replace human software engineers by 1.5 years, from late 2029 to mid-2028. This revision is based on observed rapid progress in agentic coding systems over the last 3-5 months.

85% relevant

OpenAI Acquires Tech Podcast TBPN in First Media Deal, Signaling Strategic Content Shift

OpenAI has acquired the online technology talk show TBPN, marking its first foray into media ownership. The move signals a strategic shift toward controlling narrative channels around AI development and adoption.

91% relevant

mmAnomaly: New Multi-Modal Framework Uses Conditional Latent Diffusion to Achieve 94% F1 Score for mmWave Anomaly Detection

Researchers introduced mmAnomaly, a multi-modal anomaly detection system that uses a conditional latent diffusion model to synthesize expected mmWave spectra from visual context, achieving up to a 94% F1 score for detecting concealed weapons and through-wall anomalies.

72% relevant

E-STEER: New Framework Embeds Emotion in LLM Hidden States, Shows Non-Monotonic Impact on Reasoning and Safety

A new arXiv paper introduces E-STEER, an interpretable framework for embedding emotion as a controllable variable in LLM hidden states. Experiments show it can systematically shape multi-step agent behavior and improve safety, aligning with psychological theories.

75% relevant

Google DeepMind Maps Six 'AI Agent Traps' That Can Hijack Autonomous Systems in the Wild

Google DeepMind has published a framework identifying six categories of 'traps'—from hidden web instructions to poisoned memory—that can exploit autonomous AI agents. This research provides the first systematic taxonomy for a growing attack surface as agents gain web access and tool-use capabilities.

95% relevant

CARLA-Air Unifies CARLA and AirSim Simulators in Single Unreal Engine Process for Embodied AI

CARLA-Air merges the CARLA autonomous driving and AirSim drone simulators into one Unreal Engine process, enabling zero-latency air-ground sensor synchronization with 18 sensor types for embodied AI training.

85% relevant

OpenAI Internal Model Reportedly Solves Three New Erdős Problems, Marking AI Advance in Pure Mathematics

An internal AI model at OpenAI has reportedly solved three previously unsolved mathematical problems from the Erdős collection. This development signals a potential leap in AI's capacity for abstract reasoning and formal theorem proving.

85% relevant

LimX's Oli Robot Demonstrates Autonomous Unboxing and Boot-Up via 31-DoF System

LimX's Oli robot autonomously exited its shipping container, powered up its 31-degree-of-freedom system, and began moving. The demo highlights progress in self-contained robotic deployment without human setup.

85% relevant

Aldi Partners with Instacart to Power U.S. E-commerce Platform

Aldi U.S. has launched a new website and app powered by Instacart's white-label Storefront Pro platform, shifting from in-house development. The move aims to enhance product recommendations, discovery, and meal planning while leveraging Instacart's fulfillment network.

100% relevant

Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation

Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. This port unlocks new on-device visual analysis applications for robotics and augmented vision-language models.

89% relevant

The AI Agent Production Gap: Why 86% of Agent Pilots Never Reach Production

A Medium article highlights the stark reality that most AI agent demonstrations fail to transition to production systems, citing a critical gap between prototype and deployment. This follows recent industry analysis revealing similar failure rates.

90% relevant

Apple Removes AI Coding Apps Replit & Vibecode from App Store, Coinciding with Xcode AI Integration

Apple has removed AI-powered coding apps Replit and Vibecode from the App Store, reportedly for enabling app creation outside Apple's approval system. This coincides with Apple's recent integration of its own AI coding assistant into Xcode.

85% relevant

Exclusive | Buying the Dip? This AI Agent Will Do It for You - WSJ

The Wall Street Journal reports on a new AI agent designed to autonomously execute 'buy the dip' investment strategies. This represents a significant step in the evolution of AI agents from assistants to autonomous decision-makers with financial agency.

82% relevant

Maker 'Sword Man' Builds 5,000 kg Real-Time Motion-Tracking Robotic Hand

A Chinese maker known as Sword Man has constructed a massive 5,000 kg robotic hand from scratch. It uses a motion-tracking glove to mimic the operator's hand movements in real time.

87% relevant

Atlanta Startup Deploys AI-Powered Robot Dogs for Nighttime Neighborhood Security

A U.S. startup based in Atlanta is deploying quadrupedal robots for autonomous nighttime neighborhood patrols. The units are designed to detect intruders and alert residents, representing a commercial pivot for legged robotics.

85% relevant

Uber Acquires Luxury Chauffeur Service Blacklane to Expand Executive Travel Business

Uber has acquired the luxury chauffeur booking platform Blacklane, which operates in over 500 cities across 60+ countries. This strategic move directly expands Uber's footprint in the high-end, executive travel segment.

84% relevant