updates

30 articles about updates in AI news

Memento-Skills Agent System Achieves 116.2% Relative Improvement on Humanity's Last Exam Without LLM Updates

Memento-Skills is a generalist agent system that autonomously constructs and adapts task-specific agents through experience. It enables continual learning without updating LLM parameters, achieving relative improvements of 26.2% on GAIA and 116.2% on Humanity's Last Exam.

85% relevant
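
As a reading aid for the Memento-Skills figures above: a relative improvement is measured against a baseline score. The symbols $s_{\text{base}}$ and $s_{\text{new}}$ are illustrative assumptions; the blurb does not report the underlying scores.

```latex
\[
\text{relative improvement} = \frac{s_{\text{new}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
\]
\[
26.2\% \;\Rightarrow\; s_{\text{new}} = 1.262\, s_{\text{base}}, \qquad
116.2\% \;\Rightarrow\; s_{\text{new}} = 2.162\, s_{\text{base}}
\]
```

Under this definition, a relative improvement above 100% means the new score is more than double the baseline.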

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

78% relevant
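
The failure mode described in the entry above can be probed with a small harness. The sketch below is hypothetical and is not the paper's protocol: the function names, the prompt wording, and the substring check are all assumptions made for illustration. It builds a context in which one fact is revised several times and flags an answer that names an earlier version instead of the latest one.

```python
def build_update_probe(entity: str, attribute: str, values: list[str]) -> tuple[str, str]:
    """Build a context where one fact is revised several times.

    Returns (prompt, expected_answer); the expected answer is the last value.
    Hypothetical helper, not taken from the paper.
    """
    if not values:
        raise ValueError("values must contain at least one version of the fact")
    lines = [
        f"Update {i}: The {attribute} of {entity} is {value}."
        for i, value in enumerate(values, start=1)
    ]
    question = f"Question: What is the current {attribute} of {entity}?"
    return "\n".join(lines + [question]), values[-1]


def is_stale(model_answer: str, values: list[str]) -> bool:
    """True if the answer mentions an earlier version but not the latest one.

    Uses a naive case-insensitive substring match (an assumption).
    """
    answer = model_answer.lower()
    latest = values[-1].lower()
    earlier = [v.lower() for v in values[:-1]]
    return latest not in answer and any(v in answer for v in earlier)
```

Increasing the length of `values` and tracking how often `is_stale` returns true would give a rough view of whether a model drifts toward earlier versions as updates accumulate.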

Meta Halts Mercor Work After Supply Chain Breach Exposes AI Training Secrets

A supply chain attack via compromised software updates at data-labeling vendor Mercor has forced Meta to pause collaboration, risking exposure of core AI training pipelines and quality metrics used by top labs.

97% relevant

DACT: A New Framework for Drift-Aware Continual Tokenization in Generative Recommender Systems

Researchers propose DACT, a framework to adapt generative recommender systems to evolving user behavior and new items without costly full retraining. It identifies 'drifting' items and selectively updates token sequences, balancing stability with plasticity. This addresses a core operational challenge for real-world, dynamic recommendation engines.

86% relevant

MiniMax M2.7 AI Agent Rewrites Its Own Harness, Achieving 9 Gold Medals on MLE Bench Lite Without Retraining

MiniMax's M2.7 agent autonomously rewrites its own operational harness—skills, memory, and workflow rules—through a self-optimization loop. After 100+ internal rounds, it earned 9 gold medals on OpenAI's MLE Bench Lite without weight updates.

95% relevant

Trace2Skill Framework Distills Execution Traces into Declarative Skills via Parallel Sub-Agents

Researchers introduced Trace2Skill, a framework that uses parallel sub-agents to analyze execution trajectories and distill them into transferable declarative skills. This enables performance improvements in larger models without parameter updates.

85% relevant

Secure Your MCP Servers: ClawGuard Scans for Tool Poisoning and Rug Pulls

New security tool ClawGuard scans MCP servers for hidden instructions in tool descriptions, parameter exploits, and malicious updates—critical for Claude Code users connecting to external tools.

91% relevant

Momentum-Consistency Fine-Tuning (MCFT) Achieves 3.30% Gain in 5-Shot 3D Vision Tasks Without Adapters

Researchers propose MCFT, an adapter-free fine-tuning method for 3D point cloud models that selectively updates encoder parameters with momentum constraints. It outperforms prior methods by 3.30% in 5-shot settings and maintains original inference latency.

75% relevant

Google Advances Agentic Shopping with UCP as OpenAI Retreats from Instant Checkout

Google is expanding its Universal Commerce Protocol (UCP) for AI shopping agents, adding multi-item cart creation, real-time catalog updates, and identity linking. This comes as OpenAI pulls back from its ChatGPT Instant Checkout feature, signaling a strategic pivot in the AI commerce landscape.

100% relevant

MetaClaw: Personal AI Agent That Meta-Learns from Conversations Using Cloud LoRA and Skill Synthesis

MetaClaw is a personal AI agent that automatically evolves from every conversation. It meta-learns in the wild using cloud LoRA and skill synthesis, scheduling weight updates during idle time with zero downtime.

85% relevant

Blue Yonder Expands Agentic AI and Mobile Apps for Retail Supply Chain Execution

Blue Yonder announced new agentic AI capabilities and mobile companion apps for retail planning and execution. The updates target merchandise financial planning, assortment optimization, and mobile allocation workflows to improve decision speed and accuracy.

100% relevant

OpenAI's GPT-5.4: The Million-Token Context Window That Changes Everything

OpenAI's upcoming GPT-5.4 will feature a groundbreaking 1 million token context window, matching competitors like Gemini and Claude. The model introduces an 'Extreme reasoning mode' for complex tasks and represents a shift toward monthly updates.

95% relevant

AI-Native CRM Revolution: How Lightfield Automates Sales Workflows Beyond Traditional Systems

Lightfield introduces an AI-native CRM that automatically updates customer data by connecting to email, calendar, and meetings, eliminating manual upkeep and transforming how sales teams manage relationships.

85% relevant

Grok's Weekly Evolution: How xAI's Rapid Iteration Model Could Redefine AI Development

xAI is moving its Grok assistant to a weekly improvement cycle, promising 'recursive intelligence growth' through continuous updates. This rapid iteration approach could accelerate AI capabilities beyond traditional development models.

85% relevant

Tencent's Training-Free GRPO: A Paradigm Shift in AI Alignment Without Fine-Tuning

Tencent researchers have introduced Training-Free GRPO, a method that achieves reinforcement learning-level alignment results for just $18 instead of $10,000—with zero parameter updates. This breakthrough could fundamentally change how we optimize language models.

95% relevant

Blue Yonder Expands Agentic AI and Mobile Apps for Supply Chain Execution

Supply chain software leader Blue Yonder announced new AI agents and mobile applications for retail planning and execution. The updates target merchandise financial planning, assortment optimization, and mobile allocation tasks to help teams make faster, smarter decisions.

100% relevant

How to Lock in $2/Month Claude Code Access with ANTHROPIC_BASE_URL

Use ANTHROPIC_BASE_URL=https://simplylouie.com/api/v1 to switch Claude Code to predictable $2/month pricing instead of per-token charges.

89% relevant
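
The setting in the entry above is an environment variable, so it can be scoped to a single launch rather than set globally. The sketch below is one possible approach, assuming the `claude` CLI is installed and on PATH; the helper names `claude_env` and `launch_claude` are hypothetical, and the URL is the one quoted in the blurb.

```python
import os
import subprocess


def claude_env(base_url: str) -> dict[str, str]:
    """Return a copy of the current environment with ANTHROPIC_BASE_URL set.

    The parent process environment is left unchanged.
    """
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


def launch_claude(base_url: str) -> int:
    """Start the `claude` CLI (assumed to be on PATH) against base_url.

    Returns the CLI's exit code.
    """
    return subprocess.run(["claude"], env=claude_env(base_url)).returncode
```

Calling `launch_claude("https://simplylouie.com/api/v1")` would start one session against that endpoint; other shells and sessions keep their existing configuration.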

How Claude Code's System Prompt Engine Actually Works

Claude Code builds its system prompt dynamically from core instructions, conditional tool definitions, user files, and managed conversation history, revealing the critical role of context engineering.

92% relevant

AI Weekly: GPT-6 Rumors, DeepSeek V4 on Huawei, Anthropic Models, Qwen 3.6-Plus

A weekly roundup video aggregates major AI rumors and announcements, including unverified GPT-6 details, DeepSeek V4 reportedly running on Huawei hardware, and launches of Anthropic's Conway and Ultraplan and Alibaba's Qwen 3.6-Plus.

85% relevant

How to Fix Claude Code's Remote Control Issues and Get Visual Feedback

Practical solutions for Claude Code's remote control instability and lack of visual feedback when building UI components.

74% relevant

How Anthropic's Team Uses Skills as Knowledge Containers (And What It Means For Your CLAUDE.md)

Learn how to use Claude Code skills not just for automation but as living knowledge bases, following patterns from Anthropic's own engineering team.

70% relevant

EgoAlpha's 'Prompt Engineering Playbook' Repo Hits 1.7k Stars

Research lab EgoAlpha compiled advanced prompt engineering methods from Stanford, Google, and MIT papers into a public GitHub repository. The 758-commit repo provides free, research-backed techniques for in-context learning, RAG, and agent frameworks.

85% relevant

OpenAI Image Generation V2 Release Imminent, Per Leak

A post from a known leaker indicates OpenAI's next image generation model, potentially DALL-E 4, is about to be released. This would mark a major competitive move in the rapidly evolving text-to-image space.

85% relevant

GPT-Image-2 Appears in ChatGPT App Images Tab, Signaling OpenAI Visual AI Push

A user spotted 'GPT-Image-2' listed in the images tab of the ChatGPT mobile app. This indicates OpenAI is testing a potential successor to its DALL-E image generation models directly within its flagship product.

85% relevant

GitNexus Open Sources Codebase Knowledge Graph Engine for AI Agents

GitNexus, an open-source knowledge graph engine, autonomously indexes codebases to map dependencies and execution flows. It integrates with Claude Code, Cursor, and Windsurf via MCP to give AI agents architectural awareness, preventing breaking changes.

99% relevant

AI Forecasters Revise AGI Timeline: Key Milestones Pulled Forward to 2029-2030 After Recent Model Progress

A significant update from AI forecasters indicates key AGI milestones have been pulled forward, with the median prediction for AGI arrival shifting from 2032 to 2029-2030. This revision follows rapid progress in recent model capabilities, particularly in reasoning and tool use.

85% relevant

Anthropic's Next-Generation AI Model Details Leak Amidst Competitive Pressure

Details about Anthropic's upcoming AI model have reportedly leaked, revealing advanced capabilities that could significantly impact cybersecurity applications. The leak comes as Anthropic pursues an ambitious $5 billion funding plan to compete directly with OpenAI.

84% relevant

OpenSCAD Web: Open-Source Text-to-CAD Tool Runs Fully In-Browser via WebAssembly

A developer has released an open-source text-to-CAD tool that runs entirely in a web browser using WebAssembly. Users describe a 3D object in plain English, optionally upload a reference image, and receive a parametric model with adjustable dimensions that exports directly to 3D printer formats.

85% relevant

Jack Dorsey Predicts AI Will Replace Corporate Middle Management by Automating Coordination

Jack Dorsey says AI can substitute for corporate middle management by building live models of organizational activity from digital systems, fundamentally changing how coordination works.

85% relevant

Google's AICore Beta Enables On-Device Gemini Nano 4 Downloads for Android Phones

A new beta of Google's AICore system service enables users to download Gemini Nano 4 Full and Gemini Nano 4 Fast models directly onto compatible Android phones, including those with Snapdragon 8 Elite Gen 5 chips. This moves beyond pre-installed AI to user-initiated model management.

85% relevant