memory

30 articles about memory in AI news

Nous Research's Hermes Agent Features Self-Improving Skills, Persistent Memory

A new evaluation of Nous Research's Hermes Agent highlights its ability to improve itself by building reusable tools from experience, along with a persistent memory system designed to conserve token usage. The agent reportedly gets better with continued use, a shift towards more adaptive AI systems.

85% relevant

Memory Systems for AI Agents: Architectures, Frameworks, and Challenges

A technical analysis details the multi-layered memory architectures—short-term, episodic, semantic, procedural—required to transform stateless LLMs into persistent, reliable AI agents. It compares frameworks like MemGPT and LangMem that manage context limits and prevent memory drift.

100% relevant
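
As a rough illustration of the layering described above, here is a minimal Python sketch of a multi-store agent memory; the class and field names are illustrative, not the API of MemGPT or LangMem.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy multi-layer memory store (names and layout are illustrative)."""
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))  # rolling context
    episodic: list = field(default_factory=list)      # timestamped event log
    semantic: dict = field(default_factory=dict)      # distilled facts, keyed by topic
    procedural: dict = field(default_factory=dict)    # named skills / how-to recipes

    def observe(self, event: str) -> None:
        # Everything enters short-term; episodic keeps the full history.
        self.short_term.append(event)
        self.episodic.append(event)

    def consolidate(self, topic: str, fact: str) -> None:
        # Promoting episodic traces into stable semantic knowledge is
        # where frameworks like MemGPT and LangMem differ most.
        self.semantic[topic] = fact


memory = AgentMemory()
memory.observe("user prefers concise answers")
memory.consolidate("style", "be concise")
```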

Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint

A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working and long-term memory, plus key latency optimizations such as streaming APIs and parallel fetches to meet the strict responsiveness demands of a voice interface.

76% relevant
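
A minimal sketch of the parallel-fetch optimization the article describes, using asyncio to overlap the working-memory and long-term lookups; the fetch functions are hypothetical stand-ins, not the Redis Agent Memory Server API.

```python
import asyncio


async def fetch_working_memory(session_id: str) -> list[str]:
    # Hypothetical stand-in for a working-memory lookup (e.g., recent turns).
    await asyncio.sleep(0.05)
    return ["last turn: user asked about sleep quality"]


async def fetch_long_term(query: str) -> list[str]:
    # Hypothetical stand-in for a semantic long-term search.
    await asyncio.sleep(0.12)
    return ["2025-11-02: user mentioned insomnia"]


async def build_context(session_id: str, query: str) -> list[str]:
    # Issue both lookups concurrently so total latency is the max,
    # not the sum, of the two round trips.
    working, long_term = await asyncio.gather(
        fetch_working_memory(session_id),
        fetch_long_term(query),
    )
    return working + long_term


print(asyncio.run(build_context("s1", "how did I sleep last week?")))
```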

MemFactory Framework Unifies Agent Memory Training & Inference, Reports 14.8% Gains Over Baselines

Researchers introduced MemFactory, a unified framework that treats agent memory as a trainable component across both training and inference. It supports multiple memory paradigms and reports up to a 14.8% relative improvement over baseline methods.

97% relevant
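
MemFactory's actual design isn't detailed in the summary, but the general idea of memory as a trainable component can be sketched with a learned slot bank read by attention; this is a generic illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class TrainableMemory(nn.Module):
    """Generic learned-memory read; not MemFactory's actual architecture."""

    def __init__(self, n_slots: int = 32, dim: int = 64):
        super().__init__()
        # Memory slots are ordinary parameters, so gradients update them
        # alongside the rest of the model.
        self.slots = nn.Parameter(torch.randn(n_slots, dim) * 0.02)

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # Soft attention read over the slot bank.
        attn = torch.softmax(query @ self.slots.T, dim=-1)  # (batch, n_slots)
        return attn @ self.slots                            # (batch, dim)


mem = TrainableMemory()
out = mem(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 64])
```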

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

74% relevant

Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity

A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.

95% relevant
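
One plausible reading of "document-wise RoPE" is that rotary position indices restart at each document boundary, keeping positions inside the trained range even as total context grows; the sketch below shows only that indexing idea, not MSA's attention mechanism.

```python
import numpy as np


def document_wise_positions(doc_lengths: list[int]) -> np.ndarray:
    # Restart position indices at every document boundary, so a 100M-token
    # memory never feeds RoPE positions far outside the training range.
    return np.concatenate([np.arange(n) for n in doc_lengths])


def rope_angles(positions: np.ndarray, dim: int = 8, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE frequencies; each position maps to dim/2 rotation angles.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # (n_tokens, dim/2)


# Three concatenated documents of lengths 5, 3, 4: positions reset per doc.
pos = document_wise_positions([5, 3, 4])
print(pos)  # [0 1 2 3 4 0 1 2 0 1 2 3]
print(rope_angles(pos).shape)  # (12, 4)
```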

Google's TurboQuant Compresses LLM KV Cache 6x with Zero Accuracy Loss, Cutting GPU Memory by 80%

Google researchers introduced TurboQuant, a method that quantizes the LLM KV cache down to 3-bit precision without accuracy degradation. This cuts KV-cache GPU memory consumption by over 80% and speeds up attention scoring by up to 8x on H100 GPUs.

97% relevant
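
TurboQuant's two-stage algorithm isn't reproduced here, but the storage arithmetic behind the headline numbers can be shown with a plain symmetric 3-bit quantizer; everything in this sketch is a generic illustration.

```python
import numpy as np


def quantize_3bit(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Symmetric per-channel quantization to 3 bits (levels -4..3).
    # TurboQuant itself is a two-stage method; this only shows the
    # storage arithmetic behind the ~80% memory reduction.
    scale = np.abs(x).max(axis=0, keepdims=True) / 4.0 + 1e-8
    q = np.clip(np.round(x / scale), -4, 3).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale


kv = np.random.randn(1024, 128).astype(np.float16)  # toy KV-cache block
q, scale = quantize_3bit(kv.astype(np.float32))
err = np.abs(dequantize(q, scale) - kv).mean()
print(f"mean abs error: {err:.4f}")
# A 16-bit -> 3-bit payload is ~5.3x smaller, i.e. roughly an 80% cut
# before accounting for the per-channel scales.
```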

Google Gemini Launches Manual Memory & Chat Import to Ease Switching from ChatGPT, Claude

Google Gemini is rolling out 'Import Memory' and 'Import Chat History' features for desktop users. The manual tools provide prompts and a .zip upload to transfer data from other AI assistants, aiming to lower the barrier for users to switch from competitors like ChatGPT or Claude.

79% relevant

TurboQuant Ported to Apple MLX, Claims 75% Memory Reduction with Minimal Performance Loss

Developer Prince Canuma has successfully ported the TurboQuant quantization method to Apple's MLX framework, reporting a 75% reduction in memory usage with nearly no performance degradation for on-device AI models.

85% relevant

Google's TurboQuant AI Research Report Sparks Sell-Off in Micron, Samsung, and SK Hynix Memory Stocks

The publication of Google's TurboQuant research blog post triggered an immediate market reaction, with shares of major memory manufacturers dropping 2-4% as investors anticipate AI-driven efficiency gains reducing future memory demand.

85% relevant

Add Vector Memory to Claude Code: The claude-memory-mcp Server Solves CLAUDE.md's 200-Line Limit

Install this open-source MCP server to give Claude Code persistent, searchable memory across projects. It surfaces only relevant context, solving CLAUDE.md's scaling problems.

95% relevant

Add Persistent Memory to Claude Code in 5 Minutes with memoclaw-mcp

Stop re-explaining your preferences. Install the memoclaw-mcp server to give Claude Code persistent, semantic memory across sessions using the Model Context Protocol.

100% relevant

Google's TurboQuant Cuts LLM KV Cache Memory by 6x, Enables 3-Bit Storage Without Accuracy Loss

Google released TurboQuant, a novel two-stage quantization algorithm that compresses the KV cache in long-context LLMs. It reduces memory by 6x, achieves 3-bit storage with no accuracy drop, and speeds up attention scoring by up to 8x on H100 GPUs.

95% relevant

Anthropic's 'Auto-dream' Feature for Claude Code Automatically Compacts and Indexes Project Memory

An apparently unreleased Claude Code feature called 'Auto-dream' uses a background subagent to periodically review, consolidate, and index project memory, keeping the main MEMORY.md file short and durable.

91% relevant
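
Since the feature is unconfirmed, the following is only a speculative sketch of the pattern as described: a periodic pass that keeps recent entries verbatim and collapses older ones into a short digest.

```python
from pathlib import Path


def compact_memory(path: Path, keep_recent: int = 10) -> None:
    # Speculative sketch of an 'Auto-dream'-style pass: keep the newest
    # entries verbatim and collapse older ones to their first line.
    entries = [e for e in path.read_text().split("\n\n") if e.strip()]
    old, recent = entries[:-keep_recent], entries[-keep_recent:]
    digest = [e.splitlines()[0] for e in old]  # crude one-line summaries
    compacted = (["## Digest of older notes", *digest] if digest else []) + recent
    path.write_text("\n\n".join(compacted) + "\n")


# A real implementation would presumably summarize with a subagent rather
# than truncating, and would build a searchable index as it consolidates.
```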

Stop Pasting Context: Add Persistent Memory to Claude Code with Bossa MCP

Bossa MCP gives Claude Code persistent filesystem memory across sessions, eliminating repetitive context pasting and enabling smarter progressive disclosure.

97% relevant

Supermemory Claims ~99% on LongMemEval_s with Experimental ASMR Technique, Plans Open-Source Release

An experimental AI technique called ASMR (Agentic Search and Memory Retrieval) reportedly achieved near-perfect performance (~99%) on the LongMemEval_s benchmark. The method replaces vector search with parallel observer agents and will be open-sourced in 11 days.

95% relevant
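
Details of ASMR aren't public yet, so this is a guess at the shape of the approach: fan the query out to observers that each scan a memory shard in parallel, then merge their findings; the observer here is a trivial keyword filter standing in for an LLM call.

```python
import asyncio


async def observer(shard: list[str], query: str) -> list[str]:
    # Stand-in for an observer agent: a real one would be an LLM call
    # that reads its shard and reports anything relevant to the query.
    await asyncio.sleep(0.01)
    return [line for line in shard if any(w in line for w in query.split())]


async def asmr_retrieve(shards: list[list[str]], query: str) -> list[str]:
    # No vector index: every shard is inspected in parallel and the
    # observers' findings are merged.
    results = await asyncio.gather(*(observer(s, query) for s in shards))
    return [hit for r in results for hit in r]


shards = [["user lives in Oslo"], ["user adopted a dog in June"]]
print(asyncio.run(asmr_retrieve(shards, "where does the user live")))
```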

Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss

Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.

85% relevant

Flash-KMeans: An IO-Aware GPU Implementation That Rethinks K-Means Memory Access

Flash-KMeans is a new, exact k-means clustering implementation designed for GPUs. It focuses on optimizing memory access patterns to overcome I/O bottlenecks that limit performance.

85% relevant
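
The exact kernel isn't shown in the summary, but the IO-aware idea can be illustrated with a tiled assignment step: process points in blocks so the full (n x k) distance matrix never materializes; a fused GPU kernel would apply the same tiling in on-chip memory.

```python
import numpy as np


def assign_tiled(points: np.ndarray, centroids: np.ndarray, tile: int = 4096) -> np.ndarray:
    # K-means assignment computed one tile of points at a time, so the
    # (tile x k) distance block stays small. Flash-KMeans applies this
    # idea inside a fused GPU kernel to keep tiles in on-chip memory
    # instead of round-tripping a full (n x k) matrix through HBM.
    labels = np.empty(len(points), dtype=np.int64)
    c_sq = (centroids ** 2).sum(axis=1)  # precomputed ||c||^2, reused per tile
    for start in range(0, len(points), tile):
        p = points[start:start + tile]
        # argmin ||p - c||^2 = argmin(-2 p.c + ||c||^2); ||p||^2 is constant per row.
        d = c_sq - 2.0 * (p @ centroids.T)
        labels[start:start + tile] = d.argmin(axis=1)
    return labels


pts = np.random.randn(10000, 64).astype(np.float32)
cents = pts[np.random.choice(len(pts), 16, replace=False)]
print(np.bincount(assign_tiled(pts, cents), minlength=16))
```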

How Claude Code's New Auto-Memory and Remote Control Features Stack Up Against OpenClaw

Claude Code has rapidly added auto-memory and remote session control, but understanding their practical limits is key to using them effectively.

74% relevant

Qwen 3.5 397B-A17B MoE Model Runs on M3 Mac at 5.7 TPS with 5.5GB Active Memory via SSD Streaming

Developer Dan reportedly runs the 209GB Qwen 3.5 397B-A17B MoE model on an M3 Mac at ~5.7 tokens per second using only 5.5GB of active memory by quantizing and streaming weights from SSD.

85% relevant
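
A minimal sketch of the underlying trick, assuming a weights file on disk: memory-map the checkpoint so only the pages backing the currently routed experts are ever faulted into RAM; the file name, shapes, and sizes here are toy values.

```python
import numpy as np

N_EXPERTS, DIM = 8, 256

# One-time setup: a weight file on SSD (stand-in for the real checkpoint).
weights = np.lib.format.open_memmap(
    "experts.npy", mode="w+", dtype=np.float16, shape=(N_EXPERTS, DIM, DIM)
)
weights.flush()

# Inference side: memory-map the file; nothing is read into RAM yet.
experts = np.load("experts.npy", mmap_mode="r")


def run_layer(x: np.ndarray, active: list[int]) -> np.ndarray:
    # Only the pages backing the routed experts are faulted in from SSD;
    # with ~17B active parameters out of 397B, resident memory stays small.
    return sum(x @ np.asarray(experts[i]) for i in active)


out = run_layer(np.ones(DIM, dtype=np.float16), active=[3, 7])
print(out.shape)
```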

AMA-Bench Released: New Benchmark Focuses on Agent Memory Beyond Dialogue

Researchers have released AMA-Bench, a new evaluation framework designed to test AI agent memory capabilities specifically, moving beyond standard dialogue-based assessments. The benchmark aims to address limitations in existing memory evaluation methods.

85% relevant

Did You Check the Right Pocket? A New Framework for Cost-Sensitive Memory Routing in AI Agents

A new arXiv paper frames memory retrieval in AI agents as a 'store-routing' problem. It shows that selectively querying specialized data stores, rather than querying every store for every request, significantly improves efficiency and accuracy, and it formalizes the underlying cost-accuracy trade-off.

70% relevant
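
A toy version of the routing rule the paper formalizes, assuming a learned router supplies per-store hit probabilities: query a store only when its expected benefit beats a cost penalty; all names and numbers are mine.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    cost: float  # latency / tokens per query


def route(stores: dict[str, Store], p_hit: dict[str, float], lam: float = 0.5) -> list[str]:
    # Query a store only when expected benefit beats its cost:
    # utility = P(store answers the query) - lam * cost.
    return [s.name for s in stores.values()
            if p_hit.get(s.name, 0.0) - lam * s.cost > 0]


stores = {s.name: s for s in [
    Store("episodic", 0.2), Store("semantic", 0.1), Store("tools", 0.4)]}
# p_hit would come from a learned router; hard-coded here for illustration.
p_hit = {"episodic": 0.05, "semantic": 0.9, "tools": 0.1}
print(route(stores, p_hit))  # ['semantic']
```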

SK Group Chairman Forecasts Memory Chip Shortage Until 2030, Warns of Sustained Price Increases

SK Group Chairman Chey Tae-won predicts the global memory chip supply crunch could persist until around 2030, with wafer supply lagging demand by over 20% and prices continuing to rise.

85% relevant

How a GPU Memory Leak Nearly Cost an AI Team a Major Client During a Live Demo

A detailed post-mortem of a critical AI inference failure during a client demo reveals how silent GPU memory leaks, inadequate health checks, and missing circuit breakers can bring down a production pipeline. The author shares the architectural fixes implemented to prevent recurrence.

100% relevant
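
A minimal sketch of the kind of guard the post-mortem argues for: track allocated GPU memory after each request and trip a circuit breaker on sustained growth; the thresholds and class name are made up.

```python
import torch


class GpuMemoryBreaker:
    """Trip when allocated GPU memory keeps growing across requests."""

    def __init__(self, limit_bytes: int, growth_strikes: int = 5):
        self.limit = limit_bytes
        self.strikes_allowed = growth_strikes
        self.prev = 0
        self.strikes = 0

    def check(self) -> None:
        allocated = torch.cuda.memory_allocated()
        # Consecutive growth after completed requests suggests a leak:
        # steady-state serving should return to a stable baseline.
        self.strikes = self.strikes + 1 if allocated > self.prev else 0
        self.prev = allocated
        if allocated > self.limit or self.strikes > self.strikes_allowed:
            raise RuntimeError(
                f"GPU memory breaker tripped: {allocated / 1e9:.2f} GB allocated"
            )


breaker = GpuMemoryBreaker(limit_bytes=int(70e9))
# Call breaker.check() after each inference request, and pair it with a
# readiness probe so the orchestrator drains the replica instead of
# letting a live demo hit it.
```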

Cog: Add Persistent Memory and Self-Reflection to Claude Code with Just Markdown

Cog is a plain-text cognitive architecture for Claude Code that adds persistent memory, self-reflection, and foresight using only CLAUDE.md files—no servers or dependencies.

100% relevant

Memory Market Squeeze Threatens iPhone Price Hikes as AI Demands Strain Supply

A global RAM shortage and price increases could force Apple to raise iPhone prices by up to $250, according to industry analysis. The tech giant is reportedly unwilling to absorb the cost, passing it directly to consumers amid surging memory demands from AI applications.

85% relevant

AI Agents Get a Memory Upgrade: New Framework Treats Multi-Agent Memory as Computer Architecture

A new paper proposes treating multi-agent memory systems as a computer architecture problem, introducing a three-layer hierarchy and identifying critical protocol gaps. This approach could significantly improve reasoning, skills, and tool usage in collaborative AI systems.

85% relevant

Hindsight AI: How Biomimetic Memory Systems Are Revolutionizing Agent Intelligence

Hindsight, an open-source AI memory system, achieves state-of-the-art performance on the LongMemEval benchmark by mimicking human memory structures. Unlike traditional RAG approaches, it employs parallel retrieval strategies to enable agents that don't just remember—they learn.

95% relevant

Hybrid Self-evolving Structured Memory: A Breakthrough for GUI Agent Performance

Researchers propose HyMEM, a graph-based memory system for GUI agents that combines symbolic nodes with continuous embeddings. It enables multi-hop retrieval and self-evolution, boosting open-source VLMs to surpass closed-source models like GPT-4o on computer-use tasks.

72% relevant
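
HyMEM's graph construction isn't specified in the summary, so this is a generic sketch of the hybrid idea: each node carries a symbolic label plus an embedding, retrieval seeds on similarity, then expands along edges for multi-hop context.

```python
import numpy as np


class GraphMemory:
    """Toy hybrid memory: symbolic nodes plus embeddings (not HyMEM's design)."""

    def __init__(self):
        self.labels: list[str] = []
        self.vecs: list[np.ndarray] = []
        self.edges: dict[int, set[int]] = {}

    def add(self, label: str, vec: np.ndarray, neighbors: set[int] = frozenset()) -> int:
        i = len(self.labels)
        self.labels.append(label)
        self.vecs.append(vec / np.linalg.norm(vec))
        self.edges[i] = set(neighbors)
        for n in neighbors:
            self.edges[n].add(i)  # undirected symbolic links
        return i

    def retrieve(self, query_vec: np.ndarray, hops: int = 2) -> list[str]:
        # Seed on embedding similarity, then expand along edges: the
        # multi-hop step that similarity search alone cannot provide.
        q = query_vec / np.linalg.norm(query_vec)
        seed = int(np.argmax([q @ v for v in self.vecs]))
        frontier, seen = {seed}, {seed}
        for _ in range(hops):
            frontier = {n for i in frontier for n in self.edges[i]} - seen
            seen |= frontier
        return [self.labels[i] for i in seen]


g = GraphMemory()
a = g.add("Settings window", np.array([1.0, 0.0]))
g.add("Wi-Fi panel", np.array([0.9, 0.1]), {a})
print(g.retrieve(np.array([1.0, 0.05])))  # both nodes, via one hop
```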

RF-Mem: A Dual-Path Memory Retrieval System for Personalized LLMs

Researchers propose RF-Mem, a memory retrieval system for LLMs that mimics human cognitive processes. It adaptively switches between fast 'familiarity' and deep 'recollection' paths to personalize responses efficiently, outperforming existing methods under constrained budgets.

77% relevant
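
A minimal sketch of the dual-path switch, with thresholds and scoring functions of my own choosing: take the cheap familiarity answer when it is confident, and pay for a deeper recollection pass only when it isn't.

```python
import numpy as np


def familiarity(query: np.ndarray, memory: np.ndarray) -> tuple[int, float]:
    # Fast path: one matrix-vector product, take the best cosine match.
    sims = memory @ query / (np.linalg.norm(memory, axis=1) * np.linalg.norm(query))
    best = int(np.argmax(sims))
    return best, float(sims[best])


def recollect(query: np.ndarray, memory: np.ndarray) -> int:
    # Slow-path stand-in: a real system would re-read candidates with an
    # LLM or cross-encoder; here we just rescore exhaustively.
    return int(np.argmin(np.linalg.norm(memory - query, axis=1)))


def retrieve(query: np.ndarray, memory: np.ndarray, tau: float = 0.85) -> int:
    best, conf = familiarity(query, memory)
    # Only pay for deep recollection when the fast signal is ambiguous,
    # which is how a fixed retrieval budget stretches further.
    return best if conf >= tau else recollect(query, memory)


mem = np.random.randn(1000, 64)
print(retrieve(np.random.randn(64), mem))
```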