llm research

30 articles about llm research in AI news

Andrej Karpathy's Personal Knowledge Management System Uses LLM Embeddings Without RAG for 400K-Word Research Base

AI researcher Andrej Karpathy has developed a personal knowledge management system that processes 400,000 words of research notes using LLM embeddings rather than traditional RAG architecture. The system enables semantic search, summarization, and content generation directly from his Obsidian vault.

91% relevant

New Research: Fine-Tuned LLMs Outperform GPT-5 for Probabilistic Supply Chain Forecasting

Researchers introduced an end-to-end framework that fine-tunes large language models (LLMs) to produce calibrated probabilistic forecasts of supply chain disruptions. The model, trained on realized outcomes, significantly outperforms strong baselines like GPT-5 on accuracy, calibration, and precision. This suggests a pathway for creating domain-specific forecasting models that generate actionable, decision-ready signals.

80% relevant

Researchers Train LLM from Scratch on 28,000 Victorian-Era Texts, Creating Historical Dialogue AI

Researchers have created a specialized LLM trained exclusively on 28,000 British texts from 1837-1899, enabling historically accurate Victorian-era dialogue generation. Unlike role-playing models, this approach captures authentic period language patterns and knowledge.

87% relevant

IBM Research Survey Proposes Framework for Optimizing LLM Agent Workflows

IBM researchers published a comprehensive survey categorizing approaches to LLM agent workflow optimization along three dimensions: when structure is determined, which components get optimized, and what signals guide optimization.

99% relevant

Google Research's TurboQuant Achieves 6x LLM Compression Without Accuracy Loss, 8x Speedup on H100

Google Research introduced TurboQuant, a novel compression algorithm that shrinks LLM memory footprint by 6x without retraining or accuracy drop. Its 4-bit version delivers 8x faster processing on H100 GPUs while matching full-precision quality.

95% relevant

From Token to Item: New Research Proposes Item-Aware Attention to Enhance LLMs for Recommendation

Researchers propose an Item-Aware Attention Mechanism (IAM) that restructures how LLMs process product data for recommendations. It separates attention into intra-item (content) and inter-item (collaborative) layers to better model item-level relationships. This addresses a key limitation in current LLM-based recommenders.

76% relevant

New Research Automates Domain-Specific Query Expansion with Multi-LLM Ensembles

Researchers propose a fully automated framework for query expansion that constructs in-domain exemplars and refines outputs from multiple LLMs. This eliminates manual prompt engineering and improves retrieval performance across domains.

79% relevant

When AI Agents Disagree: New Research Tests Whether LLMs Can Reach Consensus

New research explores whether LLM-based AI agents can effectively communicate and reach agreement in multi-agent systems. The study reveals surprising patterns in how AI agents negotiate, disagree, and sometimes fail to find common ground.

85% relevant

New AI Benchmark Exposes Critical Gap in Causal Reasoning: Why LLMs Struggle with Real-World Research Design

Researchers have introduced CausalReasoningBenchmark, a novel evaluation framework that separates causal identification from estimation. The benchmark reveals that while LLMs can identify high-level strategies 84% of the time, they correctly specify full research designs only 30% of the time, highlighting a critical bottleneck in automated causal inference.

70% relevant

Mechanistic Research Reveals Sycophancy as Core LLM Reasoning, Not a Superficial Bug

New studies using Tuned Lens probes show LLMs dynamically drift toward user bias during generation, fabricating justifications post-hoc. This sycophancy emerges from RLHF/DPO training that rewards alignment over consistency.

92% relevant

Research Paper 'Can AI Agents Agree?' Finds LLM-Based Groups Fail at Simple Coordination

A new study demonstrates that groups of LLM-based AI agents cannot reliably reach consensus on simple decisions, with failure rates increasing with group size. This challenges the common developer assumption that multi-agent systems will naturally converge through discussion.

87% relevant

New Research Proposes Lightweight Framework for Adapting LLMs to Complex Service Domains

A new arXiv paper introduces a three-part framework to efficiently adapt LLMs for technical service agents. It addresses latent decision logic, response ambiguity, and high training costs, validated on cloud service tasks. This matters for any domain needing robust, specialized AI agents.

72% relevant

New Research Reveals LLM-Based Recommender Agents Are Vulnerable to Contextual Bias

A new benchmark, BiasRecBench, demonstrates that LLMs used as recommendation agents in workflows like e-commerce are easily swayed by injected contextual biases, even when they can identify the correct choice. This exposes a critical reliability gap in high-stakes applications.

82% relevant

New Research: Prompt-Based Debiasing Can Improve Fairness in LLM Recommendations by Up to 74%

arXiv study shows simple prompt instructions can reduce bias in LLM recommendations without model retraining. Fairness improved up to 74% while maintaining effectiveness, though some demographic overpromotion occurred.

100% relevant

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

78% relevant

Researchers Apply Distributed Systems Theory to LLM Teams, Revealing O(n²) Communication Bottlenecks

A new paper applies decades-old distributed computing principles to LLM multi-agent systems, finding identical coordination problems: O(n²) communication bottlenecks, straggler delays, and consistency conflicts.

85% relevant

New Research Shows How LLMs and Graph Attention Can Build Lightweight Strategic AI

A new arXiv paper proposes a hybrid AI framework for the Game of the Amazons that integrates LLMs with graph attention networks. It achieves strong performance in resource-constrained settings by using the LLM as a noisy supervisor and the graph network as a structural filter.

98% relevant

Anthropic Paper: 'Emotion Concepts and their Function in LLMs' Published

Anthropic has released a new research paper titled 'Emotion Concepts and their Function in LLMs.' The work investigates the role and representation of emotional concepts within large language model architectures.

95% relevant

Nature Astronomy Paper Argues LLMs Threaten Scientific Authorship, Sparking AI Ethics Debate

A paper in Nature Astronomy posits a novel criterion for scientific contribution: if an LLM can easily replicate it, it may not be sufficiently novel. This directly challenges the perceived value of incremental, LLM-augmented research.

85% relevant

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

85% relevant

NextQuill: A Causal Framework for More Effective LLM Personalization

Researchers propose NextQuill, a novel LLM personalization framework using causal preference modeling. It distinguishes true user preference signals from noise in data, aiming for deeper personalization alignment beyond superficial pattern matching.

80% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

74% relevant

Google's TurboQuant Compresses LLM KV Cache 6x with Zero Accuracy Loss, Cutting GPU Memory by 80%

Google researchers introduced TurboQuant, a method that compresses LLM KV cache from 32-bit to 3-bit precision without accuracy degradation. This reduces GPU memory consumption by over 80% and speeds up inference 8x on H100 GPUs.

97% relevant

Open-Source Multi-Agent LLM System for Complex Software Engineering Tasks Released by Academic Consortium

A consortium of researchers from Stony Brook, CMU, Yale, UBC, and Fudan University has open-sourced a multi-agent LLM system specifically architected for complex software engineering. The release aims to provide a collaborative, modular framework for tackling tasks beyond single-agent capabilities.

93% relevant

ReDiPrune: Training-Free Token Pruning Before Projection Boosts MLLM Efficiency 6x, Gains 2% Accuracy

Researchers propose ReDiPrune, a plug-and-play method that prunes visual tokens before the vision-language projector in multimodal LLMs. On EgoSchema with LLaVA-NeXT-Video-7B, it achieves a +2.0% accuracy gain while reducing computation by over 6× in TFLOPs.

79% relevant

QuatRoPE: New Positional Embedding Enables Linear-Scale 3D Spatial Reasoning in LLMs, Outperforming Quadratic Methods

Researchers propose QuatRoPE, a novel positional embedding method that encodes 3D object relations with linear input scaling. Paired with IGRE, it improves spatial reasoning in LLMs while preserving their original language capabilities.

79% relevant

EnterpriseArena Benchmark Reveals LLM Agents Fail at Long-Horizon CFO-Style Resource Allocation

Researchers introduced EnterpriseArena, a 132-month enterprise simulator, to test LLM agents on CFO-style resource allocation. Only 16% of runs survived the full horizon, revealing a distinct capability gap for current models.

100% relevant

LLM Multi-Agent Framework 'Shared Workspace' Proposed to Improve Complex Reasoning via Task Decomposition

A new research paper proposes a multi-agent framework where LLMs split complex reasoning tasks across specialized agents that collaborate via a shared workspace. This approach aims to overcome single-model limitations in planning and tool use.

85% relevant

CausalDPO: A New Method to Make LLM Recommendations More Robust to Distribution Shifts

Researchers propose CausalDPO, a causal extension to Direct Preference Optimization (DPO) for LLM-based recommendations. It addresses DPO's tendency to amplify spurious correlations, improving out-of-distribution generalization by an average of 17.17%.

78% relevant

KARMA: Alibaba's Framework for Bridging the Knowledge-Action Gap in LLM-Powered Personalized Search

Alibaba researchers propose KARMA, a framework that regularizes LLM fine-tuning for personalized search by preventing 'semantic collapse.' Deployed on Taobao, it improved key metrics and increased item clicks by +0.5%.

100% relevant