Code Quality

30 articles about code quality in AI news

Forge Plugin Adds Governance to Claude Code: 22 Agents, Quality Gates, and Zero Config

Install the Forge plugin to add automated quality checks, health scoring, and specialized agents to Claude Code workflows in 30 seconds.

89% relevant

Claude Sonnet 4.5 vs 4.0: What the Quality Regression Means for Your Claude Code Workflow

Recent analysis suggests Claude Sonnet 4.5 may show quality regressions relative to 4.0. Here's how Claude Code users should adapt their prompting and model selection.

86% relevant

New Research Identifies Data Quality as Key Bottleneck in Multimodal Forecasting

A new arXiv paper introduces CAF-7M, a 7-million-sample dataset for context-aided forecasting. The research shows that poor context quality, not model architecture, has limited multimodal forecasting performance. This has implications for retail demand prediction that combines numerical data with text or image context.

70% relevant

Codex-CLI-Compact: The Graph-Based Context Engine That Cuts Claude Code Costs 30-45%

A new local tool builds a semantic graph of your codebase to pre-load only relevant files into Claude's context, reducing token usage by 30-45% without quality loss.

100% relevant
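The general idea behind graph-based context selection can be sketched in a few lines. This is a toy illustration, not Codex-CLI-Compact's actual implementation: given a file-level import graph, pre-load only files reachable within a small number of hops from the files being edited, instead of the whole repository. All file names and the `select_context` helper are hypothetical.

```python
from collections import deque

def select_context(import_graph: dict[str, list[str]],
                   edited: list[str],
                   max_hops: int = 2) -> set[str]:
    """Breadth-first walk: keep only files within `max_hops` of the edit set."""
    selected = set(edited)
    frontier = deque((f, 0) for f in edited)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for dep in import_graph.get(node, []):
            if dep not in selected:
                selected.add(dep)
                frontier.append((dep, depth + 1))
    return selected

graph = {
    "api.py": ["models.py", "auth.py"],
    "models.py": ["db.py"],
    "auth.py": ["db.py", "crypto.py"],
    "db.py": [],
    "crypto.py": [],
    "reports.py": ["models.py"],  # unrelated to the edit, stays out of context
}
print(sorted(select_context(graph, ["api.py"], max_hops=1)))
# -> ['api.py', 'auth.py', 'models.py']
```

Pruning the context to the edited file plus its near dependencies is what makes the claimed 30-45% token reduction plausible: unrelated files like `reports.py` never enter the prompt.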

How to Build a Custom AI Agent with Claude Code's Skills, SubAgents, and Hooks

A developer's deep dive into customizing Claude Code with 7 skills, 5 subagents, and quality-check hooks—showing how to move beyond basic prompting to create a truly autonomous coding assistant.

100% relevant

How a Non-Programmer Built a 487-File Unity Tool with Claude Code's 'Vibe Coding'

A graphic designer built a complex Unity map editor with 151K+ lines of C# using Claude Code's iterative 'describe → test → fix' workflow and early quality rule enforcement.

100% relevant

AI's Hidden Talent: How Mediocre Code Delivers Exceptional Real-World Value

New research reveals AI can transform low-quality code into high-value practical applications, with the biggest impact outside traditional software development. Even skills rated just 6.2/12 deliver significant productivity boosts across diverse fields.

85% relevant

How to Install claude-flow MCP and 3 Skills That Transform Claude Code

A production team's setup reveals claude-flow MCP with hierarchical-mesh topology and three essential skills that add structure, parallelism, and quality control.

100% relevant

NanoVDR: A 70M Parameter Text-Only Encoder for Efficient Visual Document Retrieval

New research introduces NanoVDR, a method to distill a 2B parameter vision-language retriever into a 69M text-only student model. It retains 95% of teacher quality while cutting query latency 50x and enabling CPU-only inference, crucial for scalable search over visual documents.

82% relevant

Renoise AI Tool Enables Programmatic Video Generation, Promising Faster Production

Renoise has launched an AI tool that generates videos through code rather than traditional editing. The platform claims to produce high-quality videos faster and more easily than previous methods.

85% relevant

Google DeepMind's Unified Latents Framework: Solving Generative AI's Core Trade-Off

Google DeepMind introduces Unified Latents (UL), a novel framework that jointly trains diffusion priors and decoders to optimize latent space representation. This approach addresses the fundamental trade-off between reconstruction quality and learnability in generative AI models.

75% relevant

Meta Halts Mercor Work After Supply Chain Breach Exposes AI Training Secrets

A supply chain attack via compromised software updates at data-labeling vendor Mercor has forced Meta to pause collaboration, risking exposure of core AI training pipelines and quality metrics used by top labs.

97% relevant

SteerViT Enables Natural Language Control of Vision Transformer Attention Maps

Researchers introduced SteerViT, a method that modifies Vision Transformers to accept natural language instructions, enabling users to steer the model's visual attention toward specific objects or concepts while maintaining representation quality.

85% relevant

Microsoft & CUHK Debut 'Medical AI Scientist' Agent That Generates Ideas, Runs Experiments, and Writes Papers

Microsoft Research and CUHK have developed an autonomous AI agent that can formulate research ideas, execute experiments, and author papers, achieving near-MICCAI quality on 171 clinical cases across 19 tasks.

95% relevant

Apple Silicon Achieves Near-Lossless LLM Compression at 3.5 Bits-Per-Weight, Claims Independent Tester

Independent AI researcher Matthew Weinbach reports achieving near-lossless compression of large language models on Apple Silicon, storing models at 3.5 bits-per-weight while maintaining within 1-2% quality of bf16 precision.

87% relevant
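The memory savings behind the 3.5 bits-per-weight claim are easy to verify with back-of-envelope arithmetic. The 8B-parameter model size below is a hypothetical example, not a figure from the report:

```python
# bf16 stores each weight in 16 bits; the reported format uses 3.5 bits.
params = 8e9                          # hypothetical 8B-parameter model
bf16_gb = params * 16 / 8 / 1e9       # 16 bits/weight -> 16.0 GB
q35_gb = params * 3.5 / 8 / 1e9       # 3.5 bits/weight -> 3.5 GB
print(f"bf16: {bf16_gb:.1f} GB, 3.5 bpw: {q35_gb:.1f} GB, "
      f"ratio: {bf16_gb / q35_gb:.2f}x")
# -> bf16: 16.0 GB, 3.5 bpw: 3.5 GB, ratio: 4.57x
```

A roughly 4.6x reduction is what lets models that would otherwise exceed unified memory fit on consumer Apple Silicon machines.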

Text-to-Speech Cost Plummets from $0.15/Word to Free Local Models Using 3GB RAM

In 12 months, high-quality text-to-speech has shifted from $0.15-per-word cloud services to free local models requiring only 3GB of RAM, signaling a broader price collapse in AI inference.

85% relevant

Insanely Fast Whisper CLI Transcribes 2.5 Hours of Audio in 98 Seconds with Flash Attention 2

A new open-source CLI tool called Insanely Fast Whisper achieves 19x speedup over standard Whisper large-v3, transcribing 150 minutes of audio in 98 seconds using Flash Attention 2 and batching with no quality loss.

97% relevant
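The headline numbers can be sanity-checked with simple arithmetic: 150 minutes of audio transcribed in 98 seconds, at a claimed 19x speedup over standard Whisper large-v3.

```python
audio_s = 150 * 60        # 9000 seconds of audio
rtf = audio_s / 98        # real-time factor: ~91.8x faster than real time
baseline_s = 98 * 19      # implied standard large-v3 runtime: ~31 minutes
print(f"{rtf:.1f}x real-time; implied baseline ~{baseline_s / 60:.0f} min")
# -> 91.8x real-time; implied baseline ~31 min
```

The implied ~31-minute baseline for 2.5 hours of audio is consistent with unaccelerated large-v3 on a single GPU, so the two claims fit together.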

SELLER: A New Sequence-Aware LLM Framework for Explainable Recommendations

Researchers propose SELLER, a framework that uses Large Language Models to generate explanations for recommendations by modeling user behavior sequences. It outperforms prior methods by integrating explanation quality with real-world utility metrics.

92% relevant

Google Research's TurboQuant Achieves 6x LLM Compression Without Accuracy Loss, 8x Speedup on H100

Google Research introduced TurboQuant, a novel compression algorithm that shrinks LLM memory footprint by 6x without retraining or accuracy drop. Its 4-bit version delivers 8x faster processing on H100 GPUs while matching full-precision quality.

95% relevant

Jensen Huang Predicts AI Training Shift to Synthetic Data, Compute as New Bottleneck

NVIDIA CEO Jensen Huang states AI training is moving from real-world to synthetic data, with compute power becoming the primary constraint as AI-generated data quality improves.

85% relevant

Wharton Study Finds 'AI Writes, Humans Review' Model Failing in Real Business Contexts

New Wharton research reveals the 'AI writes, humans review' workflow is breaking down in practice, with human reviewers struggling to effectively evaluate AI-generated content. The study suggests current review processes may be insufficient for quality control.

85% relevant

ByteDance's Helios: A 14B Parameter Video Generation Model Running at 19.5 FPS on a Single H100 GPU

ByteDance has introduced Helios, a 14-billion parameter video generation model that reportedly runs at 19.5 frames per second on a single NVIDIA H100 GPU. This represents a significant step in making high-quality, real-time video synthesis more computationally accessible.

95% relevant

Solving LLM Debate Problems with a Multi-Agent Architecture

A developer details moving from generic prompts to a multi-agent system where two LLMs are forced to refute each other, improving reasoning and output quality. This is a technical exploration of a novel prompting architecture.

78% relevant
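The debate pattern described above can be sketched as a short loop: two model roles take turns, one instructed to refute the other's last answer, and the answer is revised each round. This is a minimal sketch of the pattern, not the developer's code; `call_llm` is a stand-in to be wired to any chat-completion API.

```python
def call_llm(system: str, prompt: str) -> str:
    # Placeholder: replace with a real model call (OpenAI, Anthropic, local, ...).
    return f"[response to: {prompt[:40]}...]"

def debate(question: str, rounds: int = 2) -> str:
    """Answer, then alternate critique and revision for `rounds` iterations."""
    answer = call_llm("Answer concisely.", question)
    for _ in range(rounds):
        critique = call_llm(
            "You are a skeptical reviewer. Find flaws in this answer.",
            f"Question: {question}\nAnswer: {answer}",
        )
        answer = call_llm(
            "Revise the answer to address the critique.",
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}",
        )
    return answer

print(debate("Why does quicksort degrade to O(n^2)?"))
```

The forced-refutation step is the key design choice: a second model with an adversarial system prompt surfaces errors that a single model asked to self-check tends to miss.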

Stop Bloating Your CLAUDE.md: The Infrastructure-First Workflow That Cuts Review Time in Half

Replace verbose CLAUDE.md rules with automated hooks and modular skills to enforce quality and prevent context degradation.

81% relevant

PRISM Study: Mid-Training on 27B Tokens Boosts Math Scores by +15 to +40 Points, Enables Effective RL

A comprehensive study shows mid-training on 27B high-quality tokens consistently improves reasoning in LLMs. This 'retention-aware' phase restructures 90% of weights, creating a configuration where RL can succeed.

88% relevant

OpenSWE Releases 45,000+ Executable Environments for Training SWE Agents, Achieves 66% on SWE-bench Verified

OpenSWE introduces a framework with over 45,000 executable environments for training software engineering agents, achieving 66% on SWE-bench Verified through quality filtering of multi-agent synthesized environments. The Docker infrastructure is open-sourced for full reproducibility.

85% relevant

AI Agents Gain Financial Autonomy: New Tool Enables AI to Purchase Premium Data

A new tool lets AI agents autonomously pay for high-quality data through premium APIs. The system self-determines budget allocation with zero manual setup and is currently operational across multiple AI platforms.

85% relevant

CostRouter Emerges as Smart AI Gateway, Cutting API Expenses by 60% Through Intelligent Model Routing

A new API gateway called CostRouter analyzes request complexity and automatically routes queries to the cheapest capable AI model, saving developers up to 60% on API costs while maintaining quality thresholds.

79% relevant
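Complexity-based routing of the kind CostRouter describes can be illustrated with a toy version: score a request with cheap heuristics, then pick the cheapest model whose capability tier meets the score. The model names, prices, and keywords below are illustrative only, not CostRouter's actual logic.

```python
MODELS = [  # (name, $ per 1M input tokens, capability tier), cheapest first
    ("small-fast", 0.15, 1),
    ("mid-tier", 1.00, 2),
    ("frontier", 5.00, 3),
]

def complexity(prompt: str) -> int:
    """Crude 1-3 score from prompt length and task keywords."""
    score = 1
    if len(prompt) > 500:
        score += 1
    if any(k in prompt.lower() for k in ("prove", "refactor", "architecture")):
        score += 1
    return min(score, 3)

def route(prompt: str) -> str:
    tier = complexity(prompt)
    # MODELS is sorted cheapest-first, so the first match is the cheapest.
    return next(name for name, _, cap in MODELS if cap >= tier)

print(route("Summarize this sentence."))              # -> small-fast
print(route("Refactor this module's architecture."))  # -> mid-tier
```

Since most traffic scores low, routing the long tail of simple requests to the cheapest capable model is where the claimed savings of up to 60% would come from.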

How One Junior Developer's CLAUDE.md Template Cut Debugging Time by 70%

A junior developer's real-world CLAUDE.md template for project onboarding that dramatically improved Claude Code's context and output quality.

100% relevant

CARE Framework Exposes Critical Flaw in AI Evaluation, Offers New Path to Reliability

Researchers have identified a fundamental flaw in how AI models are evaluated, showing that current aggregation methods amplify systematic errors. Their new CARE framework explicitly models hidden confounding factors to separate true quality from bias, improving evaluation accuracy by up to 26.8%.

80% relevant