research methodology

30 articles about research methodology in AI news

Google's Groundsource: Using AI to Mine Historical Disaster Data from Global News

Google AI Research has unveiled Groundsource, a novel methodology using the Gemini model to transform unstructured global news reports into structured historical datasets. The system addresses critical data gaps in disaster management, starting with 2.6 million urban flash flood events.

75% relevant

The Trust Revolution: New AI Benchmark Promises Unprecedented Transparency and Integrity

A new AI benchmark system introduces a dual-check methodology with monthly refreshes to prevent memorization, offering full transparency through open-source verification and independence from tool vendors.

85% relevant

New AI Coding Benchmark Sets Standard with Real-World Pull Requests

A groundbreaking AI coding benchmark uses real GitHub pull requests instead of synthetic tests, measuring both precision and recall across 8 tools. The transparent methodology includes publishing all results, even unfavorable ones.

85% relevant

ASI-Evolve Automates AI Research Loop, Discovers 105 Better Linear Attention Designs and Boosts AMC32 Scores by 12.5 Points

Researchers developed ASI-Evolve, an AI system that automates experimental loops in AI research. It discovered 105 improved linear attention variants and boosted AMC32 scores by 12.5 points, demonstrating automated research acceleration.

95% relevant

Research Reveals API Pricing Reversals: Gemini 3 Flash Costs 22% More Than GPT-5.2 Despite 78% Cheaper List Price

New research shows 21.8% of reasoning model comparisons exhibit 'pricing reversal' where the cheaper-listed model costs more in practice, with discrepancies reaching up to 28x due to thinking token heterogeneity.

95% relevant

Stanford Researchers Adapt Robot Arm VLA Model for Autonomous Drone Flight

Stanford researchers demonstrated that a Vision-Language-Action model trained for robot arm manipulation can be adapted to control autonomous drones. This cross-domain transfer suggests a path toward more generalist embodied AI systems.

85% relevant

Fine-Tuning LLMs While You Sleep: How Autoresearch and Red Hat Training Hub Outperformed the HINT3 Benchmark

Automated fine-tuning tools now let you run hundreds of training experiments overnight for under $50. Here's how Autoresearch and Red Hat's platform outperformed HINT3, and the tools you can use today.

100% relevant

Researchers Train LLM from Scratch on 28,000 Victorian-Era Texts, Creating Historical Dialogue AI

Researchers have created a specialized LLM trained exclusively on 28,000 British texts from 1837-1899, enabling historically accurate Victorian-era dialogue generation. Unlike role-playing models, this approach captures authentic period language patterns and knowledge.

87% relevant

IBM Research Survey Proposes Framework for Optimizing LLM Agent Workflows

IBM researchers published a comprehensive survey categorizing approaches to LLM agent workflow optimization along three dimensions: when structure is determined, which components get optimized, and what signals guide optimization.

99% relevant

OpenResearcher Paper Released: Method for Synthesizing Long-Horizon Research Trajectories for AI

The OpenResearcher paper has been released, exploring methods to synthesize long-horizon research trajectories for deep learning. This work aims to provide structured guidance for navigating complex, multi-step AI research problems.

85% relevant

Anthropic Launches Dedicated Science Blog to Chronicle AI Research and Applications

Anthropic has launched a new Science Blog to publish its research and case studies on using AI to accelerate scientific discovery, aligning with its mission to increase the pace of scientific progress.

85% relevant

Research Identifies 'Giant Blind Spot' in AI Scaling: Models Improve on Benchmarks Without Understanding

A new research paper argues that current AI scaling approaches have a fundamental flaw: models improve on narrow benchmarks without developing genuine understanding, creating a 'giant blind spot' in progress measurement.

85% relevant

ML Researcher Uses AlphaFold to Design Treatment for Dog's Cancer in Viral Story

A machine learning researcher reportedly used AlphaFold, DeepMind's protein structure prediction AI, to design a potential treatment for his dog's cancer. The story has gained widespread attention online, highlighting real-world applications of AI in biology.

85% relevant

Top 1% of AI Industry Researchers Now Earn $1.5M More Annually Than Academic Counterparts

A new analysis shows the compensation gap between top AI researchers in industry versus academia has grown fivefold since 2001, reaching $1.5 million annually for the top 1%. This stark disparity highlights the financial trade-off for academics who publish openly.

85% relevant

New Research Identifies Data Quality as Key Bottleneck in Multimodal Forecasting

A new arXiv paper introduces CAF-7M, a 7-million-sample dataset for context-aided forecasting. The research shows that poor context quality, not model architecture, has limited multimodal forecasting performance. This has implications for retail demand prediction that combines numerical data with text or image context.

70% relevant

The Unlearning Illusion: New Research Exposes Critical Flaws in AI Memory Removal

Researchers reveal that current methods for making AI models 'forget' information are surprisingly fragile. A new dynamic testing framework shows that simple query modifications can recover supposedly erased knowledge, exposing significant safety and compliance risks.

100% relevant

The Diversity Dilemma: New Research Challenges Assumptions About AI Alignment

A groundbreaking study reveals that moral reasoning in AI alignment may not require diversity-preserving algorithms as previously assumed. Researchers found reward-maximizing methods perform equally well, challenging conventional wisdom about how to align language models with human values.

86% relevant

AI Research Accelerator: Autonomous System Completes 700 Experiments in 48 Hours, Optimizing Model Training

An AI system autonomously conducted 700 experiments over two days, reducing GPT-2 training time by 11%. This breakthrough demonstrates AI's growing capability to accelerate scientific research and optimize complex processes without human intervention.

85% relevant

Karpathy's Autoresearch: Democratizing AI Experimentation with Minimalist Agentic Tools

Andrej Karpathy releases 'autoresearch,' a 630-line Python tool enabling AI agents to autonomously conduct machine learning experiments on single GPUs. This minimalist framework transforms how researchers approach iterative ML optimization.

85% relevant

Viral AI Creativity Study Misinterpreted: Research Shows No Long-Term Decline in Creative Output

A viral social media post misrepresented findings from an AI creativity study, claiming ChatGPT use reduces creativity over time. The actual research found no significant drop after 30 days, with AI-assisted groups maintaining higher creative output than controls.

85% relevant

Karpathy's 'Autoresearch' Tool Democratizes AI Research: One GPU, One Night, 100 Experiments

Andrej Karpathy has open-sourced 'autoresearch,' a tool that enables AI to autonomously improve its own training code. By writing simple prompts in Markdown, researchers can have AI agents run hundreds of experiments overnight on a single GPU, dramatically accelerating the research process.

100% relevant

Karpathy's Autonomous AI Researcher: Programming the Programmer in the Age of Agentic Science

Andrej Karpathy has open-sourced an autonomous AI research agent that can run ~100 experiments overnight without human supervision. The system turns research into a game with fixed-time trials, where prompt engineering replaces manual coding.

95% relevant

Anthropic Poaches OpenAI's Post-Training Research VP in Major AI Talent War Escalation

Anthropic has recruited OpenAI's Vice President of Post-Training Research, marking a significant talent raid in the intensifying AI competition. The move signals growing competition for specialized expertise in refining AI models after initial training.

85% relevant

Draft-Thinking: How AI Researchers Are Teaching LLMs to Solve Complex Problems with Fewer Steps

Researchers have developed Draft-Thinking, a novel method that teaches large language models to solve complex problems using significantly fewer reasoning steps. This approach could dramatically improve AI efficiency and capability in mathematical and logical reasoning tasks.

85% relevant

SciSpace Evolves: From AI Research Assistant to Full Workflow Platform with 'Skills'

SciSpace is expanding beyond its core AI tools for paper discovery and writing by introducing external app integrations and customizable 'Skills,' aiming to become a true all-in-one research workflow platform rather than just a collection of features.

85% relevant

New AI Benchmark Exposes Critical Gap in Causal Reasoning: Why LLMs Struggle with Real-World Research Design

Researchers have introduced CausalReasoningBenchmark, a novel evaluation framework that separates causal identification from estimation. The benchmark reveals that while LLMs can identify high-level strategies 84% of the time, they correctly specify full research designs only 30% of the time, highlighting a critical bottleneck in automated causal inference.

70% relevant

AI as the Great Equalizer: New Research Shows Artificial Intelligence Dramatically Reduces Skill Gaps

A groundbreaking randomized experiment reveals AI narrows skill gaps between more and less educated workers by 75% on business tasks. The research suggests AI could fundamentally reshape workplace dynamics and economic opportunity.

85% relevant

The Elusive Quest for LLM Safety Regions: New Research Challenges Core AI Safety Assumption

A comprehensive study reveals that current methods fail to reliably identify stable 'safety regions' within large language models, challenging the fundamental assumption that specific parameter subsets control harmful behaviors. The research systematically evaluated four identification methods across multiple model families and datasets.

80% relevant

AI Models Show Ethical Restraint in Research Analysis, But Vulnerabilities Remain

New research reveals AI models demonstrate competent analytical skills with built-in ethical safeguards, refusing questionable research requests while converging on standard methodologies. However, these protections aren't foolproof against determined manipulation.

85% relevant

The Unseen Storm: How AI Researchers Are Making Self-Driving Cars See Through Rain and Fog

Researchers have developed a new benchmark called navdream to isolate how weather and lighting affect autonomous driving AI, separate from road complexity. They found current systems degrade significantly in appearance shifts, and propose a universal interface using DINOv3 for robust zero-shot performance.

72% relevant