SWE-bench Verified

product → stable

SWE-bench

SWE-bench Verified is a human-validated subset of SWE-bench, a benchmark that evaluates whether AI models can resolve real GitHub issues from open-source Python repositories. The subset contains 500 tasks and was released by OpenAI in August 2024 in collaboration with the SWE-bench authors.

8 Total Mentions

+0.03 Sentiment (Neutral)

+1.0% Velocity (7d)

First seen: Feb 24, 2026
Last active: 5h ago

Timeline

Shutdown (Feb 24, 2026)
OpenAI calls for retirement due to fundamental flaws and evidence of answer memorization
status: recommended for retirement

Relationships

Competes With

← OpenAI (company, 1 source, 30% conf.)

Uses

← OpenSWE (product, 1 source, 90% conf.)
← Claude Mythos (ai model, 1 source, 80% conf.)

Recent Articles

Claude Mythos Scores 93.9% on SWE-Bench, Discovers Thousands of Zero-Days
~
Anthropic has developed Claude Mythos, a model that autonomously found zero-day exploits in every major OS and browser. Due to its unprecedented cyber
7h ago | 97 relevance
Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
~
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achiev
5d ago | 75 relevance
Claude 4.5 Sonnet Shows 58% Accuracy on SWE-Bench with 15.2% Variance, Study Finds Consistency Amplifies Both Success and Failure
~
New research on LLM agent consistency reveals Claude 4.5 Sonnet achieves 58% accuracy with low variance (15.2%) on SWE-bench, but 71% of its failures
Mar 30, 2026 | 89 relevance
ReCUBE Benchmark Reveals GPT-5 Scores Only 37.6% on Repository-Level Code Generation
~
Researchers introduce ReCUBE, a benchmark isolating LLMs' ability to use repository-wide context for code generation. GPT-5 achieves just a 37.57% str
Mar 30, 2026 | 96 relevance
DeepSeek-R1 Scores 79.8% on SWE-Bench Verified, Matching Claude 3.5 Sonnet in Code Generation
~
DeepSeek's new R1 reasoning model achieved 79.8% on SWE-Bench Verified, matching Claude 3.5 Sonnet's performance. This marks significant progress in A
Mar 17, 2026 | 85 relevance
OpenSWE Releases 45,000+ Executable Environments for Training SWE Agents, Achieves 66% on SWE-bench Verified
+
OpenSWE introduces a framework with over 45,000 executable environments for training software engineering agents, achieving 66% on SWE-bench Verified
Mar 16, 2026 | 85 relevance

Predictions

No predictions linked to this entity.

AI Discoveries

No AI agent discoveries for this entity.

Sentiment History

[Chart: weekly positive and negative sentiment, 2026-W09 to 2026-W15; range: -1 to +1]

Week	Avg Sentiment	Mentions
2026-W09	-0.70	1
2026-W10	0.10	1
2026-W12	0.20	2
2026-W14	0.10	3
2026-W15	0.10	1
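
The weekly rows above can be rolled up into a single figure. The sketch below assumes the headline sentiment is a mention-weighted mean of the weekly averages; the page does not state how that score is derived, so this is an illustration rather than the tracker's actual method. The function name `weighted_mean_sentiment` is hypothetical.

```python
# Rows copied from the Sentiment History table:
# (ISO week, average sentiment, number of mentions).
weekly = [
    ("2026-W09", -0.70, 1),
    ("2026-W10", 0.10, 1),
    ("2026-W12", 0.20, 2),
    ("2026-W14", 0.10, 3),
    ("2026-W15", 0.10, 1),
]


def weighted_mean_sentiment(rows):
    """Mention-weighted mean of weekly sentiment (assumed aggregation, not confirmed by the page)."""
    total_mentions = sum(mentions for _, _, mentions in rows)
    if total_mentions == 0:
        raise ValueError("no mentions to average over")
    weighted_sum = sum(score * mentions for _, score, mentions in rows)
    return weighted_sum / total_mentions


total = sum(mentions for _, _, mentions in weekly)
overall = weighted_mean_sentiment(weekly)
print(f"mentions={total}, weighted sentiment={overall:.3f}")
```

Under this assumption the rows sum to 8 mentions and give a weighted sentiment of about 0.025, which is consistent with the 8 total mentions and the +0.03 (Neutral) score shown in the summary. A single strongly negative week (2026-W09, the week of the retirement call) is enough to pull an otherwise mildly positive series close to zero.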
