SWE-bench Verified
SWE-bench Verified is a human-validated subset of 500 tasks from SWE-bench, a benchmark that tests whether AI models can resolve real GitHub issues in open-source Python repositories. OpenAI released it in August 2024.
Timeline
Shutdown: Feb 24, 2026
OpenAI calls for retirement due to fundamental flaws and evidence of answer memorization
Status: recommended for retirement
Recent Articles
Claude Mythos Scores 93.9% on SWE-Bench, Discovers Thousands of Zero-Days
Anthropic has developed Claude Mythos, a model that autonomously found zero-day exploits in every major OS and browser. Due to its unprecedented cyber
Relevance: 97

Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achiev
Relevance: 75

Claude 4.5 Sonnet Shows 58% Accuracy on SWE-Bench with 15.2% Variance, Study Finds Consistency Amplifies Both Success and Failure
New research on LLM agent consistency reveals Claude 4.5 Sonnet achieves 58% accuracy with low variance (15.2%) on SWE-bench, but 71% of its failures
Relevance: 89

ReCUBE Benchmark Reveals GPT-5 Scores Only 37.6% on Repository-Level Code Generation
Researchers introduce ReCUBE, a benchmark isolating LLMs' ability to use repository-wide context for code generation. GPT-5 achieves just a 37.57% str
Relevance: 96

DeepSeek-R1 Scores 79.8% on SWE-Bench Verified, Matching Claude 3.5 Sonnet in Code Generation
DeepSeek's new R1 reasoning model achieved 79.8% on SWE-Bench Verified, matching Claude 3.5 Sonnet's performance. This marks significant progress in A
Relevance: 85

OpenSWE Releases 45,000+ Executable Environments for Training SWE Agents, Achieves 66% on SWE-bench Verified
OpenSWE introduces a framework with over 45,000 executable environments for training software engineering agents, achieving 66% on SWE-bench Verified
Relevance: 85
Sentiment History
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W09 | -0.70 | 1 |
| 2026-W10 | 0.10 | 1 |
| 2026-W12 | 0.20 | 2 |
| 2026-W14 | 0.10 | 3 |
| 2026-W15 | 0.10 | 1 |