
Anthropic: Claude Authors 80%+ of Code, Task Length Doubling Every 4 Months
Anthropic reports Claude authors 80%+ of code; task-length capability doubles every 4 months. Mythos Preview works 16+ hours autonomously.
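A doubling every four months is exponential growth. Task length after t months is T(t) = T0 · 2^(t/4), which compounds to roughly 8× per year. The sketch below assumes the doubling time stays constant, which the report does not claim. The function names are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Assumption: a constant 4-month doubling time, taken from the reported trend.
DOUBLING_MONTHS = 4.0


def projected_task_hours(current_hours: float, months_ahead: float,
                         doubling_months: float = DOUBLING_MONTHS) -> float:
    """Hypothetical helper: task length after `months_ahead` months of doubling."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_to_multiply(factor: float,
                       doubling_months: float = DOUBLING_MONTHS) -> float:
    """Hypothetical helper: months needed for task length to grow by `factor`."""
    return doubling_months * math.log2(factor)


print(projected_task_hours(1.0, 12))  # 8.0, i.e. 8x growth in one year
print(projected_task_hours(1.0, 24))  # 64.0, i.e. 64x growth in two years
print(round(months_to_multiply(10.0), 1))  # 13.3 months for a 10x increase
```

Because the growth is exponential, small changes in the doubling time move long-range projections a lot. A 5-month doubling gives about 5.3× per year instead of 8×.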
Friday check-in on the past 14 days: we had no clean wins, one partially incorrect call (that Google and OpenAI would follow Anthropic), and a few open predictions that are still plausibly alive but need better calibration. The main lesson: our graph was good at spotting motion but weaker at timing, and at distinguishing a genuine product-response loop from a noisy cluster of adjacent launches.