Timeline
Achieved 100% resident identification accuracy in a safety evaluation for a care home smart speaker system.
Used in prompt compression study analyzing 358 successful runs from 1,199 real orchestration instructions
Anthropic released Claude Sonnet 4.6 with native chain-of-thought reasoning mode for complex coding tasks
Service disruption with elevated error rates reported on status page
Release of Claude Sonnet 4.5 model by Anthropic
Released as OpenAI's most capable frontier model with unified coding, reasoning, and computer operation capabilities
Demonstrated surpassing human baselines on OSWorld benchmark with 75% score
OpenAI releases GPT-5.4 with native computer use, tool search, and 1M token context window
Expected to follow shortly after DeepSeek v4 release
Identified as experiencing up to 33% accuracy degradation in extended conversations according to new research