Timeline
Achieved 100% resident identification accuracy in a safety evaluation for a care home smart speaker system.
Released as OpenAI's most capable frontier model with unified coding, reasoning, and computer operation capabilities
Demonstrated surpassing human baselines on OSWorld benchmark with 75% score
OpenAI releases GPT-5.4 with native computer use, tool search, and 1M token context window
Expected to follow shortly after DeepSeek v4 release
Identified as experiencing up to 33% accuracy degradation in extended conversations according to new research
Evaluated on LLM-WikiRace benchmark, showing superhuman performance on easy tasks but only 23% success on hard challenges
Google DeepMind released Gemini 3.1 Pro, achieving top scores on major AI benchmarks