gpt 5
30 articles about gpt 5 in AI news
AI's Time Horizon Expands: Claude and GPT Push Multi-Hour Task Capabilities
New analysis reveals Claude Opus 4.6 and GPT 5.3 Codex can handle complex tasks requiring hours of human effort. The METR benchmark shows AI systems approaching 3-4 hour time horizons at 50% success rates, signaling major progress in sustained reasoning.
OpenAI Finishes GPT-5.5 'Spud' Pretraining, Halts Sora for Compute
OpenAI has finished pretraining its next major model, codenamed 'Spud' (likely GPT-5.5), built on a new architecture and data mix. The company reportedly halted its Sora video generation project entirely, sacrificing a $1B Disney investment, to prioritize compute for Spud's launch.
AI Weekly: GPT-6 Rumors, DeepSeek V4 on Huawei, Anthropic Models, Qwen 3.6-Plus
A weekly roundup video aggregates major AI rumors and announcements, including unverified GPT-6 details, DeepSeek V4 reportedly running on Huawei hardware, and launches of Anthropic's Conway and Ultraplan and Alibaba's Qwen 3.6-Plus.
OpenAI Testing New Image Model in ChatGPT, User Reports 'Very Good'
A user reports OpenAI is testing a new image generation model in ChatGPT, describing its output as 'very good.' This signals ongoing internal development of visual AI capabilities.
GPT-Image-2 Appears in ChatGPT App Images Tab, Signaling OpenAI Visual AI Push
A user spotted 'GPT-Image-2' listed in the images tab of the ChatGPT mobile app. This indicates OpenAI is testing a potential successor to its DALL-E image generation models directly within its flagship product.
OpenAI's GPT-Image-2 Model Reportedly Achieves Photorealistic Video Generation, Surpassing Prior Map-Generation Flaws
A social media user claims OpenAI's GPT-Image-2 model now produces video indistinguishable from reality, a significant leap from its predecessor's documented failure to generate coherent world maps.
New Research: Fine-Tuned LLMs Outperform GPT-5 for Probabilistic Supply Chain Forecasting
Researchers introduced an end-to-end framework that fine-tunes large language models (LLMs) to produce calibrated probabilistic forecasts of supply chain disruptions. The model, trained on realized outcomes, significantly outperforms strong baselines like GPT-5 on accuracy, calibration, and precision. This suggests a pathway for creating domain-specific forecasting models that generate actionable, decision-ready signals.
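To make "calibrated" concrete, here is a minimal sketch of the Brier score, a standard way to score probabilistic forecasts against realized outcomes. The summary does not say which metrics the researchers used, so the choice of metric, the function name `brier_score`, and the sample numbers are all illustrative assumptions.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("at least one forecast is required")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


# Hypothetical disruption forecasts (1 = the disruption occurred).
forecasts = [0.9, 0.2, 0.7, 0.1]
realized = [1, 0, 1, 0]
score = brier_score(forecasts, realized)
```

A forecaster that always answers 0.5 scores 0.25 on this metric, which gives a simple reference point when comparing models.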
Glass AI Coding Editor Expands to Windows, Bundles Claude Opus 4.6, GPT-5.4 & Gemini 3.1 Pro Access
The Glass AI coding editor is now available on Windows, offering developers a single subscription that includes usage of Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro without additional API costs. This expansion significantly broadens its potential user base beyond the Mac ecosystem.
DeepSeek-R1 Reportedly Hits 78.9% on OS-World, Outperforming GPT-5.4 at 1/10th Cost
A new benchmark claim suggests DeepSeek-R1 has achieved 78.9% on the OS-World agentic coding benchmark, reportedly outperforming GPT-5.4 while operating at one-tenth the cost. If verified, this would represent a significant leap in cost-performance for AI coding agents.
OpenAI Announces 'AI Superapp' Vision, Aiming to Consolidate ChatGPT, Codex, and Browsing into a Single Platform
OpenAI announced a vision for an 'AI superapp,' moving from separate tools like ChatGPT and Codex to a unified platform. The strategic goal is to leverage consumer scale to achieve enterprise dominance and become core AI infrastructure.
Microsoft Copilot Researcher Adopts Two-Model System: OpenAI GPT Drafts, Anthropic Claude Audits
Microsoft has restructured its Copilot Researcher agent into a two-model system, using OpenAI's GPT for drafting and Anthropic's Claude for auditing. This hybrid approach aims to improve accuracy by separating generation from verification.
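The separation of generation from verification can be sketched as a two-step pipeline. This is only an illustration under stated assumptions: `draft_then_audit`, `toy_drafter`, and `toy_auditor` are hypothetical stand-ins, not Microsoft's implementation or any vendor API, and real model calls would replace the toy functions.

```python
from typing import Callable, Dict, List


def draft_then_audit(
    question: str,
    drafter: Callable[[str], str],
    auditor: Callable[[str, str], List[str]],
) -> Dict[str, object]:
    """Run one model to draft an answer and a second to list problems with it."""
    draft = drafter(question)
    issues = auditor(question, draft)
    # The draft is approved only if the auditor reports no issues.
    return {"draft": draft, "issues": issues, "approved": not issues}


def toy_drafter(question: str) -> str:
    # Hypothetical stand-in for the drafting model.
    return f"Draft answer to: {question}"


def toy_auditor(question: str, draft: str) -> List[str]:
    # Hypothetical stand-in for the auditing model: flags drafts without a source.
    issues = []
    if "source:" not in draft.lower():
        issues.append("no source cited")
    return issues


result = draft_then_audit("What changed in Q3?", toy_drafter, toy_auditor)
```

Keeping the auditor as a separate callable means the verifying model can be swapped without touching the drafting step.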
ChatGPT GPT-5.4 Pro's 'Thinking' Harness Shows Advanced Scientific Paper Comprehension, Including Figure Analysis
OpenAI's ChatGPT GPT-5.4 Pro, with its 'Thinking' harness, demonstrates advanced multimodal understanding of scientific papers, identifying key figures and extracting visual information beyond text parsing.
ReCUBE Benchmark Reveals GPT-5 Scores Only 37.6% on Repository-Level Code Generation
Researchers introduce ReCUBE, a benchmark isolating LLMs' ability to use repository-wide context for code generation. GPT-5 achieves just a 37.57% strict pass rate, showing the task remains highly challenging.
OpenAI's ChatGPT Ad Pilot Hits $100M+ Annual Run Rate in Six Weeks, Shows Low Dismissal Rates
OpenAI's controlled advertising pilot for ChatGPT has reached an annualized revenue run rate exceeding $100 million in roughly six weeks. The system, shown to fewer than 20% of eligible users, uses topic-based targeting while leaving AI responses unchanged.
GLM-5.1 Released by Zhipu AI, Claiming Performance Close to GPT-4o and Claude 3.5
Zhipu AI has released GLM-5.1, its latest large language model series. The company claims its top-tier model, GLM-5.1-9B/1M, achieves performance close to GPT-4o and Claude 3.5 Sonnet, narrowing the gap with leading Western models.
OpenAI Scales Back ChatGPT Instant Checkout, Pivots to Merchant Apps
OpenAI is scaling back its Instant Checkout feature for ChatGPT after it failed to drive significant sales. The company will now focus on letting merchants use their own checkout within ChatGPT apps, prioritizing discovery over transaction.
Google Gemini Launches Manual Memory & Chat Import to Ease Switching from ChatGPT, Claude
Google Gemini is rolling out 'Import Memory' and 'Import Chat History' features for desktop users. The manual tools provide prompts and a .zip upload to transfer data from other AI assistants, aiming to lower the barrier for users to switch from competitors like ChatGPT or Claude.
Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API
Developer Hasan Tohar announced 'Cline', an open-source code editor that integrates multiple top-tier AI models through a unified interface. The tool allows switching between Claude Opus, GPT-4, and Gemini Pro without managing separate API keys or subscriptions.
OpenAI Launches GPT-5.4 Mini and Nano: Smaller, Cheaper Variants with Same Reasoning Modes
OpenAI has released GPT-5.4 mini and nano, two more affordable variants of its GPT-5.4 model. The nano version is positioned as the smallest and most cost-effective option in the lineup.
American Express Bets on Agentic AI Commerce with ACE Developer Kit and ChatGPT Perks
AmEx CEO Stephen Squeri's shareholder letter outlines a proactive strategy for the agentic AI commerce era, launching an ACE developer kit for payment integration and offering business cardholders a ChatGPT subscription credit. The company sees its premium membership model as resilient against disruptive AI commerce theories.
GPT-5.2-Based Smart Speaker Achieves 100% Resident ID Accuracy in Care Home Safety Evaluation
Researchers evaluated a voice-enabled smart speaker for care homes using Whisper and RAG, achieving 100% resident identification and 89.09% reminder recognition with GPT-5.2. The safety-focused framework highlights remaining challenges in converting informal speech to calendar events (84.65% accuracy).
Revieve Launches AI Skin Advisor for ChatGPT, Expanding Generative AI Beauty Discovery
Beauty tech platform Revieve launches an AI Skin Advisor as a ChatGPT plugin, enabling conversational skin analysis and product discovery. This represents a strategic expansion into generative AI platforms for beauty brands and retailers.
Glass AI IDE Emerges, Claims to Offer Free Access to Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro
A new AI-powered coding editor called Glass claims to provide free access to multiple top-tier LLMs, including Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, without API fees. This positions it as a direct, cost-free competitor to established paid AI IDEs like Cursor and Windsurf.
OpenAI Discontinues Standalone Sora App and Developer Access, Consolidates Video AI in ChatGPT
OpenAI is discontinuing the standalone Sora app and its developer version, consolidating all video generation access within ChatGPT. This strategic pivot suggests a focus on integrated AI experiences over specialized tools.
Tessera Launches Open-Source Framework for 32 OWASP AI Security Tests, Benchmarks GPT-4o, Claude, Gemini, Llama 3
Tessera introduces the first open-source framework to run all 32 OWASP AI security tests against any model with one CLI command. It provides benchmark results for GPT-4o, Claude, Gemini, Llama 3, and Mistral across 21 model-specific security tests.
ChatGPT Launches 'Library' Feature: Persistent Document Storage Across Conversations with 512MB File Limits
OpenAI introduces ChatGPT Library, a persistent storage system that saves uploaded files (PDFs, docs, images) at the account level for reuse across different chats. The feature is rolling out to Plus, Team, and Enterprise users with specific file size and token limits.
GPT-5.4 Pro Reportedly Solves Open Problem in FrontierMath, With Human Verification
Researchers Kevin Barreto and Liam Price used GPT-5.4 Pro to produce a construction for an open problem in FrontierMath, which mathematician Will Brian confirmed. A formal write-up is planned for publication.
AI Outperforms Humans on Product Idea Creativity, With GPT-4 Scoring 2.5x Higher Than Prolific Workers
A new study finds AI models consistently generate more creative product ideas than human crowdworkers, with GPT-4 scoring 2.5x higher than Prolific workers. Larger, more recent models perform significantly better than earlier versions.
MiRA Framework Boosts Gemma3-12B to 43% Success Rate on WebArena-Lite, Surpassing GPT-4 and WebRL
Researchers propose MiRA, a milestone-based RL framework that improves long-horizon planning in LLM agents. It boosts Gemma3-12B's web navigation success from 6.4% to 43%, outperforming GPT-4-Turbo (17.6%) and the previous SOTA WebRL (38.4%).
Fine-Tuning OpenAI's GPT-OSS 20B: A Practitioner's Guide to LoRA on MoE Models
A technical guide details the practical challenges and solutions for fine-tuning OpenAI's 20-billion-parameter GPT-OSS model using LoRA. The approach matters for efficiently adapting large, complex MoE models to specific business domains.
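For readers new to LoRA, the core idea is to freeze a weight matrix W and learn a low-rank update, giving an effective weight W + (alpha / r) * B A, where r is the adapter rank. The sketch below shows only that arithmetic in plain Python with tiny hypothetical matrices; it does not reproduce the guide's code, and MoE-specific choices such as which expert layers receive adapters are outside its scope.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen weight, shape (out_features, in_features)
    a: trainable matrix, shape (r, in_features)
    b: trainable matrix, shape (out_features, r)
    """
    r = len(a)
    delta = matmul(b, a)  # (out, r) @ (r, in) -> (out, in)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical 2x3 frozen weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]

# B starts at zero in standard LoRA, so the adapted weight initially equals W.
at_init = lora_effective_weight(w, a, [[0.0], [0.0]], alpha=2.0)

# After training moves B away from zero, the effective weight shifts.
after_update = lora_effective_weight(w, a, [[0.5], [0.0]], alpha=2.0)
```

Only A and B are trained, so the number of trainable parameters grows with r rather than with the full size of W.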