generative video

30 articles about generative video in AI news

NemoVideo AI Automates Video Editing Based on Text Prompts

A video creator states that NemoVideo AI now automates complex editing tasks such as cuts and transitions from simple text descriptions, reducing a 5-hour manual process to a prompt-driven workflow.

85% relevant

OpenAI's GPT-Image-2 Model Reportedly Achieves Photorealistic Video Generation, Surpassing Prior Map-Generation Flaws

A social media user claims OpenAI's GPT-Image-2 model now produces video indistinguishable from reality, a significant leap from its predecessor's documented failure to generate coherent world maps.

85% relevant

Stanford's EgoNav Trains Robot Navigation on 5 Hours of Human Video, Enables Zero-Shot Control of Unitree G1

Stanford's EgoNav system trains a diffusion model on 5 hours of egocentric human video from a campus walk, enabling zero-shot navigation for a Unitree G1 humanoid robot without any robot-specific training data.

99% relevant

OpenAI Discontinues Standalone Sora App and Developer Access, Consolidates Video AI in ChatGPT

OpenAI is discontinuing the standalone Sora app and its developer version, consolidating all video generation access within ChatGPT. This strategic pivot suggests a focus on integrated AI experiences over specialized tools.

95% relevant

Meta's V-JEPA 2.1 Achieves +20% Robotic Grasp Success with Dense Feature Learning from 1M+ Hours of Video

Meta researchers released V-JEPA 2.1, a self-supervised video model that learns dense spatiotemporal features from over 1 million hours of video. The approach improves robotic grasp success by ~20% over previous methods by forcing the model to understand precise object positions and movements.

97% relevant

Higgsfield AI Pays Bartender $1M+ for Face Scan to Train AI Video Model Diffuse

AI startup Higgsfield paid a New Jersey bartender over $1 million for a full-face 3D scan to train its text-to-video model Diffuse. The deal highlights the emerging market for high-fidelity biometric data to create photorealistic digital humans.

85% relevant

OpenClaw's Pexo Agent Generates Videos Directly Within Telegram, Discord, and WhatsApp

OpenClaw has launched Pexo, an AI agent that creates videos from text prompts directly within messaging apps like Telegram, Discord, and WhatsApp, without requiring users to switch applications.

85% relevant

AI Video Generation Goes Mainstream: Text-to-Video Assistant Skill Emerges

A new AI skill called Medeo Video Skill for OpenClaw allows users to generate complete videos through simple text commands. Users can request videos on any topic, and the AI handles the entire creation process automatically.

89% relevant

AIVideo Agent Emerges as First Complete AI Video Production Pipeline

A new AI system called AIVideo Agent promises to automate the entire video production workflow from concept to final edit. Positioned as the "OpenClaw for video," this development could revolutionize content creation for creators and businesses alike.

85% relevant

Kling AI Video Platform Goes Global: How 3.0 Release Redefines Accessible Cinematic AI

Kling AI has launched its 3.0 platform worldwide, offering 1080p cinematic video generation and advanced motion control. This marks a significant step toward professional-grade AI video tools becoming accessible to global creators.

85% relevant

Google DeepMind's Unified Latents Framework: Solving Generative AI's Core Trade-Off

Google DeepMind introduces Unified Latents (UL), a novel framework that jointly trains diffusion priors and decoders to optimize latent space representation. This approach addresses the fundamental trade-off between reconstruction quality and learnability in generative AI models.

75% relevant

PixVerse's 'Playable Reality': AI Blurs Lines Between Video, Games and Virtual Worlds

PixVerse introduces 'Playable Reality,' an AI-generated medium that defies traditional categorization. Blending elements of video, gaming, and virtual environments, this technology creates interactive, dynamic experiences rather than static content.

85% relevant

R1's Real-Time World Model: The Paradigm Shift from Video Generation to World Generation

Rabbit's R1 introduces a real-time world model that continuously generates evolving environments rather than static video frames. This represents a fundamental shift from passive content creation to interactive world simulation, enabling seamless AI interactions without waiting or regeneration cycles.

85% relevant

Video of Massive AI Training Lab in China Sparks Debate on Automation's Scale

A social media post showcasing a vast Chinese AI training lab has reignited discussions about job displacement, underscoring the tangible infrastructure powering the current AI surge.

85% relevant

Generative World Renderer: 4M+ RGB/G-Buffer Frames from Cyberpunk 2077 & Black Myth: Wukong Released for Inverse Graphics

A new framework and dataset extracts over 4 million synchronized RGB and G-buffer frames from Cyberpunk 2077 and Black Myth: Wukong, enabling AI models to learn inverse material decomposition and controllable game environment editing.

85% relevant

The AI Music Revolution: How Google and Apple Are Democratizing Music Creation

Google and Apple are integrating generative AI music features into their core platforms, allowing users to create custom 30-second tracks from text, photos, or video prompts. This move signals AI's transition from experimental tools to mainstream consumer applications.

70% relevant

Alibaba's Qwen3.5-Omni Launches with Script-Level Captioning, Audio-Visual Vibe Coding, and Real-Time Web Search

Alibaba's Qwen team has released Qwen3.5-Omni, a multimodal model focused on interpreting images, audio, and video with new capabilities like script-level captioning and 'vibe coding'. It's open-access on Hugging Face but does not generate media.

85% relevant

ViGoR-Bench Exposes 'Logical Desert' in SOTA Visual AI: 20+ Models Fail Physical, Causal Reasoning Tasks

Researchers introduce ViGoR-Bench, a unified benchmark testing visual generative models on physical, causal, and spatial reasoning. It reveals significant deficits in over 20 leading models, challenging the 'performance mirage' of current evaluations.

94% relevant

SoftBank Secures Record $40 Billion Bridge Loan to Finance OpenAI Stake

SoftBank Group has signed a $40 billion bridge loan to finance its investment in OpenAI, marking the largest loan of its kind as the Japanese conglomerate doubles down on the AI race. The move signals a major financial commitment to secure a central position in the generative AI market.

89% relevant

Tongyi Lab Releases World's First Open-Source Multi-Speaker AI Dubbing Model

Alibaba's Tongyi Lab has released the first open-source AI model capable of dubbing multi-speaker conversations, addressing one of the hardest problems in AI video generation. The model synchronizes voice with lip movements across multiple speakers in a single pass.

85% relevant

ElevenLabs Unleashes 'Flows': The Unified AI Creative Suite That Could Revolutionize Content Production

ElevenLabs has launched Flows, an AI platform that integrates image, video, voice, music, and sound effects generation into a single visual pipeline. This eliminates tool-switching and re-exporting, potentially transforming creative workflows.

85% relevant

The Great GPU Scramble: How Hardware Shortages Are Defining the AI Arms Race

Oracle founder Larry Ellison identifies GPU acquisition as the primary bottleneck in AI development, with companies racing to secure limited hardware for breakthroughs in medicine, video generation, and autonomous systems.

85% relevant

From OpenAI to the Factory Floor: How Bob McGrew's Arda Is Revolutionizing Manufacturing with Visual AI

Former OpenAI research chief Bob McGrew is raising $70M at a $700M valuation for Arda, a startup using video-based AI to automate factories. The system watches production footage to train robots, coordinating both machines and human workers across entire manufacturing cycles.

95% relevant

BetterScene Bridges the Gap: How Aligning AI Representations Unlocks Photorealistic 3D Synthesis

Researchers introduce BetterScene, a novel AI method that dramatically improves 3D scene generation from just a handful of photos. By aligning the internal representations of a powerful video diffusion model, it produces consistent, artifact-free novel views, pushing the boundary of what's possible in computational photography and virtual world creation.

78% relevant

The One-Stop AI Platform Revolution: GlobalGPT Consolidates 100+ Models Without Barriers

GlobalGPT has launched a unified platform offering access to over 100 AI models for image and video generation without waitlists, restrictions, or invite codes. This consolidation represents a significant shift toward democratizing advanced AI tools for creators and businesses alike.

85% relevant

DeepMind's Diffusion Breakthrough: Training Better Latents for Superior AI Generation

Google DeepMind researchers have developed new techniques for training latent representations in diffusion models, potentially leading to more efficient, higher-quality AI-generated content across images, audio, and video domains.

85% relevant

Disney's Legal Blitz Against ByteDance Signals New Era in AI Copyright Wars

Disney has accused ByteDance of a 'virtual smash-and-grab' for allegedly using copyrighted Marvel, Star Wars, and Disney characters to train its Seedance 2.0 AI video generator. This marks the second major cease-and-desist from Disney against AI companies in six months, highlighting escalating tensions between content creators and AI developers over training data rights.

80% relevant

China Proposes Mandatory Labels, Consent Rules for AI Digital Humans

China has proposed its first legal framework specifically targeting AI-generated digital humans, requiring mandatory disclosure labels, explicit consent for biometric data, and strict child-safety measures including bans on virtual intimate services for users under 18.

87% relevant

OpenSCAD Web: Open-Source Text-to-CAD Tool Runs Fully In-Browser via WebAssembly

A developer has released an open-source text-to-CAD tool that runs entirely in a web browser using WebAssembly. Users describe a 3D object in plain English, optionally upload a reference image, and receive a parametric model with adjustable dimensions that exports directly to 3D printer formats.

85% relevant

Developer Open-Sources 'Prompt-to-3D' Tool for Instant, Navigable World Generation

A developer has released an open-source tool that creates interactive 3D worlds from text or image inputs. This moves 3D asset generation from static models to instant, explorable environments.

91% relevant