Multimodal AI

Google Launches Gemma 4 12B For Local Multimodal AI On Laptops

Google introduced Gemma 4 12B, an encoder-free multimodal open model built to run locally on laptops with 16GB of memory.



June 4, 2026

Google

Thinking Machines Previews Real-Time AI Interaction Models

Mira Murati's Thinking Machines Lab previewed interaction models designed for continuous real-time collaboration with AI.



May 12, 2026

Mira Murati

Luma AI Launches Uni-1: A Reasoning-First Image Model That Outscores Google and OpenAI at 30% Lower Cost

Luma AI's Uni-1 uses an autoregressive architecture to beat Google Nano Banana 2 and OpenAI GPT Image 1.5 on reasoning benchmarks while cutting 2K resolution pricing by up to 30%.



March 24, 2026

Multimodal AI

Xiaomi Launches Three MiMo V2 AI Models Targeting Agents, Robotics, and Voice Synthesis

Xiaomi unveiled MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS — a trio of AI models featuring over 1 trillion parameters, multimodal perception, and emotional speech synthesis, rivaling Claude Opus 4.6 on agent benchmarks.



March 23, 2026

AI Agents

Google Releases Gemini Embedding 2: First Natively Multimodal AI Embedding Model

Google has launched Gemini Embedding 2, the first natively multimodal embedding model capable of jointly mapping text, images, and video into a unified vector space for retrieval and search tasks.



March 12, 2026

Gemini

DeepSeek Poised to Release V4 Multimodal AI Model, Withholding Early Access from Nvidia and AMD

China's DeepSeek is on the verge of releasing its V4 multimodal model — capable of generating text, images, and video — while reportedly denying early optimization access to Nvidia and AMD, instead granting it exclusively to domestic chipmakers Huawei and Cambricon ahead of China's annual parliamentary sessions.



March 2, 2026

AI Competition

DeepSeek Building AI Search Engine to Challenge Google's Dominance

DeepSeek job postings reveal plans for a multimodal AI search engine supporting text, images, and audio, directly targeting Google's search market share.



February 1, 2026

AI Agents

Moonshot AI's Kimi K2.5 Narrows US-China AI Development Gap to Closest Ever

Beijing-based Moonshot AI releases Kimi K2.5, an open-source multimodal AI model that rivals OpenAI and Anthropic while being four times cheaper to run, raising questions about how effectively US semiconductor export controls constrain China's AI development.



January 29, 2026

Open-Source AI

Latest News and Analysis in Multimodal AI