DeepSeek V4 Pro Trails US AI Models In Government Benchmark
A CAISI evaluation says DeepSeek V4 Pro is China's strongest model but still trails leading US frontier AI systems.
A rigorous new benchmark tested top AI models on investment banking tasks; not one output was deemed client-ready, though half of bankers found value as a starting point.
A new benchmark reveals that even top AI models drop roughly 50% in accuracy when analyzing complicated charts, exposing a key limitation in visual reasoning.
Google's enhanced Gemini 3 Deep Think model outperforms OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.6 in the latest benchmark tests.
Claude Opus 4.6 achieves breakthrough performance with 65.4% on Terminal-Bench and 72.7% on OSWorld, surpassing Gemini 3 Flash in real-world work applications.