Alibaba Cloud Backs ShengShu with $293M Bet on World Model AI

Alibaba Cloud's 2 billion yuan bet backs ShengShu as the startup pivots from video generation toward general world models designed to simulate physical environments and advance robotics.

Alibaba Cloud led a 2 billion yuan ($293 million) Series B investment in ShengShu Technology, the Chinese startup behind the AI video generation tool Vidu, the company announced Friday. The round highlights a broader industry shift toward “world models,” AI systems trained on video and physical-world data rather than text.

State-backed China Internet Investment Fund, TAL Education Group, and Baidu Ventures participated in the round, along with existing backers LINK-X Capital and Delta Capital, according to ShengShu’s official statement. 

ShengShu said the capital will fund development of a “general world model” that processes multimodal sensory data, including vision, audio, and touch, to simulate how the physical world operates. The company described the effort as a step toward AI capable of supporting autonomous vehicles, humanoid robots, and embodied AI systems deployed in industrial and home settings.

“ShengShu believes that a general world model, built on multimodal data such as vision, audio, and touch, more naturally captures how the physical world works than large language models,” the company said. Zhu Jun, the company’s founder, added that the goal is to “connect perception and action” so AI systems can model and predict real-world behavior.

The round follows a 600 million yuan raise from Qiming Venture Partners that ShengShu closed roughly two months earlier. The company declined to disclose its current valuation.

Founded in early 2023 by Tsinghua University alumnus Zhu Jun, ShengShu became the first Chinese company to release a video generation model when Vidu launched in April 2024. The move came months before OpenAI made its Sora tool widely available, a product OpenAI has since discontinued. ShengShu’s latest Vidu Q3 Pro model, released in January, ranks among the top 10 video generation models globally, per Artificial Analysis.

The investment fits a broader pattern of Alibaba backing companies that build world models. Last month, Alibaba and Baidu Ventures co-led a $50 million round in Tripo AI, a platform that generates 3D digital models from photographs and is developing its own world model. In September, Alibaba led a $60 million investment in PixVerse, which released a world model earlier this year. Alibaba has also open-sourced AI models for video generation and, in February, released a model for robotics applications.

At home, ShengShu competes with ByteDance and Kuaishou, both of which have released AI video generation tools. Internationally, it faces Google and U.S. startups, including Runway.

The broader push into world models reflects a widely acknowledged ceiling on what large language models can accomplish on their own. Researchers and investors increasingly argue that AI systems built for robotics and physical-world applications need more than text-based reasoning: they must perceive, simulate, and respond to spatial and sensory inputs that LLMs were not designed to handle.

ShengShu said it has established strategic partnerships with companies developing embodied AI for industrial, commercial, and residential applications, though it did not name specific partners.

NN Desk