Wenzhang Sun 孙文章

I work on structured spatiotemporal modeling for generative and multimodal systems, with a focus on human-centric video generation, efficient 3D/4D world representation, and multimodal evaluation.

My recent work at Li Auto Foundation Models centers on digital humans, video generation systems, 4D/world representations, and multimodal evaluation. Before that, I received my Ph.D. from Beijing Institute of Technology, where I worked on 3D human modeling and reconstruction.

research

Research Themes

Human-Centric Video Generation

Building controllable and interactive digital humans that can communicate naturally through audio, motion, expression, and video.

Efficient 3D/4D World Representation

Learning compact spatiotemporal representations that connect video generation, 4D scene understanding, and future-world prediction.

Multimodal Evaluation & Systems

Designing evaluation protocols and agentic systems that make multimodal models more measurable, reliable, and useful in workflows.

news

Recent updates

2026.07 SeMo accepted to ACM MM 2026.
2026.06 Holo-World released on arXiv with a project page.
2026.06 RiO-DETR accepted to ECCV 2026.
2026.05 PREX released on arXiv with a project page.
2026.03 RiO-DETR released as an arXiv preprint.
2026.02 MUSE released as an arXiv preprint.
2026.01 DrivingScene and PAGS accepted to ICASSP 2026.
2025.06 FaceVid-1K accepted to ICCV 2025.
2025.02 MoEE accepted to CVPR 2025.

selected work

Current focus

Interactive digital humans and world models

I am exploring real-time interactive digital-human generation for continuous multimodal dialogue, as well as 4D-guided video-generation world models.

professional service

Peer Review

Conference Reviewer

NeurIPS · CVPR · ECCV · AAAI

Journal Reviewer

IEEE Transactions on Multimedia (TMM)