Research-oriented Algorithm Engineer

Wenzhang Sun 孙文章

I work on structured spatiotemporal modeling for generative and multimodal systems, with a focus on human-centric video generation, efficient 3D/4D world representation, and multimodal evaluation.

My recent work at Li Auto Foundation Models centers on digital humans, video generation systems, 4D/world representations, and multimodal evaluation. Before that, I received my Ph.D. from Beijing Institute of Technology, where I worked on 3D human modeling and reconstruction.

research

Research Themes

Human-Centric Video Generation

Building controllable and interactive digital humans that can communicate naturally through audio, motion, expression, and video.

Efficient 3D/4D World Representation

Learning compact spatiotemporal representations that connect video generation, 4D scene understanding, and future-world prediction.

Multimodal Evaluation & Systems

Designing evaluation protocols and agentic systems that make multimodal models more measurable, reliable, and useful in workflows.

news

Recent updates

  • RiO-DETR accepted to ECCV 2026.
  • DrivingScene and PAGS accepted to ICASSP 2026.
  • PREX released on arXiv with a project page.
  • RiO-DETR and MUSE released as arXiv preprints.
  • MoEE accepted to CVPR 2025; FaceVid-1K accepted to ICCV 2025.

selected work

Current focus

Interactive digital humans and world models

I am exploring real-time interactive digital-human generation for continuous multimodal dialogue, as well as world models based on 4D-guided video generation.