RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance


CVPR 2025

Yuheng Jiang*, Zhehao Shen*, Chengcheng Guo, Yu Hong, Zhuo Su, Yingliang Zhang, Marc Habermann, Lan Xu

* Equal Contribution     † Corresponding Author


Given a dynamic sequence captured with dense multi-view footage, RePerformer not only delivers accurate free-view playback but also realistically re-performs the dynamic scene under similar yet novel motions.

Overview Video



We introduce RePerformer, a novel 3DGS-based approach for generating human-centric volumetric videos from dense multi-view inputs. It provides realistic playback and can vividly re-perform non-rigid scenes driven by similar yet unseen motions.

Pipeline


Our approach begins by disentangling the scene into motion and appearance Gaussians and repacking the attributes of the appearance Gaussians into 2D maps via a Morton-based 2D parameterization. For network training, we adopt a U-Net architecture to learn a generalizable mapping from the position maps to the attribute maps. For re-performance, a semantic-aware alignment module associates the motion Gaussians of a new performer with the original appearance Gaussians, enabling seamless transfer and photorealistic rendering.
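To make the Morton-based 2D parameterization more concrete, here is a minimal NumPy sketch (not the released implementation): it interleaves the bits of quantized Gaussian positions into Z-order (Morton) codes, sorts the Gaussians by those codes, and writes their attributes row-major into a 2D map so that spatially nearby Gaussians land in nearby pixels. The map resolution, quantization depth, attribute channel count, and function names are illustrative assumptions.

```python
import numpy as np

def morton_code_3d(xyz_q: np.ndarray) -> np.ndarray:
    """Interleave the bits of quantized (x, y, z) coordinates into a Z-order (Morton) code."""
    def spread_bits(v):
        # Spread 10 bits so there are two zero bits between consecutive bits (3D interleaving).
        v = (v | (v << 16)) & 0x030000FF
        v = (v | (v << 8)) & 0x0300F00F
        v = (v | (v << 4)) & 0x030C30C3
        v = (v | (v << 2)) & 0x09249249
        return v
    x, y, z = (spread_bits(xyz_q[:, i].astype(np.uint32)) for i in range(3))
    return x | (y << 1) | (z << 2)

def pack_attributes_to_map(positions, attributes, map_size=256, bits=10):
    """Sort Gaussians by the Morton code of their positions and repack their
    attributes row-major into a (map_size, map_size, C) 2D attribute map."""
    n, c = attributes.shape
    assert n <= map_size * map_size, "map too small for this many Gaussians"
    # Quantize positions into an integer grid of [0, 2**bits).
    lo, hi = positions.min(0), positions.max(0)
    xyz_q = ((positions - lo) / (hi - lo + 1e-8) * (2 ** bits - 1)).astype(np.uint32)
    order = np.argsort(morton_code_3d(xyz_q), kind="stable")
    grid = np.zeros((map_size * map_size, c), dtype=attributes.dtype)
    grid[:n] = attributes[order]  # spatially close Gaussians end up in nearby pixels
    return grid.reshape(map_size, map_size, c), order

# Toy usage: 50k Gaussians with 3D positions and a 59-channel attribute vector each.
pos = np.random.rand(50_000, 3).astype(np.float32)
attr = np.random.rand(50_000, 59).astype(np.float32)
attr_map, order = pack_attributes_to_map(pos, attr)
print(attr_map.shape)  # (256, 256, 59)
```

Because neighboring Gaussians map to neighboring pixels, a 2D U-Net can exploit local spatial coherence when regressing attribute maps from position maps.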

Reperformance


To handle more general cases, we propose a semantic-aware motion transfer module built on the motion Gaussian proxy, leveraging the U-Net's generalization capability for re-performance.
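As a rough illustration of what such a semantic-aware association could look like, the sketch below matches each motion Gaussian of the new performer to the nearest original Gaussian that carries the same semantic (e.g., body-part) label. The function name, the label source, and the per-label nearest-neighbor rule are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.spatial import cKDTree

def semantic_aware_match(src_xyz, src_labels, tgt_xyz, tgt_labels):
    """For every source Gaussian, find the nearest target Gaussian with the same
    semantic (body-part) label. Returns an index array into tgt_xyz (-1 if unmatched)."""
    match = np.full(len(src_xyz), -1, dtype=np.int64)
    for label in np.unique(src_labels):
        tgt_idx = np.flatnonzero(tgt_labels == label)
        if tgt_idx.size == 0:
            continue  # no target Gaussians with this label; leave unmatched
        src_mask = src_labels == label
        tree = cKDTree(tgt_xyz[tgt_idx])
        _, nn = tree.query(src_xyz[src_mask])
        match[src_mask] = tgt_idx[nn]
    return match

# Toy usage: associate a new performer's motion Gaussians (source) with the
# original sequence's Gaussians (target) under shared part labels.
rng = np.random.default_rng(0)
src, src_lab = rng.random((1000, 3)), rng.integers(0, 24, 1000)
tgt, tgt_lab = rng.random((800, 3)), rng.integers(0, 24, 800)
idx = semantic_aware_match(src, src_lab, tgt, tgt_lab)
```

Restricting the match to Gaussians of the same semantic label keeps, for instance, hand Gaussians from being associated with torso Gaussians when the two performers' poses differ.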

More Results


Our playback approach achieves high-fidelity rendering of ultra-long, complex human-interaction sequences spanning over 2,000 frames.

Comparison


Qualitative comparison with state-of-the-art playback-only methods on novel view synthesis on the DualGS dataset.
Qualitative comparison with AP-NeRF, TAVA, and SC-GS on both training-motion and novel-motion synthesis.
Qualitative comparison with Animatable Gaussians.

Result Gallery


Acknowledgements


The authors would like to thank Kaixing Zhang and Meihan Zheng from ShanghaiTech University for processing the dataset. We are grateful to Yize Wu and Lijun Chen from ShanghaiTech University and Heming Zhu from MPII for insightful discussions. We also thank the reviewers for their feedback. This work was supported by the National Key R&D Program of China (2022YFF0902301) and the Shanghai Local College Capacity Building Program (22010502800). We also acknowledge support from the Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).