Relightable Neural Human Assets from Multi-view Gradient Illuminations

Abstract

Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can largely facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they cannot readily be used for relighting problems.

To promote research in both fields, in this paper, we present UltraStage, a new 3D human dataset that contains more than 2K high-quality human assets captured under both multi-view and multi-illumination settings. Specifically, for each example, we provide 32 surrounding views illuminated with one white light and two gradient illuminations. In addition to regular multi-view images, gradient illuminations help recover detailed surface normal and spatially-varying material maps, enabling various relighting applications.

Inspired by recent advances in neural representation, we further interpret each example into a neural human asset which allows novel view synthesis under arbitrary lighting conditions. We show our neural human assets can achieve extremely high capture performance and are capable of representing fine details such as facial wrinkles and cloth folds. We also validate UltraStage in single-image relighting tasks, training neural networks with virtually relit data from neural assets and demonstrating realistic rendering improvements over prior art. UltraStage will be publicly available to the community to stimulate significant future developments in various human modeling and rendering tasks.

Dataset

In total, UltraStage provides more than 2K human actions, each captured as 32 images at 8K resolution under three illuminations, resulting in a total of 192K high-quality frames. Given images captured under two gradient illuminations, we can estimate the corresponding high-quality surface normal maps. We propose to take the images captured under white light as an approximation of the albedo maps.
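To make the normal estimation step concrete, here is a minimal per-pixel sketch. It assumes a common formulation for gradient/complement pairs, in which the RGB channels of the gradient pattern encode the x, y, and z directions and the normalized difference of the two observations is proportional to the normal. The function name `estimate_normal` and the diffuse toy model in the usage lines are illustrative assumptions; the exact estimator used for UltraStage may differ.

```python
import math

def estimate_normal(grad_rgb, inv_rgb, eps=1e-8):
    """Estimate a unit normal for one pixel from a gradient / inverse-gradient pair.

    Assumption (toy diffuse model): per channel,
    grad = albedo * (1 + n) / 2 and inv = albedo * (1 - n) / 2,
    so (grad - inv) / (grad + inv) cancels the albedo and leaves n.
    """
    raw = []
    for g, i in zip(grad_rgb, inv_rgb):
        total = g + i  # under this model the pair sums to the albedo
        raw.append((g - i) / total if total > eps else 0.0)
    length = math.sqrt(sum(c * c for c in raw))
    if length < eps:
        return (0.0, 0.0, 0.0)  # no usable signal at this pixel
    return tuple(c / length for c in raw)

# Usage: synthesize one pixel from a known normal and recover it.
true_normal = (0.0, 0.6, 0.8)
albedo = (0.9, 0.5, 0.3)
grad = tuple(a * (1 + c) / 2 for a, c in zip(albedo, true_normal))
inv = tuple(a * (1 - c) / 2 for a, c in zip(albedo, true_normal))
estimate = estimate_normal(grad, inv)
```

Dividing by the per-channel sum is what makes the estimate independent of the (possibly colored) albedo, which is why two complementary patterns are captured rather than one.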

Method Pipeline

Our neural processing pipeline consists of two stages. In the first stage, we train a signed distance field (SDF), using the high-quality normal maps as guidance. In the second stage, we devise a depth-guided texture blending method to synthesize more detailed albedo and normal buffers, and apply inverse rendering frameworks to generate the material buffers. We then use these novel-view G-buffer maps to perform photo-realistic relighting.
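As a rough illustration of how G-buffers can drive relighting, the sketch below shades one pixel from its albedo and normal under a single directional light. It is a diffuse (Lambertian) simplification only, not the material model from the pipeline above; `relight_pixel` and its parameters are hypothetical names chosen for this example.

```python
def relight_pixel(albedo, normal, light_dir, light_rgb=(1.0, 1.0, 1.0)):
    """Shade one pixel with a Lambertian term: albedo * light * max(0, n . l).

    Assumes `normal` and `light_dir` are unit vectors in the same coordinate
    frame. Specular and spatially-varying material terms are ignored here.
    """
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_rgb))

# Usage: a surface facing the light returns its albedo; one facing away is black.
front_lit = relight_pixel((0.8, 0.6, 0.4), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
back_lit = relight_pixel((0.8, 0.6, 0.4), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Summing this term over many light directions, weighted by an environment map, would give a simple image-based relighting of the diffuse component.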

Capture System

PlenOptic Stage Ultra is composed of 460 light panels, each with 48 LED beads, resulting in a total of 22,080 individually controllable light sources that can illuminate the scene with arbitrary lighting conditions.
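The light count follows directly from the panel layout. The short sketch below checks the arithmetic and shows one possible way to address a single bead by a flat index; the panel-major indexing scheme and the name `light_index` are assumptions for illustration, not the stage's actual control interface.

```python
PANELS = 460
BEADS_PER_PANEL = 48
TOTAL_LIGHTS = PANELS * BEADS_PER_PANEL  # 460 * 48 = 22,080

def light_index(panel, bead):
    """Map a (panel, bead) pair to a flat index, panel-major (hypothetical scheme)."""
    if not (0 <= panel < PANELS and 0 <= bead < BEADS_PER_PANEL):
        raise ValueError("panel or bead out of range")
    return panel * BEADS_PER_PANEL + bead

# Usage: the last bead on the last panel has the highest index.
last = light_index(PANELS - 1, BEADS_PER_PANEL - 1)
```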

Capture Process

We capture each pose with 32 surrounding cameras under three lighting conditions: color gradient illumination, inverse color gradient illumination, and white light. The three lighting patterns are switched at 5 fps.
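The following sketch shows one conventional way to define such a pair of patterns, plus the bookkeeping for a single pose. It assumes each light's color is set from its unit direction relative to the stage center, with the inverse pattern as the complement, and that each pattern is held for one frame at 5 fps. The function `gradient_colors` and the constants are illustrative assumptions rather than the actual capture code.

```python
def gradient_colors(direction):
    """Return (gradient_rgb, inverse_rgb) for a light at unit `direction`.

    Assumption: x, y, z in [-1, 1] map to R, G, B in [0, 1], and the inverse
    pattern is the complement, so the two patterns sum to white.
    """
    grad = tuple((1.0 + d) / 2.0 for d in direction)
    inv = tuple(1.0 - g for g in grad)
    return grad, inv

CAMERAS = 32
LIGHTINGS = 3          # gradient, inverse gradient, white
SWITCH_RATE_HZ = 5.0   # patterns switched at 5 fps

images_per_pose = CAMERAS * LIGHTINGS        # 96 images per pose
cycle_seconds = LIGHTINGS / SWITCH_RATE_HZ   # 0.6 s, if one pattern per frame
approx_total_frames = 2000 * images_per_pose # about 192K for roughly 2K poses

# Usage: a light directly above the subject (+z).
top_grad, top_inv = gradient_colors((0.0, 0.0, 1.0))
```

Under these assumptions the subject needs to hold still for well under a second per pose, which keeps the three exposures aligned closely enough for per-pixel processing.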

BibTeX


    @inproceedings{zhou2023relightable,
      title={Relightable Neural Human Assets from Multi-view Gradient Illuminations},
      author={Zhou, Taotao and He, Kai and Wu, Di and Xu, Teng and Zhang, Qixuan and Shao, Kuixiang and Chen, Wenzheng and Xu, Lan and Yu, Jingyi},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={4315--4327},
      year={2023}
    }

Acknowledgements

We thank Chang for data acquisition. We thank Hongyang Lin, Qiwei Qiu and Qingcheng Zhao for building the hardware. This work was supported by the National Key R&D Program of China (2022YFF0902301), NSFC programs (61976138, 61977047), STCSM (2015F0203-000-06), and SHMEC (2019-01-07-00-01-E00003). We also acknowledge support from the Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).