Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

¹CUHK MMLab           ²Oxford VGG           ³SenseTime Research

Abstract

Capitalizing on recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some degree of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, these methods focus on 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF) that effectively enforces the shape of the generated face to commit to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over the fine-grained 3D face shapes of the synthesized images, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis algorithm. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.

Method Overview

(Top): The conditional Generative Occupancy Field (cGOF) leverages a mesh-guided volume sampler and a distance-aware volume density regularizer, which effectively condition the generated NeRF on an input 3DMM mesh. It is trained in an adversarial learning framework using only single-view images. (Bottom): The 3D landmark loss encourages the semantically important facial landmarks to follow the input mesh, and the volume warping loss enforces two NeRF volumes generated with different expression codes to be consistent through a warping field induced from the corresponding 3DMM meshes.
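
To make the ideas in the overview more concrete, the following is a minimal NumPy sketch of what a mesh-guided sampler, a distance-aware density penalty, and a landmark loss could look like. It is an illustration under simplifying assumptions, not the paper's formulation: the distance to the mesh is approximated by the depth difference along each ray, the Gaussian-shaped weighting is an arbitrary choice, and all function names and parameters (`mesh_guided_samples`, `distance_aware_density_penalty`, `landmark_loss`, `half_width`, `scale`) are hypothetical.

```python
import numpy as np


def mesh_guided_samples(surface_depth, n_samples=8, half_width=0.05):
    """Place sample depths in a narrow band around the mesh along each ray.

    surface_depth: (R,) depth at which each ray meets the 3DMM mesh
                   (assumed to be given, e.g. from rasterizing the mesh).
    Returns an (R, n_samples) array of sample depths.
    """
    offsets = np.linspace(-half_width, half_width, n_samples)
    return surface_depth[:, None] + offsets[None, :]


def distance_aware_density_penalty(density, sample_depth, surface_depth, scale=0.05):
    """Penalize volume density that lies far from the mesh surface.

    The weight is 0 at the surface and approaches 1 far away from it.
    Depth difference along the ray stands in for true point-to-mesh
    distance (an assumption made for this sketch).
    """
    dist = np.abs(sample_depth - surface_depth[:, None])
    weight = 1.0 - np.exp(-((dist / scale) ** 2))
    return float(np.mean(weight * density))


def landmark_loss(pred_landmarks, mesh_landmarks):
    """Mean squared distance between predicted and mesh 3D landmarks, (K, 3) each."""
    return float(np.mean(np.sum((pred_landmarks - mesh_landmarks) ** 2, axis=-1)))


# Toy usage: two rays, five samples per ray.
surface_depth = np.array([1.0, 1.2])
depths = mesh_guided_samples(surface_depth, n_samples=5, half_width=0.1)

# Density concentrated at the surface sample versus at the band edge.
near = np.zeros_like(depths)
near[:, 2] = 1.0
far = np.zeros_like(depths)
far[:, 0] = 1.0

penalty_near = distance_aware_density_penalty(near, depths, surface_depth)
penalty_far = distance_aware_density_penalty(far, depths, surface_depth)
```

In this toy setting, density placed on the mesh surface incurs a smaller penalty than the same density placed at the edge of the sampling band, which is the qualitative behavior a distance-aware regularizer is meant to encourage. In an actual training loop these terms would be written in a differentiable framework and added to the adversarial objective.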

Our Results

Pose Control Demo.

Pose Control Comparison.

Expression Control Demo.

More Expression Control Results.


Citation

@misc{https://doi.org/10.48550/arxiv.2206.08361,
doi = {10.48550/ARXIV.2206.08361},
url = {https://arxiv.org/abs/2206.08361},
author = {Sun, Keqiang and Wu, Shangzhe and Huang, Zhaoyang and Zhang, Ning and Wang, Quan and Li, HongSheng},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
title = {Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}