Official Repository for CVPR 2022 paper I M Avatar: Implicit Morphable Head Avatars from Videos.
- Clone this repo:
git clone --recursive git@github.com:zhengyuf/IMavatar.git
- Create a conda environment
conda env create -f environment.yml
and activate it:
conda activate IMavatar
- We use libmise to extract 3D meshes. Build libmise by running:
cd code; python setup.py install
- Download the FLAME model (choose FLAME 2020), unzip it, and copy 'generic_model.pkl' into ./code/flame/FLAME2020
- When choosing your GPU, avoid RTX30xx cards, since they seem unstable with Broyden's method; see here if you want to know more. The results in the paper were obtained on a GeForce RTX2080Ti GPU. A Quadro RTX6000 has also been tested and converges well.
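The setup steps above can be sanity-checked before training. What follows is a minimal sketch, not part of the repository: the helper names `flame_model_path`, `flame_model_installed`, and `gpu_is_flagged` are hypothetical, and the name pattern used to spot RTX30xx cards is only an assumption about how such GPUs report themselves.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not part of the IMavatar code base.

def flame_model_path(repo_root):
    """Location where generic_model.pkl is expected after the copy step."""
    return Path(repo_root) / "code" / "flame" / "FLAME2020" / "generic_model.pkl"

def flame_model_installed(repo_root):
    """True if the FLAME 2020 model file is present in the expected folder."""
    return flame_model_path(repo_root).is_file()

def gpu_is_flagged(gpu_name):
    """True if the name looks like an RTX30xx card (assumed naming pattern)."""
    return re.search(r"RTX\s*30\d\d", gpu_name, flags=re.IGNORECASE) is not None

if __name__ == "__main__":
    print("FLAME model found:", flame_model_installed("."))
    # With PyTorch installed, the name could come from torch.cuda.get_device_name(0).
    print("GPU flagged:", gpu_is_flagged("NVIDIA GeForce RTX 3090"))
```

The GPU check takes the device name as a plain string so that the sketch itself has no dependency on PyTorch.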
Download a preprocessed dataset from Google Drive or the ETH Zurich server. You can run download_data.bash.
Or prepare your own dataset following the instructions in ./preprocess/README.md.
Link the dataset folder to ./data/datasets. Link the experiment output folder to ./data/experiments.
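The linking step can be done with symbolic links. The sketch below is one possible way to do it, assuming the two folders live somewhere outside the repository; `link_data_dirs` is a hypothetical helper and the paths in the usage note are placeholders.

```python
import os
from pathlib import Path

def link_data_dirs(dataset_dir, experiment_dir, repo_root="."):
    """Symlink the dataset and experiment folders into <repo_root>/data."""
    data_root = Path(repo_root) / "data"
    data_root.mkdir(parents=True, exist_ok=True)
    targets = {
        "datasets": Path(dataset_dir),
        "experiments": Path(experiment_dir),
    }
    links = {}
    for name, target in targets.items():
        link = data_root / name
        # Leave anything that already exists at the link location untouched.
        if not (link.is_symlink() or link.exists()):
            os.symlink(target.resolve(), link, target_is_directory=True)
        links[name] = link
    return links
```

For example, `link_data_dirs("/path/to/datasets", "/path/to/experiments")` run from the repository root would create ./data/datasets and ./data/experiments as links to those (placeholder) folders.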
python scripts/exp_runner.py --conf ./confs/IMavatar_supervised.conf [--wandb_workspace IMavatar] [--is_continue]
Set the is_eval flag for evaluation. Optionally set checkpoint (if omitted, the latest checkpoint is used) and load_path:
python scripts/exp_runner.py --conf ./confs/IMavatar_supervised.conf --is_eval [--checkpoint 60] [--load_path ...]
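When launching several runs, the two commands above can be assembled programmatically. The following is a small sketch that uses only the flags shown above; `build_command` is a hypothetical helper, and restricting checkpoint and load_path to evaluation mirrors the examples rather than any documented constraint.

```python
import sys

def build_command(conf, is_eval=False, checkpoint=None, load_path=None,
                  is_continue=False, wandb_workspace=None):
    """Assemble the exp_runner.py argument list from the flags shown above."""
    cmd = [sys.executable, "scripts/exp_runner.py", "--conf", conf]
    if wandb_workspace is not None:
        cmd += ["--wandb_workspace", wandb_workspace]
    if is_continue:
        cmd.append("--is_continue")
    if is_eval:
        cmd.append("--is_eval")
        # In this sketch the optional evaluation arguments are only added in eval mode.
        if checkpoint is not None:
            cmd += ["--checkpoint", str(checkpoint)]
        if load_path is not None:
            cmd += ["--load_path", load_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory in which scripts/exp_runner.py resolves.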
Download a pretrained model from Google Drive or the ETH Zurich server. See download_data.bash.
The following features are not used in the main paper, but are helpful for training.
- Semantic-guided Training: Set loss.gt_w_seg to True to use semantic segmentation during training. Using semantic maps improves training stability and teeth reconstruction quality.
- Ghost Bone: If the FLAME global rotations in your dataset are not identity matrices, set deformer_network.ghostbone to True. This allows the shoulders and upper body to remain untransformed.
- Pose Optimization: When the FLAME parameters are noisy, I find it helpful to set optimize_camera to True. This optimizes both the FLAME pose parameters and the camera translation parameters. Similarly, set optimize_expression and optimize_latent_code to True to optimize the input expression parameters and per-frame latent codes.
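The dotted option names above suggest a nested configuration. As a rough illustration only, the sketch below toggles two of them on a plain nested dict that stands in for the parsed config file; `set_option` is a hypothetical helper, and the dict layout is inferred from the dotted names, not taken from the actual conf file. The text does not say where optimize_camera, optimize_expression, and optimize_latent_code sit in the file, so they are left out.

```python
def set_option(config, dotted_key, value):
    """Set a nested option such as 'loss.gt_w_seg' in a plain nested dict."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config

# Stand-in for a parsed config; the real file contains many more entries.
config = {"loss": {"gt_w_seg": False}, "deformer_network": {"ghostbone": False}}
set_option(config, "loss.gt_w_seg", True)               # semantic-guided training
set_option(config, "deformer_network.ghostbone", True)  # ghost bone
```

In practice the same change is made by editing the corresponding entries in the conf file passed to exp_runner.py.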
- Our preprocessing script scales FLAME head meshes by 4 so that they fit the unit sphere more tightly. Remember to adjust camera positions accordingly if you are using your own preprocessing pipeline.
- Multi-GPU training is not tested. We found a single GPU to be sufficient in terms of batch size.
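To see why the camera must be adjusted when the head mesh is scaled by 4, consider a toy pinhole camera. The sketch below assumes world-to-camera extrinsics of the form x_cam = R x + t and scaling about the world origin; the convention used by the actual preprocessing code may differ, and all values are made up. Under these assumptions, scaling the translation by the same factor leaves the projected image unchanged.

```python
def project(point, rotation, translation, focal=1.0):
    """Pinhole projection of a world point: x_cam = R @ x + t, then divide by depth."""
    x_cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    return (focal * x_cam[0] / x_cam[2], focal * x_cam[1] / x_cam[2])

def scale_vec(vec, factor):
    """Multiply every component of a 3-vector by factor."""
    return [factor * v for v in vec]

SCALE = 4.0  # mesh scale factor mentioned above

# Toy extrinsics and vertex (assumed values, for illustration only).
rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, 2.5]
vertex = [0.1, -0.05, 0.2]

before = project(vertex, rotation, translation)
# Scale the vertex and the camera translation together.
after = project(scale_vec(vertex, SCALE), rotation, scale_vec(translation, SCALE))
# Scaling only the vertex changes where it lands in the image.
mismatched = project(scale_vec(vertex, SCALE), rotation, translation)
```

Here `before` and `after` agree up to floating-point error, while `mismatched` does not, which is the inconsistency the note above warns about.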
If you find our code or paper useful, please cite as:
@inproceedings{zheng2022imavatar,
title={{I} {M} {Avatar}: Implicit Morphable Head Avatars from Videos},
author={Zheng, Yufeng and Abrevaya, Victoria Fernández and Bühler, Marcel C. and Chen, Xu and Black, Michael J. and Hilliges, Otmar},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}