Imagine the situation: you go to a VR game club and, before you enter the VR room, the system takes several photos of you in just a few seconds. Right after that, you put on your gear, launch your favorite VR game in multiplayer mode, and recognize your friends there not just by their nicknames and colored costumes, but by their faces and emotions. We, like many researchers around the world, are working on turning this vision into reality. One very important step on that path is creating a VR avatar of a real person.
There are video games in which you can manually choose your appearance, like The Sims 4 and Fallout 3. But if you want your character to look like you, wouldn’t it be better to just say, “Hey, computer, here are my photos. Set the sliders automatically,” and get your avatar in a few minutes (or seconds)?
In the following sections you will find:
- pretty images with results of our work;
- what you should do to start animating static 3D models of a user’s head;
- links to literature where you can find more detailed explanations of the procedure;
- core formulas to familiarize yourself with the basic notation.
Input data
We started by animating an incomplete 3D scan of a head. One may ask if that is even possible: we have a model of a head with a huge hole in it and no information about bones and muscles! There is a trick. First, we need a set of different heads with different facial expressions. Gathering such a database is challenging:
- Real faces
  - Involve several hundred people.
  - Scan their faces. You may use:
    - a multi-view stereo approach that recovers shape from several (or dozens of) images of one head; the photos should be taken at the same moment;
    - a 3D scanner such as Microsoft Kinect.
  - Give all faces the same number of vertices and triangles and establish a bijection between the vertices of each pair of faces: each point of one face should correspond to the semantically identical point on another face.
- Synthesized faces
  - Hire 3D artists.
  - Ask them to sculpt many faces based on a common base face. Note, however, that these faces may not be as diverse as real ones.
If you want faces with emotions, you need not only the non-trivial correspondence between different faces with neutral expressions but also the correspondence between faces with different expressions. The latter is much more complex: imagine automatically matching a face with a closed mouth to a face with a wide-open mouth. Fortunately, such databases exist for non-commercial research purposes.
Sculpting Head of a User Automatically
Given a set of 3D head models in correspondence (also called a generative model), we can morph them to get faces that are similar to the ones in our database but are not in it. For example, given a big head and a little head, we can average their corresponding vertices componentwise and get an average head model. Below you can see the average model from the FLAME database of the Max Planck Institute. People in the dataset were captured wearing caps, so the back of the head may look weird.
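The averaging and morphing described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the heads are stored as one array of shape (n_heads, n_vertices, 3) with vertices already in correspondence; the function names `average_head` and `morph` are hypothetical.

```python
import numpy as np

def average_head(heads):
    """Componentwise mean of corresponding vertices.

    heads: array of shape (n_heads, n_vertices, 3), vertices in correspondence.
    """
    return heads.mean(axis=0)

def morph(heads, weights):
    """Weighted combination of the database heads (one weight per head)."""
    weights = np.asarray(weights, dtype=float)
    # Sum over the head axis: result has shape (n_vertices, 3).
    return np.tensordot(weights, heads, axes=1)

# Toy data: a "little" head and a "big" head that is twice as large.
little = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
big = 2.0 * little
heads = np.stack([little, big])

mean_head = average_head(heads)
same_head = morph(heads, [0.5, 0.5])  # equal weights reproduce the average
```

Weights other than equal ones give new heads between (or beyond) the database entries, which is what makes the model generative.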
We can morph faces from the database to get the one we need; here’s how. The procedure is described in various papers. Here is the main expression to minimize:
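For orientation, objectives of this kind are often written as a weighted sum of point-to-point and point-to-plane distances plus a regularizer on the shape weights. The form below is a sketch under that assumption, not necessarily the exact expression used here. The names c, R, t (scale, rotation, translation), α (shape weights), p and n (points and unit normals of the 3D scan), and the indices on λ are hypothetical; k, M, w, and λ follow the notation of this article.

```latex
E(k, c, R, t, \alpha) =
  \sum_{s} w_s \Big[ \lambda_1 \lVert r_s \rVert^2
                   + \lambda_2 \big( n_{k(s)}^{\top} r_s \big)^2 \Big]
  + \lambda_3 \lVert \alpha \rVert^2,
\qquad
r_s = c \, R \, M_s \, \alpha + t - p_{k(s)}
```

Here M_s is read as a 3×n matrix whose columns are the positions of the s-th vertex in the n database heads, so M_s α is the morphed position of that vertex, and r_s is its residual against the scan point it is mapped to.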
The minimization is iterative: first you minimize over k (a surjective mapping from the base model to the 3D scan), then over the other arguments, then over k again, and so on.
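The alternating scheme can be illustrated with a simplified, ICP-style loop. This sketch makes several assumptions that go beyond the text: the mapping k is chosen by brute-force nearest neighbor, only the similarity transform is updated in the second step (the shape is kept fixed), and the transform is fitted in closed form with the Umeyama least-squares method. All function names are hypothetical.

```python
import numpy as np

def nearest_indices(source, target):
    """For each source point, index of the closest target point (brute force)."""
    d2 = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_similarity(src, dst):
    """Least-squares scale c, rotation R, translation t with dst ~ c * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)           # cross-covariance of the two point sets
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                   # avoid a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)
    c = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - c * R @ mu_s
    return c, R, t

def align(source, target, iterations=20):
    """Alternate between updating the mapping k and the transform (c, R, t)."""
    c, R, t = 1.0, np.eye(3), np.zeros(3)
    k = np.zeros(len(source), dtype=int)
    for _ in range(iterations):
        moved = c * source @ R.T + t
        k = nearest_indices(moved, target)            # step 1: update k
        c, R, t = fit_similarity(source, target[k])   # step 2: update the rest
    return c, R, t, k
```

In a full fitting procedure, the second step would also update the shape weights, and the correspondences would be weighted rather than treated equally.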
The database of faces is denoted as M. Its s-th element contains n 3D vectors, which are the coordinates of the s-th vertex of each head in the database (there are n heads, as you’ve probably guessed):
The 3D scan, called the target mesh, may have normal vectors, which must have unit length:
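If the normals of a scan are not guaranteed to have unit length, they can be rescaled before fitting. A minimal sketch, assuming the normals are stored as an (n, 3) array; `normalize_normals` and the `eps` guard against zero-length vectors are hypothetical choices.

```python
import numpy as np

def normalize_normals(normals, eps=1e-12):
    """Rescale each row to unit length; near-zero vectors are left close to zero."""
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.maximum(lengths, eps)

unit = normalize_normals(np.array([[3.0, 0.0, 4.0], [0.0, 2.0, 0.0]]))
```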
The weights λ are constants specified beforehand and should satisfy the following constraints:
The weights w are recalculated on each iteration depending on the distance between corresponding points, the orientations of their normal vectors, and manually predefined correspondences (if any):
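One simple way to realize such weights is a hard accept/reject rule, sketched below. This is an assumption for illustration, not the exact rule used here: a pair gets weight 1 if the points are close enough and their normals roughly agree, weight 0 otherwise, and manually marked landmarks get a fixed higher weight. The thresholds, the landmark weight, and the function name are all hypothetical.

```python
import numpy as np

def correspondence_weights(src_pts, src_normals, dst_pts, dst_normals,
                           max_distance=0.01, max_angle_deg=30.0,
                           landmark_idx=(), landmark_weight=10.0):
    """Per-pair weights from distance, normal agreement, and manual landmarks.

    All arrays have shape (n, 3); row i of src_* is paired with row i of dst_*.
    Normals are assumed to have unit length.
    """
    dist = np.linalg.norm(src_pts - dst_pts, axis=1)
    cos_angle = np.einsum("ij,ij->i", src_normals, dst_normals)
    accepted = (dist <= max_distance) & (cos_angle >= np.cos(np.radians(max_angle_deg)))
    w = accepted.astype(float)
    # Manually predefined correspondences are always kept, with a larger weight.
    w[np.asarray(landmark_idx, dtype=int)] = landmark_weight
    return w
```

Pairs with weight 0 simply drop out of the sum on that iteration, which is how vertices facing a hole in the scan can be ignored.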
Parameters to find
As mentioned before, we need to find a mapping from the base model to the 3D scan:
Next, we need to find the similarity transformation parameters: scale, rotation, and translation (offset):
Finally, we need the shape, which is described by the weights of the faces from the database:
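When the mapping and the similarity transform are held fixed, finding the shape weights reduces to a linear least-squares problem. The sketch below assumes that the target points have already been brought into the model's coordinate frame and put in correspondence with the model vertices. The optional ridge term and the name `fit_shape_weights` are hypothetical additions for illustration.

```python
import numpy as np

def fit_shape_weights(heads, target, ridge=0.0):
    """Solve for weights alpha so that sum_j alpha[j] * heads[j] approximates target.

    heads:  (n_heads, n_vertices, 3) database heads in correspondence.
    target: (n_vertices, 3) points matched to the model vertices.
    ridge:  optional penalty on the squared norm of alpha.
    """
    n = heads.shape[0]
    A = heads.reshape(n, -1).T            # one column per database head
    b = target.reshape(-1)
    if ridge > 0:
        # Tikhonov regularization expressed as extra least-squares rows.
        A = np.vstack([A, np.sqrt(ridge) * np.eye(n)])
        b = np.concatenate([b, np.zeros(n)])
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha
```

A small positive `ridge` keeps the weights from growing large when the scan is noisy or covers only part of the head.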
The image below is a visualization of one iteration of the software implemented in the ARVI Lab. Vertices of the 3D scan are blue. Vertices of the source model that are not used on the current iteration (their weights w equal 0) are white. Vertices that are mapped to the target mesh are colored from green (low penalty) to red (high penalty).
You may not have recognized the original head because it’s textureless. The texture can be baked from the original model with 3D computer graphics software such as Blender or Autodesk 3ds Max. Next, you can see a rendering of the textured model with another expression:
Soon, we’ll be able to see ourselves and our friends in video games not just as titles on our avatars’ heads, but as recognizable bodies and faces showing real emotions. You’ll be able to read the lips of your comrades, meet real people on virtual streets, speak to crowds during meetings, and many other wondrous things!
 Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. 2017. Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. 36, 6, Article 194 (November 2017), 17 pages. DOI: https://doi.org/10.1145/3130800.3130813
 Cao Chen, Yanlin Weng, Shun Zhou, Yiying Tong, Kun Zhou: "FaceWarehouse: a 3D Facial Expression Database for Visual Computing", IEEE Transactions on Visualization and Computer Graphics, 20(3): 413-425, 2014
 Schönborn, S., Egger, B., Morel-Forster, A. et al. Int J Comput Vis (2017) 123: 160. DOI: https://doi.org/10.1007/s11263-016-0967-5
 Large scale 3D Morphable Models J. Booth, A. Roussos, A. Ponniah, D. Dunaway, S. Zafeiriou. International Journal of Computer Vision (IJCV), April 2017. https://link.springer.com/article/10.1007/s11263-017-1009-7
 A 3D Morphable Model learnt from 10,000 faces J. Booth, A. Roussos, S. Zafeiriou, A. Ponniah, D. Dunaway. Proceedings of IEEE Int’l Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016. http://ibug.doc.ic.ac.uk/media/uploads/documents/0002.pdf
 Alexandru Eugen Ichim, Sofien Bouaziz, and Mark Pauly. 2015. Dynamic 3D avatar creation from hand-held video input. ACM Trans. Graph. 34, 4, Article 45 (July 2015), 14 pages. DOI: https://doi.org/10.1145/2766974
 Hao Li, Jihun Yu, Yuting Ye, and Chris Bregler. 2013. Realtime facial animation with on-the-fly correctives. ACM Trans. Graph. 32, 4, Article 42 (July 2013), 10 pages. DOI: https://doi.org/10.1145/2461912.2462019
 B. Amberg, R. Knothe and T. Vetter, "Expression invariant 3D face recognition with a Morphable Model," 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, 2008, pp. 1-6. DOI: https://doi.org/10.1109/AFGR.2008.4813376