PRNet Extension: 3D Face Reconstruction and Dense Alignment

Project Video

Outline

Introduction

Background Work

Datasets

Baseline Model

Model Details and Architecture

UV position map illustration — the ground-truth 3D face point cloud, when projected onto the x-y plane, exactly matches the face in the 2D image. The position map can therefore be understood as replacing the R, G, B values of the texture map with x, y, z coordinates.
The architecture of PRNet. Green rectangles represent residual blocks, and blue ones represent transposed convolutional layers. It is an encoder-decoder structure that learns to transform input images into UV position maps.
Illustration of the final weight mask. From left to right: UV texture map, UV position map, colored texture map with segmentation information (blue for the eye region, red for the nose region, green for the mouth region, and purple for the neck region), and the final weight mask.
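The "position map as an image" idea can be made concrete with a small sketch (not the project's code; the 256×256 map resolution is an assumption based on PRNet's standard setting):

```python
import numpy as np

# A UV position map stores a 3D coordinate (x, y, z) at every (u, v) pixel,
# exactly like an RGB texture map stores a color at every pixel.
uv_size = 256
pos_map = np.random.rand(uv_size, uv_size, 3).astype(np.float32)  # placeholder values

# Recovering the dense point cloud is just a reshape: ~65k 3D points.
point_cloud = pos_map.reshape(-1, 3)

# Dropping the z coordinate projects the cloud onto the x-y plane; for a
# ground-truth map, these (x, y) locations align with the face in the image.
projection_2d = point_cloud[:, :2]

print(point_cloud.shape)    # (65536, 3)
print(projection_2d.shape)  # (65536, 2)
```

Because the map is a regular image-shaped tensor, a standard CNN can regress it directly.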
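The encoder halves the spatial resolution with strided convolutions and the decoder restores it with transposed convolutions. A sketch of the spatial arithmetic (the 256→8→256 resolution flow and 4×4/stride-2 layers are assumptions based on the PRNet paper, not this project's exact layer list):

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def tconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution (inverse of the above)."""
    return (size - 1) * stride - 2 * pad + kernel

size = 256
for _ in range(5):          # five stride-2 encoder stages: 256 -> 8
    size = conv_out(size)
print("encoder bottleneck:", size)   # 8

for _ in range(5):          # mirrored transposed-conv stages: 8 -> 256
    size = tconv_out(size)
print("decoder output:", size)       # 256
```

The decoder's final layer has three output channels, so the network emits a 256×256×3 position map matching the input resolution.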
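The weight mask enters training as a per-pixel weighting of the position-map error, so that landmark and facial regions count more than the neck. A minimal sketch (the tensors are random placeholders; the 16:4:3:0 region weighting is an assumption based on the PRNet paper):

```python
import numpy as np

def weighted_position_loss(pred, gt, weight_mask):
    """Mean per-pixel squared 3D error, scaled by the weight mask."""
    sq_err = np.sum((pred - gt) ** 2, axis=-1)  # (H, W) squared distances
    return np.mean(sq_err * weight_mask)

h = w = 256
pred = np.random.rand(h, w, 3)
gt = np.random.rand(h, w, 3)

# Assumed region weights: 68 landmark pixels 16, eyes/nose/mouth 4,
# rest of the face 3, neck 0. Here only a placeholder "neck" strip is zeroed.
weight_mask = np.full((h, w), 3.0)
weight_mask[:, : w // 4] = 0.0  # hypothetical neck region
loss = weighted_position_loss(pred, gt, weight_mask)
print(loss)
```

Zero-weighted pixels contribute nothing to the gradient, which is why the neck region can be ignored entirely during training.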

Baseline Training and Results

Baseline Training Loss over the epochs

Baseline Results

Top Pose: Left to Right: Original Ground Truth Image, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
Side Pose: Left to Right: Original Ground Truth Image, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
Side Pose: Left to Right: Original Ground Truth Image, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
A GIF showing the model’s sparse and dense alignment outputs from the 10th epoch to the 140th epoch.

Generalizability and Testing of the Baseline Model

Original Image of Prof. Deepak Pathak, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
Testing the model’s outputs on in-the-wild images across different categories, such as gender (male/female), pose (side/front/top), and accessories (with/without spectacles).

Novelties: PRNet Extensions

MobileNetV2 as Encoder

MobileNetV2 Architecture
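MobileNetV2's efficiency comes from the depthwise-separable convolutions inside its inverted residual blocks. A back-of-the-envelope parameter comparison against a standard convolution (pure arithmetic, not the project's code) shows why it makes a lighter encoder:

```python
def standard_conv_params(c_in, c_out, k=3):
    """Parameters in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Depthwise k x k per input channel, then a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

c_in, c_out = 128, 128
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
print(std, sep, round(std / sep, 1))  # 147456 17536 8.4
```

At these channel widths the separable form uses roughly 8x fewer parameters, trading some representational capacity for speed.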

Improvements

MobileNet Training Loss over the epochs

MobileNet Results

Left to Right: Original Ground Truth Image, 2D facial landmark points, 3D sparse alignment, 3D dense alignment

Generalizability and Testing of the Model

Original Image of Prof. Deepak Pathak, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
Testing the model’s outputs on in-the-wild images across different categories, such as gender (male/female), pose (side/front/top), and accessories (with/without spectacles).

ResNet18 as Encoder
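The defining feature of ResNet18's basic blocks is the identity skip connection, which lets each block learn only a residual on top of its input. A minimal numpy sketch of that idea (the `transform` here is a placeholder for the block's two 3x3 convolutions, not real trained weights):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def basic_block(x, transform):
    """ResNet basic block: output = ReLU(F(x) + x), with an identity skip."""
    return relu(transform(x) + x)

x = np.random.randn(8, 8, 64)
# With a zero transform the block reduces to ReLU of the identity, which is
# what the skip connection guarantees even when F(x) learns nothing.
out = basic_block(x, lambda t: np.zeros_like(t))
print(np.allclose(out, relu(x)))  # True
```

Stacking the configuration of [2, 2, 2, 2] such blocks over four stages, plus the stem and final layer, gives the 18 weighted layers the name refers to.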

ResNet18 Training Loss over the epochs

ResNet18 Results

Left to Right: Original Ground Truth Image, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
A GIF showing the sparse and dense alignment outputs of the ResNet18-based model from the 0th epoch to the 490th epoch.

ResNet18 Model’s Generalizability

Original Image of Prof. Deepak Pathak, 2D facial landmark points, 3D sparse alignment, 3D dense alignment
Testing the model’s outputs on in-the-wild images across different categories, such as gender (male/female), pose (side/front/top), and accessories (with/without spectacles).

Conclusion and Improvements

Comparison: Baseline PRNet (left) vs. ResNet18 PRNet (right). The jawline is more prominent in the ResNet18-trained model, and the eye alignment is also more accurate.
Comparison: Baseline PRNet (left) vs. ResNet18 PRNet (right): dense alignment on a side-pose image.
Comparison: Baseline PRNet (left) vs. ResNet18 PRNet (right): dense alignment on a top-view image.

Code and Saved Models

Note

References

Graduate students at Carnegie Mellon University