A Pose and Occlusion-Invariant 3D Facial Recognition System using Custom CNN and Double Transfer Learning
In the era of advanced surveillance and biometric authentication, traditional 2D face recognition systems often fail under challenging conditions such as head rotations, occlusions (e.g., hands, glasses), and expression changes. This project proposes an efficient 3D facial recognition system that remains robust to pose variation and occlusion by combining a custom convolutional neural network (CNN), a lightweight deep learning backbone, and geometry-aware preprocessing.
Standard 2D face recognition systems rely heavily on frontal images and struggle with profile or occluded faces. Real-world applications, however, demand performance across multiple angles and obstructions. 3D face data, rich in spatial and angular information, offers a way forward. Yet, processing 3D point clouds using conventional CNNs requires a creative bridge between 3D geometry and 2D deep learning architectures.
Our pipeline converts raw 3D point cloud data into a CNN-compatible input using a representation called Multi-Feature Facial Maps (MFFM), in which three geometric feature maps are stacked as channels. The resulting input tensor has size 112x112x3, analogous to an RGB image.
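As a rough illustration of how a point cloud can be rasterized into such a tensor, the sketch below bins points into a 112x112 grid. The three channels used here (max depth, mean depth, point density) are placeholders chosen for illustration, since the source does not name the actual MFFM channels:

```python
import numpy as np

def point_cloud_to_mffm(points, size=112):
    """Rasterize an (N, 3) point cloud into a 3-channel facial map.

    Illustrative channels (assumed, not from the source):
      0: max depth per pixel (simple z-buffer)
      1: mean depth per pixel
      2: point density, normalized to [0, 1]
    """
    xy = points[:, :2]
    z = points[:, 2].astype(np.float32)
    # Normalize depth to [0, 1] so the zero background is meaningful.
    z = (z - z.min()) / max(z.max() - z.min(), 1e-8)
    # Map x, y into integer pixel coordinates on the grid.
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    scale = (size - 1) / np.maximum(maxs - mins, 1e-8)
    cols, rows = ((xy - mins) * scale).astype(int).T
    mffm = np.zeros((size, size, 3), dtype=np.float32)
    count = np.zeros((size, size), dtype=np.float32)
    # Channel 0: keep the largest depth landing in each pixel.
    np.maximum.at(mffm[:, :, 0], (rows, cols), z)
    # Channel 1: accumulate depth, then divide by the hit count.
    np.add.at(mffm[:, :, 1], (rows, cols), z)
    np.add.at(count, (rows, cols), 1.0)
    mffm[:, :, 1] /= np.maximum(count, 1.0)
    # Channel 2: normalized point density.
    mffm[:, :, 2] = count / max(count.max(), 1.0)
    return mffm
```

A real MFFM pipeline would also handle face cropping and hole filling; this sketch only shows the point-to-grid projection step.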
A two-stage Conv2D stack followed by a residual block processes the MFFM input to extract low-level facial geometry features. Each Conv2D layer uses a 3x3 kernel with Batch Normalization and ReLU activation, which stabilizes training and introduces non-linearity.
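A minimal PyTorch sketch of this front end, assuming illustrative channel widths of 32 and 64 (the source does not specify them):

```python
import torch
import torch.nn as nn

class MFFMStem(nn.Module):
    """Sketch of the described front end: two 3x3 Conv2D layers with
    BatchNorm and ReLU, followed by one residual block."""

    def __init__(self):
        super().__init__()
        # Two-stage Conv2D stack: 3 input channels (MFFM) -> 32 -> 64.
        self.stack = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        )
        # Residual block: two 3x3 convs with an identity skip connection.
        self.res = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.stack(x)
        return self.act(x + self.res(x))
```

With a 112x112x3 MFFM input (as a 1x3x112x112 tensor), this stem outputs a 64-channel feature map at the same spatial resolution, ready to feed the backbone.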
MobileNetV2, a lightweight and efficient CNN, serves as the core feature extractor. It is chosen over heavier models such as ResNet because its depthwise separable convolutions and inverted bottlenecks cut computation by up to 8x while maintaining high accuracy. Each bottleneck comprises a 1x1 pointwise expansion convolution, a 3x3 depthwise convolution, and a 1x1 linear projection, with a residual connection when the input and output shapes match.
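The inverted bottleneck of MobileNetV2 (1x1 expansion, 3x3 depthwise convolution, 1x1 linear projection, skip connection when shapes match) can be sketched in PyTorch as follows; the expansion factor of 6 matches the original design, while the channel widths in the usage note are illustrative:

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Sketch of a MobileNetV2-style inverted bottleneck."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        # Residual skip only when spatial size and channels are preserved.
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion to a wider hidden dimension.
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 3x3 depthwise conv: groups == channels filters each channel alone.
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection back down (no activation afterwards).
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y
```

For example, `InvertedBottleneck(16, 16)` keeps a 56x56 feature map intact with a skip connection, while `InvertedBottleneck(16, 24, stride=2)` halves the resolution and widens the channels without one.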