A Pose and Occlusion-Invariant 3D Facial Recognition System using Custom CNN and Double Transfer Learning

In the era of advanced surveillance and biometric authentication, traditional 2D face recognition systems often fail under challenging conditions such as head rotations, occlusions (e.g., hands, glasses), and facial expressions. This project proposes a novel and efficient 3D facial recognition system that remains robust to pose variation and occlusion, combining a custom convolutional neural network (CNN), a lightweight deep learning backbone, and geometry-aware preprocessing.


Motivation

Standard 2D face recognition systems rely heavily on frontal images and struggle with profile or occluded faces. Real-world applications, however, demand performance across multiple angles and obstructions. 3D face data, rich in spatial and angular information, offers a way forward. Yet, processing 3D point clouds using conventional CNNs requires a creative bridge between 3D geometry and 2D deep learning architectures.


Methodology Overview

Our pipeline converts raw 3D point-cloud data into a CNN-compatible input using a representation we call the Multi-Feature Facial Map (MFFM), which encodes facial geometry as three feature channels.

These three maps are stacked into a 3-channel input tensor of size 112x112x3, analogous to an RGB image.
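As a concrete sketch of how one such channel can be produced, the snippet below rasterizes a point cloud into a normalized 112x112 depth map using a simple z-buffer, then stacks three maps into the MFFM tensor. The function name and the choice of depth as the channel content are illustrative assumptions; the source does not specify the exact channel definitions here.

```python
import numpy as np

def depth_map_from_points(points, size=112):
    """Project an (N, 3) point cloud onto a size x size depth image.

    Hypothetical sketch of one MFFM channel (a normalized depth map);
    the exact channel definitions of the original pipeline may differ.
    """
    pts = np.asarray(points, dtype=np.float64)
    # Normalize x, y coordinates to pixel indices in [0, size - 1].
    xy = pts[:, :2]
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    scale = (size - 1) / np.maximum(maxs - mins, 1e-9)
    cols = np.round((xy[:, 0] - mins[0]) * scale[0]).astype(int)
    rows = np.round((xy[:, 1] - mins[1]) * scale[1]).astype(int)

    depth = np.zeros((size, size), dtype=np.float64)
    # Keep the nearest (largest-z) point per pixel, like a z-buffer.
    np.maximum.at(depth, (rows, cols), pts[:, 2])
    # Scale depth values into [0, 1] for the CNN input.
    if depth.max() > 0:
        depth /= depth.max()
    return depth
```

Three such maps can then be stacked with `np.stack([m1, m2, m3], axis=-1)` to form the 112x112x3 input tensor.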

1. Preprocessing Module

A two-stage Conv2D stack followed by a residual block processes the MFFM input to extract low-level facial geometry features. The Conv2D layers use 3x3 kernels with Batch Normalization and ReLU activation to provide stable, non-linear feature transformations.
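The module described above can be sketched in PyTorch as follows. The channel width (32) and the internal layout of the residual block are illustrative assumptions, since the source does not list exact layer sizes.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convs with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection: add the input back before the final activation.
        return self.act(x + self.body(x))

class PreprocessingModule(nn.Module):
    """Two Conv2D layers (3x3 kernels, BN, ReLU) then one residual block.

    The width of 32 channels is an assumed value for illustration.
    """
    def __init__(self, in_ch=3, width=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        self.res = ResidualBlock(width)

    def forward(self, x):
        return self.res(self.stem(x))
```

With `padding=1` the spatial size is preserved, so a 112x112x3 MFFM input yields a 32-channel 112x112 feature map for the backbone.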

2. CNN Backbone: MobileNetV2

MobileNetV2, a lightweight and efficient CNN, serves as the core feature extractor. It is chosen over heavier models such as ResNet because its depthwise separable convolutions and inverted residual bottlenecks cut the cost of each 3x3 convolution by roughly 8-9x while maintaining high accuracy. Each bottleneck comprises: