Human Face Recognition

3-D Face Recognition in an Ambient Intelligence Environment Scenario

                  This scenario offers the opportunity to exploit the potential of the face as a non-intrusive biometric identifier, not just to regulate access to the controlled environment but also to adapt the provided services to the preferences of the recognized user. Biometric recognition refers to the use of distinctive physiological (e.g., fingerprints, face, retina, iris) and behavioural (e.g., gait, signature) characteristics, called biometric identifiers, for automatically recognizing individuals. Because biometric identifiers cannot be easily misplaced, forged, or shared, they are considered more reliable for person recognition than traditional token- or knowledge-based methods. Other typical objectives of biometric recognition are user convenience (e.g., service access without a Personal Identification Number) and better security (e.g., access that is difficult to forge). All these reasons make biometrics well suited for Ambient Intelligence applications, and this is especially true for a biometric identifier such as the face, which is one of the most common cues humans use in their visual interactions and allows recognizing the user non-intrusively, without any physical contact with the sensor.

           We present a face recognition system based on 3D features to verify the identity of subjects accessing the controlled Ambient Intelligence Environment and to customize all the services accordingly; in other words, it adds a social dimension to man-machine communication and thus may help make such environments more attractive to the human user. The proposed approach relies on stereoscopic face acquisition and 3D mesh reconstruction to avoid highly expensive, non-automated 3D scanning, typically not suited for real-time applications. For each enrolled subject, a bidimensional feature descriptor is extracted from their 3D mesh and compared to the previously stored corresponding template. This descriptor is a normal map, namely a color image in which the RGB components represent the normals to the face geometry. A weighting mask, automatically generated for each authorized person, improves recognition robustness to a wide range of facial expressions.



                  We project the 3D geometry onto 2D space through spherical mapping. The result is a bidimensional representation of the original face geometry which retains the spatial relationships between facial features. Color information from the face texture is used to mask any beard-covered regions according to their relevance, resulting in an 8-bit greyscale filter mask (Flesh Mask). Then, a variety of facial expressions are generated from the neutral pose through a rig-based animation technique, and the corresponding normal maps are used to compute a further 8-bit greyscale mask (Expression Weighting Mask) aimed at coping with expression variations. The two greyscale masks are then multiplied, and the resulting map is used to augment the normal map with an extra 8 bits per pixel, yielding a 32-bit RGBA bitmap (Augmented Normal Map).




We also want to preserve the spatial relationships between facial features, so we project the vertices' 3D coordinates onto a 2D space using a spherical projection. We can then store the normals of mesh M in a bidimensional array N using the mapping coordinates; in this way, each pixel represents a normal as RGB values.
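The spherical projection and normal-to-color encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the map resolution, and the nearest-pixel rasterization are our own assumptions; a real pipeline would interpolate over mesh faces rather than scatter single vertices.

```python
import numpy as np

def normal_map(vertices, normals, size=256):
    """Project per-vertex unit normals onto a 2D grid via spherical mapping.

    vertices: (n, 3) array of mesh coordinates, centered at the origin.
    normals:  (n, 3) array of unit normals with components in [-1, 1].
    Returns a (size, size, 3) uint8 image whose RGB values encode the normals.
    """
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    r = np.linalg.norm(vertices, axis=1)
    # Spherical mapping: longitude -> u, latitude -> v, both in [0, 1).
    u = (np.arctan2(z, x) / (2 * np.pi)) % 1.0
    v = np.arccos(np.clip(y / np.maximum(r, 1e-9), -1.0, 1.0)) / np.pi
    img = np.zeros((size, size, 3), dtype=np.uint8)
    px = np.minimum((u * size).astype(int), size - 1)
    py = np.minimum((v * size).astype(int), size - 1)
    # Map normal components from [-1, 1] to color values in [0, 255].
    img[py, px] = ((normals + 1.0) * 0.5 * 255).astype(np.uint8)
    return img
```

Each pixel of the resulting image holds one normal, so spatial relationships between facial features survive the projection.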



Figure 2. (a) 3D mesh model, (b) wireframe model, (c) projection in 2D spatial coordinates, (d) normal map.


To compare the normal map NA from the input subject to another normal map NB previously stored in the reference database, we compute

θ = arccos(rNA · rNB + gNA · gNB + bNA · bNB)

the angle between each pair of normals represented by the colors of pixels with corresponding mapping coordinates, and store it in a new Difference Map D, with the components r, g and b suitably normalized from the spatial domain to the color domain, so that 0 ≤ rNA, gNA, bNA ≤ 1 and 0 ≤ rNB, gNB, bNB ≤ 1. The value θ, with 0 ≤ θ ≤ π, is the angular difference between the pixels with coordinates (xNA, yNA) in NA and (xNB, yNB) in NB, and it is stored in D as a grey-scale color. At this point, the histogram H is analyzed to estimate the similarity score between NA and NB. On the X axis we represent the resulting angle of each pairwise comparison (sorted from 0° to 180°), while on the Y axis we represent the number of occurrences of each difference. The shape of H represents the angular distance distribution between meshes MA and MB: two similar faces feature very high values at small angles, whereas two unlike faces have their differences spread over a wider range (see Figure 3). We define a similarity score through a weighted sum between H and a Gaussian function G, as in:

similarity score = Σx H(x) · G(x),   with G(x) = k · e^(−x² / 2σ²)

where varying σ and k makes it possible to change the recognition sensitivity. To reduce the effects of residual face misalignment during the acquisition and sampling phases, we calculate the angle θ using a k × k (usually 3 × 3 or 5 × 5) matrix of neighbouring pixels.
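The comparison step above can be sketched in code. This is a simplified reading of the method, not the reference implementation: the function name, the histogram binning, and the exact form of the Gaussian weighting are assumptions, and the k × k neighbourhood search for misalignment is omitted for brevity.

```python
import numpy as np

def similarity(na, nb, sigma=4.0, k=1.0, bins=181):
    """Compare two normal maps via the angular-difference histogram H.

    na, nb: (h, w, 3) uint8 normal maps (RGB encodes a unit normal).
    Returns (difference map in degrees, similarity score).
    """
    # Decode colors back to normal components in [-1, 1].
    va = na.astype(np.float64) / 255.0 * 2.0 - 1.0
    vb = nb.astype(np.float64) / 255.0 * 2.0 - 1.0
    dot = np.clip((va * vb).sum(axis=2), -1.0, 1.0)
    theta = np.degrees(np.arccos(dot))           # Difference Map D, 0..180
    hist, _ = np.histogram(theta, bins=bins, range=(0, 180))
    # Gaussian weighting: small angles (similar normals) count the most.
    x = np.arange(bins)
    g = k * np.exp(-x**2 / (2.0 * sigma**2))
    return theta, float((hist * g).sum())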






Figure 3. Example of histogram H representing the angular distances: (a) shows a typical histogram between two similar Normal Maps, while (b) shows one between two different Normal Maps.



Addressing Beard and Facial Expressions via an 8-bit Alpha Channel

The presence of a beard of variable length covering a portion of the face surface in a subject previously enrolled without it (or vice versa) could lead to a measurable difference in the overall or local 3D shape of the face mesh (see Figure 4). In this case the recognition accuracy could be affected, resulting, for instance, in a higher False Rejection Rate (FRR). To improve robustness to this kind of variable facial feature, we rely on color data from the captured face texture to mask the non-skin regions, possibly disregarding them during the comparison.


Figure 4. Normal maps of the same subject enrolled in two different sessions, with and without beard.

We exploit flesh hue characterization in the HSB color space to discriminate between skin and beard/moustaches/eyebrows. Indeed, the hue component of each given texel is much less affected by lighting conditions during capture than its corresponding RGB value. Nevertheless, there can be a wide range of hue values within each skin region due to factors like facial morphology, skin conditions and pathologies, race, etc., so we need to define this range on a case-by-case basis to obtain a valid mask. To this aim we use a set of specific hue sampling spots located over the face texture at absolute coordinates, selected to be representative of the flesh's full tonal range and possibly distant enough from the eyes, lips and typical beard- and hair-covered regions.


Figure 5. Flesh Hue sampling points (a), Flesh Hue Range (b), non-skin regions in white (c).

                  Positioning the sampling spots at absolute coordinates is possible because each face mesh and its texture are centered and normalized during the image-based reconstruction process (i.e., the face's median axis is always centered on the origin of 3D space, with horizontal mapping coordinates equal to 0.5); otherwise, normal map comparison would not be possible. We could use a 2D or 3D technique to locate the main facial features (eyes, nose and lips) and position the sampling spots relative to these features, but even these approaches are not safe under all conditions. For each sampling spot we sample not just that texel but a 5 × 5 matrix of neighbouring texels, averaging them to minimize the effect of local image noise. As any sampling spot could pick up wrong values due to local skin color anomalies such as moles or scars, or even due to improper positioning, we compute the median of the hue values from all sampling spots, obtaining a main Flesh Hue Value (FHV) which is the center of the valid flesh hue range. We therefore consider as belonging to the skin region all the texels whose hue value is within the range FHV − t ≤ hue ≤ FHV + t, where t is a hue tolerance which we experimentally found could be set below 10° (see Figure 5-b). After the skin region has been selected, it is filled with pure white while the remaining pixels are converted to a greyscale value depending on their distance from the valid flesh hue range (the greater the distance, the darker the value).
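The Flesh Mask construction can be sketched as below. This is an illustrative reading, not the original code: the function name, the exact fall-off of the greyscale values outside the tolerance band, and the circular hue-distance handling are our assumptions; the 5 × 5 neighbourhood averaging and the median-based FHV follow the text.

```python
import numpy as np

def flesh_mask(hue, spots, t=10.0):
    """Build an 8-bit Flesh Mask from the hue channel of a face texture.

    hue:   (h, w) array of hue angles in degrees, 0..360.
    spots: list of (row, col) sampling coordinates placed on skin regions.
    t:     hue tolerance in degrees around the main Flesh Hue Value.
    """
    samples = []
    for r, c in spots:
        # Average a 5x5 patch around each spot to damp local image noise.
        samples.append(hue[max(r - 2, 0):r + 3, max(c - 2, 0):c + 3].mean())
    fhv = float(np.median(samples))              # main Flesh Hue Value
    # Circular distance of every texel's hue from the FHV.
    d = np.abs((hue - fhv + 180.0) % 360.0 - 180.0)
    # Skin texels become pure white; others darken with hue distance.
    mask = np.where(d <= t, 255, np.clip(255 - (d - t) * 4, 0, 255))
    return mask.astype(np.uint8), fhv
```

The median over the sampling spots makes the FHV robust to a single spot landing on a mole, scar or stray hair.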

                 To improve the facial recognition system and to address facial expressions, we opt for an expression weighting mask, a subject-specific pre-calculated mask aimed at assigning different relevance to different face regions. This mask, which shares the same size as the normal map and the difference map, contains for each pixel an 8-bit weight encoding the local rigidity of the face surface, based on the analysis of a pre-built set of facial expressions of the same subject. Indeed, for each enrolled subject, each of the expression variations (see Figure 6) is compared to the neutral face, resulting in a set of difference maps.


Figure 6. An example of normal maps of the same subject featuring a neutral pose (leftmost face) and different facial expressions.

The average of this set of difference maps, specific to the same individual, represents their expression weighting mask. More precisely, given a generic face with its normal map N0 (neutral face) and the set of normal maps N1, N2, …, Nn (the expression variations), we first calculate the set of difference maps D1, D2, …, Dn resulting from {N0 − N1, N0 − N2, …, N0 − Nn}. The average of the set {D1, D2, …, Dn} is the expression weighting mask, which is multiplied by the difference map in each comparison between two faces.
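A sketch of this averaging step is given below. Two points are our own interpretation rather than the source's: the difference maps Di are computed as per-pixel angular differences (consistent with the Difference Map defined earlier), and the average is inverted so that rigid regions (small average difference) receive high 8-bit weights, matching the stated goal of encoding local rigidity.

```python
import numpy as np

def expression_weighting_mask(neutral, variations):
    """Average the difference maps between a neutral normal map N0 and each
    expression variation N1..Nn to obtain the Expression Weighting Mask.

    neutral:    (h, w, 3) uint8 normal map of the neutral pose.
    variations: list of (h, w, 3) uint8 normal maps of expressions.
    """
    def diff(na, nb):
        # Per-pixel angular difference in degrees (the Difference Map Di).
        va = na.astype(np.float64) / 255.0 * 2.0 - 1.0
        vb = nb.astype(np.float64) / 255.0 * 2.0 - 1.0
        dot = np.clip((va * vb).sum(axis=2), -1.0, 1.0)
        return np.degrees(np.arccos(dot))

    avg = np.mean([diff(neutral, n) for n in variations], axis=0)
    # Rigid regions (small average difference) get high weight, 0..255.
    return (255.0 * (1.0 - avg / 180.0)).astype(np.uint8)
```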

We generate the expression variations through a parametric rig-based deformation system previously applied to a prototype face mesh, morphed to fit the reconstructed face mesh. This fitting is achieved via a landmark-based volume morphing in which the transformation and deformation of the prototype mesh are guided by the interpolation of a set of landmark points with a radial basis function. To improve the accuracy of this rough mesh fitting, a surface optimization is performed by minimizing a cost function based on the Euclidean distance between vertices.

                  We can thus augment each 24-bit normal map with the product of the Flesh Mask and the Expression Weighting Mask, normalized to 8 bits (see Figure 7). The resulting 32-bit-per-pixel RGBA bitmap can be conveniently managed via various image formats, such as the Portable Network Graphics (PNG) format, which typically stores for each pixel 24 bits of colour and 8 bits of alpha channel (transparency). When comparing any two faces, the difference map is computed on the first 24 bits of color info (the normals) and multiplied by the alpha channel (the filtering mask).
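The packing step can be sketched as follows; the function name is ours, and writing the array out as an actual PNG file is omitted, since any RGBA-capable image library could serialize the resulting array.

```python
import numpy as np

def augment_normal_map(normal_map, flesh_mask, expr_mask):
    """Pack a 24-bit normal map plus the two 8-bit masks into 32-bit RGBA.

    The alpha channel holds the product of the Flesh Mask and the
    Expression Weighting Mask, renormalized back to the 8-bit range.
    """
    alpha = (flesh_mask.astype(np.uint16) * expr_mask.astype(np.uint16)
             // 255).astype(np.uint8)
    return np.dstack([normal_map, alpha])        # (h, w, 4) RGBA bitmap
```

At comparison time, a pixel's contribution to the difference map is simply scaled by its alpha value, so beard-covered or highly mobile regions are attenuated in one multiplication.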


Testing the Face Recognition System in an Ambient Intelligence Framework


                   Ambient Intelligence (AmI) worlds offer exciting potential for rich interactive experiences. The metaphor of AmI envisages the future as intelligent environments where humans are surrounded by smart devices that make the environment itself perceptive to humans' needs and wishes. The Ambient Intelligence Environment can be defined as the set of actuators and sensors composing the system, together with the domotic interconnection protocol. People interact with electronic devices embedded in environments that are sensitive and responsive to the presence of users. This objective is achievable only if the environment is capable of learning, building and manipulating user profiles, which in turn requires clearly identifying the human attitude; in other terms, services must be driven by the physical and emotional user status captured from a set of biometric features.



Figure 7. Comparison of two Normal Maps using the Flesh Mask, and the resulting Difference Map (c).



Figure 8. Ambient Intelligence Architecture.

                  To design Ambient Intelligence environments, many methodologies and techniques have to be merged together, originating the many approaches reported in recent literature. We opt for a framework aimed at gathering biometric and environmental data, used here to test the effectiveness of face recognition in aiding security and in recognizing the emotional user status. This AmI system's architecture is organized in several sub-systems, as depicted in Figure 8, and is based on the following sensors and actuators: internal and external temperature sensors and an internal temperature actuator, internal and external luminosity sensors and an internal luminosity actuator, an indoor presence sensor, an infrared camera to capture thermal images of the user, and a set of color cameras to capture information about gait and facial features. First, Biometric Sensors are used to gather the user's biometrics (temperature, gait, position, facial expression, etc.), and part of this information is handled by Morphological Recognition Subsystems (MRS) able to organize it semantically. The resulting description, together with the remaining previously captured biometrics, is organized in a hierarchical structure based on XML technology, creating a new markup language called H2ML (Human to Markup Language) that represents the user status at a given time. Considering a sequence of H2ML descriptions, the Behavioral Recognition Engine (BRE) tries to recognize a particular user behaviour for which the system is able to provide suitable services. The available services are regulated by means of the Service Regulation System (SRS), aimed at achieving hardware transparency and minimizing the fuzzy inference time.

This architecture is able to deliver personalized services on the basis of the physical and emotional user status, captured from a set of biometric features and modelled by means of an XML-based mark-up language. This approach is particularly suited to exploiting biometric technologies to capture the user's physical information, gathered into a semantic representation describing a human in terms of morphological features.





                   We presented a 3D face recognition method applied to an Ambient Intelligence Environment. The proposed approach to acquisition and recognition proved to be well suited to the application context thanks to its high accuracy and recognition speed, effectively exploiting the advantages of the face over other biometrics. As the acquisition system requires the user to look at a specific target to allow a valid face capture, we are working on a multi-angle stereoscopic camera arrangement to make this critical task less annoying and more robust to a wide posing range. This 3D face recognition method, based on 3D geometry and color texture, is aimed at improving robustness to the presence or absence of beard and to expressive variations. It proved to be simple and fast, and the experiments conducted showed a high average recognition rate and a measurable effectiveness of both the flesh mask and the expression weighting mask. Ongoing research will implement a true multi-modal version of the basic algorithm, with a second recognition engine dedicated to the color info (texture), which could further enhance the discriminating power.

