Recognition and representation: two faces of the same coin


Humans and computers can both recognize faces, but pattern recognition problems in computer vision first require representing the data in a way suited to the problem at hand. Machine learning methods approach pattern recognition as a statistical problem of searching for patterns in data. By classifying data into different categories, reducing the data’s dimensions, approximating parts of the data, and using other techniques that exploit the geometry, algebra, and other quantitative features of the data, computers can recognize faces, handwriting, images, and other visual stimuli in ways similar to humans.

Neural networks and deep learning methods have found practical applications in fingerprint analysis, disease etiology, and voice recognition, to name a few examples, and artificial intelligence continues to find success in various areas of research. But before computers can behave like humans, they need to represent the world in some way, and how they interpret input data determines that representation.

These representation methods include principal component analysis (PCA), in which eigenfaces capture the variance among face images. Eigenfaces, mathematical representations of key features used in human face recognition, are the eigenvectors of the covariance matrix of the probability distribution over the high-dimensional vector space of face images. In other words, they let computers discern basic patterns among images of faces. PCA yields the eigenfaces that correspond to the least-squares solution, so the variance of the data is preserved while correlations that don’t contribute to it are removed. Generating eigenfaces involves extracting relevant facial information, through methods like searching for statistical variation between images, and representing it efficiently, such as by using symmetry or other geometric features. While eigenfaces are automatic and easy to code, especially in how they can make complicated faces simple, they can be very sensitive to external factors such as lighting and may struggle to provide useful information about the faces themselves.
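To make the PCA step concrete, here is a minimal sketch under stated assumptions. The five 2x2 "face images" are hypothetical toy data, and all function and variable names (center, covariance, top_eigenpair, and so on) are invented for illustration. To stay dependency-free, the sketch swaps in power iteration for a full eigendecomposition and recovers only the first eigenface; a real pipeline would more likely rely on a linear algebra library.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]


def center(images):
    """Return the mean image and the mean-subtracted images."""
    n = len(images)
    mean_image = [sum(pixels) / n for pixels in zip(*images)]
    centered = [[p - m for p, m in zip(img, mean_image)] for img in images]
    return mean_image, centered


def covariance(centered):
    """Sample covariance matrix between pixels (d x d)."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_eigenpair(matrix, iterations=200, seed=0):
    """Power iteration: estimate the largest eigenvalue and its eigenvector.

    Assumes the random start vector is not orthogonal to the top eigenvector.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, mat_vec(matrix, v)), v


# Hypothetical toy data: five 2x2 "face images", flattened to 4 pixels each.
faces = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.2, 0.9, 0.1, 0.8],
    [0.1, 0.8, 0.2, 0.9],
    [0.5, 0.5, 0.6, 0.4],
]

mean_face, centered = center(faces)
cov = covariance(centered)
eigenvalue, eigenface = top_eigenpair(cov)

# Each face is summarized by one number: its coordinate along the eigenface.
weights = [dot(img, eigenface) for img in centered]

total_variance = sum(cov[i][i] for i in range(len(cov)))
print("first eigenface:", [round(x, 3) for x in eigenface])
print("share of variance captured:", round(eigenvalue / total_variance, 3))
```

The sign of the eigenface is arbitrary, so the weights may come out negated from one starting vector to another. What matters is that a single direction in pixel space accounts for most of the variation in this toy set.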

Other machine learning techniques such as classification map input data to categories by discriminating between different features and representing those features by their distance from one another according to their similarity. Fisherfaces, named after statistician Ronald Fisher, are the basis vectors of a subspace representation of face images obtained by performing linear discriminant analysis (LDA). Fisherfaces are less sensitive to lighting issues than eigenfaces and require data built upon continuous independent variables, such as skin tone or shape.

While eigenfaces depend upon PCA to account for data variance, fisherfaces use LDA, which searches for differences between classes of data. A team led by computer science professor Peter Belhumeur at Columbia University found lower error rates among fisherfaces than among eigenfaces. In their paper, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” they accounted for variations in lighting and facial expressions. Regardless, researchers in computer vision can use both methods to minimize error when solving problems in recognition and representation.
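The contrast between variance and class separation can be sketched with Fisher's linear discriminant for two classes, where the projection direction is proportional to the inverse of the within-class scatter matrix times the difference of the class means. The example below is an illustration under assumptions, not the method from the paper. The two "people" and their two features are hypothetical, with the first feature standing in for something lighting-driven that varies a lot within each person and the second for something tied to identity. The closed-form 2x2 inverse limits the sketch to two features.

```python
import math


def mean(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]


def scatter(points, mu):
    """2x2 scatter matrix: sum of outer products of the centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Unit vector proportional to Sw^-1 (mean_b - mean_a)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter: how much each class spreads around its own mean.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]


# Hypothetical features per image: (lighting-driven value, identity-related value).
person_a = [(1.0, 2.0), (3.0, 2.2), (5.0, 1.8), (7.0, 2.1)]
person_b = [(1.5, 3.0), (3.5, 3.2), (5.5, 2.9), (7.5, 3.1)]

w = fisher_direction(person_a, person_b)
proj_a = [p[0] * w[0] + p[1] * w[1] for p in person_a]
proj_b = [p[0] * w[0] + p[1] * w[1] for p in person_b]

print("Fisher direction:", [round(x, 4) for x in w])
print("classes separated after projection:", max(proj_a) < min(proj_b))
```

In this toy set most of the overall variance lies along the first feature, where the two people overlap, so a variance-driven projection would mix them. The Fisher direction instead leans almost entirely on the second feature, which is the kind of behavior that makes LDA less sensitive to lighting.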

Both eigenfaces and fisherfaces also struggle to capture changes in expression and emotion among faces. For researchers in computer vision, approaching facial expression analysis means understanding the nuance and complexity of the face. Facial muscle movements are driven by the VIIth cranial nerve, which emerges from the brainstem between the pons and medulla. The motor root of the nerve supplies motor fibers to the muscles of the face that create facial expressions. With enough data and efficient routines, pattern recognition can identify emotions from expressions and determine which parts of the brain may be involved in creating them. Neuroscientists have shown that muscles in the lower part of the face, which are especially involved in speech, are more heavily represented in the motor cortex. From patterns across hundreds and thousands of faces, computers may soon be able to discern function from the form of a face itself.

The seven universal facial expressions of emotion (happiness, surprise, sadness, fear, anger, contempt, and disgust) can be regulated in seven ways (expressed, deamplified, neutralized, qualified, masked, amplified, and simulated). Psychologist Paul Ekman determined these emotions and expressions were universal among humans across different countries and levels of industrialization or development in his manuscripts “A New Pan-Cultural Facial Expression of Emotion” and “The repertoire of nonverbal behavior: Categories, origins, usage, and coding.” He found that humans produced these expressions in response to similar conditions regardless of how we may judge a face as expressing a certain emotion. The extent to which the emotions are universal, however, remains up for debate. Ekman supported his work by surveying humans across different civilizations, including tribes in New Guinea, but he argued that this non-cognitive, automatic appraisal is only one part of emotion. The automatic appraisal detects stimuli almost instantly and identifies elicitors, which then activate the seven universal expressions that in turn cause the physiological elements of the emotional response. These would include any bodily change such as the feeling of one’s heart dropping, skeletal muscles tightening, facial muscles loosening, voice pitch changing, and other responses of the nervous system.

As recognition and representation depend upon one another, neuroscience and artificial intelligence continue to support one another despite their different paths. As expression and emotion intertwine, we create a more nuanced picture of perception that speaks to who we are as humans. While eigenfaces and fisherfaces support similar goals as well, their different methods let computer vision researchers attend to the variety of challenges deep learning has to offer.