Development of framework for facial expression analysis using representation learning
Abstract
Facial expressions play a crucial role in human social interaction, and they are the primary component that needs to be integrated into machines to make human-computer interaction more user friendly. Although humans are very efficient at recognizing even minute changes in facial expression, for machines this is a very complex task. Recently, this area of research has attracted much-needed attention due to its broad spectrum of applications. However, expression analysis in an unconstrained environment remains very difficult: variations in illumination, facial features, head pose, and background make it hard to correctly recognize emotions in an open setup for commercial applications. This thesis develops deep learning based representation learning methods for analyzing facial expressions. In this work, multiple frameworks are developed for different applications of facial expression analysis. The first proposed framework analyzes the emotional sentiment conveyed by an image based on its content. The proposed system examines the faces and the background in the image and extracts facial and scene features from them, respectively, using two different convolutional neural networks. The conditional occurrence of these features is modeled using long short-term memory (LSTM) networks to predict the sentiment represented by the image.
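The data flow of the first framework can be sketched as follows. This is a minimal illustration, not the thesis implementation: the two "CNN" feature extractors are stood in by fixed random projections, and all dimensions, weights, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image, proj):
    """Stand-in for a CNN feature extractor: a fixed linear
    projection of the flattened image (hypothetical)."""
    return np.tanh(image.reshape(-1) @ proj)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations packed as
    [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    d = h.size
    i = 1.0 / (1.0 + np.exp(-z[:d]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[d:2*d]))     # forget gate
    g = np.tanh(z[2*d:3*d])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*d:]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Hypothetical sizes: 64x64 grayscale input, 32-d features, 16-d LSTM state.
feat_dim, hid = 32, 16
face_proj = rng.standard_normal((64 * 64, feat_dim)) * 0.01   # "face CNN"
scene_proj = rng.standard_normal((64 * 64, feat_dim)) * 0.01  # "scene CNN"
W = rng.standard_normal((4 * hid, feat_dim)) * 0.1
U = rng.standard_normal((4 * hid, hid)) * 0.1
b = np.zeros(4 * hid)
w_out = rng.standard_normal(hid) * 0.1

def predict_sentiment(image):
    """Feed face and scene features to the LSTM as a two-step
    sequence, modeling their conditional occurrence, then apply
    a logistic output layer."""
    face_feat = extract_features(image, face_proj)
    scene_feat = extract_features(image, scene_proj)
    h, c = np.zeros(hid), np.zeros(hid)
    for x in (face_feat, scene_feat):
        h, c = lstm_step(x, h, c, W, U, b)
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))  # sentiment probability

score = predict_sentiment(rng.standard_normal((64, 64)))
```

The point of the sequential LSTM pass is that the scene features are processed in the context of the facial features, rather than the two streams being fused by plain concatenation.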
The second framework, proposed in this work, predicts the likability of multimedia content based on the facial expressions of the viewer. A database with two different sets of video samples was collected for this task in an unconstrained environment. The first set consists of videos, called stimulants, to be watched by the recruited subjects. The second set contains recordings of the subjects' facial expressions while they watch the stimulants. The proposed framework is a multimodal system that learns spatio-temporal features from the videos of the subjects to predict likability. A combination of 3D convolutional neural network and convolutional neural network - long short-term memory network models is used to extract features from the spatial and temporal variations of the face. Another model, designed using long short-term memory networks, analyzes the motion of facial landmarks in the videos of the subjects.
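The multimodal fusion described above can be sketched schematically. This is a hedged illustration only: the three streams are stood in by simple numpy summaries rather than trained networks, the fusion is a plain late-fusion logistic layer, and every size and weight is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def c3d_stream(clip):
    """Stand-in for the 3D-CNN stream: a joint spatio-temporal
    summary of the clip (hypothetical 8-d descriptor)."""
    return clip.reshape(clip.shape[0], -1).mean(axis=0)[:8]

def cnn_lstm_stream(clip):
    """Stand-in for the CNN-LSTM stream: per-frame spatial pooling
    followed by a temporal summary over frames."""
    per_frame = clip.mean(axis=(1, 2))  # one pooled value per frame
    return np.array([per_frame.mean(), per_frame.std(),
                     per_frame.min(), per_frame.max()])

def landmark_stream(landmarks):
    """Stand-in for the landmark-motion LSTM: summarizes the
    frame-to-frame displacement of facial landmarks."""
    motion = np.diff(landmarks, axis=0)   # (frames-1, points, 2)
    return np.abs(motion).mean(axis=(0, 1))  # mean |dx|, mean |dy|

def predict_likability(clip, landmarks, w, b):
    """Late fusion: concatenate the three descriptors and apply a
    logistic classifier (w and b are hypothetical weights)."""
    feats = np.concatenate([c3d_stream(clip),
                            cnn_lstm_stream(clip),
                            landmark_stream(landmarks)])
    return 1.0 / (1.0 + np.exp(-(w @ feats + b)))

frames, h, wd, n_pts = 10, 16, 16, 5
clip = rng.standard_normal((frames, h, wd))          # subject video clip
landmarks = rng.standard_normal((frames, n_pts, 2))  # tracked landmarks
w = rng.standard_normal(8 + 4 + 2) * 0.1
likability = predict_likability(clip, landmarks, w, 0.0)
```

The sketch shows the structural idea: appearance dynamics (3D-CNN), sequential spatial features (CNN-LSTM), and geometric landmark motion are computed independently and only combined at the decision level.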
Lastly, an activation function (Linearized Sigmoidal Activation) is proposed to improve the learning capacity of deep learning models. Modeling nonlinear dependencies in data is a very complex task. Activation functions with higher nonlinearity tend to suffer from vanishing and exploding gradients, while linear activation functions, such as ReLU, do not have enough learning capacity. The proposed activation function has segment-wise nonlinear behavior: it is linear within a given segment, while a nonlinear relationship exists among the segments.
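The segment-wise idea can be illustrated with a small sketch. The exact form of the proposed Linearized Sigmoidal Activation is defined in the thesis; the breakpoints and slopes below are assumptions chosen only to approximate a sigmoid-like shape with linear pieces, showing "linear inside a segment, nonlinear across segments". Within a segment the gradient is constant (as with ReLU, avoiding vanishing gradients), while the slope change at each breakpoint supplies the overall nonlinearity.

```python
import numpy as np

# Assumed segment boundaries and per-segment slopes (illustrative only):
# segments (-inf,-2], (-2,2], (2,inf) with a steeper middle piece,
# mimicking a sigmoid's shape.
BREAKS = np.array([-2.0, 2.0])
SLOPES = np.array([0.05, 0.5, 0.05])

def segmentwise_activation(x):
    """Piecewise-linear activation: linear within each segment,
    continuous at the breakpoints, nonlinear across segments."""
    x = np.asarray(x, dtype=float)
    seg = np.searchsorted(BREAKS, x)  # which segment each input is in
    # Choose intercepts so adjacent pieces join continuously.
    intercepts = np.zeros(len(SLOPES))
    for k in range(1, len(SLOPES)):
        b = BREAKS[k - 1]
        intercepts[k] = (SLOPES[k - 1] * b + intercepts[k - 1]) - SLOPES[k] * b
    return SLOPES[seg] * x + intercepts[seg]

# Linear within a segment: the value at a midpoint equals the chord
# midpoint, for two points inside the same segment (-2, 2).
a, b = 0.5, 1.5
mid = segmentwise_activation((a + b) / 2)
chord = (segmentwise_activation(a) + segmentwise_activation(b)) / 2
print(np.isclose(mid, chord))  # True

# Nonlinear across segments: the gradient differs between segments.
outer_slope = segmentwise_activation(-3.0) - segmentwise_activation(-4.0)
inner_slope = segmentwise_activation(1.0) - segmentwise_activation(0.0)
print(np.isclose(outer_slope, 0.05), np.isclose(inner_slope, 0.5))  # True True
```

The two checks at the end verify both halves of the claim: exact linearity inside a segment and a genuine slope change across the boundary.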
