Inertial Body Tracking for Affective Computing using AutoBAP

Figure 1: Sensing setup. Participants were recorded by a Prosilica video camera (A), positioned next to a fiducial marker (B). They performed actions displayed on an LCD screen (C) whilst being tracked by an SMI eye tracker (D) and an Xsens BIOMECH motion capture suit (E).

Eduardo Velloso¹, Andreas Bulling², Hans Gellersen¹

¹ School of Computing and Communications, Lancaster University
² Perceptual User Interfaces Group, Max Planck Institute for Informatics

Affective Computing is “computing that relates to, arises from, or deliberately influences emotion” [2]. An important task in Affective Computing is recognizing the affective state of the user. Several modalities can be used for this task, including facial expressions, prosodic features of speech and electrodermal activity. An often overlooked modality is affective body expression. Because many of our psychological states are conveyed through body language, we are interested in recognizing affect from bodily movements and postures. Analogously to how researchers have recognized emotions from facial expressions by breaking them down into FACS (Facial Action Coding System) action units, we propose AutoBAP, a system that automatically breaks down affective movements and postures into BAP (Body Action and Posture Coding System) units using data provided by an Xsens MVN BIOMECH suit [3].

AutoBAP extracts BAP units from body movement and posture data using a machine learning algorithm that segments and classifies the data. To train this algorithm, we collected a motion tracking dataset in which participants performed different behaviours covered by the BAP coding system [1]. Six healthy participants with a mean age of 25 years (range 18 to 31) took part in the study. Each participant was asked to perform a sequence of actions while wearing an Xsens MVN BIOMECH suit, which tracked their body movements, and a pair of SMI eye-tracking glasses, which tracked their gaze point (see Figure 1). We exported the data to an XML file containing the timestamped position, orientation, velocity and angular velocity of each body segment in the global frame, the angle of each joint in its own reference frame, and the gaze position.
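To make the structure of the exported data concrete, the sketch below parses a recording into per-frame records with Python's standard xml.etree.ElementTree. The element and attribute names (recording, frame, segment, joint, gaze) and the Frame class are hypothetical and chosen for illustration only; this is not the Xsens MVNX schema, and the actual export used in the study may be laid out quite differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL, simplified schema for illustration only (not the real MVNX layout).
SAMPLE_XML = """
<recording>
  <frame time="0.00">
    <segment name="Head" position="0.0 0.1 1.7" orientation="1 0 0 0"
             velocity="0 0 0" angularVelocity="0 0 0"/>
    <joint name="Neck" angle="0.0 5.0 0.0"/>
    <gaze x="512" y="384"/>
  </frame>
  <frame time="0.01">
    <segment name="Head" position="0.0 0.1 1.7" orientation="1 0 0 0"
             velocity="0.01 0 0" angularVelocity="0 0.2 0"/>
    <joint name="Neck" angle="0.0 5.5 0.0"/>
    <gaze x="520" y="380"/>
  </frame>
</recording>
"""

# Per-segment quantities, all expressed in the global frame.
SEGMENT_FIELDS = ("position", "orientation", "velocity", "angularVelocity")


@dataclass
class Frame:
    time: float                                   # timestamp in seconds
    segments: Dict[str, Dict[str, List[float]]]   # segment -> field -> values
    joints: Dict[str, List[float]]                # joint -> angles in the joint's own frame
    gaze: Optional[Tuple[float, float]]           # gaze point, if present


def _vec(text: str) -> List[float]:
    """Parse a whitespace-separated list of numbers."""
    return [float(v) for v in text.split()]


def parse_recording(xml_text: str) -> List[Frame]:
    """Turn the (hypothetical) XML export into a list of Frame records."""
    root = ET.fromstring(xml_text.strip())
    frames = []
    for f in root.iter("frame"):
        segments = {
            s.get("name"): {field: _vec(s.get(field)) for field in SEGMENT_FIELDS}
            for s in f.findall("segment")
        }
        joints = {j.get("name"): _vec(j.get("angle")) for j in f.findall("joint")}
        g = f.find("gaze")
        gaze = (float(g.get("x")), float(g.get("y"))) if g is not None else None
        frames.append(Frame(float(f.get("time")), segments, joints, gaze))
    return frames
```

Calling parse_recording(SAMPLE_XML) returns two Frame objects; a real pipeline would compute features such as joint-angle trajectories from these records before segmentation.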

We then manually annotated the data according to BAP’s coding instructions using the Anvil video annotation tool. To classify the behaviours, we segmented the data using Velloso et al.’s approach to unsupervised motion modelling [4] and trained 28 J48 decision trees, complemented by a set of hard-coded rules based on the coding system. The classifier writes its output to an XML file that can be visualised and edited in Anvil’s graphical user interface.
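J48 is Weka's implementation of the C4.5 decision-tree algorithm, which selects the attribute to split on by its gain ratio. The sketch below computes that criterion for a single discrete feature. It is a simplified illustration, not the authors' training code: it omits C4.5's threshold search for continuous attributes, its handling of missing values and its pruning. The feature and label values in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting `labels` on a discrete `feature`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical per-frame feature and BAP-style label, for illustration only.
wrist_above_shoulder = ["yes", "yes", "no", "no", "yes", "no"]
arm_label = ["raised", "raised", "rest", "rest", "raised", "rest"]
print(gain_ratio(wrist_above_shoulder, arm_label))  # prints 1.0
```

Dividing information gain by split information keeps the tree from favouring features that fragment the data into many small branches, which is the main difference from a plain information-gain tree.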

We evaluated the system using leave-one-participant-out cross-validation, training on the data from five participants and testing on the remaining one. We then compared the system's output with the manual annotations by computing their agreement with Cohen’s kappa.
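The evaluation protocol (train on five participants, test on the sixth, rotating through all six) and the agreement measure can be sketched in a few lines. Cohen's kappa is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The code assumes, for illustration, that the system output and the manual annotation have been resampled to aligned per-frame label sequences; participant IDs and label values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterator, List, Sequence, Tuple


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def leave_one_participant_out(participants: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training participants, held-out participant) for each fold."""
    for held_out in participants:
        yield [p for p in participants if p != held_out], held_out


# Hypothetical per-frame annotations of one label (1 = present, 0 = absent).
manual = [1, 1, 0, 0, 1, 0]
system = [1, 1, 0, 1, 1, 0]
print(round(cohens_kappa(manual, system), 3))  # prints 0.667
```

With six participant IDs, leave_one_participant_out yields six folds, each with five training participants; kappa would then be computed per label on each held-out participant.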

Results and Discussion
Our study results show that AutoBAP can encode 172 of the 274 labels in the complete BAP coding scheme with good agreement with a manual annotator (Cohen’s kappa > 0.6). At the time of writing, the Body Action and Posture coding system is still in its early days. As the coding system matures and gains adoption, studies in Affective Computing will lead to a better understanding of how these labels correlate with affective states. Automatic extraction of such labels will then make it feasible to implement affect recognition systems that take this domain knowledge into account.

We recorded a scripted dataset to cover as many labels as possible, but this means that the actions were not natural. Future work will include recording unscripted affective data to improve the training dataset and to evaluate the classifiers on a realistic dataset. This will also allow us to explore the classification of action functions, such as emblems, illustrators and manipulators, as well as the use of automatically extracted labels for affect recognition. In this first prototype, we simulated the interlocutor with a fiducial marker and annotated gaze according to the distance between the gaze point and the marker position extracted by a computer vision toolkit. In the future, we would like to replace the marker with a face recognition system so that the system can be used in real-life settings.
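The gaze annotation rule described above reduces to a distance threshold. The sketch below labels each frame by comparing the gaze point with the detected marker position, assuming both are expressed in the same scene-camera image coordinates. The 100-pixel radius, the label strings and the function names are hypothetical choices for illustration, not the values or code used in the study.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def gaze_on_interlocutor(gaze: Point, marker: Point, radius_px: float = 100.0) -> bool:
    """True if the gaze point lies within radius_px pixels of the marker."""
    return math.dist(gaze, marker) <= radius_px


def annotate_gaze(
    gaze_track: Sequence[Optional[Point]],
    marker_track: Sequence[Optional[Point]],
    radius_px: float = 100.0,
) -> List[str]:
    """Label each frame; None marks a frame where gaze or marker was not detected."""
    labels = []
    for gaze, marker in zip(gaze_track, marker_track):
        if gaze is None or marker is None:
            labels.append("unknown")
        elif gaze_on_interlocutor(gaze, marker, radius_px):
            labels.append("at interlocutor")
        else:
            labels.append("away")
    return labels


print(annotate_gaze([(0, 0), (300, 0), None], [(50, 0), (0, 0), (0, 0)]))
# prints ['at interlocutor', 'away', 'unknown']
```

Marking undetected frames as "unknown" rather than "away" avoids counting tracking dropouts as gaze aversion. Swapping the marker for a face detector would only change where marker_track comes from.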

Because manually annotating video data is highly time-consuming and error-prone, it is unsuitable for long recordings. Our approach makes it possible to annotate larger datasets, opening up new application areas for BAP that span longer periods of time, such as computational behaviour analysis or life logging.

[1] Dael, N., Mortillaro, M. and Scherer, K.R. 2012. The Body Action and Posture coding system (BAP): Development and reliability. Journal of Nonverbal Behavior (2012), 1–25.
[2] Picard, R.W. 2000. Affective Computing. MIT Press.
[3] Velloso, E., Bulling, A. and Gellersen, H. 2013. AutoBAP: Automatic coding of body action and posture units from wearable sensors. Proc. of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII) (2013), 135–140.
[4] Velloso, E., Bulling, A. and Gellersen, H. 2013. MotionMA: Motion Modelling and Analysis by Demonstration. Proc. of the 31st SIGCHI International Conference on Human Factors in Computing Systems (2013).