Mapping Indoor Environments Based On Human Activity

Slawomir Grzonka, Andreas Karwath, Frederic Dijoux, Barbara Frank, Michael Ruhnke, Maxim Tatarchenko, Wolfram Burgard
University of Freiburg, Department of Computer Science, Autonomous Intelligent Systems


We present a novel approach to build approximate maps of structured environments utilizing human motion and activity (Grzonka et al., 2010, 2012). Our approach uses data recorded with an Xsens MVN data suit, which is equipped with several IMUs, to detect the movements of a person and different activities, including opening and closing doors as well as walking up and down staircases. We interpret the movements as motion constraints and door handling events as landmark detections in a graph-based SLAM framework. As we cannot distinguish between individual doors, we employ a multi-hypothesis approach on top of the SLAM system to deal with the high data-association uncertainty. As a result, our approach is able to accurately and robustly recover the trajectory of the person. We additionally take advantage of the fact that people traverse free space and that doors separate rooms to recover the geometric structure of the environment after the graph optimization. We evaluated our approach in several experiments carried out with different users and in environments of different types.

Activity Recognition

To reconstruct environment models from the observation of human activities, we need to recognize different kinds of activity such as walking, sitting down, and opening doors. We investigated the use of motion templates presented by Müller and Röder (2006) for this task. A motion template consists of a set of Boolean features, for instance, "the right hand is above the hip", together with their values over time. A template can be learned for each predefined activity from a set of labelled training examples and allows us to reliably detect events such as opening and closing doors within 1.5 ms of the true activity. To correct for the relatively large drift in the z-direction and to obtain an accurate estimate of the z-position of the human, we additionally considered the activity of climbing up and down staircases. Since the temporal accuracy of motion templates is not sufficient for the detection of this short activity, we trained a neural network to detect stair climbing events. This technique allows us to efficiently detect stairs with a temporal accuracy of 12 ms and a detection rate of 95%.
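The idea of a motion template can be illustrated with a minimal Python sketch. It is not the implementation used in this work or by Müller and Röder (2006): it assumes that the training examples are already time-aligned and equally long (the original method uses time warping), and the joint names, the tolerance `tol`, and the threshold `max_mismatch` are hypothetical choices made only for illustration.

```python
def boolean_features(frame):
    """Two illustrative Boolean features for one frame.

    `frame` is a hypothetical dict mapping joint names to heights in metres.
    """
    return [
        frame["right_hand_z"] > frame["hip_z"],  # right hand above hip
        frame["left_hand_z"] > frame["hip_z"],   # left hand above hip
    ]


def learn_template(examples, tol=0.1):
    """Average equally long Boolean feature sequences.

    Entries on which the examples (almost) agree become True or False;
    entries on which they disagree become None ("don't care").
    """
    n_frames = len(examples[0])
    n_feats = len(examples[0][0])
    template = []
    for t in range(n_frames):
        row = []
        for f in range(n_feats):
            mean = sum(ex[t][f] for ex in examples) / len(examples)
            if mean >= 1 - tol:
                row.append(True)
            elif mean <= tol:
                row.append(False)
            else:
                row.append(None)
        template.append(row)
    return template


def mismatch(template, window):
    """Fraction of determined template entries that the window violates."""
    total = errors = 0
    for t_row, w_row in zip(template, window):
        for t_val, w_val in zip(t_row, w_row):
            if t_val is None:
                continue  # undetermined entries never count as errors
            total += 1
            errors += (t_val != w_val)
    return errors / total if total else 0.0


def detect(template, sequence, max_mismatch=0.1):
    """Return the start indices of all windows that match the template."""
    n = len(template)
    return [s for s in range(len(sequence) - n + 1)
            if mismatch(template, sequence[s:s + n]) <= max_mismatch]
```

In this sketch, a detection is a window of the feature sequence that agrees with the template on nearly all entries the training examples agreed on; the "don't care" entries absorb the variation between individual executions of the same activity.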


Recently, we extended the system to reconstruct 3D models of indoor environments from human motion. These models contain several objects that are relevant to humans, such as chairs, tables, and walls. The size and position of walls, tables, etc. are inferred from the human motion; for the walls, we additionally perform collision checks. Furthermore, we consider the activities and objects as landmark observations in a graph-based formulation of the simultaneous localization and mapping problem. This allows us to include loop-closure constraints whenever the user revisits an already reconstructed object, using a nearest-neighbor data association. Whenever a loop is closed, we perform a least-squares optimization of the graph to correct for accumulated errors and to obtain a globally optimal solution.
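The effect of a loop-closure constraint on the least-squares solution can be sketched with a deliberately simplified graph. The following Python example is an assumption-laden toy and not the system described here: nodes carry 2D positions only (orientation is ignored, which makes the problem linear), node 0 is fixed at the origin, and the function names `solve_linear` and `optimize_graph` as well as the constraint format are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def optimize_graph(n_nodes, constraints):
    """Least-squares estimate of 2D node positions.

    constraints: list of (i, j, (dx, dy), weight), meaning that node j was
    observed at offset (dx, dy) from node i. Node 0 is fixed at (0, 0).
    Minimises the sum of weight * ||p_j - p_i - (dx, dy)||^2.
    """
    m = n_nodes - 1                      # unknowns are nodes 1..n-1
    H = [[0.0] * m for _ in range(m)]    # normal-equation matrix
    b = [[0.0] * m for _ in range(2)]    # one right-hand side per axis
    for i, j, z, w in constraints:
        # Jacobian of the residual p_j - p_i - z: -1 for node i, +1 for node j
        for a, sa in ((i, -1.0), (j, 1.0)):
            if a == 0:
                continue                 # fixed node has no unknown
            for axis in range(2):
                b[axis][a - 1] += w * sa * z[axis]
            for c, sc in ((i, -1.0), (j, 1.0)):
                if c == 0:
                    continue
                H[a - 1][c - 1] += w * sa * sc
    xs = solve_linear(H, b[0])
    ys = solve_linear(H, b[1])
    return [(0.0, 0.0)] + list(zip(xs, ys))


# Two unit steps along x, then a loop closure stating that node 2 is only
# 1.7 m away from node 0. The 0.3 m inconsistency is spread over all edges.
poses = optimize_graph(3, [
    (0, 1, (1.0, 0.0), 1.0),   # motion constraint
    (1, 2, (1.0, 0.0), 1.0),   # motion constraint
    (0, 2, (1.7, 0.0), 1.0),   # loop-closure constraint
])
```

With equal weights, the optimization moves node 1 to roughly x = 0.9 and node 2 to roughly x = 1.8, so each of the three constraints absorbs about 0.1 m of the accumulated error. A full formulation with orientations is nonlinear and is typically solved iteratively.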


To evaluate our mapping approach, we recorded data in a typical university building containing several floors and including small seminar rooms as well as big lecture halls and a small library. The trajectory is approximately 2.85 km long and covers three floor levels. This experiment is challenging for two reasons. First, the metal disturbances arising from the metal structure of the building itself and from walking close to chairs and tables lead to a high pose error, as can be seen in the raw data depicted in Figure 1. Second, the first and the second floor are nearly identical on one side of the building, which results in many potential loop-closure candidates. In this experiment, we detected 175 out of 178 door handling events with an average error of 1m0.41 m. We also had one false alarm at the third floor level, which originated from moving away a chair in the library that was blocking the user's path. Regarding the stair detection, we missed 62 out of 473 stairs (42 stairs up and 20 stairs down). The average difference between the calculated stair heights is 1.3 cm. A video illustrating an experimental run can be found here:

In a second experiment, we illustrate the 3D reconstruction of an environment with several chairs, walls, tables and screens (Figure 2). The model is constructed incrementally.

Whenever the user revisits an object and the system detects a loop-closure, we perform an optimization of the graph containing the trajectory of the user and the positions of objects in the model. The model construction is also illustrated in the following video:

More details, further experiments and videos can be found on our project website:


S. Grzonka, F. Dijoux, A. Karwath, and W. Burgard. Mapping indoor environments based on human activity. In Proc. IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, 2010. doi: 10.1109/ROBOT.2010.5509976.

S. Grzonka, A. Karwath, F. Dijoux, and W. Burgard. Activity-based indoor mapping and estimation of human trajectories. IEEE Transactions on Robotics (T-RO), 28(1):234-245, 2012. doi: 10.1109/TRO.2011.2165372.

M. Müller and T. Röder. Motion templates for automatic classification and retrieval of motion capture data. In Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 137-146, 2006.

