You are here

Spatial and Temporal Modeling for Human Activity Recognition from Multimodal Sequential Data

Download pdf | Full Screen View

Date Issued:
2016
Abstract/Description:
Human Activity Recognition (HAR) has been an intense research area for more than a decade. Different sensors, ranging from 2D and 3D cameras to accelerometers, gyroscopes, and magnetometers, have been employed to generate multimodal signals to detect various human activities. With the advancement of sensing technology and the popularity of mobile devices, depth cameras and wearable devices, such as Microsoft Kinect and smart wristbands, open a unprecedented opportunity to solve the challenging HAR problem by learning expressive representations from the multimodal signals recording huge amounts of daily activities which comprise a rich set of categories.Although competitive performance has been reported, existing methods focus on the statistical or spatial representation of the human activity sequence;while the internal temporal dynamics of the human activity sequence arenot sufficiently exploited. As a result, they often face the challenge of recognizing visually similar activities composed of dynamic patterns in different temporal order. In addition, many model-driven methods based on sophisticated features and carefully-designed classifiers are computationally demanding and unable to scale to a large dataset. In this dissertation, we propose to address these challenges from three different perspectives; namely, 3D spatial relationship modeling, dynamic temporal quantization, and temporal order encoding.We propose a novel octree-based algorithm for computing the 3D spatial relationships between objects from a 3D point cloud captured by a Kinect sensor. A set of 26 3D spatial directions are defined to describe the spatial relationship of an object with respect to a reference object. These 3D directions are implemented as a set of spatial operators, such as "AboveSouthEast" and "BelowNorthWest," of an event query language to query human activities in an indoor environment; for example, "A person walks in the hallway from north to south." The performance is quantitatively evaluated in a public RGBD object dataset and qualitatively investigated in a live video computing platform.In order to address the challenge of temporal modeling in human action recognition, we introduce the dynamic temporal quantization, a clustering-like algorithm to quantize human action sequences of varied lengths into fixed-size quantized vectors. A two-step optimization algorithm is proposed to jointly optimize the quantization of the original sequence. In the aggregation step, frames falling into the sample segment are aggregated by max-polling and produce the quantized representation of the segment. During the assignment step, frame-segment assignment is updated according to dynamic time warping, while the temporal order of the entire sequence is preserved. The proposed technique is evaluated on three public 3D human action datasets and achieves state-of-the-art performance.Finally, we propose a novel temporal order encoding approach that models the temporal dynamics of the sequential data for human activity recognition. The algorithm encodes the temporal order of the latent patterns extracted by the subspace projection and generates a highly compact First-Take-All (FTA) feature vector representing the entire sequential data. An optimization algorithm is further introduced to learn the optimized projections in order to increase the discriminative power of the FTA feature. The compactness of the FTA feature makes it extremely efficient for human activity recognition with nearest neighbor search based on Hamming distance. Experimental results on two public human activity datasets demonstrate the advantages of the FTA feature over state-of-the-art methods in both accuracy and efficiency.
Title: Spatial and Temporal Modeling for Human Activity Recognition from Multimodal Sequential Data.
43 views
24 downloads
Name(s): Ye, Jun, Author
Hua, Kien, Committee Chair
Foroosh, Hassan, Committee Member
Zou, Changchun, Committee Member
Karwowski, Waldemar, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2016
Publisher: University of Central Florida
Language(s): English
Abstract/Description: Human Activity Recognition (HAR) has been an intense research area for more than a decade. Different sensors, ranging from 2D and 3D cameras to accelerometers, gyroscopes, and magnetometers, have been employed to generate multimodal signals to detect various human activities. With the advancement of sensing technology and the popularity of mobile devices, depth cameras and wearable devices, such as Microsoft Kinect and smart wristbands, open a unprecedented opportunity to solve the challenging HAR problem by learning expressive representations from the multimodal signals recording huge amounts of daily activities which comprise a rich set of categories.Although competitive performance has been reported, existing methods focus on the statistical or spatial representation of the human activity sequence;while the internal temporal dynamics of the human activity sequence arenot sufficiently exploited. As a result, they often face the challenge of recognizing visually similar activities composed of dynamic patterns in different temporal order. In addition, many model-driven methods based on sophisticated features and carefully-designed classifiers are computationally demanding and unable to scale to a large dataset. In this dissertation, we propose to address these challenges from three different perspectives; namely, 3D spatial relationship modeling, dynamic temporal quantization, and temporal order encoding.We propose a novel octree-based algorithm for computing the 3D spatial relationships between objects from a 3D point cloud captured by a Kinect sensor. A set of 26 3D spatial directions are defined to describe the spatial relationship of an object with respect to a reference object. These 3D directions are implemented as a set of spatial operators, such as "AboveSouthEast" and "BelowNorthWest," of an event query language to query human activities in an indoor environment; for example, "A person walks in the hallway from north to south." The performance is quantitatively evaluated in a public RGBD object dataset and qualitatively investigated in a live video computing platform.In order to address the challenge of temporal modeling in human action recognition, we introduce the dynamic temporal quantization, a clustering-like algorithm to quantize human action sequences of varied lengths into fixed-size quantized vectors. A two-step optimization algorithm is proposed to jointly optimize the quantization of the original sequence. In the aggregation step, frames falling into the sample segment are aggregated by max-polling and produce the quantized representation of the segment. During the assignment step, frame-segment assignment is updated according to dynamic time warping, while the temporal order of the entire sequence is preserved. The proposed technique is evaluated on three public 3D human action datasets and achieves state-of-the-art performance.Finally, we propose a novel temporal order encoding approach that models the temporal dynamics of the sequential data for human activity recognition. The algorithm encodes the temporal order of the latent patterns extracted by the subspace projection and generates a highly compact First-Take-All (FTA) feature vector representing the entire sequential data. An optimization algorithm is further introduced to learn the optimized projections in order to increase the discriminative power of the FTA feature. The compactness of the FTA feature makes it extremely efficient for human activity recognition with nearest neighbor search based on Hamming distance. Experimental results on two public human activity datasets demonstrate the advantages of the FTA feature over state-of-the-art methods in both accuracy and efficiency.
Identifier: CFE0006516 (IID), ucf:51367 (fedora)
Note(s): 2016-12-01
Ph.D.
Engineering and Computer Science, Computer Science
Doctoral
This record was generated from author submitted information.
Subject(s): Human Activity Recognition -- Human Action Recognition -- Sequential Data Recognition -- 3D Spatial Relationship -- Temporal Quantization -- Temporal Order Encoding -- Wearable Sensor Signals -- Optimization -- First-Take-All
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0006516
Restrictions on Access: campus 2017-12-15
Host Institution: UCF

In Collections