PATTERNS OF MOTION: DISCOVERY AND GENERALIZED REPRESENTATION

Date Issued:
2011
Abstract/Description:
In this dissertation, we address the problem of discovering and representing motion patterns in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise a generic representation that captures any kind of object motion observable in video sequences. Such motion is a significant source of information, typically employed in diverse applications such as tracking, anomaly detection, and action and event recognition. We present statistical frameworks for representing the motion characteristics of objects, learned from tracks or optical flow, for static as well as moving cameras, and propose algorithms for applying them to a variety of problems. The proposed motion pattern models and learning methods are general enough to be employed in a variety of problems, as we demonstrate experimentally. We first propose a novel method to model and learn the scene activity observed by a static camera. The motion patterns of objects in the scene are modeled as a multivariate non-parametric probability density function of spatiotemporal variables (object locations and the transition times between them). Kernel Density Estimation (KDE) is used to learn this model in a completely unsupervised fashion. Learning is accomplished by observing the trajectories of objects with a static camera over extended periods of time. The model encodes the probabilistic nature of the behavior of moving objects in the scene and is useful for activity analysis applications such as persistent tracking and anomalous motion detection. In addition, the model captures salient scene features, such as areas of occlusion and the most likely paths.
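As an illustration only (not taken from the dissertation), the kernel density model over object locations and transition times might be sketched as follows. The five-dimensional sample layout, the Gaussian product kernel, the hand-picked bandwidths, and the toy data are all assumptions made for this example.

```python
import numpy as np

def kde_density(query, samples, bandwidth):
    """Gaussian product-kernel density estimate at `query`.

    samples:   (n, d) array of observed transitions; here d = 5 for the
               assumed layout (x_from, y_from, x_to, y_to, transition_time).
    bandwidth: (d,) per-dimension kernel widths (hand-picked assumption).
    """
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    bandwidth = np.asarray(bandwidth, dtype=float)
    z = (query - samples) / bandwidth        # (n, d) standardized offsets
    log_k = -0.5 * np.sum(z * z, axis=1)     # log of the unnormalized kernels
    norm = np.prod(bandwidth) * (2.0 * np.pi) ** (samples.shape[1] / 2.0)
    return float(np.mean(np.exp(log_k)) / norm)

# Toy data: objects moving left to right, taking about one second.
rng = np.random.default_rng(0)
start = rng.normal([10.0, 50.0], 2.0, size=(200, 2))
end = start + rng.normal([20.0, 0.0], 2.0, size=(200, 2))
dt = rng.normal(1.0, 0.1, size=(200, 1))
transitions = np.hstack([start, end, dt])

h = np.array([3.0, 3.0, 3.0, 3.0, 0.2])
typical = kde_density([10, 50, 30, 50, 1.0], transitions, h)
unusual = kde_density([30, 50, 10, 50, 1.0], transitions, h)  # reversed direction
```

Under these assumptions, a transition that follows the observed direction of travel receives a much higher density than the reversed one, which is the kind of signal an anomaly detector could threshold.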
Once the model is learned, we use a unified Markov Chain Monte Carlo (MCMC) based framework for generating the most likely paths in the scene, improving foreground detection, persistently labelling objects during tracking, and deciding whether a given trajectory is anomalous with respect to the observed motion patterns. Experiments with real-world videos are reported that validate the proposed approach. The representation and estimation framework proposed above, however, has a few limitations. The algorithm uses a single global statistical distribution to represent all kinds of motion observed in a particular scene. It therefore does not separate multiple semantically distinct motion patterns in the scene. Instead, the learned model is a joint distribution over all possible patterns followed by objects. To overcome this limitation, we then propose a superior method for the discovery and statistical representation of motion patterns in a scene. The advantages of this approach over the first one are two-fold: first, this model is applicable to scenes of dense crowded motion where tracking may not be feasible; second, it distinguishes between motion patterns that are distinct at a semantic level of abstraction. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low-level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered, and instances of motion patterns are computed using a simple motion model by linking components across space and time. Motion patterns are then initialized, and the membership of instances in different motion patterns is established using the KL divergence between the mixture distributions of pattern instances.
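For orientation, the KL divergence has a closed form between two single Gaussians, sketched below. This is an illustrative building block only: the KL divergence between full mixtures has no closed form in general and must be approximated, and this record does not state which approximation the dissertation uses. The function name and the (x, y, u, v) ordering of position and flow are assumptions.

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    cov0, cov1 = np.asarray(cov0, dtype=float), np.asarray(cov1, dtype=float)
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0)      # covariance mismatch term
        + diff @ cov1_inv @ diff       # Mahalanobis distance between means
        - d
        + logdet1 - logdet0            # log ratio of determinants
    )

# Two hypothetical flow components at the same location (x, y) with
# different flow vectors (u, v): one moving right, one moving left.
cov = np.eye(4)
moving_right = np.array([20.0, 30.0, 1.0, 0.0])
moving_left = np.array([20.0, 30.0, -1.0, 0.0])
same = kl_gaussians(moving_right, cov, moving_right, cov)
different = kl_gaussians(moving_right, cov, moving_left, cov)
```

Identical components give a divergence of zero, while components that differ only in flow direction are separated, which is why a divergence of this kind can help group pattern instances.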
Finally, a pixel-level representation of motion patterns is proposed by deriving the conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic. The proposed method exploits optical flow as the low-level feature and performs hierarchical clustering to obtain motion patterns. We observe that optical flow is also an integral part of a variety of other vision applications, for example, feature-based representations of human actions. We therefore propose a new representation for articulated human actions using the motion patterns. The representation is based on hierarchical clustering of observed optical flow in a four-dimensional space of spatial location and motion flow. The automatically discovered motion patterns are the primitive actions, representative of flow at salient regions on the human body, much like trajectories of body joints, which are notoriously difficult to obtain automatically. The proposed method works in a completely unsupervised fashion and, in sharp contrast to state-of-the-art representations like bag of video words, provides a truly semantically meaningful representation. Each primitive action depicts the most atomic sub-action, like the left arm moving upwards or the right leg moving downward and leftward, and is represented by a mixture of four-dimensional Gaussian distributions. A sequence of primitive actions is discovered in the test video and labelled by computing the KL divergence between mixtures. The entire video sequence containing the human action is thus reduced to a simple string, which is matched against similar strings of training videos to classify the action. The string matching is performed by global alignment, using the well-known Needleman-Wunsch algorithm.
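The string-matching step can be illustrated with a minimal Needleman-Wunsch global alignment score. The scoring values, the single-letter labels for primitive actions, the action names, and the nearest-string classification rule are assumptions made for this sketch, not details given in this record.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]

# Hypothetical training strings: each letter stands for one primitive action.
training = {"wave": "ABAB", "kick": "CDDC"}
test_string = "ABB"
predicted = max(training, key=lambda label: needleman_wunsch_score(test_string, training[label]))
```

Here the test string is assigned the label of the training string with the highest alignment score; a real system would likely compare against many training strings per action.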
Experiments reported on multiple human action data sets confirm the validity, simplicity, and semantically meaningful nature of the proposed representation. The results obtained are encouraging and comparable to the state of the art.
Title: PATTERNS OF MOTION: DISCOVERY AND GENERALIZED REPRESENTATION.
Name(s): Saleemi, Imran, Author
Shah, Mubarak, Committee Chair
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2011
Publisher: University of Central Florida
Language(s): English
Identifier: CFE0003646 (IID), ucf:48836 (fedora)
Note(s): 2011-05-01
Ph.D.
Engineering and Computer Science, School of Electrical Engineering and Computer Science
Masters
This record was generated from author submitted information.
Subject(s): computer vision
motion estimation
motion representation
motion patterns
traffic patterns
tracking
foreground detection
anomaly detection
kernel density estimation
Markov chain Monte Carlo (MCMC)
Metropolis-Hastings
optical flow
k-means
mixture of Gaussians
KL divergence
action recognition
primitive actions
string matching
unsupervised learning
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0003646
Restrictions on Access: public 2011-04-01
Host Institution: UCF
