Current Search: pattern recognition
- Title
- LEARNING SEMANTIC FEATURES FOR VISUAL RECOGNITION.
- Creator
- Liu, Jingen, Shah, Mubarak, University of Central Florida
- Abstract / Description
- Visual recognition (e.g., object, scene and action recognition) is an active area of research in computer vision due to its increasing number of real-world applications, such as video (image) indexing and search, intelligent surveillance, human-machine interaction, and robot navigation. Effective modeling of objects, scenes and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which image patches or video cuboids are quantized into visual words (i.e., mid-level features) based on their appearance similarity using clustering, has been widely and successfully explored. The advantages of this representation are that no explicit detection of objects or object parts and their tracking is required, the representation is somewhat tolerant to within-class deformations, and it is efficient for matching. However, the performance of the BoVW is sensitive to the size of the visual vocabulary, so computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation is partially due to the fact that the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, in this thesis we present a principled approach to learn a semantic vocabulary (i.e., high-level features) from a large number of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we have developed an algorithm to discover a compact yet discriminative semantic vocabulary. This vocabulary is obtained by grouping the visual words, based on their distribution in videos (images), into visual-word clusters. The mutual information (MI) between the clusters and the videos (images) depicts the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary. We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters by finding a good tradeoff between compactness and discriminative power. We tested our proposed approach on the state-of-the-art KTH dataset and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering, because only visual words are clustered, regardless of which video they appear in. In order to leverage the co-occurrence of visual words and images, we have developed a co-clustering algorithm to simultaneously group the visual words and images. We tested our approach on the publicly available fifteen-scene dataset and obtained about a 4% increase in average accuracy compared to the one-sided clustering approaches. Second, instead of grouping the mid-level features directly, we first embed the features into a low-dimensional semantic space by manifold learning, and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding is able to preserve the explicitly defined diffusion distance, which reflects the semantic similarity between any two features. Furthermore, DM provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. Experiments on the KTH dataset show that DM performs much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. In order to combine multiple heterogeneous features for visual recognition, we further propose the Fiedler Embedding to capture the complicated semantic relationships between all entities (i.e., videos, images and heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. We tested our approach on the Weizmann dataset and achieved about 17% to 21% improvement in average accuracy.
- Date Issued
- 2009
- Identifier
- CFE0002936, ucf:47961
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002936
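The second contribution in the abstract above rests on embedding mid-level features with Diffusion Maps before clustering. Below is a minimal Python sketch of that embedding step, assuming a Gaussian affinity kernel; the function name, kernel width `sigma`, diffusion time `t`, and embedding dimension are illustrative choices, not values from the dissertation.

```python
import numpy as np

def diffusion_map(features, sigma=1.0, t=1, dim=2):
    """Embed row vectors `features` (n x d) into `dim` diffusion coordinates."""
    # Pairwise affinities with a Gaussian kernel.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Row-normalize to obtain a Markov transition matrix P.
    P = kernel / kernel.sum(axis=1, keepdims=True)
    # Eigen-decompose P; the top nontrivial eigenvectors give the embedding.
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    eigvals = eigvals.real[order]
    eigvecs = eigvecs.real[:, order]
    # Skip the trivial eigenvalue 1; scaling by lambda^t gives the multi-scale
    # behavior the abstract attributes to the diffusion time steps.
    return eigvecs[:, 1:dim + 1] * (eigvals[1:dim + 1] ** t)

# Usage: embed mid-level feature descriptors, then cluster in the embedded space.
words = np.random.rand(50, 20)                 # 50 mid-level features, 20-dim each
coords = diffusion_map(words, sigma=0.5, t=2, dim=3)
```

Clustering the rows of `coords` (for example with k-means) would then group features by diffusion distance, the semantic similarity the abstract refers to.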
- Title
- EFFICIENT ALGORITHMS FOR CORRELATION PATTERN RECOGNITION.
- Creator
- Ragothaman, Pradeep, Mikhael, Wasfy, University of Central Florida
- Abstract / Description
- The mathematical operation of correlation is a very simple concept, yet it has a very rich history of application in a variety of engineering fields. It is essentially a technique to measure whether, and to what degree, two signals match each other. Since this is a very basic and universal task in a wide variety of fields such as signal processing, communications and computer vision, correlation has been an important tool. The field of pattern recognition often deals with the task of analyzing signals, or useful information extracted from signals, and classifying them into classes. Very often these classes are predetermined, and examples (templates) are available for comparison. This task naturally lends itself to the application of correlation as a tool to accomplish the goal, and thus the field of correlation pattern recognition has developed over the past few decades as an important area of research. From the signal processing point of view, correlation is simply a filtering operation, and there has been a great deal of work in using concepts from filter theory to develop correlation filters for pattern recognition. While considerable work has been done to develop linear correlation filters over the years, especially in the field of Automatic Target Recognition, a lot of attention has recently been paid to the development of Quadratic Correlation Filters (QCFs). QCFs offer the advantages of linear filters while optimizing a bank of them simultaneously to deliver much improved performance. This dissertation develops efficient QCFs that offer significant savings in storage requirements and computational complexity over existing designs. Firstly, an adaptive algorithm is presented that is able to modify the QCF coefficients as new data are observed. Secondly, a transform-domain implementation of the QCF is presented that has the benefits of lower computational complexity and storage requirements while retaining excellent recognition accuracy. Finally, a two-dimensional QCF is presented that holds the potential to further save on storage and computations. The techniques are developed based on the recently proposed Rayleigh Quotient Quadratic Correlation Filter (RQQCF), and simulation results are provided on synthetic and real datasets.
- Date Issued
- 2007
- Identifier
- CFE0001974, ucf:47429
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001974
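As a rough illustration of the quadratic correlation filtering discussed above, the sketch below builds a small filter bank from the dominant eigenvectors of the difference between the target and clutter correlation matrices, which maximizes a Rayleigh-quotient-style separation between the two classes. This is a hedged approximation of the RQQCF idea, not the dissertation's exact derivation; the function names and the number of filters are assumptions.

```python
import numpy as np

def train_qcf_bank(targets, clutter, n_filters=4):
    """targets, clutter: (n_samples, d) arrays of vectorized image chips."""
    Rt = targets.T @ targets / len(targets)      # target-class correlation matrix
    Rc = clutter.T @ clutter / len(clutter)      # clutter-class correlation matrix
    eigvals, eigvecs = np.linalg.eigh(Rt - Rc)   # symmetric matrix, eigh is safe
    # Keep eigenvectors with the largest-magnitude eigenvalues; the eigenvalue
    # sign indicates whether a filter responds to targets or to clutter.
    idx = np.argsort(-np.abs(eigvals))[:n_filters]
    return eigvecs[:, idx], np.sign(eigvals[idx])

def qcf_score(x, bank, signs):
    """Quadratic response sum_i s_i * (v_i^T x)^2 for a vectorized chip x."""
    return float(np.sum(signs * (bank.T @ x) ** 2))

# Usage: score > 0 suggests target-like structure, score < 0 clutter-like.
```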
- Title
- SPEAKER IDENTIFICATION BASED ON DISCRIMINATIVE VECTOR QUANTIZATION AND DATA FUSION.
- Creator
- Zhou, Guangyu, Mikhael, Wasfy, University of Central Florida
- Abstract / Description
- Speaker Identification (SI) approaches based on discriminative Vector Quantization (VQ) and data fusion techniques are presented in this dissertation. The SI approaches based on Discriminative VQ (DVQ) proposed in this dissertation are the DVQ for SI (DVQSI), the DVQSI with unique speech feature vector space segmentation for each speaker pair (DVQSI-U), and the Adaptive DVQSI (ADVQSI) methods. The difference between the probability distributions of the speech feature vector sets from various speakers (or speaker groups) is called the interspeaker variation between speakers (or speaker groups); it is a measure of the template differences between speakers (or speaker groups). All of the DVQ-based techniques presented in this contribution take advantage of the interspeaker variation, which is not exploited in previously proposed techniques that employ traditional VQ for SI (VQSI). All DVQ-based techniques have two modes, the training mode and the testing mode. In the training mode, the speech feature vector space is first divided into a number of subspaces based on the interspeaker variations. Then, a discriminative weight is calculated for each subspace of each speaker or speaker pair in the SI group based on the interspeaker variation. Subspaces with higher interspeaker variation are assigned larger discriminative weights and thus play more important roles in SI than those with lower interspeaker variation. In the testing mode, discriminatively weighted average VQ distortions, instead of equally weighted average VQ distortions, are used to make the SI decision. The DVQ-based techniques lead to higher SI accuracies than VQSI. The DVQSI and DVQSI-U techniques consider the interspeaker variation for each speaker pair in the SI group. In DVQSI, the speech feature vector space segmentation is exactly the same for all speaker pairs, whereas in DVQSI-U each speaker pair is treated individually in the segmentation. In both DVQSI and DVQSI-U, the discriminative weights for each speaker pair are calculated by trial and error. The SI accuracies of DVQSI-U are higher than those of DVQSI, at the price of a much higher computational burden. ADVQSI explores the interspeaker variation between each speaker and all speakers in the SI group. In contrast with DVQSI and DVQSI-U, in ADVQSI the feature vector space segmentation is performed for each speaker, instead of each speaker pair, based on the interspeaker variation between that speaker and all the speakers in the SI group. Also, adaptive techniques are used in the computation of the discriminative weights for each speaker in ADVQSI. The SI accuracies of ADVQSI and DVQSI-U are comparable, but the computational complexity of ADVQSI is much less than that of DVQSI-U. In addition, a novel algorithm to convert the raw distortion outputs of template-based SI classifiers into compatible probability measures is proposed in this dissertation. After this conversion, data fusion techniques at the measurement level can be applied to SI. In the proposed technique, stochastic models of the distortion outputs are estimated; the posterior probabilities of the unknown utterance belonging to each speaker are then calculated, and compatible probability measures are assigned based on these posterior probabilities. The proposed technique leads to better SI performance at the measurement level than existing approaches.
- Date Issued
- 2005
- Identifier
- CFE0000720, ucf:46621
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000720
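The testing-mode decision described above replaces equally weighted VQ distortion averages with discriminatively weighted ones. Below is a minimal Python sketch under simplifying assumptions: the codebooks come from plain k-means, and `subspace_of` and `weights` are hypothetical stand-ins for the dissertation's interspeaker-variation-based segmentation and weight computation, which are not reproduced here.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def train_codebook(features, k=16):
    """Per-speaker VQ codebook from k-means (features: n x d cepstral-like vectors)."""
    codebook, _ = kmeans2(features, k, minit='points')
    return codebook

def weighted_distortion(test_vectors, codebook, subspace_of, weights):
    """Discriminatively weighted average VQ distortion for one speaker model."""
    # Distortion of each test vector to its nearest codeword.
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=-1)
    per_vector = d.min(axis=1)
    # Weight each vector by the discriminative weight of the subspace it falls in.
    w = weights[subspace_of(test_vectors)]
    return float(np.sum(w * per_vector) / np.sum(w))

# Decision rule: the identified speaker is the one whose codebook yields the
# smallest weighted distortion over the test utterance.
```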
- Title
- SELF DESIGNING PATTERN RECOGNITION SYSTEM EMPLOYING MULTISTAGE CLASSIFICATION.
- Creator
- ABDELWAHAB, MANAL MAHMOUD, Mikhael, Wasfy, University of Central Florida
- Abstract / Description
- Recently, pattern recognition/classification has received considerable attention in diverse engineering fields such as biomedical imaging, speaker identification and fingerprint recognition. In most of these applications, it is desirable to maintain the classification accuracy in the presence of corrupted and/or incomplete data. The quality of a given classification technique is measured by its computational complexity, the execution time of its algorithms, and the number of patterns that can be classified correctly despite any distortion. Some classification techniques introduced in the literature are described in Chapter 1. In this dissertation, a pattern recognition approach is proposed that can be designed to have evolutionary learning by developing the features and selecting the criteria that are best suited for the recognition problem under consideration. Chapter 2 presents some of the features used in developing the set of criteria employed by the system to recognize different types of signals, as well as some of the preprocessing techniques used by the system. The system operates in two modes, namely the learning (training) mode and the running mode. In the learning mode, the original and preprocessed signals are projected into different transform domains. The technique automatically tests many criteria over the range of parameters for each criterion, and a large number of criteria are developed from the features extracted from these domains. The optimum set of criteria, satisfying specific conditions, is selected and employed by the system to recognize the original or noisy signals in the running mode. The modes of operation and the classification structures employed by the system are described in detail in Chapter 3. The proposed pattern recognition system is capable of recognizing an enormously large number of patterns by virtue of the fact that it analyzes the signal in different domains and explores the distinguishing characteristics in each of these domains. In other words, this approach uses the available information and extracts more characteristics from the signals for classification purposes by projecting the signal into different domains. Experimental results are given in Chapter 4 showing the effect of using mathematical transforms in conjunction with preprocessing techniques on the classification accuracy, together with a comparison between some of the classification approaches in terms of classification rate in the presence of distortion. A sample of experimental implementations is presented in Chapters 5 and 6 to illustrate the performance of the proposed pattern recognition system. The preliminary results confirm the superior performance of the proposed technique relative to the single-transform neural network and multi-input neural network approaches for image classification in the presence of additive noise.
- Date Issued
- 2004
- Identifier
- CFE0000020, ucf:46077
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000020
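The learning mode described above projects signals into several transform domains and keeps the criteria that perform best. A toy Python sketch of that selection loop follows; the two transform domains (FFT magnitude, DCT) and the nearest-class-mean criterion are illustrative stand-ins for the much larger family of features and criteria the dissertation develops.

```python
import numpy as np
from scipy.fft import dct

DOMAINS = {
    'identity': lambda x: x,
    'fourier':  lambda x: np.abs(np.fft.rfft(x, axis=-1)),
    'cosine':   lambda x: dct(x, axis=-1, norm='ortho'),
}

def nearest_mean_accuracy(features, labels):
    """Training accuracy of a nearest-class-mean criterion on these features."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(features[:, None] - means[None], axis=-1)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == labels))

def select_domain(signals, labels):
    """Keep the transform domain whose criterion scores best on the training set."""
    scores = {name: nearest_mean_accuracy(f(signals), labels)
              for name, f in DOMAINS.items()}
    return max(scores, key=scores.get), scores
```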
- Title
- DETECTING CURVED OBJECTS AGAINST CLUTTERED BACKGROUNDS.
- Creator
- Prokaj, Jan, Lobo, Niels, University of Central Florida
- Abstract / Description
- Detecting curved objects against cluttered backgrounds is a hard problem in computer vision. We present new low-level and mid-level features designed to function in these environments. The low-level features are fast to compute because they employ an integral image approach, which makes them especially useful in real-time applications. The mid-level features are built from the low-level features and are optimized for curved object detection. The usefulness of these features is tested by designing an object detection algorithm around them: the mid-level features are transformed into weak classifiers, which are then combined into a strong classifier using AdaBoost. The resulting strong classifier is tested on the problem of detecting heads with shoulders. On a database of over 500 images of people, cropped to contain the head and shoulders and with a diverse set of backgrounds, the detection rate is 90%, while the false positive rate on a database of 500 negative images is less than 2%.
- Date Issued
- 2008
- Identifier
- CFE0002102, ucf:47535
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002102
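The low-level features above owe their speed to the integral image, which turns any rectangular sum into four table lookups. Here is a minimal sketch of that mechanism; the specific box-difference feature at the end is only an illustrative example, not one of the thesis's curved-object features.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns, padded so indexing stays simple."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)), mode='constant')

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the padded integral image in O(1)."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# Usage: a simple "edge-like" response is the difference of two adjacent box
# sums; mid-level features would combine several such responses.
img = np.random.rand(480, 640)
ii = integral_image(img)
feature = box_sum(ii, 10, 10, 30, 30) - box_sum(ii, 30, 10, 50, 30)
```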
- Title
- ADAPTIVE INTELLIGENT USER INTERFACES WITH EMOTION RECOGNITION.
- Creator
- NASOZ, FATMA, Lisetti, Christine L., University of Central Florida
- Abstract / Description
- The focus of this dissertation is on creating Adaptive Intelligent User Interfaces that facilitate enhanced natural communication during human-computer interaction by recognizing users' affective states (i.e., the emotions experienced by the users) and responding to those emotions by adapting to the current situation via an affective user model created for each user. Controlled experiments were designed and conducted in a laboratory environment and in a Virtual Reality environment to collect physiological data signals from participants experiencing specific emotions. Algorithms (k-Nearest Neighbor [KNN], Discriminant Function Analysis [DFA], Marquardt-Backpropagation [MBP], and Resilient Backpropagation [RBP]) were implemented to analyze the collected signals and to find unique physiological patterns of emotions. An emotion elicitation experiment with movie clips was conducted to elicit sadness, anger, surprise, fear, frustration, and amusement from participants. Overall, the three algorithms KNN, DFA, and MBP could recognize emotions with 72.3%, 75.0%, and 84.1% accuracy, respectively. A driving simulator experiment was conducted to elicit driving-related emotions and states (panic/fear, frustration/anger, and boredom/sleepiness). The KNN, MBP and RBP algorithms were used to classify the physiological signals by the corresponding emotions; overall, KNN could classify these three emotions with 66.3% accuracy, MBP with 76.7%, and RBP with 91.9%. Adaptation of the interface was designed to provide multi-modal feedback to the users about their current affective state and to respond to users' negative emotional states in order to decrease the possible negative impacts of those emotions. A Bayesian Belief Network formalization was employed to develop the user model, enabling the intelligent system to adapt appropriately to the current context and situation by considering user-dependent factors such as personality traits and preferences.
- Date Issued
- 2004
- Identifier
- CFE0000126, ucf:46201
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000126
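Of the classifiers compared above, k-Nearest Neighbor is the simplest to illustrate. The sketch below maps a physiological feature vector to an emotion label by majority vote among its nearest training samples; the feature layout and the choice of k are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Majority label among the k training samples closest to `query`."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Usage: rows of train_x might hold, e.g., heart-rate, skin-conductance and
# temperature statistics per trial; train_y holds the elicited emotion labels.
```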
- Title
- INVESTIGATION OF DAMAGE DETECTION METHODOLOGIES FOR STRUCTURAL HEALTH MONITORING.
- Creator
- Gul, Mustafa, Catbas, F. Necati, University of Central Florida
- Abstract / Description
- Structural Health Monitoring (SHM) is employed to track and evaluate damage and deterioration during regular operation, as well as after extreme events, for aerospace, mechanical and civil structures. A complete SHM system incorporates performance metrics, sensing, signal processing, data analysis, transmission and management for decision-making purposes. Damage detection in the context of SHM can be successful by employing a collection of robust and practical damage detection methodologies that can identify, locate and quantify damage or, in general terms, changes in observable behavior. In this study, different damage detection methods are investigated for global condition assessment of structures. First, different parametric and non-parametric approaches are revisited and further improved for damage detection using vibration data. Modal flexibility, modal curvature and un-scaled flexibility, based on the dynamic properties obtained using the Complex Mode Indicator Function (CMIF), are used as parametric damage features. Second, statistical pattern recognition approaches using time series modeling in conjunction with outlier detection are investigated as a non-parametric damage detection technique. Third, a novel methodology using ARX models (Auto-Regressive models with eXogenous input) is proposed for damage identification. With this new methodology, it is shown that damage can be detected, located and quantified without the need for external loading information. Next, laboratory studies are conducted on different test structures with a number of different damage scenarios to evaluate the techniques in a comparative fashion. Finally, application of the methodologies to real-life data is presented, along with the capabilities and limitations of each approach in light of the analysis results of the laboratory and real-life data.
- Date Issued
- 2009
- Identifier
- CFE0002830, ucf:48069
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002830
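The time-series branch of the work above builds statistical models of sensor signals and flags outliers in their behavior. The sketch below captures that flavor with a plain autoregressive (AR) model fit to baseline data, using the growth of one-step prediction residuals on new data as a simple damage indicator; the ARX formulation in the dissertation additionally includes exogenous terms, and the model order and threshold here are arbitrary choices.

```python
import numpy as np

def fit_ar(signal, order=10):
    """Least-squares AR(order) coefficients for a 1-D baseline sensor signal."""
    X = np.stack([signal[i:len(signal) - order + i] for i in range(order)], axis=1)
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def residual_std(signal, coeffs):
    """Standard deviation of one-step-ahead prediction residuals."""
    order = len(coeffs)
    X = np.stack([signal[i:len(signal) - order + i] for i in range(order)], axis=1)
    return float(np.std(signal[order:] - X @ coeffs))

# Damage indicator: residual_std(new_data, coeffs) much larger than
# residual_std(baseline, coeffs) flags a change in the structure's behavior.
```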
- Title
- An Unsupervised Consensus Control Chart Pattern Recognition Framework.
- Creator
- Haghtalab, Siavash, Xanthopoulos, Petros, Pazour, Jennifer, Rabelo, Luis, University of Central Florida
- Abstract / Description
- Early identification and detection of abnormal time series patterns is vital for a number of manufacturing applications. Slight shifts and alterations of time series patterns might be indicative of some anomaly in the production process, such as a machinery malfunction. Due to the continuous flow of data, monitoring of manufacturing processes usually requires automated Control Chart Pattern Recognition (CCPR) algorithms. The majority of the CCPR literature consists of supervised classification algorithms; fewer studies consider unsupervised versions of the problem. Despite the profound advantage of unsupervised methods in requiring less manual data labeling, their use is limited by the fact that their performance is not robust enough for practical purposes. In this study we propose the use of a consensus clustering framework. Computational results show robust behavior compared to individual clustering algorithms.
- Date Issued
- 2014
- Identifier
- CFE0005178, ucf:50670
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005178
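A hedged Python sketch of the consensus idea described above: several k-means runs vote through a co-association matrix, and a hierarchical clustering of that matrix yields the consensus partition. The base algorithm, number of runs, and number of clusters are illustrative; the thesis's framework may combine different base clusterers.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_labels(windows, k=6, n_runs=10):
    """windows: (n_series, window_length) array of control-chart segments."""
    n = len(windows)
    coassoc = np.zeros((n, n))
    for seed in range(n_runs):
        labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(windows)
        # Pairs placed in the same cluster vote for each other.
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= n_runs
    # Treat (1 - co-association) as a distance and cut a hierarchy into k groups.
    dist = squareform(1.0 - coassoc, checks=False)
    return fcluster(linkage(dist, method='average'), t=k, criterion='maxclust')
```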
- Title
- EXPLOITING OPPONENT MODELING FOR LEARNING IN MULTI-AGENT ADVERSARIAL GAMES.
- Creator
- Laviers, Kennard, Sukthankar, Gita, University of Central Florida
- Abstract / Description
- An issue with learning effective policies in multi-agent adversarial games is that the size of the search space can be prohibitively large when the actions of both teammates and opponents are considered simultaneously. Opponent modeling, predicting an opponent's actions in advance of execution, is one approach for selecting actions in adversarial settings, but it is often performed in an ad hoc way. In this dissertation, we introduce several methods for using opponent modeling, in the form of predictions about the players' physical movements, to learn team policies. To explore the problem of decision-making in multi-agent adversarial scenarios, we use our approach for both offline play generation and real-time team response in the Rush 2008 American football simulator. Simultaneously predicting the movement trajectories, future reward, and play strategies of multiple players in real time is a daunting task, but we illustrate how it is possible to divide and conquer this problem with an assortment of data-driven models. By leveraging spatio-temporal traces of player movements, we learn discriminative models of defensive play for opponent modeling. With the reward information from previous play matchups, we use a modified version of UCT (Upper Confidence Bounds applied to Trees) to create new offensive plays and to learn play repairs to counter predicted opponent actions. In team games, players must coordinate effectively to accomplish tasks while foiling their opponents, either in a preplanned or emergent manner. An effective team policy must generate the necessary coordination, yet considering all possibilities for creating coordinating subgroups is computationally infeasible. Automatically identifying and preserving the coordination between key subgroups of teammates can make search more productive by pruning policies that disrupt these relationships. We demonstrate that combining opponent modeling with automatic subgroup identification can be used to create team policies with a higher average yardage than either the baseline game or domain-specific heuristics.
- Date Issued
- 2011
- Identifier
- CFE0003914, ucf:48720
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003914
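The play-generation step above relies on UCT, whose core is the UCB1 rule for choosing which child of a search node to expand next. A minimal sketch of that rule follows; the node layout (visit count, accumulated reward) and the exploration constant are illustrative, and the dissertation's modified UCT adds domain-specific elements not shown here.

```python
import math

def ucb1_select(children, total_visits, c=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus.

    `children` is a list of (visits, total_reward) tuples; unvisited
    children are explored first.
    """
    best, best_score = None, -float('inf')
    for i, (visits, total_reward) in enumerate(children):
        if visits == 0:
            return i
        score = total_reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```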
- Title
- Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications.
- Creator
- Razzaghi, Talayeh, Xanthopoulos, Petros, Karwowski, Waldemar, Pazour, Jennifer, Mikusinski, Piotr, University of Central Florida
- Abstract / Description
- Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Typically, most traditional approaches for classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with the Support Vector Machine, a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures for tackling imperfect data, along with addressing real problems in quality control and business analytics.
- Date Issued
- 2014
- Identifier
- CFE0005542, ucf:50298
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005542
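A minimal sketch of the cost-sensitive SVM idea described above, using scikit-learn's class weights to charge more for errors on the minority class. The weight ratio, kernel, and helper name are illustrative assumptions, not the dissertation's settings.

```python
from sklearn.svm import SVC

def train_cost_sensitive_svm(X_train, y_train, minority_label=1, cost_ratio=10.0):
    """Weight misclassification of the minority class `cost_ratio` times more."""
    weights = {label: (cost_ratio if label == minority_label else 1.0)
               for label in set(y_train)}
    return SVC(kernel='rbf', C=1.0, class_weight=weights).fit(X_train, y_train)
```

When evaluating such a model, a class-balance-aware measure (for example balanced accuracy or G-mean) is more informative than raw accuracy, which echoes the abstract's point about choosing suitable performance measures for imperfect data.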
- Title
- PATTERNS OF MOTION: DISCOVERY AND GENERALIZED REPRESENTATION.
- Creator
- Saleemi, Imran, Shah, Mubarak, University of Central Florida
- Abstract / Description
- In this dissertation, we address the problem of discovery and representation of motion patterns in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise a generic representation that captures any kind of object motion observable in video sequences. Such motion is a significant source of information typically employed for diverse applications such as tracking, anomaly detection, and action and event recognition. We present statistical frameworks for representing the motion characteristics of objects, learned from tracks or optical flow, for static as well as moving cameras, and propose algorithms for their application to a variety of problems. The proposed motion pattern models and learning methods are general enough to be employed in a variety of problems, as we demonstrate experimentally.

We first propose a novel method to model and learn the scene activity observed by a static camera. The motion patterns of objects in the scene are modeled in the form of a multivariate non-parametric probability density function of spatiotemporal variables (object locations and the transition times between them). Kernel Density Estimation (KDE) is used to learn this model in a completely unsupervised fashion, by observing the trajectories of objects with a static camera over extended periods of time. The model encodes the probabilistic nature of the behavior of moving objects in the scene and is useful for activity analysis applications, such as persistent tracking and anomalous motion detection. In addition, the model captures salient scene features, such as the areas of occlusion and the most likely paths. Once the model is learned, we use a unified Markov Chain Monte Carlo (MCMC) based framework for generating the most likely paths in the scene, improving foreground detection, persistently labelling objects during tracking, and deciding whether a given trajectory represents an anomaly with respect to the observed motion patterns. Experiments with real-world videos are reported which validate the proposed approach.

This representation and estimation framework, however, has a few limitations: it uses a single global statistical distribution to represent all kinds of motion observed in a particular scene, and therefore does not separate multiple semantically distinct motion patterns; instead, the learned model is a joint distribution over all possible patterns followed by objects. To overcome this limitation, we then propose a superior method for the discovery and statistical representation of motion patterns in a scene. The advantages of this approach over the first one are two-fold: first, the model is applicable to scenes of dense crowded motion where tracking may not be feasible, and second, it distinguishes between motion patterns that are distinct at a semantic level of abstraction. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low-level cues from noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered, and instances of motion patterns are computed using a simple motion model by linking components across space and time. Motion patterns are then initialized, and the membership of instances in different motion patterns is established by using the KL divergence between the mixture distributions of pattern instances. Finally, a pixel-level representation of motion patterns is proposed by deriving the conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic.

The proposed method exploits optical flow as the low-level feature and performs hierarchical clustering to obtain motion patterns; we observe that optical flow is also an integral part of a variety of other vision applications, for example, as a feature-based representation of human actions. We therefore propose a new representation for articulated human actions using the motion patterns. The representation is based on hierarchical clustering of the observed optical flow in a four-dimensional space of spatial and motion flow coordinates. The automatically discovered motion patterns are the primitive actions, representative of the flow at salient regions on the human body, much like the trajectories of body joints, which are notoriously difficult to obtain automatically. The proposed method works in a completely unsupervised fashion and, in sharp contrast to state-of-the-art representations like bag of video words, provides a truly semantically meaningful representation. Each primitive action depicts the most atomic sub-action, like the left arm moving upwards or the right leg moving downward and leftward, and is represented by a mixture of four-dimensional Gaussian distributions. A sequence of primitive actions is discovered in the test video and labelled by computing the KL divergence between mixtures. The entire video sequence containing the human action is thus reduced to a simple string, which is matched against similar strings of training videos to classify the action. The string matching is performed by global alignment using the well-known Needleman-Wunsch algorithm. Experiments reported on multiple human action datasets confirm the validity, simplicity, and semantically meaningful nature of the proposed representation, and the results obtained are encouraging and comparable to the state of the art.
- Date Issued
- 2011
- Identifier
- CFE0003646, ucf:48836
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003646
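A hedged sketch of the first model in the abstract above: transitions harvested from object tracks become samples of the spatiotemporal variables (start location, end location, transition time), and a kernel density estimate over them scores how typical a new transition is, with low density suggesting anomalous motion. The frame gap, default bandwidth, and threshold are illustrative choices, not the dissertation's settings.

```python
import numpy as np
from scipy.stats import gaussian_kde

def transition_samples(tracks, gap=5):
    """tracks: list of (T, 2) arrays of (x, y) positions; sample pairs `gap` frames apart."""
    rows = []
    for tr in tracks:
        for t in range(len(tr) - gap):
            rows.append([*tr[t], *tr[t + gap], gap])
    return np.array(rows, dtype=float)

def fit_motion_kde(tracks):
    """KDE over 5-D spatiotemporal transition samples (SciPy expects dims x samples)."""
    samples = transition_samples(tracks)
    return gaussian_kde(samples.T)

def is_anomalous(kde, x1, y1, x2, y2, dt, threshold=1e-6):
    """Flag a transition whose estimated density falls below the threshold."""
    return kde.evaluate(np.array([[x1], [y1], [x2], [y2], [dt]]))[0] < threshold
```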