Current Search: Tappen, Marshall (x)
View All Items
- Title
- MODELING PEDESTRIAN BEHAVIOR IN VIDEO.
- Creator
-
Scovanner, Paul, Tappen, Marshall, University of Central Florida
- Abstract / Description
-
The purpose of this dissertation is to address the problem of predicting pedestrian movement and behavior in and among crowds. Specifically, we will focus on an agent based approach where pedestrians are treated individually and parameters for an energy model are trained by real world video data. These learned pedestrian models are useful in applications such as tracking, simulation, and artificial intelligence. The applications of this method are explored and experimental results show that...
Show moreThe purpose of this dissertation is to address the problem of predicting pedestrian movement and behavior in and among crowds. Specifically, we will focus on an agent based approach where pedestrians are treated individually and parameters for an energy model are trained by real world video data. These learned pedestrian models are useful in applications such as tracking, simulation, and artificial intelligence. The applications of this method are explored and experimental results show that our trained pedestrian motion model is beneficial for predicting unseen or lost tracks as well as guiding appearance based tracking algorithms. The method we have developed for training such a pedestrian model operates by optimizing a set of weights governing an aggregate energy function in order to minimize a loss function computed between a model's prediction and annotated ground-truth pedestrian tracks. The formulation of the underlying energy function is such that using tight convex upper bounds, we are able to efficiently approximate the derivative of the loss function with respect to the parameters of the model. Once this is accomplished, the model parameters are updated using straightforward gradient descent techniques in order to achieve an optimal solution. This formulation also lends itself towards the development of a multiple behavior model. The multiple pedestrian behavior styles, informally referred to as "stereotypes", are common in real data. In our model we show that it is possible, due to the unique ability to compute the derivative of the loss function, to build a new model which utilizes a soft-minimization of single behavior models. This allows unsupervised training of multiple different behavior models in parallel. This novel extension makes our method unique among other methods in the attempt to accurately describe human pedestrian behavior for the myriad of applications that exist. The ability to describe multiple behaviors shows significant improvements in the task of pedestrian motion prediction.
Show less - Date Issued
- 2011
- Identifier
- CFE0004043, ucf:49146
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004043
- Title
- FEATURE PRUNING FOR ACTION RECOGNITION IN COMPLEX ENVIRONMENT.
- Creator
-
Nagaraja, Adarsh, Tappen, Marshall, University of Central Florida
- Abstract / Description
-
A significant number of action recognition research efforts use spatio-temporal interest point detectors for feature extraction. Although the extracted features provide useful information for recognizing actions, a significant number of them contain irrelevant motion and background clutter. In many cases, the extracted features are included as is in the classification pipeline, and sophisticated noise removal techniques are subsequently used to alleviate their effect on classification. We...
Show moreA significant number of action recognition research efforts use spatio-temporal interest point detectors for feature extraction. Although the extracted features provide useful information for recognizing actions, a significant number of them contain irrelevant motion and background clutter. In many cases, the extracted features are included as is in the classification pipeline, and sophisticated noise removal techniques are subsequently used to alleviate their effect on classification. We introduce a new action database, created from the Weizmann database, that reveals a significant weakness in systems based on popular cuboid descriptors. Experiments show that introducing complex backgrounds, stationary or dynamic, into the video causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning the system or selecting better interest points. Instead, we show that the problem lies at the descriptor level and must be addressed by modifying descriptors.
Show less - Date Issued
- 2011
- Identifier
- CFE0003882, ucf:48721
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003882
- Title
- Algorithms for Rendering Optimization.
- Creator
-
Johnson, Jared, Hughes, Charles, Tappen, Marshall, Foroosh, Hassan, Shirley, Peter, University of Central Florida
- Abstract / Description
-
This dissertation explores algorithms for rendering optimization realizable within a modern, complex rendering engine. The first part contains optimized rendering algorithms for ray tracing. Ray tracing algorithms typically provide properties of simplicity and robustness that are highly desirable in computer graphics. We offer several novel contributions to the problem of interactive ray tracing of complex lighting environments. We focus on the problem of maintaining interactivity as both...
Show moreThis dissertation explores algorithms for rendering optimization realizable within a modern, complex rendering engine. The first part contains optimized rendering algorithms for ray tracing. Ray tracing algorithms typically provide properties of simplicity and robustness that are highly desirable in computer graphics. We offer several novel contributions to the problem of interactive ray tracing of complex lighting environments. We focus on the problem of maintaining interactivity as both geometric and lighting complexity grows without effecting the simplicity or robustness of ray tracing. First, we present a new algorithm called occlusion caching for accelerating the calculation of direct lighting from many light sources. We cache light visibility information sparsely across a scene. When rendering direct lighting for all pixels in a frame, we combine cached lighting information to determine whether or not shadow rays are needed. Since light visibility and scene location are highly correlated, our approach precludes the need for most shadow rays. Second, we present improvements to the irradiance caching algorithm. Here we demonstrate a new elliptical cache point spacing heuristic that reduces the number of cache points required by taking into account the direction of irradiance gradients. We also accelerate irradiance caching by efficiently and intuitively coupling it with occlusion caching.In the second part of this dissertation, we present optimizations to rendering algorithms for participating media. Specifically, we explore the implementation and use of photon beams as an efficient, intuitive artistic primitive. We detail our implementation of the photon beams algorithm into PhotoRealistic RenderMan (PRMan). We show how our implementation maintains the benefits of the industry standard Reyes rendering pipeline, with proper motion blur and depth of field. We detail an automatic photon beam generation algorithm, utilizing PRMan shadow maps. We accelerate the rendering of camera-facing photon beams by utilizing Gaussian quadrature for path integrals in place of ray marching. Our optimized implementation allows for incredible versatility and intuitiveness in artistic control of volumetric lighting effects. Finally, we demonstrate the usefulness of photon beams as artistic primitives by detailing their use in a feature-length animated film.
Show less - Date Issued
- 2012
- Identifier
- CFE0004557, ucf:49231
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004557
- Title
- A Study of Localization and Latency Reduction for Action Recognition.
- Creator
-
Masood, Syed, Tappen, Marshall, Foroosh, Hassan, Stanley, Kenneth, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
The success of recognizing periodic actions in single-person-simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this...
Show moreThe success of recognizing periodic actions in single-person-simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning system parameters or by selecting better feature points. Instead, we show that the problem lies in the spatio-temporal cuboid volume extracted from the interest point locations. Having identified the problem, we show how improved results can be achieved by simple modifications to the cuboids.For the above method however, one requires near-perfect localization of the action within a video sequence. To achieve this objective, we present a two stage weakly supervised probabilistic model for simultaneous localization and recognition of actions in videos. Different from previous approaches, our method is novel in that it (1) eliminates the need for manual annotations for the training procedure and (2) does not require any human detection or tracking in the classification stage. The first stage of our framework is a probabilistic action localization model which extracts the most promising sub-windows in a video sequence where an action can take place. We use a non-linear classifier in the second stage of our framework for the final classification task. We show the effectiveness of our proposed model on two well known real-world datasets: UCF Sports and UCF11 datasets.Another application of the weakly supervised probablistic model proposed above is in the gaming environment. An important aspect in designing interactive, action-based interfaces is reliably recognizing actions with minimal latency. High latency causes the system's feedback to lag behind and thus significantly degrade the interactivity of the user experience. With slight modification to the weakly supervised probablistic model we proposed for action localization, we show how it can be used for reducing latency when recognizing actions in Human Computer Interaction (HCI) environments. This latency-aware learning formulation trains a logistic regression-based classifier that automatically determines distinctive canonical poses from the data and uses these to robustly recognize actions in the presence of ambiguous poses. We introduce a novel (publicly released) dataset for the purpose of our experiments. Comparisons of our method against both a Bag of Words and a Conditional Random Field (CRF) classifier show improved recognition performance for both pre-segmented and online classification tasks.
Show less - Date Issued
- 2012
- Identifier
- CFE0004575, ucf:49210
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004575
- Title
- Towards Real-time Mixed Reality Matting in Natural Scenes.
- Creator
-
Beato, Nicholas, Hughes, Charles, Foroosh, Hassan, Tappen, Marshall, Moshell, Jack, University of Central Florida
- Abstract / Description
-
In Mixed Reality scenarios, background replacement is a common way to immerse a user in a synthetic environment. Properly identifying the background pixels in an image or video is a difficult problem known as matting. In constant color matting, research identifies and replaces a background that is a single color, known as the chroma key color. Unfortunately, the algorithms force a controlled physical environment and favor constant, uniform lighting. More generic approaches, such as natural...
Show moreIn Mixed Reality scenarios, background replacement is a common way to immerse a user in a synthetic environment. Properly identifying the background pixels in an image or video is a difficult problem known as matting. In constant color matting, research identifies and replaces a background that is a single color, known as the chroma key color. Unfortunately, the algorithms force a controlled physical environment and favor constant, uniform lighting. More generic approaches, such as natural image matting, have made progress finding alpha matte solutions in environments with naturally occurring backgrounds. However, even for the quicker algorithms, the generation of trimaps, indicating regions of known foreground and background pixels, normally requires human interaction or offline computation. This research addresses ways to automatically solve an alpha matte for an image in real-time, and by extension video, using a consumer level GPU. It do so even in the context of noisy environments that result in less reliable constraints than found in controlled settings. To attack these challenges, we are particularly interested in automatically generating trimaps from depth buffers for dynamic scenes so that algorithms requiring more dense constraints may be used. We then explore a sub-image based approach to parallelize an existing hierarchical approach on high resolution imagery by taking advantage of local information. We show that locality can be exploited to significantly reduce the memory and compute requirements of previously necessary when computing alpha mattes of high resolution images. We achieve this using a parallelizable scheme that is both independent of the matting algorithm and image features. Combined, these research topics provide a basis for Mixed Reality scenarios using real-time natural image matting on high definition video sources.
Show less - Date Issued
- 2012
- Identifier
- CFE0004515, ucf:49284
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004515
- Title
- STUDY OF HUMAN ACTIVITY IN VIDEO DATA WITH AN EMPHASIS ON VIEW-INVARIANCE.
- Creator
-
Ashraf, Nazim, Foroosh, Hassan, Hughes, Charles, Tappen, Marshall, Moshell, Jack, University of Central Florida
- Abstract / Description
-
The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, ergonomics, etc. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different and unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations,and the large degrees of...
Show moreThe perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, ergonomics, etc. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different and unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations,and the large degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. The current solutions to action recognition usually assume that there is a huge dataset of actions available so that a classifier can be trained. However, thismeans that in order to define a new action, the user has to record a number of videos fromdifferent viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent action as a sequence of poses, and decompose the pose into triplets. Therefore, the pose transition is brokendown into a set of movement of body point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body point planes. We use the fact that the family of homographies associated with two identical poses would have rank 4 to gauge similarity of the pose between two subjects, observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action. We then show that it is possible to extend the concept of triplets to line segments. In particular, we establish that if we look at the movement of line segments instead of triplets, we have more redundancy in data thus leading to better results. We demonstrate this concept on (")fundamental ratios.(") We decompose a human body pose into line segments instead of triplets and look at set of movement of line segments. This method needs only three instances of the action. If a larger dataset is available, we can also apply weighting on line segments for better accuracy. The last method is based onthe concept of (")Projective Depth("). Given a plane, we can find the relative depth of a point relative to the given plane. We propose three different ways of using (")projective depth:(") (i) Triplets - the three points of a triplet along with the epipole defines the plane and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane - if we are able to extract the ground plane, we can find the (")projective depth(") of the body points withrespect to it. Therefore, the problem of action recognition would translate to curve matching; and (iii) Mirror person (-) We can use the mirror view of the person to extract mirror symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on testing view invariance, robustness to noisy localization and occlusions of bodypoints, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants.
Show less - Date Issued
- 2012
- Identifier
- CFE0004352, ucf:49449
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004352
- Title
- Active Learning with Unreliable Annotations.
- Creator
-
Zhao, Liyue, Sukthankar, Gita, Tappen, Marshall, Georgiopoulos, Michael, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
With the proliferation of social media, gathering data has became cheaper and easier than before. However, this data can not be used for supervised machine learning without labels. Asking experts to annotate sufficient data for training is both expensive and time-consuming. Current techniques provide two solutions to reducing the cost and providing sufficient labels: crowdsourcing and active learning. Crowdsourcing, which outsources tasks to a distributed group of people, can be used to...
Show moreWith the proliferation of social media, gathering data has became cheaper and easier than before. However, this data can not be used for supervised machine learning without labels. Asking experts to annotate sufficient data for training is both expensive and time-consuming. Current techniques provide two solutions to reducing the cost and providing sufficient labels: crowdsourcing and active learning. Crowdsourcing, which outsources tasks to a distributed group of people, can be used to provide a large quantity of labels but controlling the quality of labels is hard. Active learning, which requires experts to annotate a subset of the most informative or uncertain data, is very sensitive to the annotation errors. Though these two techniques can be used independently of one another, by using them in combination they can complement each other's weakness. In this thesis, I investigate the development of active learning Support Vector Machines (SVMs) and expand this model to sequential data. Then I discuss the weakness of combining active learning and crowdsourcing, since the active learning is very sensitive to low quality annotations which are unavoidable for labels collected from crowdsourcing. In this thesis, I propose three possible strategies, incremental relabeling, importance-weighted label prediction and active Bayesian Networks. The incremental relabeling strategy requires workers to devote more annotations to uncertain samples, compared to majority voting which allocates different samples the same number of labels. Importance-weighted label prediction employs an ensemble of classifiers to guide the label requests from a pool of unlabeled training data. An active learning version of Bayesian Networks is used to model the difficulty of samples and the expertise of workers simultaneously to evaluate the relative weight of workers' labels during the active learning process. All three strategies apply different techniques with the same expectation -- identifying the optimal solution for applying an active learning model with mixed label quality to crowdsourced data. However, the active Bayesian Networks model, which is the core element of this thesis, provides additional benefits by estimating the expertise of workers during the training phase. As an example application, I also demonstrate the utility of crowdsourcing for human activity recognition problems.
Show less - Date Issued
- 2013
- Identifier
- CFE0004965, ucf:49579
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004965
- Title
- Dictionary Learning for Image Analysis.
- Creator
-
Khan, Muhammad Nazar, Tappen, Marshall, Foroosh, Hassan, Stanley, Kenneth, Li, Xin, University of Central Florida
- Abstract / Description
-
In this thesis, we investigate the use of dictionary learning for discriminative tasks on natural images. Our contributions can be summarized as follows:1) We introduce discriminative deviation based learning to achieve principled handling of the reconstruction-discrimination tradeoff that is inherent to discriminative dictionary learning.2) Since natural images obey a strong smoothness prior, we show how spatial smoothness constraints can be incorporated into the learning formulation by...
Show moreIn this thesis, we investigate the use of dictionary learning for discriminative tasks on natural images. Our contributions can be summarized as follows:1) We introduce discriminative deviation based learning to achieve principled handling of the reconstruction-discrimination tradeoff that is inherent to discriminative dictionary learning.2) Since natural images obey a strong smoothness prior, we show how spatial smoothness constraints can be incorporated into the learning formulation by embedding dictionary learning into Conditional Random Field (CRF) learning. We demonstrate that such smoothness constraints can lead to state-of-the-art performance for pixel-classification tasks.3) Finally, we lay down the foundations of super-latent learning. By treating sparse codes on a CRF as latent variables, dictionary learning can also be performed via the Latent (Structural) SVM formulation for jointly learning a classifier over the sparse codes. The dictionary is treated as a super-latent variable that generates the latent variables.
Show less - Date Issued
- 2013
- Identifier
- CFE0004701, ucf:49844
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004701
- Title
- Using Freebase, an Automatically Generated Dictionary, and a Classifier to Identify a Person's Profession in Tweets.
- Creator
-
Hall, Abraham, Gomez, Fernando, Dechev, Damian, Tappen, Marshall, University of Central Florida
- Abstract / Description
-
Algorithms for classifying pre-tagged person entities in tweets into one of eight profession categories are presented. A classifier using a semi-supervised learning algorithm that takes into consideration the local context surrounding the entity in the tweet, hash tag information, and topic signature scores is described. In addition to the classifier, this research investigates two dictionaries containing the professions of persons. These two dictionaries are used in their own classification...
Show moreAlgorithms for classifying pre-tagged person entities in tweets into one of eight profession categories are presented. A classifier using a semi-supervised learning algorithm that takes into consideration the local context surrounding the entity in the tweet, hash tag information, and topic signature scores is described. In addition to the classifier, this research investigates two dictionaries containing the professions of persons. These two dictionaries are used in their own classification algorithms which are independent of the classifier. The method for creating the first dictionary dynamically from the web and the algorithm that accesses this dictionary to classify a person into one of the eight profession categories are explained next. The second dictionary is freebase, an openly available online database that is maintained by its online community. The algorithm that uses freebase for classifying a person into one of the eight professions is described. The results also show that classifications made using the automated constructed dictionary, freebase, or the classifier are all moderately successful. The results also show that classifications made with the automated constructed person dictionary are slightly more accurate than classifications made using freebase. Various hybrid methods, combining the classifier and the two dictionaries are also explained. The results of those hybrid methods show significant improvement over any of the individual methods.
Show less - Date Issued
- 2013
- Identifier
- CFE0004858, ucf:49715
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004858
- Title
- Gradient based MRF learning for image restoration and segmentation.
- Creator
-
Samuel, Kegan, Tappen, Marshall, Da Vitoria Lobo, Niels, Foroosh, Hassan, Li, Xin, University of Central Florida
- Abstract / Description
-
The undirected graphical model or Markov Random Field (MRF) is one of the more popular models used in computer vision and is the type of model with which this work is concerned. Models based on these methods have proven to be particularly useful in low-level vision systems and have led to state-of-the-art results for MRF-based systems. The research presented will describe a new discriminative training algorithm and its implementation.The MRF model will be trained by optimizing its parameters...
Show moreThe undirected graphical model or Markov Random Field (MRF) is one of the more popular models used in computer vision and is the type of model with which this work is concerned. Models based on these methods have proven to be particularly useful in low-level vision systems and have led to state-of-the-art results for MRF-based systems. The research presented will describe a new discriminative training algorithm and its implementation.The MRF model will be trained by optimizing its parameters so that the minimum energy solution of the model is as similar as possible to the ground-truth. While previous work has relied on time-consuming iterative approximations or stochastic approximations, this work will demonstrate how implicit differentiation can be used to analytically differentiate the overall training loss with respect to the MRF parameters. This framework leads to an efficient, flexible learning algorithm that can be applied to a number of different models.The effectiveness of the proposed learning method will then be demonstrated by learning the parameters of two related models applied to the task of denoising images. The experimental results will demonstrate that the proposed learning algorithm is comparable and, at times, better than previous training methods applied to the same tasks.A new segmentation model will also be introduced and trained using the proposed learning method. The proposed segmentation model is based on an energy minimization framework that is novel in how it incorporates priors on the size of the segments in a way that is straightforward to implement. While other methods, such as normalized cuts, tend to produce segmentations of similar sizes, this method is able to overcome that problem and produce more realistic segmentations.
Show less - Date Issued
- 2012
- Identifier
- CFE0004595, ucf:49207
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004595
- Title
- Analysis of Behaviors in Crowd Videos.
- Creator
-
Mehran, Ramin, Shah, Mubarak, Sukthankar, Gita, Behal, Aman, Tappen, Marshall, Moore, Brian, University of Central Florida
- Abstract / Description
-
In this dissertation, we address the problem of discovery and representation of group activity of humans and objects in a variety of scenarios, commonly encountered in vision applications. The overarching goal is to devise a discriminative representation of human motion in social settings, which captures a wide variety of human activities observable in video sequences. Such motion emerges from the collective behavior of individuals and their interactions and is a significant source of...
Show moreIn this dissertation, we address the problem of discovery and representation of group activity of humans and objects in a variety of scenarios, commonly encountered in vision applications. The overarching goal is to devise a discriminative representation of human motion in social settings, which captures a wide variety of human activities observable in video sequences. Such motion emerges from the collective behavior of individuals and their interactions and is a significant source of information typically employed for applications such as event detection, behavior recognition, and activity recognition. We present new representations of human group motion for static cameras, and propose algorithms for their application to variety of problems.We first propose a method to model and learn the scene activity of a crowd using Social Force Model for the first time in the computer vision community. We present a method to densely estimate the interaction forces between people in a crowd, observed by a static camera. Latent Dirichlet Allocation (LDA) is used to learn the model of the normal activities over extended periods of time. Randomly selected spatio-temporal volumes of interaction forces are used to learn the model of normal behavior of the scene. The model encodes the latent topics of social interaction forces in the scene for normal behaviors. We classify a short video sequence of $n$ frames as normal or abnormal by using the learnt model. Once a sequence of frames is classified as an abnormal, the regions of anomalies in the abnormal frames are localized using the magnitude of interaction forces.The representation and estimation framework proposed above, however, has a few limitations. This algorithm proposes to use a global estimation of the interaction forces within the crowd. It, therefore, is incapable of identifying different groups of objects based on motion or behavior in the scene. Although the algorithm is capable of learning the normal behavior and detects the abnormality, but it is incapable of capturing the dynamics of different behaviors.To overcome these limitations, we then propose a method based on the Lagrangian framework for fluid dynamics, by introducing a streakline representation of flow. Streaklines are traced in a fluid flow by injecting color material, such as smoke or dye, which is transported with the flow and used for visualization. In the context of computer vision, streaklines may be used in a similar way to transport information about a scene, and they are obtained by repeatedly initializing a fixed grid of particles at each frame, then moving both current and past particles using optical flow. Streaklines are the locus of points that connect particles which originated from the same initial position.This approach is advantageous over the previous representations in two aspects: first, its rich representation captures the dynamics of the crowd and changes in space and time in the scene where the optical flow representation is not enough, and second, this model is capable of discovering groups of similar behavior within a crowd scene by performing motion segmentation. We propose a method to distinguish different group behaviors such as divergent/convergent motion and lanes using this framework. Finally, we introduce flow potentials as a discriminative feature to recognize crowd behaviors in a scene. Results of extensive experiments are presented for multiple real life crowd sequences involving pedestrian and vehicular traffic.The proposed method exploits optical flow as the low level feature and performs integration and clustering to obtain coherent group motion patterns. However, we observe that in crowd video sequences, as well as a variety of other vision applications, the co-occurrence and inter-relation of motion patterns are the main characteristics of group behaviors. In other words, the group behavior of objects is a mixture of individual actions or behaviors in specific geometrical layout and temporal order.We, therefore, propose a new representation for group behaviors of humans using the inter-relation of motion patterns in a scene. The representation is based on bag of visual phrases of spatio-temporal visual words. We present a method to match the high-order spatial layout of visual words that preserve the geometry of the visual words under similarity transformations. To perform the experiments we collected a dataset of group choreography performances from the YouTube website. The dataset currently contains four categories of group dances.
Show less - Date Issued
- 2011
- Identifier
- CFE0004482, ucf:49317
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004482
- Title
- Human Action Localization and Recognition in Unconstrained Videos.
- Creator
-
Boyraz, Hakan, Tappen, Marshall, Foroosh, Hassan, Lin, Mingjie, Zhang, Shaojie, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing...
Show moreAs imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems. Action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed Two-Point Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison.Second, we present a probabilistic model learning to identify discriminative regions in videos from weakly-supervised data where each video clip is only assigned a label describing what action is present in the frame or clip. While our first system requires every action to be manually outlined in every frame of the video, this second system only requires that the video be given a single high-level tag. From this data, the system is able to identify discriminative regions that correspond well to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and UCF Sports Dataset show that the localizations produced by this weakly supervised system are comparable in quality to localizations produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e. HMDB and UCF101. Third, we extend our weakly-supervised framework by replacing the recognition stage with a two-stage neural network and apply dropout for preventing overfitting of the parameters on the training data. Dropout technique has been recently introduced to prevent overfitting of the parameters in deep neural networks and it has been applied successfully to object recognition problem. To our knowledge, this is the first system using dropout for action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on HMDB and UCF101 datasets.
Show less - Date Issued
- 2013
- Identifier
- CFE0004977, ucf:49562
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004977
- Title
- GPU Ray Traced Rendering And Image Fusion Based Visualization Of Urban Terrain For Enhanced Situation Awareness.
- Creator
-
Sik, Lingling, Pattanaik, Sumanta, Kincaid, John, Proctor, Michael, Tappen, Marshall, Graniela Ortiz, Benito, University of Central Florida
- Abstract / Description
-
Urban activities involving planning, preparing for and responding to time critical situations often demands sound situational awareness of overall settings. Decision makers, who are tasked to respond effectively to emergencies, must be equipped with information on the details of what is happening, and must stay informed with updates as the event unfolds and remain attentive to the extent of impact the dynamics of the surrounding settings might have. Recent increases in the volumes of geo...
Show moreUrban activities involving planning, preparing for and responding to time critical situations often demands sound situational awareness of overall settings. Decision makers, who are tasked to respond effectively to emergencies, must be equipped with information on the details of what is happening, and must stay informed with updates as the event unfolds and remain attentive to the extent of impact the dynamics of the surrounding settings might have. Recent increases in the volumes of geo-spatial data such as satellite imageries, elevation maps, street-level photographs and real-time imageries from remote sensory devices affect the way decision makers make assessments in time-critical situations. When terrain related spatial information are presented accurately, timely, and are augmented with terrain analysis such as viewshed computations, enhanced situational understanding could be formed. Painting such enhanced situational pictures, however, demands efficient techniques to process and present volumes of geo-spatial data. Modern Graphics Processing Units (GPUs) have opened up a wide field of applications far beyond processing millions of polygons. This dissertation presents approaches that harness graphics rendering techniques and GPU programmability to visualize urban terrain with accuracy, viewshed analysis and real-time imageries. The GPU ray tracing and image fusion visualization techniques presented herein have the potential to aid in achieving enhanced urban situational awareness and understanding.Current state of the art polygon based terrain representations often use coarse representations for terrain features of less importance to improve rendering rate. This results in reduced geometrical accuracy for selective terrain features that are considered less critical to the visualization or simulation needs. Alternatively, to render highly accurate urban terrain, considerable computational effort is needed. A compromise between achieving real-time rendering rate and accurate terrain representations would have to be made. Likewise, computational tasks involved in terrain-related calculations such as viewshed analysis are highly computational intensive and are traditionally performed at a non-interactive rate. The first contribution of the research involves using GPU ray tracing, a rendering approach, conventionally not employed in the simulation community in favor of rasterization, to achieve accurate visualization and improved understanding of urban terrain. The efficiency of using GPU ray tracing is demonstrated in two areas, namely, in depicting complex, large scale terrain and in visualizing viewshed terrain effects at interactive rate. Another contribution entails designing a novel approach to create an efficient and real-time mapping system. The solution achieves updating and visualizing terrain textures using 2D geo-referenced imageries for enhanced situational awareness. Fusing myriad of multi-view 2D inputs spatially for a complex 3D urban scene typically involves a large number of computationally demanding tasks such as image registrations, mosaickings and texture mapping. Current state of the art solutions essentially belongs to two groups. Each strives to either provide near real-time situational pictures in 2D or off-line complex 3D reconstructions for subsequent usages. The solution proposed in this research relies on using prior constructed synthetic terrains as backdrops to be updated with real-time geo-referenced images. The solution achieves speed in fusing information in 3D. Mapping geo-referenced images spatially in 3D puts them into context. It aids in conveying spatial relationships among the data. Prototypes to evaluate the effectiveness of the aforementioned techniques are also implemented. The benefits of augmenting situational displays with viewshed analysis and real-time geo-referenced images in relation to enhancing the user's situational awareness are also evaluated. Preliminary results from user evaluation studies demonstrate the usefulness of the techniques in enhancing operators' performances, in relation to situational awareness and understanding.
Show less - Date Issued
- 2013
- Identifier
- CFE0005115, ucf:50757
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005115
- Title
- Learning Collective Behavior in Multi-relational Networks.
- Creator
-
Wang, Xi, Sukthankar, Gita, Tappen, Marshall, Georgiopoulos, Michael, Hu, Haiyan, Anagnostopoulos, Georgios, University of Central Florida
- Abstract / Description
-
With the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of...
Show moreWith the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes and linked by various causal reasons, hence treating all links in a homogeneous way can limit the performance of relational classifiers. Learning the collective behavior and interactions in heterogeneous networks becomes much more complex.The contribution of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated in various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions specifically in the scenario when the heterogeneous relationships in the network arise from various causal reasons.
Show less - Date Issued
- 2014
- Identifier
- CFE0005439, ucf:50376
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005439
- Title
- On Kernel-base Multi-Task Learning.
- Creator
-
Li, Cong, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Tappen, Marshall, Hu, Haiyan, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Multi-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple relevant tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task, compared to training each individual task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework due to its flexibility in specifying various types of information sharing strategies, the...
Show moreMulti-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple relevant tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task, compared to training each individual task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework due to its flexibility in specifying various types of information sharing strategies, the opportunity it offers to yield a kernel-based methods and its capability in promoting sparse feature representations.However, certain limitations exist in both theoretical and practical aspects of Regularization-Loss-based MTL. Theoretically, previous research on generalization bounds in connection to MTL Hypothesis Space (HS)s, where data of all tasks are pre-processed by a (partially) common operator, has been limited in two aspects: First, all previous works assumed linearity of the operator, therefore completely excluding kernel-based MTL HSs, for which the operator is potentially non-linear. Secondly, all previous works, rather unnecessarily, assumed that all the task weights to be constrained within norm-balls, whose radii are equal. The requirement of equal radii leads to significant inflexibility of the relevant HSs, which may cause the generalization performance of the corresponding MTL models to deteriorate. Practically, various algorithms have been developed for kernel-based MTL models, due to different characteristics of the formulations. Most of these algorithms are a burden to develop and end up being quite sophisticated, so that practitioners may face a hard task in interpreting and implementing them, especially when multiple models are involved. This is even more so, when Multi-Task Multiple Kernel Learning (MT-MKL) models are considered. This research largely resolves the above limitations. Theoretically, a pair of new kernel-based HSs are proposed: one for single-kernel MTL, and another one for MT-MKL. Unlike previous works, we allow each task weight to be constrained within a norm-ball, whose radius is learned during training. By deriving and analyzing the generalization bounds of these two HSs, we show that, indeed, such a flexibility leads to much tighter generalization bounds, which often results to significantly better generalization performance. Based on this observation, a pair of new models is developed, one for each case: single-kernel MTL, and another one for MT-MKL. From a practical perspective, we propose a general MT-MKL framework that covers most of the prominent MT-MKL approaches, including our new MT-MKL formulation. Then, a general purpose algorithm is developed to solve the framework, which can also be employed for training all other models subsumed by this framework. A series of experiments is conducted to assess the merits of the proposed mode when trained by the new algorithm. Certain properties of our HSs and formulations are demonstrated, and the advantage of our model in terms of classification accuracy is shown via these experiments.
Show less - Date Issued
- 2014
- Identifier
- CFE0005517, ucf:50321
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005517
- Title
- Exploring sparsity, self-similarity, and low rank approximation in action recognition, motion retrieval, and action spotting.
- Creator
-
Sun, Chuan, Foroosh, Hassan, Hughes, Charles, Tappen, Marshall, Sukthankar, Rahul, Moshell, Jack, University of Central Florida
- Abstract / Description
-
This thesis consists of $4$ major parts. In the first part (Chapters $1-2$), we introduce the overview, motivation, and contribution of our works, and extensively survey the current literature for $6$ related topics. In the second part (Chapters $3-7$), we explore the concept of ``Self-Similarity" in two challenging scenarios, namely, the Action Recognition and the Motion Retrieval. We build three-dimensional volume representations for both scenarios, and devise effective techniques that can...
Show moreThis thesis consists of $4$ major parts. In the first part (Chapters $1-2$), we introduce the overview, motivation, and contribution of our works, and extensively survey the current literature for $6$ related topics. In the second part (Chapters $3-7$), we explore the concept of ``Self-Similarity" in two challenging scenarios, namely, the Action Recognition and the Motion Retrieval. We build three-dimensional volume representations for both scenarios, and devise effective techniques that can produce compact representations encoding the internal dynamics of data. In the third part (Chapter $8$), we explore the challenging action spotting problem, and propose a feature-independent unsupervised framework that is effective in spotting action under various real situations, even under heavily perturbed conditions. The final part (Chapters $9$) is dedicated to conclusions and future works.For action recognition, we introduce a generic method that does not depend on one particular type of input feature vector. We make three main contributions: (i) We introduce the concept of Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that by using a new optimized rank-1 tensor approximation of Joint SSV one can obtain compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) The descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning the action sequences of varying speed of execution or difference frame rates; (iii) The method is generic and can be applied using different low-level features such as silhouettes, histogram of oriented gradients (HOG), etc. Hence, it does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public datasets demonstrate that our method produces very good results and outperforms many baseline methods.For action recognition for incomplete videos, we determine whether incomplete videos that are often discarded carry useful information for action recognition, and if so, how one can represent such mixed collection of video data (complete versus incomplete, and labeled versus unlabeled) in a unified manner. We propose a novel framework to handle incomplete videos in action classification, and make three main contributions: (i) We cast the action classification problem for a mixture of complete and incomplete data as a semi-supervised learning problem of labeled and unlabeled data. (ii) We introduce a two-step approach to convert the input mixed data into a uniform compact representation. (iii) Exhaustively scrutinizing $280$ configurations, we experimentally show on our two created benchmarks that, even when the videos are extremely sparse and incomplete, it is still possible to recover useful information from them, and classify unknown actions by a graph based semi-supervised learning framework.For motion retrieval, we present a framework that allows for a flexible and an efficient retrieval of motion capture data in huge databases. The method first converts an action sequence into a self-similarity matrix (SSM), which is based on the notion of self-similarity. This conversion of the motion sequences into compact and low-rank subspace representations greatly reduces the spatiotemporal dimensionality of the sequences. The SSMs are then used to construct order-3 tensors, and we propose a low-rank decomposition scheme that allows for converting the motion sequence volumes into compact lower dimensional representations, without losing the nonlinear dynamics of the motion manifold. Thus, unlike existing linear dimensionality reduction methods that distort the motion manifold and lose very critical and discriminative components, the proposed method performs well, even when inter-class differences are small or intra-class differences are large. In addition, the method allows for an efficient retrieval and does not require the time-alignment of the motion sequences. We evaluate the performance of our retrieval framework on the CMU mocap dataset under two experimental settings, both demonstrating very good retrieval rates.For action spotting, our framework does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouette, bag-of-words, etc.), and requires no human localization, segmentation, or framewise tracking. This is achieved by treating the problem holistically as that of extracting the internal dynamics of video cuboids by modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust to even heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, allowing to represent video cuboids in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently by using two boosting strategies: (i) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (ii) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on 5 benchmarks show that our method outperforms current state-of-the-art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Arrays (HPVA) to validate the robustness of our framework under heavily perturbed situations.
Show less - Date Issued
- 2014
- Identifier
- CFE0005554, ucf:50290
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005554
- Title
- Bridging the Gap Between Fun and Fitness: Instructional Techniques and Real-World Applications for Full-Body Dance Games.
- Creator
-
Charbonneau, Emiko, Laviola II, Joseph, Hughes, Charles, Tappen, Marshall, Angelopoulos, Theodore, Mueller, Florian, University of Central Florida
- Abstract / Description
-
Full-body controlled games offer the opportunity for not only entertainment, but education and exercise as well. Refined gameplay mechanics and content can boost intrinsic motivation and keep people playing over a long period of time, which is desirable for individuals who struggle with maintaining a regular exercise program. Within this gameplay genre, dance rhythm games have proven to be popular with game console owners. Yet, while other types of games utilize story mechanics that keep...
Show moreFull-body controlled games offer the opportunity for not only entertainment, but education and exercise as well. Refined gameplay mechanics and content can boost intrinsic motivation and keep people playing over a long period of time, which is desirable for individuals who struggle with maintaining a regular exercise program. Within this gameplay genre, dance rhythm games have proven to be popular with game console owners. Yet, while other types of games utilize story mechanics that keep players engaged for dozens of hours, motion-controlled dance games are just beginning to incorporate these elements. In addition, this control scheme is still young, only becoming commercially available in the last few years. Instructional displays and clear real-time feedback remain difficult challenges.This thesis investigates the potential for full-body dance games to be used as tools for entertainment, education, and fitness. We built several game prototypes to investigate visual, aural, and tactile methods for instruction and feedback. We also evaluated the fitness potential of the game Dance Central 2 both by itself and with extra game content which unlocked based on performance.Significant contributions include a framework for running a longitudinal video game study, results indicating high engagement with some fitness potential, and informed discussion of how dance games could make exertion a more enjoyable experience.
Show less - Date Issued
- 2013
- Identifier
- CFE0004829, ucf:49690
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004829