Current Search: computer vision
- Title
- Holistic Representations for Activities and Crowd Behaviors.
- Creator
-
Solmaz, Berkan, Shah, Mubarak, Da Vitoria Lobo, Niels, Jha, Sumit, Ilie, Marcel, Moore, Brian, University of Central Florida
- Abstract / Description
-
In this dissertation, we address the problem of analyzing the activities of people in a variety of scenarios, a problem commonly encountered in vision applications. The overarching goal is to devise new representations for the activities, in settings where individuals or a number of people may take part in specific activities. Different types of activities can be performed by either an individual at the fine level or by several people constituting a crowd at the coarse level. We take into account the domain-specific information for modeling these activities. A summary of the proposed solutions follows. The holistic description of videos is appealing for visual detection and classification tasks for several reasons, including capturing the spatial relations between the scene components, simplicity, and performance [1, 2, 3]. First, we present a holistic (global) frequency-spectrum-based descriptor for representing the atomic actions performed by individuals, such as bench pressing, diving, hand waving, boxing, playing guitar, mixing, jumping, horse riding, hula hooping, etc. We model and learn these individual actions for classifying complex user-uploaded videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of local descriptors into a code book; it represents each video sequence as a single feature vector. This holistic feature vector is computed by applying a bank of 3-D spatio-temporal filters on the frequency spectrum of a video sequence; hence it integrates the information about the motion and scene structure.
We tested our approach on two of the most challenging datasets, UCF50 [4] and HMDB51 [5], and obtained promising results, which demonstrate the robustness and the discriminative power of our holistic video descriptor for classifying videos of various realistic actions. In the above approach, a holistic feature vector of a video clip is acquired by dividing the video into spatio-temporal blocks and then concatenating the features of the individual blocks. However, such a holistic representation blindly incorporates all the video regions regardless of their contribution to classification. Next, we present an approach which improves the performance of the holistic descriptors for activity recognition. In our novel method, we improve the holistic descriptors by discovering the discriminative video blocks. We measure the discriminativity of a block by examining its response to a pre-learned support vector machine model. In particular, a block is considered discriminative if it responds positively for positive training samples and negatively for negative training samples. We pose the problem of finding the optimal blocks as a problem of selecting a sparse set of blocks that maximizes the total classifier discriminativity. Through a detailed set of experiments on benchmark datasets [6, 7, 8, 9, 5, 10], we show that our method discovers the useful regions in the videos and eliminates the ones which are confusing for classification, which results in significant performance improvement over the state of the art. In contrast to the scenes where an individual performs a primitive action, there may be scenes with several people, where crowd behaviors may take place. For these types of scenes, the traditional approaches for recognition will not work due to severe occlusion and computational requirements. The number of videos is limited and the scenes are complicated, hence learning these behaviors is not feasible.
For this problem, we present a novel approach, based on the optical flow in a video sequence, for identifying five specific and common crowd behaviors in visual scenes. In the algorithm, the scene is overlaid by a grid of particles, initializing a dynamical system which is derived from the optical flow. Numerical integration of the optical flow provides particle trajectories that represent the motion in the scene. Linearization of the dynamical system allows a simple and practical analysis and classification of the behavior through the Jacobian matrix. Essentially, the eigenvalues of this matrix are used to determine the dynamic stability of points in the flow, and each type of stability corresponds to one of the five crowd behaviors. The identified crowd behaviors are (1) bottlenecks: where many pedestrians/vehicles from various points in the scene are entering through one narrow passage, (2) fountainheads: where many pedestrians/vehicles are emerging from a narrow passage only to separate in many directions, (3) lanes: where many pedestrians/vehicles are moving at the same speeds in the same direction, (4) arches or rings: where the collective motion is curved or circular, and (5) blocking: where there is opposing motion and the desired movement of groups of pedestrians is somehow prohibited. The implementation requires identifying a region of interest in the scene and checking the eigenvalues of the Jacobian matrix in that region to determine the type of flow, which corresponds to various well-defined crowd behaviors. The eigenvalues are only considered in these regions of interest, consistent with the linear approximation and the implied behaviors. Since changes in eigenvalues can mean changes in stability, corresponding to changes in behavior, we can repeat the algorithm over clips of long video sequences to locate changes in behavior. This method was tested on over real videos representing crowd and traffic scenes.
- Date Issued
- 2013
- Identifier
- CFE0004941, ucf:49638
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004941
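The abstract of CFE0004941 above describes classifying crowd flow from the eigenvalues of the Jacobian of a linearized optical-flow field. The following is a minimal Python sketch of that idea, not the dissertation's implementation: the function names, the finite-difference Jacobian estimate, the tolerance, and the stability labels (standard terms from linear dynamical systems) are all assumptions made for illustration. The abstract does not spell out which stability type maps to which of the five behaviors, so the sketch stops at the stability type.

```python
import numpy as np


def jacobian_at(u, v, row, col):
    """Estimate the 2x2 Jacobian of a flow field (u, v) at one pixel.

    u and v are 2-D arrays of horizontal and vertical flow. Finite
    differences (np.gradient) stand in for whatever estimator the
    original work used; this is an assumption.
    """
    du_dy, du_dx = np.gradient(u)  # axis 0 is rows (y), axis 1 is columns (x)
    dv_dy, dv_dx = np.gradient(v)
    return np.array([[du_dx[row, col], du_dy[row, col]],
                     [dv_dx[row, col], dv_dy[row, col]]])


def classify_stability(jacobian, tol=1e-9):
    """Name the stability type implied by the eigenvalues of a 2x2 Jacobian."""
    eig = np.linalg.eigvals(np.asarray(jacobian, dtype=float))
    re, im = eig.real, eig.imag
    if np.any(np.abs(im) > tol):          # complex pair: rotating flow
        if np.all(re < -tol):
            return "stable spiral"
        if np.all(re > tol):
            return "unstable spiral"
        return "center"
    if np.any(np.abs(re) <= tol):         # a zero eigenvalue
        return "degenerate"
    if np.all(re < 0):
        return "stable node"              # flow converges on the point
    if np.all(re > 0):
        return "unstable node"            # flow diverges from the point
    return "saddle"


# Hypothetical flow that converges on the centre of a 5x5 grid.
ys, xs = np.mgrid[0:5, 0:5]
u = -(xs - 2.0)
v = -(ys - 2.0)
print(classify_stability(jacobian_at(u, v, 2, 2)))
```

Repeating the classification over successive clips, as the abstract suggests, would amount to calling `classify_stability` on each clip's region of interest and watching for a change in the returned label.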
- Title
- STRUCTURAL HEALTH MONITORING WITH EMPHASIS ON COMPUTER VISION, DAMAGE INDICES, AND STATISTICAL ANALYSIS.
- Creator
-
ZAURIN, RICARDO, CATBAS, F. NECATI, University of Central Florida
- Abstract / Description
-
Structural Health Monitoring (SHM) is the sensing and analysis of a structure to detect abnormal behavior, damage and deterioration during regular operations as well as under extreme loadings. SHM is designed to provide objective information for decision-making on safety and serviceability. This research focuses on the SHM of bridges by developing and integrating novel methods and techniques using sensor networks, computer vision, modeling for damage indices and statistical approaches. Effective use of traffic video synchronized with sensor measurements for decision-making is demonstrated. First, some of the computer vision methods and how they can be used for bridge monitoring are presented along with the most common issues and some practical solutions. Second, a conceptual damage index (Unit Influence Line, UIL) is formulated using synchronized computer images and sensor data for tracking the structural response under various load conditions. Third, a new index, Nd, is formulated and demonstrated to more effectively identify, localize and quantify damage. Commonly observed damage conditions on real bridges are simulated on a laboratory model for the demonstration of the computer vision method, UIL and the new index. This new method and the index, which are based on outlier detection from the UIL population, can very effectively handle large sets of monitoring data. The methods and techniques are demonstrated on the laboratory model for damage detection and all damage scenarios are identified successfully. Finally, the application of the proposed methods on a real-life structure, which has a monitoring system, is presented. It is shown that these methods can be used efficiently for applications such as damage detection and load rating for decision-making. The results from this monitoring project on a movable bridge are demonstrated and presented along with the conclusions and recommendations for future work.
- Date Issued
- 2009
- Identifier
- CFE0002890, ucf:48039
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002890
- Title
- Learning Algorithms for Fat Quantification and Tumor Characterization.
- Creator
-
Hussein, Sarfaraz, Bagci, Ulas, Shah, Mubarak, Heinrich, Mark, Pensky, Marianna, University of Central Florida
- Abstract / Description
-
Obesity is one of the most prevalent health conditions. About 30% of the world's and over 70% of the United States' adult populations are either overweight or obese, causing an increased risk for cardiovascular diseases, diabetes, and certain types of cancer. Among all cancers, lung cancer is the leading cause of death, whereas pancreatic cancer has the poorest prognosis among all major cancers. Early diagnosis of these cancers can save lives. This dissertation contributes towards the development of computer-aided diagnosis tools in order to aid clinicians in establishing the quantitative relationship between obesity and cancers. With respect to obesity and metabolism, in the first part of the dissertation, we specifically focus on the segmentation and quantification of white and brown adipose tissue. For cancer diagnosis, we perform analysis on two important cases: lung cancer and Intraductal Papillary Mucinous Neoplasm (IPMN), a precursor to pancreatic cancer. This dissertation proposes an automatic body region detection method trained with only a single example. Then a new fat quantification approach is proposed which is based on geometric and appearance characteristics. For the segmentation of brown fat, a PET-guided CT co-segmentation method is presented. With different variants of Convolutional Neural Networks (CNN), supervised learning strategies are proposed for the automatic diagnosis of lung nodules and IPMN. In order to address the unavailability of a large number of labeled examples required for training, unsupervised learning approaches for cancer diagnosis without explicit labeling are proposed. We evaluate our proposed approaches (both supervised and unsupervised) on two different tumor diagnosis challenges, lung and pancreas, with 1018 CT and 171 MRI scans, respectively.
The proposed segmentation, quantification and diagnosis approaches explore the important adiposity-cancer association and help pave the way towards improved diagnostic decision making in routine clinical practice.
- Date Issued
- 2018
- Identifier
- CFE0007196, ucf:52288
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007196
- Title
- Load Estimation, Structural Identification and Human Comfort Assessment of Flexible Structures.
- Creator
-
Celik, Ozan, Catbas, Necati, Yun, Hae-Bum, Makris, Nicos, Kauffman, Jeffrey L., University of Central Florida
- Abstract / Description
-
Stadiums, pedestrian bridges, dance floors, and concert halls are distinct from other civil engineering structures due to several challenges in their design and dynamic behavior. These challenges originate from the inherently flexible nature of these structures coupled with human interactions in the form of loading. The investigations in past literature on this topic clearly state that the design of flexible structures can be improved with better load modeling strategies acquired with reliable load quantification, a deeper understanding of structural response, generation of simple and efficient human-structure interaction models, and new measurement and assessment criteria for acceptable vibration levels. In contribution to these possible improvements, this dissertation taps into three specific areas: the load quantification of lively individuals or crowds, the structural identification under non-stationary and narrowband disturbances, and the measurement of excessive vibration levels for human comfort. For load quantification, a computer-vision-based approach capable of tracking both individual and crowd motion is used. For structural identification, a noise-assisted Multivariate Empirical Mode Decomposition (MEMD) algorithm is incorporated into the operational modal analysis. The measurement of excessive vibration levels and the assessment of human comfort are accomplished through computer-vision-based human and object tracking, which provides a more convenient means for measurement and computation. All the proposed methods are tested in the laboratory environment utilizing a grandstand simulator and in the field on a pedestrian bridge and on a football stadium. Findings and interpretations from the experimental results are presented. The dissertation is concluded by highlighting the critical findings and the possible future work that may be conducted.
- Date Issued
- 2017
- Identifier
- CFE0006863, ucf:51752
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006863
- Title
- Understanding images and videos using context.
- Creator
-
Vaca Castano, Gonzalo, Da Vitoria Lobo, Niels, Shah, Mubarak, Mikhael, Wasfy, Jones, W Linwood, Wiegand, Rudolf, University of Central Florida
- Abstract / Description
-
In computer vision, context refers to any information that may influence how visual media are understood. Traditionally, researchers have studied the influence of several sources of context in relation to the object detection problem in images. In this dissertation, we present a multifaceted review of the problem of context. Context is analyzed as a source of improvement in the object detection problem, not only in images but also in videos. In the case of images, we also investigate the influence of the semantic context, determined by objects, relationships, locations, and global composition, to achieve a general understanding of the image content as a whole. In our research, we also attempt to solve the related problem of finding the context associated with visual media. Given a set of visual elements (images), we want to extract the context that can be commonly associated with these images in order to remove ambiguity. The first part of this dissertation concentrates on achieving image understanding using semantic context. In spite of the recent success in tasks such as image classification, object detection, image segmentation, and the progress on scene understanding, researchers still lack clarity about computer comprehension of the content of the image as a whole. Hence, we propose a Top-Down Visual Tree (TDVT) image representation that allows the encoding of the content of the image as a hierarchy of objects capturing their importance, co-occurrences, and type of relations. A novel Top-Down Tree LSTM network is presented to learn about the image composition from the training images and their TDVT representations.
Given a test image, our algorithm detects objects and determines the hierarchical structure that they form, encoded as a TDVT representation of the image. A single image could have multiple interpretations that may lead to ambiguity about the intentionality of an image. What if, instead of having only a single image to be interpreted, we have multiple images that represent the same topic? The second part of this dissertation covers how to extract the context information shared by multiple images. We present a method to determine the topic that these images represent. We accomplish this task by transferring tags from an image retrieval database, and by performing operations in the textual space of these tags. As an application, we also present a new image retrieval method that uses multiple images as input. Unlike earlier works that focus either on using just a single query image or using multiple query images with views of the same instance, the new image search paradigm retrieves images based on the underlying concepts that the input images represent. Finally, in the third part of this dissertation, we analyze the influence of context in videos. In this case, the temporal context is utilized to improve scene identification and object detection. We focus on egocentric videos, where agents require some time to change from one location to another. Therefore, we propose a Conditional Random Field (CRF) formulation, which penalizes short-term changes of the scene identity to improve the scene identity accuracy. We also show how to improve the object detection outcome by re-scoring the results based on the scene identity of the tested frame. We present a Support Vector Regression (SVR) formulation in the case that explicit knowledge of the scene identity is available during training time.
In the case that explicit scene labeling is not available, we propose an LSTM formulation that considers the general appearance of the frame to re-score the object detectors.
- Date Issued
- 2017
- Identifier
- CFE0006922, ucf:51703
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006922
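The abstract of CFE0006922 above mentions a CRF formulation that penalizes short-term changes of scene identity in egocentric video. One simple way to picture this, offered only as a hedged sketch, is a chain model with a per-frame score for each scene and a fixed penalty for switching scenes between consecutive frames, decoded with the Viterbi algorithm. The function name, the score matrix, and the constant `switch_penalty` are hypothetical; the dissertation's actual potentials and inference are not given in the abstract.

```python
import numpy as np


def smooth_scene_labels(frame_scores, switch_penalty=1.0):
    """Pick one scene label per frame, discouraging rapid switches.

    frame_scores[t][k] is a (hypothetical) classifier score for scene k at
    frame t. A constant penalty is subtracted whenever consecutive frames
    take different labels; the best label sequence is found with Viterbi.
    """
    scores = np.asarray(frame_scores, dtype=float)
    n_frames, n_scenes = scores.shape
    best = scores[0].copy()                        # best path score ending in each scene
    back = np.zeros((n_frames, n_scenes), dtype=int)
    for t in range(1, n_frames):
        # trans[i, j]: best score ending in scene i at t-1, then moving to scene j
        trans = best[:, None] - switch_penalty * (1 - np.eye(n_scenes))
        back[t] = trans.argmax(axis=0)
        best = trans.max(axis=0) + scores[t]
    labels = [int(best.argmax())]
    for t in range(n_frames - 1, 0, -1):           # follow back-pointers
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]


# A single noisy frame (index 2) weakly prefers scene 1.
scores = [[2, 0], [2, 0], [0, 1], [2, 0], [2, 0]]
print(smooth_scene_labels(scores, switch_penalty=2.0))
print(smooth_scene_labels(scores, switch_penalty=0.0))
```

With a penalty of zero the result reduces to a per-frame argmax, so the penalty is the single knob that trades responsiveness for temporal consistency in this sketch.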
- Title
- PATTERNS OF MOTION: DISCOVERY AND GENERALIZED REPRESENTATION.
- Creator
-
Saleemi, Imran, Shah, Mubarak, University of Central Florida
- Abstract / Description
-
In this dissertation, we address the problem of discovery and representation of motion patterns in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise a generic representation that captures any kind of object motion observable in video sequences. Such motion is a significant source of information typically employed for diverse applications such as tracking, anomaly detection, and action and event recognition. We present statistical frameworks for representation of motion characteristics of objects, learned from tracks or optical flow, for static as well as moving cameras, and propose algorithms for their application to a variety of problems. The proposed motion pattern models and learning methods are general enough to be employed in a variety of problems, as we demonstrate experimentally. We first propose a novel method to model and learn the scene activity observed by a static camera. The motion patterns of objects in the scene are modeled in the form of a multivariate non-parametric probability density function of spatiotemporal variables (object locations and transition times between them). Kernel Density Estimation (KDE) is used to learn this model in a completely unsupervised fashion. Learning is accomplished by observing the trajectories of objects by a static camera over extended periods of time. The model encodes the probabilistic nature of the behavior of moving objects in the scene and is useful for activity analysis applications, such as persistent tracking and anomalous motion detection. In addition, the model also captures salient scene features, such as the areas of occlusion and most likely paths.
Once the model is learned, we use a unified Markov Chain Monte Carlo (MCMC) based framework for generating the most likely paths in the scene, improving foreground detection, persistent labelling of objects during tracking, and deciding whether a given trajectory represents an anomaly to the observed motion patterns. Experiments with real world videos are reported which validate the proposed approach. The representation and estimation framework proposed above, however, has a few limitations. This algorithm proposes to use a single global statistical distribution to represent all kinds of motion observed in a particular scene. It therefore does not find a separation between multiple semantically distinct motion patterns in the scene. Instead, the learned model is a joint distribution over all possible patterns followed by objects. To overcome this limitation, we then propose a superior method for the discovery and statistical representation of motion patterns in a scene. The advantages of this approach over the first one are two-fold: first, this model is applicable to scenes of dense crowded motion where tracking may not be feasible, and second, it distinguishes between motion patterns that are distinct at a semantic level of abstraction. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered and instances of motion patterns are computed using a simple motion model, by linking components across space and time. Motion patterns are then initialized and membership of instances in different motion patterns is established by using KL divergence between mixture distributions of pattern instances.
Finally, a pixel level representation of motion patterns is proposed by deriving conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic. The proposed method exploits optical flow as the low level feature and performs a hierarchical clustering to obtain motion patterns; we observe that the use of optical flow is also an integral part of a variety of other vision applications, for example, as a feature-based representation of human actions. We, therefore, propose a new representation for articulated human actions using the motion patterns. The representation is based on hierarchical clustering of observed optical flow in a four-dimensional space of spatial location and motion flow. The automatically discovered motion patterns are the primitive actions, representative of flow at salient regions on the human body, much like trajectories of body joints, which are notoriously difficult to obtain automatically. The proposed method works in a completely unsupervised fashion and, in sharp contrast to state-of-the-art representations like bag of video words, provides a truly semantically meaningful representation. Each primitive action depicts the most atomic sub-action, like left arm moving upwards, or right leg moving downward and leftward, and is represented by a mixture of four-dimensional Gaussian distributions. A sequence of primitive actions is discovered in the test video and labelled by computing the KL divergence between mixtures. The entire video sequence containing the human action is thus reduced to a simple string, which is matched against similar strings of training videos to classify the action. The string matching is performed by global alignment, using the well-known Needleman-Wunsch algorithm.
Experiments reported on multiple human actions data sets confirm the validity, simplicity, and semantically meaningful nature of the proposed representation. Results obtained are encouraging and comparable to the state of the art.
- Date Issued
- 2011
- Identifier
- CFE0003646, ucf:48836
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003646
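The abstract of CFE0003646 above reduces each action video to a string of primitive-action labels and matches strings by Needleman-Wunsch global alignment. A minimal Python sketch of that matching step follows. The scoring values (match +1, mismatch -1, gap -1), the single-character label alphabet, and the nearest-string decision rule in `classify_action` are assumptions for illustration; the abstract does not state the scores or the decision rule actually used.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Standard Needleman-Wunsch dynamic programme, keeping only two rows.
    The scoring parameters are illustrative defaults.
    """
    prev = [j * gap for j in range(len(b) + 1)]    # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                           # aligning a[:i] against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def classify_action(test_string, training_strings):
    """Return the label whose training string aligns best with test_string.

    training_strings maps an action label to one primitive-action string;
    both the mapping and the one-string-per-label setup are hypothetical.
    """
    return max(training_strings,
               key=lambda label: needleman_wunsch_score(test_string,
                                                        training_strings[label]))


# Hypothetical primitive-action alphabet: U(p), D(own), L(eft), R(ight).
training = {"wave": "UUDD", "kick": "LRLR"}
print(classify_action("UUDL", training))
```

Global alignment is a natural fit here because it scores the whole string end to end, so an action that only partly resembles a training example is penalized for the unmatched remainder.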


