Current Search: Computer Vision
- Title
- Relating First-person and Third-person Vision.
- Creator
- Ardeshir Behrostaghi, Shervin, Borji, Ali, Shah, Mubarak, Hu, Haiyan, Atia, George, University of Central Florida
- Abstract / Description
- Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smart phones and glasses, we have access to a plethora of videos captured from the first-person (egocentric) perspective. Capturing the world from the perspective of one's self, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In many computer vision tasks (e.g. identification, action recognition, face recognition, pose estimation), the human actors are the main focus. Hence, detecting, localizing, and recognizing the human actor is often incorporated as a vital component. In an egocentric video, however, the person behind the camera is often the person of interest. This changes the nature of the task at hand, given that the camera holder is usually not visible in the content of his/her egocentric video. In other words, our knowledge about the visual appearance, pose, etc. of the egocentric camera holder is very limited, suggesting reliance on other cues in first-person videos. First-person and third-person videos have been studied separately in the past in the computer vision community. However, the relationship between first-person and third-person vision has yet to be fully explored. Relating these two views systematically could potentially benefit many computer vision tasks and applications. This thesis studies this relationship in several aspects. We explore supervised and unsupervised approaches for relating these two views, seeking different objectives such as identification, temporal alignment, and action classification. We believe that this exploration could lead to a better understanding of the relationship between these two drastically different sources of information.
- Date Issued
- 2018
- Identifier
- CFE0007151, ucf:52322
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007151
- Title
- OBJECT ASSOCIATION ACROSS MULTIPLE MOVING CAMERAS IN PLANAR SCENES.
- Creator
- Sheikh, Yaser, Shah, Mubarak, University of Central Florida
- Abstract / Description
- In this dissertation, we address the problem of object detection and object association across multiple cameras over large areas that are well modeled by planes. We present a unifying probabilistic framework that captures the underlying geometry of planar scenes, and present algorithms to estimate geometric relationships between different cameras, which are subsequently used for cooperative association of objects. We first present a local object detection scheme that has three fundamental innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic scene behavior, nominal misalignments and motion due to parallax. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, complex dependencies between the domain (location) and range (color) are directly modeled, and the background is represented as a single probability density. Second, temporal persistence is introduced as a detection criterion. Unlike previous approaches to object detection that detect objects by building adaptive models of the background, the foreground is modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition for detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the method is performed and presented on a diverse set of data.
We then address the problem of associating objects across multiple cameras in planar scenes. Since cameras may be moving, there is a possibility of both spatial and temporal non-overlap in the fields of view of the cameras. We first address the case where spatial and temporal overlap can be assumed. Since the cameras are moving and often widely separated, direct appearance-based or proximity-based constraints cannot be used. Instead, we exploit geometric constraints on the relationship between the motion of each object across cameras to test multiple correspondence hypotheses, without assuming any prior calibration information. Here, there are three contributions. First, we present a statistically and geometrically meaningful means of evaluating a hypothesized correspondence between multiple objects in multiple cameras. Second, since multiple cameras exist, ensuring coherency in association, i.e. that transitive closure is maintained between more than two cameras, is an essential requirement. To ensure such coherency we pose the problem of associating objects across cameras as a k-dimensional matching and use an approximation to find the association. We show that, under appropriate conditions, re-entering objects can also be re-associated with their original labels. Third, we show that as a result of associating objects across the cameras, a concurrent visualization of multiple aerial video streams is possible. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models.
Finally, we present a unifying framework for object association across multiple cameras and for estimating inter-camera homographies between (spatially and temporally) overlapping and non-overlapping cameras, whether they are moving or non-moving. By making use of explicit polynomial models for the kinematics of objects, we present algorithms to estimate inter-frame homographies. Under an appropriate measurement noise model, an EM algorithm is applied for the maximum likelihood estimation of the inter-camera homographies and kinematic parameters. Rather than fitting curves locally (in each camera) and matching them across views, we present an approach that simultaneously refines the estimates of inter-camera homographies and curve coefficients globally. We demonstrate the efficacy of the approach on a number of real sequences taken from aerial cameras, and report quantitative performance during simulations.
- Date Issued
- 2006
- Identifier
- CFE0001045, ucf:46797
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001045
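The record above models the background with a single non-parametric density over a joint domain-range (location plus color) representation of pixels. Below is a minimal sketch of that idea under simplifying assumptions: a Gaussian product kernel, a small background sample set, and hypothetical bandwidths. It illustrates kernel-density scoring of one pixel only, not the full MAP-MRF detector or the temporal-persistence criterion described in the dissertation.

```python
import numpy as np

def background_likelihood(pixel_xy, pixel_rgb, samples, h_space=4.0, h_color=12.0):
    """Score a pixel against a non-parametric background model.

    samples: (N, 5) array of background observations, each row (x, y, r, g, b),
    collected from earlier frames. A Gaussian product kernel over the joint
    domain (location) and range (color) is assumed -- a simplification of the
    joint domain-range model described in the record above.
    """
    d_xy = (samples[:, :2] - pixel_xy) / h_space      # spatial (domain) distances
    d_rgb = (samples[:, 2:] - pixel_rgb) / h_color    # color (range) distances
    # Product of Gaussian kernels over all five dimensions, averaged over samples.
    k = np.exp(-0.5 * (np.sum(d_xy ** 2, axis=1) + np.sum(d_rgb ** 2, axis=1)))
    return k.mean()

# Toy usage: a pixel whose color matches nearby background samples scores high,
# while a foreground-colored pixel at the same location scores low.
rng = np.random.default_rng(0)
bg = np.hstack([rng.uniform(0, 10, (200, 2)), 100 + rng.normal(0, 5, (200, 3))])
print(background_likelihood(np.array([5.0, 5.0]), np.array([102.0, 99.0, 101.0]), bg))
print(background_likelihood(np.array([5.0, 5.0]), np.array([10.0, 200.0, 30.0]), bg))
```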
- Title
- DETECTING CURVED OBJECTS AGAINST CLUTTERED BACKGROUNDS.
- Creator
- Prokaj, Jan, Lobo, Niels, University of Central Florida
- Abstract / Description
- Detecting curved objects against cluttered backgrounds is a hard problem in computer vision. We present new low-level and mid-level features designed to function in these environments. The low-level features are fast to compute because they employ an integral image approach, which makes them especially useful in real-time applications. The mid-level features are built from low-level features and are optimized for curved object detection. The usefulness of these features is tested by designing an object detection algorithm around them. Object detection is accomplished by transforming the mid-level features into weak classifiers, which are then combined into a strong classifier using AdaBoost. The resulting strong classifier is tested on the problem of detecting heads with shoulders. On a database of over 500 images of people, cropped to contain head and shoulders and with a diverse set of backgrounds, the detection rate is 90%, while the false positive rate on a database of 500 negative images is less than 2%.
- Date Issued
- 2008
- Identifier
- CFE0002102, ucf:47535
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002102
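The detector above computes features from an integral image and boosts weak classifiers into a strong classifier with AdaBoost. The sketch below shows that general pipeline on synthetic patches: a summed-area table, a few made-up box-difference features, and scikit-learn's AdaBoostClassifier (whose default weak learner is a depth-1 decision stump). The feature layout and data are illustrative, not the thesis' low-level and mid-level features.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) using the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def features(img):
    ii = integral_image(img)
    h, w = img.shape
    # A handful of illustrative box-difference features (hypothetical layout).
    return [
        box_sum(ii, 0, 0, h // 2, w) - box_sum(ii, h // 2, 0, h, w),   # top vs bottom
        box_sum(ii, 0, 0, h, w // 2) - box_sum(ii, 0, w // 2, h, w),   # left vs right
        box_sum(ii, h // 4, w // 4, 3 * h // 4, 3 * w // 4),           # center mass
    ]

# Synthetic positive patches (bright central blob) vs. uniform clutter negatives.
rng = np.random.default_rng(1)
X, y = [], []
for _ in range(200):
    pos = rng.normal(0, 1, (16, 16)); pos[4:12, 4:12] += 3.0
    neg = rng.normal(0, 1, (16, 16))
    X += [features(pos), features(neg)]; y += [1, 0]

# Decision stumps as weak classifiers, boosted into a strong classifier.
clf = AdaBoostClassifier(n_estimators=25)
clf.fit(np.array(X), np.array(y))
print("training accuracy:", clf.score(np.array(X), np.array(y)))
```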
- Title
- Gradient based MRF learning for image restoration and segmentation.
- Creator
- Samuel, Kegan, Tappen, Marshall, Da Vitoria Lobo, Niels, Foroosh, Hassan, Li, Xin, University of Central Florida
- Abstract / Description
- The undirected graphical model, or Markov Random Field (MRF), is one of the more popular models used in computer vision and is the type of model with which this work is concerned. Models based on these methods have proven to be particularly useful in low-level vision systems and have led to state-of-the-art results for MRF-based systems. The research presented describes a new discriminative training algorithm and its implementation.
The MRF model is trained by optimizing its parameters so that the minimum energy solution of the model is as similar as possible to the ground truth. While previous work has relied on time-consuming iterative approximations or stochastic approximations, this work demonstrates how implicit differentiation can be used to analytically differentiate the overall training loss with respect to the MRF parameters. This framework leads to an efficient, flexible learning algorithm that can be applied to a number of different models.
The effectiveness of the proposed learning method is demonstrated by learning the parameters of two related models applied to the task of denoising images. The experimental results demonstrate that the proposed learning algorithm is comparable to and, at times, better than previous training methods applied to the same tasks.
A new segmentation model is also introduced and trained using the proposed learning method. The proposed segmentation model is based on an energy minimization framework that is novel in how it incorporates priors on the size of the segments in a way that is straightforward to implement. While other methods, such as normalized cuts, tend to produce segmentations of similar sizes, this method is able to overcome that problem and produce more realistic segmentations.
- Date Issued
- 2012
- Identifier
- CFE0004595, ucf:49207
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004595
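The training algorithm above differentiates the loss through the minimum-energy solution using implicit differentiation. The sketch below illustrates the mechanism on a deliberately simple 1-D quadratic denoising model with a single smoothness weight (not the models used in the dissertation): the energy minimizer is a linear solve, and differentiating the optimality condition gives the gradient in closed form, so no iterative or stochastic approximation is needed.

```python
import numpy as np

def first_difference(n):
    """Finite-difference operator D so that (D x)[i] = x[i+1] - x[i]."""
    D = np.zeros((n - 1, n))
    D[np.arange(n - 1), np.arange(n - 1)] = -1.0
    D[np.arange(n - 1), np.arange(1, n)] = 1.0
    return D

def solve_and_gradient(w, y, t, D):
    """Minimize E(x; w) = ||x - y||^2 + w ||D x||^2 and return dLoss/dw.

    The minimizer satisfies (I + w D^T D) x* = y.  Implicit differentiation of
    this optimality condition gives dx*/dw = -(I + w D^T D)^{-1} D^T D x*, so
    the training loss ||x* - t||^2 can be differentiated analytically without
    unrolling an iterative solver.
    """
    A = np.eye(len(y)) + w * (D.T @ D)
    x_star = np.linalg.solve(A, y)
    dx_dw = -np.linalg.solve(A, D.T @ (D @ x_star))
    loss = np.sum((x_star - t) ** 2)
    dloss_dw = 2.0 * (x_star - t) @ dx_dw
    return loss, dloss_dw

# Toy learning loop: fit the smoothness weight that best denoises a step signal.
rng = np.random.default_rng(0)
t = np.concatenate([np.zeros(20), np.ones(20)])     # ground-truth signal
y = t + rng.normal(0, 0.2, t.shape)                 # noisy observation
D, w = first_difference(len(t)), 0.1
for _ in range(100):
    loss, g = solve_and_gradient(w, y, t, D)
    w = max(w - 0.02 * g, 1e-4)                     # simple projected gradient step
print("learned smoothness weight:", round(w, 3), "final loss:", round(loss, 4))
```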
- Title
- SPATIO-TEMPORAL MAXIMUM AVERAGE CORRELATION HEIGHT TEMPLATES IN ACTION RECOGNITION AND VIDEO SUMMARIZATION.
- Creator
- Rodriguez, Mikel, Shah, Mubarak, University of Central Florida
- Abstract / Description
- Action recognition represents one of the most difficult problems in computer vision, given that it embodies the combination of several uncertain attributes, such as the subtle variability associated with individual human behavior and the challenges that come with viewpoint variations, scale changes and different temporal extents. Nevertheless, action recognition solutions are critical in a great number of domains, such as video surveillance, assisted living environments, video search, interfaces, and virtual reality. In this dissertation, we investigate template-based action recognition algorithms that can incorporate the information contained in a set of training examples, and we explore how these algorithms perform in action recognition and video summarization.
First, we introduce a template-based method for recognizing human actions called Action MACH. Our approach is based on a Maximum Average Correlation Height (MACH) filter. MACH is capable of capturing intra-class variability by synthesizing a single Action MACH filter for a given action class. We generalize the traditional MACH filter to video (3D spatiotemporal volume) and to vector-valued data. By analyzing the response of the filter in the frequency domain, we avoid the high computational cost commonly incurred in template-based approaches. Vector-valued data is analyzed using the Clifford Fourier transform, a generalization of the Fourier transform intended for both scalar and vector-valued data.
Next, we address three seldom explored challenges in template-based action recognition. The first is the recognition and localization of human actions in aerial videos obtained from unmanned aerial vehicles (UAVs), a new medium which presents unique challenges due to the small number of pixels per human, pose variation, and the moving camera. The second issue we address is the incorporation of multiple positive and negative examples of a target action class when generating an action template. We address this issue by employing the Fukunaga-Koontz Transform as a means of generating a single quadratic template which, unlike traditional temporal templates (which rely on positive examples alone), effectively captures the variability associated with an action class by including both positive and negative examples in the template training process. Third, we explore the problem of generating video summaries that include specific actions of interest as opposed to all moving objects. In doing so, we explore the role of action templates in video summarization in an effort to provide a means of generating a compact video representation based on a set of activities of interest. We introduce an approach in which a user specifies the activities of interest and the video is automatically condensed to a short clip which captures the most relevant events based on the user's preference. We follow the output summary video format of non-chronological video synopsis approaches, in which different events which occur at different times may be displayed concurrently, even though they never occur simultaneously in the original video. However, instead of assuming that all moving objects are interesting, priority is given to specific activities of interest which pertain to a user's query. This provides an efficient means of browsing through large collections of video for events of interest.
- Date Issued
- 2010
- Identifier
- CFE0003313, ucf:48507
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003313
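Action MACH synthesizes a single correlation filter per action class and evaluates it in the frequency domain. The 2-D sketch below shows a simplified MACH-style filter (mean spectrum divided by the average power spectrum, a common simplification that drops the noise and similarity terms of the full MACH formulation) applied by FFT correlation on toy images. The 3-D spatiotemporal generalization and the Clifford Fourier transform used in the dissertation are not reproduced here.

```python
import numpy as np

def mach_filter(examples, eps=1e-3):
    """Simplified MACH-style filter from positive training examples.

    H = mean(X_i) / (average power spectrum + eps), computed element-wise in
    the Fourier domain.  This keeps the average-correlation-height numerator
    but collapses the full MACH denominator into the average spectral
    density -- an illustrative simplification, not the exact formulation.
    """
    X = np.array([np.fft.fft2(e) for e in examples])
    mean_spec = X.mean(axis=0)
    avg_power = (np.abs(X) ** 2).mean(axis=0)
    return mean_spec / (avg_power + eps)

def correlate(H, frame):
    """Correlation response of the filter over a frame, computed via the FFT."""
    F = np.fft.fft2(frame)
    return np.real(np.fft.ifft2(np.conj(H) * F))

# Toy usage: train on noisy copies of a bright bar, then locate a shifted copy.
rng = np.random.default_rng(2)
template = np.zeros((32, 32)); template[12:20, 8:24] = 1.0
examples = [template + rng.normal(0, 0.1, template.shape) for _ in range(10)]
H = mach_filter(examples)
test = rng.normal(0, 0.1, (32, 32)); test[17:25, 11:27] += 1.0   # shifted by (5, 3)
response = correlate(H, test)
print("correlation peak (should equal the shift):",
      np.unravel_index(np.argmax(response), response.shape))
```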
- Title
- Adversarial Attacks On Vision Algorithms Using Deep Learning Features.
- Creator
- Michel, Andy, Jha, Sumit Kumar, Leavens, Gary, Valliyil Thankachan, Sharma, University of Central Florida
- Abstract / Description
- Computer vision algorithms, such as those implementing object detection, are known to be susceptible to adversarial attacks. Small, barely perceptible perturbations to the input can cause vision algorithms to incorrectly classify inputs that they would have otherwise classified correctly. A number of approaches have been recently investigated to generate such adversarial examples for deep neural networks. Many of these approaches either require grey-box access to the deep neural net being attacked or rely on adversarial transfer and grey-box access to a surrogate neural network. In this thesis, we present an approach to the synthesis of adversarial examples for computer vision algorithms that only requires black-box access to the algorithm being attacked. Our attack approach employs fuzzing with features derived from the layers of a convolutional neural network trained on adversarial examples from an unrelated dataset. Based on our experimental results, we believe that our validation approach will enable designers of cyber-physical systems and other high-assurance use-cases of vision algorithms to stress test their implementations.
- Date Issued
- 2017
- Identifier
- CFE0006898, ucf:51714
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006898
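The attack above needs only black-box access to the classifier under test. Below is a minimal, generic sketch of such a query-only fuzzing loop: random perturbations are kept only when they lower the oracle's confidence in the true label. The `predict_proba` callable, the toy oracle, and the perturbation schedule are placeholders; the feature-guided fuzzing of the thesis (features from a CNN trained on unrelated adversarial examples) is not reproduced.

```python
import numpy as np

def blackbox_fuzz(image, true_label, predict_proba, steps=200, eps=2.0, seed=0):
    """Query-only adversarial search by random fuzzing.

    predict_proba(image) -> vector of class probabilities (black-box oracle).
    At each step a small random perturbation is proposed and kept only if it
    reduces the confidence assigned to the true label, while the total change
    stays within an L-infinity budget of eps (pixel values in [0, 255]).
    """
    rng = np.random.default_rng(seed)
    adv = image.astype(np.float64).copy()
    best = predict_proba(adv)[true_label]
    for _ in range(steps):
        candidate = adv + rng.uniform(-1.0, 1.0, image.shape)
        candidate = np.clip(candidate, image - eps, image + eps)   # stay in budget
        candidate = np.clip(candidate, 0, 255)
        p = predict_proba(candidate)[true_label]
        if p < best:                       # keep perturbations that hurt confidence
            adv, best = candidate, p
        if np.argmax(predict_proba(adv)) != true_label:
            break                          # misclassification achieved
    return adv, best

# Toy oracle standing in for a real vision model (hypothetical, for illustration).
def toy_oracle(img):
    score = img.mean() / 255.0             # "class 1" prefers bright images
    return np.array([1.0 - score, score])

x = np.full((8, 8), 200.0)                 # a bright image, labelled class 1
adv, conf = blackbox_fuzz(x, true_label=1, predict_proba=toy_oracle)
print("confidence in true label before/after:",
      round(toy_oracle(x)[1], 3), round(conf, 3))
```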
- Title
- Human Detection, Tracking and Segmentation in Surveillance Video.
- Creator
- Shu, Guang, Shah, Mubarak, Boloni, Ladislau, Wang, Jun, Lin, Mingjie, Sugaya, Kiminobu, University of Central Florida
- Abstract / Description
- This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes and occlusion. In this dissertation, several novel methods for improving on the current state of human detection and tracking, based on learning scene-specific information in video feeds, are proposed.
Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using a Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally, a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features, hence it can avoid many false detections typically obtained by a generic detector. We are able to demonstrate a significant improvement in the performance of the state-of-the-art detector.
Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based, person-specific Support Vector Machine (SVM) classifiers which capture articulations of moving human bodies with dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, which leads to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking.
Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple human segmentation in videos by incorporating human detection and part-based detection potentials into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate superpixels corresponding to a human and background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. Compared to previous work, our method can automatically segment multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation performance than previous methods in terms of segmentation accuracy on several challenging video sequences.
Most of the work in computer vision deals with point solutions: a specific algorithm for a specific problem. However, putting different algorithms into one real world integrated system is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques. Adaptive Template Scaling is used to handle the scale change due to objects moving towards a camera. Incremental Searching and Local Frame Differencing are used to resolve challenging issues such as scale change, occlusion and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance.
- Date Issued
- 2014
- Identifier
- CFE0005551, ucf:50278
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005551
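The scene-specific detector above represents detection windows with a superpixel-based Bag-of-Words over color. A minimal sketch of that representation is given below, assuming scikit-image's SLIC for superpixels and k-means for the visual-word codebook; the DPM detector, the unsupervised retraining loop, and the tracking stages are outside its scope, and the parameters are illustrative.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def superpixel_bow(patch, codebook, n_segments=50):
    """Bag-of-Words histogram over superpixel mean colors for one window.

    patch: H x W x 3 float image in [0, 1] (e.g., a detection window).
    codebook: fitted KMeans whose cluster centers act as the visual words.
    """
    labels = slic(patch, n_segments=n_segments, compactness=10, start_label=0)
    feats = np.array([patch[labels == s].mean(axis=0) for s in np.unique(labels)])
    words = codebook.predict(feats)                       # assign superpixels to words
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)                    # normalized BoW descriptor

# Toy usage: build a color codebook from stand-in superpixel colors,
# then describe a random window.
rng = np.random.default_rng(3)
train_colors = rng.uniform(0, 1, (500, 3))
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(train_colors)
window = rng.uniform(0, 1, (64, 64, 3))
print("BoW descriptor length:", superpixel_bow(window, codebook).shape[0])
```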
- Title
- MULTI-VIEW APPROACHES TO TRACKING, 3D RECONSTRUCTION AND OBJECT CLASS DETECTION.
- Creator
- Khan, Saad, Shah, Mubarak, University of Central Florida
- Abstract / Description
- Multi-camera systems are becoming ubiquitous and have found application in a variety of domains including surveillance, immersive visualization, sports entertainment and movie special effects, amongst others. From a computer vision perspective, the challenging task is how to most efficiently fuse information from multiple views in the absence of detailed calibration information and with a minimum of human intervention. This thesis presents a new approach to fuse foreground likelihood information from multiple views onto a reference view without explicit processing in 3D space, thereby circumventing the need for complete calibration. Our approach uses a homographic occupancy constraint (HOC), which states that if a foreground pixel has a piercing point that is occupied by a foreground object, then the pixel warps to foreground regions in every view under homographies induced by the reference plane, in effect using cameras as occupancy detectors. Using the HOC we are able to resolve occlusions and robustly determine ground plane localizations of the people in the scene. To find tracks we obtain ground localizations over a window of frames and stack them, creating a space-time volume. Regions belonging to the same person form contiguous spatio-temporal tracks that are clustered using a graph cuts segmentation approach.
Second, we demonstrate that the HOC is equivalent to performing visual hull intersection in the image plane, resulting in a cross-sectional slice of the object. The process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Slices from multiple planes are accumulated and the 3D structure of the object is segmented out. Unlike other visual hull based approaches that use 3D constructs like visual cones, voxels or polygonal meshes requiring calibrated views, ours is purely image-based and uses only 2D constructs, i.e. planar homographies between views. This feature also renders it conducive to graphics hardware acceleration. The current GPU implementation of our approach is capable of fusing 60 views (480x720 pixels) at the rate of 50 slices/second. We then present an extension of this approach to reconstructing non-rigid articulated objects from monocular video sequences. The basic premise is that due to motion of the object, scene occupancies are blurred out with non-occupancies in a manner analogous to motion blurred imagery. Using our HOC and a novel construct, the temporal occupancy point (TOP), we are able to fuse multiple views of non-rigid objects obtained from a monocular video sequence. The result is a set of blurred scene occupancy images in the corresponding views, where the values at each pixel correspond to the fraction of the total time duration that the pixel observed an occupied scene location. We then use a motion de-blurring approach to de-blur the occupancy images and obtain the 3D structure of the non-rigid object.
In the final part of this thesis, we present an object class detection method employing 3D models of rigid objects constructed using the above 3D reconstruction approach. Instead of using a complicated mechanism for relating multiple 2D training views, our approach establishes spatial connections between these views by mapping them directly to the surface of a 3D model. To generalize the model for object class detection, features from supplemental views (obtained from Google Image search) are also considered. Given a 2D test image, correspondences between the 3D feature model and the testing view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made. The one with the highest confidence is then used to detect the object using feature location matching. Performance of the proposed method has been evaluated using the PASCAL VOC challenge dataset, and promising results are demonstrated.
- Date Issued
- 2008
- Identifier
- CFE0002073, ucf:47593
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002073
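The homographic occupancy constraint warps foreground-likelihood maps from every view onto a reference view through plane-induced homographies and keeps only the regions that remain foreground in all of them. The sketch below shows that fusion step with OpenCV homography warps on synthetic masks; the homographies here are made up for illustration, and the later slicing, 3-D segmentation, and graph-cut clustering steps are not included.

```python
import numpy as np
import cv2

def fuse_foreground(ref_shape, masks, homographies):
    """Fuse foreground-likelihood maps onto the reference view.

    masks: list of float32 foreground maps (one per camera, values in [0, 1]).
    homographies: list of 3x3 arrays mapping each camera's image plane to the
    reference view via the scene plane.  The product over views approximates
    the homographic occupancy constraint: a reference pixel keeps a high score
    only if it warps to foreground in every view.
    """
    fused = np.ones(ref_shape, dtype=np.float32)
    for mask, H in zip(masks, homographies):
        warped = cv2.warpPerspective(mask, H, (ref_shape[1], ref_shape[0]))
        fused *= warped
    return fused

# Toy setup: two views of the same blob, the second view shifted by a
# translation homography (a stand-in for a real plane-induced homography).
m1 = np.zeros((100, 100), np.float32); m1[40:60, 40:60] = 1.0
m2 = np.zeros((100, 100), np.float32); m2[40:60, 50:70] = 1.0
H1 = np.eye(3)
H2 = np.array([[1, 0, -10], [0, 1, 0], [0, 0, 1]], dtype=np.float64)  # undo the shift
occupancy = fuse_foreground((100, 100), [m1, m2], [H1, H2])
print("fused foreground pixels:", int((occupancy > 0.5).sum()))
```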
- Title
- Action Recognition, Temporal Localization and Detection in Trimmed and Untrimmed Video.
- Creator
- Hou, Rui, Shah, Mubarak, Mahalanobis, Abhijit, Hua, Kien, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
- Automatic understanding of videos is one of the most active areas of computer vision research. It has applications in video surveillance, human computer interaction, video sports analysis, virtual and augmented reality, video retrieval, etc. In this dissertation, we address four important tasks in video understanding, namely action recognition, temporal action localization, spatio-temporal action detection and video object/action segmentation. This dissertation makes the following contributions to the above tasks. First, for video action recognition, we propose a category-level feature learning method. Our proposed method automatically identifies pairs of categories using a criterion of mutual pairwise proximity in the (kernelized) feature space, and a category-level similarity matrix where each entry corresponds to the one-vs-one SVM margin for pairs of categories. Second, for temporal action localization, we propose to exploit the temporal structure of actions by modeling an action as a sequence of sub-actions, and we present a computationally efficient approach. Third, we propose a 3D Tube Convolutional Neural Network (TCNN) based pipeline for action detection. The proposed architecture is a unified deep network that is able to recognize and localize actions based on 3D convolution features; it generalizes the popular faster R-CNN framework from images to videos. Last, an end-to-end encoder-decoder based 3D convolutional neural network pipeline is proposed, which is able to segment the foreground objects from the background. Moreover, the action label can be obtained as well by passing the foreground object into an action classifier. Extensive experiments on several video datasets demonstrate the superior performance of the proposed approach for video understanding compared to the state-of-the-art.
- Date Issued
- 2019
- Identifier
- CFE0007655, ucf:52502
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007655
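The detection pipeline above builds on 3-D convolution features over space and time. The PyTorch snippet below is only a small stand-in showing 3-D convolution over a clip tensor with a clip-level classification head; the actual TCNN architecture (tube proposals and faster R-CNN-style heads) is not reproduced, and the layer sizes are made up.

```python
import torch
import torch.nn as nn

class Tiny3DActionNet(nn.Module):
    """Minimal 3D-convolutional clip classifier (illustrative only)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),    # conv over (time, H, W)
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),           # pool space, keep time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                       # global spatio-temporal pool
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips):
        # clips: (batch, channels=3, frames, height, width)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

# Toy usage on a random 8-frame RGB clip.
model = Tiny3DActionNet(num_classes=5)
clip = torch.randn(2, 3, 8, 64, 64)
print("logits shape:", model(clip).shape)   # -> torch.Size([2, 5])
```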
- Title
- Development of 3D Vision Testbed for Shape Memory Polymer Structure Applications.
- Creator
- Thompson, Kenneth, Xu, Yunjun, Gou, Jihua, Lin, Kuo-Chi, University of Central Florida
- Abstract / Description
- As applications for shape memory polymers (SMPs) become more advanced, it is necessary to have the ability to monitor both the actuation and thermal properties of structures made of such materials. In this work, a method of using three stereo pairs of webcams and a single thermal camera is studied for the purposes of both tracking the three-dimensional motion of shape memory polymers and measuring the temperature of points of interest within the SMP structure. The method includes a stereo camera calibration with integrated local minimum tracking algorithms to locate points of interest on the material and measure their temperature through interpolation techniques. The importance of the proposed method is that it provides a means to cost-effectively monitor the surface temperature of a shape memory polymer structure without having to place intrusive sensors on the samples, which would limit the performance of the shape memory effect. The ability to monitor the surface temperatures of an SMP structure allows more complex configurations to be created while increasing the performance and durability of the material. Additionally, as compared to the previous version, both the functionality of the testbed and the user interface have been significantly improved.
- Date Issued
- 2015
- Identifier
- CFE0005893, ucf:50860
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005893
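The testbed above tracks points of interest with calibrated stereo pairs and reads their temperature from a thermal camera by interpolation. The sketch below shows the two primitive operations such a pipeline relies on: triangulating a tracked point from a stereo pair with OpenCV, and bilinearly interpolating a thermal image at a sub-pixel location. The projection matrices and thermal map here are synthetic placeholders, not the calibration of the actual testbed.

```python
import numpy as np
import cv2

def triangulate(P1, P2, pt1, pt2):
    """3D point from one stereo correspondence (pt1 in camera 1, pt2 in camera 2)."""
    X = cv2.triangulatePoints(P1, P2, np.array(pt1, np.float64).reshape(2, 1),
                              np.array(pt2, np.float64).reshape(2, 1))
    return (X[:3] / X[3]).ravel()            # homogeneous -> Euclidean

def bilinear_temp(thermal, u, v):
    """Bilinearly interpolate a thermal image at sub-pixel location (u, v)."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    t = thermal[v0:v0 + 2, u0:u0 + 2]
    return ((1 - du) * (1 - dv) * t[0, 0] + du * (1 - dv) * t[0, 1]
            + (1 - du) * dv * t[1, 0] + du * dv * t[1, 1])

# Synthetic stereo rig: identical intrinsics, second camera translated along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])
point = np.array([0.05, -0.02, 1.0])                       # metres, camera-1 frame
proj = lambda P, X: (P @ np.append(X, 1.0))[:2] / (P @ np.append(X, 1.0))[2]
X_hat = triangulate(P1, P2, proj(P1, point), proj(P2, point))
thermal = np.tile(np.linspace(20, 40, 640), (480, 1))      # fake temperature map (deg C)
print("recovered 3D point:", np.round(X_hat, 3))
print("interpolated temperature:", round(bilinear_temp(thermal, 321.4, 200.7), 2))
```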
- Title
- Computerized Evaluation of Microsurgery Skills Training.
- Creator
- Jotwani, Payal, Foroosh, Hassan, Hughes, Charles, Hua, Kien, University of Central Florida
- Abstract / Description
- The style of imparting medical training has evolved over the years. The traditional methods of teaching and practicing basic surgical skills under an apprenticeship model no longer occupy the first place in modern, technically demanding advanced surgical disciplines like neurosurgery. Furthermore, legal and ethical concerns for patient safety, as well as cost-effectiveness, have forced neurosurgeons to master the necessary microsurgical techniques to accomplish desired results. This has led to increased emphasis on assessment of the clinical and surgical techniques of neurosurgeons. However, the subjective assessment of microsurgical techniques like micro-suturing under the apprenticeship model cannot be completely unbiased. A few initiatives using computer-based techniques have been made to introduce objective evaluation of surgical skills.
This thesis presents a novel approach involving computerized evaluation of different components of micro-suturing techniques to eliminate the bias of subjective assessment. The work involved acquisition of cine clips of micro-suturing activity on synthetic material. Image processing and computer vision based techniques were then applied to these videos to assess different characteristics of micro-suturing, viz. speed, dexterity and effectualness. In parallel, subjective grading was done by a senior neurosurgeon. A correlation and comparative study of both assessments was then performed to analyze the efficacy of objective versus subjective evaluation.
- Date Issued
- 2015
- Identifier
- CFE0006221, ucf:51056
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006221
- Title
- Visual Geo-Localization and Location-Aware Image Understanding.
- Creator
- Roshan Zamir, Amir, Shah, Mubarak, Jha, Sumit, Sukthankar, Rahul, Lin, Mingjie, Fathpour, Sasan, University of Central Florida
- Abstract / Description
- Geo-localization is the problem of discovering the location where an image or video was captured. Recently, large scale geo-localization methods which are devised for ground-level imagery and employ techniques similar to image matching have attracted much interest. In these methods, given a reference dataset composed of geo-tagged images, the problem is to estimate the geo-location of a query by finding its matching reference images.
In this dissertation, we address three questions central to geo-spatial analysis of ground-level imagery: 1) How to geo-localize images and videos captured at unknown locations? 2) How to refine the geo-location of already geo-tagged data? 3) How to utilize the extracted geo-tags?
We present a new framework for geo-locating an image utilizing a novel multiple nearest neighbor feature matching method using Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference data set. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. Our evaluation using a new data set of 102k Street View images shows the proposed method outperforms the state-of-the-art by 10 percent.
Geo-localization of images can be extended to geo-localization of a video. We have developed a novel method for estimating the geo-spatial trajectory of a moving camera with unknown intrinsic parameters at city scale. The proposed method is based on a three-step process: 1) individual geo-localization of video frames using Street View images to obtain the likelihood of the location (latitude and longitude) given the current observation, 2) Bayesian tracking to estimate the frame location and the video's temporal evolution using previous state probabilities and the current likelihood, and 3) applying a novel Minimum Spanning Trees based trajectory reconstruction to eliminate trajectory loops or noisy estimations.
Thus far, we have assumed reliable geo-tags for reference imagery are available through crowdsourcing. However, crowdsourced images are well known to suffer from the acute shortcoming of having inaccurate geo-tags. We have developed the first method for refinement of GPS-tags, which automatically discovers the subset of corrupted geo-tags and refines them. We employ Random Walks to discover the uncontaminated subset of location estimations and robustify Random Walks with a novel adaptive damping factor that conforms to the level of noise in the input.
In location-aware image understanding, we are interested in improving image analysis by putting it in the right geo-spatial context. This approach is of particular importance as the majority of cameras and mobile devices are now being equipped with GPS chips. Therefore, developing techniques which can leverage the geo-tags of images for improving the performance of traditional computer vision tasks is of particular interest. We have developed a location-aware multimodal approach which incorporates business directories, textual information, and web images to identify businesses in a geo-tagged query image.
- Date Issued
- 2014
- Identifier
- CFE0005544, ucf:50282
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005544
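Step 2 of the video geo-localization pipeline above maintains a Bayesian belief over candidate frame locations, combining the previous state distribution with the current per-frame matching likelihood. A minimal grid-based sketch of that recursion follows; the Gaussian random-walk motion model, the grid, and the likelihood values are placeholders, and the GMCP matching and minimum-spanning-tree trajectory cleanup are not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def bayes_track(prior, likelihood, motion_sigma=1.0):
    """One Bayesian tracking update over a 2D grid of candidate locations.

    prior: belief over grid cells from the previous frame.
    likelihood: per-cell matching score for the current frame (e.g., from
    retrieval against geo-tagged reference images).
    A Gaussian random-walk motion model (hypothetical) diffuses the prior
    before it is multiplied by the likelihood and renormalized.
    """
    predicted = gaussian_filter(prior, motion_sigma)     # motion / prediction step
    posterior = predicted * likelihood                   # measurement update
    return posterior / posterior.sum()

# Toy usage: belief concentrated at one cell, noisy likelihood peaked one cell over.
rng = np.random.default_rng(4)
prior = np.zeros((20, 20)); prior[10, 10] = 1.0
likelihood = rng.uniform(0, 0.1, (20, 20)); likelihood[10, 11] = 1.0
belief = bayes_track(prior, likelihood)
print("most likely cell:", np.unravel_index(np.argmax(belief), belief.shape))
```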
- Title
- TOWARDS A SELF-CALIBRATING VIDEO CAMERA NETWORK FOR CONTENT ANALYSIS AND FORENSICS.
- Creator
- Junejo, Imran, Foroosh, Hassan, University of Central Florida
- Abstract / Description
- Due to growing security concerns, video surveillance and monitoring has received immense attention from both federal agencies and private firms. The main concern is that a single camera, even if allowed to rotate or translate, is not sufficient to cover a large area for video surveillance. A more general solution with a wide range of applications is to allow the deployed cameras to have a non-overlapping field of view (FoV) and, if possible, to allow these cameras to move freely in 3D space. This thesis addresses the issue of how cameras in such a network can be calibrated and how the network as a whole can be calibrated, such that each camera as a unit in the network is aware of its orientation with respect to all the other cameras in the network. Different types of cameras might be present in a multiple camera network, and novel techniques are presented for efficient calibration of these cameras. Specifically: (i) for a stationary camera, we derive new constraints on the Image of the Absolute Conic (IAC), and these new constraints are shown to be intrinsic to the IAC; (ii) for a scene where object shadows are cast on a ground plane, we track the shadows cast on the ground plane by at least two unknown stationary points, and utilize the tracked shadow positions to compute the horizon line and hence the camera intrinsic and extrinsic parameters; (iii) a novel solution is presented for a scenario where a camera is observing pedestrians, the uniqueness of the formulation lying in recognizing two harmonic homologies present in the geometry obtained by observing pedestrians; (iv) for a freely moving camera, a novel practical method is proposed for its self-calibration which even allows it to change its internal parameters by zooming; and (v) due to the increased application of pan-tilt-zoom (PTZ) cameras, a technique is presented that uses only two images to estimate five camera parameters.
For an automatically configurable multi-camera network, having non-overlapping fields of view and possibly containing moving cameras, a practical framework is proposed that determines the geometry of such a dynamic camera network. It is shown that only one automatically computed vanishing point and a line lying on any plane orthogonal to the vertical direction are sufficient to infer the geometry of a dynamic network. Our method generalizes previous work which considers restricted camera motions. Using minimal assumptions, we are able to successfully demonstrate promising results on synthetic as well as real data. Applications to path modeling, GPS coordinate estimation, and configuring mixed-reality environments are explored.
- Date Issued
- 2007
- Identifier
- CFE0001743, ucf:47296
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001743
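The self-calibration work above derives constraints on the image of the absolute conic from scene structure such as vanishing points. As a small related illustration, and not the thesis' own IAC constraint set, the sketch below uses the classical orthogonality constraint between two vanishing points to recover the focal length of a camera with square pixels and a known principal point.

```python
import numpy as np

def focal_from_vanishing_points(v1, v2, principal_point):
    """Focal length from two vanishing points of orthogonal scene directions.

    Assumes square pixels, zero skew, and a known principal point, in which
    case the absolute-conic constraint v1^T w v2 = 0 reduces to
        f^2 = -[(u1 - u0) * (u2 - u0) + (v1 - v0) * (v2 - v0)].
    This is a classical special case shown for illustration only.
    """
    u0, v0 = principal_point
    val = -((v1[0] - u0) * (v2[0] - u0) + (v1[1] - v0) * (v2[1] - v0))
    if val <= 0:
        raise ValueError("vanishing points inconsistent with the assumptions")
    return np.sqrt(val)

# Toy check: synthesize vanishing points of two orthogonal directions for a
# known camera and recover its focal length.
f_true, pp = 800.0, (320.0, 240.0)
K = np.array([[f_true, 0, pp[0]], [0, f_true, pp[1]], [0, 0, 1.0]])
d1 = np.array([1.0, 0.0, 1.0])              # two orthogonal scene directions
d2 = np.array([-1.0, 0.0, 1.0])
v1 = K @ d1; v1 = v1[:2] / v1[2]
v2 = K @ d2; v2 = v2[:2] / v2[2]
print("recovered focal length:", round(focal_from_vanishing_points(v1, v2, pp), 1))
```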
- Title
- DEPTH FROM DEFOCUSED MOTION.
- Creator
- Myles, Zarina, da Vitoria Lobo, Niels, University of Central Florida
- Abstract / Description
- Motion in depth and/or zooming causes defocus blur. This work presents a solution to the problem of using defocus blur and optical flow information to compute depth at points that defocus when they move.
We first formulate a novel algorithm which recovers defocus blur and affine parameters simultaneously. Next we formulate a novel relationship (the blur-depth relationship) between defocus blur, relative object depth and three parameters based on camera motion and intrinsic camera parameters.
We can handle the situation where a single image has points which have defocused, got sharper or are focally unperturbed. Moreover, our formulation is valid regardless of whether the defocus is due to the image plane being in front of or behind the point of sharp focus.
The blur-depth relationship requires a sequence of at least three images taken with the camera moving either towards or away from the object. It can be used to obtain an initial estimate of relative depth using one of several non-linear methods. We demonstrate a solution based on the Extended Kalman Filter in which the measurement equation is the blur-depth relationship. The estimate of relative depth is then used to compute an initial estimate of camera motion parameters. In order to refine depth values, the values of relative depth and camera motion are then input into a second Extended Kalman Filter in which the measurement equations are the discrete motion equations. This set of cascaded Kalman filters can be employed iteratively over a longer sequence of images in order to further refine depth.
We conduct several experiments on real scenery in order to demonstrate the range of object shapes that the algorithm can handle. We show that fairly good estimates of depth can be obtained with just three images.
- Date Issued
- 2004
- Identifier
- CFE0000135, ucf:46179
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000135
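The depth-from-defocus method above estimates relative depth with an Extended Kalman Filter whose measurement equation is the blur-depth relationship. A generic scalar EKF skeleton is sketched below; the measurement function `h` is only a hypothetical stand-in (blur increasing with inverse depth), since the actual blur-depth relationship involves camera-motion and intrinsic parameters not reproduced from the thesis.

```python
import numpy as np

def ekf_step(x, P, z, h, h_prime, Q=1e-2, R=1e-4):
    """One Extended Kalman Filter step for a scalar state (relative depth).

    x, P : state estimate and variance from the previous step.
    z    : current measurement (e.g., an observed defocus-blur value).
    h, h_prime : measurement function and its derivative w.r.t. the state.
    The process model is a random walk (identity transition with noise Q),
    a simplification; the thesis couples depth with camera-motion parameters.
    """
    # Predict (random-walk process model).
    x_pred, P_pred = x, P + Q
    # Update with the non-linear measurement, linearized at the prediction.
    H = h_prime(x_pred)
    K = P_pred * H / (H * P_pred * H + R)        # Kalman gain
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Hypothetical blur-depth relationship: blur grows with inverse depth.
h = lambda d: 2.0 / d
h_prime = lambda d: -2.0 / d ** 2

true_depth, x, P = 4.0, 2.0, 1.0
rng = np.random.default_rng(5)
for _ in range(50):                              # a short sequence of blur measurements
    z = h(true_depth) + rng.normal(0, 0.01)
    x, P = ekf_step(x, P, z, h, h_prime)
print("estimated depth:", round(x, 2), "(true:", true_depth, ")")
```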
- Title
- AUTONOMOUS ROBOTIC GRASPING IN UNSTRUCTURED ENVIRONMENTS.
- Creator
- Jabalameli, Amirhossein, Behal, Aman, Haralambous, Michael, Pourmohammadi Fallah, Yaser, Boloni, Ladislau, Xu, Yunjun, University of Central Florida
- Abstract / Description
- A crucial problem in robotics is interacting with known or novel objects in unstructured environments. While the convergence of a multitude of research advances is required to address this problem, our goal is to describe a framework that employs the robot's visual perception to identify and execute an appropriate grasp to pick and place novel objects. Analytical approaches explore for solutions through kinematic and dynamic formulations. On the other hand, data-driven methods retrieve grasps according to prior knowledge of the target object, human experience, or information obtained from acquired data. In this dissertation, we propose a framework based on the supporting principle that potential contacting regions for a stable grasp can be found by searching for (i) sharp discontinuities and (ii) regions of locally maximal principal curvature in the depth map. In addition to suggestions from empirical evidence, we discuss this principle by applying the concepts of force closure and wrench convexes. The key point is that no prior knowledge of objects is utilized in the grasp planning process; however, the obtained results show that the approach is capable of dealing successfully with objects of different shapes and sizes. We believe that the proposed work is novel because the description of the visible portion of objects by the aforementioned edges appearing in the depth map facilitates the process of grasp set-point extraction in the same way as image processing methods that focus on small 2D image areas, rather than clustering and analyzing huge sets of 3D point-cloud coordinates. In fact, this approach dispenses with reconstruction of objects. These features result in low computational costs and make it possible to run the proposed algorithm in real time. Finally, the performance of the approach is successfully validated by applying it to scenes with both single and multiple objects, in both simulation and real-world experimental setups.
- Date Issued
- 2019
- Identifier
- CFE0007892, ucf:52757
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007892
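The grasping framework above looks for contacting regions at sharp depth discontinuities and at regions of locally maximal principal curvature in the depth map. The NumPy sketch below exposes both cue types on a synthetic depth image, using gradient magnitude for discontinuities and a second-derivative magnitude as a crude curvature proxy; thresholds and the synthetic scene are illustrative, and the force-closure reasoning and grasp set-point extraction are not implemented.

```python
import numpy as np

def grasp_cues(depth, edge_thresh=0.05, curv_thresh=0.002):
    """Candidate grasp-cue masks from a depth map.

    Sharp discontinuities are taken where the depth-gradient magnitude is
    large; a curvature proxy is taken from the magnitude of second
    derivatives (a simplification of principal-curvature analysis).
    """
    gy, gx = np.gradient(depth)
    edges = np.hypot(gx, gy) > edge_thresh            # depth discontinuities
    gyy, _ = np.gradient(gy)
    _, gxx = np.gradient(gx)
    curvature = np.abs(gxx) + np.abs(gyy)
    curved = (curvature > curv_thresh) & ~edges       # curved regions away from jumps
    return edges, curved

# Synthetic scene: a flat table at 1.0 m with a box (sharp edges) and a cylinder.
depth = np.full((80, 80), 1.0)
depth[20:40, 10:30] = 0.8                                         # box top
xs = np.arange(80)
cyl = 1.0 - 0.3 * np.sqrt(np.clip(1 - ((xs - 55) / 10.0) ** 2, 0, None))
depth[50:70, :] = np.minimum(depth[50:70, :], cyl)                # cylinder on the table
edges, curved = grasp_cues(depth)
print("discontinuity pixels:", int(edges.sum()),
      "curved-region pixels:", int(curved.sum()))
```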
- Title
- Visionary Ophthalmics: Confluence of Computer Vision and Deep Learning for Ophthalmology.
- Creator
- Morley, Dustin, Foroosh, Hassan, Bagci, Ulas, Gong, Boqing, Mohapatra, Ram, University of Central Florida
- Abstract / Description
- Ophthalmology is a medical field ripe with opportunities for meaningful application of computer vision algorithms. The field utilizes data from multiple disparate imaging techniques, ranging from conventional cameras to tomography, comprising a diverse set of computer vision challenges. Computer vision has a rich history of techniques that can adequately meet many of these challenges. However, the field has undergone something of a revolution in recent times as deep learning techniques have sprung into the forefront following advances in GPU hardware. This development raises important questions regarding how to best leverage insights from both modern deep learning approaches and more classical computer vision approaches for a given problem. In this dissertation, we tackle challenging computer vision problems in ophthalmology using methods from all across this spectrum. Perhaps our most significant work is a highly successful iris registration algorithm for use in laser eye surgery. This algorithm relies on matching features extracted from the structure tensor and a Gabor wavelet, a classically driven approach that does not utilize modern machine learning. However, drawing on insight from the deep learning revolution, we demonstrate successful application of backpropagation to optimize the registration significantly faster than the alternative of relying on finite differences. Towards the other end of the spectrum, we also present a novel framework for improving RANSAC segmentation algorithms by utilizing a convolutional neural network (CNN) trained on a RANSAC-based loss function. Finally, we apply state-of-the-art deep learning methods to solve the problem of pathological fluid detection in optical coherence tomography images of the human retina, using a novel retina-specific data augmentation technique to greatly expand the data set. Altogether, our work demonstrates the benefits of a holistic view of computer vision, which leverages deep learning and associated insights without neglecting techniques and insights from the previous era.
- Date Issued
- 2018
- Identifier
- CFE0007058, ucf:52001
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007058
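The iris registration work above matches features extracted from the structure tensor (together with a Gabor wavelet). Below is a small sketch of computing the 2-D structure tensor and a coherence measure with OpenCV and NumPy; the smoothing scales are illustrative, and this is only the feature primitive, not the registration algorithm or its backpropagation-based optimization.

```python
import numpy as np
import cv2

def structure_tensor(gray, sigma=2.0):
    """Per-pixel 2x2 structure tensor components and a coherence measure.

    gray: float32 single-channel image.  The tensor entries are Gaussian-
    smoothed products of image gradients; coherence is (l1 - l2) / (l1 + l2)
    for eigenvalues l1 >= l2 and is high along strongly oriented structure.
    """
    Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    blur = lambda a: cv2.GaussianBlur(a, (0, 0), sigma)
    Jxx, Jyy, Jxy = blur(Ix * Ix), blur(Iy * Iy), blur(Ix * Iy)
    tmp = np.sqrt((Jxx - Jyy) ** 2 + 4 * Jxy ** 2)
    l1, l2 = (Jxx + Jyy + tmp) / 2, (Jxx + Jyy - tmp) / 2
    coherence = (l1 - l2) / (l1 + l2 + 1e-8)
    return Jxx, Jyy, Jxy, coherence

# Toy usage: concentric rings as a crude stand-in for iris texture.
y, x = np.mgrid[0:128, 0:128]
rings = np.sin(0.3 * np.sqrt((x - 64.0) ** 2 + (y - 64.0) ** 2)).astype(np.float32)
_, _, _, coh = structure_tensor(rings)
print("mean coherence on oriented texture:", round(float(coh.mean()), 3))
```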
- Title
- A Study of Localization and Latency Reduction for Action Recognition.
- Creator
- Masood, Syed, Tappen, Marshall, Foroosh, Hassan, Stanley, Kenneth, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
- The success of recognizing periodic actions in single-person, simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning system parameters or by selecting better feature points. Instead, we show that the problem lies in the spatio-temporal cuboid volume extracted from the interest point locations. Having identified the problem, we show how improved results can be achieved by simple modifications to the cuboids.
For the above method, however, one requires near-perfect localization of the action within a video sequence. To achieve this objective, we present a two-stage weakly supervised probabilistic model for simultaneous localization and recognition of actions in videos. Different from previous approaches, our method is novel in that it (1) eliminates the need for manual annotations in the training procedure and (2) does not require any human detection or tracking in the classification stage. The first stage of our framework is a probabilistic action localization model which extracts the most promising sub-windows in a video sequence where an action can take place. We use a non-linear classifier in the second stage of our framework for the final classification task. We show the effectiveness of our proposed model on two well known real-world datasets: the UCF Sports and UCF11 datasets.
Another application of the weakly supervised probabilistic model proposed above is in the gaming environment. An important aspect in designing interactive, action-based interfaces is reliably recognizing actions with minimal latency. High latency causes the system's feedback to lag behind and thus significantly degrades the interactivity of the user experience. With slight modification to the weakly supervised probabilistic model we proposed for action localization, we show how it can be used for reducing latency when recognizing actions in Human Computer Interaction (HCI) environments. This latency-aware learning formulation trains a logistic regression-based classifier that automatically determines distinctive canonical poses from the data and uses these to robustly recognize actions in the presence of ambiguous poses. We introduce a novel (publicly released) dataset for the purpose of our experiments. Comparisons of our method against both a Bag of Words and a Conditional Random Field (CRF) classifier show improved recognition performance for both pre-segmented and online classification tasks.
- Date Issued
- 2012
- Identifier
- CFE0004575, ucf:49210
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004575
- Title
- MARKERLESS TRACKING USING POLAR CORRELATION OF CAMERA OPTICAL FLOW.
- Creator
-
Gupta, Prince, da Vitoria Lobo, Niels, University of Central Florida
- Abstract / Description
-
We present a novel, real-time, markerless vision-based tracking system, employing a rigid orthogonal configuration of two pairs of opposing cameras. Our system uses optical flow over sparse features to overcome the limitation of vision-based systems that require markers or a pre-loaded model of the physical environment. We show how opposing cameras enable cancellation of common components of optical flow, leading to an efficient tracking algorithm that captures five degrees of freedom, including direction of translation and angular velocity. Experiments comparing our device with an electromagnetic tracker show that its average tracking accuracy is 80% over 185 frames, and that it is able to track large-range motions even in outdoor settings. We also show how opposing cameras in vision-based inside-looking-out systems can be used for gesture recognition. To demonstrate our approach, we discuss three different algorithms that recover motion parameters at different levels of completeness. We show how optical flow in opposing cameras can be used to recover the motion parameters of the multi-camera rig. Experimental results show gesture recognition accuracies of 88.0%, 90.7% and 86.7% for our three techniques, respectively, across a set of 15 gestures. A hedged optical-flow sketch follows this record.
- Date Issued
- 2010
- Identifier
- CFE0003163, ucf:48611
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003163
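The record above builds on per-camera optical flow whose common components cancel across an opposing pair. The polar-correlation machinery itself is not reproduced here; the sketch below only shows the kind of front end such a system needs, as an assumed illustration: dense Farneback optical flow per camera (via OpenCV) reduced to a mean flow vector, after which the mean flows of an opposing pair can be summed or differenced to separate shared from opposing motion components. Which combination cancels which component depends on the rig's camera orientations, so the final combination lines are illustrative only.

```python
import cv2
import numpy as np

def mean_flow(prev_gray, next_gray):
    """Dense Farneback optical flow reduced to a single mean (dx, dy) vector."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow.reshape(-1, 2).mean(axis=0)

# Toy frames standing in for one opposing camera pair; a simulated 3-pixel shift
# moves in opposite directions in the two views.
rng = np.random.default_rng(1)
frame = (rng.random((120, 160)) * 255).astype(np.uint8)
cam_a_prev, cam_a_next = frame, np.roll(frame, 3, axis=1)
cam_b_prev, cam_b_next = frame, np.roll(frame, -3, axis=1)

fa = mean_flow(cam_a_prev, cam_a_next)
fb = mean_flow(cam_b_prev, cam_b_next)

# Illustrative combination only: the sum keeps components the two views share,
# the difference keeps components that appear with opposite sign.
shared, opposing = fa + fb, fa - fb
print(shared, opposing)
```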
- Title
- Human Action Localization and Recognition in Unconstrained Videos.
- Creator
-
Boyraz, Hakan, Tappen, Marshall, Foroosh, Hassan, Lin, Mingjie, Zhang, Shaojie, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems. Action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed Two-Point Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison. Second, we present a probabilistic model that learns to identify discriminative regions in videos from weakly-supervised data, where each video clip is only assigned a label describing what action is present in the frame or clip. While our first system requires every action to be manually outlined in every frame of the video, this second system only requires that the video be given a single high-level tag. From this data, the system is able to identify discriminative regions that correspond well to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and the UCF Sports Dataset show that the localizations produced by this weakly supervised system are comparable in quality to localizations produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e. HMDB and UCF101. Third, we extend our weakly-supervised framework by replacing the recognition stage with a two-stage neural network and applying dropout to prevent overfitting of the parameters on the training data. The dropout technique was recently introduced to prevent overfitting of parameters in deep neural networks and has been applied successfully to the object recognition problem. To our knowledge, this is the first system using dropout for the action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on the HMDB and UCF101 datasets. Hedged sketches of a 2D segment projection and of dropout follow this record.
- Date Issued
- 2013
- Identifier
- CFE0004977, ucf:49562
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004977
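The record above turns 3D action detection into 2D detection by representing each overlapping video segment with a 2D projection. The projection actually used in the thesis is not specified in this abstract, so the sketch below substitutes a simple motion-energy projection (per-pixel maximum of absolute frame differences within a segment) purely as an assumed stand-in; segment length and overlap are likewise assumptions.

```python
import numpy as np

def segment_projections(video, seg_len=16, stride=8):
    """Split a (T, H, W) video into overlapping segments and project each to 2D.

    The projection here is the per-pixel max of absolute frame differences
    (a simple motion-energy image); the thesis's projection may differ.
    """
    T = video.shape[0]
    projections = []
    for start in range(0, max(T - seg_len + 1, 1), stride):
        seg = video[start:start + seg_len]
        diffs = np.abs(np.diff(seg, axis=0))          # frame-to-frame motion
        projections.append((start, diffs.max(axis=0)))
    return projections    # list of (start_frame, 2D image) pairs

if __name__ == "__main__":
    video = np.random.rand(64, 120, 160)
    for start, proj in segment_projections(video):
        print(start, proj.shape)                      # each segment yields one 2D image
```

A 2D detector can then be run on each projected image, with detections chained across segments to recover the temporal extent of the action.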
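The same record's third contribution adds dropout to the recognition stage. As a generic illustration (not the thesis's two-stage network), the following shows standard inverted dropout applied to hidden activations during training, with activations left untouched at test time.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction p_drop of units and rescale the survivors,
    so no correction is needed at test time."""
    if not training or p_drop == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.random.rand(4, 8)           # a toy batch of hidden activations
print(dropout(h, p_drop=0.5))      # roughly half the units zeroed, rest scaled x2
print(dropout(h, training=False))  # unchanged at test time
```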
- Title
- Online, Supervised and Unsupervised Action Localization in Videos.
- Creator
-
Soomro, Khurram, Shah, Mubarak, Heinrich, Mark, Hu, Haiyan, Bagci, Ulas, Yun, Hae-Bum, University of Central Florida
- Abstract / Description
-
Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many of the existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space increases with high-resolution and longer-duration videos, it becomes impractical to use such sliding window techniques. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions in training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use a Conditional Random Field (CRF) to localize actions. In the above method and typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions - an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with a Structural SVM. The above two approaches rely on human supervision in the form of assigning action class labels to videos and annotating actor bounding boxes in each frame of the training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization. Given unlabeled videos without annotations, this approach aims at: 1) discovering action classes using a discriminative clustering approach, and 2) localizing actions using a variant of the Knapsack problem. A hedged knapsack-selection sketch follows this record.
- Date Issued
- 2017
- Identifier
- CFE0006917, ucf:51685
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006917
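The last record localizes actions in the unsupervised setting via "a variant of the Knapsack problem". The actual variant is not described in this abstract, so the sketch below shows only the textbook 0/1 knapsack dynamic program, framed as choosing scored video regions under a total-size budget; the region scores, sizes, and budget are assumptions.

```python
def knapsack_select(scores, sizes, budget):
    """Textbook 0/1 knapsack DP: pick a subset of regions maximizing total score
    subject to sum(sizes) <= budget. Returns (best_score, chosen_indices)."""
    n = len(scores)
    # dp[i][b] = best score using the first i regions with remaining budget b
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, w = scores[i - 1], sizes[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if w <= b and dp[i - 1][b - w] + s > dp[i][b]:
                dp[i][b] = dp[i - 1][b - w] + s
    # Trace back which regions were chosen.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= sizes[i - 1]
    return dp[n][budget], sorted(chosen)

# Toy example: five candidate supervoxel groups with foreground scores and sizes.
print(knapsack_select(scores=[3.0, 1.2, 4.5, 2.2, 0.7],
                      sizes=[4, 2, 6, 3, 1], budget=9))
```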