- Title
- DEBRIS TRACKING IN A SEMISTABLE BACKGROUND.
- Creator
- Vanumamalai, Karthik Kalathi, Kasparis, Takis, University of Central Florida
- Abstract / Description
- Object tracking plays a pivotal role in many computer vision applications, such as video surveillance, human gesture recognition, and object-based video compression schemes such as MPEG-4. Automatic detection of moving objects and tracking of their motion has always been an important topic in computer vision and robotics. This thesis deals with the problem of detecting the presence of debris or any other unexpected objects in footage obtained during spacecraft launches, which poses a challenge because of the non-stationary background. When the background is stationary, moving objects can be detected by frame differencing; the background must therefore be stabilized before any moving object in the scene can be tracked. Two problems are considered here, and in both the footage comes from Space Shuttle launches, with the objective of tracking any debris falling from the Shuttle. The proposed method registers two consecutive frames using FFT-based image registration, in which the transformation parameters (translation, rotation) are calculated automatically. This information is then passed to a Kalman filtering stage, which produces a mask image used to find high-intensity areas of potential interest.
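As a hedged illustration of the registration step described above, the sketch below estimates the translation between two frames with FFT-based phase correlation using NumPy. It is a minimal stand-in for the thesis's method: rotation recovery and the Kalman stage are omitted, and the inputs are assumed to be grayscale frames of equal size.

```python
import numpy as np

def phase_correlation(frame_a: np.ndarray, frame_b: np.ndarray):
    """Estimate the (dy, dx) translation between two grayscale frames.

    Minimal FFT-based registration via phase correlation; the thesis also
    recovers rotation, which would need an extra (e.g. log-polar) step.
    """
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    # Normalized cross-power spectrum; its inverse FFT peaks at the shift.
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks past the midpoint back to negative shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```

Once consecutive frames are aligned with the recovered shift, frame differencing can expose candidate debris regions for the subsequent filtering stage.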
- Date Issued
- 2005
- Identifier
- CFE0000886, ucf:46628
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000886
- Title
- OBJECT ASSOCIATION ACROSS MULTIPLE MOVING CAMERAS IN PLANAR SCENES.
- Creator
- Sheikh, Yaser, Shah, Mubarak, University of Central Florida
- Abstract / Description
- In this dissertation, we address the problem of object detection and object association across multiple cameras over large areas that are well modeled by planes. We present a unifying probabilistic framework that captures the underlying geometry of planar scenes, and algorithms to estimate geometric relationships between different cameras, which are subsequently used for co-operative association of objects. We first present a local object detection scheme with three fundamental innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic scene behavior, nominal misalignments, and motion due to parallax. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, complex dependencies between the domain (location) and range (color) are directly modeled, and the background is represented as a single probability density. Second, temporal persistence is introduced as a detection criterion. Unlike previous approaches that detect objects by building adaptive models of the background alone, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework that stresses spatial context as a condition for detecting interesting objects; the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the method is presented on a diverse set of data. We then address the problem of associating objects across multiple cameras in planar scenes. Since the cameras may be moving, there is a possibility of both spatial and temporal non-overlap in their fields of view. We first address the case where spatial and temporal overlap can be assumed. Because the cameras are moving and often widely separated, direct appearance-based or proximity-based constraints cannot be used. Instead, we exploit geometric constraints on the relationship between the motion of each object across cameras to test multiple correspondence hypotheses, without assuming any prior calibration information. Here, there are three contributions. First, we present a statistically and geometrically meaningful means of evaluating a hypothesized correspondence between multiple objects in multiple cameras. Second, since multiple cameras exist, ensuring coherency in association (i.e., maintaining transitive closure between more than two cameras) is an essential requirement. To ensure such coherency, we pose object association across cameras as a k-dimensional matching problem and use an approximation to find the association. We show that, under appropriate conditions, re-entering objects can also be re-associated with their original labels. Third, we show that, as a result of associating objects across the cameras, a concurrent visualization of multiple aerial video streams is possible. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models.

Finally, we present a unifying framework for object association across multiple cameras and for estimating inter-camera homographies between (spatially and temporally) overlapping and non-overlapping cameras, whether they are moving or non-moving. By making use of explicit polynomial models for the kinematics of objects, we present algorithms to estimate inter-frame homographies. Under an appropriate measurement noise model, an EM algorithm is applied for the maximum likelihood estimation of the inter-camera homographies and kinematic parameters. Rather than fitting curves locally (in each camera) and matching them across views, we present an approach that simultaneously refines the estimates of inter-camera homographies and curve coefficients globally. We demonstrate the efficacy of the approach on a number of real sequences taken from aerial cameras, and report quantitative performance during simulations.
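The planar-scene assumption above means object locations in one view map to another through a single 3x3 ground-plane homography. Below is a minimal NumPy sketch of that idea; the matrix H_ab and the two tracks are hypothetical stand-ins, and in the dissertation the homographies are estimated rather than given.

```python
import numpy as np

def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map 2D points from one camera's image plane to another's via homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coordinates
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]              # back to inhomogeneous

# Hypothetical check of a correspondence hypothesis: warp a track observed in
# camera A and compare it with a candidate track observed in camera B.
H_ab = np.array([[1.02, 0.01, 5.0], [0.00, 0.98, -3.0], [0.0, 0.0, 1.0]])
track_a = np.array([[100.0, 200.0], [104.0, 198.0], [109.0, 195.0]])
track_b = np.array([[107.0, 193.0], [111.1, 191.0], [116.2, 188.1]])
residual = np.linalg.norm(warp_points(H_ab, track_a) - track_b, axis=1).mean()
print(f"mean reprojection residual: {residual:.2f} px")
```

A small residual supports the hypothesis that the two tracks belong to the same object; the dissertation evaluates such hypotheses within a statistically grounded framework rather than with a raw pixel threshold.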
- Date Issued
- 2006
- Identifier
- CFE0001045, ucf:46797
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001045
- Title
- Human Detection, Tracking and Segmentation in Surveillance Video.
- Creator
- Shu, Guang, Shah, Mubarak, Boloni, Ladislau, Wang, Jun, Lin, Mingjie, Sugaya, Kiminobu, University of Central Florida
- Abstract / Description
- This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real-world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes, and occlusion. In this dissertation, several novel methods are proposed for improving on the current state of human detection and tracking by learning scene-specific information in video feeds.

Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds, and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using a Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally, a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features, so it avoids many false detections typically produced by a generic detector. We demonstrate a significant improvement in the performance of a state-of-the-art detector.

Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Single-camera multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based, person-specific Support Vector Machine (SVM) classifiers which capture the articulations of moving human bodies against dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, leading to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking.

Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple-human segmentation in videos by incorporating human detection and part-based detection potentials into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate the superpixels corresponding to a human from the background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. Compared to previous work, our method automatically segments multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation accuracy than previous methods on several challenging video sequences.

Most work in computer vision deals with point solutions: a specific algorithm for a specific problem. Putting different algorithms together into one integrated real-world system, however, is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques: Adaptive Template Scaling handles the scale change of objects moving towards a camera, while Incremental Searching and Local Frame Differencing address challenging issues such as scale change, occlusion, and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance.
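A minimal sketch of the scene-specific learning idea from the first part: superpixel color features harvested from initial detections are quantized into Bag-of-Words histograms, and a linear SVM is trained on them. All arrays here are random stand-ins for the DPM detections and superpixel features the dissertation actually uses, and scikit-learn is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Stand-in for per-superpixel color features (e.g. mean LAB color).
superpixel_feats = rng.normal(size=(5000, 3))
codebook = KMeans(n_clusters=64, n_init=10).fit(superpixel_feats)

def bow_histogram(window_feats: np.ndarray) -> np.ndarray:
    """Encode one detection window's superpixels as a normalized BoW histogram."""
    words = codebook.predict(window_feats)
    hist = np.bincount(words, minlength=64).astype(float)
    return hist / (hist.sum() + 1e-9)

# Hypothetical training windows: positives from confident generic-detector
# hits, negatives sampled from background regions of the same video.
X = np.stack([bow_histogram(rng.normal(size=(40, 3))) for _ in range(200)])
y = np.array([1] * 100 + [0] * 100)
scene_classifier = LinearSVC().fit(X, y)   # the new scene-specific classifier
```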
- Date Issued
- 2014
- Identifier
- CFE0005551, ucf:50278
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005551
- Title
- DETECTING CURVED OBJECTS AGAINST CLUTTERED BACKGROUNDS.
- Creator
- Prokaj, Jan, Lobo, Niels, University of Central Florida
- Abstract / Description
- Detecting curved objects against cluttered backgrounds is a hard problem in computer vision. We present new low-level and mid-level features that function in these environments. The low-level features are fast to compute because they employ an integral image approach, which makes them especially useful in real-time applications. The mid-level features are built from the low-level features and are optimized for curved object detection. The usefulness of these features is tested by designing an object detection algorithm around them. Object detection is accomplished by transforming the mid-level features into weak classifiers, which are then combined into a strong classifier using AdaBoost. The resulting strong classifier is tested on the problem of detecting heads with shoulders. On a database of over 500 images of people, cropped to contain head and shoulders and with a diverse set of backgrounds, the detection rate is 90%, while the false positive rate on a database of 500 negative images is less than 2%.
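The speed claim for the low-level features rests on the standard integral-image trick: after one cumulative-sum pass, the sum over any rectangle costs four table lookups. The NumPy sketch below shows that construction; it illustrates the mechanism, not the thesis's specific features.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Cumulative-sum table with a zero row/column of padding."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii: np.ndarray, r0: int, c0: int, r1: int, c1: int) -> float:
    """Sum of img[r0:r1, c0:c1] in O(1) using four lookups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(25, dtype=float).reshape(5, 5)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 4, 4) == img[1:4, 1:4].sum()
```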
- Date Issued
- 2008
- Identifier
- CFE0002102, ucf:47535
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002102
- Title
- THE EFFECTS OF PRIOR KNOWLEDGE ACTIVATION ON LEARNER RETENTION OF NEW CONCEPTS IN LEARNING OBJECTS.
- Creator
- Henderson, Kelsey, Hirumi, Atsusi, University of Central Florida
- Abstract / Description
- Establishing relationships between a learner's prior knowledge and any new concepts he or she will be expected to learn is an important instructional activity. Learning objects are often devoid of such activities in an attempt to maintain their conciseness and reusability in a variety of instructional contexts. The purpose of this study was to examine the efficacy of using questioning as a prior knowledge activation strategy in learning objects. Previous research on the use of prior knowledge activation strategies supports their effectiveness in helping to improve learner retention; approaches such as questioning, advance organizers, and group discussions are examples of techniques used in previous studies. Participants enrolled in a Navy engineering curriculum were randomly assigned to two groups (experimental and comparison). The experimental group was exposed to a prior knowledge activation component at the start of session 1, while the comparison group received no treatment. Participants in both groups were tested at three different times during the course of the study: a pretest at the start of session 1, posttest I at the conclusion of session 1, and posttest II during session 2. The findings indicate that the prior knowledge activation strategy did not result in statistically significant differences between the levels of retention gained by the experimental and comparison groups. Because of administrative constraints experienced during the course of the study, the sample was insufficiently sized and statistical power was not achieved. Potential limitations and implications for future research directions are described.
- Date Issued
- 2007
- Identifier
- CFE0001739, ucf:47307
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001739
- Title
- DESIGN OF A DYNAMIC FOCUSING MICROSCOPE OBJECTIVE FOR OCT IMAGING.
- Creator
- Murali, Supraja, Rolland, Jannick, University of Central Florida
- Abstract / Description
- Optical Coherence Tomography (OCT) is a novel optical imaging technique that has assumed significant importance in biomedical imaging in the last two decades because it is non-invasive and provides accurate, high-resolution images of three-dimensional cross-sections of body tissue, exceeding the capabilities of the currently predominant imaging technique, ultrasound. In this thesis, high-resolution OCT is investigated for in vivo detection of abnormal skin pathology for the early diagnosis of cancer. The technology presented is based on a dynamically focusing microscopic imaging probe conceived for skin imaging and the detection of abnormalities in the epithelium. A novel method for dynamic focusing in the biological sample using liquid crystal (LC) lens technology, which yields three-dimensional images with invariant resolution throughout the cross-section and depth of the sample, is presented and discussed. Two skin probe configurations that incorporate dynamic focusing with LC lenses are investigated: one involving a reflective microscope objective sub-system, and the other an all-refractive immersion microscope objective sub-system. To ensure high-resolution imaging, a low-coherence broadband source, namely a femtosecond mode-locked Ti:sapphire laser centered at a wavelength of approximately 800 nm, is used to illuminate the sample. An in-depth description and analysis of the optical design and predicted performance of the two microscope objectives, designed for dynamic three-dimensional imaging at 5 μm resolution over the chosen broadband spectrum, is presented.
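For context on why a broadband femtosecond source is chosen: in OCT the source-limited axial resolution for a Gaussian spectrum follows the standard relation Δz = (2 ln 2 / π) · λ₀² / Δλ. The sketch below evaluates that textbook formula; the bandwidth figure is illustrative only and is not taken from the thesis.

```python
import math

def oct_axial_resolution(center_wavelength_nm: float, bandwidth_nm: float) -> float:
    """Source-limited OCT axial resolution for a Gaussian spectrum, in micrometers."""
    dz_nm = (2 * math.log(2) / math.pi) * center_wavelength_nm**2 / bandwidth_nm
    return dz_nm / 1000.0

# Illustrative numbers: an 800 nm Ti:sapphire source with ~120 nm of
# bandwidth gives axial resolution on the order of a few micrometers.
print(f"{oct_axial_resolution(800.0, 120.0):.2f} um")
```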
- Date Issued
- 2005
- Identifier
- CFE0000869, ucf:46665
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000869
- Title
- Spatiotemporal Graphs for Object Segmentation and Human Pose Estimation in Videos.
- Creator
- Zhang, Dong, Shah, Mubarak, Qi, GuoJun, Bagci, Ulas, Yun, Hae-Bum, University of Central Florida
- Abstract / Description
- Images and videos can be naturally represented by graphs: spatial graphs for images and spatiotemporal graphs for videos. However, different applications usually call for different formulations of the graphs, and algorithms for each formulation have different complexities. Therefore, wisely formulating the problem to ensure an accurate and efficient solution is one of the core issues in computer vision research. We explore three problems in this domain to demonstrate how to formulate all of them in terms of spatiotemporal graphs and obtain good and efficient solutions.

The first problem we explore is video object segmentation, where the goal is to segment the primary moving objects in the videos. This problem is important for many applications, such as content-based video retrieval, video summarization, activity understanding, and targeted content replacement. Our framework uses object proposals, which are object-like regions obtained by low-level visual cues. Each object proposal has an object-ness score associated with it, indicating how likely the proposal corresponds to an object. The problem is formulated as a directed acyclic graph, in which nodes represent the object proposals and edges represent the spatiotemporal relationships between nodes. A dynamic programming solution is employed to select one object proposal from each video frame, while ensuring their consistency throughout the video frames. Gaussian mixture models (GMMs) are used for modeling the background and foreground, and Markov Random Fields (MRFs) are employed to smooth the pixel-level segmentation.

In the above spatiotemporal graph formulation, we consider object segmentation in only a single video. Next, we consider multiple videos and model the video co-segmentation problem as a spatiotemporal graph. The goal here is to simultaneously segment the moving objects from multiple videos and assign common objects the same labels. The problem is formulated as a regulated maximum clique problem using object proposals. The object proposals are tracked in adjacent frames to generate a pool of candidate tracklets. Then an undirected graph is built with nodes corresponding to the tracklets from all the videos and edges representing the similarities between tracklets. A modified Bron-Kerbosch algorithm is applied to the graph in order to select the prominent objects contained in these videos, thereby relating the segmentation of each object across the different videos.

In online and surveillance videos, the most important object class is the human. In contrast to generic video object segmentation and co-segmentation, specific knowledge about humans, namely the pose (i.e., the human skeleton), can be employed to help segment and track people in videos. We formulate the problem of human pose estimation in videos using the spatiotemporal graph. In this formulation, nodes represent different body parts in the video frames and edges represent the spatiotemporal relationships between body parts in adjacent frames. The graph is carefully designed to ensure an exact and efficient solution; the overall objective of the new formulation is to remove the simple cycles from traditional graph-based formulations. Dynamic programming is employed at different stages of the method to select the best tracklets and human pose configurations.
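A minimal sketch of the dynamic-programming selection described for the first problem: choose one proposal per frame so that the summed object-ness scores plus pairwise spatiotemporal consistency is maximal. The score arrays and the similarity function are stand-ins for the dissertation's actual terms.

```python
import numpy as np

def select_proposals(scores, similarity):
    """scores[t][i]: object-ness of proposal i in frame t.
    similarity(t, i, j): consistency of proposal i (frame t) with j (frame t+1)."""
    T = len(scores)
    best = [np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, T):
        prev, cur = best[-1], np.asarray(scores[t], dtype=float)
        trans = np.array([[similarity(t - 1, i, j) for j in range(len(cur))]
                          for i in range(len(prev))])
        total = prev[:, None] + trans          # best score ending on each edge
        back.append(total.argmax(axis=0))      # best predecessor per proposal
        best.append(total.max(axis=0) + cur)
    # Trace the optimal one-proposal-per-frame path backwards.
    path = [int(best[-1].argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]
```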
- Date Issued
- 2016
- Identifier
- CFE0006429, ucf:51488
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006429
- Title
- Scene Understanding for Real Time Processing of Queries over Big Data Streaming Video.
- Creator
- Aved, Alexander, Hua, Kien, Foroosh, Hassan, Zou, Changchun, Ni, Liqiang, University of Central Florida
- Abstract / Description
- With heightened security concerns across the globe and the increasing need to monitor, preserve, and protect infrastructure and public spaces to ensure proper operation, quality assurance, and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective and fatigued, can exhibit bias, and find it difficult to maintain high levels of vigilance when capturing, searching, and recognizing events that occur infrequently or in isolation.

These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software, much as traditional database applications are developed today. Video stream processing applications and ad hoc queries developed this way are able to "reuse" advanced image processing techniques that have already been developed, resulting in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language.

With the potential to impact many important fields ranging from security and assembly-line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the "trust" issue will enable many new surveillance-related applications. Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.
- Date Issued
- 2013
- Identifier
- CFE0004648, ucf:49900
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004648
- Title
- A SPARSE PROGRAM DEPENDENCE GRAPH FOR OBJECT ORIENTED PROGRAMMING LANGUAGES.
- Creator
- Garfield, Keith, Hughes, Charles, University of Central Florida
- Abstract / Description
- The Program Dependence Graph (PDG) has achieved widespread acceptance as a useful tool for software engineering, program analysis, and automated compiler optimizations. This thesis presents the Sparse Object-Oriented Program Dependence Graph (SOOPDG), a formalism that contains elements of traditional PDGs adapted to compactly represent programs written in object-oriented languages such as Java. This formalism is called sparse because, in contrast to other OO and Java-specific adaptations of PDGs, it introduces few node types and no new edge types beyond those used in traditional dependence-based representations. This results in correct program representations using smaller graph structures and simpler semantics than other OO formalisms. We introduce the Single Flow to Use (SFU) property, which requires that exactly one definition of each variable be available for each use. We demonstrate that the SOOPDG, with its support for the SFU property coupled with a higher-order rewriting semantics, is sufficient to represent static Java-like programs and dynamic program behavior. We present algorithms for creating SOOPDG representations from program text, and describe graph rewriting semantics. We also present algorithms for common static analysis techniques such as program slicing, inheritance analysis, and call chain analysis. We contrast the SOOPDG with two previously published OO graph structures, the Java System Dependence Graph and the Java Software Dependence Graph. The SOOPDG results in comparatively smaller static representations of programs, cleaner graph semantics, and potentially more accurate program analysis. Finally, we introduce the Simulation Dependence Graph (SDG), a related representation developed specifically to represent simulation systems but extensible to more general component-based software design paradigms. The SDG allows formal reasoning about issues such as component composition, a property critical to the creation and analysis of complex simulation systems and component-based design systems.
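As an illustration of one listed analysis, program slicing over a dependence graph reduces to reachability along dependence edges. The toy graph below is hypothetical and far simpler than the SOOPDG's actual node and edge structure; it shows only the slicing mechanism.

```python
from collections import defaultdict

class DependenceGraph:
    def __init__(self):
        self.preds = defaultdict(set)   # node -> nodes it depends on

    def add_dependence(self, use, definition):
        self.preds[use].add(definition)

    def backward_slice(self, criterion):
        """All statements that may affect the slicing criterion."""
        seen, stack = set(), [criterion]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.preds[node])
        return seen

g = DependenceGraph()
g.add_dependence("print(z)", "z = x + y")
g.add_dependence("z = x + y", "x = 1")
g.add_dependence("z = x + y", "y = 2")
g.add_dependence("w = 5", "x = 1")       # irrelevant to the slice below
print(g.backward_slice("print(z)"))      # excludes "w = 5"
```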
- Date Issued
- 2006
- Identifier
- CFE0001499, ucf:47077
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001499
- Title
- DEVELOPING AN OBJECT-ORIENTED APPROACH FOR OPERATIONS SIMULATION IN SPEEDES.
- Creator
- Wasadikar, Amit, Rabelo, Luis, University of Central Florida
- Abstract / Description
- Using simulation techniques, the performance of any proposed system can be tested for different scenarios with a generated model. However, it is difficult to rapidly create simulation models that accurately represent the complexity of the system. In recent years, object-oriented discrete-event simulation has emerged as a promising technology for implementing rapid simulation schemes. A number of software packages based on programming languages like C++ and Java are available for carrying out object-oriented discrete-event simulation. These packages establish a general framework for simulation in computer programs, but need to be further customized for desired end-use applications. In this thesis, a generic simulation library is created for the distributed Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES). This library offers classes to model the functionality of servers, processes, resources, transporters, and decisions. The library is expected to produce efficient simulation models in less time and with less coding. The class hierarchy is modeled using the Unified Modeling Language (UML). To test the library, the existing SPEEDES Space Shuttle Model is enhanced and recreated. This enhanced model is successfully validated against the original Arena model.
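A minimal sketch of the discrete-event core that such a library wraps: a future-event list kept in a priority queue and processed in timestamp order. The single-server model below is illustrative and does not reflect SPEEDES's actual classes or its parallel synchronization machinery.

```python
import heapq

def simulate_single_server(arrivals, service_time):
    """One server, FIFO queue; `arrivals` are arrival timestamps."""
    events = [(t, "arrive") for t in arrivals]
    heapq.heapify(events)                         # future-event list
    busy_until, completions = 0.0, []
    while events:
        t, kind = heapq.heappop(events)           # advance to next event
        if kind == "arrive":
            start = max(t, busy_until)            # wait if the server is busy
            busy_until = start + service_time
            heapq.heappush(events, (busy_until, "depart"))
        else:
            completions.append(t)
    return completions

print(simulate_single_server([0.0, 0.5, 4.0], service_time=1.0))  # [1.0, 2.0, 5.0]
```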
- Date Issued
- 2005
- Identifier
- CFE0000332, ucf:46278
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000332
- Title
- MULTI-VIEW APPROACHES TO TRACKING, 3D RECONSTRUCTION AND OBJECT CLASS DETECTION.
- Creator
- Khan, Saad, Shah, Mubarak, University of Central Florida
- Abstract / Description
- Multi-camera systems are becoming ubiquitous and have found application in a variety of domains including surveillance, immersive visualization, sports entertainment, and movie special effects, amongst others. From a computer vision perspective, the challenging task is how to most efficiently fuse information from multiple views in the absence of detailed calibration information and with a minimum of human intervention. This thesis presents a new approach to fuse foreground likelihood information from multiple views onto a reference view without explicit processing in 3D space, thereby circumventing the need for complete calibration. Our approach uses a homographic occupancy constraint (HOC), which states that if a foreground pixel has a piercing point that is occupied by a foreground object, then the pixel warps to foreground regions in every view under the homographies induced by the reference plane; in effect, the cameras are used as occupancy detectors. Using the HOC we are able to resolve occlusions and robustly determine ground-plane localizations of the people in the scene. To find tracks, we obtain ground localizations over a window of frames and stack them, creating a space-time volume. Regions belonging to the same person form contiguous spatio-temporal tracks that are clustered using a graph-cuts segmentation approach. Second, we demonstrate that the HOC is equivalent to performing visual hull intersection in the image plane, resulting in a cross-sectional slice of the object. The process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Slices from multiple planes are accumulated and the 3D structure of the object is segmented out. Unlike other visual hull based approaches that use 3D constructs like visual cones, voxels, or polygonal meshes requiring calibrated views, ours is purely image-based and uses only 2D constructs, i.e., planar homographies between views. This feature also renders it conducive to graphics hardware acceleration; the current GPU implementation of our approach is capable of fusing 60 views (480x720 pixels) at the rate of 50 slices per second. We then present an extension of this approach to reconstructing non-rigid articulated objects from monocular video sequences. The basic premise is that, due to motion of the object, scene occupancies are blurred together with non-occupancies in a manner analogous to motion-blurred imagery. Using our HOC and a novel construct, the temporal occupancy point (TOP), we are able to fuse multiple views of non-rigid objects obtained from a monocular video sequence. The result is a set of blurred scene occupancy images in the corresponding views, where the value at each pixel corresponds to the fraction of the total time duration for which the pixel observed an occupied scene location. We then use a motion de-blurring approach to de-blur the occupancy images and obtain the 3D structure of the non-rigid object. In the final part of this thesis, we present an object class detection method employing 3D models of rigid objects constructed using the above 3D reconstruction approach. Instead of using a complicated mechanism for relating multiple 2D training views, our approach establishes spatial connections between these views by mapping them directly to the surface of a 3D model. To generalize the model for object class detection, features from supplemental views (obtained from Google Image search) are also considered. Given a 2D test image, correspondences between the 3D feature model and the test view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made; the one with the highest confidence is then used to detect the object using feature location matching. Performance of the proposed method has been evaluated using the PASCAL VOC challenge dataset, and promising results are demonstrated.
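A minimal sketch of the homographic occupancy constraint using OpenCV and NumPy: each view's foreground-likelihood map is warped onto the reference view through its ground-plane homography, and the warped maps are multiplied so that only pixels that are foreground in every view survive. The homographies and masks are assumed to be supplied by earlier stages (feature matching and background subtraction).

```python
import cv2
import numpy as np

def fuse_foreground(masks, homographies, ref_shape):
    """masks: float32 foreground-likelihood maps; homographies: view -> reference.

    Each camera acts as an occupancy detector on the reference plane; the
    product is high only where all warped views agree on foreground.
    """
    fused = np.ones(ref_shape, dtype=np.float32)
    for mask, H in zip(masks, homographies):
        warped = cv2.warpPerspective(mask, H, (ref_shape[1], ref_shape[0]))
        fused *= warped
    return fused

# Sweeping this fusion over several planes parallel to the reference plane
# (via homologies) stacks cross-sectional slices into a 3D occupancy volume,
# which is the image-based visual-hull intersection the thesis describes.
```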
- Date Issued
- 2008
- Identifier
- CFE0002073, ucf:47593
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002073
- Title
- A MULTI-OBJECTIVE NO-REGRET DECISION MAKING MODEL WITH BAYESIAN LEARNING FOR AUTONOMOUS UNMANNED SYSTEMS.
- Creator
- Howard, Matthew, Qu, Zhihua, University of Central Florida
- Abstract / Description
- The development of a multi-objective decision-making and learning model for use in unmanned systems is the focus of this project. Starting with traditional game theory and psychological learning theories developed in the past, a new model for machine learning is developed. This model incorporates a no-regret decision-making model with a Bayesian learning process, which gives it the ability to adapt to errors found in preconceived costs associated with each objective. This learning ability is what sets this model apart from many others. By basing the model on previously developed human learning models, hundreds of years of experience in those fields can be applied to the recently developing field of machine learning. This also allows operators to more comfortably adapt to the machine's learning process in order to better understand how to take advantage of its features. One of the main purposes of this system is to incorporate multiple objectives into the decision-making process. This feature allows users to clearly define and prioritize objectives, letting the system calculate the best approach for completing the mission. For instance, if an operator is given objectives such as obstacle avoidance, safety, and limiting resource usage, the operator would traditionally be required to decide how to meet all of these objectives. A multi-objective decision-making process such as the one designed in this project allows the operator to input the objectives and their priorities and receive as output the calculated optimal compromise.
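A hedged sketch of the general idea: regret-matching action selection combined with a running posterior-mean update of per-objective cost estimates. The actions, cost matrix, and objective weights below are invented for illustration and are not the model's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions, n_objectives = 3, 2
weights = np.array([0.7, 0.3])                 # operator's objective priorities
cost_est = np.ones((n_actions, n_objectives))  # prior beliefs about costs
counts = np.ones(n_actions)
regret = np.zeros(n_actions)
true_cost = np.array([[0.2, 0.9], [0.5, 0.4], [0.8, 0.1]])  # unknown to the agent

for step in range(1000):
    # Regret matching: play in proportion to positive cumulative regret.
    pos = np.maximum(regret, 0.0)
    probs = pos / pos.sum() if pos.sum() > 0 else np.full(n_actions, 1 / n_actions)
    a = rng.choice(n_actions, p=probs)
    # Observe noisy per-objective costs and update the chosen action's estimate.
    obs = true_cost[a] + rng.normal(0, 0.05, n_objectives)
    counts[a] += 1
    cost_est[a] += (obs - cost_est[a]) / counts[a]   # running posterior-mean update
    # Accumulate regret against the weighted (scalarized) cost estimates.
    scalar = cost_est @ weights
    regret += scalar[a] - scalar   # positive where another action looks cheaper

print("preferred action:", int(np.argmax(np.maximum(regret, 0))))
```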
- Date Issued
- 2008
- Identifier
- CFE0002453, ucf:47711
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002453
- Title
- Human Action Detection, Tracking and Segmentation in Videos.
- Creator
- Tian, Yicong, Shah, Mubarak, Bagci, Ulas, Liu, Fei, Walker, John, University of Central Florida
- Abstract / Description
- This dissertation addresses the problems of human action detection, human tracking, and segmentation in videos. These are fundamental tasks in computer vision and are extremely challenging to solve in realistic videos. We first propose a novel approach for action detection by exploring the generalization of deformable part models from 2D images to 3D spatiotemporal volumes. By focusing on the most distinctive parts of each action, our models adapt to intra-class variation and show robustness to clutter. This approach deals with detecting an action performed by a single person. When there are multiple humans in the scene, humans need to be segmented and tracked from frame to frame before action recognition can be performed. Next, we propose a novel approach for multiple object tracking (MOT) that formulates detection and data association in one framework. Our method allows us to overcome the confinement of data-association-based MOT approaches, whose performance depends on the object detection results provided at the input level. We show that automatically detecting and tracking targets in a single framework helps resolve the ambiguities caused by frequent occlusion and heavy articulation of targets. In this tracker, targets are represented by bounding boxes, which is a coarse representation; however, pixel-wise object segmentation provides fine-level information, which is desirable for later tasks. Finally, we propose a tracker that simultaneously solves three main problems: detection, data association, and segmentation. This is especially important because the outputs of these three problems are highly correlated, and the solution of one can greatly help improve the others. The proposed approach achieves more accurate segmentation results and also helps better resolve typical difficulties in multiple target tracking, such as occlusion, ID switches, and track drifting.
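As a hedged illustration of the association sub-problem that tracking-by-detection methods solve, the sketch below matches detections to tracks by maximizing bounding-box overlap (IoU) with the Hungarian algorithm from SciPy. The proposed framework solves detection, association, and segmentation jointly; this shows only the classical association step on toy boxes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Overlap of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

tracks = [(10, 10, 50, 90), (200, 40, 240, 120)]       # toy existing tracks
detections = [(205, 45, 243, 118), (12, 14, 53, 92)]   # toy new detections
cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
rows, cols = linear_sum_assignment(cost)               # minimum-cost matching
for t, d in zip(rows, cols):
    if 1.0 - cost[t, d] > 0.3:                         # accept only real overlaps
        print(f"track {t} -> detection {d} (IoU={1.0 - cost[t, d]:.2f})")
```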
- Date Issued
- 2018
- Identifier
- CFE0007378, ucf:52069
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007378
- Title
- CHANGES IN RUNNING AND MULTIPLE OBJECT TRACKING PERFORMANCE DURING A 90-MINUTE INTERMITTENT SOCCER PERFORMANCE TEST (iSPT). A PILOT STUDY.
- Creator
- Girts, Ryan, Wells, Adam, Stout, Jeffrey, Fukuda, David, Hoffman, Jay, University of Central Florida
- Abstract / Description
- Multiple object tracking (MOT) is a cognitive process that involves the active processing of dynamic visual information. In athletes, MOT speed is critical for maintaining spatial awareness of teammates, opponents, and the ball while moving at high velocities during a match. Understanding how MOT speed changes throughout the course of a competitive game may enhance strategies for maintaining optimal player performance. The objective of this study was to examine changes in MOT speed and running performance during a 90-minute intermittent soccer performance test (iSPT). A secondary purpose was to examine the relationship between aerobic capacity and changes in MOT speed.

Seven competitive female soccer players (age: 20.4 ± 1.8 y, height: 166.7 ± 3.2 cm, weight: 62.4 ± 4.0 kg, VO2max: 45.8 ± 4.6 ml/kg/min) completed the iSPT on a Curve™ non-motorized treadmill (cNMT). The iSPT was divided into two 45-minute halves with a 15-minute halftime (HT) interval, and consisted of six individualized velocity zones. Velocity zones were consistent with previous time-motion analyses of competitive soccer matches and based upon individual peak sprint speed (PSS) as follows: standing (0% PSS, 17.8% of iSPT), walking (20% PSS, 36.4% of iSPT), jogging (35% PSS, 24.0% of iSPT), running (50% PSS, 11.6% of iSPT), fast running (60% PSS, 3.6% of iSPT), and sprinting (80% PSS, 6.7% of iSPT). The stand, walk, jog, and run zones were combined to create a low-speed zone (LS); the fast run and sprint zones were combined to create a high-speed zone (HS). MOT speed was assessed at baseline (0 min) and three times during each half of the iSPT. Dependent t-tests and Pearson correlation coefficients were used to analyze the data. Across 15-minute time blocks, significant decreases in distance covered and average speed were noted for jogging, sprinting, low-speed running, high-speed running, and total distance (all p < 0.05). Players covered significantly less total distance during the second half compared to the first (p = 0.025). Additionally, significant decreases in distance covered and average speed were observed during the second half for the sprint and HS zones (p ≤ 0.008). No significant main effect was noted for MOT speed across 15-minute time blocks, though a trend towards a decrease in MOT speed was observed between halves (p = 0.056). A significant correlation was observed between the change in MOT speed and VO2max (r = 0.888, p = 0.007). The fatigue associated with 90 minutes of soccer-specific running negatively influenced running performance during the second half. However, increased aerobic capacity appears to be associated with an attenuation of cognitive decline during 90 minutes of soccer-specific running. The results of this study indicate the importance of aerobic capacity for maintaining spatial awareness during a match.
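A minimal sketch of the two named analyses on made-up numbers, using SciPy: a dependent (paired) t-test comparing first- versus second-half distance, and a Pearson correlation between VO2max and the change in MOT speed. None of the values below are the study's data.

```python
import numpy as np
from scipy import stats

# Paired t-test on illustrative first- vs second-half distances (km).
first_half = np.array([5.1, 4.8, 5.4, 5.0, 4.9, 5.2, 5.3])
second_half = np.array([4.7, 4.6, 5.1, 4.6, 4.8, 4.9, 5.0])
t, p = stats.ttest_rel(first_half, second_half)
print(f"paired t-test: t={t:.2f}, p={p:.3f}")

# Pearson correlation between aerobic capacity and MOT-speed change.
vo2max = np.array([41.2, 43.5, 44.8, 46.0, 47.1, 49.3, 50.2])
mot_change = np.array([-0.30, -0.22, -0.15, -0.10, -0.05, 0.02, 0.06])
r, p = stats.pearsonr(vo2max, mot_change)
print(f"Pearson: r={r:.3f}, p={p:.3f}")
```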
- Date Issued
- 2018
- Identifier
- CFE0007183, ucf:52290
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007183
- Title
- Guided Autonomy for Quadcopter Photography.
- Creator
- Alabachi, Saif, Sukthankar, Gita, Behal, Aman, Lin, Mingjie, Boloni, Ladislau, Laviola II, Joseph, University of Central Florida
- Abstract / Description
- Photographing small objects with a quadcopter is non-trivial with many common user interfaces, especially when it requires maneuvering an Unmanned Aerial Vehicle (UAV) to difficult angles in order to shoot high perspectives. The aim of this research is to employ machine learning to support better user interfaces for quadcopter photography. Human-robot interaction (HRI) is supported by visual servoing, a specialized vision system for real-time object detection, and control policies acquired through reinforcement learning (RL). Two investigations of guided autonomy were conducted. In the first, the user directed the quadcopter with a sketch-based interface, and periods of user direction were interspersed with periods of autonomous flight. In the second, the user directs the quadcopter by taking a single photo with a handheld mobile device, and the quadcopter autonomously flies to the requested vantage point.

This dissertation focuses on the following problems: 1) evaluating different user interface paradigms for dynamic photography in a GPS-denied environment; 2) learning better Convolutional Neural Network (CNN) object detection models to assure higher precision in detecting human subjects than the currently available state-of-the-art fast models; 3) transferring learning from the Gazebo simulation into the real world; and 4) learning robust control policies using deep reinforcement learning to maneuver the quadcopter to multiple shooting positions with minimal human interaction.
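A hedged sketch of the visual-servoing idea this work builds on: a proportional controller that converts a detected bounding box's offset and size into velocity commands that keep the subject centered and framed. The detector output and command interface are hypothetical stand-ins, not the dissertation's learned control policy.

```python
def servo_command(box, frame_w, frame_h, target_area_frac=0.05, k=0.8):
    """box: (x1, y1, x2, y2) from the object detector, in pixels."""
    cx = (box[0] + box[2]) / 2 / frame_w - 0.5      # horizontal offset, -0.5..0.5
    cy = (box[1] + box[3]) / 2 / frame_h - 0.5      # vertical offset
    area = (box[2] - box[0]) * (box[3] - box[1]) / (frame_w * frame_h)
    return {
        "yaw_rate": k * cx,                          # turn toward the subject
        "climb_rate": -k * cy,                       # image y grows downward
        "forward": k * (target_area_frac - area),    # approach until framed large enough
    }

print(servo_command((500, 200, 620, 420), frame_w=1280, frame_h=720))
```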
- Date Issued
- 2019
- Identifier
- CFE0007774, ucf:52369
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007774
- Title
- THE MEMORY OF FORGOTTEN THINGS.
- Creator
- Metz, Brittany, Poindexter, Carla, University of Central Florida
- Abstract / Description
- This thesis investigates my lack of childhood memories and documents how my artwork stands in as a substitute for that lost memory. The first part of the thesis analyzes my early life and influences; the second part analyzes my art making and process. The narrative style of writing is intentionally autobiographical to mimic the narrative style and structure of the thesis installation. My upbringing, interests, creative process, access to materials, and inspiration are fully explored. The impact my early life has on my current work is evident. Real memory is combined with created memory in the thesis multi-media installation. I wish to transport the viewer into the dreamlike space I have constructed with found objects and multi-media materials by offering an immersive experience of my world.
- Date Issued
- 2011
- Identifier
- CFE0003649, ucf:48814
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003649
- Title
- Learning Hierarchical Representations for Video Analysis Using Deep Learning.
- Creator
- Yang, Yang, Shah, Mubarak, Sukthankar, Gita, Da Vitoria Lobo, Niels, Stanley, Kenneth, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
- With the exponential growth of digital data, video content analysis (e.g., action and event recognition) has been drawing increasing attention from computer vision researchers. Effective modeling of the objects, scenes, and motions is critical for visual understanding. Recently there has been a growing interest in bio-inspired deep learning models, which have shown impressive results in speech and object recognition. Deep learning models are formed by the composition of multiple non-linear transformations of the data, with the goal of yielding more abstract and ultimately more useful representations. The advantages of deep models are threefold: 1) they learn features directly from the raw signal, in contrast to hand-designed features; 2) the learning can be unsupervised, which is suitable for large data where labeling everything is expensive and impractical; 3) they learn a hierarchy of features one level at a time, and this layerwise stacking of feature extraction often yields better representations. However, not many deep learning models have been proposed to solve problems in video analysis, especially for videos "in the wild". Most of them either deal with simple datasets or are limited to low-level local spatial-temporal feature descriptors for action recognition. Moreover, as the learning algorithms are unsupervised, the learned features preserve generative properties rather than the discriminative ones that are more favorable in classification tasks. In this context, the thesis makes two major contributions.

First, we propose several formulations and extensions of deep learning methods which learn hierarchical representations for three challenging video analysis tasks: complex event recognition, object detection in videos, and measuring action similarity. The proposed methods are extensively demonstrated on state-of-the-art challenging datasets for each task. Besides learning low-level local features, higher-level representations are designed to be learned in the context of the applications: data-driven concept representations and sparse representations of events are learned for complex event recognition; representations for object body parts and structures are learned for object detection in videos; and relational motion features and similarity metrics between video pairs are learned simultaneously for action verification.

Second, in order to learn discriminative and compact features, we propose a new feature learning method using a deep neural network based on autoencoders. It differs from existing unsupervised feature learning methods in two ways: first, it optimizes both discriminative and generative properties of the features simultaneously, which gives our features better discriminative ability; second, our learned features are more compact, while unsupervised feature learning methods usually learn a redundant set of over-complete features. Extensive experiments with quantitative and qualitative results on the tasks of human detection and action verification demonstrate the superiority of our proposed models.
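A minimal sketch of the second contribution's joint objective, in PyTorch: an autoencoder whose code is trained for reconstruction (generative) and classification (discriminative) at once. The architecture, sizes, and the 0.5 loss weight are illustrative assumptions, not the dissertation's actual settings.

```python
import torch
import torch.nn as nn

class DiscriminativeAE(nn.Module):
    def __init__(self, in_dim=784, code_dim=64, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.classifier = nn.Linear(code_dim, n_classes)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), self.classifier(code)

model = DiscriminativeAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                     # stand-in batch of flattened images
y = torch.randint(0, 10, (32,))             # stand-in labels
recon, logits = model(x)
# Generative (reconstruction) plus discriminative (classification) terms.
loss = nn.functional.mse_loss(recon, x) \
     + 0.5 * nn.functional.cross_entropy(logits, y)
loss.backward()
opt.step()
```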
- Date Issued
- 2013
- Identifier
- CFE0004964, ucf:49593
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004964
- Title
- Projected Surfaces.
- Creator
- Flynn, Jason, Price, Mark, Kovach, Keith, Raimundi-Ortiz, Wanda, Isenhour, David, University of Central Florida
- Abstract / Description
- In this paper I will address the philosophies of Susan Sontag, Roland Barthes, and Thomas Ruff by considering the object, materials, and processes of photography as my primary motivator to create art. I will examine the contrast between photographic imagery, as an illusion of the past, and sculpture, as a physical manifestation of the present, when creating works that ask, "What else can photography be?"
- Date Issued
- 2014
- Identifier
- CFE0005166, ucf:50671
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005166
- Title
- Assessing the Effectiveness of Workload Measures in the Nuclear Domain.
- Creator
- Mercado, Joseph, Reinerman, Lauren, Hancock, Peter, Lackey, Stephanie, Szalma, James, University of Central Florida
- Abstract / Description
- An operator's performance and mental workload when interacting with a complex system, such as the main control room (MCR) of a nuclear power plant (NPP), are major concerns when seeking to accomplish safe and successful operations. The impact of performance on operator workload is one of the most widely researched areas in human factors science, with over five hundred workload articles published since the 1960s (Brannick, Salas, & Prince, 1997; Meshkati & Hancock, 2011). Researchers have used specific workload measures across domains to assess the effects of taskload. However, research has not sufficiently assessed the psychometric properties, such as reliability, validity, and sensitivity, that delineate and limit the roles of these measures in workload assessment (Nygren, 1991). As a result, there is no sufficiently effective measure for indicating changes in workload for distinct tasks across multiple domains (Abich, 2013). Abich (2013) was the most recent to systematically test subjective and objective workload measures to determine the universality and sensitivity of each, alone or in combination. This systematic approach assessed taskload changes within three tasks in the context of military intelligence, surveillance, and reconnaissance (ISR) missions. The purpose of the present experiment was to determine whether certain workload measures are sufficiently effective across domains by taking the findings from one domain (military) and testing whether those results hold true in a different domain, that of nuclear power. Results showed that only two measures (NASA-TLX frustration and fNIR) were sufficiently effective at indicating workload changes between the three task types in the nuclear domain, although many measures were statistically significant. The results of this research effort, combined with the results from Abich (2013), highlight an alarming problem: the ability of subjective and physiological measures to indicate changes in workload varies across tasks (Abich, 2013) and across domains. A single measure is not able to capture the complex construct of workload across different tasks within the same domain or across domains. This research effort highlights the importance of proper methodology. As researchers, we have to identify the appropriate workload measure for all tasks, regardless of domain, by investigating the effectiveness of each measure. The findings of the present study suggest that responsible science includes evaluating workload measures before use, not relying on prior research or theory. In other words, the results indicate that it is only acceptable to use a measure based on prior findings if research has tested that measure on the exact task and manipulations within that specific domain.
- Date Issued
- 2014
- Identifier
- CFE0005666, ucf:50188
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005666
- Title
- SCENE MONITORING WITH A FOREST OF COOPERATIVE SENSORS.
- Creator
- Javed, Omar, Shah, Mubarak, University of Central Florida
- Abstract / Description
- In this dissertation, we present vision-based scene interpretation methods for monitoring people and vehicles, in real time, within a busy environment using a forest of co-operative electro-optical (EO) sensors. We have developed novel video understanding algorithms with learning capability to detect and categorize people and vehicles, track them within a camera, and hand off this information across multiple networked cameras for multi-camera tracking. The ability to learn prevents the need for extensive manual intervention, site models, and camera calibration, and provides adaptability to changing environmental conditions. For object detection and categorization in the video stream, a two-step detection procedure is used. First, regions of interest are determined using a novel hierarchical background subtraction algorithm that uses color and gradient information for interest region detection. Second, objects are located and classified from within these regions using a weakly supervised learning mechanism based on co-training that employs motion and appearance features. The main contribution of this approach is that it is an online procedure in which separate views (features) of the data are used for co-training, while the combined view (all features) is used to make classification decisions in a single boosted framework. The advantage of this approach is that it requires only a few initial training samples and can automatically adjust its parameters online to improve detection and classification performance. Once objects are detected and classified, they are tracked in individual cameras. Single-camera tracking is performed using a voting-based approach that utilizes color and shape cues to establish correspondence in individual cameras. The tracker has the capability to handle multiple occluded objects. Next, the objects are tracked across a forest of cameras with non-overlapping views. This is a hard problem for two reasons. First, the observations of an object are often widely separated in time and space when viewed from non-overlapping cameras. Second, the appearance of an object in one camera view might be very different from its appearance in another camera view because of differences in illumination, pose, and camera properties. To deal with the first problem, the system learns the inter-camera relationships to constrain track correspondences. These relationships are learned in the form of a multivariate probability density of space-time variables (object entry and exit locations, velocities, and inter-camera transition times) using Parzen windows. To handle the appearance change of an object as it moves from one camera to another, we show that all color transfer functions from a given camera to another camera lie in a low-dimensional subspace. The tracking algorithm learns this subspace by using probabilistic principal component analysis and uses it for appearance matching. The proposed system learns the camera topology and the subspace of inter-camera color transfer functions during a training phase. Once training is complete, correspondences are assigned using the maximum a posteriori (MAP) estimation framework using both location and appearance cues. Extensive experiments and deployment of this system in realistic scenarios have demonstrated the robustness of the proposed methods. The proposed system was able to detect and classify targets, and seamlessly tracked them across multiple cameras. It also generated a summary, in terms of key frames and a textual description of trajectories, for a monitoring officer to use in final analysis and response decisions. This level of interpretation was the goal of our research effort, and we believe that it is a significant step forward in the development of intelligent systems that can deal with the complexities of real-world scenarios.
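A minimal sketch of the space-time relationship learning described above: a Parzen-window (kernel density) estimate over inter-camera transition times, used to score a candidate handoff between two cameras. The training times are invented, and the real system models a multivariate density that also includes entry/exit locations and velocities.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Illustrative transition times (seconds) observed between camera A's exit
# region and camera B's entry region during a training phase.
transit_times = np.array([[4.8], [5.2], [5.0], [5.5], [4.9], [5.1]])
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(transit_times)

def handoff_likelihood(dt_seconds: float) -> float:
    """Density of observing this camera-to-camera transition time."""
    return float(np.exp(kde.score_samples([[dt_seconds]])[0]))

print(handoff_likelihood(5.0))   # plausible transition
print(handoff_likelihood(20.0))  # unlikely; probably a different object
```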
- Date Issued
- 2005
- Identifier
- CFE0000497, ucf:46362
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000497