Current Search: Foroosh, Hassan
- Title
- EXPRESSION MORPHING BETWEEN DIFFERENT ORIENTATIONS.
- Creator
-
Fu, Tao, Foroosh, Hassan R., University of Central Florida
- Abstract / Description
-
How to generate new views from given reference images has been an important and interesting topic in image-based rendering. Two important algorithms for this task are field morphing and view morphing. Field morphing, an image-morphing algorithm, generates new views from two reference images taken at the same viewpoint; its best-known result is morphing one person's face into another's. View morphing, a view-synthesis algorithm, generates in-between views from two reference views of the same object taken at different viewpoints; its result is typically an animation that moves the object from the viewpoint of one reference image to that of the other.

In this thesis, we propose a new framework that integrates field morphing and view morphing to solve the problem of expression morphing. Based on four reference images, we successfully generate a morph from one viewpoint with one expression to another viewpoint with a different expression. We also propose a new approach to eliminating the artifacts that frequently occur in view morphing due to occlusions, and in field morphing due to unforeseen combinations of feature lines. We solve these problems by relaxing the monotonicity assumption to piece-wise monotonicity along the epipolar lines. Our experimental results demonstrate the efficiency of this approach in handling occlusions for more realistic synthesis of novel views.
- Date Issued
- 2004
- Identifier
- CFE0000070, ucf:46110
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000070
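The abstract above reduces to one shared recipe: warp each reference image toward the in-between view, then cross-dissolve. Below is a minimal sketch of that core step, assuming a dense correspondence field `disp` is already known (computing it, via feature lines or epipolar geometry, is the hard part the thesis addresses); it is an illustration, not the thesis's algorithm.

```python
import numpy as np

def morph(img0, img1, disp, t):
    """Blend of shape and intensity between two reference images.

    disp[y, x] holds the (dy, dx) correspondence from img0 to img1;
    t in [0, 1] selects the in-between view (0 -> img0, 1 -> img1).
    Nearest-neighbour backward warping keeps the sketch short.
    """
    h, w = img0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Sample img0 part-way back along the correspondence field...
    y0 = np.clip(np.rint(ys - t * disp[..., 0]), 0, h - 1).astype(int)
    x0 = np.clip(np.rint(xs - t * disp[..., 1]), 0, w - 1).astype(int)
    # ...and img1 the remaining fraction of the way forward.
    y1 = np.clip(np.rint(ys + (1 - t) * disp[..., 0]), 0, h - 1).astype(int)
    x1 = np.clip(np.rint(xs + (1 - t) * disp[..., 1]), 0, w - 1).astype(int)
    # Cross-dissolve the two warped images.
    return (1 - t) * img0[y0, x0] + t * img1[y1, x1]
```

Sweeping `t` from 0 to 1 produces the morphing animation the abstract describes.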
- Title
- SUB-PIXEL REGISTRATION IN COMPUTATIONAL IMAGING AND APPLICATIONS TO ENHANCEMENT OF MAXILLOFACIAL CT DATA.
- Creator
-
Balci, Murat, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
In computational imaging, data acquired by sampling the same scene or object at different times or from different orientations result in images in different coordinate systems. Registration is a crucial step for comparing, integrating, and fusing data obtained from different measurements. Tomography is the method of imaging a single plane or slice of an object. A Computed Tomography (CT) scan, also known as a CAT (Computed Axial Tomography) scan, is a form of helical tomography that traditionally produces a 2D image of the structures in a thin section of the body. It uses X-rays, which are ionizing radiation; although the actual dose is typically low, repeated scans should be limited. In dentistry, and implant dentistry in particular, there is a need for 3D visualization of internal anatomy, which is mainly based on CT scanning technologies. The CT scan has been the most important technological advancement in dramatically enhancing the clinician's ability to diagnose, treat, and plan dental implants. Advanced 3D modeling and visualization techniques permit highly refined and accurate assessment of the CT scan data. However, in addition to imperfections of the instrument and the imaging process, it is not uncommon to encounter other unwanted artifacts in the form of bright regions, flares, and erroneous pixels caused by dental bridges, metal braces, etc. Currently, cleaning the data of acquisition backscattering imperfections and unwanted artifacts is performed manually, so the result is only as good as the experience level of the technician. Moreover, the process is error-prone, since the editing must be performed image by image.

We address some of these issues by proposing novel registration methods that use stone-cast models of patients' dental imprints as reference ground-truth data. Stone-cast models were originally used by dentists to make complete or partial dentures. The CT scan of such a stone-cast model can be used to automatically guide the cleaning of a patient's CT scans of defects or unwanted artifacts, and also as an automatic segmentation system for outliers of the CT scan data without the use of stone-cast models. The segmented data are subsequently used to clean the artifacts with a newly proposed 3D inpainting approach.
- Date Issued
- 2006
- Identifier
- CFE0001443, ucf:47040
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001443
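Sub-pixel registration in this line of work builds on phase correlation. As background, here is a sketch of plain, integer-pixel phase correlation; sub-pixel accuracy requires an additional refinement of the correlation peak, which is the thesis's subject and is not shown here.

```python
import numpy as np

def phase_correlate(a, b):
    """Estimate the integer translation mapping image a onto image b.

    Returns (dy, dx) such that b ~ np.roll(a, (dy, dx)). Sub-pixel
    accuracy needs an extra refinement step around the peak.
    """
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fb * np.conj(Fa)
    cross /= np.abs(cross) + 1e-12           # normalised cross-power spectrum
    corr = np.real(np.fft.ifft2(cross))      # ideally a delta at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks past the midpoint to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```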
- Title
- TOWARDS A SELF-CALIBRATING VIDEO CAMERA NETWORK FOR CONTENT ANALYSIS AND FORENSICS.
- Creator
-
Junejo, Imran, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
Due to growing security concerns, video surveillance and monitoring have received immense attention from both federal agencies and private firms. The main concern is that a single camera, even if allowed to rotate or translate, is not sufficient to cover a large area for video surveillance. A more general solution with a wide range of applications is to allow the deployed cameras to have non-overlapping fields of view (FoV) and, if possible, to move freely in 3D space. This thesis addresses how the cameras in such a network, and the network as a whole, can be calibrated, so that each camera as a unit in the network is aware of its orientation with respect to all the other cameras. Different types of cameras may be present in a multiple-camera network, and novel techniques are presented for efficient calibration of each. Specifically: (i) for a stationary camera, we derive new constraints on the Image of the Absolute Conic (IAC), which are shown to be intrinsic to the IAC; (ii) for a scene where object shadows are cast on a ground plane, we track the shadows cast by at least two unknown stationary points, and use the tracked shadow positions to compute the horizon line and hence the camera's intrinsic and extrinsic parameters; (iii) a novel solution is presented for the scenario of a camera observing pedestrians, whose formulation is unique in recognizing two harmonic homologies present in the resulting geometry; (iv) for a freely moving camera, a practical method is proposed for self-calibration that even allows the camera to change its internal parameters by zooming; and (v) motivated by the increasing use of pan-tilt-zoom (PTZ) cameras, a technique is presented that uses only two images to estimate five camera parameters.

For an automatically configurable multi-camera network with non-overlapping fields of view, possibly containing moving cameras, a practical framework is proposed that determines the geometry of such a dynamic camera network. It is shown that one automatically computed vanishing point and a line lying on any plane orthogonal to the vertical direction are sufficient to infer the geometry of a dynamic network. Our method generalizes previous work, which considers only restricted camera motions. Using minimal assumptions, we demonstrate promising results on synthetic as well as real data. Applications to path modeling, GPS coordinate estimation, and configuring mixed-reality environments are explored.
- Date Issued
- 2007
- Identifier
- CFE0001743, ucf:47296
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001743
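Item (ii) above recovers camera parameters from vanishing points and the horizon line. A textbook special case of that machinery, offered purely as an illustration rather than the thesis's IAC constraints: with square pixels, zero skew, and a known principal point, two vanishing points of orthogonal scene directions give the focal length directly from v1' * omega * v2 = 0.

```python
import math

def focal_from_vps(v1, v2):
    """Focal length from two vanishing points of orthogonal directions.

    Assumes square pixels, zero skew, and coordinates already centred
    on the principal point. With omega = diag(1/f^2, 1/f^2, 1), the
    orthogonality constraint gives f^2 = -(x1*x2 + y1*y2).
    """
    f2 = -(v1[0] * v2[0] + v1[1] * v2[1])
    if f2 <= 0:
        raise ValueError("vanishing points inconsistent with orthogonality")
    return math.sqrt(f2)
```

For example, with focal length 800 and identity rotation, the orthogonal directions (1, 0, 1) and (-1, 0, 1) project to vanishing points (800, 0) and (-800, 0), from which the formula recovers f = 800.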
- Title
- LEARNING GEOMETRY-FREE FACE RE-LIGHTING.
- Creator
-
Moore, Thomas, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
Accurately modeling the variability of illumination in a class of images is a fundamental problem in many areas of computer vision and graphics. In computer vision, for instance, there is the problem of facial recognition: simply put, one would hope to identify a known face under any illumination. In graphics, on the other hand, one could imagine a system that, given an image, identifies the illumination model and then uses it to create new images. In this thesis we describe a method for learning the illumination model for a class of images. Once the model is learned, it is used to render new images of the same class under the new illumination. Results are shown for both synthetic and real images. The key contribution of this work is that images of known objects can be re-illuminated using small patches of image data and relatively simple kernel regression models. Additionally, our approach does not require any knowledge of the geometry of the class of objects under consideration, making it relatively straightforward to implement. As part of this work we examine existing geometric and image-based re-lighting techniques; give a detailed description of our geometry-free face re-lighting process; present non-linear regression and basis selection with respect to image synthesis; discuss system limitations; and look at possible extensions and future work.
- Date Issued
- 2007
- Identifier
- CFE0001893, ucf:47394
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001893
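The abstract's key tool is "relatively simple kernel regression models" applied to small image patches. A generic kernel ridge regression sketch of that flavor, where the RBF kernel, hyperparameters, and patch-to-intensity setup are illustrative assumptions rather than details taken from the thesis:

```python
import numpy as np

def kernel_ridge_fit(X, y, gamma, lam):
    """Fit kernel ridge regression with an RBF kernel.

    X: (n, d) training inputs (e.g. flattened patches under the source
    illumination); y: (n,) targets (e.g. intensities under the target
    illumination). Returns the dual coefficients alpha."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma):
    """Predict at new inputs from the fitted dual coefficients."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2) @ alpha
```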
- Title
- MULTIPLE VIEW GEOMETRY FOR VIDEO ANALYSIS AND POST-PRODUCTION.
- Creator
-
Cao, Xiaochun, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
Multiple view geometry is the foundation of an important class of computer vision techniques for simultaneous recovery of camera motion and scene structure from a set of images. It has numerous important applications, including video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, the topic of this dissertation, computer analysis of the camera's motion can replace the manual methods currently used to correctly align an artificially inserted object in a scene. However, existing single-view methods typically require multiple vanishing points, and therefore fail when only one vanishing point is available. In addition, current multiple-view techniques, based on either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is a necessary step for video post-production across different input videos. This dissertation proposes several advancements that overcome these limitations, and uses them to develop an efficient framework for video analysis and post-production with multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state of the art in single-view geometry to situations where only one vanishing point is available. The property of constant or known camera motion is also exploited for applications such as calibrating a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus.

We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking, and zooming) and complex (e.g. hand-held) camera motions. The accuracy of these results is demonstrated by applying our approach to video post-production tasks such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data.
- Date Issued
- 2006
- Identifier
- CFE0001014, ucf:46840
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001014
- Title
- PHASE-SHIFTING HAAR WAVELETS FOR IMAGE-BASED RENDERING APPLICATIONS.
- Creator
-
Alnasser, Mais, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
In this thesis, we establish the research background necessary for tackling the problem of phase-shifting in the wavelet transform domain. Solving this problem is the key to reducing the redundancy and huge storage requirements of Image-Based Rendering (IBR) applications that utilize wavelets. Image-based methods for rendering dynamic glossy objects do not truly scale to all possible frequencies and high sampling rates without trading storage, glossiness, or computational time while varying both lighting and viewpoint. This is because current approaches are limited to precomputed radiance transfer (PRT), which is prohibitively expensive in terms of memory when both lighting and viewpoint variation are required together with high sampling rates for high-frequency lighting of glossy materials. At the root of this problem is the lack of a closed-form run-time solution to the nontrivial problem of rotating wavelets, which we solve in this thesis. We specifically target Haar wavelets, which provide the most efficient solution to the triple-product integral, which in turn is fundamental to solving the environment lighting problem. The problem is divided into three main steps, each of which provides several key theoretical contributions. First, we derive closed-form expressions for linear phase-shifting in the Haar domain for one-dimensional signals, which generalize to N-dimensional signals thanks to separability. Second, we derive closed-form expressions for linear phase-shifting of two-dimensional signals projected using the non-separable Haar transform. For both cases, we show that the coefficients of the shifted data can be computed solely from the coefficients of the original data. We also derive closed-form expressions for non-integer shifts, which have not been reported before.

As an application of these results, we apply the new formulae to image shifting, rotation, and interpolation, and demonstrate the superiority of the proposed solutions over existing methods. In the third step, we establish a solution for non-linear phase-shifting of two-dimensional non-separable Haar-transformed signals, which is directly applicable to the original problem of image-based rendering. Our solution is the first attempt to provide an analytic solution to the difficult problem of rotating wavelets in the transform domain.
- Date Issued
- 2008
- Identifier
- CFE0002214, ucf:47882
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002214
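A small illustration of why wavelet-domain phase-shifting is nontrivial: for a single-level orthonormal Haar transform, an even circular shift of the signal simply rolls the coefficient arrays, while an odd shift mixes coefficients across positions; closed-form expressions for the general case are what the thesis derives. A sketch of the trivial even-shift case:

```python
import numpy as np

def haar_level(x):
    """One level of the orthonormal Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)  # pairwise averages (scaled)
    detail = (x[:, 0] - x[:, 1]) / np.sqrt(2.0)  # pairwise differences (scaled)
    return approx, detail
```

Rolling the input by 2 samples rolls both coefficient arrays by 1; rolling by 1 sample has no such per-array mapping, which is exactly where closed-form phase-shift expressions are needed.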
- Title
- GEOMETRIC INVARIANCE IN THE ANALYSIS OF HUMAN MOTION IN VIDEO DATA.
- Creator
-
Shen, Yuping, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
Human motion analysis is one of the major problems in computer vision research. It covers the study of human motion in video data from many aspects, ranging from tracking body parts and reconstructing 3D human body configurations to higher-level interpretation of human actions and activities in image sequences. When human motion is observed through a video camera, it is perspectively distorted and may appear totally different from different viewpoints. It is therefore highly challenging to establish correct relationships between human motions across video sequences with different camera settings. In this work, we investigate geometric invariance in the motion of the human body, which is critical to accurately understanding human motion in video data regardless of variations in camera parameters and viewpoints. In human action analysis, the representation of the action is a very important issue that usually determines the nature of the solutions, including their limits in resolving the problem. Unlike existing research that studies human motion as a whole 2D/3D object or as a sequence of postures, we study human motion as a sequence of body pose transitions. We further decompose a body pose into a number of body point triplets, and break down a pose transition into the transitions of a set of body point triplets. In this way the study of the complex non-rigid motion of the human body is reduced to that of the motion of rigid body point triplets, i.e. a collection of planes in motion, so that projective geometry and linear algebra can be applied to explore the geometric invariance in human motion. Based on this formulation, we have discovered the fundamental ratio invariant and the eigenvalue equality invariant in human motion. We also propose solutions based on these geometric invariants to the problems of view-invariant recognition of human postures and actions, as well as analysis of human motion styles. These invariants and their applicability have been validated by experimental results supporting their effectiveness in understanding human motion under various camera parameters and viewpoints.
- Date Issued
- 2009
- Identifier
- CFE0002945, ucf:47970
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002945
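The fundamental-ratio and eigenvalue-equality invariants are contributions of the thesis itself. As a flavor of the underlying machinery (projective geometry preserving certain ratios regardless of viewpoint), the classic cross-ratio of four collinear points is invariant under any homography; this is an illustrative standard result, not the thesis's invariant.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four collinear 2D points."""
    d = lambda a, b: np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

def apply_h(H, p):
    """Apply a 3x3 homography to an inhomogeneous 2D point."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

Mapping four collinear points through an arbitrary homography changes every pairwise distance, yet leaves their cross-ratio unchanged.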
- Title
- LABELED SAMPLING CONSENSUS: A NOVEL ALGORITHM FOR ROBUSTLY FITTING MULTIPLE STRUCTURES USING COMPRESSED SAMPLING.
- Creator
-
Messina, Carl, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
The ability to robustly fit structures in datasets that contain outliers is a very important task in image processing, pattern recognition, and computer vision. Random Sampling Consensus, or RANSAC, is a very popular method for this task due to its ability to handle over 50% outliers. The problem with RANSAC is that it is only capable of finding a single structure. Therefore, if a dataset contains multiple structures, they must be found sequentially by finding the best fit, removing its points, and repeating the process; however, removing points incorrectly from the dataset could prove disastrous. This thesis offers a novel approach to sampling consensus that extends its ability to discover multiple structures in a single iteration through the dataset. The process introduced is an unsupervised method requiring no prior knowledge of the distribution of the input data. It uniquely assigns labels to different instances of similar structures; the algorithm is thus called Labeled Sampling Consensus, or L-SAC. These unique instances tend to cluster around one another, allowing the individual structures to be extracted using simple clustering techniques. Since divisions instead of modes are analyzed, only a single instance of a structure needs to be recovered. This ability of L-SAC enables a novel sampling procedure that "compresses" the number of samples required compared to traditional sampling schemes while ensuring all structures have been found. L-SAC is a flexible framework that can be applied to many problem domains.
- Date Issued
- 2011
- Identifier
- CFE0003893, ucf:48727
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003893
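For context, the single-structure RANSAC that L-SAC generalizes can be sketched in a few lines (2D line fitting; L-SAC's labeling of consensus sets, which enables multi-structure recovery in one pass, is not reproduced here):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Classic single-structure RANSAC for a 2D line a*x + b*y + c = 0.

    Returns a boolean inlier mask for the best line found. L-SAC extends
    this idea by labelling consensus sets so that several structures can
    be recovered in a single pass through the data.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best = None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        a, b = q[1] - p[1], p[0] - q[0]      # normal of the line through p, q
        norm = np.hypot(a, b)
        if norm == 0.0:
            continue
        a, b = a / norm, b / norm
        c = -(a * p[0] + b * p[1])
        dist = np.abs(points @ np.array([a, b]) + c)   # point-line distances
        inliers = dist < thresh
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return best
```

With multiple structures present, this returns only the single largest consensus set, which is exactly the limitation the abstract describes.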
- Title
- Visionary Ophthalmics: Confluence of Computer Vision and Deep Learning for Ophthalmology.
- Creator
-
Morley, Dustin, Foroosh, Hassan, Bagci, Ulas, Gong, Boqing, Mohapatra, Ram, University of Central Florida
- Abstract / Description
-
Ophthalmology is a medical field ripe with opportunities for meaningful application of computer vision algorithms. The field utilizes data from multiple disparate imaging techniques, ranging from conventional cameras to tomography, comprising a diverse set of computer vision challenges. Computer vision has a rich history of techniques that can adequately meet many of these challenges. However, the field has undergone something of a revolution in recent times as deep learning techniques have sprung into the forefront following advances in GPU hardware. This development raises important questions about how best to leverage insights from both modern deep learning approaches and more classical computer vision approaches for a given problem. In this dissertation, we tackle challenging computer vision problems in ophthalmology using methods from all across this spectrum. Perhaps our most significant contribution is a highly successful iris registration algorithm for use in laser eye surgery. This algorithm relies on matching features extracted from the structure tensor and a Gabor wavelet, a classically driven approach that does not utilize modern machine learning. However, drawing on insight from the deep learning revolution, we demonstrate successful application of backpropagation to optimize the registration significantly faster than the alternative of relying on finite differences. Towards the other end of the spectrum, we present a novel framework for improving RANSAC segmentation algorithms by utilizing a convolutional neural network (CNN) trained on a RANSAC-based loss function. Finally, we apply state-of-the-art deep learning methods to the problem of pathological fluid detection in optical coherence tomography images of the human retina, using a novel retina-specific data augmentation technique to greatly expand the data set. Altogether, our work demonstrates the benefits of a holistic view of computer vision, one that leverages deep learning and its associated insights without neglecting the techniques and insights of the previous era.
- Date Issued
- 2018
- Identifier
- CFE0007058, ucf:52001
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007058
- Title
- A Decision Support Tool for Video Retinal Angiography.
- Creator
-
Laha, Sumit, Bagci, Ulas, Foroosh, Hassan, Song, Sam, University of Central Florida
- Abstract / Description
-
Fluorescein angiography (FA) is a medical procedure that helps ophthalmologists monitor the status of the retinal blood vessels and plan proper treatment. This research is motivated by the necessity of blood vessel segmentation of the retina. Retinal vessel segmentation has been a major challenge that has drawn the attention of researchers for decades, due to the presence of complex blood vessels with varying sizes, shapes, angles, and branching patterns, together with non-uniform illumination and huge anatomical variability between subjects. In this thesis, we introduce a new computational tool that combines a deep learning based algorithm with a signal processing based video magnification method to support physicians in analyzing and diagnosing retinal angiogram videos, for the first time in the literature.

The proposed approach has a pipeline-based architecture with three phases: image registration to remove large motion from the video angiogram, retinal vessel segmentation, and video magnification based on the segmented vessels. In the image registration phase, we align distorted frames of the FA video using rigid registration approaches. In the next phase, we use a baseline capsule-based neural network for retinal vessel segmentation, in comparison with the state-of-the-art methods. We move away from traditional convolutional network approaches to capsule networks in this work because, despite being widely used in different computer vision applications, convolutional neural networks struggle to learn object-part relationships, have high computational cost due to the additive nature of neurons, and lose information in the pooling layer. We nonetheless use deep learning methods such as U-Net and Tiramisu to benchmark the performance and accuracy of SegCaps. Lastly, we apply Eulerian video magnification to magnify the subtle changes in the retinal video; in this phase, magnification is applied to the segmented videos to visualize the flow of blood in the retinal vessels.
- Date Issued
- 2018
- Identifier
- CFE0007342, ucf:52125
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007342
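The magnification phase follows the Eulerian video magnification idea: temporally band-pass each pixel's intensity signal and add it back amplified. A minimal sketch with an ideal FFT band-pass filter (the thesis pipeline operates on segmented vessel videos; the filter choice here is an illustrative assumption):

```python
import numpy as np

def magnify(frames, alpha, lo, hi, fps):
    """Eulerian-style magnification: band-pass each pixel's temporal
    signal with an ideal FFT filter and add it back amplified.

    frames: (T, H, W) array; [lo, hi]: temporal pass band in Hz;
    alpha: amplification factor for the band-passed component.
    """
    T = frames.shape[0]
    freqs = np.fft.fftfreq(T, d=1.0 / fps)
    keep = (np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)
    spec = np.fft.fft(frames, axis=0)
    band = np.real(np.fft.ifft(spec * keep[:, None, None], axis=0))
    return frames + alpha * band
```

A subtle 4 Hz flicker riding on a constant background, for instance, comes out with its oscillation amplified by a factor of (1 + alpha) while the background is untouched.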
- Title
- Solution of linear ill-posed problems using overcomplete dictionaries.
- Creator
-
Gupta, Pawan, Pensky, Marianna, Swanson, Jason, Zhang, Teng, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
In this dissertation, we consider the application of overcomplete dictionaries to the solution of general ill-posed linear inverse problems. In the context of regression problems, an enormous amount of effort has gone into recovering an unknown function using such dictionaries. While some research on the subject has already been carried out, many gaps remain. In particular, one of the most popular methods, lasso, and its variants are based on minimizing the empirical likelihood and, unfortunately, require stringent assumptions on the dictionary: the so-called compatibility conditions. Though compatibility conditions are hard to satisfy, it is well known that they can be met by using random dictionaries. In the first part of the dissertation, we show how random dictionaries can be applied to the solution of ill-posed linear inverse problems with Gaussian noise. We put a theoretical foundation under the suggested methodology and study its performance via simulations and a real-data example. In the second part of the dissertation, we investigate the application of lasso to linear ill-posed problems with non-Gaussian noise. We develop a theoretical background for the application of lasso to such problems and study its performance via simulations.
- Date Issued
- 2019
- Identifier
- CFE0007811, ucf:52345
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007811
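Lasso, central to both parts of the dissertation, minimizes a penalized least-squares objective. A minimal solver sketch via iterative soft-thresholding (ISTA) follows; this is a generic illustration of the estimator, not the dissertation's analysis or methodology.

```python
import numpy as np

def ista_lasso(A, y, lam, n_iters=500):
    """Solve min_x 0.5*||A x - y||^2 + lam*||x||_1 by iterative
    soft-thresholding (ISTA): a gradient step followed by shrinkage."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        z = x - A.T @ (A @ x - y) / L        # gradient step on the quadratic part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x
```

For an orthonormal dictionary the solution reduces to coordinate-wise soft-thresholding of the data, which the iteration reaches in a single step.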
- Title
- Algorithms for Rendering Optimization.
- Creator
-
Johnson, Jared, Hughes, Charles, Tappen, Marshall, Foroosh, Hassan, Shirley, Peter, University of Central Florida
- Abstract / Description
-
This dissertation explores algorithms for rendering optimization realizable within a modern, complex rendering engine. The first part contains optimized rendering algorithms for ray tracing. Ray tracing algorithms typically provide properties of simplicity and robustness that are highly desirable in computer graphics. We offer several novel contributions to the problem of interactive ray tracing of complex lighting environments. We focus on the problem of maintaining interactivity as both...
This dissertation explores algorithms for rendering optimization realizable within a modern, complex rendering engine. The first part contains optimized rendering algorithms for ray tracing. Ray tracing algorithms typically provide properties of simplicity and robustness that are highly desirable in computer graphics. We offer several novel contributions to the problem of interactive ray tracing of complex lighting environments. We focus on the problem of maintaining interactivity as both geometric and lighting complexity grows, without affecting the simplicity or robustness of ray tracing. First, we present a new algorithm called occlusion caching for accelerating the calculation of direct lighting from many light sources. We cache light visibility information sparsely across a scene. When rendering direct lighting for all pixels in a frame, we combine cached lighting information to determine whether or not shadow rays are needed. Since light visibility and scene location are highly correlated, our approach precludes the need for most shadow rays. Second, we present improvements to the irradiance caching algorithm. Here we demonstrate a new elliptical cache-point spacing heuristic that reduces the number of cache points required by taking into account the direction of irradiance gradients. We also accelerate irradiance caching by efficiently and intuitively coupling it with occlusion caching. In the second part of this dissertation, we present optimizations to rendering algorithms for participating media. Specifically, we explore the implementation and use of photon beams as an efficient, intuitive artistic primitive. We detail our implementation of the photon beams algorithm in PhotoRealistic RenderMan (PRMan). We show how our implementation maintains the benefits of the industry-standard Reyes rendering pipeline, with proper motion blur and depth of field. We detail an automatic photon beam generation algorithm utilizing PRMan shadow maps. We accelerate the rendering of camera-facing photon beams by using Gaussian quadrature for the path integrals in place of ray marching. Our optimized implementation allows for great versatility and intuitiveness in artistic control of volumetric lighting effects. Finally, we demonstrate the usefulness of photon beams as artistic primitives by detailing their use in a feature-length animated film.
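The occlusion caching idea can be illustrated with a toy one-dimensional scene: visibility is cached sparsely, reused wherever neighboring cache entries agree, and shadow rays are traced only near shadow boundaries. The occluder extent, cache spacing, and sample count below are all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1D scene: an occluder covers x in [0.4, 0.6]; the light is "up".
def shadow_ray(x):
    """Expensive visibility test (the call we want to avoid)."""
    return not (0.4 <= x <= 0.6)       # True = light visible

# Sparse cache: visibility sampled at a coarse set of locations.
cache_x = np.linspace(0.0, 1.0, 21)
cache_v = np.array([shadow_ray(x) for x in cache_x])

traced = 0
def cached_visibility(x):
    """If the two bracketing cache points agree, trust them; else trace."""
    global traced
    i = min(int(x * 20), 19)
    if cache_v[i] == cache_v[i + 1]:
        return cache_v[i]              # visibility is locally coherent
    traced += 1
    return shadow_ray(x)               # trace only near shadow boundaries

xs = rng.random(1000)
vis = [cached_visibility(x) for x in xs]
print(f"shadow rays traced: {traced} / 1000")
```

Because the occluder boundary straddles only two cache cells, only roughly a tenth of the shade points ever pay for a shadow ray, while the answers stay exact — the coherence argument the abstract makes.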
- Date Issued
- 2012
- Identifier
- CFE0004557, ucf:49231
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004557
- Title
- A Study of Localization and Latency Reduction for Action Recognition.
- Creator
-
Masood, Syed, Tappen, Marshall, Foroosh, Hassan, Stanley, Kenneth, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
The success of recognizing periodic actions in single-person, simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning system parameters or by selecting better feature points. Instead, we show that the problem lies in the spatio-temporal cuboid volume extracted from the interest point locations. Having identified the problem, we show how improved results can be achieved by simple modifications to the cuboids. For the above method, however, one requires near-perfect localization of the action within a video sequence. To achieve this objective, we present a two-stage weakly supervised probabilistic model for simultaneous localization and recognition of actions in videos. Different from previous approaches, our method is novel in that it (1) eliminates the need for manual annotations in the training procedure and (2) does not require any human detection or tracking in the classification stage. The first stage of our framework is a probabilistic action localization model which extracts the most promising sub-windows in a video sequence where an action can take place. We use a non-linear classifier in the second stage of our framework for the final classification task. We show the effectiveness of our proposed model on two well-known real-world datasets: the UCF Sports and UCF11 datasets. Another application of the weakly supervised probabilistic model proposed above is in the gaming environment. An important aspect of designing interactive, action-based interfaces is reliably recognizing actions with minimal latency. High latency causes the system's feedback to lag behind and thus significantly degrades the interactivity of the user experience. With slight modification to the weakly supervised probabilistic model we proposed for action localization, we show how it can be used to reduce latency when recognizing actions in Human Computer Interaction (HCI) environments. This latency-aware learning formulation trains a logistic regression-based classifier that automatically determines distinctive canonical poses from the data and uses these to robustly recognize actions in the presence of ambiguous poses. We introduce a novel (publicly released) dataset for the purpose of our experiments. Comparisons of our method against both a Bag of Words and a Conditional Random Field (CRF) classifier show improved recognition performance for both pre-segmented and online classification tasks.
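A minimal sketch of the latency-aware formulation — a logistic regression classifier trained on distinctive (canonical) pose frames, firing online as soon as one frame is confidently recognized — is below. The two-dimensional pose features, class layout, and confidence threshold are invented for illustration, not taken from the thesis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical per-frame pose features: two actions, each with a noisy
# canonical pose, mixed with ambiguous frames near the origin.
def make_sequence(label, T=30):
    canon = np.array([2.0, 0.0]) if label == 0 else np.array([0.0, 2.0])
    frames = 0.3 * rng.standard_normal((T, 2))
    distinct = rng.random(T) < 0.4          # only some frames are distinctive
    frames[distinct] += canon
    return frames, distinct

X, y = [], []
for label in (0, 1):
    for _ in range(20):
        frames, distinct = make_sequence(label)
        X.append(frames[distinct])
        y += [label] * int(distinct.sum())  # train on distinctive frames only
clf = LogisticRegression(max_iter=1000).fit(np.vstack(X), y)

# Online classification: decide as soon as any frame looks confidently canonical.
def classify_online(frames, threshold=0.95):
    for t, f in enumerate(frames):
        p = clf.predict_proba(f[None])[0]
        if p.max() >= threshold:            # distinctive pose seen -> decide now
            return int(p.argmax()), t       # (label, latency in frames)
    return int(p.argmax()), len(frames) - 1 # fall back to the last frame

frames, _ = make_sequence(1)
label, latency = classify_online(frames)
print(f"label={label}, decided at frame {latency}")
```

The latency reduction comes from the early exit: ambiguous frames fall near the decision boundary and are skipped, so the classifier commits only when a canonical pose appears, typically well before the sequence ends.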
- Date Issued
- 2012
- Identifier
- CFE0004575, ucf:49210
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004575
- Title
- Towards Real-time Mixed Reality Matting in Natural Scenes.
- Creator
-
Beato, Nicholas, Hughes, Charles, Foroosh, Hassan, Tappen, Marshall, Moshell, Jack, University of Central Florida
- Abstract / Description
-
In Mixed Reality scenarios, background replacement is a common way to immerse a user in a synthetic environment. Properly identifying the background pixels in an image or video is a difficult problem known as matting. In constant color matting, research identifies and replaces a background that is a single color, known as the chroma key color. Unfortunately, these algorithms force a controlled physical environment and favor constant, uniform lighting. More generic approaches, such as natural image matting, have made progress finding alpha matte solutions in environments with naturally occurring backgrounds. However, even for the quicker algorithms, the generation of trimaps, indicating regions of known foreground and background pixels, normally requires human interaction or offline computation. This research addresses ways to automatically solve an alpha matte for an image in real time, and by extension video, using a consumer-level GPU. It does so even in the context of noisy environments that result in less reliable constraints than found in controlled settings. To attack these challenges, we are particularly interested in automatically generating trimaps from depth buffers for dynamic scenes so that algorithms requiring denser constraints may be used. We then explore a sub-image based approach to parallelize an existing hierarchical approach on high resolution imagery by taking advantage of local information. We show that locality can be exploited to significantly reduce the memory and compute requirements previously necessary when computing alpha mattes of high resolution images. We achieve this using a parallelizable scheme that is independent of both the matting algorithm and image features. Combined, these research topics provide a basis for Mixed Reality scenarios using real-time natural image matting on high definition video sources.
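The depth-buffer trimap generation can be sketched simply: threshold the depth buffer, then erode both the foreground and background masks so that pixels near depth edges remain unknown. The depth values, threshold, and erosion radius here are illustrative assumptions, not the thesis's pipeline:

```python
import numpy as np
from scipy.ndimage import binary_erosion

# Hypothetical depth buffer: a near object (small depth) on a far background.
depth = np.full((64, 64), 10.0)
depth[16:48, 16:48] = 2.0               # foreground blob

near = depth < 5.0                      # raw foreground mask from depth
# Erode both regions so pixels near the depth edge stay "unknown":
fg = binary_erosion(near, iterations=3)
bg = binary_erosion(~near, iterations=3, border_value=1)

trimap = np.full(depth.shape, 128, dtype=np.uint8)   # 128 = unknown
trimap[fg] = 255                                     # known foreground
trimap[bg] = 0                                       # known background
```

The unknown band along the silhouette is exactly where a natural image matting solver would then estimate fractional alpha values; everything else is constrained for free by the depth buffer.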
- Date Issued
- 2012
- Identifier
- CFE0004515, ucf:49284
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004515
- Title
- SetPad: A Sketch-Based Tool For Exploring Discrete Math Set Problems.
- Creator
-
Cossairt, Travis, Laviola II, Joseph, Foroosh, Hassan, Hughes, Charles, University of Central Florida
- Abstract / Description
-
We present SetPad, a new application prototype that lets computer science students explore discrete math problems by sketching set expressions using pen-based input. Students can manipulate the expressions interactively with the tool via a pen or multi-touch interface. Likewise, discrete mathematics instructors can use SetPad to display and work through set problems via a projector to better demonstrate the solutions to students. We discuss the implementation and feature set of the application, as well as results from an informal perceived-usefulness evaluation with students taking a computer science foundation exam, in addition to a formal user study measuring the effectiveness of the tool when solving set proof problems. The results indicate that SetPad was well received, allows for efficient solutions to proof problems, and has the potential to have a positive impact when used as an individual student application or an instructional tool.
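The kind of manipulation SetPad supports — checking a set identity over a concrete universe — is easy to mirror in code; the universe and sets below are arbitrary examples, not from the study:

```python
# A universe and two sets a student might sketch (all values illustrative):
U = set(range(20))
A = {x for x in U if x % 2 == 0}       # evens
B = {x for x in U if x % 3 == 0}       # multiples of 3

# De Morgan's law: the complement of a union is the intersection of complements
lhs = U - (A | B)
rhs = (U - A) & (U - B)
print(lhs == rhs)                      # prints True
```

Where SetPad differs, of course, is that the student draws and rearranges these expressions with a pen rather than typing them.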
- Date Issued
- 2012
- Identifier
- CFE0004240, ucf:49507
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004240
- Title
- STUDY OF HUMAN ACTIVITY IN VIDEO DATA WITH AN EMPHASIS ON VIEW-INVARIANCE.
- Creator
-
Ashraf, Nazim, Foroosh, Hassan, Hughes, Charles, Tappen, Marshall, Moshell, Jack, University of Central Florida
- Abstract / Description
-
The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, and ergonomics. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different, unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations, and the large degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. Current solutions to action recognition usually assume that a huge dataset of actions is available so that a classifier can be trained. However, this means that in order to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent an action as a sequence of poses, and decompose each pose into triplets. Therefore, the pose transition is broken down into a set of movements of body-point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body-point planes. We use the fact that the family of homographies associated with two identical poses has rank 4 to gauge the similarity of the pose between two subjects, observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action. We then show that it is possible to extend the concept of triplets to line segments. In particular, we establish that if we look at the movement of line segments instead of triplets, we have more redundancy in the data, leading to better results. We demonstrate this concept on "fundamental ratios." We decompose a human body pose into line segments instead of triplets and look at the set of movements of line segments. This method needs only three instances of the action. If a larger dataset is available, we can also apply weighting on line segments for better accuracy. The last method is based on the concept of "projective depth." Given a plane, we can find the depth of a point relative to that plane. We propose three different ways of using projective depth: (i) Triplets - the three points of a triplet along with the epipole define a plane, and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane - if we are able to extract the ground plane, we can find the projective depth of the body points with respect to it, so the problem of action recognition translates to curve matching; and (iii) Mirror person - we can use the mirror view of the person to extract mirror-symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on testing view invariance, robustness to noisy localization and occlusions of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants.
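The rank-4 constraint can be tested numerically by stacking the flattened homographies as rows of a matrix and examining its singular values. The synthetic family below is built to lie in a 4-dimensional subspace, standing in for the homographies of two identical poses (the sizes and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical family of homographies lying in a 4-dimensional subspace,
# as the rank constraint predicts for two identical poses:
basis = rng.standard_normal((4, 9))                  # four independent 3x3 matrices, flattened
family = rng.standard_normal((12, 4)) @ basis        # 12 members of the rank-4 family
family += 1e-6 * rng.standard_normal(family.shape)   # measurement noise

s = np.linalg.svd(family, compute_uv=False)
rank_score = s[4] / s[3]             # small ratio -> the family has rank 4
print(f"sigma5/sigma4 = {rank_score:.2e}")
```

For two different poses the stacked matrix would generically have rank above 4, so the gap between the fourth and fifth singular values acts as the pose-similarity score, independent of viewpoint and camera intrinsics.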
- Date Issued
- 2012
- Identifier
- CFE0004352, ucf:49449
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004352
- Title
- Forecasting Volcanic Activity Using An Event Tree Analysis System and Logistic Regression.
- Creator
-
Junek, William, Jones, W, Simaan, Marwan, Foroosh, Hassan, Woods, Mark, University of Central Florida
- Abstract / Description
-
Forecasts of short-term volcanic activity are generated using an event tree process that is driven by a set of empirical statistical models derived through logistic regression. Each of the logistic models is constructed from a sparse and geographically diverse dataset that was assembled from a collection of historic volcanic unrest episodes. The dataset consists of monitoring measurements (e.g. seismic), source modeling results, and historic eruption information. Incorporating this data into a single set of models provides a simple mechanism for simultaneously accounting for the geophysical changes occurring within the volcano and the historic behavior of analog volcanoes. A bootstrapping analysis of the training dataset allowed for the estimation of robust logistic model coefficients. Probabilities generated from the logistic models increase with positive modeling results, escalating seismicity, and high eruption frequency. The cross-validation process produced a series of receiver operating characteristic (ROC) curves with areas ranging between 0.78 and 0.81, which indicate the algorithm has good predictive capabilities. In addition, the ROC curves allowed for the determination of a false positive rate and optimum detection threshold for each stage of the algorithm. The results demonstrate that the logistic models are highly transportable and can compete with, and in some cases outperform, non-transportable empirical models trained with site-specific information. The incorporation of source modeling results into the event tree's decision-making process has begun the transition of volcano monitoring applications from simple mechanized pattern recognition algorithms to a physical-model-based forecasting system.
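The statistical core — bootstrapped logistic regression coefficients, ROC evaluation, and detection threshold selection — can be sketched on synthetic data. The features, sample size, and coefficients below are invented stand-ins, not the assembled unrest dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(4)

# Hypothetical unrest features: [seismicity index, source-model evidence, eruption frequency]
n = 400
X = rng.standard_normal((n, 3))
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = eruption

# Bootstrap the training set to get robust coefficient estimates:
coefs = []
for _ in range(200):
    idx = rng.integers(0, n, n)                  # resample with replacement
    coefs.append(LogisticRegression().fit(X[idx], y[idx]).coef_[0])
coef = np.mean(coefs, axis=0)                    # bootstrap-aggregated coefficients

# Evaluate with an ROC curve and pick the threshold nearest the (0, 1) corner:
p = 1 / (1 + np.exp(-(X @ coef)))
auc = roc_auc_score(y, p)
fpr, tpr, thr = roc_curve(y, p)
best = np.argmin(fpr**2 + (1 - tpr)**2)
print(f"AUC = {auc:.2f}, optimal threshold = {thr[best]:.2f}")
```

As in the abstract, the fitted probabilities rise with each positively weighted input, and the ROC curve simultaneously yields a false positive rate and an operating threshold for each stage.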
- Date Issued
- 2012
- Identifier
- CFE0004253, ucf:49517
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004253
- Title
- Simulation of a Capacitive Micromachined Ultrasonic Transducer with a Parylene Membrane and Graphene Electrodes.
- Creator
-
Sadat, David, Chen, Quanfang, Xu, Yunjun, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
Medical ultrasound technology accounts for over half of all imaging tests performed worldwide. In comparison to other methods, ultrasonic imaging is more portable and lower cost, and is becoming more accessible to remote regions where traditionally no medical imaging can be done. However, conventional ultrasonic imaging systems still rely on expensive PZT-based ultrasound probes that limit broader applications. In addition, the resolution of PZT-based transducers is low due to the limitations of hand-fabrication methods for the piezoelectric ceramics. Capacitive Micromachined Ultrasonic Transducers (CMUTs) appear as an alternative to the piezoelectric (PZT) ceramic-based transducer for ultrasound medical imaging. CMUTs offer a transducer design better suited to batch fabrication, higher axial image resolution, lower element fabrication costs, ease of fabricating large arrays of cells using MEMS processes, and the extremely important potential to monolithically integrate 2D transducer arrays directly with IC circuits for real-time 3D imaging. Currently, most efforts on CMUTs are silicon based. Problems with current silicon-based CMUT designs include low pressure transmission and high-temperature fabrication processes. The pressure output from silicon-based CMUT cells during transmission is too low when compared to commercially available PZT transducers, resulting in relatively blurry ultrasound images. The fabrication of silicon-based cells, although easier than for PZT transducers, still requires high-temperature processing and specialized, expensive equipment. Manufacturing at an elevated temperature hinders the capability of fabricating front-end analog processing IC circuits, so it is difficult to achieve true 3D/4D imaging. Therefore, a novel low-temperature, low-cost fabrication process is needed. A polymer (Parylene) based CMUT transducer has recently been investigated at UCF and aims to overcome the limitations posed by its silicon-based counterparts. This thesis describes the numerical simulation work and proposed fabrication steps of the Parylene-based CMUT. The issues of transducer cost and pressure transmission are addressed by proposing the use of low-cost, low-temperature Chemical Vapor Deposition (CVD) of Parylene-C as the structural membrane, plus graphene for the membrane electrodes. This study focuses mainly on comparing traditional silicon-based CMUT designs against the Parylene-C/Graphene-based CMUT transducer, using the MEMS modules in COMSOL. For a fair comparison, single CMUT cells are modeled and held at a constant diameter and a similar operational frequency at the structural center. The numerical CMUT model is characterized for: collapse voltage, membrane deflection profile, center frequency, peak output pressure transmission over the membrane surface, and sensitivity to a change in electrode surface charge. This study takes a unique approach in defining the sensitivity of the CMUT, calculating the membrane response and the change in electrode surface charge due to an incoming pressure wave. An optimal design has been achieved based on the simulation results. In comparison to silicon-based CMUTs, the Parylene/Graphene-based CMUT transducer produces 55% more volume displacement and more than 35% more pressure output. The thesis also lays out the detailed fabrication processes for the Parylene/Graphene-based CMUT transducers. Parylene/Graphene-based ultrasonic transducers can find wide applications in both medical imaging and nondestructive evaluation (NDE).
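For intuition about the collapse (pull-in) voltage the model is characterized for, a lumped parallel-plate estimate is commonly used: the electrostatic attraction becomes unstable once the membrane has deflected one third of the gap, giving V = sqrt(8 k g^3 / (27 ε0 A)). The sketch below uses this textbook simplification with assumed dimensions; it is not the COMSOL membrane model from the thesis:

```python
import math

# Lumped parallel-plate pull-in estimate; all parameter values are assumed.
eps0 = 8.854e-12        # vacuum permittivity (F/m)
g = 200e-9              # vacuum gap (m)
r = 20e-6               # membrane radius (m)
A = math.pi * r**2      # electrode area (m^2)
k = 5000.0              # effective spring constant of the membrane (N/m)

# Collapse occurs when deflection reaches one third of the gap:
V_collapse = math.sqrt(8 * k * g**3 / (27 * eps0 * A))
print(f"estimated collapse voltage: {V_collapse:.1f} V")
```

The cube dependence on the gap and the linear dependence on membrane stiffness explain why the membrane material (Parylene versus silicon nitride) and gap geometry dominate the collapse-voltage comparison in the simulations.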
- Date Issued
- 2012
- Identifier
- CFE0004333, ucf:49458
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004333
- Title
- 4D-CT Lung Registration and its Application for Lung Radiation Therapy.
- Creator
-
Min, Yugang, Pattanaik, Sumanta, Hughes, Charles, Foroosh, Hassan, Santhanam, Anand, University of Central Florida
- Abstract / Description
-
Radiation therapy has been successful in treating lung cancer patients, but its efficacy is limited by the inability to account for respiratory motion during treatment planning and radiation dose delivery. Physics-based lung deformation models facilitate the motion computation of both tumor and local lung tissue during radiation therapy. In this dissertation, a novel method is discussed to accurately register 3D lungs across the respiratory phases from 4D-CT datasets, which facilitates the estimation of volumetric lung deformation models. This method uses multi-level, multi-resolution optical flow registration coupled with thin plate splines (TPS) to address the issue of inconsistent intensity across respiratory phases. It achieves higher accuracy as compared to multi-resolution optical flow registration and other commonly used registration methods. Validation results show that the lung registration is computed with 3 mm Target Registration Error (TRE) and approximately 3 mm Inverse Consistency Error (ICE). This registration method is further implemented in a GPU-based real-time dose delivery simulation to assist radiation therapy planning.
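The two validation metrics can be stated compactly: TRE measures how far mapped landmarks land from their known positions, and ICE measures how far the forward-then-inverse composition deviates from the identity. The toy deformation pair below (simple translations with a small built-in inconsistency) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical validation landmarks (mm) and a toy forward/inverse deformation pair:
landmarks = rng.uniform(0, 300, size=(50, 3))            # expert-marked points, phase A
def forward(p):  return p + np.array([2.1, -0.9, 0.5])   # estimated A -> B mapping
def inverse(p):  return p - np.array([2.0, -1.0, 0.5])   # estimated B -> A mapping

mapped_truth = landmarks + np.array([2.0, -1.0, 0.5])    # where the points really go

# Target Registration Error: distance between mapped landmarks and ground truth
tre = np.linalg.norm(forward(landmarks) - mapped_truth, axis=1).mean()

# Inverse Consistency Error: forward then inverse should be the identity
ice = np.linalg.norm(inverse(forward(landmarks)) - landmarks, axis=1).mean()
print(f"TRE = {tre:.2f} mm, ICE = {ice:.2f} mm")
```

A real evaluation would replace the toy translations with the dense optical-flow/TPS deformation fields between respiratory phases, but the metric definitions are exactly these.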
- Date Issued
- 2012
- Identifier
- CFE0004300, ucf:49464
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004300
- Title
- Super Resolution of Wavelet-Encoded Images and Videos.
- Creator
-
Atalay, Vildan, Foroosh, Hassan, Bagci, Ulas, Hughes, Charles, Pensky, Marianna, University of Central Florida
- Abstract / Description
-
In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e. in the wavelet domain) image registration; then, we investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend in wavelet-encoded imaging and wavelet encoding for image/video compression. Due to the drawbacks of the widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. In order to overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for both image registration and super resolution. We then employ our devised in-band methodology in a motion-compensated video compression framework to demonstrate the effective usage of wavelet subbands. Super resolution can also be used as a post-processing step in video compression in order to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that utilizes super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, due to the fact that it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the usage of our devised methods in enhancing the resolution of pansharpened multispectral images.
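The shift-variance that motivates the in-band approach, and the kind of exact subband relationship it exploits, can be seen with a one-level Haar transform: an odd one-sample shift scrambles the subbands, while an even two-sample shift maps to a clean one-sample shift inside each subband. (Haar is used here for simplicity; the dissertation's wavelets and derivations are more general.)

```python
import numpy as np

def haar_dwt(x):
    """One-level decimated Haar transform: approximation and detail subbands."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

rng = np.random.default_rng(6)
x = rng.standard_normal(64)

a0, d0 = haar_dwt(x)
a1, d1 = haar_dwt(np.roll(x, 1))      # the same signal shifted by one sample

# Decimated subbands are shift-variant: the shifted signal's subbands are NOT
# a simple shift of the original subbands...
print(np.allclose(a1, np.roll(a0, 1)))        # prints False
# ...but an even (two-sample) shift maps to a one-sample shift in each subband:
a2, d2 = haar_dwt(np.roll(x, 2))
print(np.allclose(a2, np.roll(a0, 1)), np.allclose(d2, np.roll(d0, 1)))  # prints True True
```

Registering and fusing frames directly in the subband domain therefore requires exactly the kind of explicit shift-to-subband relationship the dissertation derives, rather than treating subbands as if they shifted like the image itself.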
- Date Issued
- 2017
- Identifier
- CFE0006854, ucf:51744
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006854