Current Search: Foroosh, Hassan
- Title
- Spatial and Temporal Modeling for Human Activity Recognition from Multimodal Sequential Data.
- Creator
-
Ye, Jun, Hua, Kien, Foroosh, Hassan, Zou, Changchun, Karwowski, Waldemar, University of Central Florida
- Abstract / Description
-
Human Activity Recognition (HAR) has been an intense research area for more than a decade. Different sensors, ranging from 2D and 3D cameras to accelerometers, gyroscopes, and magnetometers, have been employed to generate multimodal signals to detect various human activities. With the advancement of sensing technology and the popularity of mobile devices, depth cameras and wearable devices, such as Microsoft Kinect and smart wristbands, open an unprecedented opportunity to solve the challenging HAR problem by learning expressive representations from multimodal signals recording huge amounts of daily activities that comprise a rich set of categories. Although competitive performance has been reported, existing methods focus on the statistical or spatial representation of the human activity sequence, while the internal temporal dynamics of the sequence are not sufficiently exploited. As a result, they often face the challenge of recognizing visually similar activities composed of dynamic patterns in different temporal order. In addition, many model-driven methods based on sophisticated features and carefully designed classifiers are computationally demanding and unable to scale to large datasets. In this dissertation, we propose to address these challenges from three different perspectives: 3D spatial relationship modeling, dynamic temporal quantization, and temporal order encoding. We propose a novel octree-based algorithm for computing the 3D spatial relationships between objects from a 3D point cloud captured by a Kinect sensor. A set of 26 3D spatial directions is defined to describe the spatial relationship of an object with respect to a reference object. These 3D directions are implemented as a set of spatial operators, such as "AboveSouthEast" and "BelowNorthWest," of an event query language to query human activities in an indoor environment; for example, "A person walks in the hallway from north to south."
The performance is quantitatively evaluated on a public RGB-D object dataset and qualitatively investigated in a live video computing platform. To address the challenge of temporal modeling in human action recognition, we introduce dynamic temporal quantization, a clustering-like algorithm that quantizes human action sequences of varied lengths into fixed-size quantized vectors. A two-step optimization algorithm is proposed to jointly optimize the quantization of the original sequence. In the aggregation step, frames falling into the same segment are aggregated by max-pooling to produce the quantized representation of the segment. During the assignment step, the frame-segment assignment is updated according to dynamic time warping, while the temporal order of the entire sequence is preserved. The proposed technique is evaluated on three public 3D human action datasets and achieves state-of-the-art performance. Finally, we propose a novel temporal order encoding approach that models the temporal dynamics of sequential data for human activity recognition. The algorithm encodes the temporal order of the latent patterns extracted by subspace projection and generates a highly compact First-Take-All (FTA) feature vector representing the entire sequence. An optimization algorithm is further introduced to learn optimized projections that increase the discriminative power of the FTA feature. The compactness of the FTA feature makes it extremely efficient for human activity recognition with nearest-neighbor search based on Hamming distance. Experimental results on two public human activity datasets demonstrate the advantages of the FTA feature over state-of-the-art methods in both accuracy and efficiency.
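The 26 spatial directions named in the abstract correspond to discretizing each axis of the displacement between an object and a reference into three states (3 x 3 x 3 minus the coincident case gives 26 combinations). A minimal sketch of such an operator follows; the axis conventions (z up, y north, x east), the tolerance parameter, and the use of centroids are illustrative assumptions, since the dissertation derives these relations from octree-partitioned point clouds:

```python
import numpy as np

def spatial_relation(obj_centroid, ref_centroid, tol=0.05):
    """Name one of the 26 coarse 3D directions of an object relative to a
    reference. Axis conventions (z = up, y = north, x = east) and the
    tolerance band are assumptions made for illustration only."""
    dx, dy, dz = np.asarray(obj_centroid, float) - np.asarray(ref_centroid, float)
    name = ""
    if dz > tol:
        name += "Above"
    elif dz < -tol:
        name += "Below"
    if dy > tol:
        name += "North"
    elif dy < -tol:
        name += "South"
    if dx > tol:
        name += "East"
    elif dx < -tol:
        name += "West"
    return name or "Coincident"

# A query such as "A person walks in the hallway from north to south" can then
# be phrased as a sequence of these relations against a reference object.
print(spatial_relation((1.0, -1.0, 2.0), (0.0, 0.0, 0.0)))  # AboveSouthEast
```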
- Date Issued
- 2016
- Identifier
- CFE0006516, ucf:51367
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006516
- Title
- Gesture Assessment of Teachers in an Immersive Rehearsal Environment.
- Creator
-
Barmaki, Roghayeh, Hughes, Charles, Foroosh, Hassan, Sukthankar, Gita, Dieker, Lisa, University of Central Florida
- Abstract / Description
-
Interactive training environments typically include feedback mechanisms designed to help trainees improve their performance through either guided- or self-reflection. When the training system deals with human-to-human communications, as one would find in a teacher, counselor, enterprise culture or cross-cultural trainer, such feedback needs to focus on all aspects of human communication. This means that, in addition to verbal communication, nonverbal messages must be captured and analyzed for semantic meaning. The goal of this dissertation is to employ machine-learning algorithms that semi-automate and, where supported, automate event tagging in training systems developed to improve human-to-human interaction. The specific context in which we prototype and validate these models is the TeachLivE teacher rehearsal environment developed at the University of Central Florida. The choice of this environment was governed by its availability, large user population, extensibility, and existing reflection tools found within the AMITIES framework underlying the TeachLivE system. Our contribution includes accuracy improvement of the existing data-driven gesture recognition utility from Microsoft, called Visual Gesture Builder. Using the proposed methodology and tracking sensors, we created a gesture database and used it for the implementation of our proposed online gesture recognition and feedback application. We also investigated multiple methods of feedback provision, including visual and haptic feedback.
The results from the conducted user studies indicate the positive impact of the proposed feedback applications and of informed body language on teaching competency. In this dissertation, we describe the context in which the algorithms have been developed, the importance of recognizing nonverbal communication in this context, the means of providing semi- and fully-automated feedback associated with nonverbal messaging, and a series of preliminary studies developed to inform the research. Furthermore, we outline future research directions on new case studies and on multimodal annotation and analysis, in order to understand the synchrony of acoustic features and gestures in a teaching context.
- Date Issued
- 2016
- Identifier
- CFE0006260, ucf:51053
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006260
- Title
- Computerized Evaluation of Microsurgery Skills Training.
- Creator
-
Jotwani, Payal, Foroosh, Hassan, Hughes, Charles, Hua, Kien, University of Central Florida
- Abstract / Description
-
The style of imparting medical training has evolved over the years. The traditional methods of teaching and practicing basic surgical skills under the apprenticeship model no longer occupy the first place in modern, technically demanding, advanced surgical disciplines like neurosurgery. Furthermore, the legal and ethical concerns for patient safety, as well as cost-effectiveness, have forced neurosurgeons to master the necessary microsurgical techniques to accomplish desired results. This has led to increased emphasis on assessment of the clinical and surgical techniques of neurosurgeons. However, the subjective assessment of microsurgical techniques like micro-suturing under the apprenticeship model cannot be completely unbiased. A few initiatives using computer-based techniques have been made to introduce objective evaluation of surgical skills. This thesis presents a novel approach involving computerized evaluation of different components of micro-suturing techniques, to eliminate the bias of subjective assessment. The work involved acquisition of cine clips of micro-suturing activity on synthetic material. Image processing and computer vision based techniques were then applied to these videos to assess different characteristics of micro-suturing, namely speed, dexterity, and effectualness. In parallel, subjective grading was performed by a senior neurosurgeon. A correlation and comparative study of both assessments was then carried out to analyze the efficacy of objective versus subjective evaluation.
- Date Issued
- 2015
- Identifier
- CFE0006221, ucf:51056
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006221
- Title
- Code Park: A New 3D Code Visualization Tool and IDE.
- Creator
-
Khaloo, Pooya, Laviola II, Joseph, Foroosh, Hassan, Leavens, Gary, University of Central Florida
- Abstract / Description
-
We introduce Code Park, a novel tool for visualizing codebases in a 3D game-like environment. Code Park aims to improve a programmer's understanding of an existing codebase in a manner that is both engaging and fun, making it appealing especially to novice users such as students. It achieves these goals by laying out the codebase in a 3D park-like environment. Each class in the codebase is represented as a 3D room-like structure. Constituent parts of the class (variables, member functions, etc.) are laid out on the walls, resembling a syntax-aware "wallpaper." Users can interact with the codebase using an overview mode and a first-person viewer mode. They can also edit, compile, and run code in this environment. We conducted three user studies to evaluate Code Park's usability and suitability for organizing an existing project. Our results indicate that Code Park is easy to get familiar with and significantly helps in code understanding. Further, the users unanimously believed that Code Park was an engaging tool to work with.
- Date Issued
- 2017
- Identifier
- CFE0006752, ucf:51838
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006752
- Title
- Modeling User Transportation Patterns Using Mobile Devices.
- Creator
-
Davami, Erfan, Sukthankar, Gita, Gonzalez, Avelino, Foroosh, Hassan, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
Participatory sensing frameworks use humans and their computing devices as a large mobile sensing network. Dramatic gains in accessibility and affordability have turned mobile devices (smartphones and tablet computers) into the most popular computational machines in the world, exceeding laptops. By the end of 2013, more than 1.5 billion people on earth will have a smartphone. Increased coverage and higher speeds of cellular networks have given these devices the power to constantly stream large amounts of data. Most mobile devices are equipped with advanced sensors such as GPS, cameras, and microphones. This expansion of smartphone numbers and power has created a sensing system capable of achieving tasks practically impossible for conventional sensing platforms. One of the advantages of participatory sensing platforms is their mobility, since human users are often in motion. This dissertation presents a set of techniques for modeling and predicting user transportation patterns from cell-phone and social media check-ins. To study large-scale transportation patterns, I created a mobile phone app, Kpark, for estimating parking lot occupancy on the UCF campus. Kpark aggregates individual user reports on parking space availability to produce a global picture across all the campus lots using crowdsourcing. An issue with crowdsourcing is the possibility of receiving inaccurate information from users, through either error or malicious motivations. One method of combating this problem is to model the trustworthiness of individual participants and use that information to selectively include or discard data. This dissertation presents a comprehensive study of the performance of different worker quality and data fusion models with plausible simulated user populations, as well as an evaluation of their performance on the real data obtained from a full release of the Kpark app on the UCF Orlando campus.
To evaluate individual trust prediction methods, an algorithm selection portfolio was introduced to take advantage of the strengths of each method and maximize overall prediction performance. Like many other crowdsourced applications, user incentivization is an important aspect of creating a successful crowdsourcing workflow. For this project, a form of non-monetized incentivization called gamification was used to create competition among users, with the aim of increasing the quantity and quality of data submitted to the project. This dissertation reports on the performance of Kpark at predicting parking occupancy, increasing user app usage, and predicting worker quality.
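A simple way to combine worker-quality modeling with data fusion, as described above, is a trust-weighted vote. The sketch below is only illustrative: the function names are invented, and the Laplace-smoothed accuracy estimate stands in for the dissertation's actual worker-quality and fusion models.

```python
def worker_trust(correct, total):
    # Laplace-smoothed accuracy estimate (Beta(1, 1) prior over worker
    # accuracy); one simple worker-quality model, used here for illustration.
    return (correct + 1) / (total + 2)

def fuse_reports(reports):
    """Fuse (claims_full, past_correct, past_total) worker reports for one
    parking lot into a trust-weighted belief that the lot is full."""
    weights = [worker_trust(c, t) for (_, c, t) in reports]
    full_weight = sum(w for (full, _, _), w in zip(reports, weights) if full)
    return full_weight / sum(weights)

# Two historically reliable workers say "full"; one unreliable worker disagrees.
belief = fuse_reports([(True, 9, 10), (True, 8, 10), (False, 1, 10)])
print(round(belief, 3))  # 0.905
```

Weighting by estimated accuracy lets reports from workers with poor track records be discounted rather than discarded outright.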
- Date Issued
- 2015
- Identifier
- CFE0005597, ucf:50258
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005597
- Title
- Scene Understanding for Real Time Processing of Queries over Big Data Streaming Video.
- Creator
-
Aved, Alexander, Hua, Kien, Foroosh, Hassan, Zou, Changchun, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
With heightened security concerns across the globe and the increasing need to monitor, preserve, and protect infrastructure and public spaces to ensure proper operation, quality assurance, and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective and fatigued, and can even exhibit bias, and it is difficult to maintain high levels of vigilance when capturing, searching, and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software, much like traditional database applications are developed today. Video stream processing applications and ad hoc queries developed this way are able to "reuse" advanced image processing techniques that have already been developed. This results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the "trust" issue will enable many new surveillance-related applications.
Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.
- Date Issued
- 2013
- Identifier
- CFE0004648, ucf:49900
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004648
- Title
- SketChart: A Pen-Based Tool for Chart Generation and Interaction.
- Creator
-
Vargas Gonzalez, Andres, Laviola II, Joseph, Foroosh, Hassan, Hua, Kien, University of Central Florida
- Abstract / Description
-
It has been shown that representing data with the right visualization increases the understanding of qualitative and quantitative information encoded in documents. However, current tools for generating such visualizations rely on traditional WIMP techniques, which can make free interaction and direct manipulation of the content harder. In this thesis, we present a pen-based prototype for data visualization using 10 different types of bar-based charts. The prototype lets users sketch a chart and interact with the information once the drawing is identified. The prototype's user interface consists of a sketching area and touch-based elements that are displayed depending on the context and nature of the outline. Brainstorming and live presentations can benefit from the prototype due to its ability to visualize and manipulate data in real time. We also performed a short, informal user study to measure the tool's effectiveness at recognizing sketches and users' acceptance while interacting with the system. Results show SketChart's strengths and weaknesses and areas for improvement.
- Date Issued
- 2014
- Identifier
- CFE0005434, ucf:50405
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005434
- Title
- Dictionary Learning for Image Analysis.
- Creator
-
Khan, Muhammad Nazar, Tappen, Marshall, Foroosh, Hassan, Stanley, Kenneth, Li, Xin, University of Central Florida
- Abstract / Description
-
In this thesis, we investigate the use of dictionary learning for discriminative tasks on natural images. Our contributions can be summarized as follows:
1) We introduce discriminative deviation based learning to achieve principled handling of the reconstruction-discrimination tradeoff that is inherent to discriminative dictionary learning.
2) Since natural images obey a strong smoothness prior, we show how spatial smoothness constraints can be incorporated into the learning formulation by embedding dictionary learning into Conditional Random Field (CRF) learning. We demonstrate that such smoothness constraints can lead to state-of-the-art performance for pixel-classification tasks.
3) Finally, we lay down the foundations of super-latent learning. By treating sparse codes on a CRF as latent variables, dictionary learning can also be performed via the Latent (Structural) SVM formulation for jointly learning a classifier over the sparse codes. The dictionary is treated as a super-latent variable that generates the latent variables.
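For readers unfamiliar with the reconstruction term that all three contributions build on, a generic sparse-coding step (minimizing 0.5*||x - Da||^2 + lam*||a||_1 by the iterative shrinkage-thresholding algorithm, ISTA) can be sketched as below. This is standard textbook machinery, not the dissertation's discriminative formulation, and all names are illustrative.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, iters=200):
    """Sparse-code x over dictionary D by ISTA:
    minimize 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - x)           # gradient of the reconstruction term
        z = a - grad / L                   # gradient descent step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# With an orthonormal dictionary, the solution reduces to soft-thresholding
# the coefficients of x, which gives an easy sanity check:
print(ista_sparse_code(np.eye(4), np.array([1.0, 0.0, 0.0, 0.0])))
```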
- Date Issued
- 2013
- Identifier
- CFE0004701, ucf:49844
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004701
- Title
- Fast Compressed Automatic Target Recognition for a Compressive Infrared Imager.
- Creator
-
Millikan, Brian, Foroosh, Hassan, Rahnavard, Nazanin, Muise, Robert, Atia, George, Mahalanobis, Abhijit, Sun, Qiyu, University of Central Florida
- Abstract / Description
-
Many military systems utilize infrared sensors which allow an operator to see targets at night. Several of these are either mid-wave or long-wave high-resolution infrared sensors, which are expensive to manufacture. But compressive sensing, which has primarily been demonstrated in medical applications, can be used to minimize the number of measurements needed to represent a high-resolution image. Using these techniques, a relatively low-cost mid-wave infrared sensor can be realized which has a high effective resolution. In traditional military infrared sensing applications, like targeting systems, automatic target recognition algorithms are employed to locate and identify targets of interest to reduce the burden on the operator. The resolution of the sensor can increase the accuracy and operational range of a targeting system. When using a compressive sensing infrared sensor, traditional decompression techniques can be applied to form a spatial-domain infrared image, but most are iterative and not ideal for real-time environments. A more efficient method is to adapt the target recognition algorithms to operate directly on the compressed samples. In this work, we present a target recognition algorithm which utilizes a compressed target detection method to identify potential target areas, and then a specialized target recognition technique that operates directly on the same compressed samples. We demonstrate our method on the U.S. Army Night Vision and Electronic Sensors Directorate ATR Algorithm Development Image Database, which has been made available by the Sensing Information Analysis Center.
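The key fact that lets detection run directly on compressed samples is that random projections approximately preserve inner products, so a matched-filter score computed in the compressed domain tracks the spatial-domain correlation without any image reconstruction. A toy illustration follows, with random Gaussian measurements standing in for the actual sensor; this sketches the underlying principle, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256                               # ambient / compressed dimensions
Phi = rng.normal(size=(m, n)) / np.sqrt(m)     # random Gaussian sensing matrix

target = rng.normal(size=n)                    # known target template
scene_hit = target + 0.1 * rng.normal(size=n)  # scene containing the target
scene_miss = rng.normal(size=n)                # clutter-only scene

def compressed_score(scene):
    # <Phi x, Phi t> approximates <x, t>, so the correlation can be
    # evaluated directly on the m compressed samples.
    return float((Phi @ scene) @ (Phi @ target))

print(compressed_score(scene_hit) > compressed_score(scene_miss))  # True
```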
- Date Issued
- 2018
- Identifier
- CFE0007408, ucf:52739
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007408
- Title
- Sampling and Subspace Methods for Learning Sparse Group Structures in Computer Vision.
- Creator
-
Jaberi, Maryam, Foroosh, Hassan, Pensky, Marianna, Gong, Boqing, Qi, GuoJun, University of Central Florida
- Abstract / Description
-
The unprecedented growth of data in volume and dimension has led to an increased number of computationally demanding and data-driven decision-making methods in many disciplines, such as computer vision, genomics, and finance. Research on big data aims to understand and describe trends in massive volumes of high-dimensional data. High volume and dimension are the determining factors in both the computational and time complexity of algorithms. The challenge grows when the data are formed of the union of group-structures of different dimensions embedded in a high-dimensional ambient space. To address the problem of high volume, we propose a sampling method referred to as the Sparse Withdrawal of Inliers in a First Trial (SWIFT), which determines the smallest sample size in one grab so that all group-structures are adequately represented and discovered with high probability. The key features of SWIFT are: (i) sparsity, which is independent of the population size; (ii) no prior knowledge of the distribution of the data or the number of underlying group-structures; and (iii) robustness in the presence of an overwhelming number of outliers. We report a comprehensive study of the proposed sampling method in terms of accuracy, functionality, and effectiveness in reducing the computational cost in various applications of computer vision. In the second part of this dissertation, we study dimensionality reduction for multi-structural data. We propose a probabilistic subspace clustering method that unifies soft- and hard-clustering in a single framework. This is achieved by introducing a delayed association of uncertain points to subspaces of lower dimensions based on a confidence measure. Delayed association yields higher accuracy in clustering subspaces that have ambiguities, e.g., due to intersections and high levels of outliers and noise, and hence leads to more accurate self-representation of the underlying subspaces.
Altogether, this dissertation addresses the key theoretical and practical issues of size and dimension in big data analysis.
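To make the one-grab idea above concrete for a single structure, the smallest sample size follows from a binomial tail: if a structure occupies a fraction eps of the data and we want at least k of its points in the sample with probability at least p, we need the smallest m with P(Bin(m, eps) >= k) >= p. The sketch below uses this i.i.d. approximation; SWIFT's actual bound additionally handles multiple structures and outlier rates jointly, so the names and the single-structure setting are illustrative.

```python
from math import comb

def min_one_grab_size(eps, k, p):
    """Smallest sample size m such that a structure occupying a fraction
    eps of the data contributes at least k sampled points with
    probability >= p (i.i.d. binomial approximation)."""
    m = k
    while True:
        # P(Bin(m, eps) >= k) = 1 - sum_{j < k} C(m, j) eps^j (1-eps)^(m-j)
        tail = 1.0 - sum(comb(m, j) * eps**j * (1 - eps)**(m - j)
                         for j in range(k))
        if tail >= p:
            return m
        m += 1

# A structure holding 10% of the points is hit at least once with 99%
# probability by a one-grab sample of this size:
print(min_one_grab_size(0.1, 1, 0.99))  # 44
```

Note that the answer depends on the inlier fraction eps, not on the population size, which is the sparsity property (i) claims.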
- Date Issued
- 2018
- Identifier
- CFE0007017, ucf:52039
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007017
- Title
- Personalized Digital Body: Enhancing Body Ownership and Spatial Presence in Virtual Reality.
- Creator
-
Jung, Sungchul, Hughes, Charles, Foroosh, Hassan, Wisniewski, Pamela, Bruder, Gerd, Sandor, Christian, University of Central Florida
- Abstract / Description
-
A person's sense of acceptance of a virtual body as his or her own is generally called virtual body ownership (VBOI). Having such a mental model of one's own body transferred to a virtual human surrogate is known to play a critical role in one's sense of presence in a virtual environment. Our focus in this dissertation is on top-down processing based on visual perception in both the visuomotor and the visuotactile domains, using visually personalized body cues. The visual cues we study here range from ones that we refer to as direct to others that we classify as indirect. Direct cues are associated with body parts that play a central role in the task we are performing. Such parts typically dominate a person's foveal view and will include one or both of their hands. Indirect body cues come from body parts that are normally seen in our peripheral view, e.g., legs and torso, that are often observed through some mediation and are not directly associated with the current task. This dissertation studies how and to what degree direct and indirect cues affect a person's sense of VBOI when they are receiving direct and, sometimes, inaccurate cues, and investigates the relationship between enhanced virtual body ownership and task performance. Our experiments support the importance of a personalized representation, even for indirect cues. Additionally, we studied gradual versus instantaneous transition between one's own body and a virtual surrogate body, and between one's real-world environment and a virtual environment. We demonstrate that gradual transition has a significant influence on virtual body ownership and presence. In a follow-on study, we increased fidelity by using a personalized hand. Here, we demonstrate that a personalized hand significantly mitigates dominant visual illusions, resulting in more accurate perception of virtual object sizes.
- Date Issued
- 2018
- Identifier
- CFE0007024, ucf:52033
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007024
- Title
- Estimation and clustering in statistical ill-posed linear inverse problems.
- Creator
-
Rajapakshage, Rasika, Pensky, Marianna, Swanson, Jason, Zhang, Teng, Bagci, Ulas, Foroosh, Hassan, University of Central Florida
- Abstract / Description
-
The main focus of the dissertation is estimation and clustering in statistical ill-posed linear inverse problems. The dissertation deals with the problem of simultaneously estimating a collection of solutions of ill-posed linear inverse problems from their noisy images under an operator that does not have a bounded inverse, when the solutions are related in a certain way. The dissertation consists of three parts. In the first part, the collection consists of measurements of temporal functions at various spatial locations. In particular, we study the problem of estimating a three-dimensional function based on observations of its noisy Laplace convolution. In the second part, we recover classes of similar curves when the class memberships are unknown. Problems of this kind appear in many areas of application where clustering is carried out at the pre-processing step and then the inverse problem is solved for each of the cluster averages separately. As a result, the errors of the procedures are usually examined for the estimation step only. In both parts, we construct the estimators, study their minimax optimality, and evaluate their performance via a limited simulation study. In the third part, we propose a new computational platform to better understand the patterns of R-fMRI by taking into account the challenge of inevitable signal fluctuations, and to interpret the success of dynamic functional connectivity approaches. Towards this, we revisit an auto-regressive and vector auto-regressive signal modeling approach for estimating temporal changes of the signal in brain regions. We then generate inverse covariance matrices from the generated windows and use a non-parametric statistical approach to select significant features. Finally, we use Lasso to perform classification of the data. The effectiveness of the proposed method is evidenced in the classification of R-fMRI scans.
- Date Issued
- 2019
- Identifier
- CFE0007710, ucf:52450
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007710
- Title
- Exploring Natural User Abstractions For Shared Perceptual Manipulator Task Modeling & Recovery.
- Creator
-
Koh, Senglee, Laviola II, Joseph, Foroosh, Hassan, Zhang, Shaojie, Kim, Si Jung, University of Central Florida
- Abstract / Description
-
State-of-the-art domestic robot assistants are essentially autonomous mobile manipulators capable of exerting human-scale precision grasps. To maximize utility and economy, non-technical end-users would need to be nearly as efficient as trained roboticists in the control and collaboration of manipulation task behaviors. This remains a significant challenge, given that many WIMP-style tools require superficial proficiency in robotics, 3D graphics, and computer science for rapid task modeling and recovery. But research on robot-centric collaboration has gathered momentum in recent years; robots now plan in partially observable environments that maintain geometries and semantic maps, presenting opportunities for non-experts to cooperatively control task behavior with autonomous-planning agents that exploit this knowledge. However, as autonomous systems are not immune to errors under perceptual difficulty, a human-in-the-loop is needed to bias autonomous planning towards recovery conditions that resume the task and avoid similar errors. In this work, we explore interactive techniques allowing non-technical users to model task behaviors and perceive cooperatively with a service robot under robot-centric collaboration. We evaluate stylus and touch modalities with which users can intuitively and effectively convey natural abstractions of high-level tasks, semantic revisions, and geometries about the world. Experiments are conducted with 'pick-and-place' tasks in an ideal 'Blocks World' environment using a Kinova JACO six degree-of-freedom manipulator. Possibilities for the architecture and interface are demonstrated with the following features: (1) semantic 'Object' and 'Location' grounding that describes function and ambiguous geometries, (2) task specification with an unordered list of goal predicates, and (3) guiding task recovery with implied scene geometries and trajectories via symmetry cues and configuration-space abstraction.
Empirical results from four user studies show that our interface was much preferred over the control condition, demonstrating high learnability and ease of use that enabled our non-technical participants to model complex tasks, provide effective recovery assistance, and exercise teleoperative control.
- Date Issued
- 2018
- Identifier
- CFE0007209, ucf:52292
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007209
- Title
- Gradient based MRF learning for image restoration and segmentation.
- Creator
-
Samuel, Kegan, Tappen, Marshall, Da Vitoria Lobo, Niels, Foroosh, Hassan, Li, Xin, University of Central Florida
- Abstract / Description
-
The undirected graphical model, or Markov Random Field (MRF), is one of the more popular models used in computer vision and is the type of model with which this work is concerned. Models based on these methods have proven particularly useful in low-level vision systems and have led to state-of-the-art results for MRF-based systems. The research presented will describe a new discriminative training algorithm and its implementation. The MRF model will be trained by optimizing its parameters so that the minimum-energy solution of the model is as similar as possible to the ground truth. While previous work has relied on time-consuming iterative approximations or stochastic approximations, this work will demonstrate how implicit differentiation can be used to analytically differentiate the overall training loss with respect to the MRF parameters. This framework leads to an efficient, flexible learning algorithm that can be applied to a number of different models. The effectiveness of the proposed learning method will then be demonstrated by learning the parameters of two related models applied to the task of denoising images. The experimental results will demonstrate that the proposed learning algorithm is comparable to, and at times better than, previous training methods applied to the same tasks. A new segmentation model will also be introduced and trained using the proposed learning method. The proposed segmentation model is based on an energy-minimization framework that is novel in how it incorporates priors on the size of the segments in a way that is straightforward to implement. While other methods, such as normalized cuts, tend to produce segmentations of similar sizes, this method is able to overcome that problem and produce more realistic segmentations.
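The implicit-differentiation idea can be illustrated on a toy quadratic (Gaussian) MRF denoiser. This is a minimal sketch, not the dissertation's model: the energy, the first-difference smoothness prior, and all variable names are assumptions made for illustration. Because the energy is quadratic, the minimum-energy solution is available in closed form, and differentiating the stationarity condition gives the gradient of the training loss with respect to the smoothing parameter analytically.

```python
import numpy as np

def denoise(y, lam, D):
    """Minimize E(x) = 0.5*||x - y||^2 + 0.5*lam*||D x||^2 (toy Gaussian MRF)."""
    A = np.eye(len(y)) + lam * D.T @ D
    return np.linalg.solve(A, y)

def loss_and_grad(y, t, lam, D):
    """Training loss L = 0.5*||x* - t||^2 and dL/dlam via implicit differentiation.

    At the minimum, grad_x E = (I + lam*D^T D) x* - y = 0.  Differentiating this
    stationarity condition w.r.t. lam yields
        dx*/dlam = -(I + lam*D^T D)^{-1} (D^T D) x*,
    so no iterative or stochastic approximation is needed.
    """
    A = np.eye(len(y)) + lam * D.T @ D
    x_star = np.linalg.solve(A, y)
    dx_dlam = -np.linalg.solve(A, D.T @ D @ x_star)
    L = 0.5 * np.sum((x_star - t) ** 2)
    dL_dlam = (x_star - t) @ dx_dlam
    return L, dL_dlam

# 1-D example: forward-difference smoothness prior on a 5-sample signal.
n = 5
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]     # (n-1) x n forward differences
t = np.linspace(0.0, 1.0, n)                 # ground-truth signal
rng = np.random.default_rng(0)
y = t + 0.1 * rng.standard_normal(n)         # noisy observation
L, g = loss_and_grad(y, t, lam=1.0, D=D)     # analytic loss and gradient
```

The analytic gradient can be checked against central finite differences; with it, `lam` can be fit by any first-order optimizer.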
- Date Issued
- 2012
- Identifier
- CFE0004595, ucf:49207
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004595
- Title
- Human Group Behavior Modeling for Virtual Worlds.
- Creator
-
Shah, Syed Fahad Allam, Sukthankar, Gita, Georgiopoulos, Michael, Foroosh, Hassan, Anagnostopoulos, Georgios, University of Central Florida
- Abstract / Description
-
Virtual worlds and massively-multiplayer online games are rich sources of information about large-scale teams and groups, offering the tantalizing possibility of harvesting data about group formation, social networks, and network evolution. They provide new outlets for human social interaction that differ from both face-to-face interactions and non-physically-embodied social networking tools such as Facebook and Twitter. We aim to study group dynamics in these virtual worlds by collecting and analyzing public conversational patterns of users grouped in close physical proximity. To do this, we created a set of tools for monitoring, partitioning, and analyzing unstructured conversations between changing groups of participants in Second Life, a massively-multiplayer online user-constructed environment that allows users to construct and inhabit their own 3D world. Although there are some cues in the dialog, determining social interactions from unstructured chat data alone is a difficult problem, since these environments lack many of the cues that facilitate natural language processing in other conversational settings and different types of social media. Public chat data often features players who speak simultaneously, use jargon and emoticons, and only erratically adhere to conversational norms. Humans are adept social animals capable of identifying friendship groups from a combination of linguistic cues and social network patterns. But what is more important, the content of what people say or their history of social interactions? Moreover, is it possible to identify whether people are part of a group with changing membership merely from general network properties, such as measures of centrality and latent communities? These are the questions that we aim to answer in this thesis.
The contributions of this thesis include: 1) a link prediction algorithm for identifying friendship relationships from unstructured chat data, and 2) a method for identifying social groups based on the results of community detection and topic analysis. The output of these two algorithms (links and group membership) is useful for studying a variety of research questions about human behavior in virtual worlds. To demonstrate this, we have performed a longitudinal analysis of human groups in different regions of the Second Life virtual world. We believe that studies performed with our tools in virtual worlds will be a useful stepping stone toward creating a rich computational model of human group dynamics.
- Date Issued
- 2011
- Identifier
- CFE0004164, ucf:49074
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004164
- Title
- Human Action Localization and Recognition in Unconstrained Videos.
- Creator
-
Boyraz, Hakan, Tappen, Marshall, Foroosh, Hassan, Lin, Mingjie, Zhang, Shaojie, Sukthankar, Rahul, University of Central Florida
- Abstract / Description
-
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems. Action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed Two-Point Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison. Second, we present a probabilistic model that learns to identify discriminative regions in videos from weakly-supervised data, where each video clip is only assigned a label describing what action is present in the frame or clip.
While our first system requires every action to be manually outlined in every frame of the video, this second system only requires that the video be given a single high-level tag. From this data, the system is able to identify discriminative regions that correspond well to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and the UCF Sports Dataset show that the localizations produced by this weakly-supervised system are comparable in quality to localizations produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally-segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e., HMDB and UCF101. Third, we extend our weakly-supervised framework by replacing the recognition stage with a two-stage neural network and applying dropout to prevent overfitting of the parameters on the training data. The dropout technique was recently introduced to prevent overfitting of the parameters in deep neural networks and has been applied successfully to the object recognition problem. To our knowledge, this is the first system using dropout for the action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on the HMDB and UCF101 datasets.
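For reference, the standard "inverted" formulation of dropout zeroes each activation with probability `p` during training and rescales the survivors by `1/(1-p)` so that expected activations match test time. The sketch below is generic, not the dissertation's network; all names are illustrative.

```python
import numpy as np

def dropout(a, p, rng, train=True):
    """Inverted dropout on an activation array.

    At train time, each unit is zeroed independently with probability p and
    the survivors are scaled by 1/(1-p); at test time the input passes
    through unchanged, so no rescaling is needed at inference.
    """
    if not train or p == 0.0:
        return a
    mask = rng.random(a.shape) >= p          # True = unit survives
    return a * mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones((4, 8))                          # a batch of hidden activations
out = dropout(h, p=0.5, rng=rng)             # roughly half the units zeroed
```

Surviving units of the all-ones input become 2.0, preserving the expected activation of 1.0 per unit.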
- Date Issued
- 2013
- Identifier
- CFE0004977, ucf:49562
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004977
- Title
- A study of holistic strategies for the recognition of characters in natural scene images.
- Creator
-
Ali, Muhammad, Foroosh, Hassan, Hughes, Charles, Sukthankar, Gita, Wiegand, Rudolf, Yun, Hae-Bum, University of Central Florida
- Abstract / Description
-
Recognition and understanding of text in scene images is an important and challenging task. The importance can be seen in the context of tasks such as assisted navigation for the blind, providing directions to driverless cars, e.g., the Google car, etc. Other applications include automated document archival services, mining text from images, and so on. The challenge comes from a variety of factors, like variable typefaces, uncontrolled imaging conditions, and various sources of noise corrupting the captured images. In this work, we study and address the fundamental problem of recognition of characters extracted from natural scene images, and contribute three holistic strategies to deal with this challenging task. Scene text recognition (STR) has been a known problem in the computer vision and pattern recognition community for over two decades, and is still an active area of research owing to the fact that the recognition performance still leaves considerable room for improvement. Recognition of characters lies at the heart of STR and is a crucial component of a reliable STR system. Most of the current methods heavily rely on the discriminative power of local features, such as histograms of oriented gradients (HoG), the scale-invariant feature transform (SIFT), shape contexts (SC), geometric blur (GB), etc. One of the problems with such methods is that the local features are rasterized in an ad hoc manner to get a single vector for subsequent use in recognition. This rearrangement of features clearly perturbs the spatial correlations that may carry crucial information vis-à-vis recognition. Moreover, such approaches, in general, do not take into account the rotational invariance property, which often leads to failed recognition in cases where characters in scene images do not occur in an upright position.
To eliminate this local feature dependency and the associated problems, we propose the following three holistic solutions. The first is based on modelling character images of a class as a 3-mode tensor and then factoring it into a set of rank-1 matrices and the associated mixing coefficients. Each set of rank-1 matrices spans the solution subspace of a specific image class and enables us to capture the required holistic signature for each character class, along with the mixing coefficients associated with each character image. During recognition, we project each test image onto the candidate subspaces to derive its mixing coefficients, which are eventually used for final classification. The second approach we study in this work lets us form a novel holistic feature for character recognition based on the active contour model, also known as snakes. Our feature vector is based on two variables, direction and distance, cumulatively traversed by each point as the initial circular contour evolves under the force field induced by the character image. The initial contour design, in conjunction with a cross-correlation-based similarity metric, enables us to account for rotational variance in the character image. Our third approach is based on modelling a 3-mode tensor via rotation of a single image. This is different from our tensor-based approach described above in that we form the tensor using a single image instead of collecting a specific number of samples of a particular class. In this case, to generate a 3D image cube, we rotate an image through a predefined range of angles. This enables us to explicitly capture rotational variance and leads to better performance than various local approaches. Finally, as an application, we use our holistic model to recognize word images extracted from natural scenes. Here we first use our novel word segmentation method, based on image seam analysis, to split a scene word into individual character images.
We then apply our holistic model to recognize individual letters and use a spell-checker module to get the final word prediction. Throughout our work, we employ popular scene text datasets, like Chars74K-Font, Chars74K-Image, SVT, and ICDAR03, which include synthetic and natural image sets, to test the performance of our strategies. We compare the results of our recognition models with several baseline methods and show comparable or better performance than several local feature-based methods, thus justifying the importance of holistic strategies.
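The project-onto-class-subspaces idea in the first strategy can be sketched with a simplified stand-in: an SVD-derived subspace per class (rather than the dissertation's 3-mode tensor factorization), with a test image classified by the smallest reconstruction residual. The synthetic bar-shaped "characters" and all names here are illustrative assumptions.

```python
import numpy as np

def class_subspace(images, k):
    """Orthonormal basis (k x pixels) spanning a class's flattened images.

    A stand-in for the per-class rank-1 factor sets: each class gets a
    low-dimensional subspace extracted here via SVD.
    """
    X = np.stack([im.ravel() for im in images])       # samples x pixels
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def classify(image, bases):
    """Return the index of the class whose subspace reconstructs image best."""
    x = image.ravel()
    residuals = []
    for B in bases:
        coeffs = B @ x                 # "mixing coefficients" of the test image
        residuals.append(np.linalg.norm(x - B.T @ coeffs))
    return int(np.argmin(residuals))

# Two synthetic character classes: noisy horizontal vs. vertical bars.
rng = np.random.default_rng(1)

def h_bar():
    im = np.zeros((8, 8))
    im[3:5, :] = 1.0
    return im + 0.05 * rng.standard_normal((8, 8))

def v_bar():
    im = np.zeros((8, 8))
    im[:, 3:5] = 1.0
    return im + 0.05 * rng.standard_normal((8, 8))

bases = [class_subspace([h_bar() for _ in range(10)], k=2),
         class_subspace([v_bar() for _ in range(10)], k=2)]
```

A fresh horizontal bar projects almost entirely into the first class's subspace (small residual) but poorly into the second's, so the residual comparison recovers the class.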
- Date Issued
- 2016
- Identifier
- CFE0006247, ucf:51076
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006247
- Title
- The WOZ Recognizer: A Tool For Understanding User Perceptions of Sketch-Based Interfaces.
- Creator
-
Bott, Jared, Laviola II, Joseph, Hughes, Charles, Foroosh, Hassan, Lank, Edward, University of Central Florida
- Abstract / Description
-
Sketch recognition has the potential to be an important input method for computers in the coming years; however, designing and building an accurate and sophisticated sketch recognition system is a time-consuming and daunting task. Since sketch recognition is still at a level where mistakes are common, it is important to understand how users perceive and tolerate recognition errors and other user interface elements with these imperfect systems. A problem in performing this type of research is that we cannot easily control aspects of recognition in order to rigorously study the systems. We performed a study examining user perceptions of three pen-based systems for creating logic gate diagrams: a sketch-based interface, a WIMP-based interface, and a hybrid interface that combined elements of sketching and WIMP. We found that users preferred the sketch-based interface, and we identified important criteria for pen-based application design. This work exposed the issue of studying recognition systems without fine-grained control over accuracy, recognition mode, and other recognizer properties. In order to solve this problem, we developed a Wizard of Oz sketch recognition tool, the WOZ Recognizer, that supports controlled symbol and position accuracy and batch and streaming recognition modes for a variety of sketching domains. We present the design of the WOZ Recognizer, modeling recognition domains using graphs, symbol alphabets, and grammars, and discuss the types of recognition errors we included in its design. Further, we discuss how the WOZ Recognizer simulates sketch recognition, how it is controlled, and how users interact with it. In addition, we present an evaluative user study of the WOZ Recognizer and the lessons we learned. We have used the WOZ Recognizer to perform two user studies examining user perceptions of sketch recognition; both studies focused on mathematical sketching.
In the first study, we examined whether users prefer recognition feedback now (real-time recognition) or later (batch recognition) in relation to different recognition accuracies and sketch complexities. We found that participants displayed a preference for real-time recognition in some situations (multiple expressions, low accuracy), but no statistical preference in others. In our second study, we examined whether users displayed a greater tolerance for recognition errors when they used mathematical sketching applications they found interesting or useful compared to applications they found less interesting. Participants felt they had a greater tolerance for the applications they preferred, although our statistical analysis did not positively support this. In addition to the research already performed, we propose several avenues for future research into user perceptions of sketch recognition that we believe will be of value to sketch recognizer researchers and application designers.
- Date Issued
- 2016
- Identifier
- CFE0006077, ucf:50945
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006077
- Title
- Improving Efficiency in Deep Learning for Large Scale Visual Recognition.
- Creator
-
Liu, Baoyuan, Foroosh, Hassan, Qi, GuoJun, Welch, Gregory, Sukthankar, Rahul, Pensky, Marianna, University of Central Florida
- Abstract / Description
-
The emerging large-scale visual recognition methods, and in particular deep Convolutional Neural Networks (CNN), promise to revolutionize many computer-vision-based artificial intelligence applications, such as autonomous driving and online image retrieval systems. One of the main challenges in large-scale visual recognition is the complexity of the corresponding algorithms. This is further exacerbated by the fact that in most real-world scenarios they need to run in real time and on platforms that have limited computational resources. This dissertation focuses on improving the efficiency of such large-scale visual recognition algorithms from several perspectives. First, to reduce the complexity of large-scale classification to sub-linear in the number of classes, a probabilistic label tree framework is proposed. A test sample is classified by traversing the label tree from the root node. Each node in the tree is associated with a probabilistic estimation of all the labels. The tree is learned recursively with iterative maximum likelihood optimization. Compared to the hard label partitioning proposed previously, the probabilistic framework performs classification more accurately with similar efficiency. Second, we explore the redundancy of parameters in Convolutional Neural Networks (CNN) and employ sparse decomposition to significantly reduce both the number of parameters and the computational complexity. Both inter-channel and inner-channel redundancy are exploited to achieve more than 90% sparsity with approximately a 1% drop in classification accuracy. We also propose an efficient CPU-based sparse matrix multiplication algorithm to reduce the actual running time of CNN models with sparse convolutional kernels. Third, we propose a multi-stage framework based on CNNs to achieve better efficiency than a single traditional CNN model.
With a combination of the cascade model and the label tree framework, the proposed method divides the input images in both the image space and the label space, and processes each image with the CNN models that are most suitable and efficient. The average complexity of the framework is significantly reduced, while the overall accuracy remains the same as in the single complex model.
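The root-to-leaf traversal of a label tree can be sketched as follows. This is a toy 1-D example with hand-built per-node scorers; real systems learn the scorers by maximum likelihood, and none of these names come from the dissertation. The point of the structure is that classification cost is O(tree depth) rather than O(number of classes), since only one path is evaluated.

```python
import numpy as np

class Node:
    """A label-tree node: leaves carry one label; internal nodes score children."""
    def __init__(self, label=None, children=None, scorer=None):
        self.label = label            # set only at leaves
        self.children = children or []
        self.scorer = scorer          # maps x to one probability per child

def classify(node, x):
    """Greedily descend from the root, following the most probable child."""
    while node.children:
        probs = node.scorer(x)
        node = node.children[int(np.argmax(probs))]
    return node.label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def split(th):
    """A two-way probabilistic scorer that splits the line at threshold th."""
    return lambda x: np.array([sigmoid(th - x), sigmoid(x - th)])

# Four classes A..D on the real line, arranged as a balanced depth-2 tree:
# root splits at 0, its children split at -1 and +1.
leaves = [Node(label=c) for c in "ABCD"]
root = Node(children=[Node(children=leaves[:2], scorer=split(-1.0)),
                      Node(children=leaves[2:], scorer=split(1.0))],
            scorer=split(0.0))
```

Classifying `x = -2.0` evaluates only two scorers (root, then left child) instead of all four classes; the saving grows with the number of classes.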
- Date Issued
- 2016
- Identifier
- CFE0006472, ucf:51436
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006472
- Title
- Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction.
- Creator
-
Tariq, Amara, Foroosh, Hassan, Qi, GuoJun, Gonzalez, Avelino, Pensky, Marianna, University of Central Florida
- Abstract / Description
-
In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities, especially relations between images and natural language. We explore the ideas and techniques to integrate such cross-media semantic relations for machine understanding of large heterogeneous datasets, made available through the expansion of the World Wide Web. The datasets collected from social media websites, news media outlets, and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data, such as images, text, linguistic features, and structured data, in reference to each other. For example, image and video search and retrieval engines are required to understand the relations between visual and textual data so that they can provide relevant answers, in the form of images and videos, to users' queries presented in the form of text. We emphasize the automatic extraction of semantic topics or concepts from the data available in any form, such as images, free-flowing text, or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images. This problem is the main focus of our work. In many cases, a large amount of text is associated with images. Deep exploration of the linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario.
We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of linguistic features of the news articles.
- Date Issued
- 2016
- Identifier
- CFE0006507, ucf:51401
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006507