Current Search: Shah, Mubarak
- Title
- Visual Saliency Detection and Semantic Segmentation.
- Creator
-
Souly, Nasim, Shah, Mubarak, Bagci, Ulas, Qi, GuoJun, Pensky, Marianna, University of Central Florida
- Abstract / Description
-
Visual saliency is the ability to select the most relevant data in the scene and reduce the amount of data that needs to be processed. We propose a novel unsupervised approach to detect visual saliency in videos. For this, we employ a hierarchical segmentation technique to obtain supervoxels of a video and, simultaneously, build a dictionary from cuboids of the video. Then we create a feature matrix from the coefficients of the dictionary elements. Next, we decompose this matrix into sparse and redundant parts and obtain salient regions using group lasso. Our experiments provide promising results in terms of predicting eye movement. Moreover, we apply our method to the action recognition task and achieve better results. While saliency detection only highlights important regions, in semantic segmentation the aim is to assign a semantic label to each pixel in the image. Even though semantic segmentation can be achieved by simply applying classifiers to each pixel or region, the results may not be desirable since general context information is not considered. To address this issue, we propose two supervised methods: first, an approach to discover interactions between labels and regions using a sparse estimate of the precision matrix obtained by graphical lasso; second, a knowledge-based method to incorporate dependencies among regions in the image during inference. High-level knowledge rules - such as co-occurrence - are extracted from training data and transformed into constraints in an Integer Programming formulation. A difficulty in most supervised semantic segmentation approaches is the lack of sufficient training data. To address this, a semi-supervised learning approach is presented that exploits the plentiful amount of available unlabeled images, as well as synthetic images generated via Generative Adversarial Networks (GANs). Furthermore, an extension of the proposed model to use additional weakly labeled data is proposed. We demonstrate our approaches on three challenging benchmark datasets. (A brief group-lasso code sketch follows this record.)
- Date Issued
- 2017
- Identifier
- CFE0006918, ucf:51694
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006918
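The decomposition described in the abstract above (a supervoxel-by-dictionary feature matrix split into redundant and sparse parts, with salient regions selected via group lasso) can be illustrated with a minimal sketch. Here the redundant part is approximated by a truncated SVD and rows (supervoxels) are selected with the group-lasso proximal operator; the function names, toy data, and parameter choices are illustrative assumptions, not the dissertation's actual pipeline.

```python
import numpy as np

def group_soft_threshold(X, lam):
    """Proximal operator of the group lasso (rows as groups): shrinks each
    row of X toward zero; rows whose L2 norm falls below lam are zeroed."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def salient_supervoxels(F, lam=10.0, rank=5):
    """Split a supervoxel-by-dictionary feature matrix F into a 'redundant'
    (low-rank) part L and a group-sparse residual S; supervoxels whose
    residual rows survive the shrinkage are flagged as salient."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # redundant background part
    S = group_soft_threshold(F - L, lam)          # group-sparse salient part
    return np.linalg.norm(S, axis=1) > 0          # boolean saliency mask per supervoxel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic low-rank "background" features for 200 supervoxels
    F = 3.0 * (rng.normal(size=(200, 5)) @ rng.normal(size=(5, 64)))
    F[[3, 50, 120]] += rng.normal(scale=3.0, size=(3, 64))   # inject "salient" supervoxels
    print(np.nonzero(salient_supervoxels(F))[0])             # expected: [  3  50 120]
```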
- Title
- Online, Supervised and Unsupervised Action Localization in Videos.
- Creator
-
Soomro, Khurram, Shah, Mubarak, Heinrich, Mark, Hu, Haiyan, Bagci, Ulas, Yun, Hae-Bum, University of Central Florida
- Abstract / Description
-
Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many of the existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space increases with high-resolution and longer-duration videos, it becomes impractical to use such sliding-window techniques. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions during training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use a Conditional Random Field (CRF) to localize actions. In the above method, as in typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions - an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with a Structural SVM. The above two approaches rely on human supervision in the form of assigning action class labels to videos and annotating actor bounding boxes in each frame of the training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization. Given unlabeled videos without annotations, this approach aims at: 1) discovering action classes using a discriminative clustering approach, and 2) localizing actions using a variant of the Knapsack problem. (A small knapsack-selection sketch follows this record.)
- Date Issued
- 2017
- Identifier
- CFE0006917, ucf:51685
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006917
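The unsupervised localization step above mentions "a variant of the Knapsack problem". As a point of reference, a plain 0/1 knapsack solved by dynamic programming, selecting action segments by foreground score under a size budget, looks like the sketch below; the scores, costs, and helper name are hypothetical, and the dissertation's actual variant differs.

```python
def knapsack_select(scores, costs, budget):
    """Classic 0/1 knapsack by dynamic programming: choose a subset of
    segments maximizing total score subject to a total-cost budget."""
    n = len(scores)
    # table[i][b] = best score using the first i segments within budget b
    table = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, c = scores[i - 1], costs[i - 1]
        for b in range(budget + 1):
            table[i][b] = table[i - 1][b]
            if c <= b:
                table[i][b] = max(table[i][b], table[i - 1][b - c] + s)
    # backtrack to recover the chosen segment indices
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if table[i][b] != table[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), table[n][budget]

if __name__ == "__main__":
    # foreground-likelihood scores and pixel "costs" of candidate segments
    scores = [4.0, 1.0, 3.5, 2.0, 5.0]
    costs = [3, 1, 2, 2, 4]
    print(knapsack_select(scores, costs, budget=7))   # ([1, 2, 4], 9.5)
```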
- Title
- Global Data Association for Multiple Pedestrian Tracking.
- Creator
-
Dehghan, Afshin, Shah, Mubarak, Qi, GuoJun, Bagci, Ulas, Zhang, Shaojie, Zheng, Qipeng, University of Central Florida
- Abstract / Description
-
Multi-object tracking is one of the fundamental problems in computer vision. Almost all multi-object tracking systems consist of two main components: detection and data association. In the detection step, object hypotheses are generated in each frame of a sequence. Later, detections that belong to the same target are linked together to form final trajectories; this latter step is called data association. There are several challenges that render this problem difficult, such as occlusion, background clutter and pose changes. This dissertation aims to address these challenges by tackling the data association component of tracking, and contributes three novel methods for solving data association.

Firstly, this dissertation presents a new framework for multi-target tracking that uses a novel data association technique based on the Generalized Minimum Clique Problem (GMCP) formulation. The majority of current methods, such as bipartite matching, incorporate only a limited temporal locality of the sequence into the data association problem. This makes these methods inherently prone to ID switches and to difficulties caused by long-term occlusions, a cluttered background and crowded scenes. On the other hand, our approach incorporates both motion and appearance in a global manner. Unlike limited-temporal-locality methods, which incorporate only a few frames into the data association problem, this method incorporates the whole temporal span and solves the data association problem for one object at a time. A Generalized Minimum Clique Graph (GMCP) is used to solve the optimization problem of our data association method. The proposed method is supported by superior results on several benchmark sequences.

GMCP leads us to a more accurate approach to multi-object tracking by considering all the pairwise relationships in a batch of frames; however, it has some limitations. Firstly, it finds target trajectories one by one, missing joint optimization. Secondly, for optimization we use a greedy solver based on local neighborhood search, making our optimization prone to local minima. Finally, the GMCP tracker is slow, which is a burden when dealing with time-sensitive applications. In order to address these problems, we propose a new graph-theoretic problem, called the Generalized Maximum Multi Clique Problem (GMMCP). The GMMCP tracker has all the advantages of the GMCP tracker while addressing its limitations. A solution to GMMCP is presented in which no simplification is assumed in the problem formulation or the problem optimization. GMMCP is NP-hard, but it can be formulated as a Binary Integer Program where the solution to small- and medium-sized tracking problems can be found efficiently. To improve speed, Aggregated Dummy Nodes are used for modeling occlusions and missed detections; this also reduces the size of the input graph without using any heuristics. We show that, using the speed-up method, our tracker lends itself to a real-time implementation, increasing its potential usefulness in many applications. In tests against several tracking datasets, we show that the proposed method outperforms competitive methods.

Thus far we have assumed that the number of people does not exceed a few dozen. However, this is not always the case. In many scenarios, such as marathons, political rallies or religious rites, the number of people in a frame may reach a few hundred or even a few thousand. Tracking in high-density crowd sequences is a challenging problem for several reasons. Human detection methods often fail to localize objects correctly in extremely crowded scenes, which limits the use of data-association-based tracking methods. Additionally, it is hard to extend existing multi-target trackers to highly crowded scenes, because the large number of targets increases the computational complexity. Furthermore, the small apparent target size makes it challenging to extract features that discriminate targets from their surroundings. Finally, we present a tracker that addresses the above-mentioned problems. We formulate online crowd tracking as a Binary Quadratic Program, where both the detection and data association problems are solved together. Our formulation employs each target's individual information in the form of appearance and motion, as well as contextual cues in the form of neighborhood motion, spatial proximity and grouping constraints. Due to the large number of targets, state-of-the-art commercial quadratic programming solvers fail to efficiently find the solution to the proposed optimization. In order to overcome the computational complexity of available solvers, we propose to use the most recent version of the Modified Frank-Wolfe algorithm with SWAP steps. The proposed tracker can track hundreds of targets efficiently and improves state-of-the-art results by a significant margin on high-density crowd sequences. (A frame-to-frame data association sketch follows this record.)
- Date Issued
- 2016
- Identifier
- CFE0006095, ucf:51201
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006095
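The abstract contrasts GMCP/GMMCP with methods that use only "a limited temporal locality", such as bipartite matching. For concreteness, a minimal frame-to-frame bipartite association using the Hungarian algorithm (the baseline being contrasted, not the proposed method) might look like this; the positions, gating threshold, and function name are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, max_cost=50.0):
    """Frame-to-frame data association by bipartite matching (Hungarian
    algorithm) on Euclidean distance, with a simple gating threshold."""
    tracks = np.asarray(tracks, dtype=float)
    detections = np.asarray(detections, dtype=float)
    # cost = distance between last known track positions and new detections
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # keep only assignments whose cost is plausible (gating)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]

if __name__ == "__main__":
    prev = [[10, 10], [40, 40], [80, 15]]      # last known target positions
    curr = [[42, 38], [12, 11], [200, 200]]    # detections in the new frame
    print(associate(prev, curr))               # [(0, 1), (1, 0)]
```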
- Title
- Computer Vision Based Structural Identification Framework for Bridge Health Monitoring.
- Creator
-
Khuc, Tung, Catbas, Necati, Oloufa, Amr, Mackie, Kevin, Zaurin, Ricardo, Shah, Mubarak, University of Central Florida
- Abstract / Description
-
The objective of this dissertation is to develop a comprehensive Structural Identification (St-Id) framework with damage detection for bridge-type structures by using cameras and computer vision technologies. Traditional St-Id frameworks rely on conventional sensors. In this study, the collected input and output data employed in the St-Id system are acquired by a series of vision-based measurements. The following novelties are proposed, developed and demonstrated in this project: a) vehicle load (input) modeling using computer vision, b) bridge response (output) measurement using a fully non-contact approach based on video/image processing, and c) image-based structural identification using input-output measurements and new damage indicators. The input (loading) data due to vehicles, such as vehicle weights and vehicle locations on the bridge, are estimated by employing computer vision algorithms (detection, classification, and localization of objects) on video images of the vehicles. Meanwhile, the output data, in the form of structural displacements, are obtained by defining and tracking image key-points at the measurement locations. Subsequently, the input and output data sets are analyzed to construct a novel type of damage indicator, named the Unit Influence Surface (UIS). Finally, a new damage detection and localization framework is introduced that does not require a network of sensors, but only a much smaller number of sensors.

The main research significance is the first-time development of algorithms that transform the measured video images into a form that is highly damage-sensitive/change-sensitive for bridge assessment within the context of Structural Identification with input and output characterization. The study exploits the unique attributes of computer vision systems, where the signal is continuous in space. This requires new adaptations and transformations that can handle computer vision data/signals for structural engineering applications. This research will significantly advance current sensor-based structural health monitoring with computer vision techniques, leading to practical applications for damage detection of complex structures with a novel approach. By using computer vision algorithms and cameras as special sensors for structural health monitoring, this study proposes an advanced approach to bridge monitoring through which certain types of data that could not be collected by conventional sensors, such as vehicle loads and locations, can be obtained practically and accurately. (A keypoint-tracking displacement sketch follows this record.)
- Date Issued
- 2016
- Identifier
- CFE0006127, ucf:51174
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006127
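The output (response) measurement described above relies on defining and tracking image key-points at measurement locations. A hedged sketch of that idea using OpenCV corner detection and pyramidal Lucas-Kanade tracking is shown below; it tracks each frame against a reference frame and reports mean vertical pixel displacement. The ROI handling, parameters, and function name are assumptions, and calibration to engineering units is omitted.

```python
import cv2
import numpy as np

def displacement_series(video_path, roi=None):
    """Mean vertical displacement (pixels) of tracked key-points per frame,
    measured against the first (reference) frame. A minimal sketch of the
    non-contact displacement measurement idea; camera calibration and the
    robustness steps from the dissertation are omitted."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("cannot read video: " + str(video_path))
    ref = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = None
    if roi is not None:                       # roi = (x, y, w, h) around the target
        x, y, w, h = roi
        mask = np.zeros_like(ref)
        mask[y:y + h, x:x + w] = 255
    pts0 = cv2.goodFeaturesToTrack(ref, maxCorners=100,
                                   qualityLevel=0.01, minDistance=5, mask=mask)
    series = [0.0]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # bridge motions are small, so each frame is matched to the reference frame
        pts1, status, _ = cv2.calcOpticalFlowPyrLK(ref, gray, pts0, None)
        good = status.ravel() == 1
        if not good.any():
            series.append(series[-1])
            continue
        series.append(float(np.mean(pts1[good, 0, 1] - pts0[good, 0, 1])))
    cap.release()
    return series

# usage (hypothetical file and ROI): displacement_series("bridge.mp4", roi=(300, 200, 80, 80))
```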
- Title
- Recognition of Complex Events in Open-source Web-scale Videos: Features, Intermediate Representations and their Temporal Interactions.
- Creator
-
Bhattacharya, Subhabrata, Shah, Mubarak, Guha, Ratan, Laviola II, Joseph, Sukthankar, Rahul, Moore, Brian, University of Central Florida
- Abstract / Description
-
Recognition of complex events in consumer-uploaded Internet videos, captured under real-world settings, has emerged as a challenging area of research across both the computer vision and multimedia communities. In this dissertation, we present a systematic decomposition of complex events into hierarchical components, make an in-depth analysis of how existing research is being used to cater to various levels of this hierarchy, and identify three key stages where we make novel contributions, keeping complex events in focus. These are listed as follows: (a) Extraction of novel semi-global features -- firstly, we introduce a Lie-algebra-based representation of the dominant camera motion present while capturing videos and show how this can be used as a complementary feature for video analysis; secondly, we propose compact clip-level descriptors of a video based on the covariance of appearance and motion features, which we further use in a sparse coding framework to recognize realistic actions and gestures. (b) Construction of intermediate representations -- we propose an efficient probabilistic representation from low-level features computed from videos, based on Maximum Likelihood Estimates, which demonstrates state-of-the-art performance in large-scale visual concept detection. And finally, (c) Modeling temporal interactions between intermediate concepts -- using block Hankel matrices and harmonic analysis of slowly evolving Linear Dynamical Systems, we propose two new discriminative feature spaces for complex event recognition and demonstrate significantly improved recognition rates over previously proposed approaches. (A covariance-descriptor sketch follows this record.)
- Date Issued
- 2013
- Identifier
- CFE0004817, ucf:49724
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004817
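One of the contributions above is a compact clip-level descriptor based on the covariance of appearance and motion features. A minimal sketch of a covariance descriptor (covariance of per-frame features, mapped through the matrix logarithm and vectorized) is given below; the feature dimensions and the log-Euclidean mapping are common choices assumed here, not necessarily the dissertation's exact construction.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Clip-level descriptor: covariance matrix of per-frame appearance/motion
    features, mapped to a vector via the matrix log and the upper triangle."""
    F = np.asarray(features, dtype=float)          # shape: (frames, dims)
    C = np.cov(F, rowvar=False) + eps * np.eye(F.shape[1])
    # matrix logarithm via eigendecomposition (C is symmetric positive definite)
    w, V = np.linalg.eigh(C)
    logC = (V * np.log(w)) @ V.T
    iu = np.triu_indices(F.shape[1])
    return logC[iu]                                # fixed-length clip descriptor

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(120, 10))              # 120 frames, 10-dim features
    print(covariance_descriptor(clip).shape)       # (55,) for 10-dim features
```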
- Title
- Holistic Representations for Activities and Crowd Behaviors.
- Creator
-
Solmaz, Berkan, Shah, Mubarak, Da Vitoria Lobo, Niels, Jha, Sumit, Ilie, Marcel, Moore, Brian, University of Central Florida
- Abstract / Description
-
In this dissertation, we address the problem of analyzing the activities of people in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise new representations for the activities, in settings where individuals or a number of people may take part in specific activities. Different types of activities can be performed either by an individual at the fine level or by several people constituting a crowd at the coarse level. We take into account domain-specific information for modeling these activities. A summary of the proposed solutions is presented in the following.

The holistic description of videos is appealing for visual detection and classification tasks for several reasons, including capturing the spatial relations between the scene components, simplicity, and performance [1, 2, 3]. First, we present a holistic (global) frequency-spectrum-based descriptor for representing the atomic actions performed by individuals, such as bench pressing, diving, hand waving, boxing, playing guitar, mixing, jumping, horse riding, and hula hooping. We model and learn these individual actions for classifying complex user-uploaded videos. Our method bypasses the detection of interest points, the extraction of local video descriptors and the quantization of local descriptors into a codebook; it represents each video sequence as a single feature vector. This holistic feature vector is computed by applying a bank of 3-D spatio-temporal filters on the frequency spectrum of a video sequence; hence it integrates information about the motion and scene structure. We tested our approach on two of the most challenging datasets, UCF50 [4] and HMDB51 [5], and obtained promising results which demonstrate the robustness and the discriminative power of our holistic video descriptor for classifying videos of various realistic actions.

In the above approach, a holistic feature vector of a video clip is acquired by dividing the video into spatio-temporal blocks and then concatenating the features of the individual blocks together. However, such a holistic representation blindly incorporates all the video regions regardless of their contribution to classification. Next, we present an approach which improves the performance of the holistic descriptors for activity recognition. In our novel method, we improve the holistic descriptors by discovering the discriminative video blocks. We measure the discriminativity of a block by examining its response to a pre-learned support vector machine model. In particular, a block is considered discriminative if it responds positively for positive training samples, and negatively for negative training samples. We pose the problem of finding the optimal blocks as one of selecting a sparse set of blocks which maximizes the total classifier discriminativity. Through a detailed set of experiments on benchmark datasets [6, 7, 8, 9, 5, 10], we show that our method discovers the useful regions in the videos and eliminates the ones which are confusing for classification, resulting in significant performance improvement over the state-of-the-art.

In contrast to scenes where an individual performs a primitive action, there may be scenes with several people, where crowd behaviors may take place. For these types of scenes, the traditional approaches to recognition will not work due to severe occlusion and computational requirements. The number of videos is limited and the scenes are complicated; hence learning these behaviors is not feasible. For this problem, we present a novel approach, based on the optical flow in a video sequence, for identifying five specific and common crowd behaviors in visual scenes. In the algorithm, the scene is overlaid by a grid of particles, initializing a dynamical system which is derived from the optical flow. Numerical integration of the optical flow provides particle trajectories that represent the motion in the scene. Linearization of the dynamical system allows a simple and practical analysis and classification of the behavior through the Jacobian matrix. Essentially, the eigenvalues of this matrix are used to determine the dynamic stability of points in the flow, and each type of stability corresponds to one of the five crowd behaviors. The identified crowd behaviors are (1) bottlenecks: where many pedestrians/vehicles from various points in the scene are entering through one narrow passage, (2) fountainheads: where many pedestrians/vehicles are emerging from a narrow passage only to separate in many directions, (3) lanes: where many pedestrians/vehicles are moving at the same speed in the same direction, (4) arches or rings: where the collective motion is curved or circular, and (5) blocking: where there is an opposing motion and the desired movement of groups of pedestrians is somehow prohibited. The implementation requires identifying a region of interest in the scene and checking the eigenvalues of the Jacobian matrix in that region to determine the type of flow, which corresponds to one of the well-defined crowd behaviors. The eigenvalues are only considered in these regions of interest, consistent with the linear approximation and the implied behaviors. Since changes in eigenvalues can mean changes in stability, corresponding to changes in behavior, we can repeat the algorithm over clips of long video sequences to locate changes in behavior. The method was tested on real videos representing crowd and traffic scenes. (A Jacobian-stability sketch follows this record.)
- Date Issued
- 2013
- Identifier
- CFE0004941, ucf:49638
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004941
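The crowd-behavior part of this work classifies regions of the flow by the eigenvalues of the Jacobian of a linearized dynamical system. The sketch below computes a finite-difference Jacobian of an optical-flow field and labels its stability type; the mapping from stability types to the five behaviors shown in the strings is only a rough illustration of the idea, not the dissertation's exact rule set, and the flow field in the example is synthetic.

```python
import numpy as np

def stability_type(J, tol=1e-6):
    """Classify a 2x2 Jacobian of the linearized flow by its eigenvalues."""
    w = np.linalg.eigvals(np.asarray(J, dtype=float))
    re, im = w.real, w.imag
    if np.any(np.abs(im) > tol):
        return "circulating flow (arch / ring candidate)"
    if re[0] * re[1] < -tol:
        return "saddle (blocking candidate)"
    if np.all(re < -tol):
        return "attracting node (bottleneck candidate)"
    if np.all(re > tol):
        return "repelling node (fountainhead candidate)"
    return "degenerate / parallel flow (lane candidate)"

def jacobian_from_flow(u, v, x, y):
    """Finite-difference Jacobian [[du/dx, du/dy], [dv/dx, dv/dy]] of an
    optical-flow field (u, v) at grid point (row y, column x)."""
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    return np.array([[du_dx[y, x], du_dy[y, x]],
                     [dv_dx[y, x], dv_dy[y, x]]])

if __name__ == "__main__":
    yy, xx = np.mgrid[-10:11, -10:11].astype(float)
    u, v = -xx, -yy                       # synthetic flow converging to the center
    J = jacobian_from_flow(u, v, x=10, y=10)
    print(stability_type(J))              # attracting node (bottleneck candidate)
```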
- Title
- Human Detection, Tracking and Segmentation in Surveillance Video.
- Creator
-
Shu, Guang, Shah, Mubarak, Boloni, Ladislau, Wang, Jun, Lin, Mingjie, Sugaya, Kiminobu, University of Central Florida
- Abstract / Description
-
This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real-world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes and occlusion. In this dissertation, several novel methods for improving on the current state of human detection and tracking, based on learning scene-specific information in video feeds, are proposed.

Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using the Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally, a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features; hence it can avoid many false detections typically obtained by a generic detector. We are able to demonstrate a significant improvement in the performance of the state-of-the-art detector.

Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Single-camera multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based, person-specific Support Vector Machine (SVM) classifiers which capture articulations of moving human bodies with dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection; this leads to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-person tracking.

Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple-human segmentation in videos by incorporating human detection and part-based detection potentials into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate the superpixels corresponding to a human from the background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. Compared to previous work, our method can automatically segment multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation performance than previous methods in terms of segmentation accuracy on several challenging video sequences.

Most of the work in Computer Vision deals with point solutions: a specific algorithm for a specific problem. However, putting different algorithms into one real-world integrated system is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques. Adaptive Template Scaling is used to handle the scale change due to objects moving towards the camera. Incremental Searching and Local Frame Differencing are used to resolve challenging issues such as scale change, occlusion and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance. (A superpixel Bag-of-Words sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005551, ucf:50278
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005551
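The first contribution above builds scene-specific superpixel Bag-of-Words features on top of initial detections. A minimal sketch of the BoW step (clustering local superpixel features into visual words and histogramming them per detection window) is shown below; the superpixel segmentation and the DPM detector are assumed to exist, and the feature choice (mean color) and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(local_features, k=50, seed=0):
    """Cluster local (e.g., per-superpixel color) features into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(local_features)

def bow_histogram(vocab, local_features):
    """Bag-of-Words histogram for one detection window: count how often each
    visual word occurs among its superpixel features, then L1-normalize."""
    words = vocab.predict(local_features)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    training_feats = rng.random((2000, 3))      # e.g., mean color per superpixel
    vocab = build_vocabulary(training_feats, k=20)
    window_feats = rng.random((40, 3))          # superpixels inside one detection window
    print(bow_histogram(vocab, window_feats).round(3))
```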
- Title
- Automatic Detection of Brain Functional Disorder Using Imaging Data.
- Creator
-
Dey, Soumyabrata, Shah, Mubarak, Jha, Sumit, Hu, Haiyan, Weeks, Arthur, Rao, Ravishankar, University of Central Florida
- Abstract / Description
-
Recently, Attention Deficit Hyperactivity Disorder (ADHD) has been getting a lot of attention, mainly for two reasons. First, it is one of the most commonly found childhood behavioral disorders: around 5-10% of children worldwide are diagnosed with ADHD. Second, the root cause of the problem is still unknown, and therefore no biological measure exists to diagnose ADHD. Instead, doctors need to diagnose it based on clinical symptoms, such as inattention, impulsivity and hyperactivity, which are all subjective.

Functional Magnetic Resonance Imaging (fMRI) data has become a popular tool for understanding the functioning of the brain, such as identifying the brain regions responsible for different cognitive tasks or analyzing the statistical differences in brain functioning between diseased and control subjects. ADHD is also being studied using fMRI data. In this dissertation we aim to solve the problem of automatic diagnosis of ADHD subjects using their resting-state fMRI (rs-fMRI) data.

As a core step of our approach, we model the functioning of the brain as a connectivity network, which is expected to capture information about how synchronous different brain regions are in terms of their functional activities. The network is constructed by representing different brain regions as nodes, where any two nodes of the network are connected by an edge if the correlation of the activity patterns of the two nodes is higher than some threshold. The brain regions represented as the nodes of the network can be selected at different granularities, e.g., single voxels or clusters of functionally homogeneous voxels. The topological differences between the constructed networks of the ADHD and control groups of subjects are then exploited in the classification approach.

We have developed a simple method employing the Bag-of-Words (BoW) framework for the classification of ADHD subjects. We represent each node in the network by a 4-D feature vector: node degree and 3-D location. The 4-D vectors of all the network nodes of the training data are then grouped into a number of clusters using K-means, where each such cluster is termed a word. Finally, each subject is represented by a histogram (bag) of such words. A Support Vector Machine (SVM) classifier is used for the detection of ADHD subjects using their histogram representation. The method is able to achieve 64% classification accuracy.

The above simple approach has several shortcomings. First, there is a loss of spatial information while constructing the histogram, because it only counts the occurrences of words, ignoring their spatial positions. Second, features from the whole brain are used for classification, but some of the brain regions may not contain any useful information and may only increase the feature dimensions and noise of the system. Third, in our study we used only one network feature, the degree of a node, which measures the connectivity of the node, while other complex network features may be useful for solving the proposed problem.

In order to address the above shortcomings, we hypothesize that only a subset of the nodes of the network possesses important information for the classification of ADHD subjects. To identify the important nodes of the network we have developed a novel algorithm. The algorithm generates different random subsets of nodes, each time extracting the features from a subset to compute the feature vector and perform classification. The subsets are then ranked based on classification accuracy, and the occurrences of each node in the top-ranked subsets are measured. Our algorithm selects the highly occurring nodes for the final classification. Furthermore, along with the node degree, we employ three more node features: network cycles, the varying distance degree, and the edge weight sum. We concatenate the features of the selected nodes in a fixed order to preserve the relative spatial information. Experimental validation suggests that the use of the features from the nodes selected by our algorithm indeed helps to improve the classification accuracy. Also, our finding is in concordance with the existing literature, as the brain regions identified by our algorithm have been independently found by many other studies on ADHD. We achieved a classification accuracy of 69.59% using this approach. However, this method represents each voxel as a node of the network, which makes the number of nodes several thousand; as a result, the network construction step becomes computationally very expensive. Another limitation of the approach is that the network features, which are computed for each node of the network, capture only the local structures while ignoring the global structure of the network.

Next, in order to capture the global structure of the networks, we use the Multi-Dimensional Scaling (MDS) technique to project all the subjects from an unknown network-space to a low-dimensional space based on their inter-network distance measures. For the purpose of computing the distance between two networks, we represent each node by a set of attributes such as the node degree, the average power, the physical location, the neighbor node degrees, and the average powers of the neighbor nodes. The nodes of the two networks are then mapped in such a way that, over all pairs of nodes, the sum of the attribute distances, which is the inter-network distance, is minimized. To reduce the network computation cost, we enforce that the maximum relevant information is preserved with minimum redundancy. To achieve this, the nodes of the network are constructed from clusters of highly active voxels, while the activity levels of the voxels are measured based on the average power of their corresponding fMRI time series. Our method shows promise, as we achieve impressive classification accuracies (73.55%) on the ADHD-200 data set. Our results also reveal that the detection rates are higher when classification is performed separately on the male and female groups of subjects.

So far, we have only used the fMRI data for solving the ADHD diagnosis problem. Finally, we investigated the following questions. Do the structural brain images contain useful information related to the ADHD diagnosis problem? Can the classification accuracy of the automatic diagnosis system be improved by combining the information of the structural and functional brain data? Toward that end, we developed a new method to combine the information of structural and functional brain images in a late fusion framework. For the structural data, we input the gray matter (GM) brain images to a Convolutional Neural Network (CNN). The output of the CNN is a feature vector per subject, which is used to train an SVM classifier. For the functional data, we compute the average power of each voxel based on its fMRI time series; the average power of the fMRI time series of a voxel measures the activity level of the voxel. We found significant differences in the voxel power distribution patterns of the ADHD and control groups of subjects. The Local Binary Pattern (LBP) texture feature is used on the voxel power map to capture these differences. We achieved 74.23% accuracy using GM features, 77.30% using LBP features, and 79.14% using the combined information.

In summary, this dissertation demonstrated that structural and functional brain imaging data are useful for the automatic detection of ADHD subjects, as we achieve impressive classification accuracies on the ADHD-200 data set. Our study also helps to identify the brain regions which are useful for ADHD subject classification. These findings can help in understanding the pathophysiology of the problem. Finally, we expect that our approaches will contribute towards the development of a biological measure for the diagnosis of ADHD. (A connectivity-network sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005786, ucf:50060
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005786
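The core step described above models brain function as a connectivity network whose nodes are brain regions, connected when their activity correlation exceeds a threshold, with node degree and 3-D location used as features. A minimal sketch of that construction is below; the region count, threshold, and coordinates are placeholder assumptions.

```python
import numpy as np

def connectivity_network(time_series, threshold=0.5):
    """Functional connectivity network: nodes are brain regions, and two
    nodes are connected when the correlation of their activity patterns
    exceeds a threshold. Returns the adjacency matrix and node degrees."""
    ts = np.asarray(time_series, dtype=float)      # shape: (regions, timepoints)
    corr = np.corrcoef(ts)                         # region-by-region correlation
    adj = (corr > threshold).astype(int)
    np.fill_diagonal(adj, 0)                       # no self-loops
    degree = adj.sum(axis=1)
    return adj, degree

def node_feature_vectors(degree, coords):
    """4-D node features used in the Bag-of-Words step: degree + 3-D location."""
    return np.column_stack([degree, np.asarray(coords, dtype=float)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = rng.normal(size=(90, 200))                # 90 regions, 200 time points
    ts[1] = ts[0] + 0.1 * rng.normal(size=200)     # make two regions correlated
    adj, deg = connectivity_network(ts, threshold=0.6)
    coords = rng.uniform(size=(90, 3))             # dummy 3-D region centroids
    print(deg[:5], node_feature_vectors(deg, coords).shape)
```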
- Title
- Visual Analysis of Extremely Dense Crowded Scenes.
- Creator
-
Idrees, Haroon, Shah, Mubarak, Da Vitoria Lobo, Niels, Stanley, Kenneth, Atia, George, Saleh, Bahaa, University of Central Florida
- Abstract / Description
-
Visual analysis of dense crowds is particularly challenging due to the large number of individuals, occlusions, clutter, and fewer pixels per person, conditions which rarely occur in ordinary surveillance scenarios. This dissertation aims to address these challenges in images and videos of extremely dense crowds containing hundreds to thousands of humans. The goal is to tackle the fundamental problems of counting, detecting and tracking people in such images and videos using visual and contextual cues that are automatically derived from the crowded scenes.

For counting in an image of an extremely dense crowd, we propose to leverage multiple sources of information to compute an estimate of the number of individuals present in the image. Our approach relies on sources such as low-confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with the confidence associated with observing individuals, in an image region. Furthermore, we employ a global consistency constraint on counts using a Markov Random Field, which caters for disparities in counts in local neighborhoods and across scales. We tested this approach on crowd images with head counts ranging from 94 to 4543 and obtained encouraging results. Through this approach, we are able to count people in images of high-density crowds, unlike previous methods which are only applicable to videos of low- to medium-density crowded scenes. However, the counting procedure only outputs a single number for a large patch or an entire image. With just the counts, it becomes difficult to measure the counting error for a query image with an unknown number of people. For this, we propose to localize humans by finding repetitive patterns in the crowd image. Starting with detections from an underlying head detector, we correlate them within the image after their selection through several criteria: in a pre-defined grid, locally, or at multiple scales by automatically finding the patches that are most representative of recurring patterns in the crowd image. Finally, the set of generated hypotheses is selected using binary integer quadratic programming with Special Ordered Set (SOS) Type 1 constraints.

Human detection is another important problem in the analysis of crowded scenes, where the goal is to place a bounding box on the visible parts of individuals. Primarily applicable to images depicting medium- to high-density crowds containing several hundred humans, it is a crucial prerequisite for many other visual tasks, such as tracking, action recognition, or detection of anomalous behaviors exhibited by individuals in a dense crowd. For detecting humans, we explore context in dense crowds in the form of a locally-consistent scale prior, which captures the similarity in scale in local neighborhoods with smooth variation over the image. Using the scale and confidence of detections obtained from an underlying human detector, we infer scale and confidence priors using a Markov Random Field. In an iterative mechanism, the confidences of detections are modified to reflect consistency with the inferred priors, and the priors are updated based on the new detections. The final set of detections is then reasoned about for occlusion using Binary Integer Programming, where overlaps and relations between parts of individuals are encoded as linear constraints. Both human detection and occlusion reasoning in this approach are solved with local neighbor-dependent constraints, thereby respecting the inter-dependence between individuals characteristic of dense crowd analysis. In addition, we propose a mechanism to detect different combinations of body parts without requiring annotations for individual combinations.

Once human detection and localization are performed, we use them for tracking people in dense crowds. Similar to the use of context as a scale prior for human detection, we exploit it in the form of motion concurrence for tracking individuals in dense crowds. The proposed method for tracking provides an alternative and complementary approach to methods that require modeling of crowd flow. At the same time, it is less likely to fail in the case of dynamic crowd flows and anomalies, as it relies minimally on previous frames. The approach begins with the automatic identification of prominent individuals from the crowd that are easy to track. Then, we use Neighborhood Motion Concurrence to model the behavior of individuals in a dense crowd, which predicts the position of an individual based on the motion of its neighbors. When the individual moves with the crowd flow, we use Neighborhood Motion Concurrence to predict motion, while leveraging five-frame instantaneous flow in the case of dynamically changing flow and anomalies. All these aspects are then embedded in a framework which imposes a hierarchy on the order in which the positions of individuals are updated. The results are reported on eight sequences of medium- to high-density crowds, and our approach performs on par with existing approaches without learning or modeling patterns of crowd flow.

We experimentally demonstrate the efficacy and reliability of our algorithms by quantifying the performance of counting and localization, as well as human detection and tracking, on new and challenging datasets containing hundreds to thousands of humans in a given scene. (A motion-concurrence sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005508, ucf:50367
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005508
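The tracking part above uses Neighborhood Motion Concurrence to predict an individual's position from the motion of its neighbors. A hedged sketch of just the neighbor term (a proximity-weighted average of neighbor velocities) follows; the Gaussian weighting, the sigma value, and the function name are assumptions, and the full model's own-motion and instantaneous-flow terms are omitted.

```python
import numpy as np

def predict_position(target_pos, neighbor_pos, neighbor_vel, sigma=20.0):
    """Predict a target's next position from its neighbors' velocities,
    weighting each neighbor by spatial proximity (closer neighbors count more)."""
    target_pos = np.asarray(target_pos, dtype=float)
    neighbor_pos = np.asarray(neighbor_pos, dtype=float)
    neighbor_vel = np.asarray(neighbor_vel, dtype=float)
    d2 = np.sum((neighbor_pos - target_pos) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum()
    return target_pos + w @ neighbor_vel           # weighted mean neighbor motion

if __name__ == "__main__":
    target = [100.0, 50.0]
    neighbors = [[95.0, 48.0], [110.0, 55.0], [300.0, 300.0]]   # last one is far away
    velocities = [[2.0, 0.5], [2.5, 0.0], [-5.0, -5.0]]
    print(predict_position(target, neighbors, velocities))
```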
- Title
- Visual Geo-Localization and Location-Aware Image Understanding.
- Creator
-
Roshan Zamir, Amir, Shah, Mubarak, Jha, Sumit, Sukthankar, Rahul, Lin, Mingjie, Fathpour, Sasan, University of Central Florida
- Abstract / Description
-
Geo-localization is the problem of discovering the location where an image or video was captured. Recently, large-scale geo-localization methods, which are devised for ground-level imagery and employ techniques similar to image matching, have attracted much interest. In these methods, given a reference dataset composed of geo-tagged images, the problem is to estimate the geo-location of a query by finding its matching reference images.

In this dissertation, we address three questions central to geo-spatial analysis of ground-level imagery: 1) How to geo-localize images and videos captured at unknown locations? 2) How to refine the geo-location of already geo-tagged data? 3) How to utilize the extracted geo-tags?

We present a new framework for geo-locating an image utilizing a novel multiple-nearest-neighbor feature matching method based on Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference data set. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. Our evaluations using a new data set of 102k Street View images show that the proposed method outperforms the state-of-the-art by 10 percent.

Geo-localization of images can be extended to geo-localization of a video. We have developed a novel method for estimating the geo-spatial trajectory of a moving camera with unknown intrinsic parameters at city scale. The proposed method is based on a three-step process: 1) individual geo-localization of video frames using Street View images to obtain the likelihood of the location (latitude and longitude) given the current observation, 2) Bayesian tracking to estimate the frame location and the video's temporal evolution using previous state probabilities and the current likelihood, and 3) applying a novel Minimum Spanning Tree-based trajectory reconstruction to eliminate trajectory loops or noisy estimations.

Thus far, we have assumed that reliable geo-tags for reference imagery are available through crowdsourcing. However, crowdsourced images are well known to suffer from the acute shortcoming of having inaccurate geo-tags. We have developed the first method for the refinement of GPS tags, which automatically discovers the subset of corrupted geo-tags and refines them. We employ Random Walks to discover the uncontaminated subset of location estimations and robustify the Random Walks with a novel adaptive damping factor that conforms to the level of noise in the input.

In location-aware image understanding, we are interested in improving image analysis by putting it in the right geo-spatial context. This approach is of particular importance as the majority of cameras and mobile devices are now equipped with GPS chips. Therefore, developing techniques which can leverage the geo-tags of images to improve the performance of traditional computer vision tasks is of particular interest. We have developed a location-aware multimodal approach which incorporates business directories, textual information, and web images to identify businesses in a geo-tagged query image. (A random-walk scoring sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005544, ucf:50282
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005544
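The geo-tag refinement step above discovers the uncontaminated subset of location estimates using Random Walks with an adaptive damping factor. The sketch below runs a generic damped random walk over an image-affinity graph and scores each image; low-scoring images would be treated as candidate corrupted geo-tags. The fixed damping factor and the toy affinity matrix are assumptions; the adaptive damping from the dissertation is not reproduced.

```python
import numpy as np

def random_walk_scores(affinity, damping=0.85, iters=100, tol=1e-9):
    """Stationary scores of a damped random walk over an image-affinity graph.
    Images whose location estimates agree with many others accumulate high
    scores; low-scoring nodes are candidate corrupted geo-tags."""
    A = np.asarray(affinity, dtype=float)
    n = A.shape[0]
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic transitions
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - damping) / n + damping * (scores @ P)
        if np.abs(new - scores).sum() < tol:
            scores = new
            break
        scores = new
    return scores

if __name__ == "__main__":
    # affinity = agreement between the location estimates of four images
    A = np.array([[0.00, 1.00, 1.00, 0.05],
                  [1.00, 0.00, 1.00, 0.05],
                  [1.00, 1.00, 0.00, 0.05],
                  [0.05, 0.05, 0.05, 0.00]])
    print(random_walk_scores(A).round(3))   # last image gets the lowest score
```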
- Title
- Taming Wild Faces: Web-Scale, Open-Universe Face Identification in Still and Video Imagery.
- Creator
-
Ortiz, Enrique, Shah, Mubarak, Sukthankar, Rahul, Da Vitoria Lobo, Niels, Wang, Jun, Li, Xin, University of Central Florida
- Abstract / Description
-
With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g., social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities, while accurately rejecting those of no interest.

Recent advancements in face recognition research have seen Sparse Representation-based Classification (SRC) advance to the forefront of competing methods. However, its drawbacks, slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are three-fold:

1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least-squares and the robustness of SRC. On our large dataset collected from Facebook, LASRC performs on par with standard SRC with a speedup of 100-250x.

2. For video, applying the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive computationally, so we propose a new algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and employing the knowledge that the face track frames belong to the same individual. Employing MSSRC results in a speedup of 5x on average over SRC on a frame-by-frame basis.

3. Finally, we make the observation that MSSRC sometimes assigns inconsistent identities to the same individual in a scene, which could be corrected based on visual similarity. Therefore, we construct a probabilistic affinity graph combining appearance and co-occurrence similarities to model the relationship between face tracks in a video. Using this relationship graph, we employ random walk analysis to propagate strong class predictions among similar face tracks, while dampening weak predictions. Our method results in a performance gain of 15.8% in average precision over using MSSRC alone. (A sparse-representation classification sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005536, ucf:50313
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005536
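The contributions above build on Sparse Representation-based Classification. A minimal SRC sketch (code the query with an l1-regularized fit over training faces, then classify by per-class reconstruction residual) is given below using scikit-learn's Lasso; the toy features, alpha, and function name are assumptions, and LASRC's regression-based sample selection and MSSRC's track-level optimization are not shown.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(dictionary, labels, query, alpha=0.01):
    """Sparse Representation-based Classification (sketch): represent the
    query as a sparse combination of training faces and assign the class
    whose atoms give the smallest reconstruction residual."""
    D = np.asarray(dictionary, dtype=float)        # columns are training faces
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, query).coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(np.asarray(labels) == c, x, 0.0)   # keep only class-c coefficients
        residuals[c] = float(np.linalg.norm(query - D @ xc))
    return min(residuals, key=residuals.get), residuals

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    faces_a = rng.normal(loc=1.0, size=(64, 5))    # toy "features" for identity A
    faces_b = rng.normal(loc=-1.0, size=(64, 5))   # toy "features" for identity B
    D = np.hstack([faces_a, faces_b])
    labels = ["A"] * 5 + ["B"] * 5
    query = faces_a.mean(axis=1) + 0.1 * rng.normal(size=64)
    print(src_classify(D, labels, query)[0])       # "A"
```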
- Title
- Robust Subspace Estimation Using Low-Rank Optimization. Theory and Applications in Scene Reconstruction, Video Denoising, and Activity Recognition.
- Creator
-
Oreifej, Omar, Shah, Mubarak, Da Vitoria Lobo, Niels, Stanley, Kenneth, Lin, Mingjie, Li, Xin, University of Central Florida
- Abstract / Description
-
Show moreIn this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time.Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories...etc). If the assumption that these observations are drawn from a liner subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the `1 norm. The robust estimation has a two-fold advantage: First, the obtained basis better represents the actual subspace because it does not include contributions from the outliers. Second, the detected outliers are often of a specific interest in many applications, as we will show throughout this thesis. We demonstrate four different formulations and applications for low-rank optimization. First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by the water waves. The main drawback of most previous attempts to tackle this problem is that they heavily depend on modelling the waves, which in fact is ill-posed since the actual behavior of the waves along with the imaging process are complicated and include several noise components; therefore, their results are not satisfactory. In contrast, we propose a novel approach which outperforms the state-of-the-art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from a the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot directly be detected using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage (")sparsifies(") the noise, and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high quality mean and a better structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage at which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms state of the art on all testing sequences. Secondly, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in a desert battlefields, where atmospheric turbulence is typically present, in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. 
Second, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which are often of great interest. Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulent sequence into three components: the background, the turbulence, and the object. We cast this extremely difficult problem as a minimization over the nuclear norm, the Frobenius norm, and the ℓ1 norm. Our method is based on two observations: first, the turbulence causes dense, Gaussian-like noise and can therefore be captured by the Frobenius norm, while the moving objects are sparse and can be captured by the ℓ1 norm; second, since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted by atmospheric turbulence and include extremely small moving objects.

In addition to robustly estimating the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use the resulting trajectories to estimate the camera-motion subspace. This is particularly useful for activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. Errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicate the tracking stage and produce cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow these standard steps and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories: a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motion of the scene. This is done on the frames of the unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object-motion trajectories, we compute a compact set of chaotic invariant features which captures the characteristics of the trajectories. An SVM is then employed to learn and recognize human actions using the computed motion features. We performed extensive experiments on multiple benchmark datasets and obtained promising results.
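As a rough sketch of how dense Lagrangian particle trajectories can be obtained (an illustration, not the dissertation's implementation), the function below advects a grid of particles through a list of precomputed per-frame optical-flow fields; the flow inputs, the grid step, and the nearest-neighbour flow sampling are assumptions made for brevity.

```python
import numpy as np

def advect_particles(flows, step=8):
    """Advect a regular particle grid through per-frame optical-flow fields
    (each an H x W x 2 array of (dx, dy)) to form dense trajectories."""
    h, w = flows[0].shape[:2]
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    tracks = [pts.copy()]
    for flow in flows:
        # sample the flow at the (rounded, clipped) current particle positions
        xi = np.clip(np.round(pts[:, 0]).astype(int), 0, w - 1)
        yi = np.clip(np.round(pts[:, 1]).astype(int), 0, h - 1)
        pts = pts + flow[yi, xi]            # move each particle with the flow
        tracks.append(pts.copy())
    return np.stack(tracks, axis=1)         # shape: particles x frames x 2
```

Stacking the trajectory coordinates as columns of a matrix, the camera-induced component could then be separated as its low-rank part, in the same spirit as the decomposition sketched earlier.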
Finally, we consider a more challenging problem referred to as complex event recognition, in which the activities of interest are complex and unconstrained. This problem poses significant challenges because it involves videos that are highly variable in content, noise, length, and frame size. In this extremely challenging task, high-level features have recently shown a promising direction, as in [53, 129], where core low-level events referred to as concepts are annotated and modeled using a portion of the training data, and each event is then described by its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation which combines the precisely annotated videos used to train the concepts with the rich high-level features. Our approach finds a new representation for each event which is not only low-rank but also constrained to adhere to the concept annotation, thus suppressing the noise and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminability of the high-level features by a significant margin.
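Purely as an illustration of combining a low-rank prior with annotation constraints (not the dissertation's formulation), the sketch below repeatedly soft-thresholds the singular values of a noisy video-by-concept score matrix while re-imposing the entries backed by manual concept annotation; the annotation mask, threshold, and iteration count are hypothetical.

```python
import numpy as np

def refine_concept_scores(scores, annotated, tau=1.0, n_iter=50):
    """Pull a noisy video-by-concept score matrix toward a low-rank
    representation while keeping annotated entries fixed (illustrative)."""
    X = scores.copy()
    for _ in range(n_iter):
        # low-rank step: singular-value soft-thresholding
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(sig - tau, 0.0)) @ Vt
        # constraint step: re-impose the reliably annotated concept occurrences
        X[annotated] = scores[annotated]
    return X
```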
- Date Issued
- 2013
- Identifier
- CFE0004732, ucf:49835
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004732
- Title
- Understanding images and videos using context.
- Creator
-
Vaca Castano, Gonzalo, Da Vitoria Lobo, Niels, Shah, Mubarak, Mikhael, Wasfy, Jones, W Linwood, Wiegand, Rudolf, University of Central Florida
- Abstract / Description
-
In computer vision, context refers to any information that may influence how visual media are understood. Traditionally, researchers have studied the influence of several sources of context in relation to the object detection problem in images. In this dissertation, we present a multifaceted review of the problem of context. Context is analyzed as a source of improvement in the object detection problem, not only in images but also in videos. In the case of images, we also investigate the influence of semantic context, determined by objects, relationships, locations, and global composition, to achieve a general understanding of the image content as a whole. In our research, we also attempt to solve the related problem of finding the context associated with visual media: given a set of visual elements (images), we want to extract the context that can be commonly associated with these images in order to remove ambiguity.

The first part of this dissertation concentrates on achieving image understanding using semantic context. In spite of the recent success in tasks such as image classification, object detection, and image segmentation, and the progress on scene understanding, researchers still lack clarity about how a computer comprehends the content of an image as a whole. Hence, we propose a Top-Down Visual Tree (TDVT) image representation that encodes the content of an image as a hierarchy of objects capturing their importance, co-occurrences, and types of relations. A novel Top-Down Tree LSTM network is presented to learn the image composition from the training images and their TDVT representations. Given a test image, our algorithm detects objects and determines the hierarchical structure they form, encoded as a TDVT representation of the image.

A single image could have multiple interpretations, which may lead to ambiguity about the intent of the image. What if, instead of having only a single image to interpret, we have multiple images that represent the same topic? The second part of this dissertation covers how to extract the context information shared by multiple images. We present a method to determine the topic that these images represent. We accomplish this by transferring tags from an image retrieval database and performing operations in the textual space of these tags. As an application, we also present a new image retrieval method that uses multiple images as input. Unlike earlier works that focus either on using a single query image or on using multiple query images with views of the same instance, the new image search paradigm retrieves images based on the underlying concepts that the input images represent.
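As a toy illustration of operating in the textual space of transferred tags (the dissertation's method is considerably richer), the snippet below aggregates the tags transferred to each query image from its retrieved neighbours and keeps the terms supported by the most input images; the names and the top-k cut-off are assumptions.

```python
from collections import Counter

def shared_topic(tag_lists, top_k=3):
    """Rank candidate topic terms by how many of the input images'
    transferred tag sets contain them (toy illustration)."""
    support = Counter()
    for tags in tag_lists:        # one list of transferred tags per input image
        for t in set(tags):       # count each tag at most once per image
            support[t] += 1
    return [t for t, _ in support.most_common(top_k)]
```

For example, `shared_topic([["beach", "sunset", "sea"], ["sea", "boat"], ["sea", "sand"]])` ranks "sea" first, the concept common to all three inputs.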
Finally, in the third part of this dissertation, we analyze the influence of context in videos. Here, temporal context is utilized to improve scene identification and object detection. We focus on egocentric videos, where agents require some time to move from one location to another. Therefore, we propose a Conditional Random Field (CRF) formulation which penalizes short-term changes of the scene identity, improving scene identification accuracy. We also show how to improve object detection by re-scoring the results based on the scene identity of the tested frame. We present a Support Vector Regression (SVR) formulation for the case in which explicit knowledge of the scene identity is available at training time and, when explicit scene labels are not available, an LSTM formulation that considers the general appearance of the frame to re-score the object detectors.
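The following is a minimal chain-smoothing sketch in the spirit of the CRF described above: per-frame scene scores are decoded with a constant penalty for switching the scene label between consecutive frames (a Potts-model chain solved exactly by dynamic programming). It is an illustration only, not the dissertation's exact model, which also involves SVR- and LSTM-based re-scoring; the score matrix and the penalty value are assumed inputs.

```python
import numpy as np

def smooth_scene_identity(frame_scores, switch_penalty=2.0):
    """Decode per-frame scene scores (T x K) into a label sequence that
    trades unary score against a constant penalty for changing labels."""
    T, K = frame_scores.shape
    best = frame_scores[0].copy()              # best score ending in each label
    back = np.zeros((T, K), dtype=int)         # backpointers for the decoding pass
    for t in range(1, T):
        stay = best                            # keep the same label: no penalty
        switch = best.max() - switch_penalty   # switch from the best previous label
        back[t] = np.where(stay >= switch, np.arange(K), best.argmax())
        best = np.maximum(stay, switch) + frame_scores[t]
    labels = np.empty(T, dtype=int)
    labels[-1] = best.argmax()
    for t in range(T - 1, 0, -1):              # trace the backpointers
        labels[t - 1] = back[t, labels[t]]
    return labels
```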
- Date Issued
- 2017
- Identifier
- CFE0006922, ucf:51703
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006922
- Title
- Analytical study of computer vision-based pavement crack quantification using machine learning techniques.
- Creator
-
Mokhtari, Soroush, Yun, Hae-Bum, Nam, Boo Hyun, Catbas, Necati, Shah, Mubarak, Xanthopoulos, Petros, University of Central Florida
- Abstract / Description
-
Image-based techniques are a promising non-destructive approach to road pavement condition evaluation. The main objective of this study is to extract, quantify, and evaluate important surface defects, such as cracks, using an automated computer vision-based system, in order to provide a better understanding of the pavement deterioration process. To achieve this objective, automated crack-recognition software was developed, employing a series of image processing algorithms for crack extraction, crack grouping, and crack detection. A bottom-hat morphological technique is used to remove the random background of pavement images and to extract cracks selectively, based on their shapes, sizes, and intensities, using a relatively small number of user-defined parameters. A technical challenge with crack extraction algorithms, including the bottom-hat transform, is that the extracted crack pixels are usually fragmented along crack paths. To de-fragment these crack pixels, a novel crack-grouping algorithm, called MorphLink-C, is proposed as an image segmentation method. Statistical validation of this method using flexible pavement images indicates that MorphLink-C not only improves crack-detection accuracy but also reduces crack-detection time.

Crack characterization is performed by analysing image features of the extracted crack components. A comprehensive statistical analysis was conducted using filter feature subset selection (FSS) methods, including Fisher score, Gini index, information gain, ReliefF, mRMR, and FCBF, to understand the statistical characteristics of cracks in different deterioration stages. The statistical significance of crack features was ranked based on their relevance and redundancy. The statistical method used in this study can be employed to avoid subjective crack rating based on human visual inspection. Moreover, the statistical information can be used as fundamental data to justify rehabilitation policies in pavement maintenance.

Finally, the application of four classification algorithms, Artificial Neural Network (ANN), Decision Tree (DT), k-Nearest Neighbours (kNN), and Adaptive Neuro-Fuzzy Inference System (ANFIS), is investigated for the crack detection framework. The classifiers were evaluated on the following five criteria: 1) prediction performance, 2) computation time, 3) stability of results for highly imbalanced datasets, in which the number of crack objects is significantly smaller than the number of non-crack objects, 4) stability of the classifiers' performance for pavements in different deterioration stages, and 5) interpretability of results and clarity of the procedure. Comparison results indicate the advantages of white-box classification methods for computer vision-based pavement evaluation. Although black-box methods, such as ANN, provide superior classification performance, white-box methods, such as ANFIS, provide useful information about the logic of classification and the effect of feature values on detection results. Such information can provide further insight for image-based pavement crack detection applications.
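A minimal sketch of the bottom-hat extraction step described above, using OpenCV's black-hat morphology (closing minus the image) followed by Otsu thresholding and a simple size filter on connected components. The kernel size, area threshold, and thresholding choice are illustrative assumptions and do not reproduce the study's full pipeline (e.g. MorphLink-C grouping or the classifier stage).

```python
import cv2
import numpy as np

def extract_crack_candidates(gray, kernel_size=11, min_area=30):
    """Highlight dark, thin crack-like structures in a grayscale pavement
    image and return a binary mask of candidate crack pixels."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # bottom-hat (black-hat) = closing - image: emphasizes dark structures
    # narrower than the structuring element, such as cracks
    bottom_hat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    _, mask = cv2.threshold(bottom_hat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # drop tiny blobs that are more likely texture noise than crack fragments
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    cleaned = np.zeros_like(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cleaned[labels == i] = 255
    return cleaned
```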
- Date Issued
- 2015
- Identifier
- CFE0005671, ucf:50186
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005671