Towards High-Efficiency Data Management In the Next-Generation Persistent Memory System.
Chen, Xunchao, Wang, Jun, Fan, Deliang, Lin, Mingjie, Ewetz, Rickard, Zhang, Shaojie, University of Central Florida
Abstract / Description
To achieve higher cell density with near-zero standby power, recent research on Magnetic Tunneling Junction (MTJ) devices has leveraged Multi-Level Cell (MLC) configurations of Spin-Transfer Torque Random Access Memory (STT-RAM). However, to mitigate write disturbance in an MLC design, data stored in the soft bit must be restored immediately after the hard bit switching is completed. Furthermore, as a result of MTJ feature size scaling, the soft bit can be expected to be disturbed by the read sensing current, thus requiring an immediate restore operation to ensure data reliability. In this paper, we design and analyze novel Adaptive Restore Schemes for Write Disturbance (ARS-WD) and Read Disturbance (ARS-RD). ARS-WD alleviates restoration overhead by intentionally overwriting soft bit lines which are less likely to be read. ARS-RD, on the other hand, aggregates the potential writes and restores the soft bit line at the time of its eviction from the higher-level cache. Both schemes are based on a lightweight forecasting approach for the future read behavior of the cache block. Our experimental results show a substantial reduction in soft bit line restore operations. Moreover, ARS promotes the advantages of MLC to provide a preferable L2 design alternative in terms of the energy, area, and latency product compared to SLC STT-RAM alternatives.
Whereas the popular Cell Split Mapping (CSM) for MLC STT-RAM leverages the inter-block nonuniform access frequency, the intra-block data access features remain untapped in the MLC design. Aiming to minimize energy-hungry write requests to the Hard-Bit Line (HBL) and maximize the dynamic range in the advantageous Soft-Bit Line (SBL), a hybrid mapping strategy for MLC STT-RAM cache (Double-S) is advocated in the paper. Double-S couples the contemporary Cell-Split-Mapping with the novel Word-Split-Mapping (WSM). A sparse cache block detector and a read-depth-based data allocation/migration policy are proposed to release the full potential of Double-S.
Date Issued
CFE0006865, ucf:51751
Document (PDF)
Probabilistic-Based Computing Transformation with Reconfigurable Logic Fabrics.
Alawad, Mohammed, Lin, Mingjie, DeMara, Ronald, Mikhael, Wasfy, Wang, Jun, Das, Tuhin, University of Central Florida
Abstract / Description
Effectively tackling the upcoming "zettabytes" data explosion requires a huge quantum leap in our computing power and energy efficiency. However, with Moore's law dwindling quickly, the physical limits of CMOS technology make it almost intractable to achieve high energy efficiency if the traditional "deterministic and precise" computing model still dominates. Worse, the upcoming data explosion mostly comprises statistics gleaned from uncertain, imperfect real-world environments. As such, the traditional computing means of first-principle modeling or explicit statistical modeling will very likely be ineffective in achieving flexibility, autonomy, and human interaction. The bottom line is clear: given where we are headed, the fundamental principle of modern computing, that deterministic logic circuits can flawlessly emulate propositional logic deduction governed by Boolean algebra, has to be reexamined, and transformative changes in the foundation of modern computing must be made.
This dissertation presents a novel stochastic-based computing methodology. It efficiently realizes algorithmic computing through the proposed concept of Probabilistic Domain Transform (PDT). The essence of the PDT approach is to encode the input signal as a probability density function, perform stochastic computing operations on the signal in the probabilistic domain, and decode the output signal by estimating the probability density function of the resulting random samples. The proposed methodology possesses many notable advantages. Specifically, it uses much simplified circuit units to conduct complex operations, which leads to highly area- and energy-efficient designs suitable for parallel processing. Moreover, it is highly fault-tolerant because the information to be processed is encoded with a large ensemble of random samples. As such, local perturbations of its computing accuracy are dissipated globally, thus becoming inconsequential to the final overall results. Finally, the proposed probabilistic-based computing can facilitate building scalable precision systems, which provide an elegant way to trade off between computing accuracy and computing performance/hardware efficiency for many real-world applications.
To validate the effectiveness of the proposed PDT methodology, two important signal processing applications, discrete convolution and 2-D FIR filtering, are first implemented and benchmarked against other deterministic-based circuit implementations. Furthermore, a large-scale Convolutional Neural Network (CNN), a fundamental algorithmic building block in many computer vision and artificial intelligence applications that follow the deep learning principle, is also implemented on an FPGA based on a novel stochastic-based and scalable hardware architecture and circuit design. The key idea is to implement all key components of a deep learning CNN, including multi-dimensional convolution, activation, and pooling layers, completely in the probabilistic computing domain. The proposed architecture not only achieves the advantages of stochastic-based computation, but can also solve several challenges in conventional CNNs, such as complexity, parallelism, and memory storage.
Overall, being highly scalable and energy efficient, the proposed PDT-based architecture is well-suited for a modular vision engine with the goal of performing real-time detection, recognition, and segmentation of mega-pixel images, especially those perception-based computing tasks that are inherently fault-tolerant.
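The encode, operate, decode cycle described for PDT can be illustrated in software with discrete convolution, one of the applications the abstract names. The abstract does not say how convolution is mapped into the probabilistic domain, so this minimal sketch assumes the standard fact that the distribution of the sum of two independent random variables is the convolution of their distributions. The function names, the sample count, and the requirement that inputs be non-negative and sum to 1 are illustrative assumptions, not the dissertation's circuit design.

```python
import random
from collections import Counter


def exact_convolution(p, q):
    """Direct (deterministic) discrete convolution, used as the reference."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def stochastic_convolution(p, q, n_samples=200_000, seed=0):
    """Estimate the convolution of p and q by sampling.

    Encode: treat p and q as probability mass functions and draw samples.
    Operate: add the paired samples (a single, very simple operation).
    Decode: estimate the distribution of the sums with a histogram.
    """
    rng = random.Random(seed)
    xs = rng.choices(range(len(p)), weights=p, k=n_samples)
    ys = rng.choices(range(len(q)), weights=q, k=n_samples)
    counts = Counter(x + y for x, y in zip(xs, ys))
    return [counts[k] / n_samples for k in range(len(p) + len(q) - 1)]


# Hypothetical inputs: non-negative and normalized so they are valid PMFs.
p = [0.2, 0.5, 0.3]
q = [0.6, 0.4]
estimate = stochastic_convolution(p, q)
exact = exact_convolution(p, q)
```

Raising or lowering `n_samples` trades accuracy against work, which mirrors the scalable-precision trade-off the abstract describes.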
Date Issued
CFE0006828, ucf:51768
Document (PDF)
Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval.
Li, Kai, Hua, Kien, Qi, GuoJun, Hu, Haiyan, Wang, Chung-Ching, University of Central Florida
Abstract / Description
In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by such trends, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and support intra-media and inter-media similarity search among huge volumes of multimedia data.
We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in audio and visual modalities demonstrate similar temporal changing patterns in certain feature spaces. We propose to use a permutation-based random hashing technique to capture the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling has shown superior performance in the localization and segmentation of sounding objects in videos.
The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem, and study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval.
We further study the ranking-based hashing method for the cross-media similarity search problem. Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that features' ranking orders in different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We have demonstrated through extensive experiments on several real-world datasets that the proposed cross-media hashing method achieves superior cross-media retrieval performance against several state-of-the-art algorithms.
Lastly, to make better use of the supervisory label information, as well as to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, as opposed to pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, thus making it equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
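The abstract names a permutation-based random hashing technique that captures ordering patterns but gives no algorithm. As a rough illustration only, the sketch below uses a winner-take-all style rank hash as a stand-in: each code symbol records which of the first `k` permuted entries is largest, so two signals with the same rank ordering receive the same code regardless of scale. All names (`make_permutations`, `rank_hash`, `hamming`) and parameter values are hypothetical, and the thesis's actual construction may differ.

```python
import random


def make_permutations(dim, n_codes, seed=0):
    """Draw one random permutation of the feature indices per code symbol."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), dim) for _ in range(n_codes)]


def rank_hash(x, perms, k=4):
    """Winner-take-all style hash (an assumed stand-in, not the thesis method).

    For each permutation, look at the first k permuted entries of x and
    record the position of the largest one. Only relative order matters.
    """
    return [max(range(k), key=lambda i: x[perm[i]]) for perm in perms]


def hamming(a, b):
    """Number of code positions at which two codes differ."""
    return sum(u != v for u, v in zip(a, b))


perms = make_permutations(dim=8, n_codes=16)

# Hypothetical feature sequence sampled along the temporal axis.
audio = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]
# A second signal with the same temporal ordering but a different scale.
visual = [2.0 * v + 1.0 for v in audio]

code_audio = rank_hash(audio, perms)
code_visual = rank_hash(visual, perms)
distance = hamming(code_audio, code_visual)
```

Because the hash depends only on rank order, a small distance between codes indicates that two signals rise and fall together, which is the property the audio-visual correlation search relies on.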
Date Issued
CFE0006759, ucf:51840
Document (PDF)
Human Detection, Tracking and Segmentation in Surveillance Video.
Shu, Guang, Shah, Mubarak, Boloni, Ladislau, Wang, Jun, Lin, Mingjie, Sugaya, Kiminobu, University of Central Florida
Abstract / Description
This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real-world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes, and occlusion. In this dissertation, several novel methods are proposed for improving on the current state of human detection and tracking based on learning scene-specific information in video feeds.
Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds, and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using a Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally, a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features; hence, it can avoid many false detections typically obtained by a generic detector. We are able to demonstrate a significant improvement in the performance of the state-of-the-art detector.
Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based person-specific Support Vector Machine (SVM) classifiers which capture articulations of moving human bodies with dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection. This leads to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking.
Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple human segmentation in videos by incorporating human detection and part-based detection potential into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate superpixels corresponding to a human and background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. Compared to previous work, our method can automatically segment multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation accuracy than previous methods on several challenging video sequences.
Most of the work in Computer Vision deals with point solutions: a specific algorithm for a specific problem. However, putting different algorithms into one real-world integrated system is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques. Adaptive Template Scaling is used to handle the scale change due to objects moving towards a camera. Incremental Searching and Local Frame Differencing are used to resolve challenging issues such as scale change, occlusion, and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance.
Date Issued
CFE0005551, ucf:50278
Document (PDF)
Taming Wild Faces: Web-Scale, Open-Universe Face Identification in Still and Video Imagery.
Ortiz, Enrique, Shah, Mubarak, Sukthankar, Rahul, Da Vitoria Lobo, Niels, Wang, Jun, Li, Xin, University of Central Florida
Abstract / Description
With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g. social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities, while accurately rejecting those of no interest.
Recent advancements in face recognition research have seen Sparse Representation-based Classification (SRC) advance to the forefront of competing methods. However, its drawbacks, slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are three-fold:
1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least-squares and the robustness of SRC. On our large dataset collected from Facebook, LASRC performs on par with standard SRC with a speedup of 100-250x.
2. For video, applying the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive computationally, so we propose a new algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and employing the knowledge that the face track frames belong to the same individual. Employing MSSRC results in a speedup of 5x on average over SRC on a frame-by-frame basis.
3. Finally, we make the observation that MSSRC sometimes assigns inconsistent identities to the same individual in a scene, which could be corrected based on their visual similarity. Therefore, we construct a probabilistic affinity graph combining appearance and co-occurrence similarities to model the relationship between face tracks in a video. Using this relationship graph, we employ random walk analysis to propagate strong class predictions among similar face tracks, while dampening weak predictions. Our method results in a performance gain of 15.8% in average precision over using MSSRC alone.
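The third contribution, propagating strong predictions over an affinity graph, can be sketched as follows. The abstract does not give the update rule, so this example assumes a common random-walk-with-restart form, F <- alpha * P * F + (1 - alpha) * Y, where P is the row-normalized affinity matrix and Y holds the initial per-track class scores. The function name `propagate`, the value of `alpha`, and the toy data are all hypothetical.

```python
def propagate(affinity, scores, alpha=0.5, n_iter=50):
    """Random walk with restart over a face-track affinity graph.

    affinity: square matrix of non-negative similarities between tracks.
    scores:   initial class scores per track (one row per track).
    Assumed update (not taken from the dissertation):
        F <- alpha * P @ F + (1 - alpha) * Y
    """
    n = len(affinity)
    n_classes = len(scores[0])
    # Row-normalize the affinities into random-walk transition probabilities.
    P = []
    for row in affinity:
        total = sum(row)
        P.append([w / total if total else 0.0 for w in row])
    F = [list(row) for row in scores]
    for _ in range(n_iter):
        F = [
            [
                alpha * sum(P[i][j] * F[j][c] for j in range(n))
                + (1 - alpha) * scores[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return F


# Toy example: three mutually similar face tracks, two identities.
affinity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Tracks 0 and 1 are confidently identity 0; track 2 weakly leans to identity 1.
initial = [
    [0.90, 0.10],
    [0.90, 0.10],
    [0.45, 0.55],
]
final = propagate(affinity, initial)
labels = [max(range(2), key=lambda c: row[c]) for row in final]
```

In this toy case the weak prediction on the third track is outweighed by its two confident neighbors, so all three tracks end up with the same identity.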
Date Issued
CFE0005536, ucf:50313
Document (PDF)
Design Disjunction for Resilient Reconfigurable Hardware.
Alzahrani, Ahmad, DeMara, Ronald, Yuan, Jiann-Shiun, Lin, Mingjie, Wang, Jun, Turgut, Damla, University of Central Florida
Abstract / Description
Contemporary reconfigurable hardware devices have the capability to achieve the high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS) technology, new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and its ability to realize self-organization features are expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted.
First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-free hypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets of resources, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in power consumption compared to the frequently-used TMR scheme while providing superior fault tolerance.
Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overhead fault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithm, developed such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches.
Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for applications with a moderately high production capacity.
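The property stated for disjunct designs, that at least one configuration in the library evades any occurrence of up to d resource faults, can be checked by brute force on a small example. The sketch below is only that check; it is not the mosaic convergence algorithm, whose details the abstract does not give. Modeling each configuration as the set of resource indices it occupies, and the name `evades_all_faults`, are illustrative assumptions.

```python
from itertools import combinations


def evades_all_faults(configs, resources, d):
    """Check the fault-evasion property by exhaustive enumeration.

    configs:   list of sets, each holding the resources one configuration uses.
    resources: iterable of all resource identifiers on the device.
    d:         maximum number of simultaneous resource faults to tolerate.

    Returns True if, for every fault set of size 1..d, at least one
    configuration uses none of the faulty resources.
    """
    resources = list(resources)
    for size in range(1, d + 1):
        for faults in combinations(resources, size):
            fault_set = set(faults)
            if not any(fault_set.isdisjoint(cfg) for cfg in configs):
                return False
    return True


# Hypothetical library: three configurations on a six-resource device,
# each placed on a separate pair of resources.
resources = range(6)
configs = [{0, 1}, {2, 3}, {4, 5}]

tolerates_two = evades_all_faults(configs, resources, d=2)
tolerates_three = evades_all_faults(configs, resources, d=3)
```

Two faults can disable at most two of the three configurations here, so one always survives; three faults spread across all three pairs leave none. Exhaustive enumeration grows combinatorially with device size, which is why a constructive design-time method is needed for real FPGAs.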
Date Issued
CFE0006250, ucf:51086
Document (PDF)