Current Search: Autonomous Fault Recovery (x)
View All Items
- Title
- A SUSTAINABLE AUTONOMIC ARCHITECTURE FOR ORGANICALLY RECONFIGURABLE COMPUTING SYSTEMS.
- Creator
-
Oreifej, Rashad, DeMara, Ronald, University of Central Florida
- Abstract / Description
-
A Sustainable Autonomic Architecture for Organically Reconfigurable Computing System based on SRAM Field Programmable Gate Arrays (FPGAs) is proposed, modeled analytically, simulated, prototyped, and measured. Low-level organic elements are analyzed and designed to achieve novel self-monitoring, self-diagnosis, and self-repair organic properties. The prototype of a 2-D spatial gradient Sobel video edge-detection organic system use-case developed on a XC4VSX35 Xilinx Virtex-4 Video Starter Kit...
Show moreA Sustainable Autonomic Architecture for Organically Reconfigurable Computing System based on SRAM Field Programmable Gate Arrays (FPGAs) is proposed, modeled analytically, simulated, prototyped, and measured. Low-level organic elements are analyzed and designed to achieve novel self-monitoring, self-diagnosis, and self-repair organic properties. The prototype of a 2-D spatial gradient Sobel video edge-detection organic system use-case developed on a XC4VSX35 Xilinx Virtex-4 Video Starter Kit is presented. Experimental results demonstrate the applicability of the proposed architecture and provide the infrastructure to quantify the performance and overcome fault-handling limitations. Dynamic online autonomous functionality restoration after a malfunction or functionality shift due to changing requirements is achieved at a fine granularity by exploiting dynamic Partial Reconfiguration (PR) techniques. A Genetic Algorithm (GA)-based hardware/software platform for intrinsic evolvable hardware is designed and evaluated for digital circuit repair using a variety of well-accepted benchmarks. Dynamic bitstream compilation for enhanced mutation and crossover operators is achieved by directly manipulating the bitstream using a layered toolset. Experimental results on the edge-detector organic system prototype have shown complete organic online refurbishment after a hard fault. In contrast to previous toolsets requiring many milliseconds or seconds, an average of 0.47 microseconds is required to perform the genetic mutation, 4.2 microseconds to perform the single point conventional crossover, 3.1 microseconds to perform Partial Match Crossover (PMX) as well as Order Crossover (OX), 2.8 microseconds to perform Cycle Crossover (CX), and 1.1 milliseconds for one input pattern intrinsic evaluation. These represent a performance advantage of three orders of magnitude over the JBITS software framework and more than seven orders of magnitude over the Xilinx design flow. Combinatorial Group Testing (CGT) technique was combined with the conventional GA in what is called CGT-pruned GA to reduce repair time and increase system availability. Results have shown up to 37.6% convergence advantage using the pruned technique. Lastly, a quantitative stochastic sustainability model for reparable systems is formulated to evaluate the Sustainability of FPGA-based reparable systems. This model computes at design-time the resources required for refurbishment to meet mission availability and lifetime requirements in a given fault-susceptible missions. By applying this model to MCNC benchmark circuits and the Sobel Edge-Detector in a realistic space mission use-case on Xilinx Virtex-4 FPGA, we demonstrate a comprehensive model encompassing the inter-relationships between system sustainability and fault rates, utilized, and redundant hardware resources, repair policy parameters and decaying reparability.
Show less - Date Issued
- 2011
- Identifier
- CFE0003969, ucf:48661
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003969
- Title
- Design Disjunction for Resilient Reconfigurable Hardware.
- Creator
-
Alzahrani, Ahmad, DeMara, Ronald, Yuan, Jiann-Shiun, Lin, Mingjie, Wang, Jun, Turgut, Damla, University of Central Florida
- Abstract / Description
-
Contemporary reconfigurable hardware devices have the capability to achieve high performance, powerefficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supportingefficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key...
Show moreContemporary reconfigurable hardware devices have the capability to achieve high performance, powerefficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supportingefficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key role in designingfuture dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overheadassociated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques areconsidered to surmount this limitation; however, they can incur substantial overheads in both area andpower requirements. To achieve a better trade-off among performance, area, power, and reliability, thisresearch proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted:First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-freehypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets ofresources, each of which can be utilized by the same synthesized application netlist. The diverseimplementations provide reconfiguration-based resilience throughout the system lifetime while avoiding thesignificant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEGimage compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated thepotential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in powerconsumption compared to the frequently-used TMR scheme while providing superior fault tolerance.Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overheadfault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration.Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithmdeveloped such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks havedemonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for application with a moderately high production capacity.
Show less - Date Issued
- 2015
- Identifier
- CFE0006250, ucf:51086
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006250
- Title
- Autonomous Recovery of Reconfigurable Logic Devices using Priority Escalation of Slack.
- Creator
-
Imran, Syednaveed, DeMara, Ronald, Mikhael, Wasfy, Lin, Mingjie, Yuan, Jiann-Shiun, Geiger, Christopher, University of Central Florida
- Abstract / Description
-
Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases.To extend these concepts to semiconductor aging and process variation in the deep...
Show moreField Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases.To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric.FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
Show less - Date Issued
- 2013
- Identifier
- CFE0005006, ucf:50005
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005006