You are here

Design Disjunction for Resilient Reconfigurable Hardware

Download pdf | Full Screen View

Date Issued:
2015
Abstract/Description:
Contemporary reconfigurable hardware devices have the capability to achieve high performance, powerefficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supportingefficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key role in designingfuture dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overheadassociated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques areconsidered to surmount this limitation; however, they can incur substantial overheads in both area andpower requirements. To achieve a better trade-off among performance, area, power, and reliability, thisresearch proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted:First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-freehypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets ofresources, each of which can be utilized by the same synthesized application netlist. The diverseimplementations provide reconfiguration-based resilience throughout the system lifetime while avoiding thesignificant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEGimage compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated thepotential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in powerconsumption compared to the frequently-used TMR scheme while providing superior fault tolerance.Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overheadfault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration.Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithmdeveloped such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks havedemonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for application with a moderately high production capacity.
Title: Design Disjunction for Resilient Reconfigurable Hardware.
0 views
0 downloads
Name(s): Alzahrani, Ahmad, Author
DeMara, Ronald, Committee Chair
Yuan, Jiann-Shiun, Committee Member
Lin, Mingjie, Committee Member
Wang, Jun, Committee Member
Turgut, Damla, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2015
Publisher: University of Central Florida
Language(s): English
Abstract/Description: Contemporary reconfigurable hardware devices have the capability to achieve high performance, powerefficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supportingefficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key role in designingfuture dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overheadassociated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques areconsidered to surmount this limitation; however, they can incur substantial overheads in both area andpower requirements. To achieve a better trade-off among performance, area, power, and reliability, thisresearch proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted:First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-freehypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets ofresources, each of which can be utilized by the same synthesized application netlist. The diverseimplementations provide reconfiguration-based resilience throughout the system lifetime while avoiding thesignificant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEGimage compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated thepotential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in powerconsumption compared to the frequently-used TMR scheme while providing superior fault tolerance.Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overheadfault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration.Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithmdeveloped such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks havedemonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for application with a moderately high production capacity.
Identifier: CFE0006250 (IID), ucf:51086 (fedora)
Note(s): 2015-12-01
Ph.D.
Engineering and Computer Science, Electrical Engineering and Computer Engineering
Doctoral
This record was generated from author submitted information.
Subject(s): Reconfigurable logic devices -- field programmable gate arrays -- autonomous fault handling -- fault-tolerant systems -- run-time fault diagnosis and recovery -- online test -- design space exploration -- process variation.
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0006250
Restrictions on Access: campus 2017-06-15
Host Institution: UCF

In Collections