- Title
- Exploring FPGA Implementation for Binarized Neural Network Inference.
- Creator
-
Yang, Li, Fan, Deliang, Zhang, Wei, Lin, Mingjie, University of Central Florida
- Abstract / Description
-
Deep convolutional neural networks have taken on an important role in machine learning and are widely used in areas such as computer vision, robotics, and biology. However, deep neural network models are becoming larger and more computationally complex, which is a major obstacle to implementing such huge models on embedded systems. Recent work has shown that binarized neural networks (BNNs), which use binarized (i.e., +1 and -1) convolution kernels and binarized activation functions, can significantly reduce parameter size and computation cost, making them hardware-friendly for Field-Programmable Gate Array (FPGA) implementation with efficient energy cost. This thesis proposes a new parallel-convolution binarized neural network (PC-BNN) implemented on an FPGA with accurate inference. The embedded PC-BNN is designed for image classification on the CIFAR-10 dataset and explores the hardware architecture and optimization of a customized CNN topology. The parallel-convolution binarized neural network has two parallel binarized convolution layers that replace the original single binarized convolution layer. It achieves around 86% accuracy on the CIFAR-10 dataset with a 2.3 Mb parameter size. We implement PC-BNN inference on the Xilinx PYNQ Z1 FPGA board, which has only 4.9 Mb of on-chip Block RAM. Because of the ultra-small network parameter size, the whole set of model parameters can be stored in on-chip memory, which greatly reduces energy consumption and computation latency. We also design a new pipelined streaming architecture for PC-BNN hardware inference that further increases performance. Experimental results show that our PC-BNN inference on the FPGA achieves 930 frames per second and 387.5 FPS/Watt, which are among the best throughput and energy-efficiency figures reported in recent works.
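The binarized convolution at the heart of a BNN is what makes it FPGA-friendly: with +1/-1 weights and activations, each multiply-accumulate collapses into an XNOR gate plus a popcount. Below is a minimal software sketch of that idea; the function and array names are illustrative and not taken from the thesis.

```python
import numpy as np

def binarize(x):
    """Map real values to +1/-1, the only levels a BNN layer uses."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_conv2d(activations, kernel):
    """Valid 2-D convolution with +1/-1 operands.

    On an FPGA each +1/-1 product reduces to an XNOR gate and the
    accumulation to a popcount; here it is emulated with integer math.
    """
    h, w = activations.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.int32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = activations[i:i + kh, j:j + kw]
            # XNOR-popcount equivalent: number of matches minus mismatches
            out[i, j] = int(np.sum(patch * kernel))
    return out

# Toy example: a 6x6 binarized feature map and a 3x3 binarized kernel
fmap = binarize(np.random.randn(6, 6))
kern = binarize(np.random.randn(3, 3))
print(binary_conv2d(fmap, kern))
```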
- Date Issued
- 2018
- Identifier
- CFE0007384, ucf:52067
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007384
- Title
- A FPGA-BASED ARCHITECTURE FOR LED BACKLIGHT DRIVING.
- Creator
-
Zheng, Zhaoshi, Zhou, Huiyang, University of Central Florida
- Abstract / Description
-
In recent years, Light-Emitting Diodes (LEDs) have become a promising candidate for backlighting Liquid Crystal Displays (LCDs). Compared with traditional Cold Cathode Fluorescent Lamp (CCFL) technology, LEDs offer not only better visual quality but also improved power efficiency. However, fully utilizing the capability of LEDs requires dynamic, independent control of individual LEDs, which remains a challenging topic. An FPGA-based hardware system for LED backlight control is proposed in this work. We successfully achieve dynamic adjustment of any individual LED's intensity in each of the three color channels (red, green, and blue) in response to a real-time incoming video stream. To compute LED intensity, four video-content processing algorithms have been implemented and tested: averaging, histogram equalization, LED-zone pattern-change detection, and non-linear mapping. We also construct two versions of the system. The first employs an embedded processor that performs the above algorithms on pre-processed video data; the second implements the same functionality in fixed hardware logic for better performance and power efficiency. The system serves as the backbone of a consolidated display, built in collaboration with a group of researchers from CREOL at UCF, which yields better visual quality than common commercial displays.
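Of the four content-processing algorithms listed, averaging is the simplest: each LED zone's drive level is derived from the mean brightness of the pixels it backlights. Here is a rough sketch of that per-zone computation, assuming an 8-bit grayscale channel split into one rectangular zone per LED; the names and zone geometry are illustrative, not taken from the thesis.

```python
import numpy as np

def zone_intensities(frame, zones_y, zones_x):
    """Average-based backlight levels for a grid of LED zones.

    frame            -- 2-D uint8 array (one color channel of the incoming video)
    zones_y, zones_x -- number of LED zones vertically / horizontally
    Returns an array of per-zone 8-bit drive values.
    """
    h, w = frame.shape
    zh, zw = h // zones_y, w // zones_x
    levels = np.zeros((zones_y, zones_x), dtype=np.uint8)
    for r in range(zones_y):
        for c in range(zones_x):
            zone = frame[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw]
            levels[r, c] = int(zone.mean())  # mean brightness drives this LED
    return levels

# Example: a 480x640 channel mapped onto a 6x8 LED grid
channel = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
print(zone_intensities(channel, 6, 8))
```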
- Date Issued
- 2010
- Identifier
- CFE0003351, ucf:48451
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003351
- Title
- PIPELINING OF DOUBLE PRECISION FLOATING POINT DIVISION AND SQUARE ROOT OPERATIONS ON FIELD-PROGRAMMABLE GATE ARRAYS.
- Creator
-
Thakkar, Anuja, Ejnioui, Abdel, University of Central Florida
- Abstract / Description
-
Many space applications, such as vision-based systems, synthetic aperture radar, and radar altimetry, rely increasingly on high-data-rate DSP algorithms. These algorithms use double-precision floating-point arithmetic operations. While most DSP applications can be executed on DSP processors, the numerical requirements of these new space applications surpass by far the numerical capabilities of many current DSP processors. Since the tradition in DSP processing has been to use fixed-point number representation, only recently have DSP processors begun to incorporate floating-point arithmetic units, and most of these units handle only single-precision floating-point addition/subtraction, multiplication, and occasionally division. While DSP processors are slowly evolving to meet the numerical requirements of newer space applications, FPGA densities have rapidly increased to parallel and even surpass the gate densities of many DSP processors and commodity CPUs. This makes FPGAs attractive platforms for implementing compute-intensive DSP computations. Even with this clear advantage on the side of FPGAs, few attempts have been made to examine how wide-precision floating-point arithmetic, particularly division and square-root operations, can perform on FPGAs to support these compute-intensive DSP applications. In this context, this thesis presents sequential and pipelined designs of IEEE-754-compliant double-precision floating-point division and square-root operations based on low-radix digit recurrence algorithms. FPGA implementations of these algorithms have the advantage of being easily testable. In particular, the pipelined designs are synthesized based on careful partial and full unrolling of the iterations in the digit recurrence algorithms. Overall, the sequential and pipelined designs are common-denominator implementations that do not use any performance-enhancing embedded components such as multipliers or block memory. Because these implementations exploit exclusively the fine-grain reconfigurable resources of Virtex FPGAs, they are easily portable to other FPGAs with similar reconfigurable fabrics without major modifications. The pipelined designs of these two operations are evaluated in terms of area, throughput, and dynamic power consumption as a function of pipeline depth. Pipelining experiments reveal that the area overhead tends to remain constant regardless of the degree of pipelining, while throughput increases with pipeline depth. In addition, these experiments reveal that pipelining reduces power considerably in shallow pipelines; pipelining the designs further does not necessarily lead to significant power reduction. By partitioning these designs into deeper pipelines, they can reach throughputs close to the 100 MFLOPS mark while consuming a modest 1% to 8% of the reconfigurable fabric within a Virtex-II XC2VX000 (e.g., XC2V1000 or XC2V6000) FPGA.
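The low-radix digit recurrence mentioned above retires one quotient digit per iteration from a shifted partial remainder, which is why these designs unroll so naturally into pipeline stages. The following is a small fixed-point sketch of a generic radix-2 restoring recurrence, assuming normalized positive operands; it is a textbook formulation used for illustration, not the thesis's exact datapath.

```python
def restoring_divide(dividend, divisor, frac_bits=24):
    """Radix-2 restoring division on normalized fixed-point operands.

    One quotient bit is retired per iteration, mirroring how a digit
    recurrence unit maps one (partially unrolled) iteration onto one
    pipeline stage.
    """
    assert divisor > 0 and dividend >= 0
    remainder = dividend
    quotient = 0
    for _ in range(frac_bits):
        remainder <<= 1                 # shift the partial remainder
        quotient <<= 1
        if remainder >= divisor:        # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1               # retire a 1 quotient bit
    return quotient, remainder

# 0.75 / 0.9 with 24 fractional bits (operands scaled to integers)
q, r = restoring_divide(int(0.75 * (1 << 24)), int(0.9 * (1 << 24)))
print(q / (1 << 24))   # approximately 0.8333
```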
- Date Issued
- 2006
- Identifier
- CFE0000955, ucf:46751
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000955
- Title
- A COMPETITIVE RECONFIGURATION APPROACH TO AUTONOMOUS FAULT HANDLING USING GENETIC ALGORITHMS.
- Creator
-
Zhang, Kening, DeMara, Ronald F, University of Central Florida
- Abstract / Description
-
In this dissertation, a novel self-repair approach based on Consensus-Based Evaluation (CBE) for autonomous repair of SRAM-based Field-Programmable Gate Arrays (FPGAs) is developed, evaluated, and refined. An initial population of functionally identical (same input-output behavior) yet physically distinct (alternative design or place-and-route realization) FPGA configurations is produced at design time. During run-time, the CBE approach ranks these alternative configurations after evaluating their discrepancy relative to the consensus formed by the population. Through runtime competition, faults in the logical resources become occluded from the visibility of subsequent FPGA operations. Meanwhile, offspring formed through crossover and mutation of faulty and viable configurations are selected at a controlled re-introduction rate for evaluation and refurbishment. Refurbishments are evolved in-situ, with online real-time input-based performance evaluation, enhancing system availability and sustainability and creating an Organic Embedded System (OES). A fault-tolerance model called N-Modular Redundancy with Standby (NMRSB) is developed, which combines the two popular fault-tolerance techniques of NMR and standby fault tolerance in order to facilitate the CBE approach. This dissertation develops two instances of the NMRSB system: Triple Modular Redundancy with Standby (TMRSB) and Duplex with Standby (DSB). A hypothetical Xilinx Virtex-II Pro FPGA model demonstrates their viability for various applications, including a 3-bit x 3-bit multiplier and the MCNC91 benchmark circuits. Experiments conducted on the model evaluate the performance of three new genetic operators and demonstrate progress towards a completely self-contained single-chip implementation, so that the FPGA can refurbish itself without requiring a PC host to execute the genetic algorithm. This dissertation presents results from simulations of multiple applications with a CBE model implemented in the C++ programming language. Starting with initial populations of 20 and 30 viable configurations for TMRSB and DSB respectively, a single stuck-at fault is introduced in the logic resources. Fault refurbishment experiments are conducted under supervision of CBE using a fitness-state evaluation function based on competing outputs, fitness adjustment, and different threshold levels. The device remains online throughout the process, by which a complete repair is realized with Hamming-distance and bit-weight voting schemes. The results indicate that a Hamming-distance TMRSB approach can prevent the most pervasive fault impacts and realize complete refurbishment. Experimental results also show that the Autonomic Layer demonstrates 100% faulty-component isolation for both Functional Elements (FEs) and Autonomous Elements (AEs) with randomly injected single and multiple faults. Using logic circuits from the MCNC-91 benchmark set, availability during repair phases averaged 75.05%, 82.21%, and 65.21% for the z4ml, cm85a, and cm138a circuits respectively under the stated conditions. In addition to simulation, the proposed OES architecture, synthesized from HDL, was prototyped on a Xilinx Virtex II Pro FPGA device supporting partial reconfiguration to demonstrate the feasibility of intrinsic regeneration of the selected circuit.
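The consensus ranking at the core of CBE can be pictured as scoring each competing configuration by how often its output disagrees with the majority answer formed by the population. Below is a simplified sketch of that discrepancy ranking over a shared input stream; the function names, fields, and the stuck-at example are illustrative stand-ins, not code from the dissertation.

```python
from collections import Counter

def rank_by_consensus(configurations, test_inputs):
    """Rank alternative FPGA configurations by discrepancy from the consensus.

    configurations -- dict of name -> callable emulating that configuration
    test_inputs    -- iterable of inputs applied to every instance
    Returns (name, discrepancy count) pairs, most trustworthy first.
    """
    discrepancy = {name: 0 for name in configurations}
    for x in test_inputs:
        outputs = {name: fn(x) for name, fn in configurations.items()}
        # Consensus is the most common output across the population
        consensus, _ = Counter(outputs.values()).most_common(1)[0]
        for name, y in outputs.items():
            if y != consensus:
                discrepancy[name] += 1   # disagreement with the consensus
    return sorted(discrepancy.items(), key=lambda kv: kv[1])

# Toy population: two healthy 3-bit x 3-bit multipliers and one with a stuck-at-1 output bit
healthy = lambda ab: (ab[0] * ab[1]) & 0x3F
stuck = lambda ab: ((ab[0] * ab[1]) & 0x3F) | 0x1
population = {"cfg_a": healthy, "cfg_b": healthy, "cfg_c": stuck}
vectors = [(a, b) for a in range(8) for b in range(8)]
print(rank_by_consensus(population, vectors))    # cfg_c accumulates the discrepancies
```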
- Date Issued
- 2008
- Identifier
- CFE0002280, ucf:47849
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002280
- Title
- RECONFIGURABLE ARCHITECTURE FOR H.264/AVC VARIABLE BLOCK SIZE MOTION ESTIMATION BASED ON MOTION ACTIVITY AND ADAPTIVE SEARCH RANGE.
- Creator
-
Kodipyaka, Sumedha, Lee, Jooheung, University of Central Florida
- Abstract / Description
-
The Motion Estimation (ME) technique plays a key role in video coding systems, achieving high compression ratios by removing temporal redundancies among video frames. Especially in the newest H.264/AVC video coding standard, the ME engine demands a large amount of computational capability due to its support for a wide range of block sizes for a given macroblock, which increases the accuracy of finding the best matching block in previous frames. We propose a scalable architecture for H.264/AVC Variable Block Size (VBS) motion estimation with adaptive computing capability to support various search ranges, input video resolutions, and frame rates. The hardware architecture of the proposed ME consists of scalable Sum of Absolute Differences (SAD) arrays which can perform the Full Search Block Matching Algorithm (FSBMA) for smaller 4x4 blocks. It is also shown that by predicting motion activity and adaptively adjusting the Search Range (SR) on the reconfigurable hardware platform, the computational cost of ME required for inter-frame encoding in the H.264/AVC video coding standard can be reduced significantly. Dynamic Partial Reconfiguration is a unique feature of Field Programmable Gate Arrays (FPGAs) that makes the best use of hardware resources and power by allowing an adaptive algorithm to be implemented during run-time. We exploit this feature of the FPGA to implement the proposed reconfigurable ME architecture and maximize the architectural benefits through prediction of motion activity in the video sequences, adaptation of the SR during run-time, and fractional ME refinement. The implemented ME architecture can support real-time applications at a maximum frequency of 90 MHz with multiple reconfigurable regions. Compared to reconfiguration of the complete design, the partial reconfiguration process results in a smaller bitstream size, which allows the FPGA to implement different configurations at higher speed. The proposed architecture has a modular structure, regular data flow, and efficient memory organization with fewer memory accesses. By increasing the number of active partial reconfigurable modules from one to four, there is a four-fold increase in data re-use. Also, by introducing an adaptive SR-reduction algorithm at the frame level, the computational load of ME is reduced significantly with only a small degradation in PSNR (0.1 dB).
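Full Search Block Matching with SAD, the kernel the scalable arrays implement for 4x4 blocks, is straightforward to state in software: slide the current block over every candidate position within the search range and keep the offset with the lowest sum of absolute differences. A reference-model sketch follows; the array shapes and names are illustrative.

```python
import numpy as np

def full_search_sad(cur_block, ref_frame, top, left, search_range):
    """Exhaustive SAD search for one block within +/- search_range pixels.

    cur_block -- 2-D uint8 block from the current frame (e.g. 4x4)
    ref_frame -- 2-D uint8 reference (previous) frame
    top, left -- position of the block in the current frame
    Returns (best_motion_vector, best_sad).
    """
    bh, bw = cur_block.shape
    h, w = ref_frame.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > h or x + bw > w:
                continue                      # candidate falls outside the frame
            cand = ref_frame[y:y + bh, x:x + bw].astype(np.int32)
            sad = int(np.abs(cand - cur_block.astype(np.int32)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

Because the candidate count grows quadratically with the search range, shrinking the SR when predicted motion activity is low is exactly the lever the adaptive scheme described above pulls to cut the SAD-array workload.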
- Date Issued
- 2010
- Identifier
- CFE0003316, ucf:48488
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003316
- Title
- FPGA-BASED DESIGN OF A MAXIMUM-POWER-POINT TRACKING SYSTEM FOR SPACE APPLICATIONS.
- Creator
-
Persen, Todd, Ejnioui, Abdel, University of Central Florida
- Abstract / Description
-
Satellites need a source of power throughout their missions to remain operational for several years. The power supplies of these satellites, provided primarily by solar arrays, must have high efficiency and low weight in order to meet stringent design constraints. Power conversion from these arrays must be robust and reliable, performing optimally under varying conditions of peak power, solar flux, and occlusion. Since the role of these arrays is to deliver power, one of the principal factors in achieving maximum power output from an array is tracking and holding its maximum-power point. This point, which varies with temperature, insolation, and loading conditions, must be continuously monitored in order to react to rapid changes. Until recently, maximum power point tracking (MPPT) control has been implemented in microcontrollers and digital signal processors (DSPs). While DSPs can provide reasonable performance, they do not provide the advantages that field-programmable gate array (FPGA) chips can potentially offer to the implementation of MPPT control. In comparison to DSP implementations, FPGAs offer lower-cost implementations, since the functions of various components can be integrated onto the same FPGA chip, whereas DSPs can perform only DSP-related computations. In addition, FPGAs can provide equivalent or higher performance with the customization potential of an ASIC. Because FPGAs can be reprogrammed at any time, repairs can be performed in-situ while the system is running, providing a high degree of robustness. Besides robustness, this reprogrammability provides a high level of (i) flexibility, making it easy to upgrade an MPPT control system by merely updating or modifying the MPPT algorithm running on the FPGA chip, and (ii) expandability, making it easy to expand an FPGA-based MPPT control system to handle multi-channel control. This reprogrammability also provides a level of testability that DSPs cannot match, by allowing the emulation of the entire MPPT control system on the FPGA chip. This thesis proposes an FPGA-based implementation of an MPPT control system suitable for space applications. At the core of this system, the perturb-and-observe algorithm is used to track the maximum power point. The algorithm runs on an Altera FLEX 10K FPGA chip. Additional functional blocks needed to support the MPPT control system, such as the ADC interface, FIR filter, dither generator, and DAC interface, are integrated within the same FPGA device, streamlining the part composition of the physical prototype used to build this control system.
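The perturb-and-observe loop at the core of this controller nudges the operating point in one direction and keeps going if the output power rose, otherwise it reverses direction. Here is a minimal sketch of one control step, assuming the array voltage and current are sampled through an ADC interface; the names, step size, and toy P-V curve are illustrative, not values from the thesis.

```python
def perturb_and_observe(v_now, i_now, v_prev, p_prev, step=0.1):
    """One perturb-and-observe iteration for maximum power point tracking.

    v_now, i_now   -- latest sampled array voltage and current
    v_prev, p_prev -- operating voltage and power from the previous step
    Returns (next voltage reference, power to remember for the next step).
    """
    p_now = v_now * i_now
    if p_now > p_prev:
        # Power improved: keep perturbing in the same direction
        direction = 1 if v_now > v_prev else -1
    else:
        # Power dropped: reverse the perturbation direction
        direction = -1 if v_now > v_prev else 1
    return v_now + direction * step, p_now

# Example: climb a toy P-V curve P(V) = V * (10 - V), whose maximum lies at V = 5
v_ref, v_prev, p_prev = 2.0, 1.9, 1.9 * (10 - 1.9)
for _ in range(40):
    i_meas = 10 - v_ref                    # toy current model for demonstration
    v_new, p_new = perturb_and_observe(v_ref, i_meas, v_prev, p_prev)
    v_prev, v_ref, p_prev = v_ref, v_new, p_new
print(round(v_ref, 2))                     # oscillates near V = 5, the toy maximum power point
```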
- Date Issued
- 2004
- Identifier
- CFE0000287, ucf:46232
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000287
- Title
- A MULTI-LAYER FPGA FRAMEWORK SUPPORTING AUTONOMOUS RUNTIME PARTIAL RECONFIGURATION.
- Creator
-
Tan, Heng, DeMara, Ronald, University of Central Florida
- Abstract / Description
-
Partial reconfiguration is a capability recently provided by several Field Programmable Gate Array (FPGA) vendors that involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for autonomous runtime partial reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link general logic function and area allocation information to device-level physical configuration data by using mathematical data structures and physical constraints. In certain scenarios, configuration bitstream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed, and power optimization techniques are developed based on the Xilinx prototype. Evaluation and analysis of these prototypes and techniques are performed on a number of benchmark and hashing-algorithm case studies. The results indicate that, across a variety of test benches, up to 70% reduction in resource utilization, up to 50% improvement in power consumption, and up to 10 times increase in run-time performance are achieved using the developed architecture and approaches compared with the Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for an FPGA fault-tolerance case study is evaluated as an ultimate high-level application running on this architecture. It demonstrates a hardware and software infrastructure that enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead.
- Date Issued
- 2007
- Identifier
- CFE0001933, ucf:47448
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001933
- Title
- OPTIMIZING DYNAMIC LOGIC REALIZATIONS FOR PARTIAL RECONFIGURATION OF FIELD PROGRAMMABLE GATE ARRAYS.
- Creator
-
Parris, Matthew, DeMara, Ronald, University of Central Florida
- Abstract / Description
-
Many digital logic applications can take advantage of the reconfiguration capability of Field Programmable Gate Arrays (FPGAs) to dynamically patch design flaws, recover from faults, or time-multiplex between functions. Partial reconfiguration is the process by which a user modifies one or more modules residing on the FPGA device independently of the others. Partial reconfiguration reduces the granularity of reconfiguration to a set of columns or a rectangular region of the device. Decreasing the granularity of reconfiguration results in reduced configuration file sizes and, thus, reduced configuration times. Compared to the single bitstream of a non-partial-reconfiguration implementation, smaller modules with smaller bitstream file sizes allow an FPGA to implement many more hardware configurations with greater speed under similar storage requirements. To realize the benefits of partial reconfiguration in a wider range of applications, this thesis begins with a survey of FPGA fault-handling methods, which are compared using performance-based metrics. The performance of the Genetic Algorithm (GA) offline recovery method is investigated, and the candidate solutions provided by the GA are partitioned by age to improve its efficiency. Parameters of this aging technique are optimized to increase the occurrence rate of complete repairs. Continuing the discussion of partial reconfiguration, the thesis develops a case-study application that implements one partial reconfiguration module to demonstrate the functionality and benefits of time multiplexing and to reveal the improved efficiencies of the latest large-capacity FPGA architectures. The number of active partial reconfiguration modules implemented on a single FPGA device is then increased from one to eight to implement a dynamic video-processing architecture for Discrete Cosine Transform and Motion Estimation functions, demonstrating a 55-fold reduction in bitstream storage requirements and thus improving partial reconfiguration capability.
- Date Issued
- 2008
- Identifier
- CFE0002323, ucf:47793
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002323
- Title
- RECONFIGURABLE COMPUTING FOR VIDEO CODING.
- Creator
-
Huang, Jian, Lee, Jooheung, University of Central Florida
- Abstract / Description
-
Video coding is widely used in our daily life. Due to its high computational complexity, hardware implementation is usually preferred. In this research, we investigate both the ASIC hardware design approach and the reconfigurable hardware design approach for video coding applications. First, we present a unified architecture that can perform the Discrete Cosine Transform (DCT), Inverse Discrete Cosine Transform (IDCT), and DCT-domain motion estimation and compensation (DCT-ME/MC). The proposed architecture is a wavefront-array-based processor with a highly modular structure consisting of 8x8 Processing Elements (PEs). By utilizing statistical properties and arithmetic operations, it can be used as a high-performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and executed through the pre-defined PEs. In addition to the simplified design process of the proposed architecture and savings in hardware resources, we also demonstrate that a high throughput rate can be achieved for IDCT and DCT-MC by fully utilizing the sparseness property of the DCT coefficient matrix. Compared to a fixed hardware architecture using the ASIC design approach, the reconfigurable hardware design approach has higher flexibility, lower cost, and faster time-to-market. We propose a self-reconfigurable platform which can reconfigure the architecture of the DCT computations during run-time using dynamic partial reconfiguration. The scalable architecture for DCT computations can compute different numbers of DCT coefficients in zig-zag scan order to adapt to different requirements, such as power consumption, hardware resources, and performance. We propose a configuration manager, implemented in the embedded processor, to adaptively control the reconfiguration of the scalable DCT architecture during run-time. In addition, we use the LZSS algorithm for compression of the partial bitstreams and on-chip BlockRAM as a cache to reduce the latency overhead of loading the partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce external memory accesses by 69% and can achieve a 400 MBytes/s reconfiguration rate. A prediction algorithm for zero quantized DCT (ZQDCT) coefficients is used to control the run-time reconfiguration of the proposed scalable architecture, and 12 different modes of DCT computation, including zonal coding, multi-block processing, and parallel-sequential stage modes, are supported to reduce power consumption, required hardware resources, and computation time with only a small quality degradation. Detailed trade-offs of power, throughput, and quality are investigated and used as criteria for self-reconfiguration to meet the requirements set by the users.
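Because the scalable DCT varies how many coefficients are produced in zig-zag scan order, low-frequency coefficients are always computed first and higher-frequency ones are dropped as the quality or power target shrinks. The sketch below generates the zig-zag order for an 8x8 block and truncates a coefficient matrix to the first N positions; the helper names are illustrative, not from the thesis.

```python
import numpy as np

def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]),
    )

def keep_first_coefficients(dct_block, num_kept):
    """Zero out every DCT coefficient past the first num_kept zig-zag positions."""
    out = np.zeros_like(dct_block)
    for r, c in zigzag_order(dct_block.shape[0])[:num_kept]:
        out[r, c] = dct_block[r, c]
    return out

block = np.arange(64, dtype=np.float64).reshape(8, 8)   # stand-in DCT coefficients
print(keep_first_coefficients(block, 10))               # zonal coding with 10 coefficients
```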
- Date Issued
- 2010
- Identifier
- CFE0003262, ucf:48522
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003262
- Title
- Synchronous Communication System for SAW Sensors Interrogation.
- Creator
-
Troshin, Maxim, Malocha, Donald, Jones, W, Gong, Xun, University of Central Florida
- Abstract / Description
-
During the past two decades a variety of SAW-based wireless sensors have been invented, and research is still in progress. As different frequencies, varied bandwidths, coding techniques, and constantly changing post-processing algorithms are being implemented, there is a constant need for a universal and adjustable synchronous communication system able to interrogate new generations of SAW sensors. This thesis presents the design of a multiple-FPGA-based communication system with an operational frequency range of 450 MHz to 2.2 GHz, capable of producing a user-programmed modulated signal. The synchronous receiver is designed with an interchangeable chip whose replacement allows adjustment of the receiver's bandwidth. Within this thesis, the performance of the system is evaluated only in a 20 MHz bandwidth region centered at 915 MHz. An OFC temperature sensor was interrogated. Post-processing algorithms, measurement results, and proposals for future use of the system are presented, along with a detailed overview of the structure and performance of every functional block and the associated design considerations. Previously designed Matlab-based software was adapted for post-processing of the received signal, and new software with a simplified GUI was designed for programming the desired signal.
- Date Issued
- 2012
- Identifier
- CFE0004270, ucf:49543
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004270
- Title
- DIGITAL PULSE WIDTH MODULATOR TECHNIQUES FOR DC - DC CONVERTERS.
- Creator
-
Batarseh, Majd, Batarseh, Issa, University of Central Florida
- Abstract / Description
-
Recent research activities have focused on improving the steady-state as well as the dynamic behavior of DC-DC converters for proper system performance, by proposing different design methods and control approaches, with a growing tendency toward digital implementation over analog practices. Because of the rapid advancement in the semiconductor and microprocessor industries, digital control has grown in popularity among PWM converters and is taking over from analog techniques, owing to the availability of fast microprocessors, flexibility, and immunity to noise and environmental variations. Furthermore, increased interest in Field Programmable Gate Arrays (FPGAs) makes them a convenient design platform for digitally controlled converters. The objective of this research is to propose new digital control schemes aiming to improve the steady-state and transient responses of high-switching-frequency, FPGA-based digitally controlled DC-DC converters. The target is to achieve enhanced performance in terms of tight regulation with minimum power consumption and high efficiency at steady state, as well as shorter settling time with optimal over- and undershoot during transients. The main task is to develop new and innovative digital PWM (DPWM) techniques in order to achieve: 1. Tight regulation at steady state, by proposing a high-resolution DPWM architecture based on the Digital Clock Management (DCM) resources available on FPGA boards. The proposed architecture, the Window-Masked Segmented Digital Clock Manager FPGA-based Digital Pulse Width Modulator technique, is designed to achieve high resolution while operating at high switching frequencies with minimum power consumption. 2. Enhanced dynamic response, by applying a shift to the basic saw-tooth DPWM signal in order to benefit from the best linearity and simplest architecture offered by the conventional counter-comparator DPWM; this proposed control scheme helps the compensator reach the steady-state value faster. The Dynamically Shifted Ramp digital control technique for improved transient response in DC-DC converters is projected to enhance the transient response by dynamically controlling the ramp signal of the DPWM unit.
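The resolution problem that motivates the DCM-based architecture is easy to quantify: a plain counter-comparator DPWM clocked at f_clk and switching at f_sw provides only f_clk / f_sw distinct duty-cycle steps, so raising the switching frequency costs bits of resolution unless extra phase-shifted edges (such as those a segmented DCM can supply) subdivide each counter tick. The arithmetic is sketched below; the numbers are illustrative, not taken from the thesis.

```python
import math

def dpwm_resolution_bits(f_clk_hz, f_sw_hz, phase_steps=1):
    """Effective DPWM resolution in bits.

    f_clk_hz    -- system clock driving the counter
    f_sw_hz     -- converter switching frequency
    phase_steps -- extra subdivisions per clock period, e.g. phase-shifted
                   clock edges (1 = plain counter-comparator DPWM)
    """
    levels = (f_clk_hz / f_sw_hz) * phase_steps
    return math.log2(levels)

# Plain counter-comparator: 100 MHz clock, 1 MHz switching -> about 6.6 bits
print(round(dpwm_resolution_bits(100e6, 1e6), 2))
# Eight phase-shifted edges per clock period recover three more bits
print(round(dpwm_resolution_bits(100e6, 1e6, phase_steps=8), 2))
```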
- Date Issued
- 2010
- Identifier
- CFE0003055, ucf:48314
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003055
- Title
- Probabilistic-Based Computing Transformation with Reconfigurable Logic Fabrics.
- Creator
-
Alawad, Mohammed, Lin, Mingjie, DeMara, Ronald, Mikhael, Wasfy, Wang, Jun, Das, Tuhin, University of Central Florida
- Abstract / Description
-
Effectively tackling the upcoming "zettabytes" data explosion requires a huge quantum leap in our computing power and energy efficiency. However, with Moore's law dwindling quickly, the physical limits of CMOS technology make it almost intractable to achieve high energy efficiency if the traditional "deterministic and precise" computing model still dominates. Worse, the upcoming data explosion mostly comprises statistics gleaned from an uncertain, imperfect real-world environment. As such, the traditional computing means of first-principle modeling or explicit statistical modeling will very likely be ineffective in achieving flexibility, autonomy, and human interaction. The bottom line is clear: given where we are headed, the fundamental principle of modern computing, namely that deterministic logic circuits can flawlessly emulate propositional logic deduction governed by Boolean algebra, has to be reexamined, and transformative changes in the foundation of modern computing must be made. This dissertation presents a novel stochastic-based computing methodology. It efficiently realizes algorithmic computing through the proposed concept of Probabilistic Domain Transform (PDT). The essence of the PDT approach is to encode the input signal as a probability density function, perform stochastic computing operations on the signal in the probabilistic domain, and decode the output signal by estimating the probability density function of the resulting random samples. The proposed methodology possesses many notable advantages. Specifically, it uses much simplified circuit units to conduct complex operations, which leads to highly area- and energy-efficient designs suitable for parallel processing. Moreover, it is highly fault-tolerant because the information to be processed is encoded with a large ensemble of random samples. As such, local perturbations of its computing accuracy are dissipated globally and become inconsequential to the final overall results. Finally, the proposed probabilistic-based computing can facilitate building scalable-precision systems, which provides an elegant way to trade off computing accuracy against computing performance and hardware efficiency for many real-world applications. To validate the effectiveness of the proposed PDT methodology, two important signal-processing applications, discrete convolution and 2-D FIR filtering, are first implemented and benchmarked against other deterministic-based circuit implementations. Furthermore, a large-scale Convolutional Neural Network (CNN), a fundamental algorithmic building block in many computer vision and artificial intelligence applications that follow the deep learning principle, is also implemented on an FPGA based on a novel stochastic-based and scalable hardware architecture and circuit design. The key idea is to implement all key components of a deep-learning CNN, including multi-dimensional convolution, activation, and pooling layers, completely in the probabilistic computing domain. The proposed architecture not only achieves the advantages of stochastic-based computation, but can also solve several challenges in conventional CNNs, such as complexity, parallelism, and memory storage. Overall, being highly scalable and energy efficient, the proposed PDT-based architecture is well suited for a modular vision engine with the goal of performing real-time detection, recognition, and segmentation of mega-pixel images, especially those perception-based computing tasks that are inherently fault-tolerant.
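A classic illustration of why stochastic encodings get by with such simple circuit units: once two values in [0, 1] are encoded as independent Bernoulli bit-streams, their product is computed by a single AND gate, with accuracy set by the stream length. The sketch below is this standard stochastic-computing textbook example, shown only to illustrate the kind of probabilistic encoding that PDT builds on, not the dissertation's own circuits.

```python
import random

def encode(p, length, rng):
    """Stochastic encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(stream):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)
length = 10_000
a, b = 0.6, 0.7
stream_a = encode(a, length, rng)
stream_b = encode(b, length, rng)

# A single AND gate per bit position multiplies the two encoded values
product_stream = [x & y for x, y in zip(stream_a, stream_b)]
print(decode(product_stream))   # close to 0.42; the error shrinks as the stream grows
```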
- Date Issued
- 2016
- Identifier
- CFE0006828, ucf:51768
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006828
- Title
- Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures.
- Creator
-
Mohammadizand, Ramtin, DeMara, Ronald, Lin, Mingjie, Sundaram, Kalpathy, Fan, Deliang, Wu, Annie, University of Central Florida
- Abstract / Description
-
A heterogeneous device-technology reconfigurable logic fabric is proposed which leverages the cooperating advantages of distinct magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA), using magnetic tunnel junction (MTJ) devices within this topology, demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSC-FPGA, starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, the HSC-FPGA enables new advantages of technology co-design, which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. The HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime-adaptive hardware. An orthogonal dimension of fabric heterogeneity is non-determinism, enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, the probabilistic spin logic p-bit device is considered as a fabric element comprising a crossbar-structured weighted array. The programmability of the resistive network interconnecting the p-bit devices can be achieved by modifying the resistive states of the array's weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks, permitting fluid mappings between probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n x m1 x m2 x ... x mi weighted array as a configurable hardware circuit with an n-input layer followed by i-1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate.
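The p-bit element referenced above is commonly modeled behaviorally as a binary stochastic neuron: its output flips between +1 and -1 with a probability steered by the weighted input it receives from the crossbar. The sketch below uses the sigmoidal (tanh) update rule widely cited in the probabilistic spin logic literature; the weights, layer sizes, and noise model are assumptions for illustration, not parameters from the dissertation.

```python
import numpy as np

def pbit_layer_update(weights, bias, inputs, rng):
    """One stochastic update of a layer of p-bits.

    weights -- (outputs x inputs) array emulating the crossbar conductances
    bias    -- per-p-bit bias term
    inputs  -- +/-1 states feeding the layer
    Each output is +1 or -1 with probability set by tanh of its drive.
    """
    drive = weights @ inputs + bias
    noise = rng.uniform(-1.0, 1.0, size=drive.shape)
    return np.sign(noise + np.tanh(drive)).astype(np.int8)

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
w = rng.normal(0.0, 0.5, size=(n_out, n_in))     # programmable weighted array
b = np.zeros(n_out)
x = rng.choice([-1, 1], size=n_in).astype(np.int8)
print(pbit_layer_update(w, b, x, rng))            # stochastic +1/-1 outputs
```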
- Date Issued
- 2019
- Identifier
- CFE0007502, ucf:52643
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007502
- Title
- AN ADAPTIVE MODULAR REDUNDANCY TECHNIQUE TO SELF-REGULATE AVAILABILITY, AREA, AND ENERGY CONSUMPTION IN MISSION-CRITICAL APPLICATIONS.
- Creator
-
Al-Haddad, Rawad, DeMara, Ronald, University of Central Florida
- Abstract / Description
-
As reconfigurable devices' capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used in dynamically-changing environments that require a high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques to leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge-detection application. This prototype is shown to provide sustainability under stressful transient and permanent fault-injection procedures while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown to be capable of consistently repairing hard faults while maintaining correct edge-detector outputs by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous-Time Markov Chain (CTMC) simulation is conducted to compare SMART's availability to industry-standard Triple Modular Redundancy (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publicly available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard faults. Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to "five nines" (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty-eight standard TMR benchmarks generated by the widely accepted BL-TMR tools. Results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29% and area savings ranging from 17% to 24%, while still maintaining the same level of availability.
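The availability figures quoted above come from a Markov model parameterized by fault and repair rates. For a single repairable module, steady-state availability reduces to the familiar ratio of repair rate to the sum of failure and repair rates, which makes the benefit of fast repair visible directly. Below is a toy two-state calculation plus a small Monte Carlo check; the rates are made up for illustration and are not the dissertation's use-case parameters.

```python
import random

def steady_state_availability(failure_rate, repair_rate):
    """Two-state Markov model: availability = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)

def simulate_availability(failure_rate, repair_rate, horizon, rng):
    """Monte Carlo estimate: alternate exponential up-times and repair times."""
    t, up_time = 0.0, 0.0
    while t < horizon:
        ttf = rng.expovariate(failure_rate)        # time to the next fault
        up_time += min(ttf, horizon - t)
        t += ttf
        if t >= horizon:
            break
        t += rng.expovariate(repair_rate)          # repair (down) interval
    return up_time / horizon

rng = random.Random(1)
slow = steady_state_availability(failure_rate=1e-3, repair_rate=1e-2)   # lengthy repair
fast = steady_state_availability(failure_rate=1e-3, repair_rate=1.0)    # fast repair
print(round(slow, 4), round(fast, 6))             # fast repair pushes availability toward 1
print(round(simulate_availability(1e-3, 1.0, 1e6, rng), 6))
```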
- Date Issued
- 2011
- Identifier
- CFE0003993, ucf:48660
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003993
- Title
- Adaptive Architectural Strategies for Resilient Energy-Aware Computing.
- Creator
-
Ashraf, Rizwan, DeMara, Ronald, Lin, Mingjie, Wang, Jun, Jha, Sumit, Johnson, Mark, University of Central Florida
- Abstract / Description
-
Reconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW utilizes genetic algorithms to realize logic circuits at runtime, as directed by the objective function. However, the size of problems solved using EHW, as compared with traditional approaches, has been limited to relatively compact circuits, due to the increase in complexity of the genetic algorithm with increasing circuit size. To address this research challenge of scalability, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly permanent fault mitigation in FPGA circuits. NDER has been shown to achieve refurbishment of relatively large benchmark circuits as compared to related works. Additionally, Design Diversity (DD) techniques which aid such evolutionary refurbishment are proposed, and the efficacy of various DD techniques is quantified and evaluated. Similarly, there exists a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs, for ensuring operational reliability in the presence of Process, Voltage, and Temperature (PVT) and transistor-aging variations owing to decreased feature sizes of electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. At design time, vulnerable portions of the circuit identified using conventional Electronic Design Automation tools are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate suitable datapaths based on current conditions that optimize performance and energy consumption. Primarily, SREL is able to mitigate the timing degradation caused by transistor-aging effects in sub-micron devices by reducing the stress induced on active elements through power-gating. As a result, fewer guardbands need to be included to achieve comparable performance levels, which leads to considerable energy savings over the operational lifetime. The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing, as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of the exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS under a power budget of 20 MW. However, this comes at the cost of increased reliability concerns, such as an increase in performance variations and soft errors. This has given rise to increased resiliency requirements for HPC applications in terms of ensuring functionality within given error thresholds while operating at lower voltages. My dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application-state corruption as a result of fault injection. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime. Additionally, the tradeoffs between performance and vulnerability and the causal relations between compiler optimization and application vulnerability are investigated.
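The fault-injection campaigns described above amount to flipping a chosen bit of application state at some point in execution and then measuring how far the corruption spreads. A minimal sketch of that mechanism on a NumPy array standing in for instrumented application memory follows; the function names, the restriction to high-order bits (so the demo upset stays visible), and the corruption metric are illustrative assumptions, not the dissertation's tooling.

```python
import numpy as np

def inject_bit_flip(state, rng, low_bit=48):
    """Flip one random high-order bit of a float64 array (an emulated upset).

    Restricting the draw to bits >= low_bit keeps the demo perturbation large;
    a full campaign would draw from all 64 bit positions.
    """
    flat = state.view(np.uint64).ravel()          # view shares memory with state
    word = int(rng.integers(flat.size))
    bit = int(rng.integers(low_bit, 64))
    flat[word] ^= np.uint64(1) << np.uint64(bit)
    return word, bit

def corrupted_fraction(reference, observed, tol=1e-12):
    """Fraction of memory locations that no longer match the golden run."""
    return float(np.mean(~np.isclose(reference, observed, rtol=0.0, atol=tol)))

rng = np.random.default_rng(7)
state = np.linspace(0.0, 1.0, 1024)
golden = state.copy()

inject_bit_flip(state, rng)
for _ in range(10):                               # stand-in for continued computation
    state = np.convolve(state, [0.25, 0.5, 0.25], mode="same")
    golden = np.convolve(golden, [0.25, 0.5, 0.25], mode="same")
print(corrupted_fraction(golden, state))          # typically grows past 1/1024 as the stencil spreads the upset
```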
- Date Issued
- 2015
- Identifier
- CFE0006206, ucf:52889
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006206