Current Search: computer architecture
- Title
- ARCHITECTURAL SUPPORT FOR IMPROVING SYSTEM HARDWARE/SOFTWARE RELIABILITY.
- Creator
- Dimitrov, Martin, Zhou, Huiyang, University of Central Florida
- Abstract / Description
- It is a great challenge to build reliable computer systems with unreliable hardware and buggy software. On one hand, software bugs account for as much as 40% of system failures and incur a high cost on the US economy, estimated at $59.5B a year. On the other hand, under the current trends of technology scaling, transient faults (also known as soft errors) in the underlying hardware are predicted to grow at least in proportion to the number of devices being integrated, which further exacerbates the problem of system reliability. We propose several methods to improve system reliability, both by detecting and correcting soft errors and by facilitating software debugging. In our first approach, we detect instruction-level anomalies during program execution. The anomalies can be used to detect and repair soft errors, or can be reported to the programmer to aid software debugging. In our second approach, we improve anomaly detection for software debugging by detecting different types of anomalies as well as by removing false positives. While the anomalies reported by our first two methods are helpful in debugging single-threaded programs, they do not address concurrency bugs in multi-threaded programs. In our third approach, we propose a new debugging primitive that exposes the non-deterministic behavior of parallel programs and facilitates the debugging process. Our idea is to generate a time-ordered trace of events such as function calls/returns and memory accesses in different threads. In our experience, exposing the time-ordered event information to the programmer is highly beneficial for reasoning about the root causes of concurrency bugs.
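The third approach hinges on merging per-thread event logs into one global, time-ordered trace. A minimal sketch of that idea follows, assuming each thread already records (timestamp, kind, detail) events; the log format and event names here are illustrative, not taken from the dissertation.

```python
import heapq

# Hypothetical per-thread event logs: lists of (timestamp, event_kind, detail),
# each already sorted by timestamp. The debugging primitive described above
# time-orders such events across threads; this sketch merely merges the logs.
thread_logs = {
    0: [(3, "call", "lock()"), (9, "store", "0x7f2c")],
    1: [(1, "call", "lock()"), (5, "load", "0x7f2c")],
}

def time_ordered_trace(logs):
    """Merge per-thread event logs into a single timestamp-ordered trace."""
    merged = heapq.merge(
        *(((t, tid, kind, detail) for (t, kind, detail) in events)
          for tid, events in logs.items())
    )
    return list(merged)

for t, tid, kind, detail in time_ordered_trace(thread_logs):
    print(f"t={t} thread={tid} {kind} {detail}")
```

Reading the merged trace, a programmer can see that thread 1 acquired the lock before thread 0 and loaded the shared address before thread 0's store, the kind of interleaving evidence that helps localize a concurrency bug.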
- Date Issued
- 2010
- Identifier
- CFE0002975, ucf:47941
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002975
- Title
- IMPROVING BRANCH PREDICTION ACCURACY VIA EFFECTIVE SOURCE INFORMATION AND PREDICTION ALGORITHMS.
- Creator
- GAO, HONGLIANG, ZHOU, HUIYANG, University of Central Florida
- Abstract / Description
- Modern superscalar processors rely on branch predictors to sustain a high instruction fetch throughput. Given the trend of deep pipelines and large instruction windows, a branch misprediction incurs a large performance penalty and results in a significant amount of energy wasted by the instructions along wrong paths. Given their critical role in high-performance processors, branch predictors have been the subject of extensive research aimed at improving prediction accuracy. Conceptually, a dynamic branch prediction scheme includes three major components: a source, an information processor, and a predictor. Traditional work mainly focuses on the algorithm for the predictor. In this dissertation, besides novel prediction algorithms, we investigate the other components and develop unconventional ways to improve prediction accuracy. First, we propose an adaptive information processing method to dynamically extract the most effective inputs to maximize the correlation to be exploited by the predictor. Second, we propose a new prediction algorithm, which improves the Prediction by Partial Matching (PPM) algorithm by selectively combining multiple partial matches. The PPM algorithm was previously considered optimal and has been used to derive the upper limit of branch prediction accuracy. Our proposed algorithm achieves higher prediction accuracy than PPM and can be implemented within a realistic hardware budget. Third, we discover a new locality existing between the addresses of producer loads and the outcomes of their consumer branches. We study this address-branch correlation in detail and propose a branch predictor that exploits it for long-latency, hard-to-predict branches that existing branch predictors fail to predict accurately.
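For orientation, here is a toy PPM-style predictor, a sketch of the baseline the dissertation improves upon rather than its proposed design: it keeps outcome counts for several history lengths and predicts with the longest matching history, falling back to shorter ones. The dissertation's contribution is to selectively combine several such partial matches instead of trusting only the longest.

```python
from collections import defaultdict

class PPMPredictor:
    """Toy Prediction-by-Partial-Matching branch predictor (illustrative only)."""

    def __init__(self, max_hist=8):
        self.max_hist = max_hist
        # tables[n] maps (pc, last n history bits) -> [not-taken count, taken count]
        self.tables = [defaultdict(lambda: [0, 0]) for _ in range(max_hist + 1)]
        self.history = ""  # global branch history as a string, '1' = taken

    def predict(self, pc):
        # Longest match first; fall back to shorter histories on a miss.
        for n in range(min(self.max_hist, len(self.history)), -1, -1):
            key = (pc, self.history[len(self.history) - n:])
            if key in self.tables[n]:
                nt, tk = self.tables[n][key]
                return tk >= nt
        return True  # cold default: predict taken

    def update(self, pc, taken):
        for n in range(min(self.max_hist, len(self.history)) + 1):
            key = (pc, self.history[len(self.history) - n:])
            self.tables[n][key][int(taken)] += 1
        self.history = (self.history + ("1" if taken else "0"))[-self.max_hist:]

p = PPMPredictor()
for outcome in [1, 0, 1, 0, 1, 0, 1, 0]:   # a strongly alternating branch
    print("predict:", p.predict(pc=0x400), "actual:", bool(outcome))
    p.update(pc=0x400, taken=bool(outcome))
```

After a few updates the length-1 history table locks onto the alternating pattern, which is exactly the longest-match behavior that a selective combination of partial matches can outperform on noisier branches.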
- Date Issued
- 2008
- Identifier
- CFE0002283, ucf:47877
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002283
- Title
- Harmony Oriented Architecture.
- Creator
- Martin, Kyle, Hua, Kien, Wu, Annie, Heinrich, Mark, University of Central Florida
- Abstract / Description
- This thesis presents Harmony Oriented Architecture (HOA): a novel architectural paradigm that applies the principles of Harmony Oriented Programming (HOP) to the architecture of scalable and evolvable distributed systems. It is motivated by research on Ultra Large Scale systems, which has revealed inherent limitations in the human ability to design large-scale software systems, limitations that can only be overcome through radical alternatives to traditional object-oriented software engineering practice that simplify the construction of highly scalable and evolvable systems. HOP eschews encapsulation and information hiding, the core principles of object-oriented design, in favor of exposure and information sharing through a spatial abstraction. This helps to avoid the brittle interface dependencies that impede the evolution of object-oriented software. HOA extends these concepts to distributed systems, resulting in an architecture in which application components are represented by objects in a spatial database and executed in strict isolation using an embedded application server. Application components store their state entirely in the database and interact solely by diffusing data into a space for proximate components to observe. This architecture provides a high degree of decoupling, isolation, and state exposure, allowing highly scalable and evolvable applications to be built. A proof-of-concept prototype of a non-distributed HOA middleware platform supporting JavaScript application components is implemented and evaluated. Results show remarkably good performance considering that little effort was made to optimize the implementation.
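The diffuse-and-observe interaction style is the architectural core. The toy sketch below, with invented names and a plain Euclidean proximity test, illustrates components sharing state through a space rather than through interfaces; it is not the thesis's middleware.

```python
import math

class Space:
    """Toy 'harmony oriented' space (illustrative only): components sit at
    coordinates, diffuse values into the space, and observe whatever nearby
    components have diffused -- with no direct references between them."""

    def __init__(self, radius=2.0):
        self.radius = radius
        self.data = []  # (x, y, key, value) tuples diffused by components

    def diffuse(self, x, y, key, value):
        self.data.append((x, y, key, value))

    def observe(self, x, y, key):
        """Return values of `key` diffused within `radius` of (x, y)."""
        return [v for (dx, dy, k, v) in self.data
                if k == key and math.hypot(dx - x, dy - y) <= self.radius]

space = Space()
space.diffuse(0.0, 0.0, "temperature", 21.5)   # component A publishes state
space.diffuse(1.0, 1.0, "temperature", 22.0)   # component B publishes state
print(space.observe(0.5, 0.5, "temperature"))  # a proximate component sees both
```

Because neither component names the other, either can be replaced or evolved without breaking an interface contract, which is the decoupling the paradigm is after.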
- Date Issued
- 2011
- Identifier
- CFE0004480, ucf:49298
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004480
- Title
- A COMMON COMPONENT-BASED SOFTWARE ARCHITECTURE FOR MILITARY AND COMMERCIAL PC-BASED VIRTUAL SIMULATION.
- Creator
- Lewis, Joshua, Proctor, Michael, University of Central Florida
- Abstract / Description
- Commercially available military-themed virtual simulations have been developed and sold for entertainment since the beginning of the personal computing era. There exists an intense interest by various branches of the military to leverage the technological advances of the personal computing and video game industries to provide low-cost military training. Given the content of commercial military-themed virtual simulations, a large overlap has grown between the interests, resources, standards, and technology of the computer entertainment industry and military training branches. This research attempts to identify these commonalities and to systematically design and evaluate a common component-based software architecture that could be used to implement a framework for developing content for both commercial and military virtual simulation software applications.
- Date Issued
- 2006
- Identifier
- CFE0001177, ucf:46868
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001177
- Title
- Automated Synthesis of Memristor Crossbar Networks.
- Creator
- Chakraborty, Dwaipayan, Jha, Sumit Kumar, Leavens, Gary, Ewetz, Rickard, Valliyil Thankachan, Sharma, Xu, Mengyu, University of Central Florida
- Abstract / Description
- The advancement of semiconductor device technology over the past decades has enabled the design of increasingly complex electrical and computational machines. Electronic design automation (EDA) has played a significant role in the design and implementation of transistor-based machines. However, as transistors move closer toward their physical limits, the speed-up provided by Moore's law will grind to a halt. Once again, we find ourselves on the verge of a paradigm shift in the computational sciences as newer devices pave the way for novel approaches to computing. One such device is the memristor, a resistor with non-volatile memory. Memristors can be used as junctional switches in crossbar circuits, which comprise intersecting sets of vertical and horizontal nanowires. The major contribution of this dissertation lies in automating the design of such crossbar circuits, doing a new kind of EDA for a new kind of computational machinery. In general, this dissertation attempts to answer the following questions:
a. How can we synthesize crossbars for computing large Boolean formulas, up to 128-bit?
b. How can we synthesize more compact crossbars for small Boolean formulas, up to 8-bit?
c. For a given loop-free C program doing integer arithmetic, is it possible to synthesize an equivalent crossbar circuit?
We have presented novel solutions to each of the above problems. Our proposed solutions resolve a number of significant bottlenecks in existing research via innovative logic representations and artificial intelligence techniques. For large Boolean formulas (up to 128-bit), we have utilized Reduced Ordered Binary Decision Diagrams (ROBDDs) to automatically synthesize linearly growing crossbar circuits that compute them. This cutting-edge approach towards flow-based computing has yielded state-of-the-art results. It is worth noting that this approach is scalable to n-bit Boolean formulas. We have made significant original contributions by leveraging artificial intelligence for automatic synthesis of compact crossbar circuits. This inventive method has been expanded to encompass crossbar networks with 1D1M (1-diode-1-memristor) switches as well. The resultant circuits satisfy the tight constraints of the Feynman Grand Prize challenge and are able to perform 8-bit binary addition. A leading-edge development for end-to-end computation with flow-based crossbars has been implemented, which involves methodical translation of loop-free C programs into crossbar circuits via automated synthesis. The original contributions described in this dissertation reflect the substantial progress we have made in the area of electronic design automation for synthesis of memristor crossbar networks.
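In flow-based computing, a Boolean formula is evaluated by whether electrical current can flow between two nanowires through memristor junctions whose on/off states are set by the input literals. The sketch below captures that principle with a graph search; the crossbar layout and literal labels are our illustrative example, not circuits synthesized by the dissertation's tools.

```python
from collections import deque

def crossbar_evaluates_true(switch_literal, assignment, in_row, out_row, rows, cols):
    """Flow-based computing sketch: switch_literal[(r, c)] names the literal
    driving the memristor at row r / column c; a junction conducts iff its
    literal evaluates to 1. The crossbar outputs 'true' iff current can flow
    from in_row to out_row."""
    def closed(r, c):
        lit = switch_literal.get((r, c))
        if lit is None:
            return False  # no memristor at this junction
        val = assignment[lit.lstrip("~")]
        return (not val) if lit.startswith("~") else bool(val)

    # BFS over nanowires: a conducting junction joins its row and column wires.
    frontier, seen = deque([("row", in_row)]), {("row", in_row)}
    while frontier:
        kind, i = frontier.popleft()
        if (kind, i) == ("row", out_row):
            return True
        if kind == "row":
            nxt = [("col", c) for c in range(cols) if closed(i, c)]
        else:
            nxt = [("row", r) for r in range(rows) if closed(r, i)]
        for n in nxt:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return False

# a XOR b on a 3x2 crossbar: path row0->col0->row2 conducts when a and not b,
# and path row0->col1->row2 conducts when not a and b.
switches = {(0, 0): "a", (2, 0): "~b", (0, 1): "~a", (2, 1): "b"}
for a in (0, 1):
    for b in (0, 1):
        print(a, b, crossbar_evaluates_true(switches, {"a": a, "b": b}, 0, 2, 3, 2))
```

An ROBDD-based synthesis maps decision-diagram nodes to such junctions so that the crossbar size grows with the (often linear) BDD size rather than with the formula's truth table.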
- Date Issued
- 2019
- Identifier
- CFE0007609, ucf:52528
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007609
- Title
- RESOURCE-CONSTRAINT AND SCALABLE DATA DISTRIBUTION MANAGEMENT FOR HIGH LEVEL ARCHITECTURE.
- Creator
- Gupta, Pankaj, Guha, Ratan, University of Central Florida
- Abstract / Description
- In this dissertation, we present an efficient algorithm, called the P-Pruning algorithm, for the data distribution management problem in High Level Architecture. High Level Architecture (HLA) presents a framework for modeling and simulation within the Department of Defense (DoD) and forms the basis of the IEEE 1516 standard. The goal of this architecture is to interoperate multiple simulations and facilitate the reuse of simulation components. Data Distribution Management (DDM) is one of the six components in HLA that is responsible for limiting and controlling the data exchanged in a simulation and reducing the processing requirements of federates. DDM is also an important problem in the parallel and distributed computing domain, especially in large-scale distributed modeling and simulation applications, where control over data exchange among the simulated entities is required. We present a performance-evaluation simulation study of the P-Pruning algorithm against three techniques: region-matching, fixed-grid, and dynamic-grid DDM algorithms. The P-Pruning algorithm is faster than the region-matching, fixed-grid, and dynamic-grid DDM algorithms, as it avoids the quadratic computation step involved in the other algorithms. The simulation results show that the P-Pruning DDM algorithm uses memory at run-time more efficiently and requires fewer multicast groups than the three other algorithms. To increase the scalability of the P-Pruning algorithm, we develop a resource-efficient enhancement for it. We also present a performance evaluation study of this resource-efficient algorithm in a memory-constraint environment. The Memory-Constraint P-Pruning algorithm deploys I/O-efficient data structures for optimized memory access at run-time. The simulation results show that the Memory-Constraint P-Pruning DDM algorithm is faster than the P-Pruning algorithm and utilizes memory at run-time more efficiently. It is suitable for high-performance distributed simulation applications, as it improves the scalability of the P-Pruning algorithm by several orders of magnitude in the number of federates. We analyze the computational complexity of the P-Pruning algorithm using average-case analysis. We have also extended the P-Pruning algorithm to a three-dimensional routing space. In addition, we present the P-Pruning algorithm for dynamic conditions, where the distribution of federates changes at run-time. The dynamic P-Pruning algorithm investigates the changes among federate regions and rebuilds all the affected multicast groups. We have also integrated the P-Pruning algorithm with FDK, an implementation of the HLA architecture. The integration involves the design and implementation of the communicator module for mapping federate interest regions. We provide a modular overview of the P-Pruning algorithm components and describe the functional flow for creating multicast groups during simulation. We investigate the deficiencies in the DDM implementation under FDK and suggest an approach to overcome them using the P-Pruning algorithm. We have enhanced FDK from its existing HLA 1.3 specification by using the IEEE 1516 standard for the DDM implementation. We provide the system setup instructions and communication routines for running the integrated system on a network of machines. We also describe the implementation details involved in integrating the P-Pruning algorithm with FDK and report our experiences.
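To make the quadratic step concrete: region-matching DDM tests every update region against every subscription region for overlap. The sketch below shows that baseline and a sort-and-sweep variant in the spirit of pruning approaches; it is a one-dimensional illustration with invented regions, not the P-Pruning algorithm itself.

```python
def overlaps(a, b):
    """Overlap test for 1-D routing-space intervals (lo, hi)."""
    return a[0] <= b[1] and b[0] <= a[1]

# Brute-force region matching (the quadratic baseline the dissertation
# compares against): every update region vs. every subscription region.
def region_matching(updates, subs):
    return {u: [s for s in subs if overlaps(updates[u], subs[s])]
            for u in updates}

# Sort-and-sweep sketch: sweep interval endpoints once instead of testing
# all pairs, touching only regions that are currently active.
def sweep_matching(updates, subs):
    events = []  # (coordinate, is_end, kind, name)
    for name, (lo, hi) in updates.items():
        events += [(lo, 0, "u", name), (hi, 1, "u", name)]
    for name, (lo, hi) in subs.items():
        events += [(lo, 0, "s", name), (hi, 1, "s", name)]
    events.sort()
    active_u, active_s = set(), set()
    matches = {u: [] for u in updates}
    for _, is_end, kind, name in events:
        if is_end:
            (active_u if kind == "u" else active_s).discard(name)
        elif kind == "u":
            active_u.add(name)
            matches[name] += list(active_s)
        else:
            active_s.add(name)
            for u in active_u:
                matches[u].append(name)
    return matches

updates = {"tank": (0, 10), "jet": (20, 30)}
subs = {"radarA": (5, 25), "radarB": (28, 40)}
print(region_matching(updates, subs))
print(sweep_matching(updates, subs))
```

The brute-force version performs O(U x S) overlap tests, while the sweep sorts endpoints once and then does work proportional to the matches actually found, which is the kind of saving a pruning-based DDM algorithm targets.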
- Date Issued
- 2007
- Identifier
- CFE0001949, ucf:47447
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001949
- Title
- Normally-Off Computing Design Methodology Using Spintronics: from Devices to Architectures.
- Creator
- Roohi, Arman, DeMara, Ronald, Abdolvand, Reza, Wang, Jun, Fan, Deliang, Del Barco, Enrique, University of Central Florida
- Abstract / Description
- Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, it is proposed to innovate Elastic Intermittent Computation (EIC) as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries that leverage intrinsic non-volatility to realize middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads, which is vital to IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior with a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic which is adaptable at the gate level to support variable-duty-cycle execution that is robust to brief and extended supply outages or unscheduled dropouts, and the development of spin-based synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit utilizing pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations.
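Majority gates are the workhorse primitive here: an SHE-MTJ naturally computes a three-input majority. As a purely Boolean illustration (no device physics, and a standard majority-inverter construction rather than one taken from this dissertation), a full adder can be built from two majority gates plus inverters:

```python
def MAJ(a, b, c):
    """Three-input majority gate, the native primitive of MG-based spintronic
    logic (Boolean model only; the real gate is a non-volatile device)."""
    return int(a + b + c >= 2)

def full_adder(a, b, cin):
    """Majority/inverter full adder (a standard construction, for illustration):
    carry = MAJ(a, b, cin); sum = MAJ(~carry, MAJ(a, b, ~cin), cin)."""
    carry = MAJ(a, b, cin)
    s = MAJ(1 - carry, MAJ(a, b, 1 - cin), cin)
    return s, carry

# Exhaustive check of the truth table.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", full_adder(a, b, cin))
```

Because each majority gate's state is non-volatile, a datapath built this way can lose power mid-computation and resume where it stopped, which is the property EIC exploits to avoid checkpointing.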
- Date Issued
- 2019
- Identifier
- CFE0007526, ucf:52619
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007526
- Title
- Simulation, Analysis, and Optimization of Heterogeneous CPU-GPU Systems.
- Creator
- Giles, Christopher, Heinrich, Mark, Ewetz, Rickard, Lin, Mingjie, Pattanaik, Sumanta, Flitsiyan, Elena, University of Central Florida
- Abstract / Description
- With the computing industry's recent adoption of the Heterogeneous System Architecture (HSA) standard, we have seen a rapid change in heterogeneous CPU-GPU processor designs. State-of-the-art heterogeneous CPU-GPU processors tightly integrate multicore CPUs and multi-compute-unit GPUs on a single die. This brings the MIMD processing capabilities of the CPU and the SIMD processing capabilities of the GPU together into a single cohesive package, with new HSA features comprising better programmability, coherency between the CPU and GPU, a shared Last Level Cache (LLC), and shared virtual memory address spaces. These advancements can potentially bring marked gains in heterogeneous processor performance and have piqued the interest of researchers who wish to unlock these potential performance gains. Therefore, in this dissertation I explore the heterogeneous CPU-GPU processor and application design space with the goal of answering interesting research questions, such as: (1) what are the architectural design trade-offs in heterogeneous CPU-GPU processors, and (2) how do we best maximize heterogeneous CPU-GPU application performance on a given system? To enable my exploration of the heterogeneous CPU-GPU design space, I introduce a novel discrete event-driven simulation library called KnightSim and a novel computer architectural simulator called M2S-CGM. M2S-CGM includes all of the simulation elements necessary to simulate coherent execution between a CPU and GPU with a shared LLC and shared virtual memory address spaces. I then utilize M2S-CGM to conduct three architectural studies. First, I study the architectural effects of shared LLC and CPU-GPU coherence on the overall performance of non-collaborative GPU-only applications. Second, I profile and analyze a set of collaborative CPU-GPU applications to determine how to best optimize them for maximum collaborative performance. Third, I study the impact of four key architectural parameters on collaborative CPU-GPU performance by varying GPU compute unit coalesce size, GPU-to-memory-controller bandwidth, GPU frequency, and system-wide switching fabric latency.
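At the heart of any discrete event-driven simulator of this kind is a timestamp-ordered event queue. The sketch below shows that kernel in miniature; the class name and the CPU/GPU events are our own illustration and say nothing about KnightSim's actual API.

```python
import heapq

class EventSimulator:
    """Minimal discrete event-driven simulation kernel (illustrative sketch):
    events are (time, action) pairs executed in timestamp order, and actions
    may schedule further events."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = 0  # tie-breaker keeps equal-time events in FIFO order

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

sim = EventSimulator()

def cpu_request():
    print(f"[{sim.now}] CPU issues memory request")
    sim.schedule(100, lambda: print(f"[{sim.now}] memory responds"))

sim.schedule(0, cpu_request)
sim.schedule(50, lambda: print(f"[{sim.now}] GPU kernel launch"))
sim.run()
```

A full architectural simulator layers pipelined CPU, GPU, cache-coherence, and interconnect models on top of exactly this loop, with each model advancing by scheduling its next event.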
- Date Issued
- 2019
- Identifier
- CFE0007807, ucf:52346
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007807
- Title
- EPISODIC MEMORY MODEL FOR EMBODIED CONVERSATIONAL AGENTS.
- Creator
- Elvir, Miguel, Gonzalez, Avelino, University of Central Florida
- Abstract / Description
- Embodied Conversational Agents (ECAs) form part of a range of virtual characters whose intended purposes include engaging in natural conversations with human users. While the literature is rife with descriptions of attempts at producing viable ECA architectures, few authors have addressed the role of episodic memory models in conversational agents. This form of memory, which provides a sense of autobiographic record-keeping in humans, has only recently been peripherally integrated into dialog management tools for ECAs. In our work, we propose to take a closer look at the shared characteristics of episodic memory models in recent examples from the field. Additionally, we propose several enhancements to these existing models through a unified episodic memory model for ECAs. As part of our research into episodic memory models, we present a process for determining the prevalent contexts in the conversations obtained from the aforementioned interactions. The process presented demonstrates the use of statistical and machine learning services, as well as Natural Language Processing techniques, to extract relevant snippets from conversations. Finally, mechanisms to store, retrieve, and recall episodes from previous conversations are discussed. A primary contribution of this research is in the context of contemporary memory models for conversational agents and cognitive architectures. To the best of our knowledge, this is the first attempt at providing a comparative summary of existing works. As implementations of ECAs become more complex and encompass more realistic conversation engines, we expect that episodic memory models will continue to evolve and further enhance the naturalness of conversations.
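To ground the store/retrieve/recall cycle, here is a deliberately bare-bones episodic store that ranks past conversation snippets by word overlap with the current utterance. Real models, including the one proposed here, would use the statistical, machine learning, and NLP machinery the abstract mentions rather than this toy scorer.

```python
from collections import Counter

class EpisodicMemory:
    """Bare-bones episodic store for a conversational agent (illustrative
    sketch of store/retrieve/recall, not the thesis's model). Episodes are
    past conversation snippets; recall ranks them by word overlap."""

    def __init__(self):
        self.episodes = []  # (bag-of-words context, snippet)

    def store(self, snippet):
        self.episodes.append((Counter(snippet.lower().split()), snippet))

    def recall(self, utterance, k=1):
        query = Counter(utterance.lower().split())
        scored = sorted(self.episodes,
                        key=lambda ep: sum((ep[0] & query).values()),
                        reverse=True)
        return [snippet for _, snippet in scored[:k]]

memory = EpisodicMemory()
memory.store("user said their dog is named Rex")
memory.store("user prefers morning appointments")
print(memory.recall("do you remember my dog"))
```

Swapping the overlap scorer for a learned context model is exactly where the context-determination process the abstract describes would plug in.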
- Date Issued
- 2010
- Identifier
- CFE0003353, ucf:48443
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003353
- Title
- Adaptive Architectural Strategies for Resilient Energy-Aware Computing.
- Creator
- Ashraf, Rizwan, DeMara, Ronald, Lin, Mingjie, Wang, Jun, Jha, Sumit, Johnson, Mark, University of Central Florida
- Abstract / Description
- Reconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW utilizes genetic algorithms to realize logic circuits at runtime, as directed by the objective function. However, the size of problems solved using EHW, as compared with traditional approaches, has been limited to relatively compact circuits, because the complexity of the genetic algorithm increases with circuit size. To address this research challenge of scalability, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly permanent fault mitigation in FPGA circuits. NDER has been shown to achieve refurbishment of relatively large benchmark circuits compared to related works. Additionally, Design Diversity (DD) techniques, which are used to aid such evolutionary refurbishment, are proposed, and the efficacy of various DD techniques is quantified and evaluated. Similarly, there exists a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs, for ensuring operational reliability in the presence of Process, Voltage, and Temperature (PVT) and transistor-aging variations owing to decreased feature sizes for electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. At design time, vulnerable portions of the circuit identified using conventional Electronic Design Automation tools are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate suitable datapaths based on current conditions that optimize performance and energy consumption. Primarily, SREL is able to mitigate the timing degradation caused by transistor-aging effects in sub-micron devices by reducing the stress induced on active elements through power-gating. As a result, fewer guardbands need to be included to achieve comparable performance levels, which leads to considerable energy savings over the operational lifetime. The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing, as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of the exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS under a power budget of 20 MW. However, it comes at the cost of increased reliability concerns, such as an increase in performance variations and soft errors. This has given rise to increased resiliency requirements for HPC applications in terms of ensuring functionality within given error thresholds while operating at lower voltages. My dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application state corruption as a result of fault injection. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime. Additionally, the trade-offs between performance and vulnerability and the causal relations between compiler optimization and application vulnerability are investigated.
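A minimal sketch of the fault-injection methodology described above, under our own simplifying assumptions: the application state is a flat list of 32-bit words, and a single-event upset flips exactly one bit. The dissertation's actual campaign instruments real distributed applications via compiler transformations.

```python
import random

def inject_bit_flip(state):
    """Toy soft-error injector (our illustration, not the dissertation's tool):
    flip one random bit in one random word of the application state, mimicking
    a radiation-induced transient fault."""
    idx = random.randrange(len(state))
    bit = random.randrange(32)
    state[idx] ^= 1 << bit
    return idx, bit

def corrupted_fraction(golden, faulty):
    """Depth of state corruption: fraction of words differing from a
    fault-free 'golden' run, the quantity a fault propagation model
    would estimate at runtime."""
    return sum(g != f for g, f in zip(golden, faulty)) / len(golden)

random.seed(1)
golden = [i * i for i in range(64)]   # stand-in for application state
faulty = list(golden)
idx, bit = inject_bit_flip(faulty)
# A real campaign would now resume execution and watch the corruption spread.
print(f"flipped bit {bit} of word {idx}; corrupted "
      f"{corrupted_fraction(golden, faulty):.1%} of state")
```

Repeating this over many injection sites and comparing corrupted fractions over time is what yields the per-application fault propagation models the abstract mentions.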
- Date Issued
- 2015
- Identifier
- CFE0006206, ucf:52889
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006206