- Title
- VCLUSTER: A PORTABLE VIRTUAL COMPUTING LIBRARY FOR CLUSTER COMPUTING.
- Creator
-
Zhang, Hua, Guha, Ratan, University of Central Florida
- Abstract / Description
-
Message passing has been the dominant parallel programming model in cluster computing, and libraries like the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their utility and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with features such as an object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an attractive alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, using the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes mobile threads possible. Equipped with virtual migrating threads, it becomes feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usability with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI, while its performance is comparable to MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP. In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proved useful in process-based libraries.
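The abstract's central mechanism is a thread whose state can be checkpointed and resumed on another node. VCluster itself is a Java library and its real API is not shown here; the following is only a minimal Python sketch of the underlying idea, with `VirtualThread`, `checkpoint`, and `resume` as hypothetical names.

```python
# Minimal sketch (not VCluster's API): a "virtual thread" keeps its mutable
# state in a serializable object, runs in bounded slices, and can be paused,
# shipped to another worker, and resumed there.
import pickle
from dataclasses import dataclass, field

@dataclass
class ThreadState:
    """Everything the computation needs in order to resume elsewhere."""
    next_index: int = 0
    partial_sum: float = 0.0

@dataclass
class VirtualThread:
    state: ThreadState = field(default_factory=ThreadState)
    data: list = field(default_factory=list)

    def run_slice(self, max_steps: int) -> bool:
        """Run up to max_steps iterations; return True when finished."""
        end = min(self.state.next_index + max_steps, len(self.data))
        for i in range(self.state.next_index, end):
            self.state.partial_sum += self.data[i]
        self.state.next_index = end
        return end == len(self.data)

    def checkpoint(self) -> bytes:
        """Serialize the computation state so another process/JVM can resume it."""
        return pickle.dumps(self.state)

    @classmethod
    def resume(cls, blob: bytes, data: list) -> "VirtualThread":
        return cls(state=pickle.loads(blob), data=data)

if __name__ == "__main__":
    # Start on "node A", migrate halfway through, finish on "node B".
    vt = VirtualThread(data=list(range(1000)))
    vt.run_slice(500)
    blob = vt.checkpoint()              # would be sent over a channel in practice
    vt2 = VirtualThread.resume(blob, list(range(1000)))
    vt2.run_slice(500)
    print(vt2.state.partial_sum)        # 499500.0
```

A load balancer built on this pattern only needs to decide when to stop calling run_slice on one node and resume the checkpointed state on a less loaded one.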
- Date Issued
- 2008
- Identifier
- CFE0002339, ucf:47809
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002339
- Title
- IMPROVING PERFORMANCE AND PROGRAMMER PRODUCTIVITY FOR I/O-INTENSIVE HIGH PERFORMANCE COMPUTING APPLICATIONS.
- Creator
-
Sehrish, Saba, Wang, Jun, University of Central Florida
- Abstract / Description
-
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications are generating and processing large amounts of data ranging from terabytes (TB) to petabytes (PB). This growth in data raises the question of what an appropriate parallel programming framework is for efficiently processing large data sets. In this work, we study the applicability of two programming models (MPI/MPI-IO and MapReduce) to a variety of I/O-intensive HPC applications ranging from simulations to analytics. We identify several performance and programmer productivity limitations of these existing programming models when they are used for I/O-intensive applications, and we propose new frameworks that improve both performance and programmer productivity for emerging I/O-intensive applications. The Message Passing Interface (MPI) is widely used for writing HPC applications, and MPI/MPI-IO allows fine-grained control of data and task distribution. At the programming framework level, various optimizations have been proposed to improve the performance of MPI/MPI-IO function calls. These optimizations are exposed to programmers as function options; to write efficient code, programmers must know the exact usage of each optimization function, which limits programmer productivity. We propose an abstraction called Reduced Function Set Abstraction (RFSA) for MPI-IO to reduce the number of I/O functions and to automate the selection of the appropriate I/O function for writing HPC simulation applications. The purpose of RFSA is to hide the performance optimization functions from the application developer and to relieve the developer from deciding on a specific function. The proposed set of functions relies on a selection algorithm to decide among the most common optimizations provided by MPI-IO. Additionally, many application scientists are looking to integrate data-intensive computing into compute-intensive High Performance Computing facilities, particularly for data analytics. We have observed several scientific applications that must migrate their data from an HPC storage system to a data-intensive one. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized before existing data-intensive tools such as MapReduce can be effectively used to analyze it. This reorganization requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations: for every MapReduce application that must be run to complete the desired data analysis, a distributed read and write operation on the file system must be performed. Our contribution is to extend MapReduce to eliminate the multiple scans and to reduce the number of pre-processing MapReduce programs.

We have added additional expressiveness to the MapReduce language in our novel framework, called MapReduce with Access Patterns (MRAP), which allows users to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. We also provide a scheduling mechanism to further improve the performance of these applications. The main contributions of this thesis are: 1) We implement a selection algorithm for I/O functions like read/write, merge a set of functions for data types and file views, and optimize the atomicity function by automating the locking mechanism in RFSA. By running different parallel I/O benchmarks on both medium-scale clusters and NERSC supercomputers, we show improved programmer productivity (35.7% on average). This approach incurs an overhead of 2-5% for one particular optimization, and shows a performance improvement of 17% when a combination of different optimizations is required by an application. 2) We provide an augmented MapReduce system (MRAP), which consists of an API and corresponding optimizations, i.e., data restructuring and scheduling. We have demonstrated up to 33% throughput improvement in one real application (read-mapping in bioinformatics), and up to 70% in an I/O kernel of another application (halo catalog analytics). Our scheduling scheme shows a performance improvement of 18% for an I/O kernel of another application (QCD analytics).
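RFSA's core idea is to hide MPI-IO's many optimization variants behind a small function set and choose the variant automatically. The toy selector below illustrates that kind of mapping from a described access pattern to a standard MPI-IO routine; the decision rules, field names, and thresholds are illustrative guesses, not the selection algorithm from the thesis.

```python
# Toy sketch of the RFSA idea (not the thesis's actual selection algorithm):
# the application describes *what* it wants to write, and a selector maps that
# description to one of the standard MPI-IO write variants.
from dataclasses import dataclass

@dataclass
class WriteRequest:
    n_ranks: int          # processes participating in the write
    contiguous: bool      # is each rank's file region contiguous?
    bytes_per_rank: int   # payload size per rank

def select_mpiio_write(req: WriteRequest) -> str:
    """Return the name of a standard MPI-IO routine suited to the request."""
    if req.n_ranks == 1:
        return "MPI_File_write_at"        # single writer: independent I/O
    if not req.contiguous:
        # Strided accesses from many ranks benefit from collective buffering
        # (with the layout described via MPI_File_set_view).
        return "MPI_File_write_at_all"
    if req.bytes_per_rank < 64 * 1024:
        # Many small contiguous writes: stay collective so the library can aggregate.
        return "MPI_File_write_at_all"
    return "MPI_File_write_at"            # large contiguous blocks: independent I/O is fine

if __name__ == "__main__":
    print(select_mpiio_write(WriteRequest(n_ranks=256, contiguous=False, bytes_per_rank=4096)))
    print(select_mpiio_write(WriteRequest(n_ranks=1, contiguous=True, bytes_per_rank=1 << 20)))
```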
- Date Issued
- 2010
- Identifier
- CFE0003236, ucf:48560
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003236
- Title
- AN EXPLORATION OF CHRONIC PAIN EXPERIENCE, COPING, AND THE NEO FIVE FACTORS IN HIGH FUNCTIONING ADULTS.
- Creator
-
Stalter, Juliana, Mottarella, Karen, University of Central Florida
- Abstract / Description
-
Chronic pain affects nearly 48 million Americans (Haggard, Stowell, Bernstein, & Gatchel, 2008). Established guidelines for pain management encourage the use of personality assessment in chronic pain evaluation (Karlin, Creech, Grimes, Clark, Meagher, & Morey, 2005). In relation to the Big Five personality factors, low Openness relates negatively to treatment success (Hopwood, Creech, Clark, Meagher, & Morey, 2008), and elevated Neuroticism scores correlate with increased pain levels among individuals in hospital or rehabilitation settings (Ashgari & Nicholas, 2006; Nitch & Boon, 2004). In contrast to these prior studies, this study identifies correlates in a relatively high-functioning population (college students) to further elucidate the connection between chronic pain and personality. This study compares scores on the NEO-FFI (Costa & McCrae, 1992), the West Haven-Yale Multidimensional Pain Inventory (WHYMPI; Kerns, Turk, & Rudy, 1985), and the Pain Catastrophizing Scale (PCS; American Academy of Orthopaedic Manual Physical Therapists, 2010). Significant correlations were found between Neuroticism, Extraversion, and Agreeableness and select subscales of both the WHYMPI and the PCS. A linear regression showed that Neuroticism was very strongly related to WHYMPI scores; in fact, the WHYMPI scores accounted for 67.9% of the variance in Neuroticism. Scores on the WHYMPI also correlated with PCS scores, and Helplessness and Overall scores correlated significantly with Life Control and certain positive social support scores. The findings of this study emphasize the need for pain clinicians to incorporate psychological assessments, especially concerning Neuroticism, into their evaluations of chronic pain patients.
- Date Issued
- 2011
- Identifier
- CFH0004121, ucf:44863
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFH0004121
- Title
- Techniques for automated parameter estimation in computational models of probabilistic systems.
- Creator
-
Hussain, Faraz, Jha, Sumit, Leavens, Gary, Turgut, Damla, Uddin, Nizam, University of Central Florida
- Abstract / Description
-
The main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complex stochastic systems such that the resultant model meets user-specified behavioral specifications. These algorithms are designed to operate on probabilistic systems, i.e., systems that, in general, behave differently under identical conditions. The algorithms work using an approach that combines formal verification and mathematical optimization to explore a model's parameter space.

The problem of determining whether a model instantiated with a given set of parameter values satisfies the desired specification is first defined using formal verification terminology, and then reformulated in terms of statistical hypothesis testing. Parameter space exploration involves determining the outcome of the hypothesis testing query for each parameter point and is guided using simulated annealing. The first algorithm uses the sequential probability ratio test (SPRT) to solve the hypothesis testing problems, whereas the second algorithm uses an approach based on Bayesian statistical model checking (BSMC).

The SPRT-based parameter synthesis algorithm was used to validate that a given model of glucose-insulin metabolism has the capability of representing diabetic behavior, by synthesizing values of three parameters that ensure that the glucose-insulin subsystem spends at least 20 minutes in a diabetic scenario. The BSMC-based algorithm was used to discover the values of parameters in a physiological model of the acute inflammatory response that guarantee a set of desired clinical outcomes.

These two applications demonstrate how our algorithms use formal verification, statistical hypothesis testing, and mathematical optimization to automatically synthesize parameters of complex probabilistic models in order to meet user-specified behavioral properties.
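The SPRT-guided loop described above can be sketched concretely: repeatedly draw stochastic simulations at a candidate parameter value, let the sequential test decide whether the behavioral property holds with high enough probability, and use simulated annealing to propose new candidates. The toy model, thresholds, and proposal scheme below are placeholders, not the dissertation's glucose-insulin model or its actual settings.

```python
# Sketch of SPRT-guided parameter synthesis with simulated annealing.
import math
import random

def simulate(theta: float) -> bool:
    """Placeholder stochastic model: one trial of 'was the property satisfied?'.
    Here the satisfaction probability simply increases with theta."""
    return random.random() < min(max(theta, 0.0), 1.0)

def sprt_satisfies(theta, p0=0.75, p1=0.85, alpha=0.05, beta=0.05, max_samples=10_000):
    """Wald's sequential probability ratio test: does P(property | theta) exceed ~p1?"""
    accept_h1 = math.log((1 - beta) / alpha)
    accept_h0 = math.log(beta / (1 - alpha))
    llr = 0.0
    for _ in range(max_samples):
        llr += math.log(p1 / p0) if simulate(theta) else math.log((1 - p1) / (1 - p0))
        if llr >= accept_h1:
            return True
        if llr <= accept_h0:
            return False
    return False  # undecided within the sampling budget: treat as "not satisfied"

def synthesize(theta0=0.1, steps=200, temp0=1.0):
    """Simulated annealing over a single parameter; returns a satisfying value or None."""
    theta = theta0
    for i in range(steps):
        if sprt_satisfies(theta):
            return theta
        temp = temp0 * (1 - i / steps)
        candidate = min(max(theta + random.gauss(0, 0.1), 0.0), 1.0)
        # Score candidates by how often the property holds in a short batch of trials,
        # occasionally accepting worse candidates while the temperature is high.
        score = sum(simulate(theta) for _ in range(50))
        cand_score = sum(simulate(candidate) for _ in range(50))
        if cand_score >= score or random.random() < math.exp(-(score - cand_score) / max(temp, 1e-6)):
            theta = candidate
    return None

if __name__ == "__main__":
    print("synthesized parameter:", synthesize())
```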
- Date Issued
- 2016
- Identifier
- CFE0006117, ucf:51200
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006117
- Title
- Adaptive Architectural Strategies for Resilient Energy-Aware Computing.
- Creator
-
Ashraf, Rizwan, DeMara, Ronald, Lin, Mingjie, Wang, Jun, Jha, Sumit, Johnson, Mark, University of Central Florida
- Abstract / Description
-
Reconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW uses genetic algorithms to realize logic circuits at runtime, as directed by the objective function. However, the size of problems solved using EHW has been limited to relatively compact circuits compared with traditional approaches, because the complexity of the genetic algorithm increases with circuit size. To address this research challenge of scalability, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly permanent fault mitigation in FPGA circuits. NDER has been shown to achieve refurbishment of relatively large benchmark circuits compared to related works. Additionally, Design Diversity (DD) techniques, which aid such evolutionary refurbishment, are proposed, and the efficacy of various DD techniques is quantified and evaluated.

Similarly, there exists a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs to ensure operational reliability in the presence of Process, Voltage, and Temperature (PVT) and transistor-aging variations owing to decreased feature sizes for electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. At design time, vulnerable portions of the circuit identified using conventional Electronic Design Automation tools are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate suitable datapaths based on current conditions that optimize performance and energy consumption. Primarily, SREL is able to mitigate the timing degradation caused by transistor-aging effects in sub-micron devices by using power gating to reduce the stress induced on active elements. As a result, fewer guardbands need to be included to achieve comparable performance levels, which leads to considerable energy savings over the operational lifetime.

The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing, as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of the exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS within a power budget of 20 MW. However, this comes at the cost of increased reliability concerns, such as greater performance variations and soft errors, which raises the resiliency requirements for HPC applications: functionality must be ensured within given error thresholds while operating at lower voltages. My dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application state corruption as a result of fault injection. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime. Additionally, the tradeoffs between performance and vulnerability, and the causal relations between compiler optimization and application vulnerability, are investigated.
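The dissertation's fault-injection tooling works via compiler-level transformation and instrumentation of real HPC codes, which is not reproduced here. The snippet below is only a toy illustration of the measurement idea, flipping one bit of application state mid-run and counting how far the corruption spreads; the kernel and all names are hypothetical.

```python
# Toy illustration of transient-fault injection and corruption tracking.
import struct
import random

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a 64-bit float, mimicking a radiation-induced upset."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped

def run_kernel(state, steps, inject_at=None):
    """Simple 1-D averaging stencil; optionally inject a fault at (step, index, bit)."""
    for step in range(steps):
        if inject_at and inject_at[0] == step:
            _, idx, bit = inject_at
            state[idx] = flip_bit(state[idx], bit)
        state = [
            (state[max(i - 1, 0)] + state[i] + state[min(i + 1, len(state) - 1)]) / 3.0
            for i in range(len(state))
        ]
    return state

if __name__ == "__main__":
    n, steps = 64, 20
    golden = run_kernel([1.0] * n, steps)
    faulty = run_kernel([1.0] * n, steps, inject_at=(5, n // 2, random.randint(40, 62)))
    corrupted = sum(abs(a - b) > 1e-12 for a, b in zip(golden, faulty))
    print(f"{corrupted}/{n} locations corrupted after {steps} steps")
```

Comparing the faulty run against a golden run, as above, is one simple way to quantify the depth of state corruption; a fault propagation model would then fit how that count grows with time since injection.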
- Date Issued
- 2015
- Identifier
- CFE0006206, ucf:52889
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006206