Current Search: High-Performance Computing (x)
View All Items
- Title
- Research on High-performance and Scalable Data Access in Parallel Big Data Computing.
- Creator
-
Yin, Jiangling, Wang, Jun, Jin, Yier, Lin, Mingjie, Qi, GuoJun, Wang, Chung-Ching, University of Central Florida
- Abstract / Description
-
To facilitate big data processing, many dedicated data-intensive storage systems such as Google File System(GFS), Hadoop Distributed File System(HDFS) and Quantcast File System(QFS) have been developed. Currently, the Hadoop Distributed File System(HDFS) [20] is the state-of-art and most popular open-source distributed file system for big data processing. It is widely deployed as the bedrock for many big data processing systems/frameworks, such as the script-based pig system, MPI-based...
Show moreTo facilitate big data processing, many dedicated data-intensive storage systems such as Google File System(GFS), Hadoop Distributed File System(HDFS) and Quantcast File System(QFS) have been developed. Currently, the Hadoop Distributed File System(HDFS) [20] is the state-of-art and most popular open-source distributed file system for big data processing. It is widely deployed as the bedrock for many big data processing systems/frameworks, such as the script-based pig system, MPI-based parallel programs, graph processing systems and scala/java-based Spark frameworks. These systems/applications employ parallel processes/executors to speed up data processing within scale-out clusters.Job or task schedulers in parallel big data applications such as mpiBLAST and ParaView can maximize the usage of computing resources such as memory and CPU by tracking resource consumption/availability for task assignment. However, since these schedulers do not take the distributed I/O resources and global data distribution into consideration, the data requests from parallel processes/executors in big data processing will unfortunately be served in an imbalanced fashion on the distributed storage servers. These imbalanced access patterns among storage nodes are caused because a). unlike conventional parallel file system using striping policies to evenly distribute data among storage nodes, data-intensive file systems such as HDFS store each data unit, referred to as chunk or block file, with several copies based on a relative random policy, which can result in an uneven data distribution among storage nodes; b). based on the data retrieval policy in HDFS, the more data a storage node contains, the higher the probability that the storage node could be selected to serve the data. Therefore, on the nodes serving multiple chunk files, the data requests from different processes/executors will compete for shared resources such as hard disk head and network bandwidth. Because of this, the makespan of the entire program could be significantly prolonged and the overall I/O performance will degrade.The first part of my dissertation seeks to address aspects of these problems by creating an I/O middleware system and designing matching-based algorithms to optimize data access in parallel big data processing. To address the problem of remote data movement, we develop an I/O middleware system, called SLAM, which allows MPI-based analysis and visualization programs to benefit from locality read, i.e, each MPI process can access its required data from a local or nearby storage node. This can greatly improve the execution performance by reducing the amount of data movement over network. Furthermore, to address the problem of imbalanced data access, we propose a method called Opass, which models the data read requests that are issued by parallel applications to cluster nodes as a graph data structure where edges weights encode the demands of load capacity. We then employ matching-based algorithms to map processes to data to achieve data access in a balanced fashion. The final part of my dissertation focuses on optimizing sub-dataset analyses in parallel big data processing. Our proposed methods can benefit different analysis applications with various computational requirements and the experiments on different cluster testbeds show their applicability and scalability.
Show less - Date Issued
- 2015
- Identifier
- CFE0006021, ucf:51008
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006021
- Title
- Approximate Binary Decision Diagrams for High-Performance Computing.
- Creator
-
Sivakumar, Anagha, Jha, Sumit Kumar, Leavens, Gary, Valliyil Thankachan, Sharma, University of Central Florida
- Abstract / Description
-
Many soft applications such as machine learning and probabilistic computational modeling can benefit from approximate but high-performance implementations. In this thesis, we study how Binary decision diagrams (BDDs) can be used to synthesize approximate high-performance implementations from high-level specifications such as program kernels written in a C-like language. We demonstrate the potential of our approach by designing nanoscale crossbars from such approximate Boolean decision...
Show moreMany soft applications such as machine learning and probabilistic computational modeling can benefit from approximate but high-performance implementations. In this thesis, we study how Binary decision diagrams (BDDs) can be used to synthesize approximate high-performance implementations from high-level specifications such as program kernels written in a C-like language. We demonstrate the potential of our approach by designing nanoscale crossbars from such approximate Boolean decision diagrams. Our work may be useful in designing massively-parallel approximate crossbar computing systems for application-specific domains such as probabilistic computational modeling.
Show less - Date Issued
- 2018
- Identifier
- CFE0007414, ucf:52704
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007414
- Title
- IMPROVING PERFORMANCE AND PROGRAMMER PRODUCTIVITY FOR I/O-INTENSIVE HIGH PERFORMANCE COMPUTING APPLICATIONS.
- Creator
-
Sehrish, Saba, Wang, Jun, University of Central Florida
- Abstract / Description
-
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications are generating and processing large amount of data ranging from terabytes (TB) to petabytes (PB). This new trend of growth in data for HPC applications has imposed challenges as to what is an appropriate parallel programming framework to efficiently process large data sets. In this work, we study the applicability of two programming models ...
Show moreDue to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications are generating and processing large amount of data ranging from terabytes (TB) to petabytes (PB). This new trend of growth in data for HPC applications has imposed challenges as to what is an appropriate parallel programming framework to efficiently process large data sets. In this work, we study the applicability of two programming models (MPI/MPI-IO and MapReduce) to a variety of I/O-intensive HPC applications ranging from simulations to analytics. We identify several performance and programmer productivity related limitations of these existing programming models, if used for I/O-intensive applications. We propose new frameworks which will improve both performance and programmer productivity for the emerging I/O-intensive applications. Message Passing Interface (MPI) is widely used for writing HPC applications. MPI/MPI- IO allows a fine-grained control of assigning data and task distribution. At the programming frameworks level, various optimizations have been proposed to improve the performance of MPI/MPI-IO function calls. These performance optimizations are provided as various function options to the programmers. In order to write an efficient code, they are required to know the exact usage of the optimization functions, hence programmer productivity is limited. We propose an abstraction called Reduced Function Set Abstraction (RFSA) for MPI-IO to reduce the number of I/O functions and provide methods to automate the selection of appropriate I/O function for writing HPC simulation applications. The purpose of RFSA is to hide the performance optimization functions from the application developer, and relieve the application developer from deciding on a specific function. The proposed set of functions relies on a selection algorithm to decide among the most common optimizations provided by MPI-IO. Additionally, many application scientists are looking to integrate data-intensive computing into computational-intensive High Performance Computing facilities, particularly for data analytics. We have observed several scientific applications which must migrate their data from an HPC storage system to a data-intensive one. There is a gap between the data semantics of HPC storage and data-intensive system, hence, once migrated, the data must be further refined and reorganized. This reorganization must be performed before existing data-intensive tools such as MapReduce can be effectively used to analyze data. This reorganization requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application, in the form of excessive I/O operations. For every MapReduce application that must be run in order to complete the desired data analysis, a distributed read and write operation on the file system must be performed. Our contribution is to extend Map-Reduce to eliminate the multiple scans and also reduce the number of pre-processing MapReduce programs. We have added additional expressiveness to the MapReduce language in our novel framework called MapReduce with Access Patterns (MRAP), which allows users to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. We also provide a scheduling mechanism to further improve the performance of these applications. The main contributions of this thesis are, 1) We implement a selection algorithm for I/O functions like read/write, merge a set of functions for data types and file views and optimize the atomicity function by automating the locking mechanism in RFSA. By running different parallel I/O benchmarks on both medium-scale clusters and NERSC supercomputers, we show an improved programmer productivity (35.7% on average). This approach incurs an overhead of 2-5% for one particular optimization, and shows performance improvement of 17% when a combination of different optimizations is required by an application. 2) We provide an augmented Map-Reduce system (MRAP), which consist of an API and corresponding optimizations i.e. data restructuring and scheduling. We have demonstrated up to 33% throughput improvement in one real application (read-mapping in bioinformatics), and up to 70% in an I/O kernel of another application (halo catalogs analytics). Our scheduling scheme shows performance improvement of 18% for an I/O kernel of another application (QCD analytics).
Show less - Date Issued
- 2010
- Identifier
- CFE0003236, ucf:48560
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0003236
- Title
- HIGH PERFORMANCE DATA MINING TECHNIQUES FOR INTRUSION DETECTION.
- Creator
-
Siddiqui, Muazzam Ahmed, Lee, Joohan, University of Central Florida
- Abstract / Description
-
The rapid growth of computers transformed the way in which information and data was stored. With this new paradigm of data access, comes the threat of this information being exposed to unauthorized and unintended users. Many systems have been developed which scrutinize the data for a deviation from the normal behavior of a user or system, or search for a known signature within the data. These systems are termed as Intrusion Detection Systems (IDS). These systems employ different techniques...
Show moreThe rapid growth of computers transformed the way in which information and data was stored. With this new paradigm of data access, comes the threat of this information being exposed to unauthorized and unintended users. Many systems have been developed which scrutinize the data for a deviation from the normal behavior of a user or system, or search for a known signature within the data. These systems are termed as Intrusion Detection Systems (IDS). These systems employ different techniques varying from statistical methods to machine learning algorithms.Intrusion detection systems use audit data generated by operating systems, application softwares or network devices. These sources produce huge amount of datasets with tens of millions of records in them. To analyze this data, data mining is used which is a process to dig useful patterns from a large bulk of information. A major obstacle in the process is that the traditional data mining and learning algorithms are overwhelmed by the bulk volume and complexity of available data. This makes these algorithms impractical for time critical tasks like intrusion detection because of the large execution time.Our approach towards this issue makes use of high performance data mining techniques to expedite the process by exploiting the parallelism in the existing data mining algorithms and the underlying hardware. We will show that how high performance and parallel computing can be used to scale the data mining algorithms to handle large datasets, allowing the data mining component to search a much larger set of patterns and models than traditional computational platforms and algorithms would allow.We develop parallel data mining algorithms by parallelizing existing machine learning techniques using cluster computing. These algorithms include parallel backpropagation and parallel fuzzy ARTMAP neural networks. We evaluate the performances of the developed models in terms of speedup over traditional algorithms, prediction rate and false alarm rate. Our results showed that the traditional backpropagation and fuzzy ARTMAP algorithms can benefit from high performance computing techniques which make them well suited for time critical tasks like intrusion detection.
Show less - Date Issued
- 2004
- Identifier
- CFE0000056, ucf:46142
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0000056
- Title
- AN ARCHITECTURE FOR HIGH-PERFORMANCE PRIVACY-PRESERVING AND DISTRIBUTED DATA MINING.
- Creator
-
Secretan, James, Georgiopoulos, Michael, University of Central Florida
- Abstract / Description
-
This dissertation discusses the development of an architecture and associated techniques to support Privacy Preserving and Distributed Data Mining. The field of Distributed Data Mining (DDM) attempts to solve the challenges inherent in coordinating data mining tasks with databases that are geographically distributed, through the application of parallel algorithms and grid computing concepts. The closely related field of Privacy Preserving Data Mining (PPDM) adds the dimension of privacy to...
Show moreThis dissertation discusses the development of an architecture and associated techniques to support Privacy Preserving and Distributed Data Mining. The field of Distributed Data Mining (DDM) attempts to solve the challenges inherent in coordinating data mining tasks with databases that are geographically distributed, through the application of parallel algorithms and grid computing concepts. The closely related field of Privacy Preserving Data Mining (PPDM) adds the dimension of privacy to the problem, trying to find ways that organizations can collaborate to mine their databases collectively, while at the same time preserving the privacy of their records. Developing data mining algorithms for DDM and PPDM environments can be difficult and there is little software to support it. In addition, because these tasks can be computationally demanding, taking hours of even days to complete data mining tasks, organizations should be able to take advantage of high-performance and parallel computing to accelerate these tasks. Unfortunately there is no such framework that is able to provide all of these services easily for a developer. In this dissertation such a framework is developed to support the creation and execution of DDM and PPDM applications, called APHID (Architecture for Private, High-performance Integrated Data mining). The architecture allows users to flexibly and seamlessly integrate cluster and grid resources into their DDM and PPDM applications. The architecture is scalable, and is split into highly de-coupled services to ensure flexibility and extensibility. This dissertation first develops a comprehensive example algorithm, a privacy-preserving Probabilistic Neural Network (PNN), which serves a basis for analysis of the difficulties of DDM/PPDM development. The privacy-preserving PNN is the first such PNN in the literature, and provides not only a practical algorithm ready for use in privacy-preserving applications, but also a template for other data intensive algorithms, and a starting point for analyzing APHID's architectural needs. After analyzing the difficulties in the PNN algorithm's development, as well as the shortcomings of researched systems, this dissertation presents the first concrete programming model joining high performance computing resources with a privacy preserving data mining process. Unlike many of the existing PPDM development models, the platform of services is language independent, allowing layers and algorithms to be implemented in popular languages (Java, C++, Python, etc.). An implementation of a PPDM algorithm is developed in Java utilizing the new framework. Performance results are presented, showing that APHID can enable highly simplified PPDM development while speeding up resource intensive parts of the algorithm.
Show less - Date Issued
- 2009
- Identifier
- CFE0002853, ucf:48076
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002853
- Title
- Network Partitioning in Distributed Agent-Based Models.
- Creator
-
Petkova, Antoniya, Deo, Narsingh, Hughes, Charles, Bassiouni, Mostafa, Shaykhian, Gholam, University of Central Florida
- Abstract / Description
-
Agent-Based Models (ABMs) are an emerging simulation paradigm for modeling complex systems, comprised of autonomous, possibly heterogeneous, interacting agents. The utility of ABMs lies in their ability to represent such complex systems as self-organizing networks of agents. Modeling and understanding the behavior of complex systems usually occurs at large and representative scales, and often obtaining and visualizing of simulation results in real-time is critical.The real-time requirement...
Show moreAgent-Based Models (ABMs) are an emerging simulation paradigm for modeling complex systems, comprised of autonomous, possibly heterogeneous, interacting agents. The utility of ABMs lies in their ability to represent such complex systems as self-organizing networks of agents. Modeling and understanding the behavior of complex systems usually occurs at large and representative scales, and often obtaining and visualizing of simulation results in real-time is critical.The real-time requirement necessitates the use of in-memory computing, as it is dif?cult and challenging to handle the latency and unpredictability of disk accesses. Combining this observation with the scale requirement emphasizes the need to use parallel and distributed computing platforms, such as MPI-enabled CPU clusters. Consequently, the agent population must be "partitioned" across different CPUs in a cluster. Further, the typically high volume of interactions among agents can quickly become a signi?cant bottleneck for real-time or large-scale simulations. The problem is exacerbated if the underlying ABM network is dynamic and the inter-process communication evolves over the course of the simulation. Therefore, it is critical to develop topology-aware partitioning mechanisms to support such large simulations.In this dissertation, we demonstrate that distributed agent-based model simulations bene?t from the use of graph partitioning algorithms that involve a local, neighborhood-based perspective. Such methods do not rely on global accesses to the network and thus are more scalable. In addition, we propose two partitioning schemes that consider the bottom-up individual-centric nature of agent-based modeling. The ?rst technique utilizes label-propagation community detection to partition the dynamic agent network of an ABM. We propose a latency-hiding, seamless integration of community detection in the dynamics of a distributed ABM. To achieve this integration, we exploit the similarity in the process flow patterns of a label-propagation community-detection algorithm and self-organizing ABMs.In the second partitioning scheme, we apply a combination of the Guided Local Search (GLS) and Fast Local Search (FLS) metaheuristics in the context of graph partitioning. The main driving principle of GLS is the dynamic modi?cation of the objective function to escape local optima. The algorithm augments the objective of a local search, thereby transforming the landscape structure and escaping a local optimum. FLS is a local search heuristic algorithm that is aimed at reducing the search space of the main search algorithm. It breaks down the space into sub-neighborhoods such that inactive sub-neighborhoods are removed from the search process. The combination of GLS and FLS allowed us to design a graph partitioning algorithm that is both scalable and sensitive to the inherent modularity of real-world networks.
Show less - Date Issued
- 2017
- Identifier
- CFE0006903, ucf:51706
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006903
- Title
- Techniques for automated parameter estimation in computational models of probabilistic systems.
- Creator
-
Hussain, Faraz, Jha, Sumit, Leavens, Gary, Turgut, Damla, Uddin, Nizam, University of Central Florida
- Abstract / Description
-
The main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complexstochastic systems such that the resultant model meets user-specified behavioral specifications.These algorithms are designed to operate on probabilistic systems (-) systems that, in general,behave differently under identical conditions. The algorithms work using an approach thatcombines formal verification and...
Show moreThe main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complexstochastic systems such that the resultant model meets user-specified behavioral specifications.These algorithms are designed to operate on probabilistic systems (-) systems that, in general,behave differently under identical conditions. The algorithms work using an approach thatcombines formal verification and mathematical optimization to explore a model's parameterspace.The problem of determining whether a model instantiated with a given set of parametervalues satisfies the desired specification is first defined using formal verification terminology,and then reformulated in terms of statistical hypothesis testing. Parameter space explorationinvolves determining the outcome of the hypothesis testing query for each parameter pointand is guided using simulated annealing. The first algorithm uses the sequential probabilityratio test (SPRT) to solve the hypothesis testing problems, whereas the second algorithmuses an approach based on Bayesian statistical model checking (BSMC).The SPRT-based parameter synthesis algorithm was used to validate that a given model ofglucose-insulin metabolism has the capability of representing diabetic behavior by synthesizingvalues of three parameters that ensure that the glucose-insulin subsystem spends at least 20minutes in a diabetic scenario. The BSMC-based algorithm was used to discover the valuesof parameters in a physiological model of the acute inflammatory response that guarantee aset of desired clinical outcomes.These two applications demonstrate how our algorithms use formal verification, statisticalhypothesis testing and mathematical optimization to automatically synthesize parameters ofcomplex probabilistic models in order to meet user-specified behavioral properties
Show less - Date Issued
- 2016
- Identifier
- CFE0006117, ucf:51200
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006117
- Title
- Adaptive Architectural Strategies for Resilient Energy-Aware Computing.
- Creator
-
Ashraf, Rizwan, DeMara, Ronald, Lin, Mingjie, Wang, Jun, Jha, Sumit, Johnson, Mark, University of Central Florida
- Abstract / Description
-
Reconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited...
Show moreReconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW utilize genetic algorithms to realize logic circuits at runtime, as directed by the objective function. However, the size of problems solved using EHW as compared with traditional approaches has been limited to relatively compact circuits. This is due to the increase in complexity of the genetic algorithm with increase in circuit size. To address this research challenge of scalability, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly permanent fault mitigation in FPGA circuits. NDER has been shown to achieve refurbishment of relatively large sized benchmark circuits as compared to related works. Additionally, Design Diversity (DD) techniques which are used to aid such evolutionary refurbishment techniques are also proposed and the efficacy of various DD techniques is quantified and evaluated.Similarly, there exists a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs, for ensuring operational reliability in the presence of Process, Voltage, and Temperature (PVT) and, transistor-aging variations owing to decreased feature sizes for electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. At design-time, vulnerable portions of the circuit identified using conventional Electronic Design Automation tools are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate suitable datapaths based on current conditions that optimize performance and energy consumption. Primarily, SREL is able to mitigate the timing degradations caused due to transistor aging effects in sub-micron devices by reducing the stress induced on active elements by utilizing power-gating. As a result, fewer guardbands need to be included to achieve comparable performance levels which leads to considerable energy savings over the operational lifetime.The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS under the power budget of 20MW. However, it comes at the cost of increased reliability concerns, such as the increase in performance variations and soft errors. This has given rise to increased resiliency requirements for HPC applications in terms of ensuring functionality within given error thresholds while operating at lower voltages. My dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation are employed for runtime monitoring to assess the speed and depth of application state corruption as a result of fault injection. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime. Additionally, the tradeoffs between performance and vulnerability and the causal relations between compiler optimization and application vulnerability are investigated.
Show less - Date Issued
- 2015
- Identifier
- CFE0006206, ucf:52889
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006206