Title
-
METADATA AND DATA MANAGEMENT IN HIGH PERFORMANCE FILE AND STORAGE SYSTEMS.
-
Creator
-
Gu, Peng, Wang, Jun, University of Central Florida
-
Abstract / Description
-
With the advent of emerging "e-Science" applications, today's scientific research increasingly relies on petascale-and-beyond computing over large data sets of the same magnitude. While the computational power of supercomputers has recently entered the petascale era, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing the underlying I/O systems, in which the management of both metadata and data is deemed to have significant performance implications. Prefetching/caching and data-locality-aware optimizations, as conventional and effective management techniques for enhancing metadata and data I/O performance, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique that boosts local data read performance for parallel file systems, especially for applications with partially overlapped access patterns. A high-performance local I/O software package from the SOGP work, about 2,000 lines of C written for the Parallel Virtual File System, was released to Argonne National Laboratory in 2007 for potential integration into production.
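The abstract above describes prefetching driven by a weighted graph of direct and indirect successor relationships. The following is only a minimal sketch of that general idea, not the thesis's algorithm: the class name, the `window` parameter, and the 1/distance weighting are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class SuccessorGraphPrefetcher:
    """Toy weighted successor graph over a stream of metadata requests.

    Hypothetical illustration: the weighting scheme is an assumption,
    not the one used in the thesis.
    """

    def __init__(self, window=3):
        # Requests seen most recently, oldest first; bounds how far back
        # an "indirect" successor relationship can reach.
        self.history = deque(maxlen=window)
        # weights[a][b] = accumulated evidence that b follows a.
        self.weights = defaultdict(lambda: defaultdict(float))

    def record(self, item):
        # The direct predecessor (distance 1) contributes weight 1.0;
        # older, indirect predecessors contribute 1/distance.
        for distance, prev in enumerate(reversed(self.history), start=1):
            if prev != item:
                self.weights[prev][item] += 1.0 / distance
        self.history.append(item)

    def predict(self, item, k=2):
        # Return up to k likely successors, heaviest edge first;
        # ties are broken by name so the result is deterministic.
        successors = self.weights.get(item, {})
        ranked = sorted(successors.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:k]]
```

Feeding the trace a, b, c, a, b, d, a, b, c through `record` and then calling `predict("a")` returns `['b', 'c']`: "b" is the direct successor of "a" three times, while "c" only follows "a" indirectly.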
-
Date Issued
-
2008
-
Identifier
-
CFE0002251, ucf:47826
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0002251
-
-
Title
-
IMPROVING PERFORMANCE AND PROGRAMMER PRODUCTIVITY FOR I/O-INTENSIVE HIGH PERFORMANCE COMPUTING APPLICATIONS.
-
Creator
-
Sehrish, Saba, Wang, Jun, University of Central Florida
-
Abstract / Description
-
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications are generating and processing large amounts of data, ranging from terabytes (TB) to petabytes (PB). This growth in data for HPC applications raises the question of which parallel programming framework is appropriate for efficiently processing large data sets. In this work, we study the applicability of two programming models (MPI/MPI-IO and MapReduce) to a variety of I/O-intensive HPC applications ranging from simulations to analytics. We identify several performance- and programmer-productivity-related limitations of these existing programming models when they are used for I/O-intensive applications. We propose new frameworks that improve both performance and programmer productivity for the emerging I/O-intensive applications. Message Passing Interface (MPI) is widely used for writing HPC applications. MPI/MPI-IO allows fine-grained control over data and task distribution. At the programming framework level, various optimizations have been proposed to improve the performance of MPI/MPI-IO function calls. These performance optimizations are exposed to programmers as various function options. To write efficient code, programmers must know the exact usage of the optimization functions, which limits their productivity. We propose an abstraction called Reduced Function Set Abstraction (RFSA) for MPI-IO to reduce the number of I/O functions and to provide methods that automate the selection of the appropriate I/O function when writing HPC simulation applications. The purpose of RFSA is to hide the performance optimization functions from the application developer and to relieve the developer from deciding on a specific function. The proposed set of functions relies on a selection algorithm to decide among the most common optimizations provided by MPI-IO.
Additionally, many application scientists are looking to integrate data-intensive computing into compute-intensive High Performance Computing facilities, particularly for data analytics. We have observed several scientific applications that must migrate their data from an HPC storage system to a data-intensive one. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization must be performed before existing data-intensive tools such as MapReduce can be effectively used to analyze the data. It requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations. For every MapReduce application that must be run to complete the desired data analysis, a distributed read and write operation on the file system must be performed. Our contribution is to extend MapReduce to eliminate the multiple scans and to reduce the number of pre-processing MapReduce programs. We have added expressiveness to the MapReduce language in our novel framework called MapReduce with Access Patterns (MRAP), which allows users to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. We also provide a scheduling mechanism to further improve the performance of these applications. The main contributions of this thesis are as follows. 1) We implement a selection algorithm for I/O functions such as read/write, merge a set of functions for data types and file views, and optimize the atomicity function by automating the locking mechanism in RFSA.
By running different parallel I/O benchmarks on both medium-scale clusters and NERSC supercomputers, we show improved programmer productivity (35.7% on average). This approach incurs an overhead of 2-5% for one particular optimization, and shows a performance improvement of 17% when an application requires a combination of different optimizations. 2) We provide an augmented MapReduce system (MRAP), which consists of an API and corresponding optimizations, i.e., data restructuring and scheduling. We have demonstrated up to 33% throughput improvement in one real application (read-mapping in bioinformatics), and up to 70% in an I/O kernel of another application (halo catalogs analytics). Our scheduling scheme shows a performance improvement of 18% for an I/O kernel of another application (QCD analytics).
-
Date Issued
-
2010
-
Identifier
-
CFE0003236, ucf:48560
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0003236
-
-
Title
-
RESEARCHES ON REVERSE LOOKUP PROBLEM IN DISTRIBUTED FILE SYSTEM.
-
Creator
-
Zhang, Junyao, Wang, Jun, University of Central Florida
-
Abstract / Description
-
Recent years have witnessed an increasing demand for super data clusters. These clusters have reached petabyte scale and can consist of thousands or tens of thousands of storage nodes at a single site. For this architecture, reliability is becoming a great concern. To achieve high reliability, data recovery and node reconstruction are a must. Although extensive research has investigated how to sustain high performance and high reliability in the case of node failures at large scale, the reverse lookup problem, namely finding the list of objects held by a failed node, remains open. This is especially true for storage systems with high requirements for data integrity and availability, such as scientific research data clusters. Existing solutions are either time-consuming or expensive. Meanwhile, replication-based block placement can be used to realize fast reverse lookup. However, such schemes are designed for centralized, small-scale storage architectures. In this thesis, we propose a fast and efficient reverse lookup scheme named Group-based Shifted Declustering (G-SD) layout that is able to locate the whole content of the failed node. G-SD extends our previous shifted declustering layout and applies to large-scale file systems. Our mathematical proofs and real-life experiments show that G-SD is a scalable reverse lookup scheme that is up to one order of magnitude faster than existing schemes.
-
Date Issued
-
2010
-
Identifier
-
CFE0003504, ucf:48970
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0003504
-
-
Title
-
CONCENTRIC LAYOUT, A NEW SCIENTIFIC DATA LAYOUT FOR MATRIX DATA SET IN HADOOP FILE SYSTEM.
-
Creator
-
Cheng, Lu, Wang, Jun, University of Central Florida
-
Abstract / Description
-
The volume of data generated by scientific simulations, sensors, monitors, and optical telescopes has increased at a dramatic speed. To analyze the raw data in a time- and space-efficient manner, a data pre-processing operation is needed to achieve better performance in the data analysis phase. Current research shows an increasing trend of adopting the MapReduce framework for large-scale data processing. However, the data access patterns that are generally applied to scientific data sets are not directly supported by the current MapReduce framework. The gap between the requirements of analytics applications and the properties of the MapReduce framework motivates us to provide support for these data access patterns in the MapReduce framework. In our work, we studied the data access patterns in matrix files and proposed a new concentric data layout solution to facilitate matrix data access and analysis in the MapReduce framework. The concentric data layout maintains the dimensional property at the chunk level. In contrast to the continuous data layout that the current Hadoop framework adopts by default, the concentric data layout stores the data from the same sub-matrix in one chunk. This matches well with matrix operations such as computation. The concentric data layout preprocesses the data beforehand and optimizes the subsequent run of the MapReduce application. The experiments indicate that the concentric data layout improves overall performance, reducing the execution time by 38% when the file size is 16 GB; it also relieves the data overhead phenomenon and increases the effective data retrieval rate by 32% on average.
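The abstract above contrasts a continuous (row-by-row) layout with one that keeps each sub-matrix in a single chunk. The sketch below only illustrates why that matters for sub-matrix reads; it uses plain square-block chunking as a stand-in and is not the concentric layout itself. The function names, matrix size, and chunk size are assumptions chosen for the example.

```python
def row_major_chunk(r, c, n_cols, chunk_elems):
    """Chunk index of element (r, c) when the matrix is stored row by row."""
    return (r * n_cols + c) // chunk_elems


def block_chunk(r, c, n_cols, block):
    """Chunk index of element (r, c) when each block x block sub-matrix
    is stored in its own chunk (a stand-in for a sub-matrix-aware layout)."""
    blocks_per_row = (n_cols + block - 1) // block
    return (r // block) * blocks_per_row + (c // block)


def chunks_touched(rows, cols, chunk_of):
    """Set of chunks that must be read to fetch the given rows x cols region."""
    return {chunk_of(r, c) for r in rows for c in cols}


# Read the top-left 4 x 4 sub-matrix of a 16 x 16 matrix, 16 elements per chunk.
region_rows, region_cols = range(4), range(4)
continuous = chunks_touched(region_rows, region_cols,
                            lambda r, c: row_major_chunk(r, c, 16, 16))
blocked = chunks_touched(region_rows, region_cols,
                         lambda r, c: block_chunk(r, c, 16, 4))
print(len(continuous), len(blocked))  # prints "4 1"
```

With the row-by-row layout the 16 requested elements are spread over four chunks, so three quarters of the data read is unneeded; with sub-matrix chunking a single chunk holds exactly the requested region.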
-
Date Issued
-
2010
-
Identifier
-
CFE0003537, ucf:48955
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0003537
-
-
Title
-
RESEARCH IN HIGH PERFORMANCE AND LOW POWER COMPUTER SYSTEMS FOR DATA-INTENSIVE ENVIRONMENT.
-
Creator
-
Shang, Pengju, Wang, Jun, University of Central Florida
-
Abstract / Description
-
The evolution of computer science and engineering is always motivated by the requirements for better performance, power efficiency, security, user interface (UI), etc. The first two factors are potential tradeoffs: better performance usually requires better hardware, e.g., CPUs with a larger number of transistors or disks with a higher rotation speed; however, increasing the number of transistors on a single die or chip leads to super-linear growth in CPU power consumption, and a change in disk rotation speed has a quadratic effect on disk power consumption. We propose three new systematic approaches, Transactional RAID, the data-affinity-aware data placement scheme DAFA, and modeless power management, to tackle the performance problem in database systems, the performance problem in large-scale clusters or cloud platforms, and the power management problem in chip multiprocessors, respectively. The first design, Transactional RAID (TRAID), is motivated by the fact that in recent years, more storage system applications have employed transaction processing techniques to ensure data integrity and consistency. In transaction processing systems (TPS), the log is a kind of redundancy that ensures the transaction ACID (atomicity, consistency, isolation, durability) properties and data recoverability. Furthermore, highly reliable storage systems, such as redundant arrays of inexpensive disks (RAID), are widely used as the underlying storage for databases to guarantee system reliability and availability with high I/O performance. However, databases and storage systems tend to implement independent fault-tolerance mechanisms from their own perspectives, which can lead to high overhead. We observe the overlapped redundancies between the TPS and RAID systems, and propose a novel reliable storage architecture called Transactional RAID (TRAID). TRAID deduplicates this overlap by logging only one compact version (the XOR results) of the recovery references for the data being updated.
It minimizes the amount of log content as well as the log flushing overhead, thereby boosting the overall transaction processing performance. At the same time, TRAID guarantees reliability comparable to RAID, as well as the same recovery correctness and ACID semantics as traditional transaction processing systems. On the other hand, the myriad emerging data-intensive applications place a demand on high-performance computing resources with massive storage. Academia and industry pioneers have been developing big data parallel computing frameworks and large-scale distributed file systems (DFS), which are widely used to facilitate high-performance runs of data-intensive applications in fields such as bioinformatics, astronomy, and high-energy physics. Our recent work reported that data distribution in a DFS can significantly affect the efficiency of data processing and hence the overall application performance. This is especially true for applications with sophisticated access patterns. For example, Yahoo's Hadoop clusters employ a random data placement strategy for load balance and simplicity. This allows MapReduce programs to access all the data (without distinguishing interest locality) at full parallelism. Our work focuses on Hadoop systems. We observed that data distribution is one of the most important factors affecting parallel programming performance. However, default Hadoop adopts a random data distribution strategy, which does not consider the data semantics, specifically data affinity. We propose a Data-Affinity-Aware (DAFA) data placement scheme to address this problem. DAFA builds a historical data access graph to exploit data affinity. According to the data affinity, DAFA reorganizes data to maximize the parallelism of the affinitive data, subject to overall load balance. This enables DAFA to realize the maximum number of map tasks with data locality.
Besides the system performance, power consumption is another important concern of current computer systems. In the U.S. alone, the energy used by servers which could be saved comes to 3.17 million tons of carbon dioxide, or 580,678 cars. However, the goals of high performance and low energy consumption are at odds with each other. An ideal power management strategy should be able to dynamically respond to the change (either linear or nonlinear, or non-model) of workloads and system configuration without violating the performance requirement. We propose a novel power management scheme called MAR (modeless, adaptive, rule-based) in multiprocessor systems to minimize the CPU power consumption under performance constraints. By using richer feedback factors, e.g. the I/O wait, MAR is able to accurately describe the relationships among core frequencies, performance and power consumption. We adopt a modeless control model to reduce the complexity of system modeling. MAR is designed for CMP (Chip Multi Processor) systems by employing multi-input/multi-output (MIMO) theory and per-core level DVFS (Dynamic Voltage and Frequency Scaling).
-
Date Issued
-
2011
-
Identifier
-
CFE0003910, ucf:48749
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0003910
-
-
Title
-
Towards High-Performance Big Data Processing Systems.
-
Creator
-
Zhang, Hong, Wang, Liqiang, Turgut, Damla, Wang, Jun, Zhang, Shunpu, University of Central Florida
-
Abstract / Description
-
The amount of generated and stored data has been growing rapidly. It is estimated that 2.5 quintillion bytes of data are generated every day, and that 90% of the data in the world today has been created in the last two years. How to solve these big data issues has become a hot topic in both industry and academia. Due to the complexity of big data platforms, we stratify them into four layers: storage layer, resource management layer, computing layer, and methodology layer. This dissertation proposes brand-new approaches to address the performance of big data platforms like Hadoop and Spark on these four layers. We first present an improved HDFS design called SMARTH, which optimizes the storage layer. It utilizes asynchronous multi-pipeline data transfers instead of a single-pipeline stop-and-wait mechanism. SMARTH records the actual transfer speed of data blocks and sends this information to the namenode along with periodic heartbeat messages. The namenode sorts datanodes according to their past performance and tracks this information continuously. When a client initiates an upload request, the namenode sends it a list of "high performance" datanodes that it expects will yield the highest throughput for the client. By choosing higher-performance datanodes relative to each client and by taking advantage of the multi-pipeline design, our experiments show that SMARTH significantly improves the performance of data write operations compared to HDFS. Specifically, SMARTH is able to improve the throughput of data transfer by 27-245% in a heterogeneous virtual cluster on Amazon EC2. Secondly, we propose an optimized Hadoop extension called MRapid, which significantly speeds up the execution of short jobs on the resource management layer. It is completely backward compatible with Hadoop and imposes negligible overhead.
Our experiments on the Microsoft Azure public cloud show that MRapid can improve performance by up to 88% compared to the original Hadoop. Thirdly, we introduce an efficient 3-level sampling performance model, called Hedgehog, and focus on the relationship between resources and performance. This design is a brand-new white-box model for Spark, which is more complex and challenging than Hadoop. In our tool, we employ a Java bytecode manipulation and analysis framework called ASM to reduce the profiling overhead dramatically. Fourthly, on the computing layer, we optimize the current implementation of SGD in Spark's MLlib by reusing a data partition multiple times within a single iteration to find better candidate weights more efficiently. Whether to use multiple local iterations within each partition is decided dynamically by the 68-95-99.7 rule. We also design a variant of the momentum algorithm to optimize the step size in every iteration. This method uses a new adaptive rule that decreases the step size whenever neighboring gradients show differing directions of significance. Experiments show that our adaptive algorithm is more efficient and can be 7 times faster than the original MLlib SGD. Finally, on the application layer, we present a scalable and distributed geographic information system, called Dart, based on Hadoop and HBase. Dart provides a hybrid table schema to store spatial data in HBase so that the Reduce process can be omitted for operations like calculating the mean center and the median center. It employs reasonable pre-splitting and hash techniques to avoid data imbalance and hot region problems. It also supports massive spatial data analysis like K-Nearest Neighbors (KNN) and Geometric Median Distribution. In our experiments, we evaluate the performance of Dart by processing 160 GB of Twitter data on an Amazon EC2 cluster. The experimental results show that Dart is very scalable and efficient.
-
Date Issued
-
2018
-
Identifier
-
CFE0007271, ucf:52208
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007271
-
-
Title
-
Bridging the Gap between Application and Solid-State-Drives.
-
Creator
-
Zhou, Jian, Wang, Jun, Lin, Mingjie, Fan, Deliang, Ewetz, Rickard, Qi, GuoJun, University of Central Florida
-
Abstract / Description
-
Data storage is one of the important and often critical parts of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase change memory (PCM), magnetic RAM (STT-RAM), and Memristor, have emerged recently. Many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not show simple locality. To provide optimal storage, we need accurate predictions of I/O behavior. However, workloads are increasingly dynamic and diverse, making both long-term and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added for NAND-flash-based Solid State Disks (NAND-SSDs). However, it introduces overhead such as address translation delay and garbage collection costs. Many recent studies aim to address this overhead. Unfortunately, there is no one-size-fits-all solution due to the variety of workloads. Despite the rapid evolution of storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, exacerbate the gap between computing and storage speeds. In this dissertation, we improve data storage performance using both top-down and bottom-up approaches. First, we will investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we will study how architecture-aware task scheduling can improve application performance when PCM-based NVRAM is installed. Third, we will develop an I/O-correlation-aware flash translation layer for NAND-flash-based Solid State Disks. Fourth, we will build a DRAM-based correlation-aware FTL emulator and study its performance with various file systems.
-
Date Issued
-
2018
-
Identifier
-
CFE0007273, ucf:52188
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007273
-
-
Title
-
Research on High-performance and Scalable Data Access in Parallel Big Data Computing.
-
Creator
-
Yin, Jiangling, Wang, Jun, Jin, Yier, Lin, Mingjie, Qi, GuoJun, Wang, Chung-Ching, University of Central Florida
-
Abstract / Description
-
To facilitate big data processing, many dedicated data-intensive storage systems such as Google File System (GFS), Hadoop Distributed File System (HDFS), and Quantcast File System (QFS) have been developed. Currently, HDFS [20] is the state-of-the-art and most popular open-source distributed file system for big data processing. It is widely deployed as the bedrock for many big data processing systems and frameworks, such as the script-based Pig system, MPI-based parallel programs, graph processing systems, and Scala/Java-based Spark frameworks. These systems and applications employ parallel processes/executors to speed up data processing within scale-out clusters. Job or task schedulers in parallel big data applications such as mpiBLAST and ParaView can maximize the usage of computing resources such as memory and CPU by tracking resource consumption and availability for task assignment. However, since these schedulers do not take the distributed I/O resources and global data distribution into consideration, the data requests from parallel processes/executors in big data processing will unfortunately be served in an imbalanced fashion on the distributed storage servers. These imbalanced access patterns among storage nodes arise because a) unlike conventional parallel file systems, which use striping policies to distribute data evenly among storage nodes, data-intensive file systems such as HDFS store each data unit, referred to as a chunk or block file, with several copies based on a relatively random policy, which can result in an uneven data distribution among storage nodes; and b) under the data retrieval policy in HDFS, the more data a storage node contains, the higher the probability that the node will be selected to serve the data. Therefore, on nodes serving multiple chunk files, the data requests from different processes/executors will compete for shared resources such as the hard disk head and network bandwidth.
Because of this, the makespan of the entire program could be significantly prolonged and the overall I/O performance will degrade. The first part of my dissertation seeks to address aspects of these problems by creating an I/O middleware system and designing matching-based algorithms to optimize data access in parallel big data processing. To address the problem of remote data movement, we develop an I/O middleware system, called SLAM, which allows MPI-based analysis and visualization programs to benefit from locality reads, i.e., each MPI process can access its required data from a local or nearby storage node. This can greatly improve execution performance by reducing the amount of data movement over the network. Furthermore, to address the problem of imbalanced data access, we propose a method called Opass, which models the data read requests issued by parallel applications to cluster nodes as a graph data structure in which edge weights encode the demands on load capacity. We then employ matching-based algorithms to map processes to data so that data access is balanced. The final part of my dissertation focuses on optimizing sub-dataset analyses in parallel big data processing. Our proposed methods can benefit different analysis applications with various computational requirements, and the experiments on different cluster testbeds show their applicability and scalability.
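The abstract above describes mapping read requests to storage nodes with matching-based algorithms so that no node is overloaded. As a rough sketch of that kind of technique, and not of Opass itself, the following uses a simple augmenting-path bipartite matching with a per-node capacity. The function name, the `replicas` input format, and the uniform `capacity` are assumptions for illustration.

```python
def balanced_assignment(replicas, capacity):
    """Pick one serving node per chunk so no node serves more than `capacity`.

    replicas: dict mapping chunk -> list of nodes that hold a copy.
    Returns a dict chunk -> node, or None if no such assignment exists.
    Hypothetical helper, a simplified capacitated bipartite matching.
    """
    served = {}      # node -> chunks currently assigned to it
    assignment = {}  # chunk -> node

    def try_assign(chunk, visited):
        for node in replicas[chunk]:
            if node in visited:
                continue
            visited.add(node)
            current = served.setdefault(node, [])
            if len(current) < capacity:
                current.append(chunk)
                assignment[chunk] = node
                return True
            # Node is full: try to move one of its chunks to another replica.
            for other in list(current):
                if try_assign(other, visited):
                    current.remove(other)
                    current.append(chunk)
                    assignment[chunk] = node
                    return True
        return False

    for chunk in replicas:
        if not try_assign(chunk, set()):
            return None
    return assignment
```

For example, with `replicas = {"c1": ["n1", "n2"], "c2": ["n1"], "c3": ["n1", "n3"]}` and `capacity=1`, a greedy first-replica choice would send all three reads to n1, whereas the matching moves c1 to n2 and c3 to n3 so each node serves one chunk.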
-
Date Issued
-
2015
-
Identifier
-
CFE0006021, ucf:51008
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006021
-
-
Title
-
Managing IO Resource for Co-running Data Intensive Applications in Virtual Clusters.
-
Creator
-
Huang, Dan, Wang, Jun, Zhou, Qun, Sun, Wei, Zhang, Shaojie, Wang, Liqiang, University of Central Florida
-
Abstract / Description
-
Today, Big Data computing platforms employ resource management systems such as Yarn, Torque, Mesos, and Google Borg to share physical computing resources among many users or applications. Given virtualization and resource management systems, users are able to launch their applications on the same node with low mutual interference and management overhead on CPU and memory. However, there are still challenges to be addressed before these systems can be fully adopted to manage the IO resources in Big Data File Systems (BDFS) and shared network facilities. In this study, we systematically examine three IO management problems: the proportional sharing of block IO in container-based virtualization, network IO contention in MPI-based HPC applications, and the data migration overhead in HPC workflows. To improve proportional sharing, we develop a prototype system called BDFS-Container by containerizing BDFS at the Linux block IO level. Central to BDFS-Container, we propose and design a proactive IOPS-throttling-based mechanism named IOPS Regulator, which improves proportional IO sharing under the BDFS IO pattern by 74.4% on average. For network IO resource management, we use virtual switches to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications. To dynamically allocate network bandwidth when it is needed, we adopt SARIMA-based techniques to analyze and predict the MPI traffic issued by simulations. Third, to solve the data migration problem in small- to medium-sized HPC clusters, we propose constructing a side IO path, named SideIO, to explicitly direct analysis data to a BDFS that co-locates computation with data. In experiments with two real-world scientific workflows, SideIO completely avoids the most expensive data movement overhead and achieves up to 3x speedups compared with current solutions.
-
Date Issued
-
2018
-
Identifier
-
CFE0007195, ucf:52268
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007195
-
-
Title
-
Energy Efficient and Secure Wireless Sensor Networks Design.
-
Creator
-
Attiah, Afraa, Zou, Changchun, Chatterjee, Mainak, Wang, Jun, Yuksel, Murat, Wang, Chung-Ching, University of Central Florida
-
Abstract / Description
-
Wireless Sensor Networks (WSNs) are emerging technologies that can sense, process, communicate, and transmit information to a destination, and they are expected to have a significant impact on the efficiency of many applications in various fields. Resource constraints, such as limited battery power, are the greatest challenge in WSN design because they affect the lifetime and performance of the network. An energy-efficient, secure, and trustworthy system is vital when a WSN involves highly sensitive information. Thus, it is critical to design mechanisms that are energy efficient and secure while maintaining the desired level of quality of service. Inspired by these challenges, this dissertation is dedicated to exploiting optimization and game-theoretic approaches to handle several important issues in WSN communication, including energy efficiency, latency, congestion, dynamic traffic load, and security. We present several novel mechanisms to improve the security and energy efficiency of WSNs. Two new schemes are proposed for the network layer stack: (a) to enhance energy efficiency through optimized sleep intervals that also account for the underlying dynamic traffic load, and (b) to develop the routing protocol to handle wasted energy, congestion, and clustering. We also propose efficient routing and energy-efficient clustering algorithms based on optimization and game theory. Furthermore, we propose a dynamic game-theoretic framework (i.e., hyper defense) to analyze the interactions between attacker and defender as a non-cooperative security game that considers the resource limitation. All the proposed schemes are validated by extensive experimental analyses, obtained by running simulations of various WSN situations that represent real-world scenarios as realistically as possible. The results show that the proposed schemes achieve high performance on several metrics, such as network lifetime, compared with state-of-the-art schemes.
-
Date Issued
-
2018
-
Identifier
-
CFE0006971, ucf:51672
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006971
-
-
Title
-
Reducing the Overhead of Memory Space, Network Communication and Disk I/O for Analytic Frameworks in Big Data Ecosystem.
-
Creator
-
Zhang, Xuhong, Wang, Jun, Fan, Deliang, Lin, Mingjie, Zhang, Shaojie, University of Central Florida
-
Abstract / Description
-
To facilitate big data processing, many distributed analytic frameworks and storage systems such as Apache Hadoop, Apache Hama, Apache Spark and Hadoop Distributed File System (HDFS) have been developed. Currently, many researchers are working to either make them more scalable or enable them to support more analysis applications. In my PhD study, I completed three main projects on this topic: minimizing the communication delay in Apache Hama, minimizing the memory space and computational overhead in HDFS, and minimizing the disk I/O overhead for approximation applications in the Hadoop ecosystem. Specifically, in Apache Hama, communication delay makes up a large percentage of the overall graph processing time. While most recent research has focused on reducing the number of network messages, we add a runtime communication and computation scheduler to overlap them as much as possible. As a result, communication delay can be mitigated. In HDFS, the block location table and its maintenance can occupy more than half of the memory space and 30% of the processing capacity of the master node, which severely limits the scalability and performance of the master node. We propose Deister, which uses deterministic mathematical calculations to eliminate the huge table for storing block locations and its maintenance. My third work enables both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Existing offline-sampling-based approximation systems are not adaptive to dynamic query workloads, and online-sampling-based approximation systems suffer from low I/O efficiency and poor estimation accuracy. Therefore, we develop a distribution-aware method called Sapprox. Our idea is to collect the occurrences of a sub-dataset at each logical partition of a dataset (its storage distribution) in the distributed system at a very small cost, and to use this information to facilitate online sampling.
-
Date Issued
-
2017
-
Identifier
-
CFE0007299, ucf:52149
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007299
-
-
Title
-
UTILIZING EDGE IN IOT AND VIDEO STREAMING APPLICATIONS TO REDUCE BOTTLENECKS IN INTERNET TRAFFIC.
-
Creator
-
Akpinar, Kutalmis, Hua, Kien, Zou, Changchun, Turgut, Damla, Wang, Jun, University of Central Florida
-
Abstract / Description
-
There is a large surge of data over the Internet due to the increasing demand for multimedia content. According to a recent study, an estimated 80% of Internet traffic will be video by 2022. At the same time, the number of IoT devices on the Internet will be double the human population. While infrastructure standards for IoT are still nonexistent, enterprise offerings tend to encourage cloud-based solutions, causing an additional surge of data over the Internet. This study proposes solutions that bring video traffic and IoT computation back to the edges of the network, so that costly Internet infrastructure upgrades are not necessary. An efficient way to prevent the IoT data surge over the network is to push application-specific computation to the edge of the network, close to where the data is generated, so that large data can be eliminated before being delivered to the cloud. In this study, an event query language and processing environment is provided to process events from various devices. The query processing environment brings application developers, sensor infrastructure providers and end users together. It uses boolean events as the streaming and processing units. This addresses device heterogeneity and pushes data-intensive tasks to the edge of the network.
The second focus of the study is Video-on-Demand (VoD) applications. A characteristic of VoD traffic is its high redundancy. Due to the demand for popular content, the same video traffic flows through an Internet Service Provider's network as overlapping but separate streams. In previous studies on redundancy elimination, overlapping streams are merged at the link level by receiving each packet only for the first stream and re-using it for the subsequent duplicated streams. In this study, we significantly improve these techniques by introducing a merger-aware routing method.
Our final focus is increasing the utilization of Content Delivery Network (CDN) servers at the edge of the network to reduce long-distance traffic. The proposed system uses Software Defined Networks (SDN) to route adaptive video streaming clients to the best available CDN servers in terms of network availability. While providing this network assistance, the system does not reveal the video request information to the network provider, thus enabling privacy protection for encrypted streams. Request routing is performed at the segment level for adaptive streaming. This makes it possible to re-route the client to the best available CDN server without interruption if network conditions change during the stream.
-
Date Issued
-
2019
-
Identifier
-
CFE0007882, ucf:52774
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007882
-
-
Title
-
Research on Improving Reliability, Energy Efficiency and Scalability in Distributed and Parallel File Systems.
-
Creator
-
Zhang, Junyao, Wang, Jun, Zhang, Shaojie, Lee, Jooheung, University of Central Florida
-
Abstract / Description
-
With the increasing popularity of cloud computing and "Big Data" applications, current data centers are often required to manage petabytes or exabytes of data. To store this huge amount of data, thousands or tens of thousands of storage nodes are required at a single site. This imposes three major challenges for storage system designers. (1) Reliability: node failure in these data centers is a normal occurrence rather than a rare situation, which makes data reliability a great concern. (2) Energy efficiency: a data center can consume up to 100 times more energy than a standard office building, and more than 10% of this energy consumption can be attributed to storage systems. Thus, reducing the energy consumption of the storage system is key to reducing the overall consumption of the data center. (3) Scalability: with the continuously increasing size of data, maintaining the scalability of storage systems is essential. That is, the expansion of the storage system should be completed efficiently and without limitations on the total number of storage nodes or on performance.
This thesis proposes three ways to improve the above three key features for current large-scale storage systems. Firstly, we define the problem of "reverse lookup", namely finding the list of objects (blocks) for a failed node. As the first step of failure recovery, this process is directly related to the recovery/reconstruction time. Existing solutions use metadata traversal or data distribution reversing methods for reverse lookup, which are either time consuming or expensive, whereas a deterministic block placement can achieve fast and efficient reverse lookup. However, deterministic placement solutions are designed for centralized, small-scale storage architectures such as RAID. Because they lack scalability, they cannot be directly applied in large-scale storage systems. In this paper, we propose Group-Shifted Declustering (G-SD), a deterministic data layout for multi-way replication. G-SD addresses the scalability issue of our previous Shifted Declustering layout and supports fast and efficient reverse lookup.
Secondly, we define a problem: how to balance performance, energy, and recovery in degradation mode for an energy-efficient storage system? While extensive research has been done on trading off performance for energy efficiency under normal mode, the system enters degradation mode when a node failure occurs, in which node reconstruction is initiated. This process requires a number of disks to be spun up and a substantial amount of I/O bandwidth, which compromises not only energy efficiency but also performance. We find that current energy-proportional solutions, which do not consider the I/O bandwidth contention between recovery and performance, cannot answer this question accurately. This thesis presents PERP, a mathematical model to minimize the energy consumption of a storage system with respect to performance and recovery. PERP answers this question by providing the accurate number of nodes and the assigned recovery bandwidth at each time frame.
Thirdly, current distributed file systems such as Google File System (GFS) and Hadoop Distributed File System (HDFS) employ a pseudo-random method for replica distribution and a centralized lookup table (block map) to record all replica locations. This lookup table requires a large amount of memory and consumes a considerable amount of CPU/network resources on the metadata server. With the booming size of "Big Data", the metadata server becomes a scalability and performance bottleneck. While current approaches such as HDFS Federation attempt to "horizontally" extend scalability by allowing multiple metadata servers, we believe a more promising optimization option is to "vertically" scale up each metadata server. We propose Deister, a novel block management scheme that builds on top of a deterministic declustering distribution method, Intersected Shifted Declustering (ISD). Thus both replica distribution and location lookup can be achieved without a centralized lookup table.
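The idea that a deterministic placement removes the need for a lookup table, and makes reverse lookup cheap, can be illustrated with a toy layout. The sketch below is not G-SD, ISD, or Deister; it assumes a deliberately simple rule (replica r of block b lives on node (b + r) mod n), and all function names are hypothetical.

```python
def replica_nodes(block_id, num_nodes, replicas=3):
    """Forward lookup: compute the nodes holding a block instead of storing them."""
    if not 0 < replicas <= num_nodes:
        raise ValueError("need 0 < replicas <= num_nodes")
    return [(block_id + r) % num_nodes for r in range(replicas)]


def blocks_on_node(node, num_nodes, num_blocks, replicas=3):
    """Reverse lookup: list the blocks with a replica on `node`.

    Inverts the placement rule directly, so no metadata traversal is needed.
    """
    found = []
    for r in range(replicas):
        first = (node - r) % num_nodes  # smallest block whose replica r is on `node`
        found.extend(range(first, num_blocks, num_nodes))
    return sorted(found)


# Toy cluster: 5 nodes, 10 blocks, 3-way replication. Suppose node 0 fails.
lost = blocks_on_node(0, num_nodes=5, num_blocks=10)
print(lost)
```

Because both directions are pure arithmetic, the set of blocks to re-replicate after a failure follows from the failed node's ID alone. A production layout would also need to balance recovery load and handle cluster expansion, which this toy rule does not attempt.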
-
Date Issued
-
2015
-
Identifier
-
CFE0006238, ucf:51082
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006238
-
-
Title
-
Developing new power management and High-Reliability Schemes in Data-Intensive Environment.
-
Creator
-
Wang, Ruijun, Wang, Jun, Jin, Yier, DeMara, Ronald, Zhang, Shaojie, Ni, Liqiang, University of Central Florida
-
Abstract / Description
-
With the increasing popularity of data-intensive applications as well as large-scale computing and storage systems, current data centers and supercomputers often deal with extremely large data sets. To store and process this huge amount of data reliably and energy-efficiently, system designers should take three major challenges into consideration. Firstly, power conservation: multicore processors, or CMPs, have become mainstream in the current processor market because of the tremendous improvement in transistor density and the advancement in semiconductor technology. However, the increasing number of transistors on a single die or chip brings a super-linear growth in power consumption [4]. Thus, how to balance system performance and power saving is a critical issue that needs to be solved effectively. Secondly, system reliability: reliability is a critical metric in the design and development of replication-based big data storage systems such as Hadoop File System (HDFS). In a system with thousands of machines and storage devices, even infrequent failures become likely. In Google File System, the annual disk failure rate is 2.88%, which means one would expect to see 8,760 disk failures in a year. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when being scaled out is not well investigated. Thirdly, energy efficiency: the fast processing speeds of the current generation of supercomputers provide a great convenience to scientists dealing with extremely large data sets. The next generation of "exascale" supercomputers could provide accurate simulation results for the automobile industry, the aerospace industry, and even nuclear fusion reactors for the very first time. However, the energy cost of supercomputing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers has become critical in recent years.
This dissertation proposes new solutions to address the above three key challenges for current large-scale storage and computing systems. Firstly, we propose a novel power management scheme called MAR (model-free, adaptive, rule-based) for multiprocessor systems to minimize CPU power consumption subject to performance constraints. By introducing a new I/O wait status, MAR is able to accurately describe the relationship between core frequencies, performance and power consumption. Moreover, we adopt a model-free control method to filter out the I/O wait status from the traditional CPU busy/idle model in order to achieve fast responsiveness to burst situations and take full advantage of power saving. Our extensive experiments on a physical testbed demonstrate that, for SPEC benchmarks and data-intensive (TPC-C) benchmarks, an MAR prototype system achieves 95.8-97.8% accuracy of the ideal power saving strategy calculated offline. Compared with baseline solutions, MAR is able to save 12.3-16.1% more power while maintaining a comparable performance loss of about 0.78-1.08%. In addition, further simulation results indicate that our design achieves 3.35-14.2% more power saving efficiency and 4.2-10.7% less performance loss under various CMP configurations compared with baseline approaches such as LAST, Relax, PID and MPC.
Secondly, we create a new reliability model by incorporating the probability of replica loss to investigate the system reliability of multi-way declustering data layouts and analyze their potential parallel recovery possibilities. Our comprehensive simulation results on Matlab and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, in terms of data loss probability and system reliability, by up to 63% and 85% respectively. Our study of both 5-year and 10-year system reliability under various recovery bandwidth settings shows that the shifted declustering layout surpasses the two baseline approaches in both cases, consuming up to 79% and 87% less recovery bandwidth than copyset, as well as 4.8% and 10.2% less recovery bandwidth than the random layout.
Thirdly, we develop a power-aware job scheduler by applying a rule-based control method and taking into account real-world power and speedup profiles to improve power efficiency while adhering to predetermined power constraints. The intensive simulation results show that our proposed method is able to achieve the maximum utilization of computing resources compared to baseline scheduling algorithms while keeping the energy cost under the threshold. Moreover, by introducing a Power Performance Factor (PPF) based on the real-world power and speedup profiles, we are able to increase the power efficiency by up to 75%.
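The disk failure figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below takes the annual failure rate as 2.88% and the 8,760 failures per year as given, then derives the fleet size those two numbers imply. The fleet size is an inference from the quoted figures, not a number stated in the abstract, and the function name is hypothetical.

```python
AFR = 0.0288              # annual disk failure rate, read as 2.88%
FAILURES_PER_YEAR = 8760  # expected failures per year quoted in the abstract

# Expected failures = number of disks * AFR, so the implied fleet size is:
implied_disks = FAILURES_PER_YEAR / AFR

# 8,760 is also the number of hours in a 365-day year.
failures_per_hour = FAILURES_PER_YEAR / (365 * 24)


def expected_annual_failures(num_disks, afr=AFR):
    """Expected number of disk failures per year for a fleet of `num_disks`."""
    return num_disks * afr


print(round(implied_disks))  # roughly 304 thousand disks
print(failures_per_hour)     # one failure per hour on average
```

At that scale a disk fails about once an hour on average, which shows concretely why even infrequent per-device failures become routine in large systems.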
-
Date Issued
-
2016
-
Identifier
-
CFE0006704, ucf:51907
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006704
-
-
Title
-
Improving the performance of data-intensive computing on Cloud platforms.
-
Creator
-
Dai, Wei, Bassiouni, Mostafa, Zou, Changchun, Wang, Jun, Lin, Mingjie, Bai, Yuanli, University of Central Florida
-
Abstract / Description
-
Big Data such as Terabyte and Petabyte datasets are rapidly becoming the new norm for various organizations across a wide range of industries. The widespread data-intensive computing needs have inspired innovations in parallel and distributed computing, which has been the effective way to tackle massive computing workloads for decades. One significant example is MapReduce, which is both a programming model for expressing distributed computations on huge datasets and an execution framework for data-intensive computing on commodity clusters. Since it was originally proposed by Google, MapReduce has become the most popular technology for data-intensive computing. While Google owns its proprietary implementation of MapReduce, an open source implementation called Hadoop has gained wide adoption in the rest of the world. The combination of Hadoop and Cloud platforms has made data-intensive computing much more accessible and affordable than ever before.
This dissertation addresses the performance issue of data-intensive computing on Cloud platforms from three different aspects: task assignment, replica placement, and straggler identification. Both task assignment and replica placement are closely related to load balancing, which is one of the key issues that can significantly affect the performance of parallel and distributed applications. While task assignment schemes strive to balance the data processing load among cluster nodes to achieve minimum job completion time, replica placement policies aim to assign block replicas to cluster nodes according to their processing capabilities to exploit data locality to the maximum extent. Straggler identification is also one of the crucial issues data-intensive computing has to deal with, as the overall performance of parallel and distributed applications is often determined by the node with the lowest performance. The results of extensive evaluation tests confirm that the schemes and policies proposed in this dissertation can improve the performance of data-intensive applications running on Cloud platforms.
-
Date Issued
-
2017
-
Identifier
-
CFE0006731, ucf:51896
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006731
-
-
Title
-
Learning Robust Sequence Features via Dynamic Temporal Pattern Discovery.
-
Creator
-
Hu, Hao, Wang, Liqiang, Zhang, Shaojie, Liu, Fei, Qi, GuoJun, Zhou, Qun, University of Central Florida
-
Abstract / Description
-
As a major type of data, time series possess invaluable latent knowledge for describing the real world and human society. To improve the ability of intelligent systems to understand the world and people, it is critical to design sophisticated machine learning algorithms that extract robust time series features from such latent knowledge. Motivated by the successful applications of deep learning in computer vision, more and more machine learning researchers are turning their attention to applying deep learning techniques to time series data. However, directly employing current deep models in most time series domains could be problematic. A major reason is that the temporal pattern types that current deep models target are very limited and cannot meet the requirement of modeling the different underlying patterns of data coming from various sources. In this study we address this problem by designing different network structures explicitly based on specific domain knowledge, such that we can extract features via the most salient temporal patterns. More specifically, we mainly focus on two types of temporal patterns: order patterns and frequency patterns. For order patterns, which are usually related to brain and human activities, we design a hashing-based neural network layer to globally encode the ordinal pattern information into the resultant features. It is further generalized into a specially designed Recurrent Neural Network (RNN) cell that can learn order patterns in an online fashion. On the other hand, we believe audio-related data such as music and speech can benefit from modeling frequency patterns. Thus, we do so by developing two types of RNN cells. The first type tries to directly learn long-term dependencies in the frequency domain rather than the time domain. The second one aims to dynamically filter out the "noise" frequencies based on temporal contexts. By proposing various deep models based on different domain knowledge and evaluating them on extensive time series tasks, we hope this work can provide inspiration for others and increase the community's interest in the problem of applying deep learning techniques to more time series tasks.
-
Date Issued
-
2019
-
Identifier
-
CFE0007470, ucf:52679
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007470
-
-
Title
-
Stochastic-Based Computing with Emerging Spin-Based Device Technologies.
-
Creator
-
Bai, Yu, Lin, Mingjie, DeMara, Ronald, Wang, Jun, Jin, Yier, Dong, Yajie, University of Central Florida
-
Abstract / Description
-
In this dissertation, analog and emerging device physics is explored to provide a technology platform for designing new bio-inspired systems and novel architectures. As CMOS devices approach nanoscale dimensions, they are reaching their physical limits in feature size. Therefore, their physical device characteristics will pose severe challenges to constructing robust digital circuitry. Unlike transistor defects due to fabrication imperfection, quantum-related switching uncertainties will seriously increase their susceptibility to noise, thus rendering traditional thinking and logic design techniques inadequate. Therefore, the trend of current research objectives is to create a non-Boolean high-level computational model and map it directly to the unique operational properties of new, power-efficient, nanoscale devices.
The focus of this research is two-fold. 1) Investigation of the physical hysteresis switching behaviors of domain wall devices. We analyze the phenomena of domain wall devices and identify hysteresis behavior within a current range. We propose a Domain-Wall-Motion-based (DWM) NCL circuit that achieves approximately 30x and 8x improvements in energy efficiency and chip layout area, respectively, over its equivalent CMOS design, while maintaining similar delay performance for a one-bit full adder. 2) Investigation of the physical stochastic switching behaviors of Magnetic Tunnel Junction (MTJ) devices. By analyzing the stochastic switching behaviors of MTJs, we propose an innovative stochastic-based architecture for implementing an artificial neural network (S-ANN) with both magnetic tunneling junction (MTJ) and domain wall motion (DWM) devices, which enables efficient computing at an ultra-low voltage. For a well-known pattern recognition task, our mixed-model HSPICE simulation results show that a 34-neuron S-ANN implementation, compared with its deterministic ANN counterparts implemented with digital and analog CMOS circuits, achieves more than 1.5 to 2 orders of magnitude lower energy consumption and 2 to 2.5 orders of magnitude less hidden layer chip area.
-
Date Issued
-
2016
-
Identifier
-
CFE0006680, ucf:51921
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006680
-
-
Title
-
Rethinking Routing and Peering in the era of Vertical Integration of Network Functions.
-
Creator
-
Dey, Prasun, Yuksel, Murat, Wang, Jun, Ewetz, Rickard, Zhang, Wei, Hasan, Samiul, University of Central Florida
-
Abstract / Description
-
Content providers typically control digital content consumption services and earn the most revenue by implementing an "all-you-can-eat" model via subscription or hyper-targeted advertisements. A recent trend that revamps the existing Internet architecture and design is vertical integration, in which a content provider and an access ISP act as a unibody in a sugarcane form. As this vertical integration trend emerges in the ISP market, it is questionable whether the existing routing architecture will suffice in terms of sustainable economics, peering, and scalability. It is expected that current routing will need careful modifications and smart innovations to ensure effective and reliable end-to-end packet delivery. This involves developing new features for handling traffic with reduced latency, to tackle routing scalability issues in a more secure way and to offer new services at lower cost. Considering that prices of DRAM or TCAM in legacy routers are not necessarily decreasing at the desired pace, cloud computing can be a great solution for managing the increasing computation and memory complexity of routing functions in a centralized manner with optimized expenses. Focusing on the attributes associated with existing routing cost models and exploring a hybrid approach to SDN, we also compare recent trends in cloud pricing (for both storage and service) to evaluate whether it would be economically beneficial to integrate cloud services with legacy routing for improved cost-efficiency. In terms of peering, using the US as a case study, we show the overlaps between access ISPs and content providers to explore the viability of a future in terms of peering between the new emerging content-dominated sugarcane ISPs and the healthiness of Internet economics. To this end, we introduce meta-peering, a term that encompasses automation efforts related to peering (from identifying a list of ISPs likely to peer, to injecting control-plane rules, to continuous monitoring and notification of any violation), one of the many outcroppings of the vertical integration procedure, which could be offered to ISPs as a standalone service.
-
Date Issued
-
2019
-
Identifier
-
CFE0007797, ucf:52351
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007797
-
-
Title
-
Adaptive Architectural Strategies for Resilient Energy-Aware Computing.
-
Creator
-
Ashraf, Rizwan, DeMara, Ronald, Lin, Mingjie, Wang, Jun, Jha, Sumit, Johnson, Mark, University of Central Florida
-
Abstract / Description
-
Reconfigurable logic or Field-Programmable Gate Array (FPGA) devices have the ability to dynamically adapt the computational circuit based on user-specified or operating-condition requirements. Such hardware platforms are utilized in this dissertation to develop adaptive techniques for achieving reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW utilizes genetic algorithms to realize logic circuits at runtime, as directed by the objective function. However, the size of problems solved using EHW, as compared with traditional approaches, has been limited to relatively compact circuits. This is due to the increase in complexity of the genetic algorithm as circuit size increases. To address this research challenge of scalability, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly permanent fault mitigation in FPGA circuits. NDER has been shown to achieve refurbishment of relatively large benchmark circuits as compared to related works. Additionally, Design Diversity (DD) techniques, which are used to aid such evolutionary refurbishment techniques, are also proposed, and the efficacy of various DD techniques is quantified and evaluated. Similarly, there exists a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs, for ensuring operational reliability in the presence of Process, Voltage, and Temperature (PVT) and transistor-aging variations owing to decreased feature sizes for electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. 
At design-time, vulnerable portions of the circuit identified using conventional Electronic Design Automation tools are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate suitable datapaths based on current conditions that optimize performance and energy consumption. Primarily, SREL is able to mitigate the timing degradations caused by transistor aging effects in sub-micron devices by using power-gating to reduce the stress induced on active elements. As a result, fewer guardbands need to be included to achieve comparable performance levels, which leads to considerable energy savings over the operational lifetime. The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing, as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of the exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS under a power budget of 20 MW. However, it comes at the cost of increased reliability concerns, such as the increase in performance variations and soft errors. This has given rise to increased resiliency requirements for HPC applications in terms of ensuring functionality within given error thresholds while operating at lower voltages. My dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application state corruption as a result of fault injection. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime. 
Additionally, the tradeoffs between performance and vulnerability and the causal relations between compiler optimization and application vulnerability are investigated.
-
Date Issued
-
2015
-
Identifier
-
CFE0006206, ucf:52889
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006206
-
-
Title
-
Normally-Off Computing Design Methodology Using Spintronics: from Devices to Architectures.
-
Creator
-
Roohi, Arman, DeMara, Ronald, Abdolvand, Reza, Wang, Jun, Fan, Deliang, Del Barco, Enrique, University of Central Florida
-
Abstract / Description
-
Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, it is proposed to innovate Elastic Intermittent Computation (EIC) as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries that leverage intrinsic non-volatility to realize middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads vital to IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior with a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic that is adaptable at the gate level to support variable-duty-cycle execution robust to brief and extended supply outages or unscheduled dropouts, and the development of spin-based research synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit utilizing pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations.
-
Date Issued
-
2019
-
Identifier
-
CFE0007526, ucf:52619
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007526