You are here
Improving the performance of data-intensive computing on Cloud platforms
- Date Issued:
- 2017
- Abstract/Description:
- Big Data such as Terabyte and Petabyte datasets are rapidly becoming the new norm for various organizations across a wide range of industries. The widespread data-intensive computing needs have inspired innovations in parallel and distributed computing, which has been the effective way to tackle massive computing workload for decades. One significant example is MapReduce, which is a programming model for expressing distributed computations on huge datasets, and an execution framework for data-intensive computing on commodity clusters as well. Since it was originally proposed by Google, MapReduce has become the most popular technology for data-intensive computing. While Google owns its proprietary implementation of MapReduce, an open source implementation called Hadoop has gained wide adoption in the rest of the world. The combination of Hadoop and Cloud platforms has made data-intensive computing much more accessible and affordable than ever before.This dissertation addresses the performance issue of data-intensive computing on Cloud platforms from three different aspects: task assignment, replica placement, and straggler identification. Both task assignment and replica placement are subjects closely related to load balancing, which is one of the key issues that can significantly affect the performance of parallel and distributed applications. While task assignment schemes strive to balance data processing load among cluster nodes to achieve minimum job completion time, replica placement policies aim to assign block replicas to cluster nodes according to their processing capabilities to exploit data locality to the maximum extent. Straggler identification is also one of the crucial issues data-intensive computing has to deal with, as the overall performance of parallel and distributed applications is often determined by the node with the lowest performance. The results of extensive evaluation tests confirm that the schemes/policies proposed in this dissertation can improve the performance of data-intensive applications running on Cloud platforms.
Title: | Improving the performance of data-intensive computing on Cloud platforms. |
38 views
13 downloads |
---|---|---|
Name(s): |
Dai, Wei, Author Bassiouni, Mostafa, Committee Chair Zou, Changchun, Committee Member Wang, Jun, Committee Member Lin, Mingjie, Committee Member Bai, Yuanli, Committee Member University of Central Florida, Degree Grantor |
|
Type of Resource: | text | |
Date Issued: | 2017 | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Abstract/Description: | Big Data such as Terabyte and Petabyte datasets are rapidly becoming the new norm for various organizations across a wide range of industries. The widespread data-intensive computing needs have inspired innovations in parallel and distributed computing, which has been the effective way to tackle massive computing workload for decades. One significant example is MapReduce, which is a programming model for expressing distributed computations on huge datasets, and an execution framework for data-intensive computing on commodity clusters as well. Since it was originally proposed by Google, MapReduce has become the most popular technology for data-intensive computing. While Google owns its proprietary implementation of MapReduce, an open source implementation called Hadoop has gained wide adoption in the rest of the world. The combination of Hadoop and Cloud platforms has made data-intensive computing much more accessible and affordable than ever before.This dissertation addresses the performance issue of data-intensive computing on Cloud platforms from three different aspects: task assignment, replica placement, and straggler identification. Both task assignment and replica placement are subjects closely related to load balancing, which is one of the key issues that can significantly affect the performance of parallel and distributed applications. While task assignment schemes strive to balance data processing load among cluster nodes to achieve minimum job completion time, replica placement policies aim to assign block replicas to cluster nodes according to their processing capabilities to exploit data locality to the maximum extent. Straggler identification is also one of the crucial issues data-intensive computing has to deal with, as the overall performance of parallel and distributed applications is often determined by the node with the lowest performance. The results of extensive evaluation tests confirm that the schemes/policies proposed in this dissertation can improve the performance of data-intensive applications running on Cloud platforms. | |
Identifier: | CFE0006731 (IID), ucf:51896 (fedora) | |
Note(s): |
2017-08-01 Ph.D. Engineering and Computer Science, Electrical Engineering and Computer Engineering Doctoral This record was generated from author submitted information. |
|
Subject(s): | data-intensive computing -- Cloud computing -- MapReduce -- Hadoop -- task assignment -- replica placement -- load balancing -- straggler identification -- speculative execution | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0006731 | |
Restrictions on Access: | public 2017-08-15 | |
Host Institution: | UCF |