Current Search: Parallel
Pages
-
-
Title
-
HIGH PERFORMANCE DATA MINING TECHNIQUES FOR INTRUSION DETECTION.
-
Creator
-
Siddiqui, Muazzam Ahmed, Lee, Joohan, University of Central Florida
-
Abstract / Description
-
The rapid growth of computers has transformed the way information and data are stored. With this new paradigm of data access comes the threat of this information being exposed to unauthorized and unintended users. Many systems have been developed that scrutinize data for deviations from the normal behavior of a user or system, or that search for known signatures within the data. These systems are termed Intrusion Detection Systems (IDS), and they employ techniques varying from statistical methods to machine learning algorithms. Intrusion detection systems use audit data generated by operating systems, application software, or network devices. These sources produce huge datasets containing tens of millions of records. To analyze this data, data mining is used: the process of extracting useful patterns from large volumes of information. A major obstacle in the process is that traditional data mining and learning algorithms are overwhelmed by the volume and complexity of the available data, making them impractical for time-critical tasks like intrusion detection because of their long execution times. Our approach to this issue uses high performance data mining techniques to expedite the process by exploiting the parallelism in existing data mining algorithms and in the underlying hardware. We show how high performance and parallel computing can be used to scale data mining algorithms to handle large datasets, allowing the data mining component to search a much larger set of patterns and models than traditional computational platforms and algorithms would allow. We develop parallel data mining algorithms by parallelizing existing machine learning techniques using cluster computing. These algorithms include parallel backpropagation and parallel fuzzy ARTMAP neural networks. We evaluate the performance of the developed models in terms of speedup over the traditional algorithms, prediction rate, and false alarm rate. Our results show that the traditional backpropagation and fuzzy ARTMAP algorithms can benefit from high performance computing techniques, making them well suited for time-critical tasks like intrusion detection.
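The parallel backpropagation itself is not reproduced in the abstract, but the synchronous data-parallel pattern such work builds on can be sketched. In the Python sketch below, the single-layer model, shard count, and all function names are illustrative assumptions, not the dissertation's code:

```python
# Hypothetical sketch of data-parallel backpropagation: each worker
# computes gradients on its shard of the training set, and the master
# averages them into one weight update per epoch.
import numpy as np

def local_gradient(W, X, y):
    """Mean-squared-error gradient for a linear layer on one data shard."""
    pred = X @ W
    return X.T @ (pred - y) / len(X)

def parallel_epoch(W, shards, lr=0.1):
    """One synchronous epoch: average per-shard gradients, apply once."""
    grads = [local_gradient(W, X, y) for X, y in shards]  # runs on workers
    W -= lr * np.mean(grads, axis=0)                      # master update
    return W

rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(1000, 5)), np.arange(5.0)
y = X @ true_w
shards = [(X[i::4], y[i::4]) for i in range(4)]  # 4-way data partition
W = np.zeros(5)
for _ in range(200):
    W = parallel_epoch(W, shards)
print(np.round(W, 2))  # approaches [0. 1. 2. 3. 4.]
```

In a cluster setting the per-shard gradients would be computed on separate nodes and combined with a reduction step; the averaging line is the only point of synchronization.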
-
Date Issued
-
2004
-
Identifier
-
CFE0000056, ucf:46142
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0000056
-
-
Title
-
MODIFICATIONS TO THE FUZZY-ARTMAP ALGORITHM FOR DISTRIBUTED LEARNING IN LARGE DATA SETS.
-
Creator
-
Castro, Jose R, Georgiopoulos, Michael, University of Central Florida
-
Abstract / Description
-
The Fuzzy-ARTMAP (FAM) algorithm is one of the premier neural network architectures for classification problems. FAM can learn on line and is usually faster than other neural network approaches. Nevertheless, the learning time of FAM can slow down considerably when the size of the training set increases into the hundreds of thousands. We apply data partitioning and network partitioning to the FAM algorithm, in both sequential and parallel settings, to achieve better convergence time and to train efficiently on large databases (hundreds of thousands of patterns). Our parallelization is implemented on a Beowulf cluster of workstations. Two data partitioning approaches and two network partitioning approaches are developed. Extensive testing of all the approaches is done on three large datasets (half a million data points each). One of them is the Forest Covertype database from Blackard; the other two are artificially generated Gaussian data with different percentages of overlap between classes. Speedups in the data partitioning approach reached the order of hundreds without having to invest in parallel computation. Speedups in the network partitioning approach are close to linear on a cluster of workstations. Both methods allowed us to reduce the time needed to train the neural network on large databases from days to minutes. We prove formally that the workload balance of our network partitioning approaches will never be worse than an acceptable bound, and we also demonstrate the correctness of these parallel variants of FAM.
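A rough sketch of the data-partitioning strategy follows: the training set is split across workers, each trains an independent learner, and predictions are combined. A nearest-centroid stub stands in for a full Fuzzy-ARTMAP implementation, and majority voting is one illustrative way to combine the partitioned learners, not necessarily the dissertation's:

```python
# Conceptual data-partitioning sketch: train one learner per shard,
# combine by majority vote. CentroidStub is a placeholder; a FAM
# network would be trained in its place.
import numpy as np
from collections import Counter

class CentroidStub:
    """Placeholder learner standing in for Fuzzy-ARTMAP."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]

def train_partitioned(X, y, p=4):
    # Round-robin shards mimic distributing patterns to p cluster nodes.
    return [CentroidStub().fit(X[i::p], y[i::p]) for i in range(p)]

def vote(models, X):
    preds = np.stack([m.predict(X) for m in models])
    return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
models = train_partitioned(X, y)
print((vote(models, X) == y).mean())  # training accuracy of the ensemble
```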
-
Date Issued
-
2004
-
Identifier
-
CFE0000065, ucf:46092
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0000065
-
-
Title
-
AUTOMATED ADAPTIVE DATA CENTER GENERATION FOR MESHLESS METHODS.
-
Creator
-
Mitteff, Eric, Divo, Eduardo, University of Central Florida
-
Abstract / Description
-
Meshless methods have recently received much attention but have yet to reach their full potential, as the required problem setup (i.e., collocation point distribution) is still significant and far from automated. The distribution of points still closely resembles the nodes of finite volume-type meshes, and the free parameter, c, of the radial-basis expansion functions (RBF) still must be tailored specifically to a problem. The localized meshless collocation method investigated requires a local influence region, or topology, used as the expansion medium to produce the required field derivatives. Tests have shown that a regular Cartesian point distribution produces optimal results; however, in order to maintain a locally Cartesian point distribution, a recursive quadtree scheme is herein proposed. The quadtree method allows modeling of irregular geometries and refinement of regions of interest, and it lends itself to full automation, thus reducing problem setup efforts. Furthermore, the construction of the localized expansion regions is closely tied to the point distribution process and, hence, is incorporated into the automated sequence. This also allows for the optimization of the RBF free parameter on a local basis to achieve a desired level of accuracy in the expansion. In addition, an optimized auto-segmentation process is adopted to distribute and balance the problem loads throughout a parallel computational environment while minimizing communication requirements.
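The recursive quadtree idea lends itself to a compact sketch. The refinement criterion, depth limit, and corner-point extraction below are illustrative assumptions, not the thesis's scheme:

```python
# Minimal quadtree refinement sketch: recursively subdivide square cells
# that satisfy a refinement criterion, producing a locally Cartesian
# collocation-point distribution.
def refine(x, y, size, needs_refinement, depth, max_depth=6):
    """Yield (x, y, size) leaf cells of the adaptive quadtree."""
    if depth < max_depth and needs_refinement(x, y, size):
        half = size / 2.0
        for dx in (0.0, half):
            for dy in (0.0, half):
                yield from refine(x + dx, y + dy, half, needs_refinement,
                                  depth + 1, max_depth)
    else:
        yield (x, y, size)

# Example criterion: refine near a circular boundary of radius 0.5.
def near_circle(x, y, size):
    cx, cy = x + size / 2, y + size / 2
    r = (cx**2 + cy**2) ** 0.5
    return abs(r - 0.5) < size  # cell straddles the geometry

leaves = list(refine(-1.0, -1.0, 2.0, near_circle, depth=0))
points = {(round(lx + i * s, 9), round(ly + j * s, 9))
          for lx, ly, s in leaves for i in (0, 1) for j in (0, 1)}
print(len(leaves), "cells,", len(points), "collocation points")
```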
-
Date Issued
-
2006
-
Identifier
-
CFE0001321, ucf:47032
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0001321
-
-
Title
-
ALLIANCES IN GRAPHS: PARAMETERIZED ALGORITHMS ANDON PARTITIONING SERIES-PARALLEL GRAPHS.
-
Creator
-
Enciso, Rosa, Dutton, Ronald, University of Central Florida
-
Abstract / Description
-
Alliances are used to denote agreements between members of a group with similar interests. Alliances can occur between nations, biological sequences, business cartels, and other entities. The notion of alliances in graphs was first introduced by Kristiansen, Hedetniemi, and Hedetniemi. A defensive alliance in a graph G=(V,E) is a non-empty set S ⊆ V where, for all x ∈ S, |N[x] ∩ S| ≥ |N(x) - S|. Consequently, every vertex that is a member of a defensive alliance has at least as many vertices defending it as there are vertices attacking it. Alliances can be used to model a variety of applications such as classification problems, communities in the web, distributed protocols, etc. Gerber and Kobler introduced the problem of partitioning a graph into strong defensive alliances for the first time as the "Satisfactory Graph Partitioning (SGP)" problem. In his dissertation, Shafique used the problem of partitioning a graph into alliances to model problems in data clustering. Decision problems for several types of alliances and alliance partitions have been shown to be NP-complete. However, because of their applicability, it is of interest to study methods to overcome the complexity of these problems. In this thesis, we present a variety of algorithms for finding alliances in different families of graphs, with a running time that is polynomial in the size of the input, allowing exponential running time only as a function of a chosen parameter. This study is guided by the theory of parameterized complexity introduced by Rod Downey and Michael Fellows. In addition to parameterized algorithms for alliance-related problems, we study the partition of series-parallel graphs into alliances. The class of series-parallel graphs is a special class in graph theory, since many problems known to be NP-complete on general graphs have been shown to have polynomial time algorithms on series-parallel graphs. For example, the problem of finding a minimum defensive alliance has been shown to have a linear time algorithm when restricted to series-parallel graphs. Series-parallel graphs have also been the focus of study in a wide range of applications including CMOS layout and scheduling problems [ML86, Oud97]. Our motivation is driven by clustering properties that can be modeled with alliances. We observe that partitioning series-parallel graphs into alliances of roughly the same size can be used to partition task graphs to minimize the communication between processors and balance the workload of each processor. We present a characterization of the series-parallel graphs that allow a partition into defensive alliances, and a subclass of series-parallel graphs with satisfactory partitions.
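The defensive-alliance condition can be checked directly from the definition above. The sketch assumes the closed-neighborhood reading (a vertex counts itself among its defenders), matching the "at least as many defenders as attackers" phrasing:

```python
# Direct check of the defensive-alliance condition |N[x] ∩ S| >= |N(x) - S|;
# adjacency is a plain dict so the sketch stays dependency-free.
def is_defensive_alliance(adj, S):
    """adj: {vertex: set(neighbors)}; S: candidate alliance (set)."""
    if not S:
        return False
    for x in S:
        defenders = len(adj[x] & S) + 1      # neighbors in S, plus x itself
        attackers = len(adj[x] - S)          # neighbors outside S
        if defenders < attackers:
            return False
    return True

# A small graph: {0, 1} defends itself; so does the lone leaf {3}.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(is_defensive_alliance(adj, {0, 1}))  # True: 2 defenders vs 2 attackers
print(is_defensive_alliance(adj, {3}))     # True: 1 defender vs 1 attacker
```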
-
Date Issued
-
2009
-
Identifier
-
CFE0002956, ucf:47945
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0002956
-
-
Title
-
ATTACKS ON DIFFICULT INSTANCES OF GRAPH ISOMORPHISM: SEQUENTIAL AND PARALLEL ALGORITHMS.
-
Creator
-
Tener, Greg, Deo, Narsingh, University of Central Florida
-
Abstract / Description
-
The graph isomorphism problem has received a great deal of attention on both theoretical and practical fronts. However, a polynomial algorithm for the problem has yet to be found. Even so, the best of the existing algorithms perform well in practice; so well that it is challenging to find hard instances for them. The most efficient algorithms for determining whether a pair of graphs is isomorphic are based on the individualization-refinement paradigm, pioneered by Brendan McKay in 1981 with his algorithm nauty. Nauty and various improved descendants of nauty, such as bliss and saucy, solve the graph isomorphism problem by determining a canonical representative for each of the graphs. The graphs are isomorphic if and only if their canonical representatives are identical. These algorithms also detect the symmetries in a graph, which are used to speed up the search for the canonical representative, an approach that performs well in practice. Yet, several families of graphs have been shown to exist which are hard for nauty-like algorithms. This dissertation investigates why these graph families pose difficulty for individualization-refinement algorithms and proposes several techniques for circumventing these limitations. The first technique we propose addresses a fundamental problem pointed out by Miyazaki in 1993. He constructed a family of colored graphs which require exponential time for nauty (and nauty's improved descendants). We analyze Miyazaki's construction to determine the source of difficulty and identify a solution. We modify the base individualization-refinement algorithm by exploiting the symmetries discovered in a graph to guide the search for its canonical representative. This is accomplished with the help of a novel data structure called a guide tree. As a consequence, colored Miyazaki graphs are processed in polynomial time, thus removing the only known exponential bound on individualization-refinement algorithms (which had stood for the last 16 years). The preceding technique can only help if a graph has enough symmetry to exploit. It cannot be used for another family of hard graphs that have a high degree of regularity, but possess few actual symmetries. To handle these instances, we introduce an adaptive refinement method which utilizes the guide-tree data structure of the preceding technique to use a stronger vertex-invariant, but only when needed. We show that adaptive refinement is very effective, and it can result in dramatic speedups. We then present a third technique ideally suited for large graphs with a preponderance of sparse symmetries. A method was devised by Darga et al. for dealing with these large and highly symmetric graphs, which can reduce runtime by an order of magnitude. We explain the method and show how to incorporate it into our algorithm. Finally, we develop and implement a parallel algorithm for detecting the symmetries in, and finding a canonical representative of, a graph. Our novel parallel algorithm divides the search for the symmetries and the canonical representative among the processors, allowing for a high degree of scalability. The parallel algorithm is benchmarked on the hardest problem instances, and shown to be effective in subdividing the search space.
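The "refinement" half of individualization-refinement is classically color refinement (the 1-dimensional Weisfeiler-Leman procedure); a minimal version is sketched below. Production tools such as nauty use far more engineered forms of this loop:

```python
# Color refinement (1-WL): iteratively split vertex classes by the
# multiset of neighboring colors until the partition stabilizes.
def color_refinement(adj):
    """adj: {v: set(neighbors)} -> stable coloring {v: color_id}."""
    color = {v: 0 for v in adj}                      # start monochromatic
    while True:
        signature = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
                     for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_color = {v: palette[signature[v]] for v in adj}
        if new_color == color:
            return color
        color = new_color

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}        # path on 4 vertices
print(color_refinement(path))  # endpoints share a color, inner pair another
```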
-
Date Issued
-
2009
-
Identifier
-
CFE0002894, ucf:48018
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0002894
-
-
Title
-
METADATA AND DATA MANAGEMENT IN HIGH PERFORMANCE FILE AND STORAGE SYSTEMS.
-
Creator
-
Gu, Peng, Wang, Jun, University of Central Florida
-
Abstract / Description
-
With the advent of emerging "e-Science" applications, today's scientific research increasingly relies on petascale-and-beyond computing over large data sets of the same magnitude. While the computational power of supercomputers has recently entered the era of petascale, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, in which the management of both metadata and data is deemed to have significant performance implications. Prefetching/caching and data locality awareness optimizations, as conventional and effective management techniques for metadata and data I/O performance enhancement, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique to boost local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. One high-performance local I/O software package from the SOGP work for the Parallel Virtual File System, comprising about 2000 lines of C, was released to Argonne National Laboratory in 2007 for potential integration into production.
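The weighted-graph prefetching idea can be illustrated with a small sketch; the damping factor used to fold in indirect (two-hop) successors is an assumption for illustration, not the dissertation's exact weighting:

```python
# Sketch of a weighted-graph prefetcher: edge weights count observed
# direct successor accesses, and lookups prefetch the highest-weight
# direct and (damped) indirect successors.
from collections import defaultdict

class GraphPrefetcher:
    def __init__(self, damping=0.5):
        self.w = defaultdict(lambda: defaultdict(float))
        self.prev = None
        self.damping = damping

    def access(self, item):
        if self.prev is not None:
            self.w[self.prev][item] += 1.0            # direct successor edge
        self.prev = item

    def prefetch(self, item, k=2):
        scores = defaultdict(float)
        for succ, wt in self.w[item].items():
            scores[succ] += wt
            for succ2, wt2 in self.w[succ].items():   # indirect successor
                scores[succ2] += self.damping * wt2
        scores.pop(item, None)
        return sorted(scores, key=scores.get, reverse=True)[:k]

p = GraphPrefetcher()
for name in ["a", "b", "c", "a", "b", "d", "a", "b", "c"]:
    p.access(name)
print(p.prefetch("a"))  # ['b', 'c'] -- b directly, c via b (two-hop)
```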
-
Date Issued
-
2008
-
Identifier
-
CFE0002251, ucf:47826
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0002251
-
-
Title
-
Research on High-performance and Scalable Data Access in Parallel Big Data Computing.
-
Creator
-
Yin, Jiangling, Wang, Jun, Jin, Yier, Lin, Mingjie, Qi, GuoJun, Wang, Chung-Ching, University of Central Florida
-
Abstract / Description
-
To facilitate big data processing, many dedicated data-intensive storage systems such as the Google File System (GFS), the Hadoop Distributed File System (HDFS) and the Quantcast File System (QFS) have been developed. Currently, the Hadoop Distributed File System (HDFS) [20] is the state-of-the-art and most popular open-source distributed file system for big data processing. It is widely deployed as the bedrock for many big data processing systems/frameworks, such as the script-based Pig system, MPI-based parallel programs, graph processing systems and Scala/Java-based Spark frameworks. These systems/applications employ parallel processes/executors to speed up data processing within scale-out clusters. Job or task schedulers in parallel big data applications such as mpiBLAST and ParaView can maximize the usage of computing resources such as memory and CPU by tracking resource consumption/availability for task assignment. However, since these schedulers do not take the distributed I/O resources and global data distribution into consideration, the data requests from parallel processes/executors in big data processing will unfortunately be served in an imbalanced fashion on the distributed storage servers. These imbalanced access patterns among storage nodes arise because (a) unlike conventional parallel file systems, which use striping policies to evenly distribute data among storage nodes, data-intensive file systems such as HDFS store each data unit, referred to as a chunk or block file, with several copies based on a relatively random policy, which can result in an uneven data distribution among storage nodes; and (b) based on the data retrieval policy in HDFS, the more data a storage node contains, the higher the probability that the storage node will be selected to serve the data. Therefore, on the nodes serving multiple chunk files, the data requests from different processes/executors will compete for shared resources such as the hard disk head and network bandwidth. Because of this, the makespan of the entire program could be significantly prolonged and the overall I/O performance will degrade. The first part of my dissertation seeks to address aspects of these problems by creating an I/O middleware system and designing matching-based algorithms to optimize data access in parallel big data processing. To address the problem of remote data movement, we develop an I/O middleware system, called SLAM, which allows MPI-based analysis and visualization programs to benefit from locality reads, i.e., each MPI process can access its required data from a local or nearby storage node. This can greatly improve execution performance by reducing the amount of data movement over the network. Furthermore, to address the problem of imbalanced data access, we propose a method called Opass, which models the data read requests issued by parallel applications to cluster nodes as a graph data structure where edge weights encode the demands of load capacity. We then employ matching-based algorithms to map processes to data to achieve data access in a balanced fashion. The final part of my dissertation focuses on optimizing sub-dataset analyses in parallel big data processing. Our proposed methods can benefit different analysis applications with various computational requirements, and experiments on different cluster testbeds show their applicability and scalability.
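Opass itself uses matching-based algorithms; the greedy stand-in below only illustrates the underlying assignment problem, choosing for each process the replica on the currently least-loaded node. All names are illustrative:

```python
# Greedy version of balanced process-to-replica assignment: each process
# must read its chunk from one of the nodes holding a replica, and we
# pick the replica on the node with the smallest current load.
from collections import defaultdict

def assign_balanced(requests, replicas):
    """requests: [(process, chunk)]; replicas: {chunk: [nodes]}."""
    load = defaultdict(int)
    plan = {}
    for proc, chunk in requests:
        node = min(replicas[chunk], key=lambda n: load[n])
        plan[proc] = node
        load[node] += 1
    return plan, dict(load)

replicas = {"c1": ["n1", "n2"], "c2": ["n1"], "c3": ["n1", "n3"]}
requests = [("p1", "c1"), ("p2", "c2"), ("p3", "c3"), ("p4", "c1")]
plan, load = assign_balanced(requests, replicas)
print(plan)  # c2 is pinned to n1, so later c1/c3 readers are steered away
print(load)
```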
-
Date Issued
-
2015
-
Identifier
-
CFE0006021, ucf:51008
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006021
-
-
Title
-
Improved Interpolation in SPH in Cases of Less Smooth Flow.
-
Creator
-
Brun, Oddny, Wiegand, Rudolf, Pensky, Marianna, University of Central Florida
-
Abstract / Description
-
We introduce a method presented in Information Field Theory (IFT) [Abramovich et al., 2007] to improve interpolation in Smoothed Particle Hydrodynamics (SPH) in cases of less smooth flow. The method makes use of wavelet theory combined with B-splines for interpolation. The idea is to identify any jumps a function may have and then reconstruct the smoother segments between the jumps. The results of our work demonstrated a superior capability, when compared to a particularly challenging SPH application, to better conserve jumps and more accurately interpolate the smoother segments of the function. The results also demonstrated increased computational efficiency with limited loss in accuracy, as the number of multiplications and the execution time were reduced. Similar benefits were observed for functions with spikes analyzed by the same method. Lesser, but similar, effects were also demonstrated for real-life data sets of a less smooth nature. SPH is widely used in modeling and simulation of the flow of matter. SPH presents advantages compared to grid-based methods both in terms of computational efficiency and accuracy, in particular when dealing with less smooth flow. The results we achieved through our research improve the model in cases of less smooth flow, in particular flow with jumps and spikes. Up until now such improvements have been sought through modifications to the models' physical equations and/or kernel functions, and have only partially been able to address the issue. This research, as it introduces wavelet theory and IFT to a field of science that, to our knowledge, is not currently utilizing these methods, lays the groundwork for future research ideas to benefit SPH. Among those ideas are further development of criteria for wavelet selection, use of smoothing splines for SPH interpolation, and incorporation of Bayesian field theory. Improving the method's accuracy, stability and efficiency under more challenging conditions, such as flow with jumps and spikes, will benefit applications in a wide area of science. In medicine alone, such improvements will further increase real-time diagnostics, treatment and training opportunities, because jumps and spikes are often the characteristics of significant physiological and anatomic conditions such as pulsatile blood flow, peristaltic intestine contractions and the appearance of organs' edges in imaging.
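A toy version of the jump-aware interpolation idea follows, with numpy primitives standing in for the wavelet jump detection and B-spline reconstruction used in the thesis:

```python
# Locate discontinuities from large first differences, then interpolate
# each smooth segment separately so the jump is preserved, not smeared.
import numpy as np

def piecewise_interp(x, y, x_new, jump_factor=5.0):
    dy = np.abs(np.diff(y))
    cuts = np.where(dy > jump_factor * np.median(dy))[0] + 1   # jump indices
    segments = np.split(np.arange(len(x)), cuts)
    # segment boundaries sit at the midpoints of the detected jumps
    bounds = np.concatenate(([x[0] - 1e-9],
                             [(x[c - 1] + x[c]) / 2 for c in cuts],
                             [x[-1] + 1e-9]))
    out = np.empty_like(x_new)
    for i, seg in enumerate(segments):
        mask = (x_new > bounds[i]) & (x_new <= bounds[i + 1])
        coeff = np.polyfit(x[seg], y[seg], deg=min(3, len(seg) - 1))
        out[mask] = np.polyval(coeff, x_new[mask])
    return out

x = np.linspace(0, 2, 41)
y = np.sin(x) + (x > 1.0)              # smooth signal with a unit jump
x_new = np.linspace(0, 2, 201)
y_new = piecewise_interp(x, y, x_new)
print(round(float(y_new[105] - y_new[95]), 2))  # jump preserved: ~1.05
```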
-
Date Issued
-
2016
-
Identifier
-
CFE0006446, ucf:51451
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006446
-
-
Title
-
Practical Dynamic Transactional Data Structures.
-
Creator
-
Laborde, Pierre, Dechev, Damian, Leavens, Gary, Turgut, Damla, Mucciolo, Eduardo, University of Central Florida
-
Abstract / Description
-
Multicore programming presents the challenge of synchronizing multiple threads. Traditionally, mutual exclusion locks are used to limit access to a shared resource to a single thread at a time. Whether this lock is applied to an entire data structure, or only a single element, the pitfalls of lock-based programming persist. Deadlock, livelock, starvation, and priority inversion are some of the hazards of lock-based programming that can be avoided by using non-blocking techniques. Non-blocking data structures allow scalable and thread-safe access to shared data by guaranteeing, at least, system-wide progress. In this work, we present the first wait-free hash map, which allows a large number of threads to concurrently insert, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time, an attribute that can be critical in real-time environments. We only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications, including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 7 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 15 or higher. The main drawback of non-blocking data structures is that only one linearizable operation can be executed by each thread at any one time. To overcome this limitation, we present a framework for developing dynamic transactional data containers. Transactional containers are those that execute a sequence of operations atomically and in such a way that concurrent transactions appear to take effect in some sequential order. We take an existing algorithm that transforms non-blocking sets into static transactional versions (LFTT), and we modify it to support maps. We implement a non-blocking transactional hash map using this new approach. We continue to build on LFTT by implementing a lock-free vector, using a methodology that allows LFTT to be compatible with non-linked data structures. A static transaction requires all operands and operations to be specified at compile-time, and no code may be executed between transactions. These limitations render static transactions impractical for most use cases. We modify LFTT to support dynamic transactions, and we enhance it with additional features. Dynamic transactions allow operands to be specified at runtime rather than compile-time, and threads can execute code between the data structure operations of a transaction. We build a framework for transforming non-blocking containers into dynamic transactional data structures, called Dynamic Transactional Transformation (DTT), and provide a library of novel transactional containers. Our library provides the wait-free progress guarantee and supports transactions among multiple data structures, whereas previous work on data structure transactions has been limited to operating on a single container. Our approach is 3 times faster than software transactional memory, and its performance matches its lock-free transactional counterpart.
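The wait-free structures described here are far beyond a short sketch, but the retry-loop pattern common to non-blocking designs can be shown. Python has no hardware CAS, so a small software stand-in models the atomic instruction; this is a conceptual illustration (a Treiber-style stack push), not the dissertation's hash map:

```python
# Conceptual non-blocking pattern: a compare-and-swap (CAS) retry loop
# on a shared head pointer. AtomicRef models one atomic instruction.
import threading

class AtomicRef:
    """Software stand-in for a hardware CAS word."""
    def __init__(self, value=None):
        self._value, self._lock = value, threading.Lock()
    def load(self):
        return self._value
    def compare_and_swap(self, expected, new):
        with self._lock:                  # models a single atomic step
            if self._value is expected:
                self._value = new
                return True
            return False

class Node:
    __slots__ = ("item", "next")
    def __init__(self, item, nxt):
        self.item, self.next = item, nxt

head = AtomicRef(None)

def push(item):
    while True:                           # lock-free retry loop
        old = head.load()
        if head.compare_and_swap(old, Node(item, old)):
            return                        # CAS succeeded; no lock was held

for i in range(3):
    push(i)
node, items = head.load(), []
while node:
    items.append(node.item)
    node = node.next
print(items)  # [2, 1, 0] -- LIFO order from the Treiber-style push
```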
-
Date Issued
-
2018
-
Identifier
-
CFE0007215, ucf:52212
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007215
-
-
Title
-
Modeling and Solving Large-scale Stochastic Mixed-Integer Problems in Transportation and Power Systems.
-
Creator
-
Huang, Zhouchun, Zheng, Qipeng, Xanthopoulos, Petros, Pazour, Jennifer, Chang, Ni-bin, University of Central Florida
-
Abstract / Description
-
In this dissertation, various optimization problems from the areas of transportation and power systems are investigated, and uncertainty is considered in each problem. Specifically, a long-term problem of electricity infrastructure investment is studied to address planning for capacity expansion in electrical power systems with the integration of short-term operations. The future investment costs and real-time customer demands cannot be perfectly forecasted and thus are considered to be random. A maintenance scheduling problem is also studied for power systems, particularly for natural gas fueled power plants, taking into account gas contracting and the opportunity to purchase and sell gas in the spot market, with maintenance scheduled under uncertainty in spot-market electricity and gas prices. In addition, different vehicle routing problems are studied, seeking routes for each vehicle so that the total traveling cost is minimized subject to the constraints and uncertain parameters of the corresponding transportation systems. The investigation of each problem in this dissertation mainly consists of two parts, i.e., the formulation of its mathematical model and the development of a solution algorithm for solving the model. Stochastic programming is applied as the framework to model each problem and address the uncertainty, while the approach to dealing with the randomness varies depending on the relationships between the uncertain elements and the objective functions or constraints. All the problems are modeled as stochastic mixed-integer programs, and the huge numbers of decision variables and constraints involved make each problem large-scale and very difficult to solve. In this dissertation, efficient algorithms are developed for these problems in the context of advanced methodologies of optimization and operations research, such as branch and cut, Benders decomposition, column generation and Lagrangian methods. Computational experiments are implemented for each problem, and the results are presented and discussed. The research carried out in this dissertation would be beneficial to both researchers and practitioners seeking to model and solve similar optimization problems in transportation and power systems when uncertainty is involved.
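The specific models are not given in the abstract, but the two-stage stochastic mixed-integer program underlying this kind of work can be written down in standard (not the dissertation's) notation:

```latex
% Generic two-stage stochastic mixed-integer program (standard notation):
\begin{align*}
\min_{x}\quad & c^{\top}x \;+\; \mathbb{E}_{\xi}\!\left[\,Q(x,\xi)\,\right]\\
\text{s.t.}\quad & Ax \ge b,\qquad x \in \mathbb{Z}_{+}^{p} \times \mathbb{R}_{+}^{\,n-p},\\
\text{where}\quad & Q(x,\xi) \;=\; \min_{y \ge 0}\;\bigl\{\, q(\xi)^{\top} y \;:\; W y \ge h(\xi) - T(\xi)\,x \,\bigr\}.
\end{align*}
```

With finitely many scenarios the expectation becomes a probability-weighted sum, yielding a large deterministic-equivalent MIP; decomposition methods such as Benders attack exactly this structure.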
-
Date Issued
-
2016
-
Identifier
-
CFE0006328, ucf:51559
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006328
-
-
Title
-
Mathematical and Computational Methods for Freeform Optical Shape Description.
-
Creator
-
Kaya, Ilhan, Foroosh, Hassan, Rolland, Jannick, Turgut, Damla, Thompson, Kevin, Ilegbusi, Olusegun, University of Central Florida
-
Abstract / Description
-
Slow-servo single-point diamond turning as well as advances in computer controlled small lap polishing enable the fabrication of freeform optics, specifically, optical surfaces for imaging applications that are not rotationally symmetric. Freeform optical elements will have a profound importance in the future of optical technology. Orthogonal polynomials added onto conic sections have been extensively used to describe optical surface shapes. The optical testing industry has chosen to represent the departure of a wavefront under test from a reference sphere in terms of orthogonal φ-polynomials, specifically Zernike polynomials. Various forms of polynomials for describing freeform optical surfaces may be considered, however, both in optical design and in support of fabrication. More recently, radial basis functions were also investigated for optical shape description. In the application of orthogonal φ-polynomials to optical freeform shape description, there are important limitations, such as the number of terms required, as well as edge-ringing and ill-conditioning, in representing the surface with the accuracy demanded by the most stringent optics applications. The first part of this dissertation focuses upon describing freeform optical surfaces with φ-polynomials and shows their limitations when including higher orders, together with possible remedies. We show that a possible remedy is to use edge-clustered fitting grids. Given different grid types, we furthermore compare the efficacy of using different types of φ-polynomials, namely Zernike and gradient orthogonal Q polynomials. In the second part of this thesis, a local, efficient and accurate hybrid method is developed in order to greatly reduce the number of polynomial terms required to achieve a high level of accuracy in freeform shape description, which under prior art was shown to require thousands of terms, including many higher-order terms. This comes at the expense of multiple sub-apertures, and as such computational methods may leverage parallel processing. This new method combines the assets of both radial basis functions and orthogonal φ-polynomials for freeform shape description, and is uniquely applicable across any aperture shape due to its locality and stitching principles. Finally, in order to comprehend the possible advantages of parallel computing for optical surface descriptions, the benefits of making effective use of the impressive computational power offered by multi-core platforms for the computation of φ-polynomials are investigated. The φ-polynomials, specifically Zernike and gradient orthogonal Q-polynomials, are implemented with a set of recurrence-based parallel algorithms on Graphics Processing Units (GPUs). The results show that more than an order of magnitude speedup over a sequential implementation is possible in the computation of φ-polynomials if the recurrence-based parallel algorithms are adopted.
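For concreteness, the standard closed-form definition of the Zernike radial polynomial, a building block of the φ-polynomial bases discussed above, evaluated directly in numpy (the dissertation's GPU implementations use recurrence relations instead):

```python
# Direct-sum evaluation of the Zernike radial polynomial R_n^m(rho),
# using the textbook factorial formula.
from math import factorial
import numpy as np

def zernike_radial(n, m, rho):
    """R_n^m on an array of radii in [0, 1]; n - |m| must be even."""
    m = abs(m)
    if (n - m) % 2:
        return np.zeros_like(rho)
    total = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total

rho = np.linspace(0, 1, 5)
print(np.round(zernike_radial(2, 0, rho), 3))  # R_2^0 = 2*rho^2 - 1
```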
-
Date Issued
-
2013
-
Identifier
-
CFE0005012, ucf:49993
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0005012
-
-
Title
-
APPLICATION OF ABSORPTIVE TREATMENTS ON TRAFFIC NOISE BARRIERS IN FLORIDA.
-
Creator
-
Chua, Chin Boon, Wayson, Roger, University of Central Florida
-
Abstract / Description
-
In this thesis, the parallel barrier analysis feature in the Federal Highway Administration Traffic Noise Model (FHWA TNM), which is based on RAYVERB, was used to explore the effects of multiple reflections due to single and parallel barriers and the use of absorptive treatment. A database developed from the data collected in previous research efforts was used to generate a best-fit equation model that can serve as a predetermining tool for the magnitude of parallel barrier insertion loss. The best-fit equation model was then tested against measured/modeled results and TNM prediction results for its validity. Absorptive materials were also studied, and the top three were selected and recommended for Florida highway barrier use: cementitious material, metal wool and glass fiber. These materials can be used to reduce sound reflections for single and parallel barriers. The best-fit equation model developed in this research is Deg = -2.17NRC - CW^0.42 + 1.97e^ln(BH) + RH^0.29 + DBB^0.27; its predictions give a moderately high R^2 value of 0.55 when compared to the results from the database. Predictions from the best-fit equation model were also found to be consistent with the measured/modeled results, providing further proof of the validity of the model. However, when comparing results from the equation model, TNM and the measured/modeled data (measured and modeled results compared using the ANSI method), TNM was shown to predict higher insertion loss degradation. It was found that the most effective placement of absorptive material is the pattern that covers the barrier from the bottom up, and that only about 60% of the barrier area, measured from the bottom, requires covering with a high-NRC absorptive treatment (NRC greater than 0.8) without sacrificing insertion loss. Also, if the barrier area near the top includes an easily obtainable NRC value of 0.4, only 40% to 50% of the bottom of the barrier needs absorptive treatment with a higher, more expensive NRC rating. These findings can substantially reduce the cost of a conventional absorptive barrier with full coverage of high-NRC absorptive treatment. This research has begun important improvements in noise barrier design; additional work can continue to further verify the findings in this thesis, so that an easier and better equation model can be developed to calculate insertion loss degradation and cheaper absorptive barriers with less absorptive material can be built.
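The best-fit model can be transcribed literally into code. Note that the 1.97e^ln(BH) term simplifies to 1.97·BH; the interpretations of CW, BH, RH and DBB (channel width, barrier height, receiver height, distance between barriers) are inferred from context and should be checked against the thesis:

```python
# Literal transcription of the quoted best-fit model for insertion loss
# degradation. Variable meanings and units are assumptions to verify.
import math

def insertion_loss_degradation(NRC, CW, BH, RH, DBB):
    """Deg = -2.17*NRC - CW^0.42 + 1.97*e^ln(BH) + RH^0.29 + DBB^0.27"""
    return (-2.17 * NRC
            - CW ** 0.42
            + 1.97 * math.exp(math.log(BH))   # == 1.97 * BH
            + RH ** 0.29
            + DBB ** 0.27)

# Illustrative numbers only, not values from the thesis's database:
print(round(insertion_loss_degradation(NRC=0.8, CW=30, BH=4, RH=1.5,
                                       DBB=60), 2))
```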
-
Date Issued
-
2004
-
Identifier
-
CFE0000008, ucf:46127
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0000008
-
-
Title
-
MAC LAYER AND ROUTING PROTOCOLS FOR WIRELESS AD HOC NETWORKS WITH ASYMMETRIC LINKS AND PERFORMANCE EVALUATION STUDIES.
-
Creator
-
Wang, Guoqiang, Marinescu, Dan, University of Central Florida
-
Abstract / Description
-
In a heterogeneous mobile ad hoc network (MANET), assorted devices with different computation and communication capabilities co-exist. In this thesis, we consider the case when the nodes of a MANET have various degrees of mobility and range, and the communication links are asymmetric. Many routing protocols for ad hoc networks routinely assume that all communication links are symmetric: if node A can hear node B, then node B can also hear node A. Most current MAC layer protocols are unable to exploit the asymmetric links present in a network, leading to inefficient overall bandwidth utilization or, in the worst case, to lack of connectivity. To exploit asymmetric links, the protocols must deal with the asymmetry of the path from a source node to a destination node, which affects either the delivery of the original packets, or the paths taken by acknowledgments, or both. Furthermore, the problem of hidden nodes requires a more careful analysis in the case of asymmetric links. MAC layer and routing protocols for ad hoc networks with asymmetric links require rigorous performance analysis. Analytical models are usually unable to provide even approximate answers to questions about end-to-end delay, packet loss ratio, throughput, etc. Traditional simulation techniques for large-scale wireless networks require vast amounts of storage and computing cycles rarely available on single computing systems. In our search for an effective solution to study the performance of wireless networks, we investigate time-parallel simulation. Time-parallel simulation has received significant attention in the past. Its advantages, as well as its theoretical and practical limitations, have been extensively researched for many applications where the complexity of the models involved severely limits the applicability of analytical studies and makes traditional simulation unfeasible. Our goal is to study the behavior of large systems consisting of possibly thousands of nodes over extended periods of time and to obtain results efficiently, and time-parallel simulation enables us to achieve this objective. We conclude that MAC layer and routing protocols capable of using asymmetric links are more complex than traditional ones, but can improve connectivity and provide better performance. We are confident that the approximate results for various performance metrics of wireless networks obtained using time-parallel simulation are sufficiently accurate and able to provide the necessary insight into the inner workings of the protocols.
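Time-parallel simulation itself admits a compact sketch: the horizon is split into segments, each segment is simulated from a guessed initial state, and fix-up passes repair mismatched boundaries. The toy on/off model and fixed per-segment seeds below are illustrative assumptions:

```python
# Minimal time-parallel simulation: simulate all segments from guessed
# initial states (parallelizable), then iterate fix-up passes until the
# segment boundaries are consistent.
import random

def step(state, rng):
    # toy dynamics: flip the on/off state with probability 0.3
    return state ^ 1 if rng.random() < 0.3 else state

def simulate_segment(initial, seg_index, seg_len):
    rng = random.Random(seg_index)       # fixed seed: repeatable segment
    state = initial
    for _ in range(seg_len):
        state = step(state, rng)
    return state

def time_parallel(num_segments=8, seg_len=1000, initial=0):
    guesses = [initial] * num_segments   # initial-state guesses
    while True:
        finals = [simulate_segment(guesses[i], i, seg_len)
                  for i in range(num_segments)]       # parallelizable
        fixed = [initial] + finals[:-1]  # true initial state of each segment
        if fixed == guesses:
            return finals[-1]
        guesses = fixed                  # fix-up pass

print(time_parallel())
```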
-
Date Issued
-
2007
-
Identifier
-
CFE0001736, ucf:47302
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0001736
-
-
Title
-
SOURCE REPRESENTATION AND FRAMING IN CHILDHOOD IMMUNIZATION COMMUNICATION.
-
Creator
-
Raneri, April, Matusitz, Jonathan, University of Central Florida
-
Abstract / Description
-
Research has indicated a strong interest in knowing who is being represented and how information is being represented in communication about childhood immunization. This study uses a two-part analysis to look at source representation and framing in childhood immunization communication. In a quantitative analysis, articles from the New York Times and USA Today were examined for their source representation, for their use of fear appeals through the Extended Parallel Process Model (EPPM), and for their use of frames through the application of Prospect Theory. A qualitative semiotic analysis was conducted on 36 images that appeared on www.yahoo.com and www.google.com to find common themes in who is being represented and how information is being portrayed through the images. Results found a high prevalence of representation from the Centers for Disease Control and Prevention, other governmental agencies, and views from health/medical professionals in both the articles and the images.
-
Date Issued
-
2010
-
Identifier
-
CFE0003016, ucf:48343
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0003016
-
-
Title
-
Techniques for automated parameter estimation in computational models of probabilistic systems.
-
Creator
-
Hussain, Faraz, Jha, Sumit, Leavens, Gary, Turgut, Damla, Uddin, Nizam, University of Central Florida
-
Abstract / Description
-
The main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complex stochastic systems, such that the resultant model meets user-specified behavioral specifications. These algorithms are designed to operate on probabilistic systems: systems that, in general, behave differently under identical conditions. The algorithms work using an approach that combines formal verification and mathematical optimization to explore a model's parameter space. The problem of determining whether a model instantiated with a given set of parameter values satisfies the desired specification is first defined using formal verification terminology, and then reformulated in terms of statistical hypothesis testing. Parameter space exploration involves determining the outcome of the hypothesis testing query for each parameter point and is guided using simulated annealing. The first algorithm uses the sequential probability ratio test (SPRT) to solve the hypothesis testing problems, whereas the second algorithm uses an approach based on Bayesian statistical model checking (BSMC). The SPRT-based parameter synthesis algorithm was used to validate that a given model of glucose-insulin metabolism has the capability of representing diabetic behavior, by synthesizing values of three parameters that ensure that the glucose-insulin subsystem spends at least 20 minutes in a diabetic scenario. The BSMC-based algorithm was used to discover the values of parameters in a physiological model of the acute inflammatory response that guarantee a set of desired clinical outcomes. These two applications demonstrate how our algorithms use formal verification, statistical hypothesis testing and mathematical optimization to automatically synthesize parameters of complex probabilistic models in order to meet user-specified behavioral properties.
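The SPRT core of the first algorithm is standard and small enough to sketch. Thresholds follow Wald; the indifference region delta and error bounds alpha and beta are user choices, and the integration with the simulated annealing loop is elided:

```python
# Wald's SPRT for "does the model satisfy the property with probability
# at least theta?", driven by Bernoulli simulation outcomes.
import math, random

def sprt(sample, theta, delta=0.05, alpha=0.01, beta=0.01):
    p0, p1 = theta + delta, theta - delta      # H0: p >= p0, H1: p <= p1
    log_a = math.log((1 - beta) / alpha)       # accept-H1 threshold
    log_b = math.log(beta / (1 - alpha))       # accept-H0 threshold
    llr, n = 0.0, 0                            # log likelihood ratio L1/L0
    while log_b < llr < log_a:
        x = sample()                           # one Bernoulli simulation
        n += 1
        llr += math.log((p1 if x else 1 - p1) / (p0 if x else 1 - p0))
    return ("H1 (property fails)" if llr >= log_a
            else "H0 (property holds)"), n

random.seed(0)
verdict, n = sprt(lambda: random.random() < 0.9, theta=0.8)
print(verdict, "after", n, "samples")  # true p=0.9 > 0.8: expect H0
```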
-
Date Issued
-
2016
-
Identifier
-
CFE0006117, ucf:51200
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0006117
-
-
Title
-
AN INTERACTIVE DISTRIBUTED SIMULATION FRAMEWORK WITH APPLICATION TO WIRELESS NETWORKS AND INTRUSION DETECTION.
-
Creator
-
Kachirski, Oleg, Guha, Ratan, University of Central Florida
-
Abstract / Description
-
In this dissertation, we describe the portable, open-source distributed simulation framework (WINDS) that we have developed, targeting simulations of wireless network infrastructures. We present the simulation framework, which uses a modular architecture, and apply it to studies of mobility pattern effects, routing and intrusion detection mechanisms in simulations of large-scale wireless ad hoc, infrastructure, and totally mobile networks. The distributed simulations within the framework execute seamlessly and transparently to the user on a symmetric multiprocessor cluster computer or a network of computers, with no modifications to the code or user objects. A visual graphical interface precisely depicts simulation object states and interactions throughout the simulation execution, giving the user full control over the simulation in real time. The network configuration is detected by the framework, and communication latency is taken into consideration when dynamically adjusting the simulation clock, allowing the simulation to run on a heterogeneous computing system. The simulation framework is easily extensible to multi-cluster systems and computing grids. An entire simulation system can be constructed in a short time, utilizing user-created and supplied simulation components, including mobile nodes, base stations, routing algorithms, traffic patterns and other objects. These objects are automatically compiled and loaded by the simulation system, and are available for dynamic injection into a simulation at runtime. Using our distributed simulation framework, we have studied modern intrusion detection systems (IDS) and assessed the applicability of existing intrusion detection techniques to wireless networks. We have developed a mobile agent-based IDS targeting mobile wireless networks, and introduced load-balancing optimizations aimed at limited-resource systems to improve intrusion detection performance. Packet-based monitoring agents of our IDS employ a case-based reasoner engine that performs fast lookups of network packets in the existing SNORT-based intrusion rule-set. Experiments were performed using the intrusion data from MIT Lincoln Laboratory studies, and executed on a cluster computer utilizing our distributed simulation system.
-
Date Issued
-
2005
-
Identifier
-
CFE0000642, ucf:46545
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0000642
-
-
Title
-
Simulation, Analysis, and Optimization of Heterogeneous CPU-GPU Systems.
-
Creator
-
Giles, Christopher, Heinrich, Mark, Ewetz, Rickard, Lin, Mingjie, Pattanaik, Sumanta, Flitsiyan, Elena, University of Central Florida
-
Abstract / Description
-
With the computing industry's recent adoption of the Heterogeneous System Architecture (HSA) standard, we have seen a rapid change in heterogeneous CPU-GPU processor designs. State-of-the-art heterogeneous CPU-GPU processors tightly integrate multicore CPUs and multi-compute-unit GPUs together on a single die. This brings the MIMD processing capabilities of the CPU and the SIMD processing capabilities of the GPU together in a single cohesive package, with new HSA features comprising better programmability, coherency between the CPU and GPU, a shared Last Level Cache (LLC), and shared virtual memory address spaces. These advancements can potentially bring marked gains in heterogeneous processor performance and have piqued the interest of researchers who wish to unlock these potential performance gains. Therefore, in this dissertation I explore the heterogeneous CPU-GPU processor and application design space with the goal of answering interesting research questions, such as: (1) what are the architectural design trade-offs in heterogeneous CPU-GPU processors, and (2) how do we best maximize heterogeneous CPU-GPU application performance on a given system? To enable my exploration of the heterogeneous CPU-GPU design space, I introduce a novel discrete event-driven simulation library called KnightSim and a novel computer architectural simulator called M2S-CGM. M2S-CGM includes all of the simulation elements necessary to simulate coherent execution between a CPU and GPU with a shared LLC and shared virtual memory address spaces. I then utilize M2S-CGM to conduct three architectural studies. First, I study the architectural effects of the shared LLC and CPU-GPU coherence on the overall performance of non-collaborative GPU-only applications. Second, I profile and analyze a set of collaborative CPU-GPU applications to determine how best to optimize them for maximum collaborative performance. Third, I study the impact of varying four key architectural parameters on collaborative CPU-GPU performance: GPU compute unit coalesce size, GPU-to-memory-controller bandwidth, GPU frequency, and system-wide switching fabric latency.
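The core of any discrete event-driven simulation library of the kind described is a timestamped event queue; the sketch below shows that skeleton (KnightSim's actual API may differ):

```python
# Skeleton of a discrete event-driven simulator: a priority queue of
# timestamped events, popped in time order, each possibly scheduling
# future events.
import heapq

class Simulator:
    def __init__(self):
        self.now, self._queue, self._seq = 0, [], 0
    def schedule(self, delay, handler, *args):
        self._seq += 1                     # tie-breaker keeps pops stable
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, args))
    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(self, *args)

def memory_request(sim, cpu_id):
    print(f"t={sim.now}: cpu{cpu_id} issues request")
    sim.schedule(100, memory_reply, cpu_id)   # model a 100-cycle latency

def memory_reply(sim, cpu_id):
    print(f"t={sim.now}: cpu{cpu_id} gets reply")

sim = Simulator()
sim.schedule(0, memory_request, 0)
sim.schedule(10, memory_request, 1)
sim.run(until=1000)
```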
-
Date Issued
-
2019
-
Identifier
-
CFE0007807, ucf:52346
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007807
-
-
Title
-
The Performance and Power Impact of Using Multiple DRAM Address Mapping Schemes in Multicore Processors.
-
Creator
-
Jadaa, Rami, Heinrich, Mark, DeMara, Ronald, Yuan, Jiann-Shiun, University of Central Florida
-
Abstract / Description
-
Lowest-level cache misses are satisfied by the main memory through a specific address mapping scheme that is hard-coded in the memory controller. A dynamic address mapping technique is investigated to provide higher performance and lower power consumption, along with a method to throttle memory to meet a specific power budget. Several experiments are conducted on single and multithreaded synthetic memory traces (to study extreme cases) to validate the usability of the proposed dynamic mapping scheme over the fixed one. Results show that applications' performance varies according to the mapping scheme used, and a dynamic mapping scheme achieves up to a 2x increase in peak bandwidth utilization and around 30% higher energy efficiency than a system using only a single fixed scheme. Moreover, the technique can be used to limit memory accesses to a subset of the memory devices by controlling data allocation at a finer granularity, providing a method to throttle main memory by allowing un-accessed devices to be put into power-down mode, hence saving power to meet a certain power budget.
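An address mapping scheme is essentially a fixed assignment of physical address bits to channel, bank, row and column fields. The two toy schemes below use unrealistically small field widths, chosen only to show how the same address lands differently under different mappings:

```python
# Two fixed DRAM address mappings as bit-field extractions. Field widths
# (64 B columns, 2 channels, 4 banks) are illustrative, not realistic.
def map_row_interleaved(addr):
    # low bits -> column, then channel, then bank: consecutive cache
    # lines stay within a row while spreading across channels
    return {"column": addr & 0x3F,
            "channel": (addr >> 6) & 0x1,
            "bank": (addr >> 7) & 0x3,
            "row": addr >> 9}

def map_bank_interleaved(addr):
    # bank bits taken lower: consecutive lines alternate banks instead
    return {"column": addr & 0x3F,
            "bank": (addr >> 6) & 0x3,
            "channel": (addr >> 8) & 0x1,
            "row": addr >> 9}

for addr in (0x0040, 0x0080):
    print(hex(addr), map_row_interleaved(addr), map_bank_interleaved(addr))
# A dynamic scheme, as studied above, would pick the mapping per
# allocation instead of fixing one at design time.
```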
-
Date Issued
-
2011
-
Identifier
-
CFE0004121, ucf:49118
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0004121
-
-
Title
-
Optimal distribution network reconfiguration using meta-heuristic algorithms.
-
Creator
-
Asrari, Arash, Wu, Thomas, Lotfifard, Saeed, Haralambous, Michael, Atia, George, Pazour, Jennifer, University of Central Florida
-
Abstract / Description
-
Finding the optimal configuration of a power distribution system's topology is an NP-hard combinatorial optimization problem. It becomes more complex when the time-varying nature of loads in large-scale distribution systems is taken into account. In the second chapter of this dissertation, a systematic approach is proposed to tackle the computational burden of the procedure. To solve the optimization problem, a novel adaptive fuzzy-based parallel genetic algorithm (GA) is proposed that employs parallel computing to identify the optimal configuration of the network. The integration of fuzzy logic into the GA enhances the efficiency of the parallel GA by adaptively modifying the migration rates between different processors during the optimization process. A computationally efficient graph encoding method based on the Dandelion coding strategy is developed that automatically generates radial topologies and prevents the construction of infeasible radial networks during the optimization process. The main shortcoming of the algorithm proposed in Chapter 2 is that it identifies only a single solution, leaving the system operator no option but to rely on it. For this reason, a novel hybrid optimization algorithm is proposed in the third chapter that determines Pareto frontiers, as candidate solutions, for the multi-objective distribution network reconfiguration problem. With this model, the system operator has more flexibility in choosing the best configuration from among the alternative solutions. The proposed hybrid algorithm combines the concept of fuzzy Pareto dominance (FPD) with the shuffled frog leaping algorithm (SFLA) to recognize non-dominated suboptimal solutions identified by SFLA. The local search step of SFLA is also customized for power systems applications so that it automatically creates and analyzes only feasible, radial configurations during its optimization procedure, which significantly increases the convergence speed of the algorithm. In the fourth chapter, the optimal network reconfiguration problem is solved for the case in which the system operator employs an optimization algorithm that automatically modifies its parameters during the optimization process. Three fuzzy functions adaptively tune the crossover and mutation probabilities as the algorithm proceeds, avoiding premature convergence without reducing the speed of convergence toward the optimal configuration. This modified genetic algorithm is a step toward making the parallel GA presented in the second chapter more robust against getting stuck in local optima. The fifth chapter concentrates on a potential smart grid approach to finding more high-quality suboptimal configurations of distribution networks. This chapter improves on the third chapter in two ways: (1) fuzzy logic is used in the partitioning step of SFLA to improve the proposed optimization algorithm and yield a more accurate classification of frogs, and (2) the reconfiguration problem is solved in the presence of distributed generation (DG) units in the network.
To study the new paradigm of integrating smart grids into power systems, the chapter analyzes how the quality of suboptimal solutions is affected as DG units are continuously added to the distribution network. The heuristic optimization algorithm proposed in Chapter 3 and improved in Chapter 5 is applied to a smaller case study in Chapter 6 to demonstrate that the solution identified by the optimization process matches the optimal solution found by an exhaustive search.
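As a point of reference for the Pareto-frontier machinery, the sketch below implements classic (crisp) Pareto filtering over candidate configurations with two minimization objectives. The dissertation's fuzzy Pareto dominance generalizes this crisp comparison with membership degrees, which is not reproduced here; the objective names, configuration labels, and values are hypothetical.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives minimized):
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions: list) -> list:
    """Return the non-dominated subset of `solutions`.

    Each solution is (objective_vector, configuration); the objectives
    here could be, e.g., (power_loss_kw, voltage_deviation) for a
    candidate radial network configuration.
    """
    return [(obj, cfg) for obj, cfg in solutions
            if not any(dominates(other, obj) for other, _ in solutions)]

# Hypothetical candidates produced by the search:
candidates = [
    ((1.20, 0.04), "cfg-A"),
    ((1.05, 0.06), "cfg-B"),
    ((1.30, 0.05), "cfg-C"),   # dominated by cfg-A in both objectives
]
print(pareto_front(candidates))   # cfg-A and cfg-B survive as the frontier
```

Presenting the surviving frontier, rather than one point from it, is what gives the operator the flexibility the abstract describes.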
-
Date Issued
-
2015
-
Identifier
-
CFE0005575, ucf:50238
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0005575
-
-
Title
-
Improvement of Data-Intensive Applications Running on Cloud Computing Clusters.
-
Creator
-
Ibrahim, Ibrahim, Bassiouni, Mostafa, Lin, Mingjie, Zhou, Qun, Ewetz, Rickard, Garibay, Ivan, University of Central Florida
-
Abstract / Description
-
MapReduce, designed by Google, is widely used as the most popular distributed programming model in cloud environments. Hadoop, an open-source implementation of MapReduce, is a data management framework for large clusters of commodity machines that handles data-intensive applications. Many well-known enterprises, including Facebook, Twitter, and Adobe, have been using Hadoop for their data-intensive processing needs. Task stragglers in MapReduce jobs dramatically impede job execution on massive datasets in cloud computing systems. This impedance is due to the uneven distribution of input data and computation load among cluster nodes, heterogeneous data nodes, data skew in the reduce phase, resource contention, and network configurations. All of these factors can cause delays and failures and violate job completion time requirements. One of the key issues that can significantly affect the performance of cloud computing is computational load balancing among cluster nodes. Replica placement in the Hadoop distributed file system plays a significant role in data availability and the balanced utilization of clusters. Under the current replica placement policy (RPP) of the Hadoop distributed file system (HDFS), the replicas of data blocks cannot be evenly distributed across cluster nodes, so HDFS must rely on a load balancing utility to balance the distribution of replicas, which incurs extra time and resource overhead. This dissertation addresses the data load balancing problem and presents an innovative replica placement policy for HDFS that can perfectly balance the data load among cluster nodes. Because the heterogeneity of cluster nodes exacerbates the issue of computational load balancing, another replica placement algorithm is also proposed for heterogeneous cluster environments. The timing of identifying a straggler map task is very important for straggler mitigation in data-intensive cloud computing. To mitigate straggler map tasks, the Present progress and Feedback based Speculative Execution (PFSE) algorithm is proposed: a new straggler identification scheme that identifies straggler map tasks based on feedback from completed tasks as well as the progress of the currently running task. Straggler reduce tasks aggravate violations of MapReduce job completion time and are typically the result of bad data partitioning during the reduce phase; the hash partitioner employed by Hadoop may cause intermediate data skew, which results in straggler reduce tasks. A new partitioning scheme, named Balanced Data Clusters Partitioner (BDCP), is therefore proposed to mitigate straggler reduce tasks. BDCP is based on sampling of the input data and feedback information about the currently processing task; it assists in straggler mitigation during the reduce phase and minimizes job completion time in MapReduce jobs. The results of extensive experiments corroborate that the algorithms and policies proposed in this dissertation can improve the performance of data-intensive applications running on cloud platforms.
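The general idea behind a sampling-informed partitioner such as BDCP can be shown in miniature: estimate key frequencies from a sample of the map output, then assign keys to reducers so that estimated load stays balanced instead of hashing blindly. The greedy heaviest-key-first heuristic, function name, and sample data below are illustrative assumptions, not BDCP's actual algorithm.

```python
from collections import Counter
import heapq

def build_partition_table(sampled_keys, num_reducers: int) -> dict:
    """Assign keys to reducers so that estimated load stays balanced.

    Greedy heuristic: visit keys by sampled frequency (heaviest first)
    and always hand the next key to the currently lightest reducer.
    A frequency-blind hash partitioner can send a few hot keys to one
    reducer, creating exactly the straggler reduce task described above.
    """
    freq = Counter(sampled_keys)
    loads = [(0, r) for r in range(num_reducers)]   # (estimated_load, reducer_id)
    heapq.heapify(loads)
    table = {}
    for key, count in freq.most_common():
        load, r = heapq.heappop(loads)
        table[key] = r
        heapq.heappush(loads, (load + count, r))
    return table

# Skewed sample: "hot" dominates, so it gets a reducer largely to itself
# while the light keys are packed together elsewhere.
sample = ["hot"] * 80 + ["warm"] * 12 + ["a", "b", "c", "d"] * 2
print(build_partition_table(sample, num_reducers=3))
```

At run time, any key absent from the table would fall back to ordinary hashing; the sampled table only needs to cover the heavy hitters to remove the skew.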
-
Date Issued
-
2019
-
Identifier
-
CFE0007818, ucf:52804
-
Format
-
Document (PDF)
-
PURL
-
http://purl.flvc.org/ucf/fd/CFE0007818