Current Search: Ni, Liqiang
- Title
- TO HYDRATE OR CHLORINATE: A REGRESSION ANALYSIS OF THE LEVELS OF CHLORINE IN THE PUBLIC WATER SUPPLY.
- Creator
-
Doyle, Drew, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Public water supplies contain disease-causing microorganisms in the water or distribution ducts. In order to kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in U.S. water treatment facilities, and it is known to be one of the most powerful disinfectants for keeping harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this thesis will analyze a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples will be collected and have their chlorine level, temperature, and pH recorded. A linear regression analysis will be performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level will be the independent variables collected from each water sample. All data collected will be analyzed through various Statistical Analysis System (SAS) procedures. Partial residual plots will be used to determine possible relationships between the chlorine level and the independent variables, and stepwise selection will be used to eliminate possibly insignificant predictors. From there, several possible models for the data will be selected. F tests will be conducted to determine which of the models appears to be the most useful. All tests will include hypotheses, test statistics, p-values, and conclusions. There will also be an analysis of the residual plot, jackknife residuals, leverage values, Cook's D, the PRESS statistic, and the normal probability plot of the residuals. Possible outliers will be investigated, and the critical values for flagged observations will be stated along with what problems the flagged values indicate. (An illustrative regression sketch follows this record.)
- Date Issued
- 2015
- Identifier
- CFH0004907, ucf:45497
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFH0004907
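The regression workflow described in the abstract above (model fitting, selection, F tests, and influence diagnostics such as leverage and Cook's D) can be illustrated with standard tooling. The sketch below is a minimal, hedged example using statsmodels on made-up stand-in data; the variable names and the 4/n Cook's D cutoff are illustrative assumptions, not the thesis's actual data or SAS procedures.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30  # the thesis collected thirty water samples

# Hypothetical stand-in data; the real study also recorded storage time,
# time of day, location, and dissolved oxygen for each sample.
df = pd.DataFrame({
    "temperature": rng.normal(25, 3, n),
    "ph": rng.normal(7.2, 0.3, n),
    "dissolved_oxygen": rng.normal(8, 1, n),
})
df["chlorine"] = 1.0 - 0.02 * df["temperature"] + 0.1 * df["ph"] + rng.normal(0, 0.05, n)

X = sm.add_constant(df[["temperature", "ph", "dissolved_oxygen"]])
model = sm.OLS(df["chlorine"], X).fit()
print(model.summary())  # coefficients, overall F test, p-values

# Influence diagnostics analogous to those listed in the abstract
infl = model.get_influence()
cooks_d, _ = infl.cooks_distance
flagged = np.where(cooks_d > 4 / n)[0]  # a common rule-of-thumb cutoff
print("Observations flagged by Cook's D:", flagged)
```

Stepwise selection is not built into statsmodels, so in practice predictors would be added or dropped based on partial F tests or information criteria, mirroring the SAS procedures the abstract cites.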
- Title
- AN ANALYSIS OF THE RELATIONSHIP BETWEEN ECONOMIC DEVELOPMENT AND DEMOGRAPHIC CHARACTERISTICS IN THE UNITED STATES.
- Creator
-
Heyne, Chad, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Over the past several decades, extensive research has attempted to determine which demographic characteristics affect economic growth, measured in GDP per capita. Understanding what influences the growth of a country will greatly help policy makers enact policies to lead the country in a positive direction. This research focuses on isolating a new variable: women in the work force. In addition to isolating this new variable, the research modifies a preexisting variable that was shown to be significant, in order to make it more robust and sensitive to recessions. The intent of this thesis is to explore the relationship between several demographic characteristics and their effect on the growth rate of GDP per capita. The first step is to reproduce the work done by Barlow (1994) to ensure that the United States follows similar rules as the countries in his research. Afterwards, we introduce new variables into the model, comparing goodness of fit through R-squared, AIC, and BIC. Several models are developed to answer each of the research questions independently. (An illustrative model-comparison sketch follows this record.)
- Date Issued
- 2011
- Identifier
- CFH0003837, ucf:44712
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFH0003837
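The abstract above compares candidate growth models by R-squared, AIC, and BIC. A hedged sketch of that comparison with statsmodels follows; the data and the column names (e.g. female_lfp) are hypothetical stand-ins, not the data or specification used in the thesis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 50

# Hypothetical stand-in data for GDP-per-capita growth and two predictors
df = pd.DataFrame({
    "gdp_growth": rng.normal(2.0, 1.0, n),
    "investment": rng.normal(20, 3, n),
    "female_lfp": rng.normal(55, 5, n),  # women in the work force, the new variable
})

base = smf.ols("gdp_growth ~ investment", data=df).fit()
extended = smf.ols("gdp_growth ~ investment + female_lfp", data=df).fit()

# Goodness-of-fit comparison in the spirit of the abstract:
# higher R-squared and lower AIC/BIC favor a model
for name, res in [("base", base), ("extended", extended)]:
    print(f"{name}: R2={res.rsquared:.3f}  AIC={res.aic:.1f}  BIC={res.bic:.1f}")
```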
- Title
- The Effect of Traumatic Brain Injury on Exposure Therapy in Veterans with Combat-related Posttraumatic Stress Disorder.
- Creator
-
Ragsdale, Kathleen, Beidel, Deborah, Neer, Sandra, Bowers, Clint, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Veterans of Operation Enduring Freedom, Operation Iraqi Freedom, and Operation New Dawn are presenting for treatment with high rates of combat-related posttraumatic stress disorder (PTSD) and traumatic brain injury (TBI), spurring a need for clinical research on optimal treatment strategies. While exposure therapy has long been supported as an efficacious treatment for combat-related PTSD, some clinicians are hesitant to utilize this treatment for veterans with TBI history due to presumed cognitive deficits that may preclude successful engagement. The purpose of this study was to compare exposure therapy process variables in veterans with PTSD only and veterans with PTSD+TBI. Results suggest that individuals with PTSD+TBI engage successfully in exposure therapy, and do so no differently than individuals with PTSD only. Additional analyses indicated that regardless of TBI status, more severe PTSD was related to longer sessions, more sessions, and slower extinction rate during imaginal exposure. Finally, in a subset of participants, self-report of executive dysfunction did not impact exposure therapy process variables. Overall, findings indicate that exposure therapy should be the first-line treatment for combat-related PTSD regardless of presence of TBI history.
- Date Issued
- 2015
- Identifier
- CFE0005868, ucf:50894
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005868
- Title
- Role of Sleep in Exposure Therapy for Posttraumatic Stress Disorder in OIF/OEF Combat Veterans.
- Creator
-
Mesa, Franklin, Beidel, Deborah, Neer, Sandra, Bowers, Clint, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Exposure therapy is theorized to reduce posttraumatic stress disorder (PTSD) symptomatology by promoting habituation/extinction of fear responses to trauma-related cues. Empirical evidence indicates that emotional memory, including habituation/extinction learning, is enhanced by sleep. However, service members with combat-related PTSD often report disturbed sleep. In this study, quality of sleep and indicators of extinction learning were examined in veterans of recent wars who had completed an exposure-based PTSD intervention. Fifty-five participants were categorized into two groups based on self-reported quality of sleep: low sleep disruption severity (LSDS; N = 29) and high sleep disruption severity (HSDS; N = 26). Participants in the LSDS group exhibited faster habituation to their traumatic memories and reported less PTSD symptomatology during and following treatment relative to participants in the HSDS group. These findings indicate that individuals with combat-related PTSD reporting less disturbed sleep experience greater extinction learning during exposure therapy. Thus, incorporating interventions that target PTSD-related sleep disturbances may be one way to maximize exposure therapy outcomes in service members with PTSD.
- Date Issued
- 2016
- Identifier
- CFE0006355, ucf:51520
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006355
- Title
- Nonparametric and Empirical Bayes Estimation Methods.
- Creator
-
Benhaddou, Rida, Pensky, Marianna, Han, Deguang, Swanson, Jason, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
In the present dissertation, we investigate two different nonparametric models: the empirical Bayes model and the functional deconvolution model. In the case of nonparametric empirical Bayes estimation, we carry out a complete minimax study. In particular, we derive minimax lower bounds for the risk of the nonparametric empirical Bayes estimator for a general conditional distribution. This result has never been obtained previously. In order to attain optimal convergence rates, we use a wavelet series based empirical Bayes estimator constructed in Pensky and Alotaibi (2005). We propose an adaptive version of this estimator using Lepski's method and show that the estimator attains optimal convergence rates. The theory is supplemented by numerous examples. Our study of the functional deconvolution model expands results of Pensky and Sapatinas (2009, 2010, 2011) to the case of estimating an $(r+1)$-dimensional function or dependent errors. In both cases, we derive minimax lower bounds for the integrated square risk over a wide set of Besov balls and construct adaptive wavelet estimators that attain those optimal convergence rates. In particular, in the case of estimating a periodic $(r+1)$-dimensional function, we show that by choosing Besov balls of mixed smoothness, we can avoid the "curse of dimensionality" and, hence, obtain higher than usual convergence rates when $r$ is large. The study of deconvolution of a multivariate function is motivated by seismic inversion, which can be reduced to the solution of noisy two-dimensional convolution equations that allow one to draw inference on underground layer structures along the chosen profiles. The common practice in seismology is to recover layer structures separately for each profile and then to combine the derived estimates into a two-dimensional function. By studying the two-dimensional version of the model, we demonstrate that this strategy usually leads to estimators which are less accurate than the ones obtained as two-dimensional functional deconvolutions. Finally, we consider a multichannel deconvolution model with long-range dependent Gaussian errors. We do not limit our consideration to a specific type of long-range dependence; rather, we assume that the eigenvalues of the covariance matrix of the errors are bounded above and below. We show that convergence rates of the estimators depend on a balance between the smoothness parameters of the response function, the smoothness of the blurring function, the long memory parameters of the errors, and how the total number of observations is distributed among the channels. (An illustrative deconvolution sketch follows this record.)
- Date Issued
- 2013
- Identifier
- CFE0004814, ucf:49737
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004814
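The functional deconvolution estimators in the dissertation are wavelet-based and minimax-optimal, which is far more than a few lines can show. As a hedged illustration of the underlying problem only, the sketch below recovers a one-dimensional signal from noisy convolution data by regularized Fourier inversion; the signal, kernel, and regularization constant are invented stand-ins, and Fourier division is a deliberate simplification of the wavelet constructions analyzed in the work.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256
t = np.linspace(0, 1, n, endpoint=False)

f = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)  # unknown function
g = np.exp(-((t - 0.5) ** 2) / 0.002)                             # blurring kernel
g /= g.sum()

# Observed data: circular convolution of f with g, plus Gaussian noise
y = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))) + rng.normal(0, 0.05, n)

# Regularized Fourier inversion (a crude stand-in for wavelet thresholding)
G = np.fft.fft(g)
lam = 1e-3
f_hat = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(G) / (np.abs(G) ** 2 + lam)))

print("integrated squared error:", np.mean((f_hat - f) ** 2))
```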
- Title
- Scene Understanding for Real Time Processing of Queries over Big Data Streaming Video.
- Creator
-
Aved, Alexander, Hua, Kien, Foroosh, Hassan, Zou, Changchun, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
With heightened security concerns across the globe and the increasing need to monitor, preserve and protect infrastructure and public spaces to ensure proper operation, quality assurance and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective, fatigued, even exhibit bias, and it is difficult to maintain high levels of vigilance when capturing, searching and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software much like traditional database applications are developed today. Such video stream processing applications and ad hoc queries are able to "reuse" advanced image processing techniques that have been developed. This results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the "trust" issue will enable many new surveillance-related applications. Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.
- Date Issued
- 2013
- Identifier
- CFE0004648, ucf:49900
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004648
- Title
- Developing new power management and High-Reliability Schemes in Data-Intensive Environment.
- Creator
-
Wang, Ruijun, Wang, Jun, Jin, Yier, DeMara, Ronald, Zhang, Shaojie, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
With the increasing popularity of data-intensive applications as well as large-scale computing and storage systems, current data centers and supercomputers are often dealing with extremely large data sets. To store and process this huge amount of data reliably and energy-efficiently, three major challenges should be taken into consideration by system designers. Firstly, power conservation: multicore processors, or CMPs, have become mainstream in the current processor market because of the tremendous improvement in transistor density and the advancement in semiconductor technology. However, the increasing number of transistors on a single die or chip reveals a super-linear growth in power consumption [4]. Thus, how to balance system performance and power saving is a critical issue which needs to be solved effectively. Secondly, system reliability: reliability is a critical metric in the design and development of replication-based big data storage systems such as the Hadoop File System (HDFS). In a system with thousands of machines and storage devices, even infrequent failures become likely. In the Google File System, the annual disk failure rate is 2.88%, which means one would expect to see 8,760 disk failures in a year. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when being scaled out is not well investigated. Thirdly, energy efficiency: the fast processing speeds of the current generation of supercomputers provide a great convenience to scientists dealing with extremely large data sets. The next generation of "exascale" supercomputers could provide accurate simulation results for the automobile industry, aerospace industry, and even nuclear fusion reactors for the very first time. However, the energy cost of supercomputing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers has become critical in recent years.

This dissertation proposes new solutions to address the above three key challenges for current large-scale storage and computing systems. Firstly, we propose a novel power management scheme called MAR (model-free, adaptive, rule-based) for multiprocessor systems to minimize CPU power consumption subject to performance constraints. By introducing a new I/O wait status, MAR is able to accurately describe the relationship between core frequencies, performance, and power consumption. Moreover, we adopt a model-free control method to filter out the I/O wait status from the traditional CPU busy/idle model in order to achieve fast responsiveness to burst situations and take full advantage of power saving. Our extensive experiments on a physical testbed demonstrate that, for SPEC benchmarks and data-intensive (TPC-C) benchmarks, an MAR prototype system achieves 95.8-97.8% accuracy of the ideal power-saving strategy calculated offline. Compared with baseline solutions, MAR is able to save 12.3-16.1% more power while maintaining a comparable performance loss of about 0.78-1.08%. In addition, further simulation results indicate that our design achieves 3.35-14.2% more power-saving efficiency and 4.2-10.7% less performance loss under various CMP configurations compared with baseline approaches such as LAST, Relax, PID, and MPC.

Secondly, we create a new reliability model by incorporating the probability of replica loss to investigate the system reliability of multi-way declustering data layouts and analyze their potential parallel recovery possibilities. Our comprehensive simulation results on Matlab and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, in terms of data loss probability and system reliability, by up to 63% and 85% respectively. Our study of both 5-year and 10-year system reliability under various recovery bandwidth settings shows that the shifted declustering layout surpasses the two baseline approaches in both cases, consuming up to 79% and 87% less recovery bandwidth than the copyset layout, as well as 4.8% and 10.2% less recovery bandwidth than the random layout.

Thirdly, we develop a power-aware job scheduler by applying a rule-based control method and taking into account real-world power and speedup profiles to improve power efficiency while adhering to predetermined power constraints. The simulation results show that our proposed method achieves the maximum utilization of computing resources compared to baseline scheduling algorithms while keeping the energy cost under the threshold. Moreover, by introducing a Power Performance Factor (PPF) based on the real-world power and speedup profiles, we are able to increase power efficiency by up to 75%. (An illustrative rule-based controller sketch follows this record.)
- Date Issued
- 2016
- Identifier
- CFE0006704, ucf:51907
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006704
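MAR, as described above, is a model-free, rule-based controller that separates I/O wait from genuine compute demand before choosing a core frequency. The sketch below is only a generic illustration of that flavor of rule-based DVFS governor; the thresholds, frequency table, and field names are hypothetical and do not reproduce the MAR design or its evaluation.

```python
from dataclasses import dataclass

@dataclass
class CoreStats:
    busy: float    # fraction of the last interval spent executing
    iowait: float  # fraction of the last interval blocked on I/O

FREQS_GHZ = [1.2, 1.8, 2.4, 3.0]  # hypothetical DVFS levels

def next_frequency(stats: CoreStats, current: float) -> float:
    """Pick a frequency for the next interval using simple rules.

    Time spent in I/O wait does not benefit from a faster clock, so it is
    excluded from the demand estimate before the rules are applied.
    """
    demand = stats.busy               # I/O wait deliberately filtered out
    idx = FREQS_GHZ.index(current)
    if demand > 0.85 and idx < len(FREQS_GHZ) - 1:
        idx += 1                      # compute-bound burst: step up
    elif demand < 0.40 and idx > 0:
        idx -= 1                      # mostly idle or I/O bound: step down
    return FREQS_GHZ[idx]

print(next_frequency(CoreStats(busy=0.90, iowait=0.05), 2.4))  # -> 3.0
print(next_frequency(CoreStats(busy=0.20, iowait=0.70), 2.4))  # -> 1.8
```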
- Title
- Quantitative Line Assignment in Optical Emission Spectroscopy.
- Creator
-
Chappell, Jessica, Baudelet, Matthieu, Hernandez, Florencio, Campiglia, Andres, Ni, Liqiang, Sigman, Michael, University of Central Florida
- Abstract / Description
-
Quantitative elemental analysis using Optical Emission Spectroscopy (OES) starts with a high level of confidence in spectral line assignment from reference databases. Spectral interferences caused by instrumental and line broadening decrease the resolution of OES spectra, creating uncertainty in the elemental profile of a sample. For the first time, an approach has been developed to quantify spectral interferences for individual line assignment in OES. The algorithm calculates a statistical interference factor (SIF) that combines a physical understanding of plasma emission with a Bayesian analysis of the OES spectrum. It can be used on a single optical spectrum and still address individual lines. Contrary to current methods, quantification of the uncertainty in elemental profiles from OES leads to more accurate results, higher reliability, and validation of the method. The SIF algorithm was evaluated for Laser-Induced Breakdown Spectroscopy (LIBS) on samples of increasing complexity: from silicon to nickel-spiked alumina to NIST standards (600 glass series and a nickel-chromium alloy). The influence of the user's knowledge of the sample composition was studied and showed that, for the majority of spectral lines, this information does not change the line assignment for simple compositions. Nonetheless, the amount of interference could change with this information, as expected. Variance of the SIF results for the NIST glass standard was evaluated by the chi-square hypothesis test of variance, showing that the results of the SIF algorithm are very reproducible.
- Date Issued
- 2018
- Identifier
- CFE0007564, ucf:52575
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007564
- Title
- Bayesian Model Selection for Classification with Possibly Large Number of Groups.
- Creator
-
Davis, Justin, Pensky, Marianna, Swanson, Jason, Richardson, Gary, Crampton, William, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
The purpose of the present dissertation is to study model selection techniques which are specifically designed for classification of high-dimensional data with a large number of classes. To the best of our knowledge, this problem has never been studied in depth previously. We assume that the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. In what follows, we introduce two Bayesian models which use two different approaches to the problem: one which discards components that have "almost constant" values (Model 1) and another which retains the components for which between-group variation is larger than within-group variation (Model 2). We show that particular cases of the above two models recover familiar variance- or ANOVA-based component selection. When one has only two classes and features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization to the case of L > 2 classes. A nontrivial result of the dissertation is that the precision of feature selection using Model 2 improves when the number of classes grows. Subsequently, we examine the rate of misclassification with and without feature selection on the basis of Model 2. (An illustrative FAIR-style sketch follows this record.)
- Date Issued
- 2011
- Identifier
- CFE0004097, ucf:49091
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004097
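In the two-class, independent-feature case the abstract notes that Model 2 reduces to the Feature Annealed Independence Rule (FAIR) of Fan and Fan (2008). The sketch below is a hedged, simplified two-class illustration of that idea on synthetic data (rank features by two-sample t statistics, keep the strongest, classify with a diagonal independence rule); it is not the Bayesian models developed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 40, 500, 20  # n samples, p >> n features, keep m features

# Synthetic data: only the first 10 features actually separate the classes
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 1.5

def fair_fit(X, y, m):
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s0, s1 = X[y == 0].var(0, ddof=1), X[y == 1].var(0, ddof=1)
    n0, n1 = (y == 0).sum(), (y == 1).sum()
    t = (mu1 - mu0) / np.sqrt(s0 / n0 + s1 / n1)   # two-sample t statistics
    keep = np.argsort(-np.abs(t))[:m]              # anneal away the weak features
    pooled_var = ((n0 - 1) * s0 + (n1 - 1) * s1) / (n0 + n1 - 2)
    return keep, mu0, mu1, pooled_var

def fair_predict(X, keep, mu0, mu1, var):
    # Independence (diagonal) rule: variance-weighted distance to each class centroid
    d0 = ((X[:, keep] - mu0[keep]) ** 2 / var[keep]).sum(1)
    d1 = ((X[:, keep] - mu1[keep]) ** 2 / var[keep]).sum(1)
    return (d1 < d0).astype(int)

keep, mu0, mu1, var = fair_fit(X, y, m)
print("training accuracy:", (fair_predict(X, keep, mu0, mu1, var) == y).mean())
```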
- Title
- Estimation for the Cox Model with Various Types of Censored Data.
- Creator
-
Riddlesworth, Tonya, Ren, Joan, Mohapatra, Ram, Richardson, Gary, Ni, Liqiang, Schott, James, University of Central Florida
- Abstract / Description
-
In survival analysis, the Cox model is one of the most widely used tools. However, up to now there has not been any published work on the Cox model with complicated types of censored data, such as doubly censored data, partly interval-censored data, etc., while these types of censored data have been encountered in important medical studies of cancer, heart disease, diabetes, and other conditions. In this dissertation, we first derive the bivariate nonparametric maximum likelihood estimator (BNPMLE) Fn(t,z) for the joint distribution function Fo(t,z) of survival time T and covariate Z, where T is subject to right censoring, noting that such a BNPMLE Fn has not been studied in the statistical literature. Then, based on this BNPMLE Fn, we derive an empirical likelihood-based (Owen, 1988) confidence interval for the conditional survival probabilities, which is an important and difficult problem in statistical analysis that also has not been studied in the literature. Finally, with this BNPMLE Fn as a starting point, we extend the weighted empirical likelihood method (Ren, 2001 and 2008a) to the multivariate case and obtain a weighted empirical likelihood-based estimation method for the Cox model. This estimation method is given in a unified form and is applicable to the various types of censored data mentioned above. (An illustrative right-censored Cox fit follows this record.)
- Date Issued
- 2011
- Identifier
- CFE0004158, ucf:49051
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004158
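The right-censored Cox model is the familiar baseline that the dissertation extends to more complicated censoring schemes. As a hedged illustration of that baseline only, the sketch below fits a Cox model to simulated right-censored data with the lifelines package; the columns and the data-generating model are invented, and the dissertation's weighted empirical likelihood estimators for doubly and partly interval-censored data are not implemented here.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 200

# Simulated survival data with one covariate z and independent right censoring
z = rng.normal(size=n)
true_time = rng.exponential(scale=np.exp(-0.5 * z))  # hazard increases with z
censor_time = rng.exponential(scale=1.0, size=n)
df = pd.DataFrame({
    "time": np.minimum(true_time, censor_time),
    "event": (true_time <= censor_time).astype(int),  # 1 = observed, 0 = censored
    "z": z,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()  # estimated log hazard ratio for z with confidence interval
```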
- Title
- Chemometric Applications to a Complex Classification Problem: Forensic Fire Debris Analysis.
- Creator
-
Waddell, Erin, Sigman, Michael, Belfield, Kevin, Campiglia, Andres, Yestrebsky, Cherie, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Fire debris analysis currently relies on visual pattern recognition of the total ion chromatograms, extracted ion profiles, and target compound chromatograms to identify the presence of an ignitable liquid according to the ASTM International E1618-10 standard method. For large data sets, this methodology can be time consuming and is a subjective method, the accuracy of which is dependent upon the skill and experience of the analyst. This research aimed to develop an automated classification method for large data sets and investigated the use of the total ion spectrum (TIS). The TIS is calculated by taking an average mass spectrum across the entire chromatographic range and has been shown to contain sufficient information content for the identification of ignitable liquids. The TIS of ignitable liquids and substrates, defined as common building materials and household furnishings, were compiled into model data sets. Cross-validation (CV) and fire debris samples, obtained from laboratory-scale and large-scale burns, were used to test the models. An automated classification method was developed using computational software, written in-house, that applies a multi-step classification scheme to detect ignitable liquid residues in fire debris samples and assign them to the classes defined in ASTM E1618-10. Classifications were made using linear discriminant analysis, quadratic discriminant analysis (QDA), and soft independent modeling of class analogy (SIMCA). Overall, the highest correct classification rates were achieved using QDA for the first step of the scheme and SIMCA for the remaining steps. In the first step of the classification scheme, correct classification rates of 95.3% and 89.2% were obtained for the CV test set and fire debris samples, respectively. Correct classification rates of 100% were achieved for both data sets in the majority of the remaining steps, which used SIMCA for classification. In this research, the first statistically valid error rates for fire debris analysis have been developed through cross-validation of large data sets. The error rates reduce the subjectivity associated with the current methods and provide a level of confidence in sample classification that does not currently exist in forensic fire debris analysis. (An illustrative TIS-and-QDA sketch follows this record.)
- Date Issued
- 2013
- Identifier
- CFE0004954, ucf:49586
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004954
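The total ion spectrum (TIS) named in the abstract is just the mass spectrum averaged over the whole chromatographic run, which is then handed to classifiers such as QDA or SIMCA. The sketch below is a hedged toy version with random stand-in arrays (the GC-MS dimensions and labels are invented); it shows only the TIS computation and a single QDA step with scikit-learn, not the multi-step ASTM E1618-10 scheme or the error-rate study from the dissertation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n_samples, n_scans, n_mz = 60, 300, 120  # hypothetical GC-MS dimensions

# Stand-in data cubes: (sample, retention-time scan, m/z channel)
gcms = rng.random((n_samples, n_scans, n_mz))
labels = rng.integers(0, 2, n_samples)   # e.g. 0 = substrate only, 1 = ignitable liquid

# Total ion spectrum: average the mass spectrum across the chromatographic range,
# then normalize each sample to unit total intensity
tis = gcms.mean(axis=1)
tis /= tis.sum(axis=1, keepdims=True)

# QDA needs far fewer dimensions than training samples, so reduce with PCA first
clf = make_pipeline(PCA(n_components=10), QuadraticDiscriminantAnalysis())
clf.fit(tis, labels)
print("training accuracy:", clf.score(tis, labels))
```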
- Title
- Content-based Information Retrieval via Nearest Neighbor Search.
- Creator
-
Huang, Yinjie, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Hu, Haiyan, Sukthankar, Gita, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Content-based information retrieval (CBIR) has attracted significant interest in the past few years. When given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute to CBIR research in two ways: first, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search; second, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods to infer the weights for the local metric, we consider two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximated method. We also study the convergence properties of the proposed block coordinate descent algorithms for both frameworks. Extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised, and unsupervised learning scenarios, is proposed in the dissertation. By considering several codewords which can be learned from the data, the proposed method naturally leads to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large data sets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK. (An illustrative metric-based retrieval sketch follows this record.)
- Date Issued
- 2016
- Identifier
- CFE0006327, ucf:51544
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006327
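Retrieval by nearest neighbor search under a Mahalanobis-type metric is the basic operation behind the distance metric learning half of the abstract. The sketch below only runs the search with a fixed metric (the regularized inverse covariance of the stored features) via scikit-learn; it is a hedged stand-in, not the R2LML or *SHL methods proposed in the dissertation, which learn the metric or hash functions from the data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)
n, d = 500, 16

database = rng.normal(size=(n, d))  # hypothetical feature vectors of stored items
query = rng.normal(size=(1, d))

# Fixed Mahalanobis metric: regularized inverse covariance of the database.
# A metric learning method would instead learn this matrix from similarity constraints.
VI = np.linalg.inv(np.cov(database, rowvar=False) + 1e-3 * np.eye(d))

nn = NearestNeighbors(n_neighbors=5, metric="mahalanobis", metric_params={"VI": VI})
nn.fit(database)
dist, idx = nn.kneighbors(query)
print("top-5 most similar items:", idx[0], "at distances", np.round(dist[0], 3))
```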
- Title
- Chemical Analysis, Databasing, and Statistical Analysis of Smokeless Powders for Forensic Application.
- Creator
-
Dennis, Dana-Marie, Sigman, Michael, Campiglia, Andres, Yestrebsky, Cherie, Fookes, Barry, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Smokeless powders are a set of energetic materials, known as low explosives, which are typically utilized for reloading ammunition. There are three types, which differ in their primary energetic materials: single base powders contain nitrocellulose as their primary energetic material, double and triple base powders contain nitroglycerin in addition to nitrocellulose, and triple base powders also contain nitroguanidine. Additional organic compounds, while not proprietary to specific manufacturers, are added to the powders in varied ratios during the manufacturing process to optimize the ballistic performance of the powders. The additional compounds function as stabilizers, plasticizers, flash suppressants, deterrents, and opacifiers. Of the three smokeless powder types, single and double base powders are commercially available and have been heavily utilized in the manufacture of improvised explosive devices.

Forensic smokeless powder samples are currently analyzed using multiple analytical techniques. Combined microscopic, macroscopic, and instrumental techniques are used to evaluate the sample, and the information obtained is used to generate a list of potential distributors. Gas chromatography-mass spectrometry (GC-MS) is arguably the most useful of the instrumental techniques since it distinguishes single and double base powders and provides additional information about the relative ratios of all the analytes present in the sample. However, forensic smokeless powder samples are still limited to being classified as either single or double base powders, based on the absence or presence of nitroglycerin, respectively. In this work, the goal was to develop statistically valid classes, beyond the single and double base designations, based on multiple organic compounds which are commonly encountered in commercial smokeless powders. Several chemometric techniques were applied to smokeless powder GC-MS data for determination of the classes and for assignment of test samples to these novel classes. The total ion spectrum (TIS), which is calculated from the GC-MS data for each sample, is obtained by summing the intensities for each mass-to-charge (m/z) ratio across the entire chromatographic profile. A TIS matrix comprising data for 726 smokeless powder samples was subjected to agglomerative hierarchical cluster (AHC) analysis, and six distinct classes were identified. Within each class, a single m/z ratio had the highest intensity for the majority of samples, though the m/z ratio was not always unique to the specific class. Based on these observations, a new classification method known as the Intense Ion Rule (IIR) was developed and used for the assignment of test samples to the AHC-designated classes.

Discriminant models were developed for assignment of test samples to the AHC-designated classes using k-Nearest Neighbors (kNN) and linear and quadratic discriminant analyses (LDA and QDA, respectively). Each of the models was optimized using leave-one-out (LOO) and leave-group-out (LGO) cross-validation, and the performance of the models was evaluated by calculating correct classification rates for assignment of the cross-validation (CV) samples to the AHC-designated classes. The optimized models were then utilized to assign test samples to the AHC-designated classes. Overall, the QDA LGO model achieved the highest correct classification rates for assignment of both the CV samples and the test samples.

In forensic application, the goal of an explosives analyst is to ascertain the manufacturer of a smokeless powder sample. In addition, knowledge about the probability of a forensic sample being produced by a specific manufacturer could potentially decrease the time invested by an analyst during an investigation by providing a shorter list of potential manufacturers. In this work, Bayes' Theorem and Bayesian networks were investigated as additional tools to be utilized in forensic casework. Bayesian networks were generated and used to calculate posterior probabilities of a test sample belonging to specific manufacturers. The networks were designed to include manufacturer-controlled powder characteristics such as shape, color, and dimension, as well as the relative intensities of the class-associated ions determined from cluster analysis. Samples were predicted to belong to a manufacturer based on the highest posterior probability. Overall percent correct rates were determined by calculating the percentage of correct predictions, that is, where the known and predicted manufacturer were the same. The initial overall percent correct rate was 66%. The dimensions of the smokeless powders were then added to the network as average diameter and average length nodes, which raised the overall prediction rate to 70%. (An illustrative clustering sketch follows this record.)
- Date Issued
- 2015
- Identifier
- CFE0005784, ucf:50059
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005784
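Two of the steps in the abstract, agglomerative hierarchical clustering of TIS data and the Intense Ion Rule, lend themselves to a short illustration. The sketch below uses an invented TIS matrix and a simplified reading of the IIR (assign a test spectrum to the class whose characteristic m/z channel is strongest); it is a hedged toy, not the 726-sample study or the Bayesian network analysis from the dissertation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
n_samples, n_mz = 30, 100

# Hypothetical TIS matrix: one averaged, unit-normalized mass spectrum per powder
tis = rng.random((n_samples, n_mz))
tis /= tis.sum(axis=1, keepdims=True)

# Agglomerative hierarchical clustering (Ward linkage) cut into six classes,
# echoing the six classes reported in the abstract
Z = linkage(tis, method="ward")
classes = fcluster(Z, t=6, criterion="maxclust")

# For each class, note which m/z channel is most often the most intense ion
class_ion = {c: np.bincount(tis[classes == c].argmax(axis=1)).argmax()
             for c in np.unique(classes)}

def intense_ion_rule(spectrum, class_ion):
    """Assign a test spectrum to the class whose characteristic ion is strongest."""
    return max(class_ion, key=lambda c: spectrum[class_ion[c]])

test = rng.random(n_mz)
test /= test.sum()
print("predicted class:", intense_ion_rule(test, class_ion))
```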
- Title
- On Kernel-base Multi-Task Learning.
- Creator
-
Li, Cong, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Tappen, Marshall, Hu, Haiyan, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Multi-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple relevant tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task, compared to training each individual task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework due to its flexibility in specifying various types of information-sharing strategies, the opportunity it offers to yield kernel-based methods, and its capability to promote sparse feature representations.

However, certain limitations exist in both theoretical and practical aspects of Regularization-Loss-based MTL. Theoretically, previous research on generalization bounds in connection to MTL Hypothesis Spaces (HSs), where data of all tasks are pre-processed by a (partially) common operator, has been limited in two respects: first, all previous works assumed linearity of the operator, therefore completely excluding kernel-based MTL HSs, for which the operator is potentially non-linear; second, all previous works, rather unnecessarily, assumed that all task weights are constrained within norm-balls whose radii are equal. The requirement of equal radii leads to significant inflexibility of the relevant HSs, which may cause the generalization performance of the corresponding MTL models to deteriorate. Practically, various algorithms have been developed for kernel-based MTL models, due to the different characteristics of the formulations. Most of these algorithms are a burden to develop and end up being quite sophisticated, so that practitioners may face a hard task in interpreting and implementing them, especially when multiple models are involved. This is even more so when Multi-Task Multiple Kernel Learning (MT-MKL) models are considered.

This research largely resolves the above limitations. Theoretically, a pair of new kernel-based HSs is proposed: one for single-kernel MTL and another for MT-MKL. Unlike previous works, we allow each task weight to be constrained within a norm-ball whose radius is learned during training. By deriving and analyzing the generalization bounds of these two HSs, we show that, indeed, such flexibility leads to much tighter generalization bounds, which often results in significantly better generalization performance. Based on this observation, a pair of new models is developed, one for each case: single-kernel MTL and MT-MKL. From a practical perspective, we propose a general MT-MKL framework that covers most of the prominent MT-MKL approaches, including our new MT-MKL formulation. Then, a general-purpose algorithm is developed to solve the framework, which can also be employed for training all other models subsumed by this framework. A series of experiments is conducted to assess the merits of the proposed models when trained by the new algorithm. Certain properties of our HSs and formulations are demonstrated, and the advantage of our models in terms of classification accuracy is shown via these experiments. (An illustrative multi-task regression sketch follows this record.)
- Date Issued
- 2014
- Identifier
- CFE0005517, ucf:50321
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005517
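The Regularization-Loss framework mentioned above shares information by penalizing task weights toward a common structure. The dissertation's kernel-based and multiple-kernel formulations are too involved for a short listing, so the sketch below shows a much simpler linear instance, mean-regularized multi-task ridge regression trained by gradient descent, purely to illustrate the sharing idea; every name and constant is an invented stand-in, not the proposed models or their generalization bounds.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n, d = 4, 30, 10  # tasks, samples per task, features

# Hypothetical related tasks: each weight vector is a noisy copy of a common one
w_common = rng.normal(size=d)
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (w_common + 0.3 * rng.normal(size=d)) + 0.1 * rng.normal(size=n)
      for X in Xs]

def mtl_ridge(Xs, ys, lam=0.1, mu=1.0, lr=0.01, iters=2000):
    """Mean-regularized MTL: each task weight is pulled toward the task average."""
    T, d = len(Xs), Xs[0].shape[1]
    W = np.zeros((T, d))
    for _ in range(iters):
        w_bar = W.mean(axis=0)
        for t in range(T):
            resid = Xs[t] @ W[t] - ys[t]
            grad = Xs[t].T @ resid / len(ys[t]) + lam * W[t] + mu * (W[t] - w_bar)
            W[t] -= lr * grad
    return W

W = mtl_ridge(Xs, ys)
print("distance of each task's weight vector from the true common vector:")
print(np.round(np.linalg.norm(W - w_common, axis=1), 2))
```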
- Title
- Forensic Application of Chemometric Analysis to Visible Absorption Spectra Collected from Dyed Textile Fibers.
- Creator
-
Flores, Alejandra, Sigman, Michael, Yestrebsky, Cherie, Campiglia, Andres, Chumbimuni Torres, Karin, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Forensic analysis of evidence consists of the comparison of physical, spectroscopic, or chemical characteristics of a questioned sample to a set of knowns. Currently, decisions as to whether or not the questioned sample can be associated or grouped with the knowns are left up to the discretion of the forensic analyst. The implications of these outcomes are presented as evidence to a jury in a court of law to determine whether a defendant is guilty of committing a crime. Leading up to, and since, the publication of the National Academy of Sciences (NAS) report entitled "Strengthening Forensic Science in the United States: A Path Forward," the inadequacies of allowing potentially biased forensic opinion to carry such weight in the courtroom have been unmasked. This report exposed numerous shortcomings in many areas of forensic science, but also made recommendations on how to fortify the discipline. The main suggestions directed towards disciplines that analyze trace evidence include developing error rates for commonly employed practices and evaluating method reliability and validity.

This research focuses on developing a statistical method of analysis for comparing visible absorption profiles collected from highly similarly colored textile fibers via microspectrophotometry (MSP). Several chemometric techniques were applied to spectral data and utilized to help discriminate fibers beyond the point where traditional methods of microscopical examination may fail. Because a dye's chemical structure dictates the shape of the absorption profile, two fibers dyed with chemically similar dyes can be very difficult to distinguish from one another using traditional fiber examination techniques. The application of chemometrics to multivariate spectral data may help elicit latent characteristics that aid in fiber discrimination.

The three sample sets analyzed include dyed fabric swatches (three pairs of fabrics were dyed with chemically similar dye pairs), commercially available blue yarns (100% acrylic), and denim fabrics (100% cotton). Custom-dyed swatches were each dyed uniformly with a single dye, whereas the dye formulations for both the yarns and denims are unknown. As a point for study, spectral comparisons were performed according to the guidelines published by the Standard Working Group for Materials Analysis (SWGMAT) Fiber Subgroup, based on visual analysis only. In the next set of tests, principal components analysis (PCA) was utilized to reduce the dimensionality of the large multivariate data sets and to visualize the natural groupings of samples. Comparisons were performed using the resulting PCA scores, where group membership of the questioned object was evaluated against the known objects using the score value as the distance metric. The score value is calculated from the score and orthogonal distances, their respective cutoff values based on a quantile percentage, and an optimization parameter, ?. Lastly, likelihood ratios (LR) were generated from density functions modelled from similarity values assessing comparisons between sample population data. R code was written in-house to execute all methods of fiber comparison described here. The SWGMAT method performed with 62.7% accuracy, the optimal accuracy rate for the score value method was 75.9%, and the accuracy rates for swatch-yarn and denim comparisons, respectively, were 97.7% and 67.1% when the LR method was applied. (An illustrative PCA-based comparison sketch follows this record.)
- Date Issued
- 2015
- Identifier
- CFE0005613, ucf:50212
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005613
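The score-value comparison in the abstract combines score and orthogonal distances with quantile-based cutoffs and an optimization parameter. The sketch below is a deliberately simplified, hedged version of the underlying idea only: project the known fiber spectra with PCA, set a quantile cutoff on their distances from their own centroid in score space, and check whether a questioned spectrum falls inside it. The spectra, component count, and 95th-percentile cutoff are invented, and the in-house R implementation and likelihood-ratio step are not reproduced (the sketch is in Python rather than R).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(9)
n_known, n_wl = 25, 200  # known spectra, wavelength channels

# Hypothetical visible absorption spectra from the known fiber population
base = np.exp(-0.5 * ((np.arange(n_wl) - 80) / 20.0) ** 2)
known = base + 0.02 * rng.normal(size=(n_known, n_wl))
questioned = base + 0.02 * rng.normal(size=n_wl)  # shift the peak to see a rejection

pca = PCA(n_components=3).fit(known)
scores_known = pca.transform(known)

# Distance of each known from the centroid of the knowns in score space,
# with a 95th-percentile cutoff standing in for the score-value threshold
center = scores_known.mean(axis=0)
d_known = np.linalg.norm(scores_known - center, axis=1)
cutoff = np.quantile(d_known, 0.95)

d_q = np.linalg.norm(pca.transform(questioned[None, :]) - center)
print("questioned fiber associated with the knowns:", bool(d_q <= cutoff))
```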
- Title
- The Characterization of Condom Lubricants and Personal Hygiene Products using DART-TOFMS and GC- MS and The Investigation of Gold Nanoparticle Behavior in Water and the Interaction with Blood Serum Proteins.
- Creator
-
Moustafa, Yasmine, Huo, Qun, Bridge, Candice, Sigman, Michael, Baudelet, Matthieu, Popolan-Vaida, Denisia, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
This dissertation is divided into two independent research projects. First, condom lubricants, sexual lubricants, and personal hygiene products (PHPs) were studied using direct analysis in real time-time-of-flight mass spectrometry (DART-TOFMS) and gas chromatography-mass spectrometry (GC-MS). The work addressed the concern of perpetrators resorting to new tactics, i.e., using condoms to remove seminal fluid that could provide a DNA link to a suspect, which creates a need to consider condom lubricants as pieces of sexual assault evidence. Because condom lubricants have a chemical composition resembling PHPs, both sample groups were analyzed to prevent false positives. Although past research has focused on the identification of major lubricant groups and additives, the discernment between such samples has been insufficient. The discriminatory capability and rapid analysis of samples using DART-TOFMS was illustrated through resolution among the sample groups and higher classification rates. Here, lubricant analysis was introduced as a viable source of evidence, with a scheme detailing the discrimination of lubricants from common hygiene products using DART-TOFMS as a robust tool for the analysis of sexual assault evidence.

Second, gold nanoparticles (AuNPs) were characterized using dynamic light scattering (DLS), ultraviolet-visible spectroscopy (UV-VIS), dark field imaging (DFM), and transmission electron microscopy (TEM). Following characterization, the AuNPs were used in a protein adsorption study with blood serum to observe how differences in their characterization affected their interactions with blood serum proteins. AuNPs are of interest in the bioanalytical sector due to their optical properties, scattering of light, and high surface-to-volume ratio. A common issue plagues the field: the difficulty of inter- and intra-laboratory reproducibility from one characterization technique. This further affects the understanding of how AuNPs may react in diagnostic and other applications. The importance of a comprehensive characterization protocol for AuNP products and the need for manufacturers to include product specifications are demonstrated in this study.
- Date Issued
- 2019
- Identifier
- CFE0007842, ucf:52814
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007842