- Title
- EXTRACTING QUANTITATIVE INFORMATION FROM NONNUMERIC MARKETING DATA: AN AUGMENTED LATENT SEMANTIC ANALYSIS APPROACH.
- Creator
-
Arroniz, Inigo, Michaels, Ronald, University of Central Florida
- Abstract / Description
-
Despite the widespread availability and importance of nonnumeric data, marketers do not have the tools to extract information from large amounts of nonnumeric data. This dissertation attempts to fill this void: I developed a scalable methodology that is capable of extracting information from extremely large volumes of nonnumeric data. The proposed methodology integrates concepts from information retrieval and content analysis to analyze textual information. This approach avoids a pervasive difficulty of traditional content analysis, namely the classification of terms into predetermined categories, by creating a linear composite of all terms in the document and then weighting the terms according to their inferred meaning. In the proposed approach, meaning is inferred from the collocation of the term across all the texts in the corpus. It is assumed that there is a lower-dimensional space of concepts that underlies word usage. The semantics of each word are inferred by identifying its various contexts in a document and across documents (i.e., in the corpus). After the semantic similarity space is inferred from the corpus, the words in each document are weighted to obtain their representation in the lower-dimensional semantic similarity space, effectively mapping the terms to the concept space and ultimately creating a score that measures the concept of interest. I propose an empirical application of the outlined methodology. For this empirical illustration, I revisit an important marketing problem: the effect of movie critics on the performance of movies. In the extant literature, researchers have used an overall numerical rating of the review to capture the content of movie reviews. I contend that valuable information present in the textual materials remains uncovered. I use the proposed methodology to extract this information from the nonnumeric text contained in a movie review.
The proposed setting is particularly attractive for validating the methodology because it allows a simple test of the text-derived metrics: comparing them to the numeric ratings provided by the reviewers. I empirically show the application of this methodology and traditional computer-aided content-analytic methods to study an important marketing topic, the effect of movie critics on movie performance. In the empirical application of the proposed methodology, I use two datasets that combined contain more than 9,000 movie reviews nested in more than 250 movies. I restudy this marketing problem in light of directly obtaining information from the reviews instead of following the usual practice of using an overall rating or a classification of the review as either positive or negative. I find that the addition of the direct content and structure of the review adds a significant amount of explanatory power as a determinant of movie performance, even in the presence of actual reviewer overall ratings (stars) and other controls. This effect is robust across distinct operationalizations of both the review content and the movie performance metrics. In fact, my findings suggest that as we move from sales to profitability to financial return measures, the role of the content of the review, and therefore the critic's role, becomes increasingly important.
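The latent semantic analysis pipeline this abstract outlines (build a term-document matrix, use a truncated SVD to recover a low-dimensional concept space, then score documents in that space) can be sketched as follows. The toy corpus, the rank `k`, and the choice of document 0 as the seed for "the concept of interest" are illustrative assumptions, not the dissertation's actual data or scoring rule:

```python
import numpy as np

# Toy corpus standing in for movie reviews (hypothetical data).
docs = [
    "brilliant acting and a moving story",
    "moving story with brilliant direction",
    "dull plot and wooden acting",
    "wooden acting ruins a dull plot",
]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix of raw counts (terms = rows, documents = columns).
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD uncovers a lower-dimensional "concept" space underlying word usage.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in concept space

def cosine(a, b):
    # Cosine similarity between two concept-space vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score every document against a seed document representing the concept of interest.
scores = [cosine(doc_vecs[0], v) for v in doc_vecs]
```

Because similarity is computed in the SVD-derived concept space rather than on raw term overlap, documents that share meaning but not exact vocabulary can still score as related, which is the advantage over category-based content coding that the abstract claims.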
- Date Issued
- 2007
- Identifier
- CFE0001617, ucf:47164
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001617
- Title
- "ANALYZING THE EFFECTS OF SINGLE-SOURCING METHODOLOGIES ON THE ROLE OF THE TECHNICAL COMMUNICATOR".
- Creator
-
Boehl, Jeremy, Applen, J.D., University of Central Florida
- Abstract / Description
-
This thesis discusses the specific effects of single-sourcing methodologies on the role of the technical communicator: his or her job responsibilities, qualifications, collaboration with coworkers, employee and employer expectations, and career progression. The methodologies discussed include all types of single-sourcing methods for technical documentation (such as XML-based methods), advanced and non-advanced Content Management Systems (CMS), and Digital Asset Management (DAM) systems. Other topics explored are an overview of single sourcing for technical documentation, a comparison of the "craftsman model" to the current trend of single sourcing and structured content, specific effects on technical communicators such as role changes, the effects of incorporating XML into a technical communicator's daily work environment, and the effects of other emerging technologies such as advanced CMS and DAM systems on technical communicators. General findings include that the practice of single sourcing, whether a positive or negative development, has continued and will likely continue to increase in technical communication groups within organizations. Single sourcing, especially for dynamic, customized content, is also increasing because of the current marketplace, but it works best via a CMS and other systems used by large organizations. Single sourcing is also best implemented after extensive strategic planning and training of employees. Many technical communicators will have to accept new roles and positions, the direction of which is greatly affected by the extent of their skills. Recommendations are made for additional research on the effects of single-sourcing implementation on the technical communicator and on how to adapt to these changes. Additional research is also needed on XML, DITA (Darwin Information Typing Architecture), and DAM systems, all related specifically to technical communication.
- Date Issued
- 2006
- Identifier
- CFE0001302, ucf:47031
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0001302
- Title
- EFFICIENT TECHNIQUES FOR RELEVANCE FEEDBACK PROCESSING IN CONTENT-BASED IMAGE RETRIEVAL.
- Creator
-
Liu, Danzhou, Hua, Kien, University of Central Florida
- Abstract / Description
-
In content-based image retrieval (CBIR) systems, there are two general types of search: target search and category search. Unlike queries in traditional database systems, users in most cases cannot specify an ideal query to retrieve the desired results for either target search or category search in multimedia database systems, and they have to rely on iterative feedback to refine their query. Efficient evaluation of such iterative queries can be a challenge, especially when the multimedia database contains a large number of entries, the search needs many iterations, and the underlying distance measure is computationally expensive. The overall processing costs, including CPU and disk I/O, are further emphasized when there are numerous concurrent accesses. To address these limitations of relevance feedback processing, we propose a generic framework comprising a query model, index structures, and query optimization techniques. Specifically, this thesis makes five main contributions, as follows. The first contribution is an efficient target search technique. We propose four target search methods: naive random scan (NRS), local neighboring movement (LNM), neighboring divide-and-conquer (NDC), and global divide-and-conquer (GDC). All these methods are built around a common strategy: they do not retrieve already-checked images (i.e., they shrink the search space). Furthermore, NDC and GDC exploit Voronoi diagrams to aggressively prune the search space and move towards target images. We show theoretically and experimentally that the convergence speeds of GDC and NDC are much faster than those of NRS and recent methods. The second contribution is a method to reduce the number of expensive distance computations when answering k-NN queries with non-metric distance measures. We propose an efficient distance mapping function that transforms non-metric measures into metric ones while preserving the original distance orderings.
Then existing metric index structures (e.g., the M-tree) can be used to reduce the computational cost by exploiting the triangle inequality property. The third contribution is an incremental query processing technique for Support Vector Machines (SVMs). SVMs have been widely used in multimedia retrieval to learn a concept in order to find the best matches. SVMs, however, suffer from a scalability problem as database sizes grow. To address this limitation, we propose an efficient query evaluation technique employing incremental update. The proposed technique also takes advantage of a tuned index structure to efficiently prune irrelevant data. As a result, only a small portion of the data set needs to be accessed for query processing. This index structure also provides an inexpensive means of processing the set of candidates to evaluate the final query result. The technique works with different kernel functions and kernel parameters. The fourth contribution is a method to avoid local optimum traps. Existing CBIR systems, designed around query refinement based on relevance feedback, suffer from local optimum traps that may severely impair overall retrieval performance. We therefore propose a simulated annealing-based approach to address this important issue. When the search becomes stuck at a local optimum, we employ a neighborhood search technique (i.e., simulated annealing) to continue the search for additional matching images, thus escaping from the local optimum. We also propose an index structure to speed up such neighborhood searches. Finally, the fifth contribution is a generic framework to support concurrent accesses. We develop new storage and query processing techniques that exploit sequential access and leverage inter-query concurrency to share computation.
Our experimental results, based on the Corel dataset, indicate that the proposed optimizations can significantly reduce average response time while achieving better precision and recall, and that the framework is scalable enough to support a large user community. This latter performance characteristic is largely neglected in existing systems, making them less suitable for large-scale deployment. With the growing interest in Internet-scale image search applications, our framework offers an effective solution to the scalability problem.
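The simulated annealing escape described in the fourth contribution can be illustrated on a toy one-dimensional relevance landscape with a weak local optimum and a stronger global one. The landscape, cooling schedule, and Gaussian step distribution below are hypothetical stand-ins for the thesis's index-backed neighborhood search, not its actual implementation:

```python
import math
import random

def relevance(x):
    # Toy multi-modal relevance score: a local optimum near x=2
    # and a stronger global optimum near x=8 (hypothetical landscape).
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x - 8.0) ** 2)

def anneal(x, t0=2.0, cooling=0.95, steps=400, seed=0):
    rng = random.Random(seed)
    t = t0
    best = x
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 1.0)          # neighborhood move
        delta = relevance(cand) - relevance(x)
        # Always accept uphill moves; accept downhill moves with
        # probability exp(delta / t), which shrinks as t cools.
        if delta > 0 or rng.random() < math.exp(delta / t):
            x = cand
        if relevance(x) > relevance(best):
            best = x
        t *= cooling                            # cool the temperature
    return best

# Start near the weaker optimum; at high temperature the walk can cross
# the valley between the two modes instead of staying trapped.
best = anneal(1.5)
```

Early on, when the temperature is high relative to the score differences, downhill moves are accepted almost freely, so the search can leave the basin around x=2; this occasional acceptance of worse candidates is exactly the escape mechanism the abstract attributes to its annealing-based neighborhood search.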
- Date Issued
- 2009
- Identifier
- CFE0002728, ucf:48162
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002728