You are here

EXTRACTING QUANTITATIVE INFORMATIONFROM NONNUMERIC MARKETING DATA: AN AUGMENTEDLATENT SEMANTIC ANALYSIS APPROACH

Download pdf | Full Screen View

Date Issued:
2007
Abstract/Description:
Despite the widespread availability and importance of nonnumeric data, marketers do not have the tools to extract information from large amounts of nonnumeric data. This dissertation attempts to fill this void: I developed a scalable methodology that is capable of extracting information from extremely large volumes of nonnumeric data. The proposed methodology integrates concepts from information retrieval and content analysis to analyze textual information. This approach avoids a pervasive difficulty of traditional content analysis, namely the classification of terms into predetermined categories, by creating a linear composite of all terms in the document and, then, weighting the terms according to their inferred meaning. In the proposed approach, meaning is inferred by the collocation of the term across all the texts in the corpus. It is assumed that there is a lower dimensional space of concepts that underlies word usage. The semantics of each word are inferred by identifying its various contexts in a document and across documents (i.e., in the corpus). After the semantic similarity space is inferred from the corpus, the words in each document are weighted to obtain their representation on the lower dimensional semantic similarity space, effectively mapping the terms to the concept space and ultimately creating a score that measures the concept of interest. I propose an empirical application of the outlined methodology. For this empirical illustration, I revisit an important marketing problem, the effect of movie critics on the performance of the movies. In the extant literature, researchers have used an overall numerical rating of the review to capture the content of the movie reviews. I contend that valuable information present in the textual materials remains uncovered. I use the proposed methodology to extract this information from the nonnumeric text contained in a movie review. The proposed setting is particularly attractive to validate the methodology because the setting allows for a simple test of the text-derived metrics by comparing them to the numeric ratings provided by the reviewers. I empirically show the application of this methodology and traditional computer-aided content analytic methods to study an important marketing topic, the effect of movie critics on movie performance. In the empirical application of the proposed methodology, I use two datasets that combined contain more than 9,000 movie reviews nested in more than 250 movies. I am restudying this marketing problem in the light of directly obtaining information from the reviews instead of following the usual practice of using an overall rating or a classification of the review as either positive or negative. I find that the addition of direct content and structure of the review adds a significant amount of exploratory power as a determinant of movie performance, even in the presence of actual reviewer overall ratings (stars) and other controls. This effect is robust across distinct opertaionalizations of both the review content and the movie performance metrics. In fact, my findings suggest that as we move from sales to profitability to financial return measures, the role of the content of the review, and therefore the critic's role, becomes increasingly important.
Title: EXTRACTING QUANTITATIVE INFORMATIONFROM NONNUMERIC MARKETING DATA: AN AUGMENTEDLATENT SEMANTIC ANALYSIS APPROACH.
11 views
6 downloads
Name(s): Arroniz, Inigo, Author
Michaels, Ronald, Committee Chair
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2007
Publisher: University of Central Florida
Language(s): English
Abstract/Description: Despite the widespread availability and importance of nonnumeric data, marketers do not have the tools to extract information from large amounts of nonnumeric data. This dissertation attempts to fill this void: I developed a scalable methodology that is capable of extracting information from extremely large volumes of nonnumeric data. The proposed methodology integrates concepts from information retrieval and content analysis to analyze textual information. This approach avoids a pervasive difficulty of traditional content analysis, namely the classification of terms into predetermined categories, by creating a linear composite of all terms in the document and, then, weighting the terms according to their inferred meaning. In the proposed approach, meaning is inferred by the collocation of the term across all the texts in the corpus. It is assumed that there is a lower dimensional space of concepts that underlies word usage. The semantics of each word are inferred by identifying its various contexts in a document and across documents (i.e., in the corpus). After the semantic similarity space is inferred from the corpus, the words in each document are weighted to obtain their representation on the lower dimensional semantic similarity space, effectively mapping the terms to the concept space and ultimately creating a score that measures the concept of interest. I propose an empirical application of the outlined methodology. For this empirical illustration, I revisit an important marketing problem, the effect of movie critics on the performance of the movies. In the extant literature, researchers have used an overall numerical rating of the review to capture the content of the movie reviews. I contend that valuable information present in the textual materials remains uncovered. I use the proposed methodology to extract this information from the nonnumeric text contained in a movie review. The proposed setting is particularly attractive to validate the methodology because the setting allows for a simple test of the text-derived metrics by comparing them to the numeric ratings provided by the reviewers. I empirically show the application of this methodology and traditional computer-aided content analytic methods to study an important marketing topic, the effect of movie critics on movie performance. In the empirical application of the proposed methodology, I use two datasets that combined contain more than 9,000 movie reviews nested in more than 250 movies. I am restudying this marketing problem in the light of directly obtaining information from the reviews instead of following the usual practice of using an overall rating or a classification of the review as either positive or negative. I find that the addition of direct content and structure of the review adds a significant amount of exploratory power as a determinant of movie performance, even in the presence of actual reviewer overall ratings (stars) and other controls. This effect is robust across distinct opertaionalizations of both the review content and the movie performance metrics. In fact, my findings suggest that as we move from sales to profitability to financial return measures, the role of the content of the review, and therefore the critic's role, becomes increasingly important.
Identifier: CFE0001617 (IID), ucf:47164 (fedora)
Note(s): 2007-05-01
Ph.D.
Business Administration, Department of Marketing
Doctorate
This record was generated from author submitted information.
Subject(s): Latent Structures
Latent Semantic Analysis
Content Analysis
Movie reviewers
Movie performance
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0001617
Restrictions on Access: public
Host Institution: UCF

In Collections