Current Search: Gomez, Fernando
- Title
- Creator
Glinos, Demetrios, Gomez, Fernando, University of Central Florida
- Abstract / Description
Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents.
After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
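The abstract describes atomic propositions extracted from text, questions decomposed into Boolean combinations of patterns with free variables, and synonym-aware matching. A minimal Python sketch of that idea follows, assuming propositions are plain (subject, relation, object) triples and using a small hand-written synonym table in place of WordNet. The triples, the `SYNONYMS` table, and the function names are hypothetical and are not SEMEX's actual formalism. Only conjunction (AND) is shown; a disjunction could simply concatenate the binding lists of its branches.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical atomic propositions, as might be extracted from newswire text.
PROPOSITIONS: List[Triple] = [
    ("jobs", "found", "apple"),
    ("wozniak", "found", "apple"),
    ("jobs", "lead", "pixar"),
]

# Hand-written stand-in for WordNet synonym lookup (assumption, not WordNet data).
SYNONYMS: Dict[str, set] = {"establish": {"found"}, "head": {"lead"}}


def expand(term: str) -> set:
    """Return the term together with its listed synonyms."""
    return {term} | SYNONYMS.get(term, set())


def match(pattern: Triple, prop: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one question pattern with one proposition; '?'-prefixed slots are free variables."""
    extended = dict(binding)
    for slot, value in zip(pattern, prop):
        if slot.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if extended.get(slot, value) != value:
                return None
            extended[slot] = value
        elif value not in expand(slot):
            return None
    return extended


def answer_and(patterns: List[Triple]) -> List[Binding]:
    """Answer a conjunction of patterns by extending bindings one pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for prop in PROPOSITIONS:
                extended = match(pattern, prop, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# "Who established Apple and heads Pixar?"
print(answer_and([("?x", "establish", "apple"), ("?x", "head", "pixar")]))  # [{'?x': 'jobs'}]
```

The shared variable `?x` is what ties the two sub-questions together: a binding survives only if it satisfies every pattern in the conjunction.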
- Date Issued
- 2006
- Identifier
- CFE0000985, ucf:46711
- Format
- Document (PDF)
- Title
- Larger-first partial parsing.
- Creator
Van Delden, Sebastian Alexander, Gomez, Fernando, Engineering and Computer Science
- Abstract / Description
University of Central Florida College of Engineering Thesis; Larger-first partial parsing is a primarily top-down approach to partial parsing that is the opposite of current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving higher accuracy than the easy-first approach. The automata of each network were developed through empirical analysis of several sources and are presented here in detail.
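As a rough illustration of how a cascade of finite-state recognizers can add levels of structural tags, larger units first, here is a Python sketch. It does not reproduce the thesis's four networks: it assumes a hypothetical three-stage cascade (a relative-clause pattern, then prepositional phrases, then base noun phrases), with each stage written as a regular expression over a string of one-character POS classes, so every pattern describes a language a finite automaton can recognize. The class mapping, the labels, and the patterns are illustrative assumptions.

```python
import re
from typing import List, Tuple


def tag_class(tag: str) -> str:
    """Collapse a Penn Treebank tag into a one-character class (assumed mapping)."""
    if tag in ("WDT", "WP", "WP$", "WRB"):
        return "W"
    if tag in ("DT", "PRP$"):
        return "D"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN") or tag == "PRP":
        return "N"
    if tag.startswith("VB") or tag == "MD":
        return "V"
    if tag == "IN":
        return "P"
    if tag == "CC":
        return "C"
    if tag == ",":
        return ","
    return "O"


# Hypothetical cascade, ordered from larger units to smaller ones.
CASCADE = [
    ("REL", re.compile(r"WV+(?:D?J*N+)?")),  # wh-word, verb group, optional object NP
    ("PP", re.compile(r"PD?J*N+")),          # preposition followed by a base NP
    ("NP", re.compile(r"D?J*N+")),           # base noun phrase
]


def larger_first(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Run each stage over the whole sentence; every match adds one tag level to its tokens."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    levels: List[List[str]] = [[] for _ in tagged]
    for label, pattern in CASCADE:
        for m in pattern.finditer(classes):
            for i in range(m.start(), m.end()):
                levels[i].append(label)
    return [(word, lv) for (word, _), lv in zip(tagged, levels)]


sentence = [("The", "DT"), ("man", "NN"), ("who", "WP"), ("owns", "VBZ"), ("the", "DT"),
            ("red", "JJ"), ("car", "NN"), ("lives", "VBZ"), ("in", "IN"), ("Boston", "NNP")]
for word, lv in larger_first(sentence):
    print(f"{word:8} {'/'.join(lv) or '-'}")
```

Because the clause stage runs before the phrase stage, "the red car" ends up tagged REL/NP: the larger unit forms the outer level, and the noun phrase inside it becomes a deeper level rather than a competing analysis.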
- Date Issued
- 2003
- Identifier
- CFR0000760, ucf:52932
- Format
- Document (PDF)
- Title
- Creator
Schwartz, Hansen, Gomez, Fernando, University of Central Florida
- Abstract / Description
This work investigates the effective acquisition of lexical knowledge from the Web to perform semantic interpretation. The Web provides an unprecedented amount of natural language from which to gain knowledge useful for semantic interpretation. The knowledge acquired is described as common sense knowledge, information one uses in daily life to understand language and perception. Novel approaches are presented both for the acquisition of this knowledge and for its use in semantic interpretation algorithms. The goal is to increase accuracy over other automatic semantic interpretation systems, and in turn enable stronger real-world applications such as machine translation, advanced Web search, sentiment analysis, and question answering. The major contributions of this dissertation consist of two methods of acquiring lexical knowledge from the Web, namely a database of common sense knowledge and Web selectors. The first method is a framework for acquiring a database of concept relationships. To acquire this knowledge, relationships between nouns are found on the Web and analyzed over WordNet using information theory, producing information about concepts rather than ambiguous words. For the second contribution, words called Web selectors, which can take the place of an instance of a target word in its local context, are retrieved. The selectors allow the system to learn the types of concepts to which the sense of a target word should be similar. Web selectors are acquired dynamically as part of a semantic interpretation algorithm, while the relationships in the database are useful for stand-alone programs. A final contribution of this dissertation concerns a novel semantic similarity measure and an evaluation of similarity and relatedness measures on tasks of concept similarity. Such tasks are useful when applying acquired knowledge to semantic interpretation.
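The way Web selectors can guide sense selection may be sketched as follows: each candidate sense of the target word is scored by its total similarity to the selectors, and the best-scoring sense wins. This is a simplified Python sketch, not the dissertation's algorithm. The selector list is hypothetical rather than retrieved from the Web, the sense names are invented labels, and the `SIMILARITY` table is a hand-written stand-in for a WordNet-based similarity measure.

```python
from typing import Callable, Dict, List, Tuple

# Hand-written stand-in for a WordNet-based sense-to-word similarity (assumption).
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("bank/sloping-land", "shore"): 0.9,
    ("bank/sloping-land", "edge"): 0.6,
    ("bank/sloping-land", "side"): 0.5,
    ("bank/financial-institution", "edge"): 0.1,
}


def table_similarity(sense: str, word: str) -> float:
    """Look up a similarity score, defaulting to 0.0 for unlisted pairs."""
    return SIMILARITY.get((sense, word), 0.0)


def choose_sense(senses: List[str], selectors: List[str],
                 similarity: Callable[[str, str], float]) -> str:
    """Pick the sense whose summed similarity to all selectors is highest."""
    return max(senses, key=lambda sense: sum(similarity(sense, w) for w in selectors))


# Context: "They had a picnic on the bank of the river."
# Hypothetical selectors: words that could stand in for "bank" in that context.
selectors = ["shore", "edge", "side"]
senses = ["bank/financial-institution", "bank/sloping-land"]
print(choose_sense(senses, selectors, table_similarity))  # bank/sloping-land
```

Passing the similarity function as a parameter keeps the selection step independent of any particular similarity or relatedness measure.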
Applications to word sense disambiguation, an aspect of semantic interpretation, are used to evaluate the contributions. Disambiguation systems that use semantically annotated training data are considered supervised. The algorithms of this dissertation are considered minimally supervised; they do not require training data created by humans, though they may use human-created data sources. In the case of evaluating a database of common sense knowledge, integrating the knowledge into an existing minimally supervised disambiguation system significantly improved results, yielding a 20.5% error reduction. Similarly, the Web selectors disambiguation system, which acquires knowledge directly as part of the algorithm, achieved results comparable with top minimally supervised systems: an F-score of 80.2% on a standard noun disambiguation task. This work enables the study of many subsequent related tasks for improving semantic interpretation and its application to real-world technologies. Other aspects of semantic interpretation, such as semantic role labeling, could utilize the same methods presented here for word sense disambiguation. As the Web continues to grow, the capabilities of the systems in this dissertation are expected to increase. Although the Web selectors system already achieves strong results, a study in this dissertation shows that acquiring more data would likely improve them further. Furthermore, the methods for acquiring a database of common sense knowledge could be applied more exhaustively to other types of common sense knowledge. Finally, perhaps the greatest benefits from this work will come from enabling real-world technologies that utilize semantic interpretation.
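An error-reduction figure such as the one above is commonly a relative measure: the fraction of the baseline's errors that the new system eliminates, not a gain in percentage points. The abstract does not give the underlying baseline and improved scores, so the numbers in this short Python sketch are hypothetical and are not the dissertation's.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline's errors removed, with error = 1 - score."""
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    return (baseline_error - new_error) / baseline_error


# Hypothetical scores: an 8-point gain from 60% is a 20% relative error reduction.
print(round(relative_error_reduction(0.60, 0.68), 3))  # 0.2
```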
- Date Issued
- 2011
- Identifier
- CFE0003688, ucf:48805
- Format
- Document (PDF)
- Title
- An intelligent editor for natural language processing of unrestricted text.
- Creator
Glinos, Demetrios George, Gomez, Fernando, Arts and Sciences
- Abstract / Description
University of Central Florida College of Arts and Sciences Thesis; The understanding of natural language by computational methods has been a continuing and elusive problem in artificial intelligence. In recent years there has been a resurgence in natural language processing research. Much of this work has been on empirical or corpus-based methods, which use a data-driven approach to train systems on large amounts of real language data. Using corpus-based methods, the performance of part-of-speech (POS) taggers, which assign to the individual words of a sentence their appropriate part-of-speech category (e.g., noun, verb, preposition), now rivals human performance levels, achieving accuracies exceeding 95%. Such taggers have proved useful as preprocessors for such tasks as parsing, speech synthesis, and information retrieval. Parsing remains, however, a difficult problem, even with the benefit of POS tagging. Moreover, as sentence length increases, there is a corresponding combinatorial explosion of alternative possible parses. Consider the following sentence from a New York Times online article: "After Salinas was arrested for murder in 1995 and lawyers for the bank had begun monitoring his accounts, his personal banker in New York quietly advised Salinas' wife to move the money elsewhere, apparently without the consent of the legal department." To facilitate parsing and other tasks, we would like to decompose this sentence into the following three shorter sentences which, taken together, convey the same meaning as the original: 1. Salinas was arrested for murder in 1995. 2. Lawyers for the bank had begun monitoring his accounts. 3. His personal banker in New York quietly advised Salinas' wife to move the money elsewhere, apparently without the consent of the legal department. This study investigates the development of heuristics for decomposing such long sentences into sets of shorter sentences without affecting the meaning of the original sentences.
Heuristic rules, which require no parsing or semantic analysis, were developed based on: (1) the output of a POS tagger (Brill's tagger); (2) the punctuation contained in the input sentences; and (3) the words themselves. The heuristic algorithms were implemented in an intelligent editor program that first augmented the POS tags and assigned tags to punctuation. The rules were then tested against a corpus of 25 New York Times online articles containing approximately 1,200 sentences and over 32,000 words, with good results. Recommendations are made for improving the algorithms and for continuing this line of research.
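A toy version of such heuristics, applied to the Salinas example, can be written in Python. The two rules are assumptions for illustration, not the thesis's rule set: (1) a sentence that opens with a subordinating conjunction is split at its first comma into the subordinate clause and the main clause, with the conjunction dropped; (2) a clause is split at a coordinating conjunction (CC) when both sides contain a verb. The POS tags are hand-assigned Penn Treebank tags standing in for Brill's tagger output, `SUBORDINATORS` is a hypothetical word list, and "Salinas'" is kept as a single token for simplicity.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical list of sentence-initial subordinators that trigger rule 1.
SUBORDINATORS = {"after", "before", "when", "while", "because", "although", "since"}


def has_verb(clause: Tagged) -> bool:
    return any(tag.startswith("VB") for _, tag in clause)


def split_coordination(clause: Tagged) -> List[Tagged]:
    """Rule 2: split at a CC token when the material on both sides contains a verb."""
    for i, (_, tag) in enumerate(clause):
        if tag == "CC" and has_verb(clause[:i]) and has_verb(clause[i + 1:]):
            return split_coordination(clause[:i]) + split_coordination(clause[i + 1:])
    return [clause]


def to_sentence(clause: Tagged) -> str:
    """Join tokens, re-attach commas, capitalize, and end with a period."""
    text = " ".join(word for word, _ in clause).replace(" ,", ",").rstrip(" ,.")
    return text[0].upper() + text[1:] + "."


def decompose(tagged: Tagged) -> List[str]:
    first_word, first_tag = tagged[0]
    comma = (",", ",")
    if first_tag == "IN" and first_word.lower() in SUBORDINATORS and comma in tagged:
        # Rule 1: subordinate clause (minus the subordinator) | main clause.
        cut = tagged.index(comma)
        clauses = split_coordination(tagged[1:cut]) + split_coordination(tagged[cut + 1:])
    else:
        clauses = split_coordination(tagged)
    return [to_sentence(clause) for clause in clauses]


RAW = ("After/IN Salinas/NNP was/VBD arrested/VBN for/IN murder/NN in/IN 1995/CD and/CC "
       "lawyers/NNS for/IN the/DT bank/NN had/VBD begun/VBN monitoring/VBG his/PRP$ "
       "accounts/NNS ,/, his/PRP$ personal/JJ banker/NN in/IN New/NNP York/NNP quietly/RB "
       "advised/VBD Salinas'/NNP wife/NN to/TO move/VB the/DT money/NN elsewhere/RB ,/, "
       "apparently/RB without/IN the/DT consent/NN of/IN the/DT legal/JJ department/NN ./.")
tagged = [tuple(tok.rsplit("/", 1)) for tok in RAW.split()]
for sentence in decompose(tagged):
    print(sentence)
```

On this input the two rules reproduce the three target sentences listed in the abstract; real text would of course need many more rules and exceptions.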
- Date Issued
- 1999
- Identifier
- CFR0008181, ucf:53055
- Format
- Document (PDF)
- Title
- Automatically Acquiring a Semantic Network of Related Concepts.
- Creator
Szumlanski, Sean, Gomez, Fernando, Wu, Annie, Hughes, Charles, Sims, Valerie, University of Central Florida
- Abstract / Description
We describe the automatic acquisition of a semantic network in which over 7,500 of the most frequently occurring nouns in the English language are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from lexical co-occurrence in Wikipedia texts using a novel adaptation of an information-theoretically inspired measure. Our algorithm then capitalizes on salient sense clustering among these semantic associates to automatically disambiguate them to their corresponding WordNet noun senses (i.e., concepts). The resultant concept-to-concept associations, stemming from 7,593 target nouns, with 17,104 distinct senses among them, constitute a large-scale semantic network with 208,832 undirected edges between related concepts. Our work can thus be conceived of as augmenting the WordNet noun ontology with RelatedTo links. The network, which we refer to as the Szumlanski-Gomez Network (SGN), has been subjected to a variety of evaluative measures, including manual inspection by human judges and quantitative comparison to gold standard data for semantic relatedness measurements. We have also evaluated the network's performance in an applied setting on a word sense disambiguation (WSD) task in which the network served as a knowledge source for established graph-based spreading activation algorithms, and have shown: a) the network is competitive with WordNet when used as a stand-alone knowledge source for WSD, b) combining our network with WordNet achieves disambiguation results that exceed the performance of either resource individually, and c) our network outperforms a similar resource, WordNet++ (Ponzetto & Navigli, 2010), that has been automatically derived from annotations in the Wikipedia corpus. Finally, we present a study on human perceptions of relatedness.
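Relatedness from co-occurrence can be illustrated with pointwise mutual information (PMI), a standard information-theoretic association measure. The abstract describes a novel adaptation of such a measure without giving its formula, so this Python sketch uses plain PMI instead and should not be read as the SGN measure. The counts are hypothetical, and the WordNet sense-disambiguation step is omitted.

```python
import math
from typing import Dict, List


def pmi(pair_count: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))) estimated from counts."""
    if pair_count == 0:
        return float("-inf")
    return math.log2(pair_count * total / (count_x * count_y))


def related_nouns(target: str, cooccur: Dict[str, int], freq: Dict[str, int],
                  total: int, k: int) -> List[str]:
    """Rank the nouns that co-occur with `target` by PMI and keep the top k."""
    scores = {noun: pmi(c, freq[target], freq[noun], total) for noun, c in cooccur.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical counts over 10,000 text windows.
freq = {"doctor": 50, "nurse": 20, "hospital": 40, "car": 100}
cooccur_with_doctor = {"nurse": 8, "hospital": 10, "car": 2}
print(related_nouns("doctor", cooccur_with_doctor, freq, 10_000, k=2))  # ['nurse', 'hospital']
```

Although "hospital" co-occurs with "doctor" more often than "nurse" does, "nurse" scores higher because it is rarer overall; this normalization by individual frequency is what separates an association measure from raw co-occurrence counts.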
In our study, we elicited quantitative evaluations of semantic relatedness from human subjects using a variation of the classical methodology that Rubenstein and Goodenough (1965) employed to investigate human perceptions of semantic similarity. Judgments from individual subjects in our study exhibit high average correlation to the elicited relatedness means using leave-one-out sampling (r = 0.77, σ = 0.09, N = 73), although not as high as average human correlation in previous studies of similarity judgments, for which Resnik (1995) established an upper bound of r = 0.90 (σ = 0.07, N = 10). These results suggest that human perceptions of relatedness are less strictly constrained than evaluations of similarity, and establish a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We also contrast the performance of a variety of similarity and relatedness measures on our dataset to their performance on similarity norms and introduce our own dataset as a supplementary evaluative standard for relatedness measures.
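The leave-one-out agreement statistic reported above can be computed by correlating each subject's ratings with the mean ratings of all other subjects, then averaging those correlations. The Python sketch below assumes a complete ratings matrix (every subject rates every word pair) and reports the population standard deviation of the per-subject correlations; the abstract does not specify which standard deviation was used, and the ratings shown are invented for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (sx * sy)


def leave_one_out(ratings: List[List[float]]) -> Tuple[float, float]:
    """ratings[s][i] is subject s's score for word pair i; returns (mean r, SD of r)."""
    rs = []
    for s, own in enumerate(ratings):
        others = [row for t, row in enumerate(ratings) if t != s]
        held_out_means = [mean(column) for column in zip(*others)]
        rs.append(pearson(own, held_out_means))
    return mean(rs), pstdev(rs)


# Invented ratings: 4 subjects x 5 word pairs on a 0-4 scale.
ratings = [
    [4, 3, 1, 0, 2],
    [4, 2, 1, 1, 3],
    [3, 3, 0, 0, 2],
    [4, 3, 2, 1, 1],
]
r_mean, r_sd = leave_one_out(ratings)
print(f"r = {r_mean:.2f}, sd = {r_sd:.2f}, N = {len(ratings)}")
```

Excluding each subject from the mean they are compared against keeps a subject's own ratings from inflating their agreement score.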
- Date Issued
- 2013
- Identifier
- CFE0004759, ucf:49767
- Format
- Document (PDF)
- Title
- Using Freebase, an Automatically Generated Dictionary, and a Classifier to Identify a Person's Profession in Tweets.
- Creator
Hall, Abraham, Gomez, Fernando, Dechev, Damian, Tappen, Marshall, University of Central Florida
- Abstract / Description
Algorithms for classifying pre-tagged person entities in tweets into one of eight profession categories are presented. A classifier using a semi-supervised learning algorithm that takes into consideration the local context surrounding the entity in the tweet, hashtag information, and topic signature scores is described. In addition to the classifier, this research investigates two dictionaries containing the professions of persons. These two dictionaries are used in their own classification algorithms, which are independent of the classifier. The method for creating the first dictionary dynamically from the web, and the algorithm that accesses this dictionary to classify a person into one of the eight profession categories, are then explained. The second dictionary is Freebase, an openly available online database that is maintained by its online community. The algorithm that uses Freebase for classifying a person into one of the eight professions is described. The results show that classifications made using the automatically constructed dictionary, Freebase, or the classifier are all moderately successful. The results also show that classifications made with the automatically constructed person dictionary are slightly more accurate than classifications made using Freebase. Various hybrid methods combining the classifier and the two dictionaries are also explained. The results of those hybrid methods show significant improvement over any of the individual methods.
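One simple way to combine the three methods is a back-off cascade: consult the web-derived dictionary first, then Freebase, and fall back to the tweet classifier. The abstract does not describe how its hybrid methods combine the sources, so everything in this Python sketch is hypothetical: the ordering (chosen only because the web-derived dictionary was reported slightly more accurate than Freebase), the dictionary entries, the profession labels, and the keyword function standing in for the trained classifier.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical name -> profession entries; the real dictionaries are not reproduced here.
WEB_DICTIONARY: Dict[str, str] = {"jane doe": "musician"}
FREEBASE_STUB: Dict[str, str] = {"jane doe": "actor", "john roe": "politician"}


def toy_classifier(tweet: str) -> str:
    """Hypothetical keyword stand-in for the trained context classifier."""
    text = tweet.lower()
    if "#nba" in text or "scored" in text:
        return "athlete"
    if "senate" in text or "vote" in text:
        return "politician"
    return "unknown"


def hybrid_profession(name: str, tweet: str,
                      sources: Sequence[Dict[str, str]] = (WEB_DICTIONARY, FREEBASE_STUB),
                      classifier: Callable[[str], str] = toy_classifier) -> str:
    """Return the first dictionary hit for the name, else the classifier's guess."""
    for source in sources:
        label: Optional[str] = source.get(name.lower())
        if label is not None:
            return label
    return classifier(tweet)


print(hybrid_profession("Jane Doe", "Jane Doe on stage tonight"))    # musician
print(hybrid_profession("John Roe", "John Roe speaks today"))        # politician
print(hybrid_profession("Sam Poe", "Sam Poe scored 30 last night"))  # athlete
```

Ordering the sources is the key design choice in a back-off scheme: when two sources disagree (as for "jane doe" here), the earlier, presumably more accurate source wins.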
- Date Issued
- 2013
- Identifier
- CFE0004858, ucf:49715
- Format
- Document (PDF)