
Larger-first partial parsing


Date Issued: 2003
Abstract/Description: University of Central Florida College of Engineering Thesis; Larger-first partial parsing is a primarily top-down approach to partial parsing, the opposite of current easy-first (primarily bottom-up) strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each input token in a sentence. A part-of-speech tagger first assigns part-of-speech tags to the words in the sentence. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton can add one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that the larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving higher accuracy than the easy-first approach. The automata of each network were developed through an empirical analysis of several sources and are presented here in detail.
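The idea of a cascade that adds levels of structural tags can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the thesis's Comma, Conjunction, Clause, or Phrase networks: the two automata below (a subordinate-clause automaton labeled SBAR and a base noun phrase automaton labeled NP), their Penn Treebank-style tag patterns, and the greedy longest-match scanning policy are all assumptions made for the example. Running the larger automaton first and the smaller one second shows how each stage appends one more tag level to the tokens it matches, so a noun phrase inside a clause ends up with the clause tag above the phrase tag.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Deterministic finite-state automaton over part-of-speech tags (illustrative only)."""
    label: str                                # hypothetical structural tag added to each matched token
    transitions: dict[tuple[int, str], int]   # (state, POS tag) -> next state
    accepting: set[int]
    start: int = 0

    def longest_match(self, tags: list[str], i: int) -> int:
        """Return the exclusive end of the longest accepted span starting at i, or i if none."""
        state, best = self.start, i
        for j in range(i, len(tags)):
            state = self.transitions.get((state, tags[j]))
            if state is None:          # no transition: the automaton rejects here
                break
            if state in self.accepting:
                best = j + 1
        return best


def run_cascade(tags: list[str], cascade: list[DFA]) -> list[list[str]]:
    """Apply automata in the given order; each match appends one tag level to its tokens."""
    levels: list[list[str]] = [[] for _ in tags]
    for dfa in cascade:
        i = 0
        while i < len(tags):
            end = dfa.longest_match(tags, i)
            if end > i:
                for k in range(i, end):
                    levels[k].append(dfa.label)
                i = end                # resume scanning after the matched span
            else:
                i += 1
    return levels


# Hypothetical clause automaton: IN DT? NN+ VBD  (e.g. "because the dog barked")
CLAUSE = DFA("SBAR", {(0, "IN"): 1, (1, "DT"): 2, (1, "NN"): 3, (2, "NN"): 3,
                      (3, "NN"): 3, (3, "VBD"): 4}, {4})

# Hypothetical base noun phrase automaton: DT? JJ* NN+
NP = DFA("NP", {(0, "DT"): 1, (0, "JJ"): 1, (1, "JJ"): 1, (0, "NN"): 2,
                (1, "NN"): 2, (2, "NN"): 2}, {2})

tags = ["DT", "NN", "VBD", "IN", "DT", "NN", "VBD"]  # The cat ran because the dog barked
for tag, level in zip(tags, run_cascade(tags, [CLAUSE, NP])):
    print(tag, level)
```

With the larger automaton first, "because the dog barked" is tagged SBAR before the phrase stage runs, and "the dog" then receives NP as a second, deeper level beneath SBAR. Reversing the cascade order to [NP, CLAUSE] yields the same spans but with the levels inverted, which is the ordering difference the abstract contrasts with easy-first parsing.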
Title: Larger-first partial parsing.
Name(s): Van Delden, Sebastian Alexander, Author
Gomez, Fernando, Committee Chair
Engineering and Computer Science, Degree Grantor
Type of Resource: text
Publisher: University of Central Florida
Language(s): English
Identifier: CFR0000760 (IID), ucf:52932 (fedora)
Note(s): 2003-12-01
Ph.D.
Electrical Engineering and Computer Science
Doctorate
This record was generated from author submitted information.
Electronically reproduced by the University of Central Florida from a book held in the John C. Hitt Library at the University of Central Florida, Orlando.
Subject(s): Dissertations
Academic -- Engineering
Engineering -- Dissertations
Academic
Natural language processing (Computer science)
Parsing (Computer grammar)
Sequential machine theory
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFR0000760
Restrictions on Access: public
Host Institution: UCF
