Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction
Title: Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction
Name(s): Tariq, Amara (Author); Foroosh, Hassan (Committee Chair); Qi, GuoJun (Committee Member); Gonzalez, Avelino (Committee Member); Pensky, Marianna (Committee Member); University of Central Florida (Degree Grantor)
Type of Resource: text
Date Issued: 2016
Publisher: University of Central Florida
Language(s): English
Abstract/Description: In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities, especially relations between images and natural language. We explore ideas and techniques for integrating such cross-media semantic relations into machine understanding of the large heterogeneous datasets made available through the expansion of the World Wide Web. Datasets collected from social media websites, news media outlets, and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data, such as images, text, linguistic features, and structured data, in reference to each other. For example, image and video search and retrieval engines must understand the relations between visual and textual data so that they can provide relevant answers, in the form of images and videos, to users' queries presented as text. We emphasize the automatic extraction of semantic topics or concepts from data available in any form, such as images, free-flowing text, or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images; this problem is the main focus of our work. In many cases, a large amount of text is associated with images, and deep exploration of the linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario. We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of the linguistic features of the news articles.
Identifier: CFE0006507 (IID); ucf:51401 (fedora)
Note(s): 2016-12-01; Ph.D.; Engineering and Computer Science, Computer Science; Doctoral; This record was generated from author-submitted information.
Subject(s): Image description generation -- Semantic network construction -- Multimedia semantic relations
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0006507
Restrictions on Access: public; 2016-12-15
Host Institution: UCF