You are here

Algorithms for Community Identification in Complex Networks

Download pdf | Full Screen View

Date Issued:
2012
Abstract/Description:
Complex networks such as the Internet, the World Wide Web (WWW), and various social and biological networks, are viewed as large, dynamic, random graphs, with properties significantly different from those of the Erd(&)#246;s-R(&)#233;nyi random graphs. In particular, properties such as degree distribution, network distance, transitivity and clustering coefficient of these networks have been empirically shown to diverge from classical random networks. Existence of communities is one such property inherent to these networks. A community may informally be defined as a locally-dense induced subgraph, of significant size, in a large globally-sparse graph. Recent empirical results reveal communities in networks spanning across different disciplines (-) physics, statistics, sociology, biology, and linguistics. At least two different questions may be posed on the community structure in large networks: (i) Given a network, detect or extract all (i.e., sets of nodes that constitute) communities; and (ii) Given a node in the network, identify the best community that the given node belongs to, if there exists one. Several algorithms have been proposed to solve the former problem, known as Community Discovery. The latter problem, known as Community Identification, has also been studied, but to a much smaller extent. Both these problems have been shown to be NP-complete, and a number of approximate algorithms have been proposed in recent years. A comprehensive taxonomy of the existing community detection algorithms is presented in this work. Global exploration of these complex networks to pull out communities (community discovery) is time and memory consuming. A more confined approach to mine communities in a given network is investigated in this research. Identifying communities does not require the knowledge of the entire graph. Community identification algorithms exist in the literature, but to a smaller extent. The dissertation presents a thorough description and analysis of the existing techniques to identify communities in large networks. Also a novel heuristic for identifying the community to which a given seed node belongs using only its neighborhood information is presented. An improved definition of a community based on the average degree of the induced subgraph is discussed thoroughly and it is compared with the various definitions in the literature. Next, a faster and accurate algorithm to identify communities in complex networks based on maximizing the average degree is described. The divisive nature of the algorithm (as against the existing agglomerative methods) efficiently identifies communities in large complex networks. The performance of the algorithm on several synthetic and real-world complex networks has also been thoroughly investigated.
Title: Algorithms for Community Identification in Complex Networks.
36 views
17 downloads
Name(s): Vasudevan, Mahadevan, Author
Deo, Narsingh, Committee Chair
Hughes, Charles, Committee Member
Guha, Ratan, Committee Member
Chatterjee, Mainak, Committee Member
Zhao, Yue, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2012
Publisher: University of Central Florida
Language(s): English
Abstract/Description: Complex networks such as the Internet, the World Wide Web (WWW), and various social and biological networks, are viewed as large, dynamic, random graphs, with properties significantly different from those of the Erd(&)#246;s-R(&)#233;nyi random graphs. In particular, properties such as degree distribution, network distance, transitivity and clustering coefficient of these networks have been empirically shown to diverge from classical random networks. Existence of communities is one such property inherent to these networks. A community may informally be defined as a locally-dense induced subgraph, of significant size, in a large globally-sparse graph. Recent empirical results reveal communities in networks spanning across different disciplines (-) physics, statistics, sociology, biology, and linguistics. At least two different questions may be posed on the community structure in large networks: (i) Given a network, detect or extract all (i.e., sets of nodes that constitute) communities; and (ii) Given a node in the network, identify the best community that the given node belongs to, if there exists one. Several algorithms have been proposed to solve the former problem, known as Community Discovery. The latter problem, known as Community Identification, has also been studied, but to a much smaller extent. Both these problems have been shown to be NP-complete, and a number of approximate algorithms have been proposed in recent years. A comprehensive taxonomy of the existing community detection algorithms is presented in this work. Global exploration of these complex networks to pull out communities (community discovery) is time and memory consuming. A more confined approach to mine communities in a given network is investigated in this research. Identifying communities does not require the knowledge of the entire graph. Community identification algorithms exist in the literature, but to a smaller extent. The dissertation presents a thorough description and analysis of the existing techniques to identify communities in large networks. Also a novel heuristic for identifying the community to which a given seed node belongs using only its neighborhood information is presented. An improved definition of a community based on the average degree of the induced subgraph is discussed thoroughly and it is compared with the various definitions in the literature. Next, a faster and accurate algorithm to identify communities in complex networks based on maximizing the average degree is described. The divisive nature of the algorithm (as against the existing agglomerative methods) efficiently identifies communities in large complex networks. The performance of the algorithm on several synthetic and real-world complex networks has also been thoroughly investigated.
Identifier: CFE0004294 (IID), ucf:49463 (fedora)
Note(s): 2012-05-01
Ph.D.
Engineering and Computer Science, Computer Science
Doctoral
This record was generated from author submitted information.
Subject(s): Complex networks -- Community identification -- algorithms -- real-world networks -- community
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0004294
Restrictions on Access: public 2012-05-15
Host Institution: UCF

In Collections