Project 3
Non-overlapping Community Detection in Large Networks
Due: 2019-12-10
In this project, you are going to detect / reveal significant communities in large
networks using current various community detection (graph mining, clustering)
methods.
1. Detect / reveal communities in the following networks:
http://snap.stanford.edu/data/index.html
Networks with ground-truth communities
Name Type Nodes Edges Communities Description
com-LiveJournal Undirected,
Communities 3,997,962 34,681,189 287,512
LiveJournal online
social network
com-Friendster Undirected,
Communities 65,608,366 1,806,067,135 957,154
Friendster online
social network
com-Orkut Undirected,
Communities 3,072,441 117,185,083 6,288,363
Orkut online social
network
com-Youtube Undirected,
Communities 1,134,890 2,987,624 8,385
Youtube online
social network
com-DBLP Undirected,
Communities 317,080 1,049,866 13,477
DBLP collaboration
network
com-Amazon Undirected,
Communities 334,863 925,872 75,149 Amazon product
network
email-Eu-core
Directed,
Communities 1,005 25,571 42 E-mail network
wiki-topcats Directed,
Communities 1,791,489 28,511,807 17,364 Wikipedia hyperlinks
2. The following methods should be implemented and evaluated:
1) Hierarchical clustering using Jaccard index;
2) Spectral Clustering;
3) CNM (Community Detection in Complex Networks Using External Optimization);
4) HRG (Hierarchical random graph, http://tuvalu.santafe.edu/~aaronc/hierarchy/);
5) Infomap (http://igraph.org/python/doc/igraph.Graph-class.html);
6) Fast Unfolding community detection (http://arxiv.org/pdf/0803.0476v2.pdf,
community detection in networkx package);
7) Multi-Scale Community Detection using Stability as Optimisation Criterion in a
Greedy Algorithm (http://www.elemartelot.org/index.php/programming/cd-code);
8) Multi-Scale Community Detection using Stability Optimisation
(http://www.elemartelot.org/index.php/programming/cd-code);
Igraph (http://igraph.org/python) has implemented methods 1)-5);
networkx (https://networkx.github.io/,
http://blog.sciencenet.cn/blog-404069-337442.html) has implemented 6);
Codes of 7) and 8) are listed as above.
3. Use “community detection benchmark” to evaluate the performances of
community detection methods:
http://arxiv.org/abs/0805.4770
https://sites.google.com/site/santofortunato/inthepress2
--“Package 1 includes the code to generate undirected and unweighted graphs with
overlapping communities. The extent of the overlap can be tuned by input, and it can
be set to zero if one is interested in non-overlapping clusters.”
In this project, we are just focusing on the undirected and unweighted graphs and
non-overlapping communities, such that you can just use the Package1.
Note: The Benchmark graphs generates series of graphs with varying degrees
community structure via changing the mixing parameter μ, for less μ the community
structure is more distinct and earlier to be detected.
4. How to evaluate the performance? The NMI is a good measurement
(http://blog.sina.com.cn/s/blog_45e6be080101dlya.html). You can refer to this source
code:
http://www.mathworks.com/matlabcentral/fileexchange/35625-information-theory-tool
box/content/nmi.m
5. Report your experiments’ details, including the methods you used, the
implementation details, and the performance evaluation, comparison, analysis and
discussion. You are encouraged to improve the current methods, even develop a
novel method to apply in these datasets. The report should be consist of four parts:
1) introduction and related works; 2) method and implementation; 3) experimental
results and 4) analysis and discussion.
6. You can apply the HPC resource for this project via this link:
https://www.must.edu.mo/ssi/labs/hpc.
因为专业,所以值得信赖。如有需要,请加QQ:99515681 或 微信:codehelp