For International Students
Parallelizing Spectral K-Means Algorithm with Map-Reduce Framework
胡 ヱ康
(Supervisor: Prof. Masato Takeichi)
Map-reduce is one of the most widely used parallel programming frameworks. It enables automatic parallelization and distribution of large-scale computations on distributed systems composed of many general-purpose machines connected by a network. However, implementing spectral-clustering-based algorithms with the map-reduce framework has usually been impractical, because building a dense similarity matrix for a large dataset requires more memory than a general-purpose machine can provide. In this thesis, we propose a spectral K-means algorithm that uses an approximated sparse similarity matrix, making it suitable for general-purpose machines, and we implement it with Hadoop's map-reduce framework. Experiments on the TDT-2 and RCV1 corpora show that this implementation achieves both good clustering quality and good scalability.
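The abstract does not say how the sparse similarity matrix is approximated. As a minimal sketch, assuming one common choice (keeping only each document's `t` nearest neighbours under cosine similarity), the map and reduce steps for building such a matrix could look like the following. The Hadoop job is simulated in-process, and `map_phase`, `reduce_phase`, `sparse_similarity`, and the parameter `t` are hypothetical names, not taken from the thesis.

```python
from collections import defaultdict
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def map_phase(doc_id, vec, all_docs):
    """Hypothetical mapper: emit (doc_id, (other_id, similarity)) pairs.

    Simplification: every mapper sees all documents; a real Hadoop job
    would partition the pairwise comparisons across machines.
    """
    for other_id, other_vec in all_docs.items():
        if other_id != doc_id:
            yield doc_id, (other_id, cosine(vec, other_vec))


def reduce_phase(doc_id, pairs, t):
    """Hypothetical reducer: keep only the t most similar neighbours (one sparse row)."""
    return doc_id, dict(heapq.nlargest(t, pairs, key=lambda p: p[1]))


def sparse_similarity(docs, t):
    """Build a symmetric sparse t-nearest-neighbour similarity matrix as nested dicts."""
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle-and-sort step
    for doc_id, vec in docs.items():
        for key, value in map_phase(doc_id, vec, docs):
            grouped[key].append(value)
    rows = dict(reduce_phase(k, v, t) for k, v in grouped.items())

    # Symmetrize: if j is among i's neighbours, store the entry in both rows.
    sym = defaultdict(dict)
    for i, row in rows.items():
        for j, s in row.items():
            sym[i][j] = max(s, sym[i].get(j, 0.0))
            sym[j][i] = sym[i][j]
    return dict(sym)
```

For example, with four toy documents and `t=1`, each document keeps only its single most similar neighbour, so every row holds one entry instead of a dense 4×4 table. In a full spectral K-means pipeline these sparse rows would then feed the eigenvector computation and the final k-means step, which are not shown here.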
During my two years in the master's program, I tried implementing various algorithms in different parallel programming environments and finally chose the work above as my thesis topic. Writing the thesis was interesting but challenging, and I thank my supervisor and all the members of our lab for their guidance and advice.