Distributed Sentence Similarity Based on Word Mover's Distance

Algorithm:

Reference: the ICML 2015 paper that introduced the Word Mover's Distance (Kusner et al., "From Word Embeddings To Document Distances").

1. First, use Google's word2vec tool to obtain distributed word representations, a.k.a. word vectors.

2. Then use the earth mover's distance (EMD) between the two sentences' word vectors as the similarity metric.

3. Solve the EMD problem as a transportation problem using the Hungarian algorithm.
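
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the tiny 2-D vectors in TOY_VECTORS are made up and stand in for real word2vec output, the function names are hypothetical, and every token is assumed to carry equal weight (so a repeated word carries proportionally more mass). To let the Hungarian algorithm, which solves assignment problems, handle sentences of different lengths, each word's mass is split into equal units so that both sides have the same number of units.

```python
import math

# Hypothetical toy vectors; in practice these would come from word2vec.
TOY_VECTORS = {
    "obama": [0.0, 0.0],
    "president": [0.0, 1.0],
    "speaks": [3.0, 0.0],
    "greets": [3.0, 1.0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hungarian(cost):
    """Minimum total cost of a perfect assignment on a square cost matrix."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))


def wmd(tokens_a, tokens_b, vectors):
    """Word mover's distance with uniform per-token weights."""
    a = [t for t in tokens_a if t in vectors]  # drop out-of-vocabulary words
    b = [t for t in tokens_b if t in vectors]
    if not a or not b:
        raise ValueError("each sentence needs at least one known word")
    g = math.gcd(len(a), len(b))
    # Split each word's mass into equal units so both sides have the same count.
    units_a = [t for t in a for _ in range(len(b) // g)]
    units_b = [t for t in b for _ in range(len(a) // g)]
    cost = [[euclidean(vectors[s], vectors[t]) for t in units_b] for s in units_a]
    return hungarian(cost) / len(units_a)


print(wmd(["obama", "speaks"], ["president", "greets"], TOY_VECTORS))
```

With these toy vectors, each word in the first sentence sits at distance 1 from its counterpart in the second, so the printed distance is 1.0. The unit-splitting trick keeps the code short but grows the cost matrix to the least common multiple of the two sentence lengths, so a dedicated transportation or linear-programming solver would likely be preferable for long texts.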


Outcome:

The results look reasonable, but there is still room to improve precision.

For example, use n-grams to preserve some of the sentence structure.
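
One possible reading of the n-gram idea, sketched below under stated assumptions: treat each contiguous n-gram as a single unit and represent it by concatenating its word vectors in order, so that "dog bites man" and "man bites dog" no longer look identical. The helper names and the concatenation scheme are illustrative choices, not something the original project specifies.

```python
def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vector(gram, vectors):
    """Concatenate the word vectors of an n-gram, preserving word order."""
    out = []
    for word in gram:
        out.extend(vectors[word])
    return out


# Hypothetical 2-D vectors standing in for real word2vec output.
toy_vectors = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}

for gram in ngrams(["dog", "bites", "man"], n=2):
    print(gram, ngram_vector(gram, toy_vectors))
```

The resulting n-gram vectors could then take the place of single word vectors when computing the distance between two sentences.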
