Understanding Gensim Word2vec's Most_similar

I am unsure how I should use the most_similar method of gensim's Word2Vec. Let's say you want to test the tried-and-true example of: man stands to king as woman stands to X; find X

Solution 1:

You can view exactly what most_similar() does in its source code:

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#L485

It's not quite "find points in the vector space that are as close as possible to the positive vectors and as far away as possible from the negative ones". Rather, as described in the original word2vec papers, it performs vector arithmetic: it adds the positive vectors, subtracts the negative ones, and then, from the resulting position, lists the known vectors closest in angle to it (that is, those with the highest cosine similarity).
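To make "closest to that angle" concrete, here is a minimal pure-Python sketch contrasting cosine similarity with Euclidean distance. The three vectors and their names are made up for illustration and do not come from any real model.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Straight-line distance between the two vector endpoints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy vectors (assumptions, not real word-vectors).
target = [0.0, 1.0]
near_but_off_angle = [0.5, 1.0]    # close in space, slightly different direction
far_but_same_angle = [0.0, 10.0]   # far in space, identical direction

# Ranking by angle prefers the vector with the same direction,
# even though the other one is nearer in Euclidean terms.
print(cosine(target, far_but_same_angle), cosine(target, near_but_off_angle))
print(euclidean(target, far_but_same_angle), euclidean(target, near_but_off_angle))
```

Under a cosine ranking, vector length drops out of the comparison, so only direction decides the order.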

That is sufficient to solve man : king :: woman : ?-style analogies, via a call like:

# wordvecs: a trained KeyedVectors instance (e.g. model.wv)
sims = wordvecs.most_similar(positive=['king', 'woman'],
                             negative=['man'])

(You can think of this as: "start at the 'king' vector, add the 'woman' vector, subtract the 'man' vector; then, from where you wind up, report the ranked word-vectors closest to that point, leaving out the 3 query words themselves.")
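The steps in that description can be sketched in plain Python. This is a rough illustration, not gensim's actual code: the 2-D vectors are invented, and `most_similar_sketch` and its `drop_query_words` flag are hypothetical names. It assumes, as the gensim source appears to do, that each query vector is scaled to unit length and the signed vectors are averaged before ranking.

```python
import math

# Toy 2-D vectors invented for illustration; real models use far more dimensions.
VECTORS = {
    "man":   [1.0, 0.0],
    "woman": [0.0, 1.0],
    "king":  [3.0, 1.0],
    "queen": [1.0, 3.0],
    "apple": [-1.0, -1.0],
}

def unit(v):
    """Scale v to length 1 so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar_sketch(positive, negative, topn=3, drop_query_words=True):
    """Hypothetical helper: add positives, subtract negatives, rank by cosine."""
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    dims = len(VECTORS[signed[0][0]])
    # Average of the signed, unit-length query vectors.
    mean = [
        sum(sign * unit(VECTORS[w])[i] for w, sign in signed) / len(signed)
        for i in range(dims)
    ]
    target = unit(mean)
    query_words = set(positive) | set(negative)
    scored = [
        (word, sum(x * y for x, y in zip(unit(vec), target)))
        for word, vec in VECTORS.items()
        if not (drop_query_words and word in query_words)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

print(most_similar_sketch(["king", "woman"], ["man"]))
print(most_similar_sketch(["king", "woman"], ["man"], drop_query_words=False))
```

With these toy vectors, the second call ranks 'woman' first, which shows why the query words are left out: the combined vector often still lies closest to one of its own inputs.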
