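Loading embeddings with gensim

The gensim package can load embeddings stored in the word2vec text format, optionally together with a vocabulary file of token counts: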


from gensim.models.keyedvectors import KeyedVectors

twitter_embedding_path = 'twitter_embedding.emb'
twitter_vocab_path = 'twitter_model.vocab'
foursquare_embedding_path = 'foursquare_embedding.emb'
foursquare_vocab_path = 'foursquare_model.vocab'

# load the embedding vectors (and their vocabulary counts) with gensim
x_vectors = KeyedVectors.load_word2vec_format(foursquare_embedding_path, binary=False, fvocab=foursquare_vocab_path)
y_vectors = KeyedVectors.load_word2vec_format(twitter_embedding_path, binary=False, fvocab=twitter_vocab_path)

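# first 10 vocabulary entries (gensim >= 4.0; on gensim 3.x use list(x_vectors.vocab.keys())[:10])
print(x_vectors.index_to_key[:10])
# first 10 embedding vectors, as rows of the underlying NumPy array
print(y_vectors.vectors[:10])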

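Content of 'twitter_embedding.emb' (the first line gives the number of vectors and their dimensionality; each following line is a token and its vector):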

5120 64
BarackObama -0.079930 0.106491 -0.075812 -0.026447 ...
mashable 0.046692 -0.038019 -0.055519 ...
...
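
To make the layout above concrete, here is a minimal sketch of a parser for the same plain-text format that does not rely on gensim. The function name `read_word2vec_text` and the two-token sample (`alice`, `bob`) are made up for illustration and are not taken from the real 'twitter_embedding.emb' file; in practice `KeyedVectors.load_word2vec_format` does this work.

```python
import io


def read_word2vec_text(stream):
    """Parse a word2vec text file: a 'count dim' header, then one token and its vector per line."""
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])

    vectors = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for token {parts[0]!r}, got {len(parts) - 1}")
        vectors[parts[0]] = [float(value) for value in parts[1:]]

    if len(vectors) != count:
        raise ValueError(f"header announced {count} vectors, file contains {len(vectors)}")
    return dim, vectors


# Hypothetical sample: 2 tokens with 3-dimensional vectors.
sample = io.StringIO("2 3\nalice 0.1 -0.2 0.3\nbob 0.0 0.5 -1.0\n")
dim, vectors = read_word2vec_text(sample)
print(dim, sorted(vectors))
```

The header check is the part worth noting: if the announced count or dimensionality does not match the data lines, the file is malformed and loading it would be unreliable.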

Content of 'twitter_model.vocab' (one token and its count per line):

BarackObama 3475971
mashable 2668606
JonahLupton 2515250
instagram 2359886
TheEllenShow 2292545
cnnbrk 2157283
nytimes 2141588
foursquare 2021352

...
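
The vocabulary file is simple enough to read by hand as well. The following is a small sketch, assuming one whitespace-separated token and integer count per line as in the listing above; the helper name `read_vocab_counts` is hypothetical, and the sample reuses only the first three lines shown.

```python
import io


def read_vocab_counts(stream):
    """Read 'token count' lines into a dict mapping token -> integer count."""
    counts = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token, count = line.rsplit(maxsplit=1)
        counts[token] = int(count)
    return counts


# First three lines of the vocabulary listing shown above.
sample = io.StringIO("BarackObama 3475971\nmashable 2668606\nJonahLupton 2515250\n")
counts = read_vocab_counts(sample)

# Token with the highest count in the sample.
most_frequent = max(counts, key=counts.get)
print(most_frequent, counts[most_frequent])
```

Reading the counts separately like this can be handy for checking that the vocabulary file and the embedding file list the same tokens before passing both to gensim.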
