Learning to use it, bit by bit

Unlocking Text Data with Machine Learning & Deep Learning Using Python

Only a few lines for now; more later, when I am more familiar with this stuff.

But training these models requires a huge amount of computing
power. So, let us go ahead and use Google’s pre-trained model, which has
been trained on over 100 billion words.
Download the model from the path below and keep it in your local
storage:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
Import the gensim package and follow the steps to understand
Google’s word2vec.

Import the gensim package:

import gensim

Load the saved model:

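#Current gensim loads word2vec-format files through KeyedVectors;
#older releases exposed the same loader on gensim.models.Word2Vec.
#The raw string (r'...') keeps the backslashes in the Windows path literal.
model = gensim.models.KeyedVectors.load_word2vec_format(
    r'C:\Users\GoogleNews-vectors-negative300.bin', binary=True)
#Checking how similarity works.
print(model.similarity('this', 'is'))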
Chapter 3 Converting text to Features
Output:
0.407970363878
#Let's check one more.
print(model.similarity('post', 'book'))
Output:
0.0572043891977
“This” and “is” are fairly similar, but the similarity between the
words “post” and “book” is poor. For any given pair of words, the model
takes the vectors of both words and calculates the cosine similarity
between them.
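
To make the similarity calculation concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and the `toy_vectors` name are made up for illustration; the real GoogleNews vectors have 300 dimensions, so the numbers printed here will not match the outputs above.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors invented for this example (NOT real word2vec values).
toy_vectors = {
    "this": [0.9, 0.1, 0.3],
    "is":   [0.8, 0.2, 0.4],
    "book": [0.1, 0.9, 0.2],
}

# Vectors pointing in similar directions score close to 1.
print(cosine_similarity(toy_vectors["this"], toy_vectors["is"]))
# Vectors pointing in different directions score much lower.
print(cosine_similarity(toy_vectors["this"], toy_vectors["book"]))
```

A score near 1 means the two vectors point in almost the same direction, a score near 0 means they are unrelated, and the length of the vectors does not matter.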

Finding the odd one out.

model.doesnt_match('breakfast cereal dinner lunch'.split())
Output:
'cereal'
Of 'breakfast', 'cereal', 'dinner' and 'lunch', 'cereal' is the only word
that is not a meal, so it is the one least related to the other three.
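
One way to pick the odd word out is to average the word vectors and return the word that is least similar to that average. The sketch below follows that idea in plain Python; it is an illustration under that assumption, not necessarily gensim's exact implementation, and the two-dimensional vectors and the `odd_one_out` helper are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to length 1 so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def odd_one_out(vectors):
    """Return the word whose vector is least similar to the mean vector."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(units.values())))
    mean = [sum(u[i] for u in units.values()) / len(units) for i in range(dim)]
    # For unit vectors, the dot product with the mean ranks words
    # in the same order as cosine similarity to the mean would.
    return min(units, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))

# Toy vectors invented for this example (NOT real word2vec values).
toy_vectors = {
    "breakfast": [0.90, 0.10],
    "cereal":    [0.20, 0.90],
    "dinner":    [0.80, 0.15],
    "lunch":     [0.85, 0.20],
}

print(odd_one_out(toy_vectors))
```

Because three of the four toy vectors point in nearly the same direction, the average sits close to them and the remaining word ends up furthest away.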

It also finds relations between words.

model.most_similar(positive=['woman', 'king'],
negative=['man'])
Output:
queen: 0.7699
If you add 'woman' and 'king' and subtract 'man', the model returns 'queen'
as the closest word, with a similarity score of about 0.77 (a cosine
similarity, not a probability). Isn’t this amazing?
[Figure: the vectors for king, woman, man, and queen]
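
The vector arithmetic behind this analogy can be sketched in plain Python: add the "positive" vectors, subtract the "negative" ones, and rank the remaining words by cosine similarity to the result. The three-dimensional vectors (loosely "royal", "male", "female") and the `analogy` helper are hypothetical, and gensim's own implementation differs in details such as how it normalizes the vectors.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(vectors, positive, negative, topn=1):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # The query words themselves are excluded from the candidates.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine_similarity(target, v))
              for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy vectors invented for this example (NOT real word2vec values).
toy_vectors = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "prince": [0.9, 1.0, 0.0],
}

print(analogy(toy_vectors, positive=["woman", "king"], negative=["man"]))
```

In this toy setup, subtracting 'man' cancels the "male" component of 'king' and adding 'woman' supplies the "female" one, which is why 'queen' ranks first.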
Let’s have a look at a few of the interesting examples using a t-SNE plot
of word embeddings.
The plot in the book shows the embeddings of words about home
interiors and exteriors. If you look closely, all the words related to
electric fittings are near each other; similarly, words related to bathroom
fittings are near each other, and so on. This is the beauty of word
embeddings.
