Unlocking Text Data with Machine Learning & Deep Learning Using Python
Only a few lines for now; more later when I am more familiar with this material.
But training these models requires a huge amount of computing
power. So, let us go ahead and use Google’s pre-trained model, which has
been trained on over 100 billion words.
Download the model from the path below and keep it in your local
storage:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
Import the gensim package and follow the steps to understand
Google’s word2vec.
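# Import the gensim package
import gensim
# Load the saved model. Current gensim releases expose this loader on
# KeyedVectors; older releases exposed it as Word2Vec.load_word2vec_format.
# The raw string (r'...') keeps the backslashes in the Windows path literal.
model = gensim.models.KeyedVectors.load_word2vec_format(
    r'C:\Users\GoogleNews-vectors-negative300.bin', binary=True)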
# Check how similarity works.
print(model.similarity('this', 'is'))
Chapter 3 Converting text to Features
Output:
0.407970363878
# Let's check one more.
print(model.similarity('post', 'book'))
Output:
0.0572043891977
“This” and “is” have a fair amount of similarity (about 0.41), but the
similarity between the words “post” and “book” is poor (about 0.06). For any
given pair of words, the model takes the vectors of both words and
calculates the cosine similarity between them.
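To make the calculation concrete, here is a minimal sketch of cosine similarity in plain Python. The three short vectors are hypothetical values made up for illustration; they are not taken from Google’s model, whose vectors have 300 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional vectors, for illustration only.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # points in the same direction as vec_a
vec_c = [0.0, 0.0, 3.0]   # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)   # close to 1: same direction
sim_ac = cosine_similarity(vec_a, vec_c)   # 0: nothing in common
print(sim_ab, sim_ac)
```

Vectors pointing the same way score near 1 and unrelated (orthogonal) vectors score near 0, which is why “post” and “book” land close to zero in the output above.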
Finding the odd one out:
model.doesnt_match('breakfast cereal dinner lunch'.split())
Output:
'cereal'
Of ‘breakfast’, ‘cereal’, ‘dinner’, and ‘lunch’, cereal is the only word that is
not closely related to the remaining three words.
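One plausible way to pick the odd word out, sketched below, is to average the unit-length vectors of all the words and return the word least similar to that average. This is an illustrative approximation and not necessarily what gensim does internally; the helper names `unit` and `odd_one_out` and the 2-dimensional vectors are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def odd_one_out(vectors):
    """Return the word whose unit vector is least similar to the mean vector."""
    units = {w: unit(v) for w, v in vectors.items()}
    dim = len(next(iter(units.values())))
    mean = unit([sum(u[i] for u in units.values()) / len(units)
                 for i in range(dim)])
    # Dot product of unit vectors equals their cosine similarity.
    scores = {w: sum(a * b for a, b in zip(u, mean)) for w, u in units.items()}
    return min(scores, key=scores.get)

# Hypothetical 2-dimensional vectors, for illustration only.
toy_vectors = {
    "breakfast": [0.90, 0.10],
    "lunch":     [0.80, 0.20],
    "dinner":    [0.85, 0.15],
    "cereal":    [0.20, 0.90],
}

odd = odd_one_out(toy_vectors)
print(odd)
```

With these made-up vectors the three meal words cluster together, so ‘cereal’ sits farthest from the average and is returned.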
It can also find relations between words.
model.most_similar(positive=['woman', 'king'],
                   negative=['man'])
Output:
queen: 0.7699
If you add ‘woman’ and ‘king’ and subtract ‘man’, the word vector nearest to
the result is ‘queen’, with a similarity score of about 0.77. Isn’t this amazing?
[Figure: word vectors for “king”, “woman”, “man”, and “queen”]
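The vector arithmetic behind this can be sketched with toy numbers. The 2-dimensional vectors below are hypothetical and chosen by hand so that one axis loosely stands for “royalty” and the other for “gender”; the `analogy` helper is an illustrative name, and gensim’s own most_similar may combine the vectors somewhat differently.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical 2-dimensional vectors, for illustration only.
toy_vectors = {
    "king":   [0.9,  0.8],
    "queen":  [0.9, -0.8],
    "man":    [0.1,  0.9],
    "woman":  [0.1, -0.9],
    "prince": [0.8,  0.7],
    "apple":  [-0.5, 0.1],
}

def analogy(positive, negative, vectors):
    """Add the positive vectors, subtract the negative ones, and return
    the closest remaining word with its cosine similarity."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for w in positive:
        target = [t + x for t, x in zip(target, vectors[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, vectors[w])]
    # Skip the query words themselves when searching for the answer.
    exclude = set(positive) | set(negative)
    scores = {w: cosine_similarity(target, v)
              for w, v in vectors.items() if w not in exclude}
    best = max(scores, key=scores.get)
    return best, scores[best]

best, score = analogy(["woman", "king"], ["man"], toy_vectors)
print(best, round(score, 4))
```

The returned score is a cosine similarity between the computed vector and the winning word, so it is better read as “how close” than as a probability.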
Let’s have a look at a few interesting examples using t-SNE plots
for word embeddings.
The plot above shows the word embeddings for words related to home
interiors and exteriors. If you look closely, all the words related to
electric fittings are near each other; similarly, words related to bathroom
fittings are near each other, and so on. This is the beauty of word
embeddings.