【刘知远NLP课 整理】Word Representation

Word representation is a process that transforms symbols into machine-understandable meanings. The goals of Word Representation are as follows (see the code sketch after the examples):

  1. Compute word similarity

WR(Star) ≃ WR(Sun)
WR(Motel) ≃ WR(Hotel)

  2. Infer word relation

WR(China) − WR(Beijing) ≃ WR(Japan) - WR(Tokyo)
WR(Man) ≃ WR(King) − WR(Queen) + WR(Woman)
WR(Swimming) ≃ WR(Walking) − WR(Walk) + WR(Swim)
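
If words are mapped to dense vectors (as later parts of the course do with distributed representations), both goals reduce to simple vector arithmetic. The sketch below uses NumPy with made-up toy vectors purely for illustration; a real system would plug in pretrained embeddings such as word2vec or GloVe.

```python
import numpy as np

# Toy vectors, purely illustrative -- not real embeddings.
WR = {
    "star":  np.array([0.9, 0.1, 0.3]),
    "sun":   np.array([0.8, 0.2, 0.3]),
    "motel": np.array([0.1, 0.9, 0.4]),
    "hotel": np.array([0.2, 0.8, 0.5]),
}

def cos(u, v):
    """Cosine similarity: close to 1 for similar directions, 0 for orthogonal vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Goal 1: compute word similarity
print(cos(WR["star"], WR["sun"]))    # high  -> similar words
print(cos(WR["star"], WR["motel"]))  # lower -> less similar

# Goal 2: infer word relations as vector offsets, e.g.
# WR["king"] - WR["queen"] + WR["woman"] should land near WR["man"]
# once real pretrained embeddings are used instead of these toy values.
```
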

Now we start to discuss some ways of obtaining word representations.

1. Synonym/Hypernym Representation

One approach is to use a set of related words, such as synonyms and hypernyms, to represent a word, e.g. with WordNet, a resource containing synonym and hypernym sets.
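
For reference, here is a minimal sketch of this kind of lookup via NLTK's WordNet interface (assuming nltk is installed and the WordNet data has been downloaded); the query words "good" and "panda" are just examples.

```python
import nltk
nltk.download("wordnet", quiet=True)  # fetch the WordNet data if missing
from nltk.corpus import wordnet as wn

# Synonyms: words in the same synset are treated as synonyms of each other.
for synset in wn.synsets("good"):
    print(synset.name(), synset.lemma_names())

# Hypernyms: more general concepts of a given word sense.
panda = wn.synsets("panda")[0]
print(panda.hypernyms())
```
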

However, this approach has several problems:

  • Missing nuance

    ("proficient", "good") are synonyms only in some contexts

  • Missing new meanings of words

    Apple (fruit → IT company)

  • Subjective

  • Data sparsity

  • Requires human labor to create and adapt

2. One-Hot Representation

Regard words as discrete symbols.

  • Vector dimension = # words in vocabulary
  • A token with a greater ID value is not more or less important than a token with a smaller ID value.

The problem is that all the vectors are orthogonal, so there is no natural notion of similarity for one-hot vectors: for any two different words \(w_i\) and \(w_j\),

\[
\mathrm{WR}(w_i)^{\top}\,\mathrm{WR}(w_j) = 0 .
\]
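
A minimal sketch of one-hot vectors, using a tiny made-up vocabulary, shows why the dot product between any two different words is always zero:

```python
import numpy as np

vocab = ["the", "hotel", "motel", "star", "sun"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Vector dimension equals the vocabulary size; only the word's own index is 1.
    v = np.zeros(len(vocab))
    v[word2id[word]] = 1.0
    return v

hotel, motel = one_hot("hotel"), one_hot("motel")
print(hotel)                 # [0. 1. 0. 0. 0.]
print(motel)                 # [0. 0. 1. 0. 0.]
print(np.dot(hotel, motel))  # 0.0 -> orthogonal, no notion of similarity
```
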