An Introduction to the Maximum Entropy Model

Original post: http://www.cnblogs.com/JVKing/articles/2744660.html

Leon-Garcia's *Probability, Statistics, and Random Processes for Electrical Engineering* gives a very clear account of the Maximum Entropy Model. Below is a summary based on my own understanding:

1. Entropy of a random variable

$X$ is a discrete random variable with possible values $S_X = \{1, 2, \ldots, K\}$ and probability mass function $p_k = P[X = k]$. The uncertainty of the event $I_k = \{X = k\}$ can be characterized as $I(X = k) = -\ln P[X = k]$, which is low if the probability is close to one and high if the probability is small.

The entropy of a random variable $X$ is defined as the expected value of the uncertainty of its outcomes: $H_X = E[I(X)] = -\sum\limits_{k = 1}^{K} P[X = k] \ln P[X = k]$
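As a quick check of this definition, here is a minimal Python sketch (the function name and example pmfs are illustrative, not from the original text) that computes $H_X$ in nats:

```python
import math

def entropy(pmf):
    """Entropy H_X = -sum_k p_k ln p_k of a discrete pmf (in nats)."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

# A uniform pmf over four outcomes has maximal entropy ln(4);
# a sharply peaked pmf has much lower uncertainty.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # ln 4 ≈ 1.386
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ≈ 0.168
```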

Let $p = (p_1, p_2, \ldots, p_K)$ and $q = (q_1, q_2, \ldots, q_K)$ be two probability mass functions. The relative entropy of $q$ with respect to $p$ is defined by: $H(p; q) = \sum\limits_{k = 1}^{K} p_k \ln\frac{1}{q_k} - H_X = \sum\limits_{k = 1}^{K} p_k \ln\frac{p_k}{q_k}$, where $H_X$ here is the entropy of $p$.
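A matching sketch for the relative entropy, which is nonnegative and zero exactly when $p = q$ (again, names and pmfs below are illustrative):

```python
import math

def relative_entropy(p, q):
    """H(p; q) = sum_k p_k ln(p_k / q_k), i.e. the KL divergence D(p || q).

    Assumes q_k > 0 wherever p_k > 0; terms with p_k = 0 contribute nothing.
    """
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(relative_entropy(p, q))  # ≈ 0.0589 > 0, since p != q
print(relative_entropy(p, p))  # 0.0: zero exactly when the pmfs agree
```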

 

The more I think about the MaxEnt model, the more remarkable it seems. It is a form of multivariate logistic regression (using the sigmoid function as the classifier), and the relative entropy above has exactly the same form as the Kullback-Leibler divergence (they are in fact the same quantity). From the information-theoretic point of view it is also related to mutual information (maximum mutual information). This is well worth thinking through: most books and tutorials explain only one facet of MaxEnt, and I have not yet seen an article that organizes the whole picture systematically. How the MaxEnt model obtained from information theory gets applied in NLP is another question worth pondering. Since the MaxEnt assumptions constrain the expectations of functions of the random variable, the statistical implementation estimates those expectations simply by counting, which is a wonderfully elegant combination.
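To make that last point concrete, here is a minimal sketch under assumed toy data and indicator features (every name and number here is hypothetical): the empirical expectations $E_{\tilde{p}}[f]$ are obtained by counting, the model $p(y|x)$ is a softmax over weighted features, and gradient ascent on the log-likelihood drives the model expectations toward the counted ones.

```python
import math

# Toy dataset of (context, label) pairs; everything here is hypothetical.
data = [("sunny", "go_out"), ("sunny", "go_out"), ("rainy", "stay_in"),
        ("rainy", "stay_in"), ("rainy", "go_out")]
labels = ["go_out", "stay_in"]

def features(x, y):
    """Binary indicator features f(x, y); here, one feature per (x, y) pair."""
    return {(x, y): 1.0}

# Empirical expectation E_p~[f], obtained purely by counting, as noted above.
emp = {}
for x, y in data:
    for f, v in features(x, y).items():
        emp[f] = emp.get(f, 0.0) + v / len(data)

weights = {}

def prob(y, x):
    """Model p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x): a softmax classifier."""
    scores = {yy: sum(weights.get(f, 0.0) * v
                      for f, v in features(x, yy).items()) for yy in labels}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[y]) / z

# Gradient ascent on the log-likelihood: the gradient for each weight is
# exactly E_p~[f] - E_p[f], so training matches model expectations to counts.
for _ in range(500):
    model = {}
    for x, _ in data:
        for yy in labels:
            for f, v in features(x, yy).items():
                model[f] = model.get(f, 0.0) + prob(yy, x) * v / len(data)
    for f in set(emp) | set(model):
        weights[f] = weights.get(f, 0.0) + 0.1 * (emp.get(f, 0.0) - model.get(f, 0.0))

print(prob("go_out", "sunny"))  # → close to 1.0 (all sunny examples went out)
print(prob("go_out", "rainy"))  # → about 1/3 (one of three rainy examples)
```

With pure indicator features, the expectation constraints reduce to matching the conditional label counts per context, which is why the fitted probabilities approach the simple fractions 1.0 and 1/3 above.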
[to be continued] 

