Notes on Probabilistic Latent Semantic Analysis (PLSA)

Reposted from: http://www.hongliangjie.com/2010/01/04/notes-on-probabilistic-latent-semantic-analysis-plsa/

I highly recommend reading the more detailed version at http://arxiv.org/abs/1212.3900

Formulation of PLSA

There are two ways to formulate PLSA. They are equivalent but may lead to different inference processes.

  1. $P(d, w) = P(d) \sum_{z} P(w \mid z) P(z \mid d)$
  2. $P(d, w) = \sum_{z} P(z) P(w \mid z) P(d \mid z)$

Let’s see why these two equations are equivalent by using Bayes rule.

$P(d, w) = \sum_{z} P(z) P(w \mid z) P(d \mid z) = \sum_{z} P(w \mid z) P(z, d) = \sum_{z} P(w \mid z) P(z \mid d) P(d) = P(d) \sum_{z} P(w \mid z) P(z \mid d)$
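To make the equivalence concrete, here is a minimal numerical check (a sketch using NumPy; the random parameters, sizes, and variable names are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 4, 6, 3

# Random parameters for Formulation 2: P(z), P(w|z), P(d|z).
p_z = rng.dirichlet(np.ones(n_topics))                    # shape (Z,)
p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics)   # shape (Z, W)
p_d_given_z = rng.dirichlet(np.ones(n_docs), n_topics)    # shape (Z, D)

# Formulation 2: P(d, w) = sum_z P(z) P(w|z) P(d|z).
joint_2 = np.einsum('z,zw,zd->dw', p_z, p_w_given_z, p_d_given_z)

# Convert to Formulation 1 via Bayes rule:
# P(d) = sum_z P(z) P(d|z), and P(z|d) = P(z) P(d|z) / P(d).
p_d = p_z @ p_d_given_z                                    # shape (D,)
p_z_given_d = (p_z[:, None] * p_d_given_z) / p_d[None, :]  # shape (Z, D)

# Formulation 1: P(d, w) = P(d) sum_z P(w|z) P(z|d).
joint_1 = p_d[:, None] * np.einsum('zw,zd->dw', p_w_given_z, p_z_given_d)

assert np.allclose(joint_1, joint_2)  # the two factorizations agree
```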

The whole data set is generated as (we assume that all words are generated independently):

$P(D) = \prod_{d} \prod_{w} P(d, w)^{n(d, w)}$

where $n(d, w)$ is the number of times word $w$ occurs in document $d$.

The log-likelihoods of the whole data set for (1) and (2) are:

$\mathcal{L}_1 = \sum_{d} \sum_{w} n(d, w) \log \Big[ P(d) \sum_{z} P(w \mid z) P(z \mid d) \Big]$

$\mathcal{L}_2 = \sum_{d} \sum_{w} n(d, w) \log \Big[ \sum_{z} P(z) P(w \mid z) P(d \mid z) \Big]$
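To make the objective concrete, here is a minimal sketch (NumPy; the function and argument names are my own) that evaluates $\mathcal{L}_1$ and $\mathcal{L}_2$ for given parameter matrices and a document-word count matrix:

```python
import numpy as np

def log_likelihood_1(counts, p_d, p_w_given_z, p_z_given_d):
    """L1 = sum_{d,w} n(d,w) log [ P(d) sum_z P(w|z) P(z|d) ]."""
    # counts: (D, W), p_d: (D,), p_w_given_z: (Z, W), p_z_given_d: (Z, D)
    joint = p_d[:, None] * np.einsum('zw,zd->dw', p_w_given_z, p_z_given_d)
    return np.sum(counts * np.log(joint))

def log_likelihood_2(counts, p_z, p_w_given_z, p_d_given_z):
    """L2 = sum_{d,w} n(d,w) log [ sum_z P(z) P(w|z) P(d|z) ]."""
    joint = np.einsum('z,zw,zd->dw', p_z, p_w_given_z, p_d_given_z)
    return np.sum(counts * np.log(joint))
```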

EM

For $\mathcal{L}_1$ or $\mathcal{L}_2$, the optimization is hard due to the log of a sum. Therefore, an algorithm called Expectation-Maximization (EM) is usually employed. Before we introduce anything about EM, please note that EM is only guaranteed to find a local optimum (although it may turn out to be a global one).

First, let us see how EM works in general. As we have shown for PLSA, we usually want to estimate the likelihood of the data, namely $P(X \mid \theta)$, given the parameters $\theta$. The easiest way is to obtain a maximum likelihood estimator by maximizing $P(X \mid \theta)$. However, sometimes we also want to include some hidden variables $Z$ which are useful for our task. Therefore, what we really want to maximize is $\sum_{Z} P(X, Z \mid \theta)$, the complete likelihood. Now, our attention turns to this complete likelihood. Again, directly maximizing this likelihood is usually difficult. What we would like to do here is obtain a lower bound of the likelihood and maximize this lower bound.

We need Jensen's Inequality to help us obtain this lower bound. For any convex function $f(x)$, Jensen's Inequality states that:

$f\big(\mathbb{E}[x]\big) \leq \mathbb{E}\big[f(x)\big]$

Thus, it is not difficult to show that:

$f\Big(\sum_{i} \lambda_i x_i\Big) \leq \sum_{i} \lambda_i f(x_i), \qquad \lambda_i \geq 0, \; \sum_{i} \lambda_i = 1$

and for concave functions (like the logarithm), the inequality is reversed:

$f\Big(\sum_{i} \lambda_i x_i\Big) \geq \sum_{i} \lambda_i f(x_i)$
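As a quick sanity check (a toy sketch; the points and weights are arbitrary numbers of my own), the concave version for the logarithm can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 5.0, size=10)   # arbitrary positive points
lam = rng.dirichlet(np.ones(10))     # arbitrary weights summing to 1

lhs = np.log(np.dot(lam, x))         # log of the weighted average
rhs = np.dot(lam, np.log(x))         # weighted average of the logs
assert lhs >= rhs                    # Jensen's inequality for a concave function
```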

Back to our complete likelihood: by introducing an arbitrary distribution $q(Z)$ over the hidden variables and using the concave version of Jensen's Inequality, we obtain:

$\log \sum_{Z} P(X, Z \mid \theta) = \log \sum_{Z} q(Z) \frac{P(X, Z \mid \theta)}{q(Z)} \geq \sum_{Z} q(Z) \log \frac{P(X, Z \mid \theta)}{q(Z)} = \sum_{Z} q(Z) \log P(X, Z \mid \theta) - \sum_{Z} q(Z) \log q(Z)$

Therefore, we have obtained a lower bound of the complete likelihood, and we want to make it as tight as possible. EM is an algorithm that maximizes this lower bound in an iterative fashion. Usually, EM first fixes the current value of $\theta$ and maximizes the bound with respect to $q(Z)$, and then uses this new $q(Z)$ to obtain a new guess of $\theta$, which is essentially a two-stage maximization process. The first step can be shown as follows:

$\sum_{Z} q(Z) \log \frac{P(X, Z \mid \theta)}{q(Z)}$
$= \sum_{Z} q(Z) \log \frac{P(Z \mid X, \theta) P(X \mid \theta)}{q(Z)}$
$= \sum_{Z} q(Z) \log P(X \mid \theta) + \sum_{Z} q(Z) \log \frac{P(Z \mid X, \theta)}{q(Z)}$
$= \log P(X \mid \theta) - \sum_{Z} q(Z) \log \frac{q(Z)}{P(Z \mid X, \theta)}$
$= \log P(X \mid \theta) - D_{\mathrm{KL}}\big( q(Z) \,\|\, P(Z \mid X, \theta) \big)$

The first term is the same for all choices of $q(Z)$. Therefore, in order to maximize the whole expression, we need to minimize the KL divergence between $q(Z)$ and $P(Z \mid X, \theta)$, which eventually leads to the optimum solution $q(Z) = P(Z \mid X, \theta)$. So, in the E-step, we use the current guess of $\theta$ to calculate the posterior distribution of the hidden variables and take it as the new $q(Z)$. The M-step is problem-dependent; we will see how to do it in the discussions below.
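As a toy check of the identity derived above (a sketch with an arbitrary two-state hidden variable; all numbers and names are mine):

```python
import numpy as np

# A toy joint P(X = x0, Z) over a two-state hidden variable Z, and an
# arbitrary distribution q(Z); the parameter theta is implicit in the numbers.
p_xz = np.array([0.12, 0.28])        # P(x0, z) for z = 0, 1
p_x = p_xz.sum()                     # P(x0)
p_z_given_x = p_xz / p_x             # P(z | x0)
q = np.array([0.6, 0.4])             # any q(Z)

lower_bound = np.sum(q * np.log(p_xz / q))          # E_q[ log P(X, Z) / q(Z) ]
kl = np.sum(q * np.log(q / p_z_given_x))            # KL( q || P(Z|X) )
assert np.isclose(lower_bound, np.log(p_x) - kl)    # the identity holds
assert lower_bound <= np.log(p_x)                   # and it is indeed a lower bound
```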

Another explanation of EM is in terms of optimizing a so-called Q-function. We devise the data generation process as $P(X \mid \theta) = \sum_{Z} P(X, Z \mid \theta)$. Therefore, the complete log-likelihood is:

$\mathcal{L}_{c}(\theta) = \log P(X, Z \mid \theta)$

Think about how to maximize $\log P(X \mid \theta)$. Instead of directly maximizing it, we can iteratively maximize the improvement over the current estimate, $\log P(X \mid \theta) - \log P(X \mid \theta^{(n)})$, starting from the identities:

$\log P(X \mid \theta) = \log P(X, Z \mid \theta) - \log P(Z \mid X, \theta)$
$\log P(X \mid \theta^{(n)}) = \log P(X, Z \mid \theta^{(n)}) - \log P(Z \mid X, \theta^{(n)})$

Now take the expectation of these equations with respect to $P(Z \mid X, \theta^{(n)})$ and subtract; we have:

$\log P(X \mid \theta) - \log P(X \mid \theta^{(n)}) = \sum_{Z} P(Z \mid X, \theta^{(n)}) \log P(X, Z \mid \theta) - \sum_{Z} P(Z \mid X, \theta^{(n)}) \log P(X, Z \mid \theta^{(n)}) + \sum_{Z} P(Z \mid X, \theta^{(n)}) \log \frac{P(Z \mid X, \theta^{(n)})}{P(Z \mid X, \theta)}$

The last term is always non-negative since it can be recognized as the KL divergence between $P(Z \mid X, \theta^{(n)})$ and $P(Z \mid X, \theta)$. Therefore, we obtain a lower bound on the likelihood:

$\log P(X \mid \theta) \geq \sum_{Z} P(Z \mid X, \theta^{(n)}) \log P(X, Z \mid \theta) - \sum_{Z} P(Z \mid X, \theta^{(n)}) \log P(X, Z \mid \theta^{(n)}) + \log P(X \mid \theta^{(n)})$

The last two terms can be treated as constants as they do not contain the variable $\theta$, so the lower bound is essentially the first term, which is also sometimes called the "Q-function":

$Q(\theta; \theta^{(n)}) = \sum_{Z} P(Z \mid X, \theta^{(n)}) \log P(X, Z \mid \theta)$

EM of Formulation 1

In the case of Formulation 1, let us introduce hidden variables $R(z; d, w) \in \{0, 1\}$ to indicate whether hidden topic $z$ is selected to generate $w$ in $d$ (so that $\sum_{z} R(z; d, w) = 1$). Therefore, the complete likelihood can be formulated as:

$\mathcal{L}_{c1} = \sum_{d} \sum_{w} n(d, w) \sum_{z} R(z; d, w) \log \big[ P(d) P(w \mid z) P(z \mid d) \big]$
$= \sum_{d} \sum_{w} n(d, w) \sum_{z} R(z; d, w) \big[ \log P(d) + \log P(w \mid z) + \log P(z \mid d) \big]$

From the equation above, we can write the Q-function for the complete likelihood, $E[\mathcal{L}_{c1}]$, by replacing each indicator $R(z; d, w)$ with its posterior expectation $P(z \mid d, w)$:

$Q_1 = \sum_{d} \sum_{w} n(d, w) \sum_{z} P(z \mid d, w) \big[ \log P(d) + \log P(w \mid z) + \log P(z \mid d) \big]$

For the E-step, simply using Bayes rule, we can obtain:

$P(z \mid d, w) = \frac{P(d) P(w \mid z) P(z \mid d)}{\sum_{z'} P(d) P(w \mid z') P(z' \mid d)} = \frac{P(w \mid z) P(z \mid d)}{\sum_{z'} P(w \mid z') P(z' \mid d)}$
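A minimal E-step sketch (NumPy; the names are mine, with topics along the first axis, documents along the second, and words along the third):

```python
import numpy as np

def e_step_formulation_1(p_w_given_z, p_z_given_d):
    """Posterior P(z|d,w) = P(w|z) P(z|d) / sum_z' P(w|z') P(z'|d).

    p_w_given_z: (Z, W), p_z_given_d: (Z, D) -> returns (Z, D, W).
    """
    numer = p_z_given_d[:, :, None] * p_w_given_z[:, None, :]  # (Z, D, W)
    return numer / numer.sum(axis=0, keepdims=True)
```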

For the M-step, we need to maximize the Q-function subject to the normalization constraints, which we incorporate with Lagrange multipliers:

$\sum_{w} P(w \mid z) = 1, \qquad \sum_{z} P(z \mid d) = 1$

$H_1 = Q_1 + \sum_{z} \alpha_z \Big( 1 - \sum_{w} P(w \mid z) \Big) + \sum_{d} \beta_d \Big( 1 - \sum_{z} P(z \mid d) \Big)$

and take all derivatives:

$\frac{\partial H_1}{\partial P(w \mid z)} = \frac{\sum_{d} n(d, w) P(z \mid d, w)}{P(w \mid z)} - \alpha_z = 0$

$\frac{\partial H_1}{\partial P(z \mid d)} = \frac{\sum_{w} n(d, w) P(z \mid d, w)}{P(z \mid d)} - \beta_d = 0$

Summing the first equation over $w$, the second over $z$, and using the constraints gives $\alpha_z = \sum_{d} \sum_{w} n(d, w) P(z \mid d, w)$ and $\beta_d = \sum_{w} n(d, w)$.

Therefore, we can easily obtain:

$P(w \mid z) = \frac{\sum_{d} n(d, w) P(z \mid d, w)}{\sum_{w'} \sum_{d} n(d, w') P(z \mid d, w')}$

$P(z \mid d) = \frac{\sum_{w} n(d, w) P(z \mid d, w)}{\sum_{w} n(d, w)}$
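Putting the two steps together, here is a minimal EM sketch for Formulation 1 (NumPy; `counts` is the document-word matrix $n(d, w)$, and the random initialization and iteration count are arbitrary choices of mine):

```python
import numpy as np

def plsa_em_formulation_1(counts, n_topics, n_iters=100, seed=0):
    """EM for Formulation 1: P(d,w) = P(d) sum_z P(w|z) P(z|d).

    Assumes every document contains at least one word (no zero rows in counts).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics)    # (Z, W)
    p_z_given_d = rng.dirichlet(np.ones(n_topics), n_docs).T   # (Z, D)

    for _ in range(n_iters):
        # E-step: posterior P(z|d,w), shape (Z, D, W).
        numer = p_z_given_d[:, :, None] * p_w_given_z[:, None, :]
        post = numer / numer.sum(axis=0, keepdims=True)

        # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
        expected = counts[None, :, :] * post                    # n(d,w) P(z|d,w)
        p_w_given_z = expected.sum(axis=1)                      # sum over d -> (Z, W)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
        p_z_given_d = expected.sum(axis=2)                      # sum over w -> (Z, D)
        p_z_given_d /= counts.sum(axis=1)[None, :]              # divide by n(d)

    p_d = counts.sum(axis=1) / counts.sum()                     # MLE of P(d)
    return p_d, p_w_given_z, p_z_given_d
```

For example, on a toy corpus `counts = np.array([[2, 0, 1], [0, 3, 1]])` with `n_topics=2`, the rows of the returned `p_w_given_z` are the fitted topic-word distributions and the columns of `p_z_given_d` are the per-document topic mixtures.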

EM of Formulation 2

Using a similar method, we introduce hidden variables $R(z; d, w)$ to indicate which $z$ is selected to generate both $d$ and $w$, and we have the following complete likelihood:

$\mathcal{L}_{c2} = \sum_{d} \sum_{w} n(d, w) \sum_{z} R(z; d, w) \log \big[ P(z) P(w \mid z) P(d \mid z) \big]$
$= \sum_{d} \sum_{w} n(d, w) \sum_{z} R(z; d, w) \big[ \log P(z) + \log P(w \mid z) + \log P(d \mid z) \big]$

Therefore, the Q-function $E[\mathcal{L}_{c2}]$ would be:

$Q_2 = \sum_{d} \sum_{w} n(d, w) \sum_{z} P(z \mid d, w) \big[ \log P(z) + \log P(w \mid z) + \log P(d \mid z) \big]$

For the E-step, again simply using Bayes rule, we can obtain:

$P(z \mid d, w) = \frac{P(z) P(w \mid z) P(d \mid z)}{\sum_{z'} P(z') P(w \mid z') P(d \mid z')}$
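Again, a minimal E-step sketch for this parameterization (NumPy; names are mine):

```python
import numpy as np

def e_step_formulation_2(p_z, p_w_given_z, p_d_given_z):
    """Posterior P(z|d,w) = P(z) P(w|z) P(d|z) / sum_z' P(z') P(w|z') P(d|z').

    p_z: (Z,), p_w_given_z: (Z, W), p_d_given_z: (Z, D) -> returns (Z, D, W).
    """
    numer = (p_z[:, None, None]
             * p_d_given_z[:, :, None]
             * p_w_given_z[:, None, :])
    return numer / numer.sum(axis=0, keepdims=True)
```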

For the M-step, we maximize the constrained version of the Q-function:

$\sum_{w} P(w \mid z) = 1, \qquad \sum_{d} P(d \mid z) = 1, \qquad \sum_{z} P(z) = 1$

$H_2 = Q_2 + \sum_{z} \alpha_z \Big( 1 - \sum_{w} P(w \mid z) \Big) + \sum_{z} \beta_z \Big( 1 - \sum_{d} P(d \mid z) \Big) + \gamma \Big( 1 - \sum_{z} P(z) \Big)$

and take all derivatives:

$\frac{\partial H_2}{\partial P(w \mid z)} = \frac{\sum_{d} n(d, w) P(z \mid d, w)}{P(w \mid z)} - \alpha_z = 0$

$\frac{\partial H_2}{\partial P(d \mid z)} = \frac{\sum_{w} n(d, w) P(z \mid d, w)}{P(d \mid z)} - \beta_z = 0$

$\frac{\partial H_2}{\partial P(z)} = \frac{\sum_{d} \sum_{w} n(d, w) P(z \mid d, w)}{P(z)} - \gamma = 0$

Applying the constraints gives $\alpha_z = \beta_z = \sum_{d} \sum_{w} n(d, w) P(z \mid d, w)$ and $\gamma = \sum_{d} \sum_{w} n(d, w)$.

Therefore, we can easily obtain:

$P(w \mid z) = \frac{\sum_{d} n(d, w) P(z \mid d, w)}{\sum_{d} \sum_{w'} n(d, w') P(z \mid d, w')}$

$P(d \mid z) = \frac{\sum_{w} n(d, w) P(z \mid d, w)}{\sum_{d'} \sum_{w} n(d', w) P(z \mid d', w)}$

$P(z) = \frac{\sum_{d} \sum_{w} n(d, w) P(z \mid d, w)}{\sum_{d} \sum_{w} n(d, w)}$
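And the corresponding full EM sketch for Formulation 2 (NumPy; `counts`, the initialization, and the iteration count are again my own choices):

```python
import numpy as np

def plsa_em_formulation_2(counts, n_topics, n_iters=100, seed=0):
    """EM for Formulation 2: P(d,w) = sum_z P(z) P(w|z) P(d|z)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z = rng.dirichlet(np.ones(n_topics))                     # (Z,)
    p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics)    # (Z, W)
    p_d_given_z = rng.dirichlet(np.ones(n_docs), n_topics)     # (Z, D)

    for _ in range(n_iters):
        # E-step: posterior P(z|d,w), shape (Z, D, W).
        numer = (p_z[:, None, None]
                 * p_d_given_z[:, :, None]
                 * p_w_given_z[:, None, :])
        post = numer / numer.sum(axis=0, keepdims=True)

        # M-step: expected counts n(d,w) P(z|d,w), shape (Z, D, W).
        expected = counts[None, :, :] * post
        totals = expected.sum(axis=(1, 2))                      # (Z,)
        p_w_given_z = expected.sum(axis=1) / totals[:, None]    # (Z, W)
        p_d_given_z = expected.sum(axis=2) / totals[:, None]    # (Z, D)
        p_z = totals / counts.sum()                             # (Z,)

    return p_z, p_w_given_z, p_d_given_z
```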
