What is Learning to Rank?
Learning to Rank (LTR) applies machine learning to search relevance ranking. How does relevance ranking differ from other machine learning problems? Regression is one classic machine learning problem. In regression, you’re attempting to predict a variable (such as a stock price) as a function of known information (such as number of company employees, the company’s revenue, etc). In these cases, you’re building a function, say f, that can take what’s known (numEmployees, revenue), and have f output an approximate stock price.
排序学习 (LTR) 将机器学习应用于搜索相关性排名。相关性排名与其他机器学习问题有何不同?回归是一种经典的机器学习问题。在回归中,您试图根据已知信息(例如公司员工人数、公司收入等)预测变量(例如股票价格)。在这些情况下,你正在构建一个函数,比如 f,它可以采用已知的(员工数,收入),并让 f 输出一个近似的股票价格。
Classification is another machine learning problem. With classification, our function f, would classify our company into several categories. For example, profitable or not profitable. Or perhaps whether or not the company is evading taxes.
In Learning to Rank, the function f we want to learn does not make a direct prediction. Rather it’s used for ranking documents. We want a function f that comes as close as possible to our user’s sense of the ideal ordering of documents dependent on a query. The value output by f itself has no meaning (it’s not a stock price or a category). It’s more a prediction of a users’ sense of the relative usefulnes of a document given a query.
Here, we’ll briefly walk through the 10,000 meter view of Learning to Rank. For more information, we recommend blog articles How is Search Different From Other Machine Learning Problems? and What is Learning to Rank?.
Judgments: expression of the ideal ordering
Judgment lists, sometimes referred to as “golden sets” grade individual search results for a keyword search. For example, our demo uses TheMovieDB. When users search for “Rambo” we can indicate which movies ought to come back for “Rambo” based on our user’s expectations of search.
For example, we know these movies are very relevant:
- First Blood
- Rambo
We know these sequels are fairly relevant, but not exactly relevant:
- Rambo III
- Rambo First Blood, Part II
Some movies that star Sylvester Stallone are only tangentially relevant:
- Rocky
- Cobra
And of course many movies are not even close:
- Bambi
- First Daughter
Judgment lists apply “grades” to documents for a keyword, this helps establish the ideal ordering for a given keyword. For example, if we grade documents from 0-4, where 4 is exactly relevant. The above would turn into the judgment list: