% Example 2.4 illustrates the use of nonnegative matrix factorization
% for information retrieval. First, we load the data and normalize the
% columns.
% Loads the variables: X, termdoc, docs, and words
clc;clear;close all;
% load lsiex   % the lsiex data file is not available, so the data are entered manually
X = [1 0 0 1 0;
     1 0 1 1 1;
     1 0 0 1 0;
     0 0 0 1 0;
     0 1 0 1 1;
     0 0 0 1 0];
docs  = {'D1','D2','D3','D4','D5'};
words = {'bake','recipes','bread','cake','pastry','pie'};
[n,p] = size(X);        % n terms (rows), p documents (columns)
% Normalize columns to be unit norm
termdoc = zeros(n,p);
for i = 1:p
    termdoc(:,i) = X(:,i)/norm(X(:,i));
end
[W,H] = nnmf(termdoc,3,'algorithm','mult');
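% (Optional check, not part of the book's listing.) Because nnmf starts
% from a random initialization, W and H can differ between runs. A rough
% way to judge the rank-3 fit is the relative Frobenius reconstruction error:
relerr = norm(termdoc - W*H,'fro')/norm(termdoc,'fro');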
% Recall that we had the following queries and are looking for documents that
% match them.
q1 = [1 0 1 0 0 0];
q2 = [1 0 0 0 0 0];
% Now we compute the rank k approximation to our term-document matrix
% that we obtained using nonnegative matrix factorization.
termdocnmfk = W * H;
% We use the cosine measure and the columns of our approximated
% term-document matrix to find the closest matches to our queries.
for i = 1:p
    m1 = norm(q1);
    m2 = norm(termdocnmfk(:,i));
    cosq1c(i) = (q1 * termdocnmfk(:,i))/(m1*m2);
    m1 = norm(q2);
    m2 = norm(termdocnmfk(:,i));
    cosq2c(i) = (q2 * termdocnmfk(:,i))/(m1*m2);
end
% For this run, our results are
% cosq1c = 0.7449 0.0000 0.0100 0.7185 0.0056
% cosq2c = 0.5268 0.0000 0.0075 0.5080 0.0043
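% (Illustrative addition, not in the book's listing.) One way to turn the
% cosine scores into a ranked list of matching documents is to sort them;
% the docs cell array defined above supplies the labels:
[~, rank1] = sort(cosq1c,'descend');
fprintf('Documents ranked for query 1: %s\n', strjoin(docs(rank1),' '));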
% The X matrix represents five documents (columns), each described by six
% terms (rows). The resulting cosq1c values show that query 1 is most
% closely related to the first and fourth documents.
% Source: Exploratory Data Analysis with MATLAB, 2nd Edition (windy)
% Example 2.4