Non-negative matrix factorisation
Non-negative Matrix Factorisation, shortened as NMF, is a technique, often employed in recommender systems and in computer vision, to create an (approximate) factorisation of a big matrix into two sensibly smaller matrices in a way that elements are non-negative.
The explanation here is inspired by the one on Wikipedia.

What?

Let's say you have a large
m×nm \times n
matrix
VV
, with NMF you are able to factorise it into the product of matrices
WW
(
m×pm \times p
) and
HH
(
p×np \times n
), with the property that
pm,np \ll m, n
and
pp
playing the role of the hidden features in the data. The typical use case is that of a big sparse matrix factorised into two smaller dense matrices, which are much easier to deal with for computations.
Often, an approximate factorisation is desired, so that in reality
VWH .V \approx W H \ .
In fact,
pp
is chosen precisely in such a way that
WHWH
is a reasonable approximation of
VV
. The real full decomposition is
V=WH+U ,V = WH + U \ ,
where
UU
is the residuals matrix, whose elements can be either positive or negative.
The usual way to proceed in finding
WW
and
HH
is to use gradient descent (see page) by looking at minimising the difference between
VV
and
WHWH
with the constraint of non-negativity.

Some examples

Let's say we are working with text and doing some NLP on documents. We have a term-document matrix (see the page linked)
VV
,
102×10510^2 \times 10^5
and we would like to extract 10 features in order to generate a features matrix
WW
with
10410^4
rows and 10 columns and a coefficient matrix
HH
with 10 rows and
10210^2
columns. The product of
WHWH
would then have
10410^4
rows and
10210^2
columns, working as a reasonable approximation of the original
VV
.
Other typical applications are those where you have the matrix of users' ratings on items (products, movies, ...) and you'd want to discover some
pp
latent features.
WW
will give you the association of users to these hidden features and
HH
the association of the items to these hidden features. Once obtained the factorisation, you can use it to predict the rating a given user
uiu_i
would give to an item
djd_j
, as
Vij=pwipThpjV_{ij} = \sum_p w_{ip}^T h_{pj}

References

  1. 1.
    Wikipedia on the topic
Last modified 7mo ago