Neural collaborative filtering — A primer

Neural collaborative filtering — A primer

Product recommendation is a problem faced by many companies who have a lot of data on user behavior and want to turn into actionable insights. Chief among these problems is trying to guess which movie, article, or video a user might like by comparing them to similar users. Before a neural solution was developed, companies mainly relied on NMF (Non-Negative Matrix Factorization) to uncover user preferences.

Here’s a simple overview of NMF if we imagine a table of users’ movie ratings like so:

Using collaborative filtering algorithms like Non-Negative Matrix Factorization, the unknowns would be filled in by creating two matrices whose matrix product would produce the closest match to the values we observe in the table above. The resulting matrices would also contain useful information on users and movies.

The major disadvantage of NMF is that we can’t pass unknown values. Typically, the unknown values are replaced with zeros before running the NMF algorithm, which results in un-watched movies having a lower predicted score. Another disadvantage is that if we have other information about the users and movies (i.e. where the users live or the year the movie came out), we can’t include these variables with NMF. There is also the matter of time; NMF can take a very long time to run and does not scale well.

A Neural Collaborative Filter using embeddings for movies and users is a clever solution that addresses each of NMF’s shortcomings. Here’s a basic overview of the neural net:

With this configuration, we don’t need to fill any missing values with zeros; we only train on the data we have. Also, we can add info about the user like their age and additional info about the movies like the genre. Finally, thanks to modern optimization algorithms like Adam, convergence-to-error similar to that of NMF happens much more quickly on large datasets depending on the hyper-parameters you choose.

The embeddings are initialized randomly and update during training. After convergence, each user, movie, and genre embedding can be fed into the neural net to produce a predicted rating for a movie which the user has never seen. These trained embedding matrices can also be used as input to other machine learning algorithms like a random forest regressor or XGBoost.

These cutting-edge techniques allow for better results and provide an edge over competitors using more restrictive algorithms.