4. Matrix Factorization

The reference article emphasizes matrix factorization variants. These methods remain foundational for practicing data scientists.

4.1 PMF / latent factors (explicit feedback)

Model:

\hat{r}_{ui} = p_u^\top q_i

where p_u, q_i \in \mathbb{R}^f are the user and item embeddings.

Regularized loss over observed pairs Ω:

\min_{P,Q} \sum_{(u,i) \in \Omega} \left( r_{ui} - p_u^\top q_i \right)^2 + \lambda \left( \|p_u\|_2^2 + \|q_i\|_2^2 \right)

Optimization:

  • SGD (simple, flexible)
  • ALS (efficient for large sparse systems)
  • Practical implementations are available in the Surprise library

With ALS, you alternate between solving for user factors while holding item factors fixed and solving for item factors while holding user factors fixed. That makes large sparse factorization problems easier to optimize in practice.
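The alternating scheme can be sketched in NumPy. This is a minimal dense-matrix illustration under stated assumptions (the function name, data layout, and regularization handling are illustrative, not a production implementation): each half-step reduces to a small ridge-regression normal equation per row.

```python
import numpy as np

def als_solve_side(R, mask, P, Q, lam):
    """Solve one side of ALS: update each row of P with Q held fixed.

    R: rating matrix, mask: 1 where a rating is observed,
    P: factors being updated, Q: factors held fixed, lam: L2 strength.
    """
    f = Q.shape[1]
    for u in range(R.shape[0]):
        obs = mask[u] == 1                      # entries observed for this row
        Qo = Q[obs]                             # (n_obs, f) fixed factors
        A = Qo.T @ Qo + lam * np.eye(f)         # (f, f) normal-equation matrix
        P[u] = np.linalg.solve(A, Qo.T @ R[u, obs])
    return P
```

Calling the same function on the transposed matrix (with the roles of P and Q swapped) gives the other half-step, so a full ALS iteration is two calls.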

[Figure: Illustration of the matrix factorization model. Image credit: Dive into Deep Learning, CC BY-SA 4.0.]

[Figure: Alternating least squares optimization cycle.]

4.2 SVD-style bias terms

A common extension adds global/user/item bias terms:

\hat{r}_{ui} = \mu + b_u + b_i + p_u^\top q_i

Biases capture broad effects (strict users, broadly popular items) and usually improve quality.
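A minimal sketch of one SGD step for this biased model (the learning rate, regularization strength, and update form are illustrative defaults, not taken from the source):

```python
import numpy as np

def sgd_update(mu, b_u, b_i, p_u, q_i, r, lr=0.01, lam=0.1):
    """One SGD step for biased MF on a single observed rating r.

    Returns updated copies of (b_u, b_i, p_u, q_i); mu is held fixed.
    """
    pred = mu + b_u + b_i + p_u @ q_i
    e = r - pred                                  # prediction error
    b_u_new = b_u + lr * (e - lam * b_u)          # bias updates
    b_i_new = b_i + lr * (e - lam * b_i)
    p_u_new = p_u + lr * (e * q_i - lam * p_u)    # factor updates
    q_i_new = q_i + lr * (e * p_u - lam * q_i)
    return b_u_new, b_i_new, p_u_new, q_i_new
```

Note that the factor updates use the *old* values of p_u and q_i on both sides, which is the standard simultaneous-update convention.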

4.3 Implicit-feedback factorization

Following the article’s logic, implicit events are treated as preference plus confidence.

One common setup:

  • Preference: p_{ui} \in \{0, 1\} from interaction presence
  • Confidence: c_{ui} = 1 + \alpha t_{ui}, where t_{ui} is interaction strength

Objective:

\min_{X,Y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2 + \lambda \left( \|x_u\|_2^2 + \|y_i\|_2^2 \right)

This is the core weighted-implicit matrix factorization approach used in large-scale recommenders.
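Under this objective, solving for a single user's factors with item factors fixed is a weighted ridge regression. A sketch in NumPy (the function name and the default alpha are assumptions for illustration):

```python
import numpy as np

def implicit_user_solve(t_u, Y, alpha=40.0, lam=0.1):
    """Solve for one user's factors x_u under the confidence-weighted objective.

    t_u: interaction strengths over all items (0 = no interaction).
    Y: (n_items, f) fixed item factors.
    """
    p = (t_u > 0).astype(float)        # binary preference p_ui
    c = 1.0 + alpha * t_u              # confidence c_ui = 1 + alpha * t_ui
    f = Y.shape[1]
    A = (Y * c[:, None]).T @ Y + lam * np.eye(f)   # Y^T C Y + lam I
    b = (Y * c[:, None]).T @ p                     # Y^T C p
    return np.linalg.solve(A, b)
```

Every item contributes to the normal equations, but interacted items dominate through their larger confidence weights, pulling their scores toward 1 while the rest are pushed toward 0.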

The Google course adds an important weighted-matrix-factorization view that is especially useful in industrial retrieval systems. Let A be the feedback matrix and let obs denote observed interactions. A common weighted objective is:

\min_{U,V} \sum_{(u,i) \in \text{obs}} \left( A_{ui} - \langle U_u, V_i \rangle \right)^2 + w_0 \sum_{(u,i) \notin \text{obs}} \langle U_u, V_i \rangle^2

Here w0 controls how strongly the model treats unobserved pairs as weak negatives. In practice, this matters a lot: too little weight on unobserved pairs can make the embedding space collapse, while too much weight can wash out true positives. Google also notes that frequent users or popular items can dominate the objective, so observed pairs are often reweighted by user or item frequency.
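The two-term objective is easy to compute directly. A small sketch (function name and the dense mask representation are illustrative):

```python
import numpy as np

def weighted_mf_loss(A, obs_mask, U, V, w0):
    """Weighted MF objective: squared error on observed pairs plus
    w0-weighted squared scores on unobserved pairs (weak negatives)."""
    S = U @ V.T                                    # all dot products <U_u, V_i>
    obs_term = np.sum(obs_mask * (A - S) ** 2)     # observed reconstruction error
    unobs_term = w0 * np.sum((1 - obs_mask) * S ** 2)  # pull unobserved toward 0
    return obs_term + unobs_term
```

Setting w0 = 0 recovers the observed-only objective, which is exactly the regime where the embedding space can collapse.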

4.4 Evaluation for rating prediction

For explicit-feedback recommendation, D2L’s matrix factorization section uses RMSE as the primary evaluation measure:

\mathrm{RMSE} = \sqrt{ \frac{1}{|T|} \sum_{(u,i) \in T} \left( r_{ui} - \hat{r}_{ui} \right)^2 }

where T is the evaluation set of observed user-item pairs.

RMSE is appropriate for rating prediction, but it is not sufficient for top-n recommendation because it does not evaluate rank order.
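The metric itself is a one-liner over the evaluation pairs:

```python
import numpy as np

def rmse(ratings, preds):
    """RMSE over an evaluation set of observed (user, item) pairs."""
    ratings = np.asarray(ratings, dtype=float)
    preds = np.asarray(preds, dtype=float)
    return np.sqrt(np.mean((ratings - preds) ** 2))
```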

4.5 AutoRec for nonlinear rating prediction

AutoRec extends collaborative filtering with an autoencoder-style reconstruction objective.

  • Input is a partially observed user vector or item vector from the rating matrix
  • The network reconstructs missing entries through a hidden representation
  • Only observed ratings should contribute to the training loss

For item-based AutoRec, D2L writes the input as the i-th column R_{*i} of the rating matrix and reconstructs it with a nonlinear network:

h(R_{*i}) = f\left( W \cdot g\left( V R_{*i} + \mu \right) + b \right)

The learning objective minimizes reconstruction error over observed entries only:

\min_{W,V,\mu,b} \sum_{i=1}^{M} \left\| R_{*i} - h(R_{*i}) \right\|_{\mathcal{O}}^2 + \lambda \left( \|W\|_F^2 + \|V\|_F^2 \right)

Conceptually, AutoRec matters because it is one of the earliest examples in D2L of moving from linear collaborative filtering to nonlinear neural reconstruction for rating prediction.
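A minimal sketch of the forward pass and the observed-only loss, assuming sigmoid for g and identity for f (one common choice; the function names are illustrative):

```python
import numpy as np

def autorec_forward(R_col, V, mu, W, b):
    """Item-based AutoRec reconstruction h(R_i) = f(W g(V R_i + mu) + b),
    with g = sigmoid and f = identity."""
    g = 1.0 / (1.0 + np.exp(-(V @ R_col + mu)))   # hidden representation
    return W @ g + b                               # reconstructed column

def masked_loss(R_col, obs_mask, recon):
    """Reconstruction error over observed entries only."""
    return np.sum(obs_mask * (R_col - recon) ** 2)
```

Masking matters: without it, the model would be trained to reproduce the placeholder value used for missing ratings.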

4.6 Personalized ranking objectives

D2L makes an important distinction between rating prediction objectives and ranking objectives.

  • Pointwise objectives model one user-item interaction at a time
  • Pairwise objectives model relative preference between a positive and a negative item
  • Listwise objectives optimize properties of an entire ranked list
Pointwise
  • Training signal: one labeled user-item example at a time
  • Pros: simple to implement; works with standard regression or classification losses; easy to calibrate as a score or probability
  • Cons: does not optimize ordering directly; sensitive to label noise and exposure bias; can overfocus on absolute score accuracy
  • Typical use: CTR prediction, rating prediction, coarse ranking baselines

Pairwise
  • Training signal: positive item compared against a sampled negative item
  • Pros: better aligned with top-n ranking; efficient for implicit feedback; usually easier to train than full listwise methods
  • Cons: quality depends heavily on negative sampling; does not model full-list effects; can miss business constraints beyond pair comparisons
  • Typical use: candidate generation, implicit-feedback retrieval, pre-ranking

Listwise
  • Training signal: entire ranked list or slate
  • Pros: best conceptual match to ranking metrics such as NDCG; can optimize position effects and whole-list quality
  • Cons: more complex objectives; heavier computation; harder data construction and serving alignment
  • Typical use: final-stage ranking, search ranking, slate optimization

For top-n recommendation from implicit feedback, pairwise objectives are often a better match to the task.

The two core D2L losses are:

  1. Bayesian Personalized Ranking (BPR), which encourages the positive item to score above a sampled negative item:

\sum_{(u,i,j) \in D} \ln \sigma\left( \hat{y}_{ui} - \hat{y}_{uj} \right) - \lambda_\Theta \|\Theta\|^2

  2. Hinge ranking loss, which pushes the positive item away from the negative item by a margin m:

\sum_{(u,i,j) \in D} \max\left( m - \hat{y}_{ui} + \hat{y}_{uj},\ 0 \right)

These are central for implicit-feedback recommendation because they optimize relative ordering rather than absolute score accuracy.
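Both losses are short to write down. This sketch uses the minimized form of BPR (negative log-sigmoid of the score difference) and omits the regularizer; the function names are illustrative:

```python
import numpy as np

def bpr_loss(y_ui, y_uj):
    """Minimized BPR term: -ln sigma(y_ui - y_uj), averaged over triples.
    Regularization on the parameters is omitted here."""
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-(y_ui - y_uj)))))

def hinge_loss(y_ui, y_uj, m=1.0):
    """Hinge ranking loss with margin m, averaged over triples."""
    return np.mean(np.maximum(m - y_ui + y_uj, 0.0))
```

Both depend only on score *differences*, which is exactly why they target ordering rather than absolute score accuracy.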

4.7 SVD++ intuition

SVD++ augments user representation with signals from interacted items, helping when explicit feedback is sparse but interaction history exists.
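The idea can be sketched as a prediction function (names and shapes are illustrative; the implicit item factors y_j are separate learned parameters, one row per item in the user's interaction history N(u)):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, Y_Nu):
    """SVD++-style prediction: the user factor p_u is augmented with the
    |N(u)|^{-1/2}-normalized sum of implicit item factors y_j over the
    items the user has interacted with (rows of Y_Nu)."""
    implicit = Y_Nu.sum(axis=0) / np.sqrt(len(Y_Nu))
    return mu + b_u + b_i + q_i @ (p_u + implicit)
```

Because the implicit sum is nonzero whenever the user has any interaction history, SVD++ can personalize even for users with few or no explicit ratings.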
