4. Matrix Factorization
The reference article emphasizes matrix factorization variants. This remains foundational for data scientists.
4.1 PMF / latent factors (explicit feedback)
Model:

$$\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i,$$

where user and item embeddings $\mathbf{p}_u, \mathbf{q}_i \in \mathbb{R}^k$.

Regularized loss over observed pairs $\mathcal{K}$:

$$\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}} \big(r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i\big)^2 + \lambda \big(\|\mathbf{p}_u\|^2 + \|\mathbf{q}_i\|^2\big)$$
Optimization:
- SGD (simple, flexible)
- ALS (efficient for large sparse systems)
- Practical implementations are available in the Surprise library and its documentation
With ALS, you alternate between solving for user factors while holding item factors fixed and solving for item factors while holding user factors fixed. That makes large sparse factorization problems easier to optimize in practice.
Image credit: Dive into Deep Learning, CC BY-SA 4.0.
4.2 SVD-style bias terms
A common extension adds global/user/item bias terms:

$$\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^\top \mathbf{q}_i$$
Biases capture broad effects (strict users, broadly popular items) and usually improve quality.
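A minimal SGD loop for this biased model might look as follows. The hyperparameters and synthetic data here are illustrative choices of mine, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 30, 25, 4

# Synthetic ratings with genuine bias structure plus small noise
bu_true = rng.normal(0, 0.5, n_users)
bi_true = rng.normal(0, 0.5, n_items)
obs = [(u, i, 3.5 + bu_true[u] + bi_true[i] + rng.normal(0, 0.1))
       for u in range(n_users) for i in range(n_items) if rng.random() < 0.5]

mu = np.mean([r for _, _, r in obs])          # global mean
bu = np.zeros(n_users); bi = np.zeros(n_items)
P = 0.1 * rng.normal(size=(n_users, k))
Q = 0.1 * rng.normal(size=(n_items, k))
lr, lam = 0.02, 0.02

for _ in range(30):
    rng.shuffle(obs)
    for u, i, r in obs:
        e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # prediction error
        bu[u] += lr * (e - lam * bu[u])              # bias updates
        bi[i] += lr * (e - lam * bi[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))

train_rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                              for u, i, r in obs]))
```

Note that the factor updates are computed jointly from the pre-update values, which is the standard simultaneous-update form of the SGD step.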
4.3 Implicit-feedback factorization
Following the article’s logic, implicit events are treated as preference plus confidence.
One common setup:
- Preference: $p_{ui} = 1$ if $r_{ui} > 0$, else $p_{ui} = 0$, from interaction presence
- Confidence: $c_{ui} = 1 + \alpha r_{ui}$, where $r_{ui}$ is interaction strength

Objective:

$$\min_{X,\,Y} \sum_{u,i} c_{ui} \big(p_{ui} - \mathbf{x}_u^\top \mathbf{y}_i\big)^2 + \lambda \Big(\sum_u \|\mathbf{x}_u\|^2 + \sum_i \|\mathbf{y}_i\|^2\Big)$$
This is the core weighted-implicit matrix factorization approach used in large-scale recommenders.
The Google course adds an important weighted-matrix-factorization view that is especially useful in industrial retrieval systems. Let $A$ be the feedback matrix with observed entry set $\mathrm{obs}$:

$$\min_{U,\,V} \sum_{(i,j) \in \mathrm{obs}} \big(A_{ij} - \langle U_i, V_j \rangle\big)^2 + w_0 \sum_{(i,j) \notin \mathrm{obs}} \langle U_i, V_j \rangle^2$$

Here $w_0$ down-weights the unobserved entries, treating them as weak negatives rather than ignoring them, which prevents the trivial all-ones solution.
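The confidence-weighted objective above admits the same alternating closed-form updates as plain ALS. A small numpy sketch in the Hu-Koren style follows; the names (`wals_step`, `alpha`, etc.) and the toy Poisson interaction data are my own assumptions:

```python
import numpy as np

def wals_step(C, Pref, Y, lam):
    """Weighted ALS update for one side.
    C: (m, n) confidence; Pref: (m, n) binary preference; Y: (n, k) fixed
    factors. Solves (Y^T C_u Y + lam I) x_u = Y^T C_u p_u for each row u."""
    k = Y.shape[1]
    YtY = Y.T @ Y                                  # shared across all users
    X = np.zeros((C.shape[0], k))
    for u in range(C.shape[0]):
        Cu = C[u]
        A = YtY + Y.T @ ((Cu - 1)[:, None] * Y) + lam * np.eye(k)
        b = Y.T @ (Cu * Pref[u])
        X[u] = np.linalg.solve(A, b)
    return X

rng = np.random.default_rng(2)
m, n, k, alpha, lam = 15, 12, 3, 10.0, 0.1
R = rng.poisson(0.7, size=(m, n)).astype(float)    # implicit counts
Pref = (R > 0).astype(float)                       # p_ui from presence
C = 1.0 + alpha * R                                # c_ui = 1 + alpha * r_ui

X = rng.normal(size=(m, k)); Y = rng.normal(size=(n, k))
for _ in range(15):
    X = wals_step(C, Pref, Y, lam)
    Y = wals_step(C.T, Pref.T, X, lam)

scores = X @ Y.T
```

Precomputing `YtY` once per sweep is the key trick that keeps the per-user cost proportional to that user's interactions rather than to the full catalog.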
4.4 Evaluation for rating prediction
For explicit-feedback recommendation, D2L’s matrix factorization section uses RMSE as the primary evaluation measure:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \big(r_{ui} - \hat{r}_{ui}\big)^2},$$

where $\mathcal{T}$ is the set of held-out observed (user, item) pairs.
RMSE is appropriate for rating prediction, but it is not sufficient for top-$k$ recommendation, where the ordering of items matters more than absolute score accuracy.
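The metric is a one-liner; a small self-contained version, with hypothetical argument names:

```python
import numpy as np

def rmse(ratings, preds):
    """RMSE over aligned arrays of true and predicted ratings
    for the held-out observed (user, item) pairs."""
    ratings = np.asarray(ratings, dtype=float)
    preds = np.asarray(preds, dtype=float)
    return float(np.sqrt(np.mean((ratings - preds) ** 2)))

score = rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.5])   # errors 0.5, 0.0, 0.5
```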
4.5 AutoRec for nonlinear rating prediction
AutoRec extends collaborative filtering with an autoencoder-style reconstruction objective.
- Input is a partially observed user vector or item vector from the rating matrix
- The network reconstructs missing entries through a hidden representation
- Only observed ratings should contribute to the training loss
For item-based AutoRec, D2L writes the input as the $i$-th column $\mathbf{R}_{*i}$ of the rating matrix, reconstructed as

$$h(\mathbf{R}_{*i}) = f\big(W \cdot g(V \mathbf{R}_{*i} + \mu) + b\big),$$

with encoder weights $V$, decoder weights $W$, and activations $g$, $f$. The learning objective minimizes reconstruction error over observed entries only:

$$\min_{W,\,V,\,\mu,\,b} \sum_{i=1}^{M} \big\|\mathbf{R}_{*i} - h(\mathbf{R}_{*i})\big\|_{\mathcal{O}}^2 + \lambda \big(\|W\|_F^2 + \|V\|_F^2\big),$$

where $\|\cdot\|_{\mathcal{O}}$ restricts the norm to observed ratings.
Conceptually, AutoRec matters because it is one of the earliest examples in D2L of moving from linear collaborative filtering to nonlinear neural reconstruction for rating prediction.
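The forward pass and the observed-only loss can be sketched directly from the equations; this numpy fragment uses sigmoid for $g$ and identity for $f$ (a common I-AutoRec choice), with illustrative shapes and names of my own:

```python
import numpy as np

def autorec_forward(r_col, V, mu, W, b):
    """One AutoRec reconstruction: h(r) = f(W g(V r + mu) + b),
    with g = sigmoid and f = identity."""
    hidden = 1.0 / (1.0 + np.exp(-(V @ r_col + mu)))
    return W @ hidden + b

def masked_loss(r_col, mask, recon):
    """Squared error over observed entries only (mask = 1 where observed)."""
    return float(np.sum(mask * (recon - r_col) ** 2))

rng = np.random.default_rng(3)
d, h = 8, 4                                   # users per column, hidden size
V = 0.1 * rng.normal(size=(h, d)); mu = np.zeros(h)
W = 0.1 * rng.normal(size=(d, h)); b = np.zeros(d)

r_col = np.array([5, 0, 3, 0, 4, 0, 0, 1], dtype=float)  # 0 = unobserved
mask = (r_col > 0).astype(float)
recon = autorec_forward(r_col, V, mu, W, b)
loss = masked_loss(r_col, mask, recon)
```

The mask in the loss is what keeps the unobserved zeros from being treated as real ratings during training.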
4.6 Personalized ranking objectives
D2L makes an important distinction between rating prediction objectives and ranking objectives.
- Pointwise objectives model one user-item interaction at a time
- Pairwise objectives model relative preference between a positive and a negative item
- Listwise objectives optimize properties of an entire ranked list
| Objective family | Training signal | Pros | Cons | Typical use |
|---|---|---|---|---|
| Pointwise | One labeled user-item example at a time | Simple to implement, works with standard regression or classification losses, easy to calibrate as a score or probability | Does not optimize ordering directly, sensitive to label noise and exposure bias, can overfocus on absolute score accuracy | CTR prediction, rating prediction, coarse ranking baselines |
| Pairwise | Positive item compared against a sampled negative item | Better aligned with top-$k$ ranking, trains directly on relative preference, works well with implicit feedback | Quality depends heavily on negative sampling, does not model full-list effects, can miss business constraints beyond pair comparisons | Candidate generation, implicit-feedback retrieval, pre-ranking |
| Listwise | Entire ranked list or slate | Best conceptual match to ranking metrics such as NDCG, can optimize position effects and whole-list quality | More complex objectives, heavier computation, harder data construction and serving alignment | Final-stage ranking, search ranking, slate optimization |
For top-$k$ recommendation from implicit feedback, pairwise objectives are usually the practical starting point, since they optimize relative order rather than absolute scores.
The two core D2L losses are:
- Bayesian Personalized Ranking (BPR), which encourages the positive item $i$ to score above a sampled negative item $j$:

$$\mathcal{L}_{\mathrm{BPR}} = -\sum_{(u,i,j)} \ln \sigma\big(\hat{y}_{ui} - \hat{y}_{uj}\big)$$

- Hinge ranking loss, which pushes the positive item away from the negative item by a margin $m$:

$$\mathcal{L}_{\mathrm{hinge}} = \sum_{(u,i,j)} \max\big(0,\; m - \hat{y}_{ui} + \hat{y}_{uj}\big)$$
These are central for implicit-feedback recommendation because they optimize relative ordering rather than absolute score accuracy.
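Both losses are a few lines of numpy over batches of (positive, negative) score pairs; function names here are illustrative:

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """BPR: -mean log sigmoid(score_pos - score_neg)."""
    diff = np.asarray(pos_scores, float) - np.asarray(neg_scores, float)
    return float(-np.mean(np.log(1.0 / (1.0 + np.exp(-diff)))))

def hinge_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge: mean max(0, margin - (score_pos - score_neg))."""
    diff = np.asarray(pos_scores, float) - np.asarray(neg_scores, float)
    return float(np.mean(np.maximum(0.0, margin - diff)))

# Positives already scored well above negatives -> small losses
l_bpr = bpr_loss([2.0, 1.5], [0.5, 0.0])
l_hinge = hinge_ranking_loss([2.0, 1.5], [0.5, 0.0])
```

Note that both depend only on the score *difference*, which is exactly the "relative ordering" property the text highlights: adding a constant to every score leaves the loss unchanged.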
4.7 SVD++ intuition
SVD++ augments user representation with signals from interacted items, helping when explicit feedback is sparse but interaction history exists.
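Concretely, the standard SVD++ predictor (in the notation of the bias model from 4.2, with $N(u)$ the set of items user $u$ has interacted with and $\mathbf{y}_j$ a second set of item factors) can be written as:

```latex
\hat{r}_{ui} = \mu + b_u + b_i
  + \mathbf{q}_i^\top \Big( \mathbf{p}_u
  + |N(u)|^{-1/2} \sum_{j \in N(u)} \mathbf{y}_j \Big)
```

The $|N(u)|^{-1/2}$ normalization keeps the implicit-history term comparable in scale across users with very different activity levels.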