5. Feature-Rich Recommendation

As D2L section 21.8 emphasizes, interaction data is often sparse and noisy. In many production settings, recommendation is better framed as impression-level prediction with rich side features.

5.1 Feature-rich recommendation and CTR

Feature-rich recommendation is common in ads, feeds, and product surfaces.

  • Labels are often binary, such as click vs no click
  • Inputs include many categorical fields rather than only user and item IDs
  • The D2L advertising example uses 34 fields, with the first column as the click label and the remaining columns as categorical features

This setting is different from classic matrix factorization because the goal is often click-through rate prediction over impression-level examples rather than rating reconstruction.

CTR is defined as:

\text{CTR} = \frac{\#\text{clicks}}{\#\text{impressions}} \times 100\%
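As a quick sanity check, the definition is straightforward to compute. The helper below is purely illustrative (the name `ctr` is not from the text):

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage of impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions * 100

# 30 clicks out of 1,000 impressions -> 3.0% CTR
print(ctr(30, 1000))
```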

5.2 Factorization machines

Factorization machines are one of the most important bridges between collaborative filtering and feature-rich prediction.

For a feature vector \mathbf{x} \in \mathbb{R}^d, the two-way FM model is:

\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j

Interpretation:

  • The first two terms are linear
  • The last term models pairwise feature interactions
  • If one feature encodes user identity and another encodes item identity, the interaction term reduces to a collaborative-filtering-style embedding interaction

D2L also highlights the computational trick that reduces the FM interaction cost from O(kd^2) to O(kd), which is why FM remains practical on high-dimensional sparse data.
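The trick rewrites the pairwise sum as a difference of squares per embedding dimension. A small NumPy sketch (sizes are arbitrary, and the feature vector is dense here only for readability) verifies that the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3                      # feature dimension, embedding size
x = rng.normal(size=d)           # feature vector (sparse in practice)
V = rng.normal(size=(d, k))      # one k-dimensional embedding per feature

# Naive pairwise sum over all i < j: O(k d^2)
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(d) for j in range(i + 1, d))

# Reformulated sum: O(k d)
#   sum_{i<j} <v_i, v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
fast = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ x ** 2)

assert np.isclose(naive, fast)
```

On sparse one-hot data the sums run only over the active features, so the linear-time form is what makes FM usable at ads scale.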

5.3 DeepFM

DeepFM extends FM by combining low-order feature interactions from FM with high-order nonlinear interactions from a deep network.

  • The FM branch captures low-order interactions
  • The deep branch uses shared embeddings and an MLP to learn higher-order interactions
  • Both outputs are combined into a final prediction

D2L presents the DeepFM prediction as:

\hat{y} = \sigma\left(\hat{y}^{(\mathrm{FM})} + \hat{y}^{(\mathrm{DNN})}\right)

DeepFM is especially useful when simple pairwise interactions are not expressive enough, but you still want the inductive bias of factorization-based feature interaction.
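The two branches can be sketched in a few lines of NumPy. This is a minimal forward pass for one example, not the D2L implementation; all sizes and the ReLU MLP are illustrative assumptions. Note that both branches read from the same embedding table:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
num_feats, k, hidden = 10, 4, 8             # illustrative sizes

# Shared embedding table used by BOTH the FM and deep branches
V = rng.normal(scale=0.1, size=(num_feats, k))
w = rng.normal(scale=0.1, size=num_feats)   # linear weights
w0 = 0.0

# One example: indices of the active one-hot features (all x_i = 1)
active = np.array([1, 4, 7])
emb = V[active]                             # (3, k)

# FM branch: linear term + pairwise interactions via the O(kd) identity
y_fm = (w0 + w[active].sum()
        + 0.5 * np.sum(emb.sum(0) ** 2 - (emb ** 2).sum(0)))

# Deep branch: concatenate the same embeddings, pass through an MLP
W1 = rng.normal(scale=0.1, size=(emb.size, hidden))
W2 = rng.normal(scale=0.1, size=hidden)
h = np.maximum(emb.reshape(-1) @ W1, 0.0)   # ReLU hidden layer
y_dnn = h @ W2

# Final DeepFM prediction: sigma(y_fm + y_dnn)
p_click = sigmoid(y_fm + y_dnn)
```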

[Figure: DeepFM architecture. Image credit: Dive into Deep Learning, CC BY-SA 4.0.]

5.4 Hybrid factorization with features (LightFM-style)

In LightFM-style hybrid models, users and items are represented through their features rather than by standalone IDs:

  • User embedding = sum of user-feature embeddings
  • Item embedding = sum of item-feature embeddings
  • Score = dot product of the two embeddings (plus optional bias terms)
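The mechanics fit in a few lines. The feature names and tables below are hypothetical, chosen only to show the sum-of-feature-embeddings structure:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4  # embedding size (illustrative)

# Hypothetical feature vocabularies with one embedding per feature
user_feat_emb = {"user:42": rng.normal(size=k),
                 "age:25-34": rng.normal(size=k)}
item_feat_emb = {"item:7": rng.normal(size=k),
                 "genre:jazz": rng.normal(size=k)}
item_bias = {"item:7": 0.1, "genre:jazz": 0.0}

def embed(features, table):
    # Entity embedding = sum of its feature embeddings
    return np.sum([table[f] for f in features], axis=0)

u = embed(["user:42", "age:25-34"], user_feat_emb)
v = embed(["item:7", "genre:jazz"], item_feat_emb)
score = u @ v + sum(item_bias[f] for f in ["item:7", "genre:jazz"])
```

Cold start falls out of the structure: a brand-new user with no ID embedding but a known "age:25-34" feature still gets a usable embedding from that feature alone.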

Why data scientists use this:

  • Stronger cold-start behavior
  • Smooth path between collaborative and content-based modeling
  • Practical when metadata quality is reasonable

The Google course makes the same idea concrete from a matrix-factorization angle: you can augment the original interaction matrix with user-feature and item-feature blocks, then factorize the augmented matrix so that side features learn embeddings alongside users and items. Conceptually, this is one of the cleanest bridges between classic WALS-style recommender systems and modern hybrid feature-based models.
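The block structure of that augmented matrix is easy to write down. The toy sizes below are arbitrary; the point is the layout, with users and item-features as rows, items and user-features as columns:

```python
import numpy as np

# Toy sizes: 3 users x 4 items, 2 user features, 2 item features
R  = np.array([[1, 0, 1, 0],
               [0, 1, 0, 0],
               [1, 1, 0, 1]], dtype=float)      # user-item interactions
FU = np.array([[1, 0],
               [0, 1],
               [1, 1]], dtype=float)            # user x user-feature indicators
FI = np.array([[1, 0],
               [0, 1],
               [1, 0],
               [0, 1]], dtype=float)            # item x item-feature indicators

# Augmented matrix: the interaction block plus feature blocks.
# Factorizing A yields embeddings for users, items, AND side features jointly.
top    = np.hstack([R, FU])                     # (3, 6)
bottom = np.hstack([FI.T, np.zeros((2, 2))])    # (2, 6)
A = np.vstack([top, bottom])                    # (5, 6)
```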

5.5 Industrial ads CTR architectures

If you want a more industrial view of feature-rich ranking, two recent papers are worth reading after the material above.

These are useful extensions of the chapter because they show what happens when feature-rich recommendation is pushed into industrial ads ranking. They should not be read as broad recommender-system blueprints or as the state of the art for recommender systems in general. They are narrower and more specific: both are ads CTR architecture papers shaped by impression-level prediction, extreme scale, and production serving constraints.
