2. Explicit vs. Implicit Feedback
As in the reference article, the first key distinction is the type of supervision signal.
2.1 Explicit feedback
Examples:
- Star ratings
- Like/dislike labels
- Written reviews with sentiment scores
Pros:
- Direct preference signal
- Easier to define regression-style losses
Cons:
- Sparse in most real products
- Selection bias (only some users rate)
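Because explicit ratings are numeric targets, the "regression-style loss" point can be made concrete. A minimal sketch, using hypothetical (user, item, rating) triples and a global-mean baseline predictor; note the loss is computed only over observed pairs:

```python
import numpy as np

# Hypothetical explicit-feedback triples: (user_id, item_id, star rating).
ratings = [(0, 10, 4.0), (0, 42, 2.0), (1, 10, 5.0)]

def mse_loss(predict, triples):
    # Regression-style objective: compare predictions to observed ratings
    # only on the pairs that were actually rated.
    errs = [(predict(u, i) - r) ** 2 for u, i, r in triples]
    return float(np.mean(errs))

# Trivial baseline predictor: the global mean rating.
global_mean = float(np.mean([r for _, _, r in ratings]))
loss = mse_loss(lambda u, i: global_mean, ratings)
```

Any rating-prediction model (e.g. matrix factorization) would slot in for the lambda; the loss definition stays the same.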
2.2 Implicit feedback
Examples:
- Clicks
- Watch time
- Purchases
- Add-to-cart, save, dwell
Pros:
- High volume
- Better behavioral coverage
Cons:
- Noisy preference proxy
- Requires careful negative sampling and weighting
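The negative-sampling point can be sketched in a few lines. Assuming a hypothetical click log keyed by user, unobserved items are treated as (noisy) negatives by rejection sampling:

```python
import random

# Hypothetical implicit-feedback log: the set of items each user clicked.
clicks = {0: {10, 42}, 1: {10}}
num_items = 100

def sample_negative(user, clicked, n_items, rng):
    # Draw item ids uniformly until we hit one the user has NOT
    # interacted with; that item serves as a sampled negative.
    while True:
        item = rng.randrange(n_items)
        if item not in clicked[user]:
            return item

rng = random.Random(0)
neg = sample_negative(0, clicks, num_items, rng)
```

In practice, sampled negatives are often down-weighted relative to observed positives, since an unclicked item may simply never have been shown.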
In both cases, the interactions define a sparse user-item matrix: most user-item pairs are unobserved, and only the observed entries carry a feedback value.
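One way to make the sparsity concrete is to store only the observed entries. A minimal sketch with hypothetical data, where an absent pair means "unobserved" rather than a zero rating:

```python
# Sparse user-item matrix as a dict keyed by (user, item);
# only observed interactions are stored.
interactions = {(0, 10): 4.0, (0, 42): 2.0, (1, 10): 5.0}

def get_rating(user, item):
    # Returning None distinguishes "never rated" from a literal 0 rating,
    # which matters for both explicit and implicit feedback.
    return interactions.get((user, item))
```

Production systems use compressed sparse formats (e.g. CSR) for the same reason: materializing the dense matrix would be mostly empty cells.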
2.3 Recommendation tasks
Following D2L Chapter 21, it helps to separate recommendation work by task:
- Rating prediction: estimate a user’s explicit rating for an item
- Top-n recommendation: rank candidate items and return a personalized list
- Sequence-aware recommendation: use ordered behavior and timestamps
- Click-through rate prediction: predict whether a shown item or ad will be clicked
- Cold-start recommendation: serve new users or new items when history is limited
These tasks overlap, but they drive different labels, evaluation protocols, and model choices.
2.4 Benchmark datasets and split strategy
The MovieLens 100K dataset remains a standard benchmark for explicit-feedback recommendation.
- 100,000 ratings
- 943 users
- 1,682 movies
- Ratings from 1 to 5
- Approximate matrix sparsity of 93.7%
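The quoted sparsity figure follows directly from the other three numbers, as a quick arithmetic check shows:

```python
# Sanity check of the MovieLens 100K sparsity figure: the fraction of
# user-item cells that carry no rating.
num_ratings = 100_000
num_users, num_items = 943, 1_682

sparsity = 1.0 - num_ratings / (num_users * num_items)
# 943 * 1,682 = 1,586,126 possible cells, of which only 100,000 are filled,
# so roughly 93.7% of the matrix is empty.
```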
Two split strategies from D2L are especially useful in practice:
- Random split for rating prediction and general offline evaluation
- Sequence-aware split, where the most recent interaction is held out per user
This distinction matters because sequence-aware recommendation should be evaluated with a chronological split, not a random one.
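The sequence-aware split described above can be sketched as a leave-one-out procedure: sort each user's events by timestamp, hold out the latest for testing, and train on the rest. The log below is hypothetical:

```python
# Hypothetical interaction log: (user, item, timestamp).
logs = [
    (0, 10, 100), (0, 42, 200), (0, 7, 150),
    (1, 10, 50), (1, 3, 60),
]

def chrono_split(logs):
    # Group events per user, then hold out each user's most recent
    # interaction for the test set; everything earlier goes to train.
    by_user = {}
    for u, i, t in logs:
        by_user.setdefault(u, []).append((t, i))
    train, test = [], []
    for u, events in by_user.items():
        events.sort()                 # chronological order
        for t, i in events[:-1]:
            train.append((u, i, t))
        t, i = events[-1]             # most recent interaction
        test.append((u, i, t))
    return train, test

train, test = chrono_split(logs)
```

A random split would leak future interactions into training, which is exactly why the chronological protocol is required for sequence-aware evaluation.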