8. Mini-Project

The best way to internalize the course is to run one compact end-to-end project. Keep it small enough to finish, but complete enough to practice the full workflow.

Project goal

Choose a tabular dataset and solve one clearly scoped prediction task.

Good examples:

house-price prediction
loan default classification
delivery-time regression
customer churn classification
marketing response prediction

Minimum deliverables

Your project should include:

a short problem statement
a description of the target and candidate features
an evaluation plan with metric justification
one simple baseline
one stronger comparison model
a short reflection on feature choices, errors, and trade-offs

Recommended workflow

Step 1: Frame the task

What decision will this prediction support?
What data is available at prediction time?
What would a reasonable naive baseline do?

Step 2: Inspect the data

summary statistics
missingness profile
class balance or target distribution
obvious outliers or suspicious values
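
The four checks above can be sketched in a few lines of pandas. This is a minimal illustration, not a prescribed procedure: the `inspect` helper, the toy churn-style table, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def inspect(df: pd.DataFrame, target: str) -> dict:
    """Collect the Step 2 checks for a tabular dataset (hypothetical helper)."""
    # Numeric feature columns only; the target is excluded from outlier checks.
    numeric = df.select_dtypes(include="number").drop(columns=[target], errors="ignore")
    # Rule of thumb (an assumption): flag values beyond 1.5 * IQR from the quartiles.
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "summary": df.describe(include="all"),
        "missing_fraction": df.isna().mean().sort_values(ascending=False),
        "target_distribution": df[target].value_counts(normalize=True),
        "outlier_counts": outlier_mask.sum(),
    }


# Toy data invented purely for illustration.
toy = pd.DataFrame({
    "tenure_months": [1, 3, 5, 7, 9, 11, 13, 400],
    "monthly_fee": [20.0, 25.0, np.nan, 30.0, 22.0, np.nan, 27.0, 24.0],
    "churned": [1, 0, 0, 0, 1, 0, 0, 0],
})
report = inspect(toy, target="churned")
print(report["missing_fraction"])
print(report["target_distribution"])
print(report["outlier_counts"])
```

A flagged value such as the 400-month tenure is a prompt to investigate, not an instruction to delete the row.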

Step 3: Build a baseline

Start simple.

logistic or linear regression
a shallow tree
a majority-class or mean predictor when appropriate

The point is to learn how hard the problem is with a simple approach. A more flexible model has to earn its place by beating that baseline.
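
As a rough sketch of this step for a classification task, the snippet below scores a majority-class predictor and a logistic regression on the same held-out split. The synthetic dataset from `make_classification`, the 80/20 class balance, and the choice of accuracy and F1 as metrics are assumptions for illustration; substitute your own data and the metric from your evaluation plan.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for a real tabular dataset (an assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

baselines = {
    "majority_class": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning when no positives are predicted.
        "f1": f1_score(y_test, pred, zero_division=0),
    }

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On imbalanced data the majority-class predictor can look respectable on accuracy while its F1 for the positive class is zero, which is one reason to justify the metric before comparing models. For a regression task, `DummyRegressor(strategy="mean")` plays the same role.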

Step 4: Add one stronger model

Pick one:

random forest
gradient boosting
a carefully structured pipeline with engineered features
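
One way to keep the comparison fair is to fit the stronger candidates on the same split, with the same metric, as the baseline. The sketch below assumes a synthetic dataset and ROC AUC as the metric, and it uses default hyperparameters; none of these choices come from the course itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); reuse your own split in practice.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Score with predicted probabilities for the positive class.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, proba)

for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ROC AUC = {auc:.3f}")
```

A small gap between the baseline and the stronger model is itself a useful result for the decision memo in Step 5.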

Step 5: Reflect

Write a short decision memo:

Which model would you actually ship first?
What is the main source of uncertainty?
What additional data or labeling would help most?

Optional stretch goals

compare precision-recall trade-offs at several decision thresholds
add a text feature or grouped categorical feature
use cross-validation instead of one split
package the workflow in a reproducible sklearn pipeline
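
Several of these stretch goals fit together in one sketch: a scikit-learn `Pipeline` that handles imputation, scaling, and a categorical feature, evaluated with stratified cross-validation and then compared across thresholds. The churn-style column names, the synthetic target, and the three thresholds are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic churn-style table (hypothetical columns, invented data).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n).astype(float),
    "monthly_fee": rng.normal(30, 8, size=n),
    "plan": rng.choice(["basic", "plus", "pro"], size=n),
})
# Synthetic target: short tenure and the basic plan raise churn probability.
logit = 1.0 - 0.08 * df["tenure_months"] + 1.0 * (df["plan"] == "basic")
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)
# Inject some missing values so the imputer has work to do.
df.loc[rng.random(n) < 0.05, "monthly_fee"] = np.nan

numeric = ["tenure_months", "monthly_fee"]
categorical = ["plan"]
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Preprocessing is refit inside each fold, so no statistics leak across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, df, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

# Out-of-fold probabilities let us compare thresholds without a separate split.
proba = cross_val_predict(model, df, y, cv=cv, method="predict_proba")[:, 1]
threshold_table = []
for threshold in (0.3, 0.5, 0.7):
    pred = (proba >= threshold).astype(int)
    threshold_table.append({
        "threshold": threshold,
        "precision": precision_score(y, pred, zero_division=0),
        "recall": recall_score(y, pred, zero_division=0),
    })
print(pd.DataFrame(threshold_table))
```

Raising the threshold can only shrink the set of predicted positives, so recall never increases as the threshold goes up; precision usually rises but is not guaranteed to.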

Final checkpoint

If you can clearly explain why your final model is preferable to your baseline, this course has done its job.

Last updated on Sat, Mar 14, 2026