6. Mini-Project

The best way to internalize the course is to run one small end-to-end project where tree-based models are the main focus rather than just one option among many.

Project idea: Predict pet adoption time

One natural project for this course is to predict how quickly a pet will be adopted based on structured profile information.

Potential inputs could include:

  • age
  • breed or mix
  • health and vaccination status
  • shelter metadata
  • fee information
  • short text fields or descriptions

This project works well because it can be framed in any of several ways:

  • a regression problem
  • an ordinal or multiclass prediction problem
  • a ranking problem if the goal is prioritization

Minimum deliverables

Your project should include:

  1. a clear task definition
  2. a dataset description and feature inventory
  3. one single-tree baseline
  4. one forest-style ensemble
  5. one boosting model
  6. a short analysis of feature importance and model behavior

Step 1: Frame the task

  • What exact outcome are you predicting?
  • What information is available at prediction time?
  • Is the target better treated as numeric, ordinal, or categorical?
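The framing questions above can be made concrete in code. This is a minimal sketch, assuming a raw outcome of "days until adoption"; the bucket boundaries and labels are illustrative choices, not taken from any specific dataset.

```python
# Hypothetical framing sketch: the same raw outcome (days until adoption)
# can be treated as numeric, ordinal, or categorical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days_to_adoption = rng.integers(1, 120, size=200)

# Numeric framing: predict days directly (regression).
y_numeric = days_to_adoption.astype(float)

# Ordinal framing: bucket into ordered speed bands (boundaries are illustrative).
bins = [0, 7, 30, 90, np.inf]
labels = ["within_week", "within_month", "within_quarter", "longer"]
y_ordinal = pd.cut(days_to_adoption, bins=bins, labels=labels)

# Categorical framing: same buckets, but the model ignores their order.
y_categorical = y_ordinal.astype(str)
```

Each framing changes which models and metrics make sense, so it is worth deciding this before training anything.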

Step 2: Build a simple tree

Start with a regularized decision tree and use it to learn:

  • what the first few important splits look like
  • how depth changes fit
  • whether the tree already exposes meaningful structure
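A minimal sketch of this step, using synthetic data in place of a real pet-profile table (the feature names and coefficients are placeholders, not real signal):

```python
# Regularized single-tree baseline on synthetic stand-in data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(0, 15, n),   # age in years (placeholder feature)
    rng.integers(0, 2, n),    # vaccinated flag (placeholder feature)
    rng.uniform(0, 300, n),   # adoption fee (placeholder feature)
])
# Synthetic target: older, unvaccinated, pricier pets wait longer.
y = 10 + 3 * X[:, 0] - 8 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 5, n)

# Regularize with max_depth and min_samples_leaf so the tree stays readable.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X, y)

# Print the tree to inspect what the first few splits look like.
print(export_text(tree, feature_names=["age", "vaccinated", "fee"]))
```

Reading the printed splits is the point of this step: a depth-3 tree is small enough to check whether its structure matches your intuition about the problem.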

Step 3: Add an ensemble

Train a random forest or ExtraTrees model and compare:

  • validation performance
  • stability
  • feature-importance behavior
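One way to sketch this comparison, again on synthetic data (the feature layout is an assumption for illustration):

```python
# Compare a single tree against a random forest on the same data:
# cross-validated performance plus impurity-based importances.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 500
X = rng.normal(size=(n, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n)  # column 0 carries most signal

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0)

# Validation performance (mean R^2 over 5 folds).
tree_cv = cross_val_score(tree, X, y, cv=5).mean()
forest_cv = cross_val_score(forest, X, y, cv=5).mean()

# Impurity-based feature importances from the fitted forest.
forest.fit(X, y)
importances = forest.feature_importances_
```

To probe stability, refit both models on bootstrap resamples of the data and watch how much the scores and importances move; the forest's should move less.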

Step 4: Add a boosting model

Train one of:

  • XGBoost
  • LightGBM
  • CatBoost

Then compare it against the forest rather than assuming it should automatically win.

Step 5: Reflect

Write a short decision memo:

  • Which model would you ship first?
  • Which model is easiest to explain?
  • Which feature signals feel trustworthy, and which need more scrutiny?

Stretch goals

  • compare impurity importance versus permutation importance
  • inspect whether correlated variables change the interpretation
  • add a text-derived feature block
  • compare OOB error to cross-validation
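Several of these stretch goals fit in one sketch. This assumes synthetic data with a deliberately correlated feature pair and a noise column, to show how correlation muddies importance rankings:

```python
# Impurity vs. permutation importance, plus OOB score vs. cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 600
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(0, 0.1, n)   # nearly duplicates x0
x2 = rng.normal(size=n)           # pure noise
X = np.column_stack([x0, x1, x2])
y = 3 * x0 + rng.normal(0, 0.5, n)  # only x0 truly drives the target

forest = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# Two importance views: impurity-based and permutation-based.
impurity = forest.feature_importances_
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)

# Two generalization estimates: out-of-bag score and 5-fold CV.
oob = forest.oob_score_
cv = cross_val_score(forest, X, y, cv=5).mean()
```

Expect the correlated pair x0/x1 to split credit between them under both importance measures, while the noise column ranks last; that splitting is exactly the interpretation hazard the stretch goal asks you to inspect.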

Final checkpoint

If you can explain why your final choice is better than a single tree, you have learned the most important lesson of the course.
