7. References and Further Study
Use these references to deepen your understanding of specific topics after working through the handbook.
Core documentation
- scikit-learn: Decision Trees
- scikit-learn: Ensemble Methods
- scikit-learn: Permutation Feature Importance
- XGBoost documentation
- LightGBM documentation
- CatBoost documentation
Foundational reading
- Random Forests (Breiman, 2001). Leo Breiman's original random forest paper.
- Extremely Randomized Trees (Geurts, Ernst, and Wehenkel, 2006). The original ExtraTrees paper.
- Greedy Function Approximation: A Gradient Boosting Machine (Friedman, 2001). Jerome Friedman's classic gradient boosting paper.
- Understanding Random Forests: From Theory to Practice (Louppe, 2014). A practical and readable reference for tree ensembles, bias-variance intuition, and random forest variants.
Deep learning versus trees on tabular data
- Tabular Data: Deep Learning Is Not All You Need (Shwartz-Ziv and Armon, 2022). A useful benchmark-style paper showing that XGBoost often outperformed several deep tabular models on the evaluated datasets while requiring less tuning effort.
- Why Do Tree-Based Models Still Outperform Deep Learning on Typical Tabular Data? (Grinsztajn, Oyallon, and Varoquaux, 2022). A strong reference for the argument that tree-based models remain the default baseline on many medium-sized tabular tasks, with a detailed discussion of why tabular neural networks struggle.
Modern tree libraries
- XGBoost: A Scalable Tree Boosting System (Chen and Guestrin, 2016)
- LightGBM: A Highly Efficient Gradient Boosting Decision Tree (Ke et al., 2017)
- CatBoost: Unbiased Boosting with Categorical Features (Prokhorenkova et al., 2018)
Topics worth following up
- post-pruning strategies for single trees
- out-of-bag error versus cross-validation
- correlated features and importance instability
- when permutation importance is more trustworthy than impurity importance
- handling categorical features across boosted-tree libraries
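Two of the topics above, correlated or high-cardinality features inflating impurity importance and the case for permutation importance, can be seen in a few lines. This is a minimal sketch using scikit-learn on synthetic data; the dataset sizes and hyperparameters are illustrative choices, not recommendations.

```python
# Sketch: impurity importance vs permutation importance when a pure-noise
# feature with many unique values is present. Assumes scikit-learn is installed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Append a continuous noise feature: it carries no signal, but its many
# candidate split points make impurity-based importance prone to overrating it.
rng = np.random.RandomState(0)
X = np.hstack([X, rng.rand(X.shape[0], 1)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance: free (computed during training), but measured
# on training data and biased toward high-cardinality features.
print("impurity:   ", model.feature_importances_.round(3))

# Permutation importance: shuffles one feature at a time on held-out data
# and records the score drop, so the noise feature scores near zero.
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("permutation:", perm.importances_mean.round(3))
```

Comparing the last entry of each array usually makes the point: the noise feature receives a visibly nonzero impurity importance but a near-zero permutation importance.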
Suggested next step after this course
Take one real tabular workflow from your own work and compare:
- a regularized single tree
- a random forest or ExtraTrees model
- one boosting implementation
Then write down:
- what changed in performance
- what changed in interpretability
- what changed in tuning burden
That comparison usually teaches more than reading another round of definitions.