9. References and Further Study
Use these references to deepen specific topics after working through the handbook.
Core learning resources
Good follow-on topics
- calibration and threshold selection
- feature importance and interpretability
- temporal validation for forecasting-style tabular tasks
- probability estimation and decision analysis
- monitoring, drift, and retraining policy
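The first two of these topics combine naturally in practice. The sketch below, using scikit-learn (an assumption; the handbook does not prescribe a library), shows probability calibration followed by cost-based threshold selection on a held-out set. The dataset and the 5:1 cost ratio are purely illustrative.

```python
# Sketch: probability calibration plus threshold selection on a held-out set.
# Assumes scikit-learn; dataset and cost ratio are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Calibrate the raw model's probabilities with isotonic regression.
base = RandomForestClassifier(n_estimators=100, random_state=0)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

probs = calibrated.predict_proba(X_val)[:, 1]
print("Brier score:", brier_score_loss(y_val, probs))

# Pick the threshold minimizing an illustrative asymmetric cost:
# a false negative costs 5x a false positive.
thresholds = np.linspace(0.05, 0.95, 19)
costs = [(t,
          np.sum((probs >= t) & (y_val == 0))        # false positives
          + 5 * np.sum((probs < t) & (y_val == 1)))  # false negatives
         for t in thresholds]
best_t, best_cost = min(costs, key=lambda tc: tc[1])
print("best threshold:", round(best_t, 2))
```

The key point is that the threshold is chosen from the decision costs, not fixed at 0.5, and only makes sense once the probabilities are reasonably calibrated.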
Modern benchmarks and foundation models
- TabArena live leaderboard
The most useful live reference for current tabular performance comparisons. As of March 14, 2026, its public no-imputation / lite / all-tasks / all-datasets board lists
RealTabPFN-v2.5 (tuned + ensembled) at Elo 1648, AutoGluon 1.4 (extreme, 4h) at 1640, and strong tuned-plus-ensembled tree baselines like LightGBM (1440), CatBoost (1414), and XGBoost (1387). Use the live board for the latest ranking because these results can change over time.
- TabArena
A living benchmark for tabular machine learning that focuses on realistic evaluation, curated datasets, multiple splits, and strong tuning practices. It is a useful reference point for understanding how serious tabular comparisons are increasingly done.
- TabPFN and Prior Labs documentation
TabPFN is a tabular foundation model family trained on large amounts of synthetic data so it can learn reusable tabular patterns and perform strong prediction with very little task-specific tuning. It is worth knowing because it changes what a strong modern baseline can look like.
- TabPFN-2.5
TabPFN-2.5 is the newer generation in that family. It extends the same synthetic pre-training idea to larger tabular settings and is worth tracking as part of the current movement toward foundation-model-style workflows for structured data.
- TabPFN-TS
TabPFN-TS adapts the TabPFN approach to time series forecasting by reframing forecasting as a tabular regression problem and combining the model with lightweight feature engineering for zero-shot forecasting tasks.
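The "forecasting as tabular regression" idea behind TabPFN-TS can be illustrated without the model itself. The sketch below uses a plain scikit-learn gradient-boosting regressor as a stand-in (an assumption, not the TabPFN-TS implementation); the lag and calendar features are illustrative.

```python
# Sketch of reframing forecasting as tabular regression, with a
# GradientBoostingRegressor standing in for TabPFN-TS. Illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(400)
# Synthetic series with a 24-step seasonal cycle plus noise.
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(400)

# Turn the series into a table: each row = (24 lag features, calendar feature).
lags = 24
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
hour = t[lags:] % 24                      # simple calendar feature
X = np.column_stack([X, hour])
y = series[lags:]

# Temporal split: train on the past, evaluate on the future (no shuffling).
split = 300
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("test MAE:", np.mean(np.abs(pred - y[split:])))
```

Once the series is in this row-per-timestep form, any tabular regressor can be dropped in, which is exactly why tabular foundation models transfer to forecasting. Note the temporal split, which connects back to the "temporal validation" topic above.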
AutoML libraries and platforms
- AutoGluon Tabular
A strong practical choice for tabular AutoML when you want competitive baselines, ensembles, and a relatively compact Python API.
- FLAML AutoML
A lightweight AutoML library designed around efficient search and explicit time-budget control.
- auto-sklearn
A scikit-learn-oriented AutoML system that is especially natural if your workflow already centers on sklearn pipelines and conventions.
- H2O AutoML
A leaderboard-style AutoML system that trains multiple model families and stacked ensembles inside the H2O ecosystem.
Benchmark references for AutoML vs baselines
- AMLB: an AutoML Benchmark
A strong starting reference for comparing major AutoML systems against each other. The benchmark compares 9 AutoML frameworks across 71 classification tasks and 33 regression tasks, and evaluates not just accuracy but also inference-time trade-offs and framework failures.
- AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
Useful when you want a direct framework comparison involving AutoGluon, H2O, auto-sklearn, TPOT, AutoWEKA, and Google AutoML Tables on 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark.
- Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Useful for understanding how a modern auto-sklearn variant compares with earlier auto-sklearn and other popular AutoML frameworks on 39 benchmark datasets under time constraints.
- FLAML: A Fast and Lightweight AutoML Library
Especially useful when the comparison question is not only accuracy but also budget efficiency, since the paper emphasizes equal-budget and low-compute comparisons against other AutoML libraries.
- H2O AutoML: Scalable Automatic Machine Learning
Useful for understanding the H2O AutoML design itself and for tracing how it was positioned relative to other open-source AutoML systems in OpenML benchmark-style evaluations.
- TabArena: A Living Benchmark for Machine Learning on Tabular Data
Particularly useful when you want to compare strong modern methods against tuned non-AutoML baselines such as CatBoost, LightGBM, XGBoost, Random Forest, and newer deep or foundation models. This is a better reference than older AutoML-only benchmarks when the question is how AutoML-style workflows compare with today’s strongest tree-based methods.
For comparisons among tools that do not all appear on the same public live board, combine the live TabArena leaderboard with the framework-specific benchmark papers above; together they give a better picture than any single leaderboard snapshot.
Topic-specific further reading
- An overview of gradient descent optimization algorithms
- Understanding Random Forests: From Theory to Practice
- Interpretable Machine Learning
Suggested next step after this course
After finishing this short course, a natural next move is to take one workflow from your own work and rewrite it using the checklist below:
- target and decision
- split strategy
- metric choice
- preprocessing pipeline
- baseline model
- stronger comparison model
- risk and monitoring notes
That exercise usually turns passive understanding into applied competence much faster than reading more theory.