9. References and Further Study

Use these references to deepen specific topics after working through the handbook.

Good follow-on topics

  • calibration and threshold selection
  • feature importance and interpretability
  • temporal validation for forecasting-style tabular tasks
  • probability estimation and decision analysis
  • monitoring, drift, and retraining policy
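To make one of these topics concrete, here is a minimal sketch of cost-based threshold selection (the "probability estimation and decision analysis" item): given predicted probabilities and true labels, pick the decision threshold that minimizes total misclassification cost. The cost values and data below are illustrative assumptions, not part of the handbook.

```python
def best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0):
    """Return the threshold in [0, 1] with the lowest total cost."""
    candidates = sorted(set(probs)) + [1.0]
    best_t, best_cost = 0.5, float("inf")
    for t in candidates:
        cost = 0.0
        for p, y in zip(probs, labels):
            pred = 1 if p >= t else 0
            if pred == 1 and y == 0:
                cost += cost_fp   # false positive
            elif pred == 0 and y == 1:
                cost += cost_fn   # false negative
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]
labels = [0, 0, 1, 1, 1, 0]
t, c = best_threshold(probs, labels)
```

Because false negatives are costed five times higher here, the chosen threshold sits well below 0.5; changing the cost ratio shifts it, which is exactly the decision-analysis point.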

Modern benchmarks and foundation models

  • TabArena A living benchmark for tabular machine learning that focuses on realistic evaluation, curated datasets, multiple splits, and strong tuning practices. It is a useful reference point for understanding how serious tabular comparisons are increasingly done.
  • TabArena live leaderboard The most useful live reference for current tabular performance comparisons. As of March 14, 2026, its public no-imputation / lite / all-tasks / all-datasets board listed RealTabPFN-v2.5 (tuned + ensembled) at Elo 1648, AutoGluon 1.4 (extreme, 4h) at 1640, and strong tuned-plus-ensembled tree baselines such as LightGBM (1440), CatBoost (1414), and XGBoost (1387). Check the live board for the latest ranking, since these results change over time.
  • TabPFN and Prior Labs documentation TabPFN is a tabular foundation model family trained on large amounts of synthetic data so it can learn reusable tabular patterns and perform strong prediction with very little task-specific tuning. It is worth knowing because it changes what a strong modern baseline can look like.
  • TabPFN-2.5 TabPFN-2.5 is the newer generation in that family. It extends the same synthetic pre-training idea to larger tabular settings and is worth tracking as part of the current movement toward foundation-model-style workflows for structured data.
  • TabPFN-TS TabPFN-TS adapts the TabPFN approach to time series forecasting by reframing forecasting as a tabular regression problem and combining the model with lightweight feature engineering for zero-shot forecasting tasks.
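The reframing idea behind TabPFN-TS can be sketched in a few lines. This is a generic illustration of turning a series into supervised rows via lag features and a time index, not TabPFN-TS's actual feature engineering; the series and lag count are made-up assumptions.

```python
def make_supervised(series, n_lags=3):
    """Turn a 1-D series into (features, targets) rows for regression."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        lags = series[i - n_lags:i]   # last n_lags observations
        X.append([i] + list(lags))    # time index + lag features
        y.append(series[i])           # value to predict
    return X, y

series = [10, 12, 13, 15, 16, 18]
X, y = make_supervised(series, n_lags=3)
# Each X row: [t, y[t-3], y[t-2], y[t-1]]; each y: the value at t.
```

Once the series is in this shape, any tabular regressor, including a tabular foundation model, can be asked to predict the next value.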

AutoML libraries and platforms

  • AutoGluon Tabular A strong practical choice for tabular AutoML when you want competitive baselines, ensembles, and a relatively compact Python API.
  • FLAML AutoML A lightweight AutoML library designed around efficient search and explicit time-budget control.
  • auto-sklearn A scikit-learn-oriented AutoML system that is especially natural if your workflow already centers on sklearn pipelines and conventions.
  • H2O AutoML A leaderboard-style AutoML system that trains multiple model families and stacked ensembles inside the H2O ecosystem.

Benchmark references for AutoML vs baselines

  • AMLB: an AutoML Benchmark A strong starting reference for comparing major AutoML systems against each other. The benchmark compares 9 AutoML frameworks across 71 classification tasks and 33 regression tasks, and evaluates not just accuracy but also inference-time trade-offs and framework failures.
  • AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data Useful when you want a direct framework comparison involving AutoGluon, H2O, auto-sklearn, TPOT, AutoWEKA, and Google AutoML Tables on 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark.
  • Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning Useful for understanding how a modern auto-sklearn variant compares with earlier auto-sklearn and other popular AutoML frameworks on 39 benchmark datasets under time constraints.
  • FLAML: A Fast and Lightweight AutoML Library Especially useful when the comparison question is not only accuracy but also budget efficiency, since the paper emphasizes equal-budget and low-compute comparisons against other AutoML libraries.
  • H2O AutoML: Scalable Automatic Machine Learning Useful for understanding the H2O AutoML design itself and for tracing how it was positioned relative to other open-source AutoML systems in OpenML benchmark-style evaluations.
  • TabArena: A Living Benchmark for Machine Learning on Tabular Data Particularly useful when you want to compare strong modern methods against tuned non-AutoML baselines such as CatBoost, LightGBM, XGBoost, Random Forest, and newer deep or foundation models. This is a better reference than older AutoML-only benchmarks when the question is how AutoML-style workflows compare with today’s strongest tree-based methods.

For comparisons among tools that are not all shown on the same public live board, combine the live TabArena leaderboard with the framework-specific benchmark papers above. That gives a better picture than relying on any single leaderboard snapshot alone.

Suggested next step after this course

After finishing this short course, a natural next move is to take one workflow from your own work and rewrite it using the checklist below:

  • target and decision
  • split strategy
  • metric choice
  • preprocessing pipeline
  • baseline model
  • stronger comparison model
  • risk and monitoring notes
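One way to make this exercise concrete is to capture the checklist as a record you fill in for your own workflow, with a trivial majority-class predictor to anchor the "baseline model" entry. Every field value below is a placeholder assumption, not a recommendation from the handbook.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class WorkflowPlan:
    target_and_decision: str
    split_strategy: str
    metric_choice: str
    preprocessing: str
    baseline_model: str
    comparison_model: str
    risk_and_monitoring: str

plan = WorkflowPlan(
    target_and_decision="churn within 30 days -> retention offer",
    split_strategy="time-based split, last quarter held out",
    metric_choice="PR-AUC, plus cost at the chosen threshold",
    preprocessing="impute numerics, one-hot low-cardinality categoricals",
    baseline_model="majority class, then logistic regression",
    comparison_model="gradient-boosted trees, tuned",
    risk_and_monitoring="weekly drift check on key features",
)

def majority_baseline(labels):
    """Predict the most common training label for every row."""
    return Counter(labels).most_common(1)[0][0]

train_labels = [0, 0, 1, 0, 1, 0]
pred = majority_baseline(train_labels)   # always predicts 0 here
```

Forcing yourself to write a value for every field, before training anything stronger, is the point of the exercise.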

That exercise usually turns passive understanding into applied competence much faster than reading more theory.
