9. References and Further Study

Use these references to deepen specific topics after working through the handbook.

Good follow-on topics

  • calibration and threshold selection
  • feature importance and interpretability
  • temporal validation for forecasting-style tabular tasks
  • probability estimation and decision analysis
  • monitoring, drift, and retraining policy
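To make one of these topics concrete, here is a minimal sketch of cost-based threshold selection (the "probability estimation and decision analysis" item): given predicted probabilities and true labels, pick the decision threshold that minimizes total misclassification cost. The cost values and data below are illustrative assumptions, not part of the handbook.

```python
def best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0):
    """Return the threshold in [0, 1] with the lowest total cost."""
    candidates = sorted(set(probs)) + [1.0]
    best_t, best_cost = 0.5, float("inf")
    for t in candidates:
        cost = 0.0
        for p, y in zip(probs, labels):
            pred = 1 if p >= t else 0
            if pred == 1 and y == 0:
                cost += cost_fp   # false positive
            elif pred == 0 and y == 1:
                cost += cost_fn   # false negative
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]
labels = [0, 0, 1, 1, 1, 0]
t, c = best_threshold(probs, labels)
```

Because false negatives are costed five times higher here, the chosen threshold sits well below 0.5; changing the cost ratio shifts it, which is exactly the decision-analysis point.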

Modern benchmarks and foundation models

  • TabArena A living benchmark for tabular machine learning that focuses on realistic evaluation, curated datasets, multiple splits, and strong tuning practices. It is a useful reference point for understanding how serious tabular comparisons are increasingly done.
  • TabArena live leaderboard The most useful live reference for current tabular performance comparisons. As of March 14, 2026, its public no-imputation / lite / all-tasks / all-datasets board listed RealTabPFN-v2.5 (tuned + ensembled) at Elo 1648, AutoGluon 1.4 (extreme, 4h) at 1640, and strong tuned-plus-ensembled tree baselines such as LightGBM (1440), CatBoost (1414), and XGBoost (1387). Check the live board for the latest ranking, since these results change over time.
  • TabPFN and Prior Labs documentation TabPFN is a tabular foundation model family trained on large amounts of synthetic data so it can learn reusable tabular patterns and perform strong prediction with very little task-specific tuning. It is worth knowing because it changes what a strong modern baseline can look like.
  • TabPFN-2.5 TabPFN-2.5 is the newer generation in that family. It extends the same synthetic pre-training idea to larger tabular settings and is worth tracking as part of the current movement toward foundation-model-style workflows for structured data.
  • TabPFN-TS TabPFN-TS adapts the TabPFN approach to time series forecasting by reframing forecasting as a tabular regression problem and combining the model with lightweight feature engineering for zero-shot forecasting tasks.
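The reframing idea behind TabPFN-TS can be sketched in a few lines. This is a generic illustration of turning a series into supervised rows via lag features and a time index, not TabPFN-TS's actual feature engineering; the series and lag count are made-up assumptions.

```python
def make_supervised(series, n_lags=3):
    """Turn a 1-D series into (features, targets) rows for regression."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        lags = series[i - n_lags:i]   # last n_lags observations
        X.append([i] + list(lags))    # time index + lag features
        y.append(series[i])           # value to predict
    return X, y

series = [10, 12, 13, 15, 16, 18]
X, y = make_supervised(series, n_lags=3)
# Each X row: [t, y[t-3], y[t-2], y[t-1]]; each y: the value at t.
```

Once the series is in this shape, any tabular regressor, including a tabular foundation model, can be asked to predict the next value.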

AutoML libraries and platforms

  • AutoGluon Tabular A strong practical choice for tabular AutoML when you want competitive baselines, ensembles, and a relatively compact Python API.
  • FLAML AutoML A lightweight AutoML library designed around efficient search and explicit time-budget control.
  • auto-sklearn A scikit-learn-oriented AutoML system that is especially natural if your workflow already centers on sklearn pipelines and conventions.
  • H2O AutoML A leaderboard-style AutoML system that trains multiple model families and stacked ensembles inside the H2O ecosystem.

Benchmark references for AutoML vs baselines

  • AMLB: an AutoML Benchmark A strong starting reference for comparing major AutoML systems against each other. The benchmark compares 9 AutoML frameworks across 71 classification tasks and 33 regression tasks, and evaluates not just accuracy but also inference-time trade-offs and framework failures.
  • AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data Useful when you want a direct framework comparison involving AutoGluon, H2O, auto-sklearn, TPOT, AutoWEKA, and Google AutoML Tables on 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark.
  • Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning Useful for understanding how a modern auto-sklearn variant compares with earlier auto-sklearn and other popular AutoML frameworks on 39 benchmark datasets under time constraints.
  • FLAML: A Fast and Lightweight AutoML Library Especially useful when the comparison question is not only accuracy but also budget efficiency, since the paper emphasizes equal-budget and low-compute comparisons against other AutoML libraries.
  • H2O AutoML: Scalable Automatic Machine Learning Useful for understanding the H2O AutoML design itself and for tracing how it was positioned relative to other open-source AutoML systems in OpenML benchmark-style evaluations.
  • TabArena: A Living Benchmark for Machine Learning on Tabular Data Particularly useful when you want to compare strong modern methods against tuned non-AutoML baselines such as CatBoost, LightGBM, XGBoost, Random Forest, and newer deep or foundation models. This is a better reference than older AutoML-only benchmarks when the question is how AutoML-style workflows compare with today’s strongest tree-based methods.

For comparisons among tools that are not all shown on the same public live board, combine the live TabArena leaderboard with the framework-specific benchmark papers above. That gives a better picture than relying on any single leaderboard snapshot alone.

Suggested next step after this course

After finishing this short course, a natural next move is to take one workflow from your own work and rewrite it using the checklist below:

  • target and decision
  • split strategy
  • metric choice
  • preprocessing pipeline
  • baseline model
  • stronger comparison model
  • risk and monitoring notes
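One way to make this exercise concrete is to capture the checklist as a record you fill in for your own workflow, with a trivial majority-class predictor to anchor the "baseline model" entry. Every field value below is a placeholder assumption, not a recommendation from the handbook.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class WorkflowPlan:
    target_and_decision: str
    split_strategy: str
    metric_choice: str
    preprocessing: str
    baseline_model: str
    comparison_model: str
    risk_and_monitoring: str

plan = WorkflowPlan(
    target_and_decision="churn within 30 days -> retention offer",
    split_strategy="time-based split, last quarter held out",
    metric_choice="PR-AUC, plus cost at the chosen threshold",
    preprocessing="impute numerics, one-hot low-cardinality categoricals",
    baseline_model="majority class, then logistic regression",
    comparison_model="gradient-boosted trees, tuned",
    risk_and_monitoring="weekly drift check on key features",
)

def majority_baseline(labels):
    """Predict the most common training label for every row."""
    return Counter(labels).most_common(1)[0][0]

train_labels = [0, 0, 1, 0, 1, 0]
pred = majority_baseline(train_labels)   # always predicts 0 here
```

Forcing yourself to write a value for every field, before training anything stronger, is the point of the exercise.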

That exercise usually turns passive understanding into applied competence much faster than reading more theory.
