Causal Inference Resources
A broad reference library for learning, applying, and staying current with causal inference. It is organized by format and use case so you can move from foundations to methods to production practice.
Most readers do not need every section. If you are new to the area, start with the short guides below and then jump into the subsection that matches your problem.
1) Start Here
- Foundations first: start with Causal Inference: What If, The Effect, and Causal Inference: The Mixtape.
- Product experimentation: start with Decision Making at Netflix, Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data, and Experiment Rigor for Switchback Experiment Analysis.
- Causal ML and personalization: start with Applied Causal Inference Powered by ML and AI, Double/Debiased Machine Learning for Treatment and Structural Parameters, and EconML.
- Observational policy evaluation: start with Causal Inference for Statistics, Social, and Biomedical Sciences, Matching on the Estimated Propensity Score, and Difference-in-Differences with Multiple Time Periods.
2) Choose by Problem
- Need better A/B tests or experimentation systems: use the industry experimentation section for platform design, variance reduction, trustworthy experimentation, and switchbacks.
- Need uplift, personalization, or treatment heterogeneity: combine the causal ML papers, the Python libraries section, and the heterogeneity case studies.
- Need matching or weighting for observational studies: use the method-specific package list for MatchIt, WeightIt, cobalt, CBPS, optmatch, and PSweight, then pair them with the observational papers.
- Need difference-in-differences or staggered adoption methods: use did, did2s, DRDID, HonestDiD, eventStudyInteract, and the panel-method papers.
- Need instrumental variables or regression discontinuity: start with the IV/RD package cluster plus the canonical IV and RD papers.
- Need time-varying treatment or longitudinal causal inference: start with the TMLE and longitudinal package cluster, then move to Targeted Learning, Targeted Learning in Data Science, and Causal Inference: What If.
- Need marketplace, network, or interference methods: jump directly to the interference papers and the marketplace case studies.
3) Books and Core References
Open / free books
- Advanced Data Analysis from an Elementary Point of View
- Applied Causal Inference Powered by ML and AI
- Causal Inference for the Brave and True
- Causal Inference: The Mixtape
- Causal Inference: What If
- The Effect
Print and paid books
- Causal Analysis
- Causal Inference and Discovery in Python
- Causal Inference for Statistics, Social, and Biomedical Sciences
- Causal Inference in Python
- Causal Inference in Statistics: A Primer
- Causality
- Counterfactuals and Causal Inference
- Design of Observational Studies
- Elements of Causal Inference
- Handbook of Causal Analysis for Social Research
- Impact Evaluation: Treatment Effects and Causal Analysis
- Mostly Harmless Econometrics
- Quasi-Experimentation: A Guide to Design and Analysis
- Targeted Learning
- Targeted Learning in Data Science
4) Courses, Lecture Notes, and Teaching Material
Video lecture series
- Applied Methods
- Causal Inference
- Causal Inference with Panel Data
- Causality Boot Camp
- Machine Learning and Causal Inference
- Mastering Mostly Harmless Econometrics
- Modern Sampling Methods: Design and Inference
- Modern Topics in Uncertainty Quantification
Slides and lecture notes
- A First Course in Causal Inference
- A User’s Guide to Statistical Inference and Regression
- Causal Econometrics
- Causal Inference and Machine Learning
- Causal Inference
- Causal Machine Learning
- Introduction to Causal Inference
- Introduction to Modern Causal Inference
- R Guide for TMLE in Medical Research
- Stefan Wager Causal Inference Notes
5) Libraries and Tooling
Python
- ananke
- causal-learn
- causal-tune
- CausalML
- CausalNex
- CausalPy
- DeepIV
- DoWhy
- DoubleML
- EconML
- GeoLift
- linearmodels
- metalearners
- pyfixest
- scikit-uplift
- Statsmodels
- trimmed_match
R
Julia
Method-specific econometrics and diagnostics
Matching, weighting, and balance
Difference-in-differences, event studies, and panel methods
Instrumental variables and regression discontinuity
TMLE and longitudinal treatment
6) Foundational and Canonical Papers
Foundations and DAGs
- A Crash Course in Good and Bad Controls
- Causal Diagrams for Empirical Research
- Causal Inference Using Potential Outcomes
- Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies
- Natural Experiments
Heterogeneous treatment effects and causal ML
- Adapting Neural Networks for the Estimation of Treatment Effects
- Double/Debiased Machine Learning for Treatment and Structural Parameters
- Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation
- Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence
- Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
- Quasi-Oracle Estimation of Heterogeneous Treatment Effects
- Towards Optimal Doubly Robust Estimation of Heterogeneous Causal Effects
Experiments, bandits, and interference
- Asymptotically Efficient Adaptive Allocation Rules
- Design and Analysis of Switchback Experiments
- Estimation Considerations in Contextual Bandits
- Exact P-values for Network Interference
- On Causal Inference in the Presence of Interference
- Time-uniform, Nonparametric, Nonasymptotic Confidence Sequences
- Toward Causal Inference With Interference
Observational and quasi-experimental methods
- Difference-in-Differences with Multiple Time Periods
- Difference-in-Differences with Variation in Treatment Timing
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Identification and Estimation of Local Average Treatment Effects
- Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design
- Large Sample Properties of Matching Estimators for Average Treatment Effects
- Matching on the Estimated Propensity Score
- Synthetic Difference In Differences Estimation
- The Central Role of the Propensity Score in Observational Studies for Causal Effects
Inference and robustness
- Robust Standard Errors in Small Samples: Some Practical Advice
- Sampling-Based versus Design-Based Uncertainty in Regression Analysis
- Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies
- When Should You Adjust Standard Errors for Clustering?
7) Tutorials, Surveys, and Practitioner Guides
- A Practical Introduction to Regression Discontinuity Designs: Extensions
- A Practical Introduction to Regression Discontinuity Designs: Foundations
- A Tutorial on Thompson Sampling
- Causal Models for Longitudinal and Panel Data: A Survey
- Group Sequential Designs: A Tutorial
- Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects
- What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature
8) Industry Experimentation and Applied Case Studies
Experimentation platforms and systems
- Decision Making at Netflix
- Democratizing Online Controlled Experiments at Booking.com
- Experimentation Platform at Zalando: Part 1 - Evolution
- How We Reimagined A/B Testing at Squarespace
- How We Scaled Experimentation at Hulu
- Scaling Airbnb’s Experimentation Platform
- Spotify’s New Experimentation Platform
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform
- Under the Hood of Uber’s Experimentation Platform
Power, variance reduction, and metrics
- Comparing Quantiles at Scale in Online A/B-Testing
- CUPED for Switchback Tests
- Deep Dive Into Variance Reduction
- How Booking.com Increases the Power of Online Experiments with CUPED
- How Meta Scaled Regression Adjustment to Improve Power Across Hundreds of Thousands of Experiments
- Improving Experimental Power through Control Using Predictions as Covariate (CUPAC)
- Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data
- Large-Scale Online Experimentation with Quantile Metrics
Network effects, switchbacks, and marketplace interference
- Budget-split Testing: A Trustworthy and Powerful Approach to Marketplace A/B Testing
- Detecting Interference: An A/B Test of A/B Tests
- Experiment Rigor for Switchback Experiment Analysis
- Experimental Design in Two-Sided Platforms: An Analysis of Bias
- How Meta Tests Products with Strong Network Effects
- Reducing Marketplace Interference Bias Via Shadow Prices
- Tips and Considerations for Switchback Test Designs
- Using Ego-Clusters to Measure Network Effects at LinkedIn
Heterogeneous treatment effects, personalization, and bandits
- Free Lunch! Retrospective Uplift Modeling for Dynamic Promotions Recommendation within ROI Constraints
- Heterogeneous Treatment Effects at Netflix
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
- Practical Bandits: An Industry Perspective
- Smarter Promotions With Causal Machine Learning
Quasi-experiments, synthetic controls, and counterfactual measurement
- Gaining Confidence in Synthetic Control Causal Inference with Sensitivity Analysis
- How to Use Quasi-experiments and Counterfactuals to Build Great Products
- Key Challenges with Quasi Experiments at Netflix
- Optimizing at the Edge: Using Regression Discontinuity Designs to Power Decision-Making
- Quasi Experimentation at Netflix
- Using Back-Door Adjustment Causal Analysis to Measure Pre-Post Effects
Trustworthy experimentation and diagnostics
- Addressing the Challenges of Sample Ratio Mismatch in A/B Testing
- Data Quality: Fundamental Building Blocks for Trustworthy A/B Testing Analysis
- Imbalance Detection for Healthier Experimentation
- Patterns of Trustworthy Experimentation: During-Experiment Stage
- Patterns of Trustworthy Experimentation: Post-Experiment Stage
- Patterns of Trustworthy Experimentation: Pre-Experiment Stage
- Why We Shouldn’t Condition on Posttreatment Variables in Experiments
9) Blogs and Ongoing Writing
Industry and applied experimentation
- Airbnb Engineering
- Booking AI
- DoorDash Engineering
- Instacart Data Science
- Microsoft Data Science
- Microsoft Experimentation Platform
- Netflix Tech Blog
- Spotify Data Science
- Spotify Research
- Uber Engineering
- Wayfair Data Science
- Zalando Engineering
Independent and academic writing
- Data Colada
- evanmiller.org
- Numbers, Letters, Sometimes Both
- Statistical Modeling, Causal Inference, and Social Science
- Statistical Odds and Ends
10) Talks, Seminars, and Communities
Talks and recorded lectures
- A Tutorial on Bayesian Causal Inference
- Always Valid Inference: Continuous Monitoring of A/B Tests
- Analysis and Design of Multi-Armed Bandit Experiments and Policy Learning
- Causal Inference Libraries: What They Do, What I’d Like Them To Do
- Interference and Spillovers in Randomized Experiments
- Modern Balancing Methods for Causal Inference
- Regression Discontinuity Designs: Foundations
- Regression Discontinuity Designs: Practice and Topics
- Synthetic Controls: Methods and Practice