Causal Inference Method Selector
How to choose a causal inference method: the basics.
Choose a causal inference method based on your study design, identification strategy, and business objective.
This tool uses a decision-tree backbone centered on identification structure, but it returns multiple viable methods with assumptions and follow-up checks rather than forcing a single branch.
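As a rough illustration of that structure, here is a Python sketch of a decision-tree backbone that returns several ranked method cards instead of a single branch. All names, branches, and fields are hypothetical, not the tool's actual logic:

```python
# Hypothetical sketch: a tiny identification-keyed decision tree that returns
# multiple viable method cards (primary plus fallbacks), not one forced branch.
from dataclasses import dataclass, field

@dataclass
class MethodCard:
    name: str
    assumptions: list = field(default_factory=list)
    validate_next: list = field(default_factory=list)

def recommend(randomized: bool, has_pre_period: bool) -> list[MethodCard]:
    """Return a primary method plus fallbacks for one simplified branch."""
    cards = []
    if randomized:
        cards.append(MethodCard(
            "Randomized experiment with covariate adjustment",
            assumptions=["Assignment is randomized and faithfully implemented"],
            validate_next=["Check balance, attrition, and treatment leakage"],
        ))
        if has_pre_period:
            cards.append(MethodCard("CUPED / pre-period variance reduction"))
    else:
        cards.append(MethodCard("Matching and propensity-score weighting"))
        cards.append(MethodCard("Doubly robust estimators (AIPW / DML)"))
    return cards

primary, *fallbacks = recommend(randomized=True, has_pre_period=True)
```

The point of the sketch is the return type: every branch yields a list of cards with assumptions and follow-up checks attached, which is what distinguishes this from a rigid one-path tree.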
Study Setup
Answer the questions that matter for identification. The tool will adapt the later questions to your design.
Start with whether treatment assignment was randomized or not.
Pick the main causal question, not every downstream analysis you may run later.
Design Signals
Examples: pre-period spend, trips, clicks, or repeated baseline outcome measurements.
This question applies only to experimental settings.
Examples: shared driver supply, seller liquidity, auction budgets, inventory competition, or social-network spillovers.
Example: assigned users do not always adopt the feature, or encouragement differs from uptake.
Needed for methods such as difference-in-differences, interrupted time series, and synthetic control.
Examples: one city launch, one state regulation, one platform-wide intervention.
Examples: age cutoff, credit score threshold, policy eligibility boundary.
The instrument must shift treatment strongly and affect the outcome only through treatment.
If treated and untreated units barely overlap in covariate space, many adjustment methods become unstable.
Think high-cardinality features, rich user history, text, or many confounders.
Recommended Methods
The tool shows a primary recommendation, strong fallbacks, and identification warnings.
Suggested workflow
Use the first method as the working analysis plan, then benchmark it against a strong fallback or diagnostic. When randomization is the source of identification, validate the randomization itself before adding estimator complexity.
Randomized experiment with covariate-adjusted analysis
Use intention-to-treat as the baseline estimate, with regression adjustment or stratification for precision and imbalance control.
Why it fits
- Randomization is the primary source of identification.
Critical assumptions
- Treatment assignment is randomized and faithfully implemented
- No material interference or spillovers across units unless explicitly modeled
- Outcome measurement and variance estimation match the assignment unit
Pros
- Strongest identification strategy when assignment really is random.
- Easy to explain to product, ops, and leadership stakeholders.
- Clean fit for launch, pricing, and guardrail decisions.
Cons
- Can be expensive, slow, or operationally disruptive to run well.
- Spillovers, attrition, or leakage can quietly break identification.
- A single average effect can hide meaningful segment heterogeneity.
What to validate next
- Check balance, attrition, and treatment leakage
- Cluster standard errors if assignment was clustered
- Report ITT before any treatment-on-treated analysis
Representative industry use cases
- Spotify Engineering (experimentation platform): platformized product experimentation with managed configuration, metric catalogs, and consistent analysis for many concurrent tests.
- Wayfair Tech Blog (geo experiments): market-level randomized experiments to measure incrementality when user-level assignment is infeasible.
Popular book references
- Trustworthy Online Controlled Experiments, Ch. 2: 'Running and Analyzing Experiments: An End-to-End Example.'
- Causal Inference for Data Science, Ch. 1: 'Introducing causality,' including A/B testing and RCT basics.
- Mostly Harmless Econometrics, Ch. 2: 'The Experimental Ideal.'
Suggested packages
- Statsmodels: use for regression adjustment, robust standard errors, and baseline econometric estimators.
- PyFixest: use for clustered or high-dimensional fixed-effects regressions when experiments are run over panels, markets, or repeated outcomes.
- linearmodels: use for absorbed fixed effects and panel-robust inference when randomized experiments are analyzed at user, geo, or time-cell level.
Also consider: CUPED, effect among compliers (CACE / LATE), heterogeneity models.
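Of these, CUPED is compact enough to sketch directly: residualize the outcome on a pre-period covariate to shrink variance without biasing the treatment contrast. A minimal illustration on simulated data (names and magnitudes are made up), not a production implementation:

```python
# Minimal CUPED sketch: theta = cov(Y, X) / var(X), then subtract the
# centered pre-period term from the outcome before comparing arms.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
treat = rng.integers(0, 2, n).astype(float)
pre = rng.normal(size=n)                        # pre-period outcome (e.g. spend)
y = 0.3 * treat + 0.7 * pre + rng.normal(size=n)

theta = np.cov(y, pre)[0, 1] / np.var(pre, ddof=1)
y_cuped = y - theta * (pre - pre.mean())

raw_effect = y[treat == 1].mean() - y[treat == 0].mean()
cuped_effect = y_cuped[treat == 1].mean() - y_cuped[treat == 0].mean()
# Same expected effect; variance drops by the squared Y-X correlation.
```

Because `pre` is measured before assignment, it is independent of treatment, so subtracting the theta term changes the variance of the estimate but not its expectation.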
Why this is not a rigid one-path decision tree
- Many applied problems support more than one defensible method.
- Identification assumptions matter more than the algorithm name.
- Practitioners often need a primary method plus a robustness check, not a single branch answer.
- The best workflow is usually design first, estimator second, diagnostics third.
This selector therefore uses a decision-tree backbone but returns method cards with fit, assumptions, and what to validate next.
Methods covered
- Randomized experiment analysis with covariate adjustment
- Switchback experiments for interference-heavy marketplaces or networks
- CUPED / pre-period variance reduction
- Effect among compliers (CACE / LATE) via IV for noncompliance
- Heterogeneous treatment effect models such as causal forests, uplift models, and meta-learners
- Mediation analysis
- Matching and propensity-score weighting
- Doubly robust estimators such as AIPW and double machine learning
- Difference-in-differences and event-study style designs
- Interrupted time series and synthetic control
- Regression discontinuity design
- Instrumental variables for observational settings
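Among these, the effect among compliers (CACE / LATE) has a one-line estimator in the binary-instrument case: the Wald ratio, the ITT on the outcome divided by the ITT on uptake. A hedged sketch on simulated noncompliance data (compliance rate and effect size are invented for illustration):

```python
# Wald estimator for the complier effect: ITT on outcome / ITT on uptake,
# for a randomized binary encouragement z and actual adoption d.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.integers(0, 2, n)                  # random encouragement (assignment)
complier = rng.random(n) < 0.6             # ~60% adopt only if encouraged
d = np.where(complier, z, 0)               # never-takers ignore assignment
y = 2.0 * d + rng.normal(size=n)           # true effect 2.0 among adopters

itt_y = y[z == 1].mean() - y[z == 0].mean()   # ITT on the outcome
itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage: effect on uptake
late = itt_y / itt_d                          # complier (Wald / LATE) estimate
```

The same ratio is what 2SLS recovers with one instrument and one treatment; the sketch makes the dependence on a strong first stage (`itt_d` well away from zero) visible.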