GV249 Seminar AT5: Causation

Lennard Metson

2024-11-12

Correlation \(\neq\) causation

📖 Key reading

Bueno de Mesquita & Fowler - Thinking Clearly with Data - Chapters 2 & 9

Causation

  • Correlation \(\neq\) causation… but what is causation?
  • Counterfactuals!

Causation

Definition

A is caused by B when A would not have happened if B had not happened.

  • A causal effect is the comparison of the value of A in a world in which B happened versus a world in which B didn’t happen.
  • These two values of A are called potential outcomes

Causation - Potential outcomes

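  • \(Y_{i}^{1}\): the outcome for unit \(i\) if they receive the treatment.
  • \(Y_{i}^{0}\): the outcome for unit \(i\) if they do not receive the treatment.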

😟 The fundamental problem of causal inference

⚠️ The problem

We cannot observe both potential outcomes for an individual.

  • Once they are treated, we only observe their treated potential outcome.
  • And when they are untreated, we only observe their untreated potential outcome.
  • Causal inference is a field of statistics focused on getting around this problem.

😌 Solving the fundamental problem

  • Causal inference methods try to fill in missing potential outcomes using information from treated and untreated units.

  • The answer lies in comparing average potential outcomes across treated and untreated units.

\[ \text{mean}(Y_{i}^{1}-Y_{i}^{0}) = \text{mean}(Y_{i}^{1}) - \text{mean}(Y_{i}^{0}) \]

🧪 Treatment effects

🧪 ATE

  • The ATE (average treatment effect) is the mean of the individual-level treatment effects, \(\tau_i = Y_i^1 - Y_i^0\), across all units: \(\text{mean}(Y_i^1 - Y_i^0)\).
  • This is mathematically equal to \(\text{mean}(Y_{i}^{1}) - \text{mean}(Y_{i}^{0})\).
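
The equality is just the linearity of the mean. As a short sketch, writing \(N\) for the number of units (a symbol introduced here, not in the slides):

\[ \frac{1}{N}\sum_{i=1}^{N}\left(Y_{i}^{1}-Y_{i}^{0}\right) = \frac{1}{N}\sum_{i=1}^{N}Y_{i}^{1} - \frac{1}{N}\sum_{i=1}^{N}Y_{i}^{0} \]

The left-hand side needs both potential outcomes for every unit, which we never observe. The right-hand side only needs two averages, each of which can be estimated from a different group of units.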

🧪 ATT & ATU

  • The ATT (average treatment effect on the treated) is the mean \(\tau_i\) for the subset of units who received treatment.

  • The ATU (average treatment effect on the untreated) is the mean \(\tau_i\) for the subset of units who did not receive treatment.

  • The ATT and ATU can differ from the ATE when selection into treatment is biased (i.e., non-random).
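
The three quantities are tied together by a weighted average. As a sketch, writing \(\pi\) for the share of units that received treatment (a symbol introduced here for illustration):

\[ \text{ATE} = \pi \cdot \text{ATT} + (1-\pi) \cdot \text{ATU} \]

So if the treated and untreated groups have the same average effect, both equal the ATE. The ATE only sits apart from the ATT and ATU when the two groups respond differently to treatment.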

📺 Example: news and attitudes

📺 Set-up

  • Imagine political attitudes range from 0 to 5, with 0 being the most left-wing and 5 being the most right-wing.
  • We are interested in whether watching a left-wing news channel (the independent variable, IV) makes individuals more left-wing (the dependent variable, DV). This is a causal research question!

📺 Set-up

We have some variables:

  • \(X\) = individual \(i\)’s political attitudes before watching left-leaning news (or not).
  • \(D\) (IV) = whether \(i\) watches left-leaning news.
  • \(Y\) (DV) = \(i\)’s political attitudes after watching left-leaning news (or not).

📺 The counter-factual world

  • In the counter-factual world, we can see both potential outcomes.
  • Which means we can calculate the individual causal effect for each individual.
  • On the next slide we can see the full schedule of potential outcomes for 10 people.

📺 The counter-factual world

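| \(i\) | \(X_i\) | \(D_i\) | \(Y_{i}^{0}\) | \(Y_{i}^{1}\) | \(\tau_i\) |
|---|---|---|---|---|---|
| \(i_1\) | 5 | | 5 | 4 | |
| \(i_2\) | 4 | | 3 | 3 | |
| \(i_3\) | 3 | | 4 | 5 | |
| \(i_4\) | 4 | | 4 | 3 | |
| \(i_5\) | 5 | | 5 | 5 | |
| \(i_6\) | 2 | | 3 | 3 | |
| \(i_7\) | 3 | | 2 | 3 | |
| \(i_8\) | 2 | | 1 | 2 | |
| \(i_9\) | 1 | | 1 | 1 | |
| \(i_{10}\) | 1 | | 1 | 2 | |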

📺 The counter-factual world

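| \(i\) | \(X_i\) | \(D_i\) | \(Y_{i}^{0}\) | \(Y_{i}^{1}\) | \(\tau_i\) |
|---|---|---|---|---|---|
| \(i_1\) | 5 | | 5 | 4 | -1 |
| \(i_2\) | 4 | | 3 | 3 | 0 |
| \(i_3\) | 3 | | 4 | 5 | 1 |
| \(i_4\) | 4 | | 4 | 3 | -1 |
| \(i_5\) | 5 | | 5 | 5 | 0 |
| \(i_6\) | 2 | | 3 | 3 | 0 |
| \(i_7\) | 3 | | 2 | 3 | 1 |
| \(i_8\) | 2 | | 1 | 2 | 1 |
| \(i_9\) | 1 | | 1 | 1 | 0 |
| \(i_{10}\) | 1 | | 1 | 2 | 1 |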
  • \(ATE = mean(\tau_i) = 0.2\)

  • \(mean(Y_{i}^{1}) = 3.1\)

  • \(mean(Y_{i}^{0}) = 2.9\)

  • \(ATE = 3.1-2.9 = 0.2\)
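
The two routes to the ATE can be checked with a few lines of Python. This is a minimal sketch: the lists copy the \(Y_{i}^{0}\) and \(Y_{i}^{1}\) columns of the table, and the variable names are illustrative rather than taken from the slides.

```python
from statistics import mean

# Potential outcomes for the 10 individuals, in table order (i_1 ... i_10).
y0 = [5, 3, 4, 4, 5, 3, 2, 1, 1, 1]  # Y_i^0: attitude if i does not watch
y1 = [4, 3, 5, 3, 5, 3, 3, 2, 1, 2]  # Y_i^1: attitude if i watches

# Route 1: average the individual-level effects tau_i = Y_i^1 - Y_i^0.
tau = [treated - untreated for treated, untreated in zip(y1, y0)]
ate_from_tau = mean(tau)

# Route 2: difference between the two average potential outcomes.
ate_from_means = mean(y1) - mean(y0)

print(tau)                       # [-1, 0, 1, -1, 0, 0, 1, 1, 0, 1]
print(round(ate_from_tau, 2))    # 0.2
print(round(ate_from_means, 2))  # 0.2
```

Both routes need the full schedule of potential outcomes, which is only available in this imagined counterfactual world.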

📺 The real-world (1)

  • However, in the real world, we only see one potential outcome.
  • More right-wing people are less likely to choose to watch left-wing news.
  • Selection bias arises when pre-treatment characteristics (here, prior attitudes \(X_i\)) are correlated with selecting into treatment.
  • Let’s see what happens when we compare outcomes after letting individuals choose whether they watch left-wing news.

📺 The real-world (1)

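| \(i\) | \(X_i\) | \(D_i\) | \(Y_{i}^{0}\) | \(Y_{i}^{1}\) |
|---|---|---|---|---|
| \(i_1\) | 5 | 0 | 5 | ? |
| \(i_2\) | 4 | 0 | 3 | ? |
| \(i_3\) | 3 | 0 | 4 | ? |
| \(i_4\) | 4 | 1 | ? | 3 |
| \(i_5\) | 5 | 0 | 5 | ? |
| \(i_6\) | 2 | 1 | ? | 3 |
| \(i_7\) | 3 | 1 | ? | 3 |
| \(i_8\) | 2 | 0 | 1 | ? |
| \(i_9\) | 1 | 1 | ? | 1 |
| \(i_{10}\) | 1 | 1 | ? | 2 |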
  • In the real world, we only see one potential outcome for each individual. Which one we see is determined by the value of \(D_i\).
  • We refer to the observed outcome as \(Y_i\).
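
The rule that \(D_i\) picks which potential outcome we observe is often written as \(Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0\). A minimal Python sketch of that rule, using the potential outcomes from the counterfactual table and the self-selected \(D_i\) values from the table above (variable names are illustrative):

```python
# Full schedule of potential outcomes (only visible in the counterfactual world).
y0 = [5, 3, 4, 4, 5, 3, 2, 1, 1, 1]
y1 = [4, 3, 5, 3, 5, 3, 3, 2, 1, 2]

# D_i when individuals choose for themselves whether to watch left-wing news.
d = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Observed outcome: Y_i^1 if treated (D_i = 1), Y_i^0 if untreated (D_i = 0).
y_obs = [di * treated + (1 - di) * untreated
         for di, treated, untreated in zip(d, y1, y0)]

print(y_obs)  # [5, 3, 4, 3, 5, 3, 3, 1, 1, 2]
```

Every "?" in the table corresponds to the potential outcome that this rule discards.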

📺 The real-world (1)

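| \(i\) | \(X_i\) | \(D_i\) | \(Y_i\) |
|---|---|---|---|
| \(i_1\) | 5 | 0 | 5 |
| \(i_2\) | 4 | 0 | 3 |
| \(i_3\) | 3 | 0 | 4 |
| \(i_4\) | 4 | 1 | 3 |
| \(i_5\) | 5 | 0 | 5 |
| \(i_6\) | 2 | 1 | 3 |
| \(i_7\) | 3 | 1 | 3 |
| \(i_8\) | 2 | 0 | 1 |
| \(i_9\) | 1 | 1 | 1 |
| \(i_{10}\) | 1 | 1 | 2 |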

  • \(mean(Y_i|D_i = 1)\) = 2.4

  • \(mean(Y_i|D_i = 0)\) = 3.6

  • Estimated ATE = \(2.4 - 3.6 = -1.2\)

What’s the problem?
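
One way to see the problem is to split the naive difference in means into the ATT plus a selection-bias term, \(mean(Y_i^0|D_i = 1) - mean(Y_i^0|D_i = 0)\). The sketch below does this in Python. It assumes we can peek at the full potential-outcomes table, which is never possible with real data, and the variable names are illustrative.

```python
from statistics import mean

y0 = [5, 3, 4, 4, 5, 3, 2, 1, 1, 1]  # Y_i^0 from the counterfactual table
y1 = [4, 3, 5, 3, 5, 3, 3, 2, 1, 2]  # Y_i^1 from the counterfactual table
d = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]   # self-selected treatment

treated = [i for i, di in enumerate(d) if di == 1]
untreated = [i for i, di in enumerate(d) if di == 0]

# What we can compute from observed data: treated mean minus untreated mean.
naive = mean(y1[i] for i in treated) - mean(y0[i] for i in untreated)

# What we would like to know: the average effect among those who watched.
att = mean(y1[i] - y0[i] for i in treated)

# Baseline gap: how the two groups would differ even if nobody watched.
selection_bias = mean(y0[i] for i in treated) - mean(y0[i] for i in untreated)

print(round(naive, 2), round(att, 2), round(selection_bias, 2))  # -1.2 0.2 -1.4
```

In this toy data the ATT is 0.2, the same as the ATE, so the whole gap between the estimate (-1.2) and the truth (0.2) comes from the baseline difference: people who chose to watch were already more left-wing.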

📺 The real-world (2)

  • Now imagine we can decide who watches left-wing news and who does not.
  • We randomly assign individuals to watch (revealing their treated potential outcome) or not to watch (revealing their untreated potential outcome).
  • Because we assign them randomly, there is no selection bias: individuals have the same probability of watching left-wing news regardless of their underlying attitudes.

📺 The real-world (2)

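| \(i\) | \(X_i\) | \(D_i\) | \(Y_i\) |
|---|---|---|---|
| \(i_1\) | 5 | 0 | 5 |
| \(i_2\) | 4 | 1 | 3 |
| \(i_3\) | 3 | 1 | 5 |
| \(i_4\) | 4 | 0 | 4 |
| \(i_5\) | 5 | 1 | 5 |
| \(i_6\) | 2 | 0 | 3 |
| \(i_7\) | 3 | 0 | 2 |
| \(i_8\) | 2 | 0 | 1 |
| \(i_9\) | 1 | 1 | 1 |
| \(i_{10}\) | 1 | 1 | 2 |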

  • \(mean(Y_i|D_i = 1)\) = 3.2

  • \(mean(Y_i|D_i = 0)\) = 3

  • Estimated ATE = \(3.2 - 3 = 0.2\)
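
Any single random draw can land above or below the true ATE, so it helps to look at every draw at once. The Python sketch below enumerates all ways of assigning exactly 5 of the 10 individuals to treatment and averages the resulting difference-in-means estimates. The 5-versus-5 design is an assumption that matches the split in the table, and the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

y0 = [5, 3, 4, 4, 5, 3, 2, 1, 1, 1]  # Y_i^0 from the counterfactual table
y1 = [4, 3, 5, 3, 5, 3, 3, 2, 1, 2]  # Y_i^1 from the counterfactual table
n = len(y0)


def diff_in_means(treated):
    """Difference in observed means when `treated` (0-based indices) watch."""
    treated = set(treated)
    control = [i for i in range(n) if i not in treated]
    treated_mean = Fraction(sum(y1[i] for i in treated), len(treated))
    control_mean = Fraction(sum(y0[i] for i in control), len(control))
    return treated_mean - control_mean


# The assignment shown in the table: i_2, i_3, i_5, i_9 and i_10 are treated.
slide_estimate = diff_in_means([1, 2, 4, 8, 9])

# Every possible assignment of exactly 5 treated units out of 10.
estimates = [diff_in_means(t) for t in combinations(range(n), 5)]
average = sum(estimates) / len(estimates)

print(slide_estimate)  # 1/5
print(len(estimates))  # 252
print(average)         # 1/5
```

Averaged over all 252 possible assignments, the estimate equals the true ATE of 0.2 exactly, which is what it means for randomisation to remove selection bias. Individual draws still vary around that value.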

📺 What does this tell us about causal inference?

  • Sometimes, we can use information about the average outcomes for treated and untreated units to estimate average treatment effects.
  • Where there is no selection bias, we can directly compare the mean outcomes of two groups.
  • When there is selection bias, we need to use alternative strategies to “fill in” missing potential outcomes.

📺 Broockman and Kalla (2022) did this for real!

  • On a sample of \(\approx\) 5,500 American voters, Broockman and Kalla (2022) randomised encouragement to watch CNN (rather than Fox News).
  • They find evidence that there is an effect even under randomisation: as the paper’s title puts it, consuming cross-cutting media causes learning and moderates attitudes.

📊 Visualisation: extra resources

References

Broockman, David, and Joshua Kalla. 2022. “Consuming Cross-Cutting Media Causes Learning and Moderates Attitudes: A Field Experiment with Fox News Viewers.” Preprint. Open Science Framework. https://doi.org/10.31219/osf.io/jrw26.