GV249 Seminar AT9: Hypothesis Testing

Lennard Metson

2024-12-03

✅ Lesson plan

Dr. Tóth will go through the formatives in the lecture.

🔁 Recap of hypothesis testing

📏 Estimates and estimands

  • Estimates are our attempt to measure properties of a variable (the “true” value of which is called the estimand). Examples of estimands/estimates:

    • The mean, median, variance, covariance, correlation, etc.
    • Difference in means; regression coefficients
  • Estimates themselves have statistical properties—these tell us how confident we should be that the estimate is close to the estimand.

🧪 Hypotheses and their nulls

Hypothesis:

  • Comes from a theoretical expectation, e.g., there is a correlation between X and Y.

  • We then translate this into a quantifiable test: the value of our estimand (e.g., OLS \(\beta\) coefficient) is bigger than 0.

Null hypothesis:

  • The logical opposite of our hypothesis.
  • Usually that the true value of our estimand is 0.

🧪 Hypothesis testing

  • We need to be able to assess the certainty of our estimate to reject the null hypothesis with confidence.

📐 Standard error

The standard error is a statistical property of an estimate (for example, the mean, regression coefficient, etc.). It is the standard deviation of the estimate across many hypothetical samples drawn from the same population — the spread of its sampling distribution.

\[ SE_{\text{mean}(X)} = \frac{\sigma}{\sqrt{n}} \]

In regression, the standard error of the beta coefficient is calculated differently, but it has roughly the same interpretation—see slide 43 from the AT9 lecture slides!
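
The formula above can be sketched numerically. A minimal Python illustration (the course works in R, but the arithmetic is the same; the data here are simulated, and since the population \(\sigma\) is usually unknown, we estimate it with the sample standard deviation):

```python
# Sketch: standard error of the sample mean, SE = sigma / sqrt(n).
# sigma is estimated with the sample standard deviation (ddof=1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=3, size=500)  # hypothetical sample

n = len(x)
se_mean = x.std(ddof=1) / np.sqrt(n)
print(round(se_mean, 3))  # close to 3 / sqrt(500) ~ 0.134
```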

📐 \(t\)-value

  • The \(t\)-value is the ratio between the size of an estimator (think OLS coefficient) and its standard error.

  • We usually use an absolute value of 2 as a rough benchmark for significance (the exact large-sample critical value at the 5% level is 1.96).
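
In code, the \(t\)-value is just a ratio. A sketch with hypothetical numbers (the estimate and SE below are made up for illustration):

```python
# Sketch: t-value = estimate / standard error.
# |t| > 2 is the usual rough benchmark for significance at the 5% level.
estimate = 0.8   # hypothetical OLS coefficient
se = 0.3         # its (hypothetical) standard error

t_value = estimate / se
significant = abs(t_value) > 2
print(t_value, significant)  # ~2.67, True
```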

📐 Confidence interval

  • We usually think about the 95% confidence interval, but we can construct any confidence interval (83%, 90%, 99% are common ones).
    • Each has a different critical value (the number of SEs above and below the estimate). For 95%, the value is 1.96—the CI range is \([\text{estimate} \pm 1.96 \times SE]\).


  • The repeated-sampling interpretation: if we drew many samples and computed the estimate and its X% CI each time, X% of those intervals would contain the true estimand.
  • This blog contains some interesting points about how we can understand and communicate statistical uncertainty, with a heavy focus on confidence intervals.
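
The repeated-sampling interpretation can be checked by simulation. A sketch (the population parameters here are invented; with a known true mean, roughly 95% of the intervals we construct should cover it):

```python
# Sketch: coverage of the 95% CI, estimate +/- 1.96 * SE, under
# repeated sampling from a population with a known mean.
import numpy as np

rng = np.random.default_rng(42)
true_mean, sigma, n = 5.0, 2.0, 100
reps = 2000
covered = 0
for _ in range(reps):
    x = rng.normal(true_mean, sigma, n)
    se = x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
    covered += (lo <= true_mean <= hi)
print(covered / reps)  # close to 0.95
```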

📐 \(p\)-value

  • The \(p\)-value is calculated from the \(t\)-value. It is a different way of expressing the same certainty — the two will never disagree about significance.
  • The \(p\)-value can be interpreted as the probability of obtaining an estimate at least as large in magnitude as the one we did by chance alone, if the true estimand were 0.


  • The lower the \(p\)-value, the more confident we can be that the null hypothesis is not true.
  • Conventionally in social science, we are looking to see whether \(p < 0.05\).
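
The link between the \(t\)-value and the \(p\)-value can be sketched directly. In large samples the \(t\) distribution is close to the standard normal, so a two-sided \(p\)-value is the probability mass beyond \(\pm|t|\) under the normal curve (a rough approximation; exact \(p\)-values use the \(t\) distribution with the right degrees of freedom):

```python
# Sketch: two-sided p-value from a t-value, using the normal
# approximation (valid in large samples).
from math import erf, sqrt

def two_sided_p(t):
    # P(|Z| >= |t|) under the standard normal
    return 2 * (1 - 0.5 * (1 + erf(abs(t) / sqrt(2))))

print(round(two_sided_p(1.96), 3))  # ~0.05
print(round(two_sided_p(2.67), 4))
```

Note how \(t = 1.96\) maps exactly onto the conventional \(p < 0.05\) threshold.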

📔 Reading regression tables

For your notes:

  • This blog post has a very detailed guide to interpreting regression outputs in R. Remember, though, that in this course we are interested in:
    • Intercept (\(\alpha\))
    • \(\beta\) coefficients & their \(t\)-values and \(p\)-values
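
To see where those table entries come from, the key quantities — intercept, \(\beta\) coefficient, standard errors, \(t\)-values, and \(p\)-values — can be computed by hand. A sketch on simulated data (the course works in R; this Python version is only to make the arithmetic concrete, and it uses the normal approximation for \(p\)-values):

```python
# Sketch: the numbers a regression table reports, computed by hand
# for a bivariate OLS on simulated data (true alpha = 1, beta = 0.5).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
coefs = np.linalg.lstsq(X, y, rcond=None)[0]  # [alpha_hat, beta_hat]
resid = y - X @ coefs
sigma2 = resid @ resid / (n - 2)              # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t = coefs / se
p = [2 * (1 - 0.5 * (1 + erf(abs(tv) / sqrt(2)))) for tv in t]

for name, c, s, tv, pv in zip(["alpha", "beta"], coefs, se, t, p):
    print(f"{name}: est={c:.3f} se={s:.3f} t={tv:.2f} p={pv:.4f}")
```

Each printed row mirrors one line of a regression table: the estimate, its SE, the \(t\)-value (estimate divided by SE), and the \(p\)-value derived from it.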

📔 Exercises

  • In groups, look at the handouts. Make sure you understand the variables and discuss how you might interpret them.

  • After a few minutes, I’ll ask you questions about how we can interpret these regression models.

💻 Lab