#install.packages(c("estimatr","readstata13", "tidyverse"))
library(tidyverse)
library(readstata13)
library(estimatr)
# setwd("WT/WT0_formative-revision")
df <- read.dta13("BES_2017_problemset2.dta")
2025-01-21
If you feel confident about the Formative and would like more R practice, you can carry on with the extra lab materials: cses5_br.csv and extra-questions.R from the Len Seminar Slides folder. You can use extra-questions_solutions.R to help.
At a conceptual level, what is the difference between validity and reliability in measurement? (3 marks)
Give an example of a measure that is reliable, but not valid. (2 marks)
What are potential outcomes? (2 marks)
Explain the following notations: \(Y_i(1)\), \(Y_i(1)|D_i = 0\), and \(Y_i(0)|D_i = 1\). (3 marks)
What is the fundamental problem of causal inference? Use potential outcomes notation. (4 marks)
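A small toy example may help fix the notation before you answer. The sketch below invents potential outcomes for six hypothetical units (the values and the names `y0`, `y1`, `d` are made up for illustration and are not taken from the BES data) and shows that the observed data reveal only one potential outcome per unit.

```r
# Toy illustration of potential outcomes; every number here is invented.
y0 <- c(0, 1, 0, 1, 0, 1)   # Y_i(0): outcome if unit i is not treated
y1 <- c(1, 1, 0, 1, 1, 1)   # Y_i(1): outcome if unit i is treated
d  <- c(1, 0, 1, 0, 1, 0)   # D_i: treatment actually received

# Observed outcome: Y_i = D_i * Y_i(1) + (1 - D_i) * Y_i(0)
y_obs <- d * y1 + (1 - d) * y0

# What a researcher would actually see: one potential outcome is always missing
observed <- data.frame(
  d       = d,
  y_obs   = y_obs,
  y1_seen = ifelse(d == 1, y1, NA),  # Y_i(1) is visible only when D_i = 1
  y0_seen = ifelse(d == 0, y0, NA)   # Y_i(0) is visible only when D_i = 0
)
print(observed)

# Computable here only because we invented both columns ourselves
tau_i <- y1 - y0              # individual effects Y_i(1) - Y_i(0)
cf_mean <- mean(y1[d == 0])   # mean of Y_i(1) among units with D_i = 0
```

In real data only `d` and `y_obs` exist, so neither `tau_i` nor `cf_mean` could be computed directly.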
What is the difference between estimand and estimate? (3 marks)
How can any one estimate be too low or too high if the estimator used to obtain it is unbiased? (3 marks)
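If the distinction feels abstract, a short simulation can make it concrete. The sketch below is an illustration under invented assumptions (a made-up population with a turnout rate of about 0.7, samples of 100, and the sample mean as the estimator); none of it comes from the problem-set data.

```r
# Simulated illustration: an unbiased estimator still produces estimates that miss.
set.seed(249)

# Invented population of 100,000 people; 1 = voted, 0 = did not vote
population <- rbinom(100000, size = 1, prob = 0.7)
estimand   <- mean(population)   # the fixed quantity we want to learn

# Estimator: the mean of a simple random sample of 100 people.
# Applying it to 5,000 different samples gives 5,000 different estimates.
estimates <- replicate(5000, mean(sample(population, size = 100)))

mean(estimates) - estimand    # close to zero: right on average
range(estimates)              # yet single estimates fall on both sides
mean(estimates > estimand)    # share of estimates that come out too high
```

Unbiasedness is a property of the procedure across repeated samples, while each estimate comes from only one sample.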
Formulate a descriptive research question covering any subfield of Political Science, which includes two variables, \(x\) (the predictor) and \(y\) (the dependent variable). (10 marks)
Define the theoretical concepts that these variables attempt to measure, operationalise these concepts, and propose valid measurement instruments for them. (10 marks)
Formulate a clear, testable hypothesis. (5 marks)
Write down a linear regression model of \(y\) on \(x\), and define each term in the regression model. (8 marks)
Did not vote        Voted         <NA>
         458         1732            4

   0    1 <NA>
 458 1732    4
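The two tables above show the same variable before and after recoding to 0/1. A minimal sketch of how such a recode might look is below. It uses a small invented data frame `toy`, and the name `turnout_sr` for the labelled variable is an assumption (only `turnout_sr_num` appears in the output on this page).

```r
# Toy data standing in for the survey; the values are invented.
toy <- data.frame(
  turnout_sr = factor(c("Voted", "Did not vote", "Voted", NA, "Voted"),
                      levels = c("Did not vote", "Voted"))
)

# Factor levels are stored as 1, 2, so subtracting 1 gives 0 = did not vote, 1 = voted.
# Missing values stay missing.
toy$turnout_sr_num <- as.numeric(toy$turnout_sr) - 1

# Tabulate both versions, keeping missing values visible
table(toy$turnout_sr, useNA = "ifany")
table(toy$turnout_sr_num, useNA = "ifany")
```

Checking the two tables against each other, as above, is a quick way to confirm that a recode did what you intended.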
Call:
lm_robust(formula = turnout_sr_num ~ edu_num, data = df)
Standard error type: HC2
Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
(Intercept)  0.66348   0.021934  30.248 9.184e-168  0.62046  0.70649 2150
edu_num      0.03519   0.005099   6.901  6.767e-12  0.02519  0.04518 2150
Multiple R-squared: 0.02176 , Adjusted R-squared: 0.02131
F-statistic: 47.62 on 1 and 2150 DF, p-value: 6.767e-12
Write down the regression model that you are estimating. (2 marks)
What is the predicted turnout for someone with “GCSEs”/level 3? (2 marks)
What is the null hypothesis? Can you reject it at the 0.05 level? (3 marks)
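One way to check your own answers is to redo the arithmetic by hand from the printed coefficients. The sketch below copies the numbers from the output above into R. It assumes that “GCSEs” corresponds to `edu_num = 3`, as the question wording suggests, and the object names are illustrative.

```r
# Numbers copied from the lm_robust output above
b0       <- 0.66348    # intercept
b1       <- 0.03519    # slope on edu_num
se_b1    <- 0.005099   # HC2 standard error of the slope
df_resid <- 2150       # degrees of freedom reported in the DF column

# Predicted turnout at edu_num = 3 (assumed coding for "GCSEs")
pred_level3 <- b0 + b1 * 3
pred_level3

# Test of H0: slope = 0 against a two-sided alternative
t_stat  <- b1 / se_b1
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
t_stat
p_value < 0.05
```

The recomputed t statistic should match the `t value` column up to rounding, which is a useful sanity check on how to read the table.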
   0    1 <NA>
 380 1095  719
Call:
lm_robust(formula = turnout_validated ~ edu_num, data = df)
Standard error type: HC2
Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
(Intercept)  0.66507   0.027754  23.963 3.839e-107 0.610622  0.71951 1447
edu_num      0.02116   0.006649   3.183  1.489e-03 0.008121  0.03421 1447
Multiple R-squared: 0.006686 , Adjusted R-squared: 0.005999
F-statistic: 10.13 on 1 and 1447 DF, p-value: 0.001489
What is the null hypothesis? Can you reject it at the 0.01 level? What can you say based on this test about the relationship between turnout and levels of education in the population of British citizens? (4 marks)
Compare the estimates from the two models you ran and interpret what you find. (3 marks)
Do some research. How is turnout “validated”? (2 marks)
Self-reported:
Call:
lm_robust(formula = turnout_sr_num ~ edu_num, data = df)
Standard error type: HC2
Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
(Intercept)  0.66348   0.021934  30.248 9.184e-168  0.62046  0.70649 2150
edu_num      0.03519   0.005099   6.901  6.767e-12  0.02519  0.04518 2150
Multiple R-squared: 0.02176 , Adjusted R-squared: 0.02131
F-statistic: 47.62 on 1 and 2150 DF, p-value: 6.767e-12
Validated:
Call:
lm_robust(formula = turnout_validated ~ edu_num, data = df)
Standard error type: HC2
Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
(Intercept)  0.66507   0.027754  23.963 3.839e-107 0.610622  0.71951 1447
edu_num      0.02116   0.006649   3.183  1.489e-03 0.008121  0.03421 1447
Multiple R-squared: 0.006686 , Adjusted R-squared: 0.005999
F-statistic: 10.13 on 1 and 1447 DF, p-value: 0.001489
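To compare the two sets of estimates above, it can help to put the slopes side by side. The sketch below copies the `edu_num` rows from both outputs into a small data frame; the object names are illustrative. The sample sizes are recovered as DF + 2, since each model estimates two coefficients.

```r
# edu_num rows copied from the two lm_robust outputs above
est <- data.frame(
  outcome  = c("self-reported", "validated"),
  slope    = c(0.03519, 0.02116),
  ci_lower = c(0.02519, 0.008121),
  ci_upper = c(0.04518, 0.03421),
  n        = c(2150, 1447) + 2   # residual DF plus two estimated coefficients
)
print(est)

# Size of the validated slope relative to the self-reported one
slope_ratio <- est$slope[2] / est$slope[1]
slope_ratio

# Do the two 95% confidence intervals share any values?
ci_overlap <- max(est$ci_lower) <= min(est$ci_upper)
ci_overlap
```

Overlapping intervals are only a rough guide and not a formal test of whether the slopes differ. The two models are also fitted to different numbers of respondents, which is worth bearing in mind when you interpret the gap.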
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  18.00   37.00   53.00   52.58   67.00   99.00      19
Call:
lm_robust(formula = turnout_validated ~ Age, data = df)
Standard error type: HC2
Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper   DF
(Intercept) 0.495903  0.0370289  13.392 1.126e-38 0.423268 0.568538 1469
Age         0.004655  0.0006304   7.384 2.562e-13 0.003418 0.005891 1469
Multiple R-squared: 0.03857 , Adjusted R-squared: 0.03792
F-statistic: 54.52 on 1 and 1469 DF, p-value: 2.562e-13
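To get a feel for what the Age coefficient above implies, you could compute predicted turnout at a few ages. The sketch below uses the printed coefficients and, as illustrative choices, the minimum, median, and maximum ages from the summary above.

```r
# Coefficients copied from the lm_robust output above
b0 <- 0.495903   # intercept
b1 <- 0.004655   # change in predicted turnout per additional year of age

# Minimum, median, and maximum age from the summary of Age
ages <- c(18, 53, 99)
pred <- b0 + b1 * ages
data.frame(age = ages, predicted_turnout = pred)

# Implied difference in predicted turnout between people ten years apart
ten_year_gap <- b1 * 10
ten_year_gap
```

Because the outcome is coded 0/1, these predictions can be read as predicted probabilities of voting. A linear model does not force them to stay between 0 and 1, although they do within the observed age range here.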
GV249 WT0 | 📨 email l.m.metson@lse.ac.uk 🤔 Question? 🙋 raise your hand or 🖥️ use the Moodle Forum.