Problem Set 3: Effect of Demographic Change on Exclusionary Attitudes
You can find instructions for obtaining and submitting problem sets here.
You can find the GitHub Classroom link to download the template repository on the Ed Board
Background
A professor in the Government department here at Harvard, Ryan Enos, conducted a randomized field experiment assessing the extent to which exposure to demographic change affected the political views of individuals living in suburban communities around Boston, Massachusetts.
This exercise is based on: Enos, R. D. 2014. “Causal Effect of Intergroup Contact on Exclusionary Attitudes.” Proceedings of the National Academy of Sciences 111(10): 3699–3704.
Subjects in the experiment were individuals riding on the commuter rail line and overwhelmingly white. Every morning, multiple trains pass through various stations in suburban communities that were used for this study. For pairs of trains leaving the same station at roughly the same time, one was randomly assigned to receive the treatment and one was designated as a control. By doing so all the benefits of randomization apply for this dataset.
The treatment in this experiment was the presence of two native Spanish-speaking ‘confederates’ (a term used in experiments to indicate that these individuals worked for the researcher, unbeknownst to the subjects) on the platform each morning prior to the train’s arrival. The presence of these confederates, who would appear as Hispanic foreigners to the subjects, was intended to simulate the kind of demographic change anticipated for the United States in coming years. For those individuals in the control group, no such confederates were present on the platform. The treatment was administered for 10 days. Participants were asked questions related to immigration policy both before the experiment started and after the experiment had ended. The names and descriptions of variables in the data set boston.csv
are:
Name | Description |
---|---|
age |
Age of individual at time of experiment |
male |
Sex of individual, male (1) or female (0) |
income |
Income group in dollars (not exact income) |
white |
Indicator variable for whether individual identifies as white (1) or not (0) |
college |
Indicator variable for whether individual attended college (1) or not (0) |
usborn |
Indicator variable for whether individual was born in the US (1) or not (0) |
treatment |
Indicator variable for whether an individual was treated (1) or not (0) |
ideology |
Self-placement on ideology spectrum from Very Liberal (1) through Moderate (3) to Very Conservative (5) |
numberim.pre |
Policy opinion on question about increasing the number immigrants allowed in the country from Increased (1) to Decreased (5) |
numberim.post |
Same question as above, asked later |
remain.pre |
Policy opinion on question about allowing the children of undocumented immigrants to remain in the country from Allow (1) to Not Allow (5) |
remain.post |
Same question as above, asked later |
english.pre |
Policy opinion on question about passing a law establishing English as the official language from Not Favor (1) to Favor (5) |
english.post |
Same question as above, asked later |
Question 1 (6 points)
The benefit of randomly assigning individuals to the treatment or control groups is that the two groups should be similar, on average, in terms of their covariates. This is referred to as ‘covariate balance.’
Load the tidyverse
package in the setup chunk. Read the data "data/boston.csv"
into R using read_csv
and call the resulting tibble trains
. Using the group_by
and summarize
functions, create a tibble called balance_table
that shows the sample average of the age
and male
variables by whether participants were in the treatment or control groups.
Then use the knitr::kable()
function to create a nice-looking table, including some informative column names. Briefly comment on the whether you think these variables appear balanced.
Rubric: 1pt for successful knitting (autograder); 3pts for correct balance_table
object (autograder); 1pt for nicely formatted table (PDF); 1pt for brief comments (PDF)
Question 2 (7 points)
Individuals in the experiment were asked a series of questions both at the beginning and the end of the experiment. One such question was “Do you think the number of immigrants from Mexico who are permitted to come to the United States to live should be increased, left the same, or decreased?”
The response to this question prior to the experiment is in the variable numberim.pre
. The response to this question after the experiment is in the variable numberim.post
. In both cases the variable is coded on a 1 – 5 scale. Responses with values of 1 are inclusionary (‘pro-immigration’) and responses with values of 5 are exclusionary (‘anti-immigration’). Take the following steps:
- Create a new variable in the
trains
data calledchange
and define it as the change in immigration attitudes between pre- and post-experiment (post minus pre). Be sure to assign the output of your data wrangling call back totrains
. - Calculate the average change in attitudes about immigration in the treated group and save this output as
trt_change
. This should be a 1 x 1 tibble. - Calculate the average change in attitudes about immigration in the control group and save this output as
ctr_change
. This should be a 1 x 1 tibble. - Compute the average treatment effect from these two objects and store it as
ate
. This should be a 1 x 1 tibble.
Report these estimates in the main text (that is, outside the chunk) and describe what they mean substantively with respect to the study.
Rubric: 1pt for creating change
variable (autograder); 2pts for trt_change
object (autograder); 2pts for ctr_change
object (autograder); 1pt for ate
object (autograder); 1pt for write up about the results (PDF).
Question 3 (3 points)
Describe, in words, what the potential outcomes for a particular person are in the analysis for the last problem. Substantively, what does the fundamental problem of causal inference refer to in this context? Make sure to refer to what treatment and control means in this experiment rather than just mention the “treatment” and “control” groups.
Rubric: 3pts for answer (PDF); no autograder points
Question 4 (2 points)
In your data science group, two members have alternative ideas for what the outcome should have been instead of the change in attitudes on immigration between the beginning and end of the experiment. Jimmy Q. Boxplot thinks that you should have used numberim.pre
as the outcome and Suzy T. Histogram thinks that you should have used numberim.post
. Are either of these two valid and interesting outcomes to explore in this study? Briefly explain why or why not.
Rubric: 2pts for answer (PDF); no autograder points.
Question 5 (7 points)
Does having attended college influence the effect of being exposed to ‘outsiders’ on exclusionary attitudes? Another way to ask the same question is this: is there evidence of a differential impact of treatment, conditional on attending college versus not attending college?
Use group_by
, summarize
, pivot_wider
, and mutate
to create a tibble called ate_college
where each row corresponds to a unique value of the college
variable (so two rows total) and that has the following four columns:
college
: the unique value of the college variable for this row, labelled as either “College Grad” or “Non-College Grad”.Treated
: the mean of thechange
variable for each unique value of college in the treated group.Control
: the mean of thechange
variable for each unique value of college in the control group.ATE
: the difference between the treatment and control means for this value of college.
In the first step of building ate_college
, you should pipe the data to a mutate()
call that (a) modifies the treatment
to be "Treated"
when treatment == 1
and "Control"
when treatment == 0
and (b) modifies the college
variable to be "College Grad"
when college == 1
and "Non-College Grad"
when college == 0
. When you do this, be sure not to overwrite the trains
data with these changes, simply pipe this to the next step.
Pass the table to knitr::kable()
to produce a nicely formatted table. Report the ATEs in text and comment on whether or not you see evidence of a differential effect of treatment.
Rubric: 5pts for ate_college
object (autograder); 1pt for nicely formatted table (PDF); 1pt for brief write-up (PDF)
Question 6 (10 points)
Repeat the same analysis as in the previous question but this time with respect to age. For age, use case_when
and logical vectors to create a new variable age_group
that has the following values:
"25 and under"
whenage
is 25 and under,"26 - 40"
whenage
is between 26-40,"41 - 60"
whenage
is between 41-60, and"Over 60"
whenage
is 61 and over.
Remember to save the output to the trains
name. Use this variable along with the approach in the previous question to create a tibble called ate_age
(replacing college
with age_group
) that has four rows corresponding to the four unique values of age_group
and four columns: age_group
, Treated
, Control
, and ATE
(all defined similarly to the previous question).
Use the the ate_age
tibble to produce a barplot (using ggplot
) of the ATE for each age group and assigning the plot to the name age_plot
. Be sure to include informative labels for the x and y axes, along with a title.
In the text, answer these question: Do there appear to be systematic relationships between the treatment effects and age? If so, what patterns do you see?
Rubric: 2pts for age_group
variable (autograder); 5pts for ate_age
object (autograder); 2pts for age_plot
ggplot object (autograder); 1pt descriptions of the results (PDF)