Homework 4.2
All materials can be found at alexcardazzi.github.io.
Question 1
In the early 2000s, two economists ran an experiment where they sent fictitious resumes in response to job ads in Chicago and Boston. The authors randomly varied the qualities of the fictitious resumes as well as the applicants’ names. Some resumes were randomly given stereotypically white-sounding names (Emily, Brendan) and others African-American-sounding names (Lakisha, Jamal).[1] Intrigued students may read a non-technical summary of the paper here.[2]
In this part of the homework, you will use data collected by these researchers (data; documentation) to investigate whether employers engage in racial discrimination when sifting through resumes.
- Read the data into a data frame called `resume`. (2 Points)
- Create the following binary variables (4 Points):
  - A variable called `chicago` that is equal to one if `city` is equal to “chicago” and zero otherwise.
  - A variable called `female` that is equal to one if `gender` is equal to “female” and zero otherwise.
  - A variable called `black` that is equal to one if `ethnicity` is equal to “afam” and zero otherwise.
  - A variable called `callback` that is equal to one if `call` is equal to “yes” and zero otherwise.
- Estimate and display the coefficients (using `summary()` is fine) of the following regression (2 Points):
\[\begin{aligned}\text{Callback}_i = \ &\beta_0 + \beta_1 \text{Jobs}_i + \beta_2 \text{Experience}_i + \beta_3 \text{Female}_i \\&+ \beta_4 \text{Chicago}_i + \beta_5 \text{Black}_i+ \epsilon_i\end{aligned}\]
- Interpret each coefficient in words. (4 Points)
- Estimate and display the coefficients (using `summary()` is fine) of the following regression (2 Points):
\[\begin{aligned}\text{Callback}_i = \ &\beta_0 + \beta_1 \text{Jobs}_i + \beta_2 \text{Experience}_i + \beta_3 \text{Female}_i \\&+ \beta_4 \text{Chicago}_i + \beta_5 \text{Black}_i + \beta_6 (\text{Chicago}_i \times \text{Black}_i)+ \epsilon_i\end{aligned}\]
- Interpret the estimates for \(\beta_4\), \(\beta_5\), and \(\beta_6\) from the previous regression. (4 Points)
- Re-estimate the regression twice: once using only the data from Chicago and once using only the data from Boston. Report both sets of coefficients using `modelsummary`. Discuss any coefficients that result in different conclusions for the two cities. (4 Points)
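Because `callback` is a 0/1 outcome, both regressions above are linear probability models, so each fitted value can be read as a predicted callback probability. To see how the interaction term in the second regression works, hold Jobs, Experience, and Female fixed and write out the model's prediction for each combination of city and name. This is a sketch that assumes the usual \(\mathbb{E}[\epsilon_i \mid \text{regressors}] = 0\); the shorthand \(c_i\) is introduced here for convenience and is not part of the original model.

\[\begin{aligned}
c_i &\equiv \beta_0 + \beta_1 \text{Jobs}_i + \beta_2 \text{Experience}_i + \beta_3 \text{Female}_i \\
\mathbb{E}[\text{Callback}_i \mid \text{Chicago}_i = 0, \text{Black}_i = 0] &= c_i \\
\mathbb{E}[\text{Callback}_i \mid \text{Chicago}_i = 0, \text{Black}_i = 1] &= c_i + \beta_5 \\
\mathbb{E}[\text{Callback}_i \mid \text{Chicago}_i = 1, \text{Black}_i = 0] &= c_i + \beta_4 \\
\mathbb{E}[\text{Callback}_i \mid \text{Chicago}_i = 1, \text{Black}_i = 1] &= c_i + \beta_4 + \beta_5 + \beta_6
\end{aligned}\]

Since Boston is the omitted city (\(\text{Chicago}_i = 0\)), the gap between resumes with African-American-sounding and white-sounding names is \(\beta_5\) in Boston and \(\beta_5 + \beta_6\) in Chicago, so \(\beta_6\) measures how much that gap differs between the two cities.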
Question 2
For this question, you will explore a sample of crash records (data; documentation) reported by police across the country from 1997 to 2002. Each record in these data contains information about the individual and vehicle involved in the crash, as well as some information about the circumstances and outcomes of the crash.
- As a first step, read the data into a `data.frame` called `crash`. Subset the data to include only drivers. (2 Points)
- Review the data documentation, especially for the variable `injSeverity`. Remove observations where `injSeverity` is missing (`NA`), `unknown`, or `prior death`. Then, create a new variable called `y` that is equal to one if the individual sustained an incapacitating injury or worse, and zero otherwise. This variable indicates that the crash caused a substantial injury. (2 Points)
- There is another variable in the dataset called `dvcat`, which groups estimated impact speeds (in km/h) into categories. Convert it to a `factor` variable and re-level it so that the reference level is the slowest impact speed. (2 Points)
- Re-define the `seatbelt` and `airbag` variables as binary indicators. (2 Points)
- Estimate a basic regression in which major injury (`y`) is explained by the estimated impact speed, the age of the occupant, and the year of the vehicle. Display (using `summary()` is fine) and interpret the coefficients of the model. Note: you do not need to interpret each impact-speed coefficient individually; instead, discuss the pattern across those coefficients. (4 Points)
- Re-estimate the model above, but include the variables for the vehicle’s safety features (`seatbelt` and `airbag`). What changes about the model? Why do you think you see these changes? (4 Points)
- Add the variable `deploy` to the model and output the coefficients. What does this variable measure? How does this variable change the interpretation of the model? (4 Points)
- Finally, in addition to what is already in the model, incorporate an interaction between `deploy` and `seatbelt`. Again, how does the interpretation of the model change? (4 Points)
- What, if anything, surprised you about the results in the analysis above? (4 Points)
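As a sketch of how the re-leveled `dvcat` factor enters the basic regression for `y`: for an unordered factor, R's default treatment contrasts add one indicator per non-reference level and omit the reference (slowest) level. The notation below is introduced here for illustration and does not come from the data documentation: \(K\) is the number of impact-speed categories, ordered from slowest (\(k = 1\)) to fastest; \(D_{ik}\) equals one if crash \(i\) falls in category \(k\); and \(\text{Age}_i\) and \(\text{VehicleYear}_i\) stand in for the occupant-age and vehicle-year variables. The second line assumes \(\mathbb{E}[u_i \mid \text{regressors}] = 0\).

\[\begin{aligned}
y_i &= \alpha_0 + \sum_{k=2}^{K} \gamma_k D_{ik} + \alpha_1 \text{Age}_i + \alpha_2 \text{VehicleYear}_i + u_i \\
\gamma_k &= \mathbb{E}[y_i \mid D_{ik} = 1, \text{Age}_i, \text{VehicleYear}_i] - \mathbb{E}[y_i \mid D_{i1} = 1, \text{Age}_i, \text{VehicleYear}_i]
\end{aligned}\]

Each \(\gamma_k\) is therefore the difference in the probability of a major injury between speed category \(k\) and the slowest category, at the same occupant age and vehicle year, which is why the pattern across \(\gamma_2, \dots, \gamma_K\) is the natural thing to discuss for the impact-speed variable.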
Footnotes
[1] The process of determining which names are stereotypically black/white is described in detail in the published draft.
[2] In addition, similar research on ban-the-box finds that these policies increase racial discrimination.