Homework 4
All materials can be found at alexcardazzi.github.io.
Completion Requirements: Complete the following questions in RStudio via the homework template. When you are ready, submit your rendered html
to Canvas.
Grading Criteria: Full credit will be given to correct, well formatted, and detailed answers. Partial credit will be given if I can follow your work and/or see your thought process via code, comments, and text. Point totals are listed next to each question.
Primary, secondary, and post-secondary education all lead to increased human capital, an important factor in driving long run economic growth, worker productivity, employment, innovation, and other important economic outcomes. The production of education takes many forms. For example, primary and secondary education markets contain public schools, schools operated by various religious faiths (the Catholic church, various fundamentalist religions, etc.), as well as private non-denominational organizations like charter schools. Because of the economic importance of education and human capital formation, research on the effectiveness of different forms of education is interesting and important.
Private Catholic schools represent a common alternative to public schools in many communities. A large body of economic research focuses on how different types of education, in particular Catholic versus public school education, affects educational outcomes. This research primarily focuses on grades, but some papers examine high school graduation rates and rates of college attendance.
This research faces a number of important econometric challenges. One key challenge stems from the fact that the decision to enroll a child in Catholic or public schools reflects parental choice and not random assignment. School enrollment decisions reflect many factors at the family and individual level. While public school attendance is free of charge, and financed by taxes, private education like attendance at a Catholic school requires paying tuition, making family income an important factor. Income may also affect educational outcomes independent of school choice, for example through tutoring, educational software, and other unobservable factors, making income a likely back door.
In a “dream” world, a team of economists could obtain a sample of 10,000 students and randomly assign them to either a Catholic school or a public school. In this situation, we could have more confidence in estimates of a casual treatment effect of school type on educational outcomes. However, for obvious reasons, this is not possible. Instead, education economists frequently use matching methods like Propensity Score Matching to come up with as-if random assignment of catholic school education to generate more plausible causal estimates. This homework applies matching methods to a data set containing information on school enrollment, grades, and family characteristics.
Question 1
Explore whether catholic school education improves outcomes. The data for this homework can be found here (codebook).
Read in the data and store it as an object called
school
. (0 Points)Change the column names to the following: (0 Points)
c5r2mtsc
:math_score
c5r2mtsc_std
:math_std
p5hmage
:mom_age
w3momed_hsb
:mom_noCollege
w3income
:income
catholic
:catholic
(no change, this is just for completeness)income2
:income2
(no change, this is just for completeness)
Calculate the percent of students who attended catholic school. (1 Point)
Use
modelsummary
to create a summary statistics table for students who did not attend catholic school. (1 Point)Use
modelsummary
to create a summary statistics table for students who attended catholic school. (1 Point)Discuss the similarities and differences between the students in these groups. (2 Points)
Question 2
Examine the family income for these groups of students.
Use
density()
to estimate the distributions of the log ofincome2
for students who attend catholic school and students who do not. Then, plot these two densities on the same set of labeled axes. Remember to ensure both distributions are fully visible (not cutoff), different colors, and identified in a legend. (2 Points)Do you think mother’s age is an important variable to control for? Why, or why not? Use “DAG” language in your answer. (1 Point)
Regardless of your answer to the previous question, what are some other factors (that may or may not exist in the dataset) that might be important to control for? Again, use “DAG” language in your answer. (1 Point)
Question 3
Begin working through the matching procedure.
Estimate a propensity score model using mother’s age, mother’s education, and family income as predictors.1 (1 Point)
Interpret the signs of the model’s estimated coefficients. (0 Points)
Using this model, generate propensity scores as a new variable called
ps
inschool
. (0 Points)Using these propensity scores, create inverse probability weights as a new variable called
w
inschool
. (1 Point)Using these weights, re-create the income distributions from before. Comment on any similarities/differences.2 (1 Point)
Question 4
In the following parts, estimate the two models below linking catholic school enrollment and math scores. The functional forms are:
\[\text{Math}_i = \alpha + \delta \text{Catholic}_i + \epsilon_i\]
\[\begin{aligned}\text{Math}_i = \alpha &+ \delta \text{Catholic}_i \\ &+ \beta_0\text{Mother's Age}_i \\&+ \beta_1 \text{Mother's Education}_i \\&+ \beta_2log(\text{Family Income}_i) + \epsilon_i\end{aligned}\]
First estimate the models’ parameters using OLS and store them as
r_ols1
andr_ols2
. You do not need to display the results quite yet. (1 Point)Use the
MatchIt
package in R to create a matched sample (match based on propensity scores). Then, estimate the parameters with OLS using the matched sample. Store these asr_match1
andr_match2
. Again, hold off on displaying these coefficients. (1 Point)Estimate the parameters with OLS using the inverse probability weights you generated before. Store these estimations as
r_ipw1
andr_ipw2
. (1 Point)Display the coefficients of the above six models in a table generated by
modelsummary
. (1 Point)
Question 5
Discuss the results of the estimations. Make sure to discuss difference between methods (OLS, PSM, and IPW) as well as models (controls, no controls). (2 Points)
Question 6
Write up your takeaways from this analysis. Do you believe catholic school is better than public school in terms of math education? Which model do you believe the most? Is matching appropriate in this setting? In other words, are the assumptions required to interpret these results as causal satisfied or have they been violated? (2 Points)