Homework 3.1
All materials can be found at alexcardazzi.github.io.
Question 1
For this question, you are going to analyze tipping culture. Consider these data (data; documentation) on tips received by a single waiter over a period of a few months.
Read the data into R. (2 Points)
Create a table of summary statistics for total_bill and tip. (2 Points)
Write down an econometric model (i.e. use \(\beta_0\) and \(\beta_1\)) that explains tip amount based on the total bill amount. (2 Points)
- The rule-of-thumb for tipping is to give 18% of the total bill. If this rule of thumb were followed, what would we expect for values of \(\beta_0\) and \(\beta_1\)? (4 Points)
Plot total_bill vs. tip. Make the figure presentable with labeled axes, colored points (with some opacity), etc. (2 Points)
Estimate the parameters in the model. Interpret the value of each coefficient in words. (2 Points)
- Re-create the plot, but include the regression line and the rule-of-thumb for tipping. Make each line a different color and include a legend. Discuss how people tip in reality compared to the rule-of-thumb. (6 Points)
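The workflow in this question (summary statistics, a simple regression, and a plot with two reference lines) can be sketched in base R as follows. This is only an illustration on simulated data: the data frame `tips`, the numbers used to generate it, the colors, and the commented-out file path are all assumptions, not values taken from the course data.

```r
# Simulated stand-in for the tips data. Only the two column names come from
# the question; every number below is made up for illustration.
# With the real file, something like this would replace the simulation:
# tips <- read.csv("path/to/tips.csv")   # hypothetical path
set.seed(1)
n <- 200
total_bill <- runif(n, min = 5, max = 50)
tip <- 1 + 0.10 * total_bill + rnorm(n, sd = 1)
tips <- data.frame(total_bill = total_bill, tip = tip)

# Summary statistics for both variables
summary(tips[, c("total_bill", "tip")])

# Simple linear regression of tip on total_bill (intercept and slope)
fit <- lm(tip ~ total_bill, data = tips)
coef(fit)

# Scatter plot with labeled axes and semi-transparent colored points
plot(tips$total_bill, tips$tip,
     xlab = "Total bill ($)", ylab = "Tip ($)",
     pch = 19, col = adjustcolor("steelblue", alpha.f = 0.4))

# Fitted regression line, plus an 18% line through the origin
abline(fit, col = "firebrick", lwd = 2)
abline(a = 0, b = 0.18, col = "darkgreen", lwd = 2, lty = 2)
legend("topleft",
       legend = c("Fitted line", "18% rule of thumb"),
       col = c("firebrick", "darkgreen"), lwd = 2, lty = c(1, 2), bty = "n")
```

Because the data here are simulated, the estimated coefficients say nothing about how the waiter was actually tipped; the sketch only shows one way to lay out the steps.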
Question 2
The American Housing Survey (AHS) collects information about individual dwelling choices. These data (data; documentation), which are a subset of the 2007 AHS, contain information on distances and times individuals travel for their commutes to work.
Read the data into R. (2 Points)
Estimate two models of commute time given commute distance. First, estimate a level-level model. Second, estimate a log-log model. Display the coefficients in a table. (4 Points)
Suppose you are a commuter who wants to live as far as possible from your job, which is in the middle of a city center (and thus subject to pollution, crime, noise, and everything else you don’t like). However, you also don’t like commuting; each minute in traffic is torture to you. If you are willing to spend 30 minutes in traffic, how far from work, in miles, should you live? Use the parameters from the level-level model to inform your decision. (2 Points)
Estimate the level-level model for each city in the data. Display the coefficients in a single table. (4 Points)
Compare the coefficients for Boston, the highest-density city in the sample, to Houston, the lowest-density city in the sample (see footnote 1). (2 Points)
- Following from the previous question, why do you think a city’s density appears to be related to its \(R^2\) in the models above? (2 Points)
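The modeling steps in this question can be sketched in base R as follows, again on simulated data. The column names `city`, `distance`, and `time`, the city labels, and all generating numbers are assumptions for illustration; the real variable names are in the data documentation.

```r
# Simulated stand-in for the commute data. Column names and all numbers
# below are assumptions, not taken from the AHS file.
set.seed(2)
n <- 400
commute <- data.frame(
  city     = sample(c("A", "B"), n, replace = TRUE),  # hypothetical city labels
  distance = runif(n, min = 1, max = 40)              # miles
)
commute$time <- 8 + 1.5 * commute$distance + rnorm(n, sd = 3)  # minutes
commute <- commute[commute$time > 0, ]  # logs require positive values

# Level-level and log-log models of commute time on commute distance
level_fit  <- lm(time ~ distance, data = commute)
loglog_fit <- lm(log(time) ~ log(distance), data = commute)

# Coefficients side by side (rows: intercept, slope)
coef_table <- cbind(level_level = coef(level_fit), log_log = coef(loglog_fit))
rownames(coef_table) <- c("intercept", "slope")
round(coef_table, 3)

# Invert the level-level fit: the distance whose predicted time is 30 minutes
b <- coef(level_fit)
max_distance <- unname((30 - b[1]) / b[2])
max_distance

# The same level-level model estimated separately for each city
by_city <- sapply(split(commute, commute$city), function(d) {
  m <- lm(time ~ distance, data = d)
  c(coef(m), r_squared = summary(m)$r.squared)
})
round(by_city, 3)
```

In the log-log model the slope reads as an elasticity: a 1% increase in distance is associated with roughly a slope-percent change in time. The inversion step just solves the fitted line for distance at a time of 30 minutes.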
Question 3
Infant birth weight is a frequently used measure in health economics, serving as a vital indicator of an infant’s immediate and long-term health prospects. One of the key determinants of birth weight is the infant’s gestation period, or the time spent in utero. Use these data (data; documentation), which are records of births from 1961 and 1962, to answer the following questions.
Read in the data. Convert birth weight to pounds and gestation to weeks, both as new variables. Use the floor() function to round gestation down to the nearest whole week. (2 Points)
Plot the distribution of gestation (in weeks). Be sure to label the axes, etc. What are some things you notice about the distribution? (3 Points)
- Create a visualization of average birth weight by weeks of gestation. What are some things you notice about the figure? (3 Points)
Estimate models to explain birth weight in pounds using gestation in weeks. The first model should be level-level and the second model should be log-level. You should remove from the data any outliers you might have identified. Present the coefficients in a table. (2 Points)
Interpret each coefficient from each model in words. (4 Points)
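The unit conversions, plots, and models in this question can be sketched in base R as follows. The data are simulated, and the column names `bwt_oz` and `gest_days`, along with the assumption that the raw data record weight in ounces and gestation in days, are illustrative only; check the data documentation for the real names and units. Outlier removal is left out because it depends on what the real data show.

```r
# Simulated stand-in for the births data. Column names, original units
# (ounces, days), and all numbers are assumptions for illustration.
set.seed(3)
n <- 500
births <- data.frame(gest_days = round(rnorm(n, mean = 279, sd = 14)))
births$bwt_oz <- 120 + 0.45 * (births$gest_days - 279) + rnorm(n, sd = 16)

# New variables: pounds (16 ounces per pound), whole weeks (7 days per week)
births$bwt_lb     <- births$bwt_oz / 16
births$gest_weeks <- floor(births$gest_days / 7)

# Distribution of gestation in weeks
barplot(table(births$gest_weeks),
        xlab = "Gestation (weeks)", ylab = "Number of births")

# Average birth weight by week of gestation
avg_by_week <- aggregate(bwt_lb ~ gest_weeks, data = births, FUN = mean)
plot(avg_by_week$gest_weeks, avg_by_week$bwt_lb, type = "b",
     xlab = "Gestation (weeks)", ylab = "Average birth weight (lb)")

# Level-level and log-level models of birth weight on gestation
level_fit    <- lm(bwt_lb ~ gest_weeks, data = births)
loglevel_fit <- lm(log(bwt_lb) ~ gest_weeks, data = births)
cbind(level_level = coef(level_fit), log_level = coef(loglevel_fit))

# Log-level slope expressed as an approximate percent change per extra week
100 * coef(loglevel_fit)[["gest_weeks"]]
```

In a log-level model, multiplying the slope by 100 gives the approximate percent change in the outcome for a one-unit increase in the regressor, which is why the last line rescales the coefficient.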
Footnotes
1. Persons (in 2016) per square mile: Boston (13,943), Houston (3,842), Minneapolis (7,664), and Washington (11,158).