Homework 2.1
All materials can be found at alexcardazzi.github.io.
Question 1
- I send out a survey to individuals who graduated from ODU. I ask them to report their salary. Most people fill out the salary question, but some people leave it blank. Can I fill in zero for these blanks? Explain why or why not. (4 Points)
- Suppose I decided to pick students with ID numbers that end in an 8 to survey. I calculate the mean salary of those who submitted salaries. Will this average salary be representative of all ODU students? Explain why or why not. (4 Points)
- Continuing from above, if you think the average salary will be representative, explain a way one could survey students where the resulting average would not be representative. If you think the above average salary will not be representative, explain a way one could survey students where the resulting average would be representative. (4 Points)
- Why do we use the sum of squared differences when calculating dispersion? (4 Points)
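As a starting point for thinking about the dispersion question, the short sketch below compares the sum of raw differences from the mean with the sum of squared differences. The salary figures are made up purely for illustration and do not come from the course data, and the variable names are arbitrary. The sketch shows the arithmetic only; the explanation is still yours to write.

```python
# Hypothetical salaries (in thousands of dollars), invented for illustration only.
salaries = [40, 50, 60, 70, 130]

# Simple (unweighted) mean of the salaries.
mean_salary = sum(salaries) / len(salaries)

# Signed difference between each salary and the mean.
deviations = [s - mean_salary for s in salaries]

# Raw differences: positive and negative values offset each other.
sum_of_deviations = sum(deviations)

# Squared differences: every term is non-negative.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance divides the sum of squares by n - 1.
sample_variance = sum_of_squares / (len(salaries) - 1)

print("Sum of differences:", sum_of_deviations)
print("Sum of squared differences:", sum_of_squares)
print("Sample variance:", sample_variance)
```

Try changing the values in `salaries` and note which of the printed quantities changes and which does not.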
Question 2
Use the state-year cigarette panel data from the previous module (data; documentation) for this question.
- Read in the data, and present summary statistics for price, population, cpi, disposable income, and cigarette sales. Include the number of observations, mean, standard deviation, minimum and maximum. Use the package(s) from the course notes. (8 Points)
- Write your own function to calculate a weighted average. As a hint, your function should accept two arguments: one for values and one for weights. (4 Points)
- Use your function to find the population-weighted average of income. Aside from the formulas being different, provide an intuitive explanation for the difference you observe between the weighted average and the simple average. (4 Points)
- Produce a visualization of the distribution of cigarette pack prices. Include a vertical, dashed, and colored (you may choose the color) line representing the mean price. Be sure to label the axes, etc. Provide a short description of what you see. (5 Points)
- Produce a scatterplot of price (\(X\)) vs minimum price of adjoining states (\(Y\)). Use filled in circles as the point type. Color the points based on whether price is greater than the minimum price of adjoining states. Be sure to label the axes, etc. (5 Points)
- Below is a time series of the pack sales by state over time. Recreate this plot to the best of your ability. Regardless of whether you were able to perfectly recreate the picture, what was the hardest part? (8 Points)