All materials can be found at alexcardazzi.github.io.
Question 1
- How are hypothesis testing and confidence intervals related? (4 Points)
- Suppose I calculate a p-value of 0.08. What is there an 8% probability of? (4 Points)
Question 2
For the following question, use data on the heights and weights of the 2022-23 NBA and WNBA all-stars.
Read data into R. Call the dataset bball. (2 Points)
Remove any columns with missing data. (2 Points)
Print the row of the youngest player. (2 Points)
Create two summary statistics tables – one for the NBA players, and one for the WNBA players. Include only variables for age, height, and weight. Comment on the differences. (4 Points)
- Which player(s) is taller relative to their respective league: the tallest NBA player(s) or the tallest WNBA player(s)? Hint: use Z-Scores! It might even be helpful to write a function to calculate Z-Scores to minimize the total amount of code you need to write. (4 Points)
Do not manually type the player names or heights; use code to select the player’s info.
Create and print a 95% confidence interval for WNBA heights. (4 Points)
What percentage of WNBA observations fall inside this confidence interval? Is there a percentage of observations we should expect to fall inside this interval? (4 Points)
- I have a hypothesis that older players are heavier than younger players. Use t.test() to perform a hypothesis test comparing the weights of young and old players. What is the conclusion of the test? Interpret the test’s p-value in words. (4 Points)
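For the Z-score hint and the t.test() step in Question 2, a minimal sketch on invented toy data might look like the following. The helper name z_score, the toy data frame, and the median age cutoff are all assumptions made for illustration; none of them come from the all-star dataset.

```r
# Hypothetical helper: standardize a numeric vector.
# Defaults use the sample mean and sd, but both can be overridden.
z_score <- function(x, mu = mean(x, na.rm = TRUE), sigma = sd(x, na.rm = TRUE)) {
  (x - mu) / sigma
}

# Toy data (invented for illustration; NOT the all-star dataset).
toy <- data.frame(
  league = c("A", "A", "A", "B", "B", "B"),
  height = c(78, 80, 85, 68, 72, 76),
  age    = c(22, 31, 27, 24, 33, 29),
  weight = c(210, 240, 225, 150, 175, 160)
)

# Z-scores computed within each league, so each height is
# measured relative to its own group's mean and sd.
toy$z_height <- ave(toy$height, toy$league, FUN = z_score)

# One-sided Welch t-test: is the mean weight of the "old" group
# greater than that of the "young" group?
# Splitting at the median age is an arbitrary, assumed cutoff.
cutoff <- median(toy$age)
old    <- toy$weight[toy$age >  cutoff]
young  <- toy$weight[toy$age <= cutoff]
res    <- t.test(old, young, alternative = "greater")
res$p.value
```

Because mu and sigma are arguments, the same helper could also be reused when a fixed reference mean is required instead of the sample mean.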
Question 3
Navigate to the following website: https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/county/time-series/ On this website, you should see the following options:
- Parameter: keep this as “Average Temperature”.
- Time Scale: keep this as “1-Month”.
- Month: set this to your birth month.
- Start Year: set this to your birth year.
- End Year: set this to “2024”.
- State: set this to the state you were born in. If you were not born in the United States, choose the first state in which you lived.
- County: set this to the county you were born in (or the first county you lived in).
- Once these options are set, click the “Plot” button, scroll down below the generated plot, and right-click on the “CSV” download button. Copy the hyperlink address, and read it into R using read.csv(). (2 Points)
- Clean the data so it appears more like a “normal” data.frame. (4 Points) For example, using head(temp), the output is:
Output
Rockland.County New.York.August.Average.Temperature
Units: Degrees Fahrenheit
Base Period: 1901-2000
Missing: -99
Date Value Anomaly
199508 73.0 2.6
199608 71.4 1.0
But it should look like:
Output
Value Anomaly year
199508 73.0 2.6 1995
199608 71.4 1.0 1996
199708 70.4 0.0 1997
199808 73.1 2.7 1998
199908 72.2 1.8 1999
200008 69.5 -0.9 2000
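One possible way to get from the first output to the second is sketched below. It assumes the file starts with four metadata lines followed by a Date,Value,Anomaly header, which is inferred from the output printed above; check your own file before relying on skip = 4. The string raw is a stand-in for the downloaded file, not real data pulled from the website.

```r
# Stand-in for the downloaded file: mimics the layout printed above.
# The exact layout of the real CSV is an assumption; inspect yours first.
raw <- "Rockland County, New York August Average Temperature
Units: Degrees Fahrenheit
Base Period: 1901-2000
Missing: -99
Date,Value,Anomaly
199508,73.0,2.6
199608,71.4,1.0"

# skip = 4 drops the metadata lines so "Date,Value,Anomaly" becomes the header.
temp <- read.csv(text = raw, skip = 4)

# Dates are stored as YYYYMM, so integer division by 100 leaves the year.
temp$year <- temp$Date %/% 100

# Use the date as the row name, then drop the Date column.
rownames(temp) <- temp$Date
temp$Date <- NULL

# Recode the missing-value sentinel (-99, per the metadata) as NA.
temp$Value[temp$Value == -99] <- NA

head(temp)
```

With a real download, the text = raw argument would be replaced by the copied hyperlink address.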
Recreate the plot on the website. You may ignore the right-hand Celsius axis, the “° F” on the left-hand axis, the background grid, the “NOAA” logo, and the shaded area beneath the time series. (4 Points) However, extra credit (5 Bonus Points) will be given to the person who gets closest to an exact replication.
Make a new column that is the z-score for each observation. Use the 1901-2000 average as the mean in the calculation. (2 Points)
Print the row with the largest z-score. (2 Points)
Assuming temperatures come from a normal distribution, what is the probability of observing a temperature this high (or higher)? (2 Points)
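As a reminder of how an upper-tail probability is computed under a normal distribution, here is a small sketch. The value z = 2 is a made-up placeholder, not a result from any temperature data.

```r
# Hypothetical z-score, chosen only for illustration.
z <- 2

# P(Z >= z) for a standard normal: the area in the upper tail.
p_upper <- pnorm(z, lower.tail = FALSE)

# Equivalent formulation using the lower tail.
p_check <- 1 - pnorm(z)

p_upper
```

Using lower.tail = FALSE avoids the subtraction, which tends to be more accurate numerically when z is very large.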