Homework 1.1
All materials can be found at alexcardazzi.github.io.
Question 1
Copy and paste the following vector into R: vec <- sample(1:100, 10, FALSE). Use this vector for the following.
- Print vec to the console. (2 Points)
- Print the first 5 elements of vec. (2 Points)
- Print the last 3 elements of vec. (2 Points)
- Print the 3rd and 8th elements of vec. (3 Points)
- Print the elements of vec that are larger than 50. (3 Points)
- Print the elements of vec that are odd (not divisible by two). (3 Points)
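The tasks above all rely on a handful of base R subsetting idioms. As a rough sketch only, the block below shows those idioms on a made-up vector called demo (a hypothetical name and hypothetical values, not the vec from this question), using different positions and conditions than the ones asked for.

```r
# Hypothetical example vector; not the `vec` from the assignment.
demo <- c(12, 87, 45, 3, 64, 29, 90, 51, 8, 76)

demo                  # typing an object's name prints it to the console
head(demo, 4)         # first 4 elements
tail(demo, 2)         # last 2 elements
demo[c(2, 6)]         # elements at positions 2 and 6
demo[demo > 60]       # logical subsetting: values greater than 60
demo[demo %% 2 == 0]  # modulo operator: values divisible by two (even)
```

Logical subsetting works because the comparison inside the brackets returns a TRUE/FALSE vector of the same length as demo, and only the TRUE positions are kept.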
Question 2
Use these data on drinking and wages from Elderton and Pearson (1910) (data; documentation) for the following.
- Read in the data using read.csv() and store it as DrinksWages.[1] (1 Point)
- Print the column names of DrinksWages. (1 Point)
- What type of measurement (Nominal, Ordinal, Interval, or Ratio) is sober? Explain why. (2 Points)
- What fraction (or percentage) of observations of the wage variable are larger than 30? (2 Points)
- Create a new variable inside DrinksWages called total. This variable should reflect the sum of the two variables sober and drinks for each observation. (2 Points)
- Create a new variable inside DrinksWages called x. This variable should be drinks divided by total. In words, how can we interpret what x represents? (3 Points)
- Plot x on the \(X\)-axis and wages on the \(Y\)-axis. Each point should be colored based on its class value.[2] Use pch = 19. Label your axes appropriately and include a legend. (4 Points)
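The column operations and the colored scatter plot described above might look roughly like the following on a toy data.frame. Everything here is an assumption made for illustration: the object toy, its columns (group, a, b, y), its values, and the color choices are hypothetical and are not the DrinksWages data.

```r
# Hypothetical toy data; names and values are illustrative assumptions.
toy <- data.frame(
  group = c("A", "B", "C", "A", "B", "C"),
  a     = c(2, 5, 1, 4, 3, 6),
  b     = c(8, 5, 3, 4, 9, 2),
  y     = c(20, 31, 27, 35, 24, 33)
)

names(toy)         # column names
mean(toy$y > 30)   # share of rows where y exceeds 30 (mean of TRUE/FALSE)

toy$sum_ab <- toy$a + toy$b        # new column: row-wise sum of a and b
toy$share  <- toy$a / toy$sum_ab   # new column: a as a share of that sum

# Map each group to a color through the factor's integer codes.
grp  <- factor(toy$group)
cols <- c("steelblue", "tomato", "darkgreen")

plot(toy$share, toy$y, col = cols[as.integer(grp)], pch = 19,
     xlab = "Share (a / (a + b))", ylab = "y")
legend("topright", legend = levels(grp), col = cols, pch = 19)
```

Taking the mean of a logical vector is a compact way to get a fraction, since TRUE counts as 1 and FALSE as 0. Indexing the color vector by as.integer(grp) keeps the point colors and the legend entries in the same order.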
Question 3
This one is challenging. This .pdf file contains SAT scores over time for men and women. Unfortunately, when I copied it into a .csv, the formatting got messed up.[3]
- Recreate the table format of the PDF with sat2.csv as a data.frame.[4] Rename the year and total math columns (the first one) to year and total_math. (6 Points)
- Convert the total math variable into a numeric variable. This step will likely create some NA values in the variable, so remove the rows where total math is missing. (4 Points)
- Finally, remove all of the columns except year and total_math from the dataset. (4 Points)
- Make a plot with two lines. On the \(X\)-axis should be year and on the \(Y\)-axis should be total math. The first line should be SAT scores from the beginning of the sample until 1999, and another from 2000 and onwards. The two lines should be different colors. Be sure to label the axes and include a legend. (6 Points)
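The general workflow above (one long vector, then a matrix, then a data.frame, then cleaning and plotting) can be sketched on made-up data. This is only an illustration under stated assumptions: the vector flat, its values, the three-column layout, the row-by-row ordering (byrow = TRUE), and the column names yr and score are all hypothetical, and the real sat2.csv may be arranged differently.

```r
# Hypothetical flattened table: one long character vector, stored row by row,
# with three values per original table row (an assumption for this sketch).
flat <- c("1997", "511", "505",
          "1998", "n/a", "505",
          "1999", "511", "505",
          "2000", "514", "505",
          "2001", "514", "506",
          "2002", "516", "504")

# Refill the vector into a 3-column matrix, one table row per matrix row.
m  <- matrix(flat, ncol = 3, byrow = TRUE)
df <- as.data.frame(m, stringsAsFactors = FALSE)
names(df)[1:2] <- c("yr", "score")   # rename the first two columns

# as.numeric() turns non-numeric text such as "n/a" into NA (with a warning).
df$yr    <- as.numeric(df$yr)
df$score <- suppressWarnings(as.numeric(df$score))
df <- df[!is.na(df$score), c("yr", "score")]  # drop NA rows, keep two columns

# Split the series at 2000 and draw each part in its own color.
early <- df[df$yr < 2000, ]
late  <- df[df$yr >= 2000, ]
plot(df$yr, df$score, type = "n", xlab = "Year", ylab = "Score")
lines(early$yr, early$score, col = "steelblue", lwd = 2)
lines(late$yr,  late$score,  col = "tomato",    lwd = 2)
legend("topleft", legend = c("Before 2000", "2000 onward"),
       col = c("steelblue", "tomato"), lwd = 2)
```

Calling plot() with type = "n" sets up the axes from the full data without drawing anything, so both lines() calls share the same scale.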
Footnotes
[1] Hint: you can either use the hyperlink’s URL directly or click the hyperlink, download the file, and read it in via its file path. Also, you might find it helpful to read through the data’s documentation so you know what the different columns/variables mean.
[2] You are free to choose whatever colors you’d like, but please avoid using “yellow” for the sake of our eyes. It’s blindingly bright against the white background.
[3] You’ll run into this a lot when working with .pdf files, so avoid them!
[4] Hint: After you read in the data, try printing it to your console. You should see one long vector. Try reorganizing the vector into a matrix, and then the matrix into a data.frame.