Homework 1.1

Author: Alex Cardazzi

Affiliation: Old Dominion University

All materials can be found at alexcardazzi.github.io.

Question 1

Copy and paste the following vector into R: vec <- sample(1:100, 10, FALSE). Use this vector for the following.

  1. Print vec to the console. (2 Points)

  2. Print the first 5 elements of vec. (2 Points)

  3. Print the last 3 elements of vec. (2 Points)

  4. Print the 3rd and 8th elements of vec. (3 Points)

  5. Print the elements of vec that are larger than 50. (3 Points)

  6. Print the elements of vec that are odd (not divisible by two). (3 Points)
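
The indexing tools these tasks rely on can be sketched on a separate, fixed vector. This is only an illustration under assumptions: the name toy, its values, and the chosen positions and conditions are made up and are not the assignment's vec.

```r
# Toy vector with fixed values so results are reproducible
# (hypothetical; not the assignment's vec).
toy <- c(12, 7, 55, 90, 33, 68, 41, 4, 76, 29)

first_five <- head(toy, 5)        # first five elements; same as toy[1:5]
last_three <- tail(toy, 3)        # last three elements
picked     <- toy[c(2, 9)]        # elements at chosen positions
above_60   <- toy[toy > 60]       # logical indexing: keep values above 60
evens      <- toy[toy %% 2 == 0]  # %% is the remainder operator: keep even values

print(first_five)
print(last_three)
print(picked)
print(above_60)
print(evens)
```

Typing a name by itself at the console prints it as well; print() is used here so the sketch also works when run as a script.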

Question 2

Use these data on drinking and wages from Elderton and Pearson (1910) (data; documentation) for the following.

  1. Read in the data using read.csv() and store it as DrinksWages.[1] (1 Point)

  2. Print the column names of DrinksWages. (1 Point)

  3. What type of measurement (Nominal, Ordinal, Interval, or Ratio) is sober? Explain why. (2 Points)

  4. What fraction (or percentage) of observations of the wage variable are larger than 30? (2 Points)

  5. Create a new variable inside DrinksWages called total. This variable should reflect the sum of the two variables sober and drinks for each observation. (2 Points)

  6. Create a new variable inside DrinksWages called x. This variable should be drinks divided by total. In words, how can we interpret what x represents? (3 Points)

  7. Plot x on the \(X\)-axis and wage on the \(Y\)-axis. Each point should be colored based on its class value.[2] Use pch = 19. Label your axes appropriately and include a legend. (4 Points)
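
The general pattern behind these steps (a share computed from a logical condition, derived columns, and a scatter plot colored by group) might look like the sketch below. Everything in it is hypothetical: the data frame toy, its columns, and the color choices are made up for illustration and do not come from the DrinksWages data.

```r
# Hypothetical toy data; column names are made up for illustration.
toy <- data.frame(
  group  = c("A", "B", "C", "A", "B", "C"),
  wins   = c(3, 5, 2, 8, 1, 6),
  losses = c(7, 5, 8, 2, 9, 4),
  score  = c(10, 25, 31, 40, 18, 35),
  stringsAsFactors = FALSE
)

print(names(toy))  # column names

# mean() of a logical vector is the share of TRUE values.
share_high <- mean(toy$score > 30)

# New columns are created by assigning to a name with $.
toy$games    <- toy$wins + toy$losses
toy$win_rate <- toy$wins / toy$games

# A named vector maps each group label to a color.
palette_by_group <- c(A = "steelblue", B = "firebrick", C = "darkgreen")

plot(toy$win_rate, toy$score,
     col = palette_by_group[toy$group], pch = 19,
     xlab = "Win rate", ylab = "Score")
legend("topleft", legend = names(palette_by_group),
       col = palette_by_group, pch = 19, title = "Group")
```

Indexing the named color vector by the group column keeps the point colors and the legend in sync, since both are drawn from the same lookup.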

Question 3

This one is challenging. This .pdf file contains SAT scores over time for men and women. Unfortunately, when I copied it into a .csv, the formatting got messed up.[3]

  1. Recreate the table format of the PDF with sat2.csv as a data.frame.[4] Rename the year column and the first total math column to year and total_math. (6 Points)

  2. Convert the total math variable into a numeric variable. This step will likely create some NA values in the variable, so remove the rows where total math is missing. (4 Points)

  3. Finally, remove all of the columns except year and total_math from the dataset. (4 Points)

  4. Make a plot with two lines. Put year on the \(X\)-axis and total math on the \(Y\)-axis. The first line should show SAT scores from the beginning of the sample through 1999, and the second should show scores from 2000 onward. The two lines should be different colors. Be sure to label the axes and include a legend. (6 Points)
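
The cleaning and plotting pattern described in these steps can be sketched on made-up data. The data frame toy, its columns, and the split point are assumptions chosen for illustration; the real file will need its own column names and cutoff.

```r
# Hypothetical toy data: values stored as text, with one non-numeric entry.
toy <- data.frame(
  period = 1:8,
  value  = c("10", "12", "n/a", "15", "14", "18", "21", "20"),
  note   = "unused",
  stringsAsFactors = FALSE
)

# as.numeric() turns text it cannot parse into NA (and warns about it).
toy$value <- suppressWarnings(as.numeric(toy$value))

# Keep only complete rows and only the columns of interest.
toy <- toy[!is.na(toy$value), c("period", "value")]

# Split the data into two segments.
early <- toy[toy$period <= 4, ]
late  <- toy[toy$period >= 5, ]

# Set up empty axes covering all the data, then add one line per segment.
plot(toy$period, toy$value, type = "n", xlab = "Period", ylab = "Value")
lines(early$period, early$value, col = "steelblue", lwd = 2)
lines(late$period, late$value, col = "firebrick", lwd = 2)
legend("topleft", legend = c("Periods 1-4", "Periods 5-8"),
       col = c("steelblue", "firebrick"), lwd = 2)
```

Starting with type = "n" sets the axis ranges from the full data, so neither line is clipped when it is added afterwards.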

Footnotes

  1. Hint: you can either use the hyperlink’s URL or click on the hyperlink, download the file, and read it in via its file path. Also, you might find it helpful to read through the data’s documentation so you know what the different columns/variables mean.

  2. You are free to choose whatever colors you’d like, but please avoid using “yellow” for the sake of our eyes. It’s blindingly bright against the white background.

  3. You’ll run into this a lot when working with .pdf files, so avoid them!

  4. Hint: After you read in the data, try printing it to your console. You should see one long vector. Try reorganizing the vector into a matrix, and then the matrix into a data.frame.
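
The vector-to-matrix-to-data.frame idea from the last hint can be sketched as follows. The vector flat, its contents, and the assumption that values are stored one row after another with three values per row are all hypothetical; the real file may be laid out differently.

```r
# Hypothetical long vector: three records of (id, x, y) flattened row by row.
flat <- c("a", "1", "2", "b", "3", "4", "c", "5", "6")

# byrow = TRUE fills across rows, so every 3 consecutive values form one row.
m <- matrix(flat, ncol = 3, byrow = TRUE)

# Convert the matrix to a data.frame and give the columns readable names.
df <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(df) <- c("id", "x", "y")

print(df)
```

Because a matrix holds a single type, every column in df is still character at this point; numeric columns need a separate as.numeric() conversion.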