Homework 1.1
All materials can be found at alexcardazzi.github.io.
Question 1
Copy and paste the following line into R to create a vector: vec <- sample(1:100, 10, FALSE). Use this vector for the following.
- Print vec to the console. (2 Points)
- Print the first 5 elements of vec. (2 Points)
- Print the last 3 elements of vec. (2 Points)
- Print the 3rd and 8th elements of vec. (3 Points)
- Print the elements of vec that are larger than 50. (3 Points)
- Print the elements of vec that are odd (not divisible by two). (3 Points)
Question 2
Use these data on drinking and wages from Elderton and Pearson (1910) (data; documentation) for the following.
- Read in the data using read.csv() and store it as DrinksWages.[1] (1 Point)
- Print the column names of DrinksWages. (1 Point)
- What type of measurement (Nominal, Ordinal, Interval, or Ratio) is sober? Explain why. (2 Points)
- What fraction (or percentage) of observations of the wage variable are larger than 30? (2 Points)
- Create a new variable inside DrinksWages called total. This variable should be the sum of the two variables sober and drinks for each observation. (2 Points)
- Create a new variable inside DrinksWages called x. This variable should be drinks divided by total. In words, how can we interpret what x represents? (3 Points)
- Plot x on the \(X\)-axis and wages on the \(Y\)-axis. Each point should be colored based on its class value.[2] Use pch = 19. Label your axes appropriately and include a legend. (4 Points)
Question 3
This one is challenging. This .pdf file contains SAT scores over time for men and women. Unfortunately, when I copied it into a .csv, the formatting got messed up.[3]
- Recreate the table format of the PDF with sat2.csv as a data.frame.[4] Rename the year column and the first total math column to year and total_math. (6 Points)
- Convert the total math variable into a numeric variable. This step will likely create some NA values in the variable, so remove the rows where total math is missing. (4 Points)
- Finally, remove all of the columns except year and total_math from the dataset. (4 Points)
- Make a plot with two lines. Year should be on the \(X\)-axis and total math on the \(Y\)-axis. The first line should show SAT scores from the beginning of the sample until 1999, and the second should show scores from 2000 onward. The two lines should be different colors. Be sure to label the axes and include a legend. (6 Points)
Footnotes
[1] Hint: you can either use the hyperlink’s URL directly or click on the hyperlink, download the file, and read it in via its file path. Also, you might find it helpful to read through the data’s documentation so you know what the different columns/variables mean.
[2] You are free to choose whatever colors you’d like, but please avoid using “yellow” for the sake of our eyes. It’s blindingly bright against the white background.
[3] You’ll run into this a lot when working with .pdf files, so avoid them!
[4] Hint: After you read in the data, try printing it to your console. You should see one long vector. Try reorganizing the vector into a matrix, and then the matrix into a data.frame.
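The vector-to-matrix-to-data.frame idea in the last hint can be sketched on toy data. The values and column names below (flat, score_a, score_b) are made up for illustration and are not taken from sat2.csv; the real file will have a different number of columns and may need extra cleanup.

```r
# Toy stand-in for a table that was flattened into one long vector.
# These values are hypothetical; they are NOT the contents of sat2.csv.
flat <- c("year", "score_a", "score_b",
          "2001", "10", "20",
          "2002", "11", "21",
          "2003", "12", "22")

# Fill the matrix row by row; ncol is the number of columns in the original table.
m <- matrix(flat, ncol = 3, byrow = TRUE)

# Drop the header row, convert the rest to a data.frame, then reuse the header as names.
toy <- as.data.frame(m[-1, ], stringsAsFactors = FALSE)
colnames(toy) <- m[1, ]

# Every column is still character at this point, so convert where needed.
toy$year <- as.numeric(toy$year)

print(toy)
str(toy)
```

Setting byrow = TRUE matters here because matrix() fills column by column by default, which would scramble a table that was flattened one row at a time.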