Module 2.4: Functions and Loops
Old Dominion University
In this portion of the module, we’re going to learn more about R. First, we’re going to learn how to write our own functions. R is open source, meaning that people can write software for others (or themselves) to use. For example, modelsummary was written (and a lot of the data we’ll use are provided) by Vincent Arel-Bundock.
Functions in R, or computer programming languages more broadly, take some input (called arguments), do stuff to that input, and then return some output. This is the same as a mathematical function. Consider \(Y = mX + b\). In this function, \(X\) is the input, the “stuff” is multiplying by \(m\) and adding \(b\), and the output is \(Y\).
Let’s write our first function, which we’ll call plus2. The purpose of plus2 is to add 2 to any number we provide. Try playing with the code below:
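The interactive cell is not reproduced here, so this is a minimal sketch of what plus2 could look like. The function body is an assumption based on the description above.

```r
# A sketch of plus2: takes one argument, x, and returns x + 2
plus2 <- function(x) {
  x + 2
}

plus2(1)    # a single number
plus2(1:5)  # also works element-wise on a vector
```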
Notice how plus2 accepts a single argument (x), and returns a value of x + 2. We can define a slightly more complex function that accepts two different arguments. We’ll call this function plusY, and it will accept x and y arguments that it will then add together. Try building this function below:
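One way this could look, as a sketch rather than the only answer:

```r
# A sketch of plusY: takes two arguments and returns their sum
plusY <- function(x, y) {
  x + y
}

plusY(2, 3)
```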
What happens if you only give plusY() a single argument instead of two? This would generate an error – check for yourself. There are a few ways we can fix this. One way is to set default values for the arguments in our function. Below, edit plusY() so y = 0 in the function, which will set zero as y’s default.
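A sketch of the edited function, assuming the simple two-argument plusY() described above:

```r
# y = 0 in the argument list makes 0 the default whenever y is not supplied
plusY <- function(x, y = 0) {
  x + y
}

plusY(4)     # y falls back to its default of 0
plusY(4, 6)  # an explicit y overrides the default
```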
Of course, we can set both defaults to zero. Then, we could technically call plusY() without any inputs, and it will return 0.
What if it doesn’t make sense to set a default? Or, maybe you want to make sure the two arguments “make sense”. For example:
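The original example is not reproduced here. One possibility, assuming a plusY() with both defaults set to zero, is passing text where a number is expected. The tryCatch() wrapper is only there so the script keeps running and we can inspect the error message.

```r
plusY <- function(x = 0, y = 0) {
  x + y
}

# Adding a character string to a number makes no sense, so R raises an error.
# tryCatch() catches the error and hands back its message as a string.
msg <- tryCatch(
  plusY("two", 2),
  error = function(e) conditionMessage(e)
)
msg
```

In an English-language R session, the message says that a non-numeric argument was passed to a binary operator.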
Sometimes, we want a way to “check” the values of arguments so things like the above don’t happen. This is an instance where you could use R’s built-in function if(). This will allow your program to “make decisions” based on pre-programmed logic.
I will demonstrate how to use if() in our plusY() function:
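The demonstration code itself is missing from this copy. A sketch that is consistent with the output shown, assuming the check is done with is.numeric() and that the inputs were 5 and 7, might be:

```r
plusY <- function(x = 0, y = 0) {
  # If either argument is not a number, return a message instead of adding
  if (!is.numeric(x) || !is.numeric(y)) {
    return("One of either x or y are non-numeric.")
  }
  x + y
}

plusY(5, 7)    # both numeric, so the sum is returned
plusY("5", 7)  # "5" is text, so the message is returned
```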
[1] 12
[1] "One of either x or y are non-numeric."
Let’s practice by making a function called summary2(). We want this function to calculate sample size, mean, standard deviation, minimum, and maximum. We also want functionality like na.rm, except we want it to always be true. In addition, the function should return the number of observations it had to drop because of missingness. Our “practice” vector will be c(6, NA, 8, 2, NA, NA, 9, 5, 1, 4).
Our first step should be to figure out which observations are missing. We can do this with is.na(). This will return a Boolean (logical) vector telling us which elements are missing.
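For instance, on the practice vector (the name is_missing is just for illustration):

```r
y <- c(6, NA, 8, 2, NA, NA, 9, 5, 1, 4)

# TRUE where an element is NA, FALSE otherwise
is_missing <- is.na(y)
is_missing
```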
Now that we know which are missing, calculate how many are missing, and then remove them.
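A short sketch of both steps on the practice vector (y_clean is an illustrative name):

```r
y <- c(6, NA, 8, 2, NA, NA, 9, 5, 1, 4)

# TRUE counts as 1 and FALSE as 0, so summing gives the number of missing values
n_missing <- sum(is.na(y))

# !is.na(y) flips TRUE and FALSE, so this keeps only the non-missing elements
y_clean <- y[!is.na(y)]

n_missing
y_clean
```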
Now we can focus on returning the statistical information we’re interested in.
Try putting all of this together into a function:
summary2 <- function(x) {
  # count the missing values, then drop them (like na.rm = TRUE, always on)
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]
  # collect the statistics in a named vector
  result <- c("Sample Size" = length(x),
              "NA Obs." = n_missing,
              "Mean" = mean(x),
              "St. Dev." = sd(x),
              "Min." = min(x),
              "Max." = max(x))
  return(result)
}
y <- c(6, NA, 8, 2, NA, NA, 9, 5, 1, 4)
summary2(x = y)
# generate 10,000 random draws from c(1:9, NA)
y <- sample(c(1:9, NA), 10000, replace = TRUE)
summary2(x = y)
Sample Size     NA Obs.        Mean    St. Dev.        Min.        Max.
    7.00000     3.00000     5.00000     2.94392     1.00000     9.00000
Sample Size     NA Obs.        Mean    St. Dev.        Min.        Max.
8992.000000 1008.000000    4.980761    2.595344    1.000000    9.000000
Next, we will introduce how to write loops in R.
Suppose we are interested in executing the same code over and over and over. As an example, suppose I wanted to write some code to print every student name and grade. Of course, I could write the following:
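The original code cell is missing here. A plausible version, assuming cat() is used for printing, would repeat one line per student:

```r
# One hand-written line per student: it works, but it does not scale
cat("Name: Alex ... Grade: B\n")
cat("Name: Brooke ... Grade: A\n")
cat("Name: Carlos ... Grade: A\n")
cat("Name: Dasia ... Grade: B\n")
cat("Name: Enzo ... Grade: C\n")
```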
Name: Alex ... Grade: B
Name: Brooke ... Grade: A
Name: Carlos ... Grade: A
Name: Dasia ... Grade: B
Name: Enzo ... Grade: C
This code works, but there are some problems. First, writing this out so many times makes it prone to typos, even if just copying and pasting. Second, almost anything that is repetitive or has an identifiable pattern is easier for a computer than for a human. Lastly, what if there were 100 names instead of 5? What if 10,000? This is where loops come in.
Our first step is going to be creating vectors of names and grades. Ideally, you’d have these data in a spreadsheet/csv and could easily read it in using read.csv().
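Since there is no file to read here, the vectors can simply be typed in. The name namez comes from the text below; gradez is an assumed name for the matching grades vector.

```r
# Names and grades, in the same order, taken from the printed output above
namez  <- c("Alex", "Brooke", "Carlos", "Dasia", "Enzo")
gradez <- c("B", "A", "A", "B", "C")

length(namez)  # how many students we have
```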
Second, we’re going to write our loop. The loop needs two things:
An iterator (index variable): this is usually i, but it can be anything, of course.
Bounds for the iterator: for example 1:5, or 1:length(namez) to be even more flexible.
The iterator and bounds in for loops are similar to the iterator and bounds in \(\sum_{i = 1}^n\).
Let’s just fill the loop with a simple printing statement to illustrate what the loop does.
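A minimal sketch, using i as the iterator and 1:5 as the bounds:

```r
# The body runs once for each value in 1:5, with i taking that value
for (i in 1:5) {
  print(i)
}
```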
Remember, to access the first name in namez, we would use the following: namez[1]. Similarly, namez[2] would return the second element, and namez[length(namez)] would give the last element. Instead of putting a specific element in the square brackets, we can put the index variable there. Then, we can put this into our loop:
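A sketch of that idea, with namez defined as in the output above:

```r
namez <- c("Alex", "Brooke", "Carlos", "Dasia", "Enzo")

# namez[i] picks out a different name on each pass through the loop
for (i in 1:length(namez)) {
  print(namez[i])
}
```

If namez could ever be empty, seq_along(namez) is a safer set of bounds than 1:length(namez), because 1:0 counts backwards instead of skipping the loop.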
Finally, we can put this all together and generate our initial output.
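One way to write it, assuming a grades vector called gradez (an illustrative name) and using paste0() with cat() to build each line:

```r
namez  <- c("Alex", "Brooke", "Carlos", "Dasia", "Enzo")
gradez <- c("B", "A", "A", "B", "C")

# Same index i is used for both vectors, so each name stays with its grade
for (i in 1:length(namez)) {
  cat(paste0("Name: ", namez[i], " ... Grade: ", gradez[i], "\n"))
}
```

This prints the same five lines as the hand-written version.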
Now, consider if we had many more names and grades. We could have thousands of names and grades and our little for loop would remain the same!
We are not limited to just numeric iterators/bounds. Sometimes, using non-numeric ones is helpful too:
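As a sketch (the original example is not shown here), the loop below iterates directly over grade letters and counts the students with each one. The vector gradez is an assumed name.

```r
gradez <- c("B", "A", "A", "B", "C")

# g takes the values "A", "B", "C" in turn; no numeric index is needed
for (g in c("A", "B", "C")) {
  cat(paste0("Grade ", g, ": ", sum(gradez == g), " student(s)\n"))
}
```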
We can also use loops to create or modify data. As an example, we’re going to write a function that returns the nth element in the Fibonacci Sequence.
Element \(n\) in the Fibonacci Sequence is simply a sum of the previous two elements. Explicitly, \(F_n = F_{n-1} + F_{n-2}\). Usually, people start the sequence with 0 and 1, which makes the third element equal to 1 (1 + 0), the fourth element equal to 2 (1 + 1), the fifth element equal to 3 (2 + 1), and so on. Our function should accept some number \(n\) and return that element within the sequence.
To do this, let’s set up the function:
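The original skeleton is missing from this copy. A possible starting point, where the function name fib is an assumption, is:

```r
# Skeleton only: at this stage it is correct for n = 1 and n = 2 and nothing else
fib <- function(n) {
  v <- c(0, 1)  # the first two elements of the sequence
  # some code here...
  v[n]          # return element n
}

fib(2)
```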
Now, we need to fill in the “some code here...” part of the function. Again, the formula for any element \(n\) is just \(F_n = F_{n-1} + F_{n-2}\). So, to calculate \(F_n\), we need these other two numbers. However, to get \(F_{n-2}\), for example, we need \(F_{n-3}\) and \(F_{n-4}\). Obviously, this continues back until we arrive at \(F_1\) and \(F_2\). This suggests that we’ll need to calculate each element in the sequence until we arrive at \(F_{n}\).
Let’s start with the third element. Since we have v <- c(0, 1) already, we can write v[3] <- v[2] + v[1]. This can also be written as: v[3] <- v[3-1] + v[3-2]. Once we have v[3] established, we can calculate v[4] as v[3] + v[2], or v[4] <- v[4-1] + v[4-2]. We would repeat this for 5, 6, 7, and all the way until \(n\). Hopefully, you can see that we could also generalize this code to look like v[i] <- v[i-1] + v[i-2] inside of a for loop. Our loop bounds would start at 3 (since elements 1 and 2 are established as 0 and 1 already), and we would continue the loop until \(n\).
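Putting the pieces together might look like the sketch below. The function name fib is an assumption, and the if (n >= 3) guard is an addition: without it, 3:n would count backwards for n = 1 or n = 2.

```r
fib <- function(n) {
  v <- c(0, 1)  # elements 1 and 2 of the sequence
  if (n >= 3) {
    # fill in elements 3 through n, each from the two before it
    for (i in 3:n) {
      v[i] <- v[i - 1] + v[i - 2]
    }
  }
  v[n]
}

fib(5)
fib(10)
```

With the sequence starting at 0 and 1, fib(5) returns 3, matching the fifth element described above.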
We can build “logic” into our loops by adding in if statements, too. Take a look at the following example using names and grades again. Here, I will change what gets printed based on the grade the individual obtained.
Name: Alex ... you're doing a good job!
Name: Brooke ... crushin' it!
Name: Carlos ... crushin' it!
Name: Dasia ... you're doing a good job!
Name: Enzo ... keep studying!
If you remember, in Module 1 we colored points based on certain characteristics. We did this by first setting all colors to be the same value, and then changed the color of certain observations based on their values. See below.
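The Module 1 code is not reproduced here, so this sketch uses a made-up vector (score), a made-up cutoff, and made-up colors purely to show the two-step pattern.

```r
# Hypothetical data standing in for the Module 1 example
score <- c(3, 8, 5, 10, 1)

# Step 1: give every observation the same color
colorz <- rep("black", length(score))

# Step 2: overwrite the color for observations that meet a condition
colorz[score > 5] <- "red"
colorz
```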
We can use loops to achieve a similar outcome. You can run the code snippets above and below to see for yourself that the results are the same.
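A loop-based sketch of the same idea, again with a hypothetical score vector, cutoff, and colors:

```r
# Hypothetical data, not the Module 1 data
score <- c(3, 8, 5, 10, 1)

colorz_loop <- rep("black", length(score))
for (i in 1:length(score)) {
  # check one observation at a time and recolor it if it passes the cutoff
  if (score[i] > 5) {
    colorz_loop[i] <- "red"
  }
}
colorz_loop
```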
ifelse()
While both of these solutions arrive at the same answer, there is a better option.
First, in general, loops are relatively slow in R. It might not seem so when dealing with small samples, but it becomes noticeable as the data grow larger.
Second, as people often say, “lazy” programming is good programming! We should be writing as little as possible, without sacrificing coherence/readability, to minimize mistakes, bugs, etc.
Introducing: ifelse(). This is a function that accepts three arguments:
test: an object which can be coerced to logical mode. In other words, some logical vector like v > 5.
yes: return values for true elements of test. In other words, what should be the output when v > 5 is TRUE?
no: return values for false elements of test. In other words, what should be the output when v > 5 is FALSE?
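A small sketch of the three arguments in action, using a made-up vector v and made-up color labels:

```r
# Hypothetical values for illustration
v <- c(3, 8, 5, 10, 1)

# test = v > 5, yes = "red", no = "black"; the whole vector is handled in one call
colorz <- ifelse(v > 5, "red", "black")
colorz
```

Each element of v is tested on its own, so the result has the same length as v.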
You can think of ifelse() as creating a for loop with if() statements inside of it. In fact, we can even nest ifelse() calls. For example, consider these data on U.S. Senate votes on the use of force against Iraq in 2002. For each observation, we want to assign some value (here, I chose to assign some text) by party and by vote.
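The Senate data are not included in this copy, so as a stand-in, here is the same nesting idea applied to the grades from earlier. The vector name gradez is an assumption.

```r
gradez <- c("B", "A", "A", "B", "C")

# The "no" argument of the outer ifelse() is itself another ifelse(),
# which handles everything that was not an "A"
msgz <- ifelse(gradez == "A", "crushin' it!",
               ifelse(gradez == "B", "you're doing a good job!",
                      "keep studying!"))
msgz
```

This replaces the loop with if, else if, and else branches from before with a single vectorized expression.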
ECON 311: Economics, Causality, and Analytics