Foundations

Module 1.2: Downloading R

Alex Cardazzi

Old Dominion University

Why use code?

You might be thinking, “why can’t we use Excel?”

It’s not that Excel is bad – it’s that R is better.

Using code allows for:

  • Reproducibility
  • Customization
  • Automation

Perhaps the best argument for code over Excel is fixed vs. marginal cost. Using Excel has a much lower fixed cost than learning to code: everyone has seen Excel, data is nicely formatted in cells, and you can point-and-click to generate nearly everything. Programming is a language, and learning a language is slow and potentially painful. However, once you have some code written and it works, you never have to write it again! In other words, the future marginal cost of coding is much lower.

Why use R?

So, your next question might be, “why R? My comp-sci friends use Python, C++, and SQL.”

  • R is becoming one of the most popular languages for data. R is mostly used for statistical analysis and visualization.
  • R is an open-source language. This means that it is free and developed by its users. This allows for rapid development of cutting-edge statistical methods, etc.
  • R is the language that I know best.

Some R Basics

The following might seem abstract at the moment, but will hopefully become more concrete and clear as time goes on.

  • Everything is an object with a name.
  • Functions are how you do things.
  • Functions come built-in (called base R) or can be loaded from libraries.
    • You can also write your own functions!
  • You can have multiple datasets loaded into R at once
    • The Excel analogue would be having multiple sheets.
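To make these points a bit more concrete, here is a minimal sketch. Every name in it (course_name, class_sizes, add_tax, gdp_data, unemployment_data) and every number is made up for illustration and is not part of any course dataset.

```r
# Everything is an object with a name: "<-" assigns a value to a name.
course_name <- "Foundations"
class_sizes <- c(25, 30, 18)

# Functions are how you do things: mean() and nchar() are built into base R.
average_size <- mean(class_sizes)
name_length <- nchar(course_name)

# Functions can also be loaded from libraries. The stats library ships with R,
# so this line works without installing anything.
library(stats)

# You can also write your own functions (add_tax is a hypothetical example).
add_tax <- function(price, rate = 0.06) {
  price * (1 + rate)
}
price_with_tax <- add_tax(100)

# Multiple datasets can be loaded at once, each under its own name.
gdp_data <- data.frame(country = c("A", "B"), gdp = c(1.5, 2.0))
unemployment_data <- data.frame(year = c(2020, 2021), rate = c(8.1, 5.4))

print(average_size)
print(price_with_tax)
```

Do not worry about the details yet. The point is only that objects, functions, and datasets all sit side by side under names you choose.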

Download R

First, we have to download R via cran.r-project.org. This is the actual language, and what you should think of as the “brain” of R.

  • This website should look like it was built in the 1990s or early 2000s.
  • Make sure to choose the correct operating system and latest version of R (4.3.1 at the time of writing).

Downloading RStudio

Second, we need to download RStudio via posit.co. If the previous download is the brain, you should think of RStudio as the body. We will only ever interact with R through RStudio in this course.

  • Again, make sure you select the correct operating system for your machine.
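Once both pieces are installed, one possible way to confirm which version of R RStudio is using is to run the line below in the Console. R.version.string is a value built into base R. The exact text it prints depends on your installation.

```r
# Print the version of R that is currently running.
print(R.version.string)
```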

Exploring RStudio

Screenshot of RStudio (on Windows).

Exploring RStudio

When you open RStudio for the first time, you will see four panels like in the previous image. It is likely that your version of RStudio has a white background with blue or black text. If you would like to change this, go to “Tools > Global Options… > Appearance > Editor theme”. I like a darker theme to make it easier on my eyes.

Exploring RStudio

The four panels are as follows:

  • Top Left: Source – This is where you will write the R code you want to save. In other words, this is where you write and save your work, usually called R scripts (.R files).
  • Bottom Left: Console – When you execute (or run) code, you will usually see output here. This is also a place you can write code you do not want to be part of your final script. If you were a painter, the Source panel would be your canvas and the Console would be your palette.
  • Top Right: Environment – Here is where we will be able to see all the objects (data, etc.) that we are working with in the moment. To clear your environment, use the code rm(list = ls()).
  • Bottom Right: Output – This is mostly where you will see plots you have generated, but you can also see files on your computer, packages you have installed, and “Help” for certain functions.
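As a small sketch of what clearing the environment does, the code below creates two objects, lists them, and then removes everything. The object names and values are hypothetical. Note that rm(list = ls()) deletes every object in your environment, so only run it when you are happy to start fresh.

```r
# Create two example objects (names and values are made up).
country_gdp <- 21.4
yearly_unemployment <- 3.7

# ls() lists the names of the objects currently in the environment,
# which is what the Environment panel displays.
print(ls())

# Remove every object that ls() lists.
rm(list = ls())

# The environment is now empty, so ls() has nothing to list.
print(length(ls()))
```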

Exploring RStudio

Screenshot of RStudio with panel labels.

Code Tips

Before we start writing code, here are some important tips and tricks that will make your life (our lives) easier.

  • The hardest part about coding is learning how to Google. You read that correctly. The best programmers are the best Googlers. There is a wealth of knowledge online, and knowing how to sift through it all is truly a skill.
  • Write yourself comments. You can do this by writing # before you type something. This will help you remember what your code does after you have been away from it for a long time. Sometimes, re-reading (deciphering) uncommented code is harder than re-writing it from scratch.
  • Your code probably won’t work the first time. Your code probably won’t work the first few times. However, when trying to fix something, only change one thing at a time.
  • Give objects informative names. It is easier to understand code when things are named “country_gdp” and “yearly_unemployment” rather than “gdp2” and “x”.
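Here is a short sketch of what commented code with informative names might look like. The numbers are invented purely for illustration, and the object names are just one reasonable choice.

```r
# Values below are made up for illustration.
country_gdp <- 500         # GDP, in billions of dollars
country_population <- 10   # population, in millions of people

# GDP per person, in thousands of dollars
# (billions divided by millions gives thousands).
gdp_per_capita <- country_gdp / country_population
print(gdp_per_capita)
```

Compare this to a version where the same objects are called “x”, “y”, and “z” with no comments: the arithmetic is identical, but the meaning is lost.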

Code Tips

As a final tip, and this one is important, we are going to change some default settings in RStudio.

  • Click on “Tools > Global Options… > General”
  • Uncheck “Restore .RData into workspace at startup”
  • Change “Save workspace to .RData on exit:” to Never

It might seem like these auto-saving features are a good idea, but trust me: you will be much better off without them.

Data Types

R has a few different data types:

  • Numbers: You can type a number into R and R will know its value.
  • Boolean: This data type is made up of TRUE and FALSE values. Think of this like binary values (0 and 1). The name comes from the mathematician George Boole.
  • Characters: This data type is reserved for text. Sometimes characters are called strings, but they are always found inside quotation marks. In R, you can use " or '.
  • Factors: Factors are a weird mix of characters and numbers. Perhaps the best way to think of them is as a categorical variable. In this course, we will generally avoid the use of Factors.
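One way to check which type R thinks something is would be the base R function class(). The sketch below uses made-up values. Note that R's own name for the boolean type is "logical".

```r
class(5)                         # "numeric"
class(TRUE)                      # "logical"
class("five")                    # "character"
class('five')                    # "character" (single quotes work too)
class(factor(c("low", "high")))  # "factor"

# Booleans behave like 0 and 1 when converted to numbers.
as.numeric(TRUE)                 # 1
as.numeric(FALSE)                # 0
```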

Evaluation

There are a few different ways to execute (or evaluate, or run) code in R. The easiest way is to highlight whatever you are interested in running and press Ctrl + Enter (Cmd + Enter on Mac). You can also just place your cursor on a line and use the same keys to run that specific line. There is also a button at the top right of the Source panel that says “Run”, which will do the same thing.

WebR

Before moving on to explore basic operations, I want to mention something you’ll see embedded throughout this course. I will be exhibiting code in each module in static code blocks. These code blocks, sometimes called code chunks, might generate output, plots, both, or nothing. Unfortunately, these code blocks are, for all intents and purposes, set in stone. In other words, besides collapsing/expanding them, you cannot really interact or experiment with them. This probably stifles student curiosity, since you’ll probably want to tweak things as you’re going through the notes.

WebR

To address this, I have included WebR chunks in each module’s notes. These chunks will look a bit different from the static chunks, and I encourage you to interact with them! You can write, alter, and execute code inside each chunk, and each WebR chunk will “remember” what you’ve run in other chunks. Go ahead and explore a bit with the chunks below:

Code
# This is a static chunk
# Notice how you cannot modify what's written here.

Basic Operations

Starting from data types, we can begin to perform operations on data.

Arithmetic for numeric values: Addition (+), Subtraction (-), Multiplication (*), Division (/)

Most, if not all, of the code blocks (and output) in this course will be collapsible. In other words, if you click on them, you can hide/display the code.

Code
# Example Comment.  Get ready to math.
5 + 5
10 / 3
4 + 3 * 100 # Another comment. Something about PEMDAS.
(4 + 3) * 100 # Something else about PEMDAS.
Output
[1] 10
[1] 3.333333
[1] 304
[1] 700

Basic Operations

Logic for boolean values: And (&), Or (|), Not (!)

  • “And” and “Or” take two boolean values and combine them into a single boolean.
    • “And” returns TRUE only when both values are TRUE.
    • “Or” returns TRUE when at least one value is TRUE. It returns FALSE only when both values are FALSE.
  • “Not” negates a single boolean.
  • Think of logic like English:
    • “The sky is green AND the grass is green” is a FALSE statement.
    • “The sky is green OR the grass is green” is a TRUE statement.
Code
TRUE & FALSE
TRUE | TRUE
TRUE & !FALSE
Output
[1] FALSE
[1] TRUE
[1] TRUE
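The two English sentences about the sky and the grass can be written as a small sketch in R. The object names sky_is_green and grass_is_green are invented for this example.

```r
# Store each claim as a boolean (names are hypothetical).
sky_is_green <- FALSE
grass_is_green <- TRUE

sky_is_green & grass_is_green   # FALSE: "And" needs both to be TRUE
sky_is_green | grass_is_green   # TRUE: "Or" needs at least one TRUE
!sky_is_green                   # TRUE: "Not" flips FALSE to TRUE
```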

Basic Operations

Some other operations to note are <, >, >=, and <=. These sit in between numerical and logical operations: they compare two numbers and return a boolean.

Code
5 < 3
5 > 3
5 > 3 & 4 > 3
5 > 3 | 4 > 5
Output
[1] FALSE
[1] TRUE
[1] TRUE
[1] TRUE
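Because a comparison returns a boolean, its result can be stored and combined like any other boolean. The following is only a sketch, using a hypothetical value and hypothetical object names.

```r
# Hypothetical value, chosen only for illustration.
yearly_unemployment <- 3.7

# A comparison returns a boolean, which can be stored like any other object.
is_below_five <- yearly_unemployment < 5
class(is_below_five)   # "logical"

# Comparisons can be combined with & and |, just like TRUE and FALSE.
is_in_range <- yearly_unemployment >= 3 & yearly_unemployment <= 4
print(is_in_range)     # TRUE
```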

Basic Operations

We will discuss operations for characters and factors later in the course. However, there may be times when you will want to convert data from one type to another. To convert from a number or boolean to character, you can use as.character(). To go from text to numeric, you can use as.numeric(), and to go from text to boolean, you can use as.logical().

Code
as.character(5)
as.numeric("5")
as.logical("FALSE")
as.numeric("Five")
Output
[1] "5"
[1] 5
[1] FALSE
[1] NA

Notice how the final line produces an NA value. Seeing an NA value is seeing R shrug its shoulders. R is not smart enough to know that "Five" is 5, so it returns a missing value. NA values can mess up a lot of things in R. For example, what is the average of this collection of numbers: 2, 4, NA, 8? R will return NA when asked, because it isn’t sure how to think about the NA. A helpful function, therefore, is is.na(). This returns TRUE when its input is NA and FALSE otherwise.

Code
is.na(as.numeric("5"))
is.na(as.numeric("Five"))
Output
[1] FALSE
[1] TRUE

As a final note, if you run the above in RStudio, you might get output saying Warning: NAs introduced by coercion. This is R giving you a heads up about what I just mentioned above. Sometimes, this warning is expected, but other times it’s a good signal to check your data!
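Returning to the averaging question from earlier (2, 4, NA, 8), here is a minimal sketch of how R behaves. The object name values is made up. The na.rm argument of mean() is one common way to skip missing values, though whether dropping them is appropriate depends on your data.

```r
# The collection of numbers from the example, including a missing value.
values <- c(2, 4, NA, 8)

mean(values)                   # NA, because one value is missing
is.na(values)                  # FALSE FALSE TRUE FALSE

# Option 1: ask mean() to drop missing values before averaging.
mean(values, na.rm = TRUE)     # (2 + 4 + 8) / 3

# Option 2: keep only the non-missing values yourself, then average.
mean(values[!is.na(values)])   # same result as Option 1
```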