Module 1.1: Types of Data
Old Dominion University
Nowadays, data is (are?) everywhere. Data has become a buzzword for those in business, policy, government, industry, science, etc. Understanding how to work with data, therefore, is becoming an increasingly valuable and marketable skill. Employers want to hire people who can use data.
However, everyone “knows” Excel. Everyone “knows” what a correlation is. So, how can you differentiate yourself from others? Put differently, how can you send a credible signal to employers that you really know data?
This course will (re-)introduce you to statistics, econometrics, and R.
A statistic is a measurement from some data.
Descriptive Statistics
Descriptive statistics are meant to organize, summarize, and present data in ways that help people understand key facts about the data.
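As a minimal sketch of what "organize, summarize, and present" can look like in R, the block below works with a small, made-up vector of heights. The values and the name `heights_in` are illustrative assumptions, not course data.

```r
# Hypothetical heights in inches for ten people (made-up values for illustration)
heights_in <- c(62, 64, 65, 66, 67, 68, 69, 70, 72, 75)

# Organize: sort the values from smallest to largest
sorted_heights <- sort(heights_in)

# Summarize: center and spread
avg_height    <- mean(heights_in)    # arithmetic mean
median_height <- median(heights_in)  # middle value
sd_height     <- sd(heights_in)      # sample standard deviation

# Present: minimum, quartiles, mean, and maximum in one table
summary(heights_in)
```

Each of these numbers describes only the ten values in hand; none of them, by itself, makes a claim about a larger group.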
Inferential Statistics
Inferential statistics are descriptive statistics that are used to estimate properties of a population based on a sample.
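Suppose I want to know the average height of ODU students. Asking all 20,000+ students how tall they are would be very expensive (time + cost + effort). Instead, I can ask a random sample of students and use the sample's average height to estimate the average for the whole student body.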
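A minimal sketch of the sampling idea in R. Real student heights are not available here, so the "population" is simulated; the mean of 67 inches, the standard deviation of 4, the sample size of 100, and all object names are assumptions chosen only for illustration.

```r
set.seed(311)  # make the simulation reproducible

# Simulated "population": 20,000 heights in inches (assumed mean 67, sd 4)
population <- rnorm(20000, mean = 67, sd = 4)

# Ask only 100 randomly chosen students instead of all 20,000
survey <- sample(population, size = 100)

sample_mean     <- mean(survey)      # the estimate we can afford to compute
population_mean <- mean(population)  # normally unknown; known here only because we simulated it

# The gap between the two is the sampling error of this particular sample
sampling_error <- sample_mean - population_mean
```

In practice only `sample_mean` would be observed. The point of inferential statistics is to say how close it is likely to be to the unknown population value.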
Statistics can be qualitative or quantitative.
Qualitative Statistics
Statistics that fall under this category are meant to describe characteristics or traits of something that are not naturally quantifiable. Examples include eye color, nationality, etc.
Quantitative Statistics
Quantitative statistics are numerical properties of some data. Examples include height, income, etc.
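The distinction shows up directly in how R stores and summarizes values. The sketch below uses made-up responses from five hypothetical people; the variable names are illustrative only.

```r
# Made-up responses from five hypothetical people
eye_color <- c("brown", "blue", "brown", "green", "brown")  # qualitative
height_in <- c(64, 70, 67, 62, 72)                          # quantitative

# Qualitative data are summarized by counting categories
eye_counts <- table(eye_color)

# Quantitative data support arithmetic such as averaging
avg_height <- mean(height_in)

class(eye_color)  # "character"
class(height_in)  # "numeric"
```

Calling `mean(eye_color)` would not produce a useful number, which is one practical sign that the variable is qualitative.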
Statistics can also be classified by their level of measurement. Four important levels are:
Nominal: Nominal data is represented by labels or names. In addition, there is no natural order to these data. An example would be “language spoken”, and responses might be “English”, “Spanish”, “Japanese”, “Italian”, etc. Nominal data is therefore always qualitative.
Ordinal: Ordinal data is recorded in reference to some relative ranking. Unlike nominal data, there is an order, but the numbers representing the order do not have any other meaning. For example, consider a ranked list of the fastest 100-meter dash times. The gap between #1 and #2 is not necessarily the same as the gap between #8 and #9.
Interval: For interval data, the distance between numbers is meaningful, and units must accompany the measurements. However, zero is usually “meaningless”. Examples include degrees Fahrenheit and dress sizes. In both cases, a one-unit difference means the same thing anywhere on the scale. At the same time, zero degrees Fahrenheit or a size zero dress does not mean an absence of temperature or fabric.
Ratio: Ratio measurements are interval measurements, except zero has a natural meaning. When zero has a natural meaning, the ratio of two measurements also has meaning. For example, consider income. Someone with $50 has twice as much as someone with $25. For Fahrenheit, 50 degrees is not twice as hot as 25 degrees. In addition, having $0 means you have no money (and therefore negatives mean something too).
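The four levels can be sketched in R as follows. This is an illustration under assumed, made-up values: factors stand in for nominal and ordinal data, and the Fahrenheit temperatures are converted to kelvin (a scale with a true zero) to show why their ratio is not 2.

```r
# Nominal: labels with no natural order
language <- factor(c("English", "Spanish", "Japanese", "English"))
is.ordered(language)   # FALSE

# Ordinal: ordered labels; comparisons work, but distances are undefined
place <- factor(c("2nd", "1st", "3rd"),
                levels = c("1st", "2nd", "3rd"), ordered = TRUE)
place[2] < place[1]    # TRUE: "1st" comes before "2nd" in the ordering

# Interval: differences are meaningful, ratios are not
temp_f <- c(25, 50)
diff(temp_f)                               # 25 degrees apart
temp_k <- (temp_f - 32) * 5 / 9 + 273.15   # same temperatures in kelvin
temp_k[2] / temp_k[1]                      # about 1.05, not 2

# Ratio: zero means "none", so ratios are meaningful
income <- c(25, 50)
income[2] / income[1]                      # 2
```

Storing ordinal data as an ordered factor rather than as plain numbers is a design choice that keeps R from treating the ranks as if their differences were meaningful.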
When measurement is quantitative, the measurement can be either discrete or continuous.
Discrete: Discrete measurements do not allow for fractions or decimals. Only things that can be counted in whole numbers (0, 1, 2, 3, …) qualify as discrete. For example, the number of followers you have on social media is discrete because you cannot have half of a follower.
Continuous: Continuous data can be subdivided into non-whole numbers. For example, time is a continuous measurement because you can have 15.623941004 seconds. Income can be considered continuous even though dollars only go to two decimal places, because the steps are small enough that people treat it as continuous.
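In R, this distinction loosely corresponds to integer versus double storage. The sketch below uses made-up values, and the variable names are illustrative only.

```r
followers <- 1250L          # the L suffix stores an integer (a discrete count)
race_time <- 15.623941004   # a double can hold fractional values (continuous)

is.integer(followers)  # TRUE
is.double(race_time)   # TRUE

# Income in dollars and cents: technically a whole number of cents,
# but usually treated as continuous
income <- 52340.75
income_cents <- round(income * 100)  # the underlying count of cents
```

Note that R stores plain numbers such as `1250` as doubles unless the `L` suffix is used, so the storage type is a hint about the measurement, not a guarantee.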
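In reality, data sets come with rows and columns. Often, each column will contain a different type of measurement, whether it be qualitative or quantitative. Usually, we have units (states, firms, individuals, etc.) and time (year, month, quarter, day). In econometrics, there are three main ways to organize data along these two dimensions: cross-sectional data (many units observed at one point in time), time series data (one unit observed over many periods), and panel data (many units observed over many periods).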
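A common way to classify data sets by units and time is cross-sectional, time series, and panel. The sketch below builds a tiny example of each as an R data frame. The states, years, and income figures are made up for illustration, and the column names are assumptions.

```r
# Cross-sectional: many units, one time period
cross_section <- data.frame(
  state  = c("VA", "NC", "MD"),
  year   = 2020,
  income = c(50, 45, 55)   # made-up values
)

# Time series: one unit, many periods
time_series <- data.frame(
  state  = "VA",
  year   = 2018:2020,
  income = c(47, 49, 50)   # made-up values
)

# Panel: many units, each observed over many periods
panel <- expand.grid(state = c("VA", "NC", "MD"), year = 2018:2020,
                     stringsAsFactors = FALSE)
panel$income <- c(47, 43, 52, 49, 44, 54, 50, 45, 55)  # made-up values

str(panel)  # rows are state-year pairs; columns are variables
```

In this layout the cross-section and the time series are both slices of the panel: one fixes the year, the other fixes the state.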
In this course, students will learn to program in the language R. This will be new for most students, but it is a powerful tool that will help you stand out on the job market.
Many jobs also ask for applicants to be knowledgeable in Excel. Of course, Excel is widely used software, and strong knowledge of it is definitely important. As a challenge, however, let’s try to not open Excel. If you can put that constraint on yourself, learning R will be easier.
You should think of learning R as learning a foreign language. Immersion is the best way to learn a foreign language, and R is no different. If you cut out Excel, you will be forced to use R, and you will improve faster. Or, if you have to do something in Excel, try to replicate it in R.
ECON 311: Economics, Causality, and Analytics