Final Project

Author: Alex Cardazzi

Affiliation: Old Dominion University

All materials can be found at alexcardazzi.github.io.

Project Summary

Students will be expected to write a research report with an econometric analysis on a topic of their choosing (and to be approved by the instructor). This project will test each student’s ability to both perform and communicate rigorous econometric analyses. Students are expected to leverage AI as their research assistant throughout the process of working on their project.

Students must use either cross-sectional or panel data, but not time series data. In addition, students must have at least one outcome variable and at least three explanatory variables. See examples throughout.

Students will submit checkpoints throughout the semester to ensure they are on target. Expectations for each checkpoint are described below.

Checkpoint 1

ePortfolio | 20 Points

For this checkpoint, students will complete some setup for the rest of the semester. First, they will create an outline of an ePortfolio. At its most ambitious, your ePortfolio can be a personal website; at a minimum, it will act as an online resume where you can show off your work. Second, students must upload/embed a placeholder HTML file for their final project. This matters because students will need to do the same thing at the end of the semester when submitting their final project, and it is better to work out any problems now, before things get too busy. Lastly, students will need to initialize and share a project-specific chat with ChatGPT using the prompt provided below.

ePortfolio

ODU supports student ePortfolios via the Office of Academic Success Initiatives & Support, and students are encouraged to explore these resources (e.g., they have a list of setup tutorials). I have personally used Google Sites in the past and found it to be both intuitive and functional. Other faculty members might suggest websites such as Wix. The two major advantages of these services are that they are “point-and-click” tools that require no coding, and that you can continue using them even after you graduate.

Note: if you choose to use Google Sites (which, again, I think is a good idea), you will need to modify a setting to make your site available to the public. Once you create your site, click on the “share” button (it should look like a silhouette of a person with a “+” next to them), then under “Published Site”, click on “Public”. See below for a short .gif demonstration.

Project Placeholder

Once you have created your ePortfolio and installed R and RStudio (instructions will be given in class), you should generate and upload your project placeholder. To do so, download the project template, render it in RStudio, and upload/embed the resulting HTML file. This last step can sometimes be a bit of a pain. You will first need to view and copy the HTML file’s underlying code (see how to view the underlying code; then press Ctrl+A to highlight everything and Ctrl+C to copy it). Then, paste this code into an “Embed” element in your ePortfolio (see the instructions for Google Sites or for Wix). I will be happy to help troubleshoot any of these steps via email or Zoom, so please reach out if you are having issues!

Initialize AI Chat

Use the following prompt to initialize your AI research assistant:

This prompt is meant to initialize you as a research assistant for a student taking an introduction to econometrics course. The course leans towards the applied side of econometrics so students will also be learning (base, not tidyverse) R simultaneously. As a research assistant, you may help/advise the student but not do things for the student. This chat will be used throughout the semester for the student’s final project, where they are meant to write an empirical research paper on a topic of their choosing. You can help them settle on a research topic/question, find and clean data, and run their analyses. At the beginning of each of your messages, always put the current date and time [YYYY-MM-DD H:i] (use eastern time). The student will now begin interacting with you.

Next, click the share button in the upper right of the screen. The first time, it should say “Create link”; each time after that, it should say “Update link.” You should now be able to copy and paste the link into the submission box. You will have to re-share this link with me each time you submit a checkpoint.

To receive full credit for this checkpoint, students must submit a working URL that navigates to an ePortfolio home page with a professional picture (or a placeholder image), their name, major, hometown, graduation date, and a short bio. Students must also upload/embed a placeholder HTML file for their final project and share their initial AI conversation with me.

Checkpoint 2

Topic(s) | 20 Points

To receive full credit, students must submit research topics they are interested in. Students are encouraged to submit several (3-5) topics to keep their options open. Some examples of topics include: air quality, housing prices, sports, education, the wage gap, crime, life expectancy / mortality, traffic fatalities, etc. Be sure to identify and/or discuss some potential data sources. Generally speaking, students have the most success when using readily available state-by-year data. Full credit will be given to students who submit at least three topics accompanied by possible data sources.

For this checkpoint, you should brainstorm with your AI research assistant. Describe your interests to the AI and ask it to suggest some related topics. Be sure to also tell your research assistant that this is your first time working with R, so you would prefer data that are easy to work with from a programming point of view. For example, you would not want the AI to recommend a project that requires millions of rows of data or many hours of data collection or cleaning. Do not simply follow whatever the AI tells you; instead, iterate with it and use it to help you brainstorm. You may also come up with project ideas on your own and then ask the AI what it thinks about each project’s difficulty level, etc. You will need to re-share your chat link upon submission. Note that I want your submission to be written by you, not by the AI.

An example topic proposal that would satisfy the requirements:

Example Materials

I am interested in looking at country-level success at the Olympics and country-level economic indicators. Specifically, my hypothesis is that the number of medals won by a country is related to the country’s economic characteristics, such as GDP per capita, income inequality, and population. I believe stronger economies will perform better at the Olympics, but more income inequality will reduce success. As someone who is interested in both macroeconomics and sports, this seems like an exciting project. The number of Olympic medals by year and country can be found here: http://www.olympedia.org/countries. I have not yet found data on GDP per capita or income inequality, but I found some population data here: https://data.worldbank.org/indicator/SP.POP.TOTL?end=2022&start=1960&view=chart. I am working on getting these data into a country-by-year format (meaning each row represents a country in a given year), but I do not have it yet.
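The country-by-year layout described above is often called “long” format, while World Bank downloads often arrive in “wide” format (one row per country, one column per year). A minimal base R sketch of the conversion, assuming a hypothetical data frame `wide` whose year columns are named like `X1960` (this is what `read.csv()` produces from a column headed `1960`, since it prefixes syntactically invalid names with `X`). The values below are placeholders, not real population figures.

```r
# Hypothetical wide layout: one row per country, one column per year.
# Values are placeholders, NOT real population figures.
wide <- data.frame(country = c("A", "B"),
                   X1960 = c(1, 2),
                   X1964 = c(3, 4),
                   X1968 = c(5, 6))

# reshape() stacks the year columns into a single "pop" column and adds
# a "year" column recording which year each value came from.
long <- reshape(wide,
                direction = "long",
                varying   = c("X1960", "X1964", "X1968"),
                v.names   = "pop",
                timevar   = "year",
                times     = c(1960, 1964, 1968),
                idvar     = "country")

# Sort so each country's years are together, then reset the row names
long <- long[order(long$country, long$year), ]
rownames(long) <- NULL
long
```

Each row of `long` is now one country in one year, which is the format needed to merge with medal counts.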

Once a topic is approved by the instructor, the student should begin (or continue) collecting data. Note that students should expect to work with datasets of a few hundred, if not a few thousand, rows.

Checkpoint 3

Data | 20 Points

For this checkpoint, students should submit proof of progress in collecting/cleaning data for their project. Students need not submit or upload their data, but to receive full credit they must provide evidence of progress, such as a table of summary statistics and/or data visualizations. As a reminder, students must have a minimum of four variables: one outcome and three explanatory. Students should submit something like the following (in addition to some writing about the data, their hypotheses, etc.). It would be a good idea to do this within the project template and to consult the research assistant for help with data cleaning. Again, chat links must be re-shared.

Example Materials

Data for this project come from a few sources. First, information regarding each country’s success at the Olympics comes from olympedia.org. Second, data on countries’ GDP per capita, Gini index, and population come from the World Bank. Data are collected for 13 countries over 63 years (1960-2022). However, since the Olympics occur only once every four years, I keep only 16 of those years.
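A minimal base R sketch of keeping only the Olympic years, assuming the relevant Games are the Summer Olympics (1960, 1964, …, 2020) and that your medal data label the Tokyo Games, held in 2021, as 2020. The data frame `panel` is a hypothetical skeleton with placeholder country names and no actual values.

```r
# Hypothetical panel skeleton: 13 placeholder countries x 63 years (1960-2022).
# Replace this with your real merged data; no actual values are included here.
panel <- expand.grid(country = paste0("country_", 1:13),
                     year    = 1960:2022)

# Summer Olympic years in this window: every four years starting in 1960
olympic_years <- seq(1960, 2020, by = 4)
length(olympic_years)  # 16

# Keep only the rows whose year is an Olympic year
panel_olympic <- panel[panel$year %in% olympic_years, ]
nrow(panel_olympic)    # 13 * 16 = 208
```

Using `%in%` against an explicit vector of years keeps the filter readable and makes it easy to adjust if your source records the 2020 Games differently.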

Summary statistics are presented in the table below…

The figure below displays the unconditional relationship between medals won in the Olympics and the GDP per Capita…

Variable  Unique  Missing Pct.   Mean     SD    Min  Median     Max
gdp          191             9   20.7   19.5    0.1    14.3   101.2
pop          208             0  147.7  288.4    3.6    56.9  1407.7
gini          64            62   33.2    4.7   24.9    33.2    42.4
medals       158            10  347.1  175.0    9.0   313.0   849.0
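The table above appears to follow the layout of `datasummary_skim()` from the modelsummary package. If you would rather stay in base R, the sketch below computes similar columns. It runs on R’s built-in `airquality` data (which has missing values) because the Olympics data are not included here. `summ_stats` is a hypothetical helper name, and here “Unique” counts distinct non-missing values, which may differ slightly from how a package counts them.

```r
# Summary statistics in roughly the same layout as the table above, base R only.
summ_stats <- function(df) {
  num <- df[sapply(df, is.numeric)]  # keep numeric columns only
  out <- t(sapply(num, function(x) {
    c(Unique     = length(unique(na.omit(x))),  # distinct non-missing values
      MissingPct = round(100 * mean(is.na(x))), # percent of rows that are NA
      Mean       = mean(x, na.rm = TRUE),
      SD         = sd(x, na.rm = TRUE),
      Min        = min(x, na.rm = TRUE),
      Median     = median(x, na.rm = TRUE),
      Max        = max(x, na.rm = TRUE))
  }))
  round(out, 1)  # one row per variable, one column per statistic
}

summ_stats(airquality)
```

With your own data, replace `airquality` with your cleaned data frame; rescaling variables first (e.g., population in millions) keeps the table readable.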

Checkpoint 4

Analysis Pt. 1 | 20 Points

For this checkpoint, students should submit at least one written regression equation (see footnote 1), some estimated coefficients, and their interpretations. Full credit will be awarded to students who meet these three requirements. This can be a simple single-variable equation if that is all the student is ready for; the idea is to have something I can give feedback on. Be sure to discuss your modeling decisions: are you using logs? Binary variables? An example follows:

Example Materials

I will model the log of Medals won by country \(i\) in year \(t\) using the log of the country’s GDP, the log of the country’s Population, and the country’s Gini coefficient. I think using logs for Medals, GDP, and Population is appropriate because … . This would change the interpretation from … to … . The Gini coefficient is not logged because … .

\[\log(\text{Medals}_{it}) = \beta_0 + \beta_1 \log(\text{GDP}_{it}) + \beta_2\log(\text{Population}_{it}) + \beta_3 \text{Gini}_{it} + \epsilon_{it}\]

  1. \(\beta_1\): A one percent increase in GDP per capita is associated with a \(\beta_1\)% increase in medals. I expect \(\widehat{\beta_1}\) to be a positive number since I hypothesize that richer countries will perform better at the Olympics.
  2. \(\beta_2\): …
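For reference, the percent-change interpretations used in this example follow from differentiating the model above. A short derivation in the equation’s own symbols, using the approximation \(\%\Delta x \approx 100\,\Delta\log(x)\), which works best for small changes:

\[\frac{\partial \log(\text{Medals}_{it})}{\partial \log(\text{GDP}_{it})} = \beta_1 \quad\Rightarrow\quad \%\Delta\text{Medals}_{it} \approx \beta_1 \times \%\Delta\text{GDP}_{it}\]

\[\frac{\partial \log(\text{Medals}_{it})}{\partial\, \text{Gini}_{it}} = \beta_3 \quad\Rightarrow\quad \%\Delta\text{Medals}_{it} \approx 100\,\beta_3 \times \Delta\text{Gini}_{it}\]

For a larger discrete change in the Gini coefficient, the exact version is \(\%\Delta\text{Medals}_{it} = 100\left(\exp(\beta_3\,\Delta\text{Gini}_{it}) - 1\right)\).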
Relationship Between Macroeconomic Factors and Olympic Medals

                       Olympic Medals
log(GDP per Capita)    0.153***
                       (0.043)
log(Population)        0.222***
                       (0.049)
Gini Index             0.023+
                       (0.013)
Constant               3.825***
                       (0.331)
Num.Obs.               79
R2                     0.589

The estimate of \(\beta_1\), or \(\widehat{\beta_1}\), is equal to 0.153, indicating an expected increase of 0.15% for each 1% increase in GDP per capita…
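In R, a model like this can be estimated with base R’s `lm()`, applying `log()` directly inside the formula. The sketch below runs on R’s built-in `trees` data so that it is self-contained. The commented-out line shows the analogous call for a hypothetical data frame named `olympics` with columns `medals`, `gdp`, `pop`, and `gini`; those names mirror the summary table but are an assumption about how your data are stored.

```r
# Log-log regression with base R, using the built-in `trees` data.
# Outcome and regressors are logged inside the formula, so each slope is an
# elasticity: the expected % change in Volume for a 1% change in the regressor.
m <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
summary(m)  # coefficients, standard errors, R-squared

# Analogous call for the Olympics example (hypothetical data frame `olympics`):
# lm(log(medals) ~ log(gdp) + log(pop) + gini, data = olympics)
# Notes:
# - log(0) is -Inf, so rows with zero medals would need to be handled
#   before lm() can fit a model with log(medals) as the outcome.
# - Rows with a missing value in any model variable are dropped by default
#   (na.action = na.omit), so Num.Obs. can be smaller than the number of rows.
```

The second note is one reason a regression table can report fewer observations than the dataset has rows, for example when one explanatory variable has many missing values.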

You can and should consult AI for this step, but do not forget to update and include the link to your chat.

Checkpoint 5

Analysis Pt. 2 | 20 Points

This checkpoint should be thought of as a continuation of the previous checkpoint, and it should incorporate the feedback you received on that checkpoint.

As an example, maybe you receive feedback that using GDP and Population as independent variables is inappropriate, and that you should instead use GDP per capita. I might ask you to estimate the first model from the previous checkpoint alongside this new model and compare the results. Full credit will be given to students who incorporate feedback and submit a more fleshed-out analysis.
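When comparing the two specifications, one algebraic fact is worth checking: because log(GDP) = log(GDP per capita) + log(Population), a model with log(GDP) and log(Population) and a model with log(GDP per capita) and log(Population) are the same model written two ways. If both are logged and log(Population) stays in, they produce identical fitted values and R², and only the meaning of the population coefficient changes. The base R sketch below illustrates this with the built-in `state.x77` data, where `Income` is per capita income; `TotalIncome` is a constructed aggregate used as an illustrative stand-in for GDP, not a variable in that dataset.

```r
# Built-in state-level data (50 states); convert the matrix to a data frame.
st <- as.data.frame(state.x77)
st$LifeExp     <- st[["Life Exp"]]             # avoid the space in the name
st$TotalIncome <- st$Income * st$Population    # constructed aggregate ("GDP"-like)

# Model A: aggregate income and population, both logged
m_total <- lm(LifeExp ~ log(TotalIncome) + log(Population), data = st)
# Model B: per capita income and population, both logged
m_pc    <- lm(LifeExp ~ log(Income) + log(Population), data = st)

# Compare the coefficients of the two models
round(coef(m_total), 3)
round(coef(m_pc), 3)

# Same fitted values and R-squared: the two models are reparameterizations
all.equal(fitted(m_total), fitted(m_pc))
all.equal(summary(m_total)$r.squared, summary(m_pc)$r.squared)
```

In Model A, the coefficient on log(Population) equals Model B’s population coefficient minus its income coefficient. That is why the population estimate can look very different across the two tables even though the fit is unchanged; dropping log(Population) entirely would be a genuinely different model.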

Final Project

Final projects must be rendered .html files that are uploaded to an ePortfolio using the final project template. Projects must be written by you (not AI) and consist of the following sections:

  • Introduction
    • In about two or three paragraphs, introduce and motivate your topic. Convince the reader that this topic/problem/question is important and that they should care.
    • In about one or two paragraphs, give a short summary of your analysis. What data do you use? How do you model your outcome variable(s)? What do you find? What is learned from your analysis?
    • By the end of this section, the reader should understand what you are trying to solve, how you solve it, and what the solution is. See this link for a helpful “introduction formula.”
  • Data
    • Introduce and discuss your data. Where do the data come from? How are the data collected? How did you collect the data? Are the data unique in some way? Why are these data appropriate for answering your question?
    • Describe the qualitative and quantitative properties of your data. Create and discuss summary statistics and visualizations that help the reader understand the data.
    • By the end of this section, someone who has never seen your data should not have any questions about your data.
  • Empirical Analysis
    • Write down the model(s) you estimate. Explain your modeling choices (e.g. log(), etc.). List out your hypotheses and the rationales behind them.
    • Present the results of your analysis. Interpret the coefficients.
    • Now, put these results in the context of your topic. Are these results expected or unexpected? Large or small? Significant or insignificant? What do they mean for policy makers, business leaders, etc.?
  • Conclusion
    • Remind the reader of your topic, why it is important, and what you find.
    • Discuss the implications of your findings. What about the limitations? What could be done better? Are there any possible alternative explanations for your findings?
    • See this link for a helpful “conclusion formula.”

Project Rubric


Structure

0 - 10 Points

  • The paper is organized into sections with appropriate (sub-)headings and free from formatting issues.
  • Writing is free of spelling and grammar mistakes. The style of writing is professional though clearly human.
  • The paper is uploaded to the student’s ePortfolio correctly and can be accessed easily.

Econometric Analysis

0 - 30 Points

  • Data sources are provided, described, and thoroughly discussed.
  • The data used are appropriate for addressing the research question.
  • Summary statistics are presented and discussed.
  • Visualizations are labeled, informative, and discussed.
  • The econometric model is written out and clear.
  • Hypotheses for each coefficient are provided and each aligns with economic theory.
  • Results are estimated and displayed appropriately in either a table or figure, or both.
  • Both the magnitudes and signs of the results are interpreted correctly. Weaknesses/limitations of the analysis are discussed.
  • Technical concepts are effectively and efficiently communicated throughout the project.

Code

0 - 10 Points

  • All code is present, easy to follow, well commented, and neatly folded. A reader could easily replicate the project using what is described in the text and the code supplied.

Footnotes

  1. You can have your research assistant (AI) help you with writing the equation. Ask it to write the equation in LaTeX for Quarto.