Imagination Exercise

Author: Alex Cardazzi

Affiliation: Old Dominion University

All materials can be found at alexcardazzi.github.io.

Completion Requirements: Complete the following questions in RStudio via the imagination template. When you are ready, submit your rendered HTML to Canvas. Be sure to prepare your pitch for an in-class discussion.

Grading Criteria: Full credit will be given to well-formatted and detailed answers. Partial credit will be given if I can follow your work and/or see your thought process via code, comments, and text. Point totals are listed next to each question.

Assignment Summary

Malcolm Gladwell, the host of Revisionist History, spends six episodes asking (social) scientists about what their “magic wand experiment” would be.

What if you could design any experiment you wanted? Without worrying about money, ethics, logistics, or even the laws of nature? Revisionist History kicks off the season by giving some of the world’s smartest scientists a magic wand to create the experiment of their dreams.

These scientists come up with a bunch of different studies and research designs that would allow them to differentiate between correlation and causality. In this assignment, students will have to imagine up their own “magic wand” experiments with one condition: each experiment or study must use the particular causal inference strategy being studied. Students will have to write up their research designs and pitch them.

In general, students may find it helpful to consider their research topic when brainstorming ideas. For the first imagination exercise, think through an RCT that would yield a causal estimate of the relationship you are interested in. I provide an example below that asks how racial integration of local Major League Baseball teams in the 1950s influenced local segregation.

Your imagination exercise should address the following. Write your answers in a Quarto document and submit a properly rendered HTML file on Canvas using the template hyperlinked above.

Research Setting

Introduce and describe the necessary background information on your research setting. What happened? Who or what was affected, and by what? What is the time frame? What is the research question you think this setting might help you answer? The setting studied should be either historical (i.e., it actually happened) or a realistic hypothetical. (4 Points)

Setting Example

On April 15th, 1947, Jackie Robinson, an African American baseball player, played his first game with the Brooklyn Dodgers. Robinson became the first African American to play in Major League Baseball (“MLB”), acting as an important precursor to the civil rights movement in the ’50s and ’60s. Slowly, over the next decade or so, MLB teams began to integrate. At this point in history, segregation was very much alive and Robinson became a divisive figure.

The research question I am interested in exploring is: how did the timing of the local MLB team’s decision to integrate affect segregation? For example, the Brooklyn Dodgers integrated in 1947, the Boston Braves (today’s Atlanta Braves) in 1950, the Chicago White Sox in 1951, and the Pittsburgh Pirates not until 1954. Did segregation in Brooklyn, relative to Pittsburgh, Boston, and other cities with still-segregated teams, decrease following 1947?

Application Description

Discuss how the particular method we are studying will help you answer the research question. How would the method work in the context of this setting? Discuss the assumptions needed for this method to deliver credibly causal estimates, and critically evaluate how likely the assumptions are to hold in this setting. Are there any potential backdoors or colliders you need to be wary of? (8 Points)

Application Description Example

This setting is ripe for study using a difference-in-differences (“DiD”) design because units receive treatment at different times. In each time period (MLB season, month, year, quarter, etc.), there are some treated units (teams/cities) and some control units, so I can compare the degree of segregation in treated and control cities before and after their teams integrate. If this were Module 2, and we had not yet been introduced to DiD, I could instead write about randomly assigning integration to different teams in an RCT.
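The simplest two-group, two-period version of this comparison can be sketched as follows. This is only an illustration: it is written in Python rather than R, and the segregation index values are invented, not taken from any real data.

```python
# Hypothetical mean segregation index (0-100) by group and period.
# All numbers are invented for illustration only.
mean_segregation = {
    ("treated", "before"): 70.0,
    ("treated", "after"): 62.0,
    ("control", "before"): 68.0,
    ("control", "after"): 65.0,
}


def did_estimate(means):
    """Change in the treated group minus change in the control group."""
    treated_change = means[("treated", "after")] - means[("treated", "before")]
    control_change = means[("control", "after")] - means[("control", "before")]
    return treated_change - control_change


estimate = did_estimate(mean_segregation)
print(estimate)  # -5.0
```

Here the control cities' change of -3 stands in for what would have happened in treated cities without integration, so the remaining -5 is the part of the treated cities' change attributed to integration.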

Then, I would need to discuss the assumptions associated with DiD (parallel trends, SUTVA, etc.), and whether I believe they would hold. Also, what are other confounding factors? For example, what if a city’s changing political beliefs both reduced segregation and led its local MLB team to integrate? This would be an example of a confounder that I would need to control for.
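In its simplest two-group, two-period form, the parallel trends assumption can be written as below. The notation is introduced here for illustration only: \(S_{ct}(0)\) denotes the segregation city \(c\) would experience in period \(t\) had its team not integrated, and \(D_c\) indicates cities whose teams integrate between the two periods.

\[E[S_{c,\text{post}}(0) - S_{c,\text{pre}}(0) \mid D_c = 1] = E[S_{c,\text{post}}(0) - S_{c,\text{pre}}(0) \mid D_c = 0]\]

In words, absent integration, segregation in treated cities would have changed by the same amount as in control cities. With staggered integration dates, a version of this condition would presumably need to hold for each comparison between already-integrated and not-yet-integrated cities.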

Data Description

Describe the ideal data set for identifying the causal effect of interest described in the application above. You don’t need to constrain yourself to known, existing variables and data sets, although identifying actual data is OK. You may assume you have unlimited ability/resources to measure variables you want. Be sure to comment on the structure of the data. In other words, identify the unit of observation, the dependent variable, and the explanatory variables. (3 Points)

Data Description Example

The ideal dataset would contain city-by-month level data with measures of segregation (the dependent variable), demographic information, possibly political beliefs, and an indicator for whether the local MLB team had integrated. Measuring segregation could be difficult, so I am leaning on the “unlimited ability to measure variables” part of the assignment to deal with that issue. Demographic information might include the percent of the population that is white, the percent that identifies as anti-segregationist, the percent that are baseball fans, and the median age of the population. City and time fixed effects would also be used in this analysis.
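To make the structure of such a dataset concrete, here is a minimal sketch of a city-by-month panel with a treatment indicator. It is written in Python for illustration. The class name `CityMonth`, the field names, and the choice of April (the start of the season) as the first treated month are all assumptions; only the two integration years come from the setting example.

```python
from dataclasses import dataclass
from typing import Optional

# Integration years taken from the setting example; treating April as the
# first treated month in each year is a simplifying assumption.
INTEGRATION_START = {"Brooklyn": (1947, 4), "Pittsburgh": (1954, 4)}


@dataclass
class CityMonth:
    """One row of the panel: the unit of observation is a city-month."""
    city: str
    year: int
    month: int
    integrated: int                      # treatment indicator (0 or 1)
    segregation: Optional[float] = None  # dependent variable, to be measured
    pct_white: Optional[float] = None    # example control variable


def build_panel(first_year, last_year):
    """Create one row per city and month, switching the indicator on at integration."""
    rows = []
    for city, start in INTEGRATION_START.items():
        for year in range(first_year, last_year + 1):
            for month in range(1, 13):
                treated = int((year, month) >= start)
                rows.append(CityMonth(city, year, month, treated))
    return rows


panel = build_panel(1945, 1956)
print(len(panel))  # 2 cities x 12 years x 12 months = 288 rows
```

The outcome and control fields are left empty because measuring them is exactly the part the assignment lets you assume away.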

Empirical Model

Write down a regression model (or series of models) that corresponds to your setting and question, and identify the causal parameter of interest.¹ If the equation is too complex, you may articulate it in words, although be warned that this can sometimes be trickier. (3 Points)

Empirical Model Example

Segregation \(S\) in city \(c\) and year-month \(t\) is modeled as:

\[S_{ct} = \delta I_{ct} + X_{ct}\beta + \alpha_c + \tau_t + \epsilon_{ct}\] where \(I_{ct}\) represents an indicator for whether the local team had integrated, \(X_{ct}\) represents a vector of control variables, and \(\alpha_c\) and \(\tau_t\) represent city and time fixed effects. \(\delta\) represents the causal parameter of interest, which is the effect of integration on segregation.
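As a rough check on how \(\delta\) is recovered from a model of this form, the sketch below simulates a small balanced panel and estimates the coefficient by double-demeaning, which in a balanced panel is equivalent to including city and period dummies. Everything here is an assumption made for illustration: the cities, treatment dates, and effect size are invented, the controls \(X_{ct}\) and the error term are omitted, and the code is in Python rather than R.

```python
from statistics import mean

TRUE_DELTA = -2.0  # assumed effect used to generate the fake data
cities = {"A": 3, "B": 5, "C": None}  # first treated period; None = never treated
periods = range(1, 9)

# Simulate S_ct = delta * I_ct + alpha_c + tau_t with no noise and no controls.
alpha = {"A": 60.0, "B": 55.0, "C": 70.0}
rows = []
for c, start in cities.items():
    for t in periods:
        i_ct = 1.0 if start is not None and t >= start else 0.0
        s_ct = TRUE_DELTA * i_ct + alpha[c] + 0.5 * t
        rows.append((c, t, i_ct, s_ct))


def double_demean(rows, col):
    """Subtract city and period means, then add back the grand mean."""
    grand = mean(r[col] for r in rows)
    by_city = {c: mean(r[col] for r in rows if r[0] == c) for c in cities}
    by_time = {t: mean(r[col] for r in rows if r[1] == t) for t in periods}
    return [r[col] - by_city[r[0]] - by_time[r[1]] + grand for r in rows]


i_dd = double_demean(rows, 2)  # demeaned treatment indicator
s_dd = double_demean(rows, 3)  # demeaned outcome

# OLS slope of the demeaned outcome on the demeaned indicator.
delta_hat = sum(i * s for i, s in zip(i_dd, s_dd)) / sum(i * i for i in i_dd)
print(round(delta_hat, 6))  # -2.0
```

Because the simulated data contain no noise and the effect is the same for every city, the estimate matches the assumed effect. With real data it would not, and the fixed effects only remove city-level and period-level differences that are constant along the other dimension.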

(Expected) Findings

What do you expect to find? Who does the answer to this question help, and how/why? (2 Points)

(Expected) Findings Example

I expect MLB team integration will reduce segregation as people become more accustomed to the idea of equality by watching their team integrate. More here…

or

I do not expect MLB team integration to affect segregation in the rest of the city. This is because team owners are themselves citizens of their MLB team’s city, and thus reflect (albeit potentially non-representative) city-wide opinions on segregation. Therefore, teams that integrate first are likely already in more liberal cities with individuals more likely to embrace ending segregation. From this point of view, we would need to be very careful about omitted variable bias! More here…

or

I believe MLB team integration will intensify segregation, as conservative fans, feeling their power slipping in sports, tighten their grasp elsewhere in society. More here…

Video Abstract

Submit a short (e.g. 2-5 minutes) video of you pitching your imagination exercise. Be sure to cover all of the above topics including (but not limited to) institutional knowledge/background, specific research question, general and specific threats to your identification strategy, necessary assumptions, data, etc. This is your chance to convince me that you have an intuitive knowledge of the research design and can apply it. (10 Points)

Footnotes

  1. You may exclude any control variables or fixed effects that are not essential to the causal inference strategy.