In September 2023, the Virginia General Assembly amended the commonwealth’s budget to include funding for the upcoming ODU & EVMS merger. This represents significant forward progress toward integrating EVMS with ODU, and both institutions’ websites list quite a few potential benefits of the merger. Since we’re skeptical economists, we might be interested in evaluating some of the claims made by both institutions. How could we investigate the effects of the merger on some outcome?
ODU & EVMS
At first, you might think, “no sweat, just use DiD,” and you’d be right that we probably could use difference-in-differences. However, we’d run into a few issues with this sort of approach. First, with only one treated unit, we’d have very low statistical power and our standard errors would be very large. Therefore, we’d have a lot of variance in our estimate and it would be hard to differentiate it from zero. Second, it might be difficult to select a control group of universities. For example, is JMU a good control unit? What about VCU? UVA? Norfolk State? Even if we tried to calculate inverse probability weights, or use matching, having a single treated unit would severely limit our statistical power.
ODU & EVMS
So, if we can’t use DiD or PSM/IPW, are we out of luck? In this module, we are going to use a relatively new econometric method called Synthetic Control to estimate causal effects. This method is designed for case studies like the ODU/EVMS merger. Unfortunately, since ODU and EVMS have yet to merge, we cannot investigate the effects of the merger. Rather, to demonstrate this method, we are going to replicate a famous study by Abadie et al. (2015) which examines the effects of reunification in Germany in 1990.
ODU & EVMS
Abadie et al. (2015) provides some historical context for German reunification in 1990. In the interest of time, and to avoid wading into subject matter I’m less than qualified to discuss at any length, I will defer to Abadie et al. (2015) and Wikipedia for the history. See below for a geographic reference.
Map of West Germany and Surrounding Countries.
Synthetic Control
As alluded to, synthetic control is similar in spirit to difference-in-differences in terms of design (looking at treated/control units before/after some treatment), but not in terms of implementation. The goal of synthetic control, in words, is to create a “synthetic” version of the treated unit by using a weighted average of control units. In the case of German reunification, we are going to create a weighted portfolio of countries such that the time series of the outcome variable for synthetic West Germany looks similar to that of real West Germany before reunification.
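Roughly, in the notation of Abadie et al. (2015), the synthetic outcome is just a weighted average of the donor countries’ outcomes,

\[
\hat{Y}^{\text{synth}}_{t} = \sum_{j=2}^{J+1} w_j \, Y_{jt},
\]

where unit 1 is West Germany, units \(2, \dots, J+1\) are the other countries, and the weights \(w_j\) are nonnegative and sum to one.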
Synthetic Control
Once we have synthetic West Germany, we can observe how its outcome variable evolves over time in comparison to the real West Germany. Remember: the synthetic West Germany doesn’t merge with East Germany; only the real one does! Therefore, so long as there aren’t any dramatic changes in the countries making up the synthetic, the synthetic can be thought of as an estimate of West Germany’s counterfactual potential outcome. Having an explicitly estimated counterfactual makes causal inference easy! We can then simply take the difference between \(Y(D = 1)\) and \(Y(D = 0)\) at each point in time.
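In symbols, the per-period effect estimate is just the gap between the real and the synthetic series,

\[
\hat{\tau}_t = Y_{1t} - \hat{Y}^{\text{synth}}_{t},
\]

which, for years after 1990, estimates the effect of reunification in year \(t\).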
Synthetic Control
Before getting into the details of how to estimate a synthetic West Germany, let’s work through setting up the problem as a more familiar DiD. First, let’s read in the data.
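The read-in chunk is folded in the rendered page. A minimal sketch, assuming the course distributes the data as a CSV (the file name below is hypothetical; substitute whatever file you were given):

# Hypothetical file name -- substitute the file provided with the course.
df <- read.csv("german_reunification.csv")
head(df)  # expect columns like country, index, year, gdp, plus the predictors used later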
Synthetic Control
These data include per capita GDP (the outcome variable) in addition to the “inflation rate, industry share of value added, investment rate, schooling, and a measure of trade openness”. Variable definitions can be found on page 15. The authors start with 23 countries plus West Germany, but explain why they drop some of them from the analysis in footnote 14. They exclude Luxembourg and Iceland due to their size, Turkey due to its outlier per capita GDP, and Canada, Finland, Sweden, and Ireland due to “profound structural shocks” during the sample period. The authors are left with 17 total countries, including West Germany, from 1960 to 2003. While the sample of countries was chosen for particular reasons, the time period was less so.
Synthetic Control
In the following code chunk, produce a time series plot of per capita GDP for West Germany and the average per capita GDP across all other countries in the sample. Hint: use aggregate() to calculate averages by time and group.
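One possible approach (a sketch; it assumes unit index 7 is West Germany, consistent with the treatment.identifier used in the dataprep() call later):

# Average per capita GDP by year, separately for West Germany (index 7) and everyone else
avg_gdp <- aggregate(df$gdp,
                     list(year = df$year, wgermany = df$index == 7),
                     mean)

plot(avg_gdp$year[avg_gdp$wgermany], avg_gdp$x[avg_gdp$wgermany],
     type = "l", lwd = 3, xlab = "Year", ylab = "Per-Capita GDP",
     ylim = range(avg_gdp$x))
lines(avg_gdp$year[!avg_gdp$wgermany], avg_gdp$x[!avg_gdp$wgermany], lwd = 3, lty = 2)
legend("topleft", bty = "n", lty = c(1, 2), lwd = 3,
       c("West Germany", "Average of other countries"))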
The plot you generate should look something like the following:
Hopefully it’s clear that an (equally weighted) average of the other OECD countries may not look so similar to West Germany. For example, you can see the two trends diverging over the course of the sample period. In a difference-in-differences framework, diverging pre-trends often signal potential violations of the parallel trends assumption. Of note, so far we have a likely violation of parallel trends and low statistical power resulting from a single treated unit. Let’s continue with our DiD analysis for instructional purposes and completeness. In the next code block, estimate a DiD model of the following form:
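Presumably this is the standard two-way interaction specification, something like

\[
GDP_{it} = \beta_0 + \beta_1 \, WestGermany_i + \beta_2 \, Post1990_t + \beta_3 \, (WestGermany_i \times Post1990_t) + \varepsilon_{it},
\]

where \(\beta_3\) is the DiD estimate. A minimal sketch of estimating such a model follows; the indicator construction assumes unit 7 is West Germany, as in the dataprep() call later, and the course’s exact specification and post-period definition may differ.

# Build treatment and post indicators, then run OLS.
df$treated <- as.numeric(df$index == 7)
df$post    <- as.numeric(df$year >= 1990)

did_model <- lm(gdp ~ treated * post, data = df)
summary(did_model)  # the treated:post coefficient is the DiD estimate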
The DiD coefficient estimate should be 603.98. How should we interpret this point estimate? We would say that West Germany’s per capita GDP was about 600 units higher after reunification than it would have been without reunification.
Synthetic Control
However, this point estimate can only be taken seriously if we believe the parallel trends assumption. As we said before, interpreting this as causal is probably not a great idea because of the diverging pre-trends. In addition, the point estimate is particularly noisy; the coefficient’s 95% confidence interval spans \(-2,800\) to \(+4,000\). In other words, we shouldn’t have much confidence that this estimate isn’t just a product of randomness. So, using DiD, we end up with a positive, but statistically insignificant, result. Let’s see what we can do with synthetic control.
Synthetic Control
First, let’s dive a bit deeper into the units in the control group. We will call this sample of countries the donor pool, since they are candidate contributors to the synthetic West Germany. Some of these “donors” will have more relative importance to the synthetic than others. For example, New Zealand is probably less similar to West Germany than Austria, Belgium, or Denmark, all of which share a border with West Germany. Therefore, New Zealand will probably get less weight than these other countries when constructing the synthetic. Intuitively, this is fairly easy to think through. Mathematically, though, this can get quite complicated. Hopefully, the DiD and PSM/IPW math was not too difficult to follow. The math behind synthetic control is much more cumbersome, and ultimately outside the scope of this course, so we’ll move ahead straight into coding the method.
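For the curious, the core problem can be stated compactly (this is the formulation in Abadie et al. (2015), which the Synth package solves numerically):

\[
W^{*} = \arg\min_{W} \; (X_1 - X_0 W)' V (X_1 - X_0 W) \quad \text{subject to } w_j \ge 0 \text{ and } \sum_j w_j = 1,
\]

where \(X_1\) stacks West Germany’s pre-treatment predictors, \(X_0\) stacks the donors’ predictors, and \(V\) captures the relative importance of each predictor (Synth also searches over \(V\)).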
Code
To estimate a synthetic control for West Germany, we are going to use the Synth package. We will be able to do everything we want using Synth, but there are other packages that make some of the following steps a bit easier. I am choosing not to demonstrate these other packages for two main pedagogical reasons. First, one of these packages makes use of R syntax that is inconsistent with what we’ve used in this course thus far. I can imagine this being confusing, so I would like to stay away from that. Second, some of the packages require additional installation steps that differ across operating systems. I am happy to point interested students to these alternate packages, but I just wanted to make that disclaimer.
Code
Let’s load Synth and get going. This package contains three important functions that we will use:
dataprep(): this function prepares the data for estimation.
synth(): this function performs the synthetic control estimation.
synth.tab(): this function generates some summary tables to evaluate how well the synthetic matches the treated unit.
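Loading the package is the usual one-liner (Synth is available on CRAN):

# install.packages("Synth")  # run once if the package is not yet installed
library(Synth)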
Code
Let’s begin by preparing the data for estimation with dataprep(). There are a lot of arguments you’ll need to use for the function (reference manual), so be sure to take a look at the comments in the code below for some descriptions.
dataprep_out <- dataprep(
  foo = df,                          # Dataset
  unit.variable = "index",           # Name of the unit IDs
  unit.names.variable = "country",   # Name of units
  time.variable = "year",            # Name of time identifier
  dependent = "gdp",                 # Dependent variable
  treatment.identifier = 7,          # Which of the unit IDs is treated?
  # controls.identifier tells which units should be considered controls
  # Note -- here we use all *except* 7.
  controls.identifier = unique(df$index)[!unique(df$index) %in% c(7)],
  # Time periods to calculate average predictor values to use in optimization
  time.predictors.prior = 1981:1990,
  # Time periods to optimize over
  time.optimize.ssr = 1960:1989,
  time.plot = 1960:2003,             # Total time period (pre + post)
  # Columns to help with prediction/optimization
  predictors = c("gdp", "trade", "infrate"),
  # Columns to help with prediction, but only for specific time periods
  special.predictors = list(
    list("industry",  1981:1990,     c("mean")),
    list("schooling", c(1980, 1985), c("mean")),
    list("invest80",  1980,          c("mean"))
  )
)
Output
Missing data: treated unit; special predictor: special.industry.1981.1990 ; for period: 1990
We ignore (na.rm = TRUE) all missing values for predictors.op.
Code
Once the data has been organized with this function, the output is simple to generate via synth(). This function is doing a lot of calculation, which we are again skipping for brevity, so it may take some time to run. However, once it’s complete, we can evaluate the output using synth.tab().
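The folded chunk presumably looks something like the sketch below; the object names synth_out and tab_out match what the rest of this section references:

# Run the synthetic control optimization (this can take a little while).
synth_out <- synth(data.prep.obj = dataprep_out)

# Build summary tables comparing real and synthetic West Germany.
tab_out <- synth.tab(dataprep.res = dataprep_out, synth.res = synth_out)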
The element tab_out will contain a few interesting tables. First, we’ll examine tab_out$tab.pred, which gives us summary statistics for our predictors in the real and synthetic West Germany.
In general, it seems like synthetic West Germany has a nearly identical economic profile to real West Germany. Okay, that’s great, but what countries, and how much of each, make up synthetic West Germany? The answer to this question can be found in tab_out$tab.w:
tab_out$tab.w[order(-tab_out$tab.w$w.weights),]
Output
w.weights unit.names unit.numbers
3 0.390 Austria 3
12 0.179 Switzerland 12
1 0.169 USA 1
14 0.079 Japan 14
10 0.060 Norway 10
2 0.057 UK 2
9 0.009 Netherlands 9
4 0.007 Belgium 4
5 0.007 Denmark 5
6 0.007 France 6
8 0.007 Italy 8
20 0.007 Australia 20
21 0.007 New Zealand 21
16 0.005 Greece 16
19 0.005 Spain 19
18 0.004 Portugal 18
Code
Perhaps unsurprisingly, we find that Austria makes up 39% of synthetic West Germany, followed by Switzerland and the USA combining for another 35%. As a sanity check, it’s often helpful to see the weights assigned to different units in the donor pool. That said, you should not agonize over these weights. For example, it makes sense that Austria, Switzerland, and the USA were given large weights while New Zealand and Australia were given low weights. However, you might be surprised to see that Japan got more weight than Belgium and Denmark. This is an observation you might want to note, but not stress over.
Code
Now that we have our weights, let’s generate the per capita GDP time series for our synthetic West Germany. To do this, we need two pieces of information. First, dataprep_out contains the per capita GDP trend for each unit in the donor pool. We can access it via dataprep_out$Y0plot. These data are organized such that each column represents a control unit and each row represents a year. Next, we need the weights for each unit from the synth_out object. We can access these via synth_out$solution.w. We are going to multiply each column in dataprep_out$Y0plot by its respective unit weight in synth_out$solution.w. Then, within each row, we are going to add the values across columns, leaving a single column with one value per year. In math, this is called matrix multiplication. To do this in R, we are going to use %*%. dataprep_out also contains the time units (dataprep_out$tag$time.plot) and the real West Germany data (dataprep_out$Y1plot). Let’s create a data.frame with this information.
Code
sc_w_germany <- data.frame(
  time  = dataprep_out$tag$time.plot,
  real  = as.numeric(dataprep_out$Y1plot),
  synth = as.numeric(dataprep_out$Y0plot %*% synth_out$solution.w)
)
# %*% -- this is matrix multiplication in R.
# In words, it multiplies each time series by the unit's weight, and then sums.
# This is how we calculate the synthetic from the raw data.
Now that we have this data.frame, we can plot both the real West Germany’s per capita GDP and its synthetic’s per capita GDP.
library(scales)  # provides alpha() for transparent colors

# Create an empty set of axes
plot(sc_w_germany$time, sc_w_germany$real, type = "n",
     ylab = "Per-Capita GDP", xlab = "Year",
     ylim = range(sc_w_germany$real, sc_w_germany$synth))

# Fill in the synthetic W. Germany
lines(sc_w_germany$time, sc_w_germany$synth, col = alpha("dodgerblue", .6), lwd = 3)

# Fill in the real W. Germany
lines(sc_w_germany$time, sc_w_germany$real, col = alpha("tomato", .6), lwd = 3)

abline(v = 1990, lty = 2)

legend("topleft", bty = "n", lty = 1,
       col = c("tomato", "dodgerblue"),
       c("W. Germany", "Synth. W. Germany"))
Plot
Code
Given this figure, it should be fairly obvious that the results of the synthetic control suggest a negative effect of German reunification on per capita GDP. However, since we have \(Y(D = 1)\) and \(Y(D = 0)\), we can directly calculate a treatment effect via subtraction. This is usually shown as the “gap” between the real and the synthetic. Let’s plot this gap:
plot(sc_w_germany$time, sc_w_germany$real - sc_w_germany$synth, type = "l",
     ylab = "Gap in Per-Capita GDP", xlab = "Year", lwd = 3)
abline(h = 0, v = 1990, lty = c(1, 2))
Plot
Code
How should we think about this? Initially, there appears to be a positive bump in per capita GDP, followed by a large reduction in per capita GDP. An important thought/question you might have now is the following: using DiD, we found a positive effect, but it was insignificant. How do we know this increase (or decrease) is not also insignificant? How can we create a confidence interval for these effects? If you’ve asked these questions, you’re touching on one of the two most critical weaknesses of synthetic control (the other weakness being how computationally expensive it is compared to other estimators like OLS). The issue of inference in synthetic control is on the cutting (or even bleeding) edge of econometric research.
Placebo Tests
Inference is unique in synthetic control environments in that we do not calculate standard errors or confidence intervals. Rather, we perform placebo tests, sometimes called permutation tests, for each control unit in the donor pool. Essentially, one at a time, we pretend that each other unit in the donor pool was treated instead of West Germany. Of course, none of these units were in fact treated, so there shouldn’t be any large changes between the actual and the synthetic. These placebos provide some variation for us to use when making inferences. If we find that most of the placebo effects are as large as, or larger than, the estimated effect for our treated unit, then we should reduce our confidence in our estimated treatment effect. On the other hand, if the placebo effects for all other units are much smaller in magnitude than the effect for the treated unit, then we should have more confidence in our estimate.
Placebo Tests
The procedure for this is simple, albeit computationally time consuming. Effectively, we will re-run the above code for each other unit and compare the average gap for each unit to the average gap for the treated unit.
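The course’s own loop is folded, but a sketch of one way to do it follows. It assumes the results are stacked into a long data.frame called sc with columns i, time, real, and synth, since that is what the plotting code below expects:

# Re-run dataprep() and synth() treating each unit, in turn, as if it were treated.
# Results are stacked into `sc` (columns: i, time, real, synth).
sc <- data.frame()
for (i in unique(df$index)) {
  dp_i <- dataprep(
    foo = df,
    unit.variable = "index",
    unit.names.variable = "country",
    time.variable = "year",
    dependent = "gdp",
    treatment.identifier = i,
    controls.identifier = unique(df$index)[!unique(df$index) %in% c(i)],
    time.predictors.prior = 1981:1990,
    time.optimize.ssr = 1960:1989,
    time.plot = 1960:2003,
    predictors = c("gdp", "trade", "infrate"),
    special.predictors = list(
      list("industry",  1981:1990,     c("mean")),
      list("schooling", c(1980, 1985), c("mean")),
      list("invest80",  1980,          c("mean"))
    )
  )
  synth_i <- synth(data.prep.obj = dp_i)  # slow: one optimization per unit

  sc <- rbind(sc, data.frame(
    i     = i,
    time  = dp_i$tag$time.plot,
    real  = as.numeric(dp_i$Y1plot),
    synth = as.numeric(dp_i$Y0plot %*% synth_i$solution.w)
  ))
}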
Once we calculate the gap between the real and synthetic for each control unit, we can visually inspect how West Germany’s gap compares to all the others. To do this, we will plot all of these gaps on the same set of axes. See below for the necessary code:
# Create an empty set of axes
plot(sc$time, sc$real - sc$synth, type = "n",
     xlab = "Year", ylab = "Gap")

# Loop through each unit index and plot its gap as a line.
# West Germany (index 7) gets a thicker, fully opaque line.
for (i in unique(sc$i)) {
  lwd <- 1; if (i == 7) lwd <- 4
  lines(sc$time[sc$i == i], (sc$real - sc$synth)[sc$i == i],
        lwd = lwd, col = alpha("black", ifelse(i == 7, 1, 0.3)))
}

abline(v = 1990, lty = 2)
Plot
Placebo Tests
I’d like you to notice two things about the figure above. First, the gap for West Germany is smaller than the gaps of a few other control units. This is suggestive that maybe the change in per capita GDP isn’t significant. Second, notice how the pre-treatment fit for some of these units is particularly poor. Often, when pre-treatment fit is poor, the estimated causal effect tends to be quite large. We will return to this idea in a moment, but I wanted to raise the point here while we’re inspecting the image.
Placebo Tests
Now that we have seen the distribution of gaps, we want to create some measure similar to a p-value for inference. Recall the definition of a p-value: the probability of observing an effect at least as large as this one, given that the null hypothesis is true. Of course, our null hypothesis is that reunification did not cause a change in per capita GDP. So, how do we get from gaps to p-values? First, we want to calculate the average (post-period) gap for each unit. Next, we want to order the effect sizes by magnitude. Then, we can directly calculate the probability of observing an effect at least as large as the effect for the treated unit. To calculate the exact p-value, we divide the position of the treatment effect (for West Germany) in the ordered list by the total number of elements in the list. For example, if West Germany had the 3rd largest effect size out of 17, then its p-value would be \(\frac{3}{17}\), or \(0.176\). Let’s calculate the p-value for West Germany’s estimate using this method.
Placebo Tests
# Average post-period gap per index
aggregate((sc$real - sc$synth)[sc$time >= 1990],
          list(sc$i[sc$time >= 1990]), mean) -> effect_size

# Ordering effect_size by abs() of estimate
effect_size <- effect_size[order(-abs(effect_size$x)), ]

# Identifying position of ID 7 within the effects,
# divided by the total number
cat("p-Value:", round(which(effect_size$Group.1 == 7) / nrow(effect_size), 3))
Output
p-Value: 0.353
Placebo Tests
Now, let’s return to what was previously mentioned about poor fits in the pre-period. If we cannot generate a good fit in the pre-period, we’re also likely to struggle to estimate a valid counterfactual in the post-period. So, how can we quantify “poor” in terms of pre-treatment fit? We measure pre-treatment fit using the Root Mean Squared Prediction Error (RMSPE). Essentially, we take the average of the squared gap across pre-period observations and then take the square root of that average, which is very similar to converting a variance into a standard deviation. Then, if a unit’s fit is worse than two or three times the treated unit’s RMSPE, we remove that unit from the p-value calculation.
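Written out (a standard definition, not quoted from the course), the pre-period RMSPE for unit \(i\) is

\[
\text{RMSPE}_i = \sqrt{\frac{1}{T_0} \sum_{t = 1}^{T_0} \left( Y_{it} - \hat{Y}^{\text{synth}}_{it} \right)^2 },
\]

where \(T_0\) is the number of pre-treatment years (1960 through 1989 here). For some code that computes this and applies the cutoff, see below: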
# Calculate RMSPE for each unit in the pre-period.
aggregate((sc$real - sc$synth)[sc$time < 1990]^2,
          list(sc$i[sc$time < 1990]),
          function(x) sqrt(mean(x))) -> rmspe

# Using 3x as the cutoff.
keep <- rmspe$Group.1[which(rmspe$x <= 3 * rmspe$x[rmspe$Group.1 == 7])]

numerator   <- which(effect_size$Group.1[effect_size$Group.1 %in% keep] == 7)
denominator <- nrow(effect_size[effect_size$Group.1 %in% keep, ])

cat("p-Value:", round(numerator / denominator, 3))
Output
p-Value: 0.1
Placebo Tests
That is exactly what the code above does: once we identify which units to drop from the p-value calculation, we repeat the routine from before with that slight modification.
Before wrapping up, I want to mention that the authors run additional robustness checks, falsification exercises, and placebo tests. It would be worthwhile to read about these in the manuscript linked above. As a summary, see below:
In-time placebo (Figure 4)
Ratio of Pre- and Post-period RMSE (Figure 5)
Leave-One-Out (Figure 6)
Further Reading
Synthetic Control is a novel, intuitive, and revolutionary econometric method. While some economists are skeptical of the method for various reasons (computational issues, lack of standard errors, reliance on case studies), it’s clearly one of the directions applied research is headed.
Oftentimes, causal inference requires a particular setting that satisfies a slew of assumptions before we’re ready to make causal claims. Synthetic control, for all of its weaknesses, is probably one of the more applicable methods in a business setting. Returning to the ODU & EVMS example, it would be quite easy to gather data on other comparable institutions of higher education to estimate a synthetic ODU.
Further Reading
For further reading on synthetic control, see the following: