Difference-in-Differences is one of the most widely used research designs in the social sciences. This is likely due to how (relatively) easy it is to set up and to understand. While economists tend to prefer designs like Regression Discontinuity (Module 7) in terms of credibility, DiD remains a much more popular method. There have been many econometric innovations in the past decade or so, but none as big as the DiD one. Certainly, no other method has had a “revolution” in the way DiD has.
In these notes, we are going to discuss the innovations that have come to light since ~2020.
When treatment timing is staggered (for example: policies are rolled out in different places at different times), and treatment effects are heterogeneous (for example: different across time and/or units), the DiD coefficient can fail to represent what we think it should represent. In short, this is due to improper comparisons of groups at different points in time. With staggered timing, TWFE implicitly compares:
Newly treated units to never treated units (good!)
Newly treated units to not-yet treated units (good!)
Newly treated units to already treated units (bad!)
This last group, where newly treated units are compared to already treated units, can result in negative estimates of positive treatment effects and vice versa. This issue puzzled social scientists and stalled research across the field. For a few years, economists everywhere were scrambling to figure out what was wrong with two-way fixed effects models, who Andrew Goodman-Bacon was, and how to do his difference-in-differences thing. This was such a FAQ that in late 2019, Goodman-Bacon wrote a paper titled So You’ve Been Told to Do My Difference-in-Differences Thing: A Guide.
Let’s spend some time motivating and understanding the problem with TWFE before moving on to some solutions to the problem.
TWFE
Consider the following example that we will simulate data for. We have three groups over ten time periods. The first group is never treated throughout the sample. The second and third groups are treated at \(t = 4.5\) and \(t = 7.5\), but we only observe the units at \(t \in \{1, 2, ..., 10\}\). Put more simply: at \(t = 4\), the second group is 100% untreated and at \(t = 5\), the group is 100% treated. The treatment is either going to change the trajectory of growth for each group, or simply be a static level shift, depending on the arguments you choose for the simulation. You should experiment with both, however.
Below are the simulation functions that create the data and then plot the data. I am providing the code so you can see what it looks like under the hood.2
Code
create_data <- function(dynamic = FALSE, rnd = FALSE){
  set.seed(757)
  df <- data.frame(id = rep(1:3, times = 10),
                   tt = rep(1:10, each = 3))
  df$col <- ifelse(df$id == 1, "black",
                   ifelse(df$id == 2, "tomato", "dodgerblue"))
  # Setting the binary treatment variable
  df$D <- ifelse((df$id == 2 & df$tt >= 5) | (df$id == 3 & df$tt >= 8), 1, 0)
  df$btrue <- ifelse(df$D == 1 & df$id == 3, 5,
                     ifelse(df$D == 1 & df$id == 2, 3, 0))
  # Decide if treatment is dynamic or static.
  if(dynamic){
    df$rel_time <- df$tt - c(NA, 5, 8)
    df$rel_time <- ifelse(is.na(df$rel_time), -100, df$rel_time)
  } else {
    df$rel_time <- 1
  }
  # Setting the "true" treatment effect for each time period
  # This will be nice for later comparison
  df$treatment_effect <- df$btrue*df$D*df$rel_time
  # Create outcome variable
  df$y <- df$id + df$tt + df$treatment_effect +
    if(rnd) rnorm(nrow(df), 0, 0.01) else 0
  return(df)
}

plot_data <- function(df){
  plot(df$tt, df$y, col = df$col, pch = 19,
       xlab = "Time", ylab = "Y", las = 1)
  for(i in 1:3){
    lines(df$tt[df$id == i], df$y[df$id == i],
          col = df$col[df$id == i])
  }
  abline(v = c(4.5, 7.5), lty = 2)
}
Compare the static and dynamic data by switching between TRUE and FALSE below:
Since we simulated these data, and we know the true treatment effect for each observation (via treatment_effect), we should be able to average them across treated units. Of course, since the treatments are inherently different for groups 2 and 3, there are many ways we can summarize these treatment effects. For example, since the two treatment effects are 3 and 5 in the static case, we can use the simple average (4) for our treatment effect. We could also weight 3 and 5 by how many time periods they’ve been observed for (6 and 3, respectively). This would give us an average of 3.67. We could also find an average treatment effect by time-since-treatment, like in an event study.
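For concreteness, the two static summaries described above are just a simple and a weighted average (a base-R sketch of the arithmetic):

```r
# Two ways to summarize the static effects: 3 (group 2, observed treated
# for 6 periods) and 5 (group 3, observed treated for 3 periods).
simple_avg <- mean(c(3, 5))                    # (3 + 5) / 2 = 4
time_wtd   <- weighted.mean(c(3, 5), c(6, 3))  # (3*6 + 5*3) / 9 = 3.67
c(simple_avg, time_wtd)
```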
Let’s think about what a regression would give us. Suppose we have a TWFE model of the following form3:
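Written out explicitly, with the fixed-effect symbols used later in these notes (unit effects \(\alpha_i\), time effects \(\lambda_t\)), the standard TWFE specification is:

\[Y_{it} = \alpha_i + \lambda_t + \delta^{DD} D_{it} + \epsilon_{it}\]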
Intuitively, OLS should return an average of all the treatment effects in the data as \(\delta^{DD}\). This would be consistent with a time-weighted treatment effect of 3.67 in the static case (or 6.67 in the dynamic case).
Let’s compare our time-weighted treatment effect with what TWFE returns.
Spoiler: our coefficient is (seemingly) too large in the static case and too small in the dynamic case.4
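To see the static case concretely, here is a base-R sketch that rebuilds the static data inline (so it runs standalone, without the create_data() helper) and estimates the TWFE model with explicit dummies:

```r
# Static simulation: y = id + tt, plus an effect of 3 for group 2
# (treated from t = 5) and 5 for group 3 (treated from t = 8).
df <- data.frame(id = rep(1:3, times = 10), tt = rep(1:10, each = 3))
df$D <- ifelse((df$id == 2 & df$tt >= 5) | (df$id == 3 & df$tt >= 8), 1, 0)
df$y <- df$id + df$tt + ifelse(df$id == 2, 3, ifelse(df$id == 3, 5, 0)) * df$D

# TWFE via explicit dummies (equivalent to feols(y ~ D | id + tt, df))
twfe <- lm(y ~ D + factor(id) + factor(tt), data = df)
unname(coef(twfe)["D"])  # ~3.91: larger than the time-weighted 3.67
```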
Goodman-Bacon
Wait, so what gives? Well, it’s complicated and we (they: the econometricians) just figured this all out.
As simple as this example is, this was a non-trivial observation to econometricians. Andrew Goodman-Bacon, the most recent scapegoat for grumbling economists, demonstrated in an immediately seminal work that the DiD coefficient in a TWFE framework (\(\delta^{DD}\)) is a (sometimes strangely) weighted average of all possible 2x2 DiD comparisons.
What does “all possible” 2x2 DiD comparisons mean?
In this three-group case with one untreated group, there are four possible 2x2 DiD comparisons.
The first two comparisons are the two treated groups each separately compared to the never-treated group.
The third comparison is the earlier treated group compared to the later treated group while the later treated group remains untreated.
The final comparison is the later treated group compared to the earlier treated group while the earlier treated group is always treated.
It’s easier to visualize this as the following:
Code
df <- create_data(TRUE)
par(mfrow = c(2, 2), mar = c(2.1, 2.1, 2.1, 0.6))

## Comp. 1 -- Early vs Never
lim <- df$id %in% c(1, 2)
r1 <- fixest::feols(y ~ D | id + tt, df[lim,])
plot(df$tt[lim], df$y[lim], pch = 19,
     ylim = range(df$y), xlim = range(df$tt),
     xlab = "", ylab = "", las = 1,
     main = "I. Early vs Never",
     col = df$col[lim])
lines(df$tt[df$id == 1], df$y[df$id == 1], col = df$col[df$id == 1], lty = 2)
lines(df$tt[df$id == 2], df$y[df$id == 2], col = df$col[df$id == 2])
abline(v = c(4.5, 7.5), lty = 2)
legend("topleft", legend = c("Treatment", "Control"), lty = 1:2, bty = "n")

## Comp. 2 -- Late vs Never
lim <- df$id %in% c(1, 3)
r2 <- fixest::feols(y ~ D | id + tt, df[lim,])
plot(df$tt[lim], df$y[lim], pch = 19,
     ylim = range(df$y), xlim = range(df$tt),
     xlab = "", ylab = "", las = 1,
     main = "II. Late vs Never",
     col = df$col[lim])
lines(df$tt[df$id == 1], df$y[df$id == 1], col = df$col[df$id == 1], lty = 2)
lines(df$tt[df$id == 3], df$y[df$id == 3], col = df$col[df$id == 3])
abline(v = c(4.5, 7.5), lty = 2)

## Comp. 3 -- Early vs Late
lim <- df$id %in% c(2, 3) & df$tt < min(df$tt[df$id == 3 & df$D == 1])
r3 <- fixest::feols(y ~ D | id + tt, df[lim,])
plot(df$tt[lim], df$y[lim], pch = 19,
     ylim = range(df$y), xlim = range(df$tt),
     xlab = "", ylab = "", las = 1,
     main = "III. Early vs Late",
     col = df$col[lim])
lim2 <- lim & df$id == 2
lines(df$tt[lim2], df$y[lim2], col = df$col[lim2])
lim2 <- lim & df$id == 3
lines(df$tt[lim2], df$y[lim2], col = df$col[lim2], lty = 2)
abline(v = c(4.5, 7.5), lty = 2)

## Comp. 4 -- Late vs Early
lim <- df$id %in% c(2, 3) & df$tt >= min(df$tt[df$id == 2 & df$D == 1])
r4 <- fixest::feols(y ~ D | id + tt, df[lim,])
plot(df$tt[lim], df$y[lim], pch = 19,
     ylim = range(df$y), xlim = range(df$tt),
     xlab = "", ylab = "", las = 1,
     main = "IV. Late vs Early",
     col = df$col[lim])
lim2 <- lim & df$id == 2
lines(df$tt[lim2], df$y[lim2], col = df$col[lim2], lty = 2)
lim2 <- lim & df$id == 3
lines(df$tt[lim2], df$y[lim2], col = df$col[lim2])
abline(v = c(4.5, 7.5), lty = 2)

par(mfrow = c(1, 1))
Plot
Each of these 2x2 comparisons will be given positive weight when making up \(\delta^{DD}\). However, we only want to consider comparisons I-III. Comparison IV is flawed because the “control” group is already tainted by treatment dynamics.
One of the innovations in Goodman-Bacon was the ability to decompose \(\delta^{DD}\) into its parts. Using the bacondecomp package, we can apply Goodman-Bacon’s diagnostic methodology to split up \(\delta^{DD}\) into its different 2x2 comparisons and their resulting estimates and weights.
Code
library("bacondecomp")
Loading required package: fixest
Code
bacon(formula = y ~ D, data = df,
      id_var = "id", time_var = "tt",
      quietly = FALSE) -> our_decomp
Output
type weight avg_est
1 Earlier vs Later Treated 0.18182 3.00000
2 Later vs Earlier Treated 0.13636 -4.00000
3 Treated vs Untreated 0.68182 6.33333
This output summarizes the three possible comparisons: treated vs never-treated (good!), early vs late treated (good!), and late vs early treated (maybe bad?). What should raise an eyebrow, though, is the late vs early comparison group with an average estimate of -4! This is, of course, not representative of what we want OLS to reflect. Looking at the plot of our dynamic simulation data, it’s crystal clear that our treatment effects are positive.5
An aside about bacondecomp::bacon()
If we print the result of our decomposition, we can observe each individual 2x2. We can also use this to re-aggregate back to the summary output that bacon() gave us before.
treated untreated estimate weight type
2 5 Inf 7.5 0.3636364 Treated vs Untreated
3 8 Inf 5.0 0.3181818 Treated vs Untreated
6 8 5 -4.0 0.1363636 Later vs Earlier Treated
8 5 8 3.0 0.1818182 Earlier vs Later Treated
########## Summary ##########
type weight avg_est
1 Earlier vs Later Treated 0.1818182 3.000000
2 Later vs Earlier Treated 0.1363636 -4.000000
3 Treated vs Untreated 0.6818182 6.333333
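The re-aggregation is just a weighted average. Using the numbers from the printed 2x2 table, the weight-weighted average of the four estimates is exactly what the full-sample TWFE regression returns (a base-R check):

```r
# Estimates and weights copied from the printed decomposition above
est <- c(7.5, 5.0, -4.0, 3.0)
wt  <- c(0.3636364, 0.3181818, 0.1363636, 0.1818182)
sum(wt * est)  # ~4.32, the TWFE coefficient on the dynamic data
```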
This is no mistake. When I plotted the four comparisons, I also calculated the TWFE regressions for each. Displaying these coefficients, we can see how Goodman-Bacon’s decomposition gets its estimates.6
Code
library("modelsummary")
Code
options(modelsummary_factory_default = 'kableExtra')
options("modelsummary_format_numeric_latex" = "plain")
modelsummary(list("E vs U" = r1, "L vs U" = r2,
                  "E vs L" = r3, "L vs E" = r4),
             gof_map = "", estimate = "{estimate}",
             title = "All 2x2 DiD Coefficients")
All 2x2 DiD Coefficients

      E vs U    L vs U    E vs L    L vs E
D      7.500     5.000     3.000    −4.000
     (0.000)   (0.000)   (0.000)   (0.000)
With this very simple simulation, we’ve found a case where all treatment effects are positive (visually), but part of the decomposed \(\delta^{DD}\) is negative.7 In fact, it’s possible for this comparison to turn the entire treatment effect negative. See below for an example.8
##### Goodman-Bacon Decomp #####
type weight avg_est
1 Earlier vs Later Treated 0.5 51.80009
2 Later vs Earlier Treated 0.5 -65.18046
##### TWFE Results #####
OLS estimation, Dep. Var.: y
Observations: 30,000
Fixed-effects: id: 1,000, year: 30
Standard-errors: Clustered (id)
Estimate Std. Error t value Pr(>|t|)
treat -6.69019 0.670213 -9.98218 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 29.1 Adj. R2: 0.803436
Within R2: 0.003954
Plot
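As a sanity check on that output: averaging the two decomposition pieces with their (equal) weights reproduces the negative TWFE coefficient exactly, even though every true effect is positive.

```r
# Numbers copied from the Goodman-Bacon decomposition printed above
0.5 * 51.80009 + 0.5 * (-65.18046)  # -6.690185, matching the TWFE estimate
```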
As a note: unfortunately, the weights are not computed in any intuitive way. According to Andrew Baker9, “we can think of the TWFE estimator as a weighted average of individual 2x2 DiD estimators, with the weights proportional to group sizes and the variance of the treatment indicators in each pair, which is highest for units treated in the middle of the panel.” Therefore, \(\delta^{DD}\) is sensitive to both panel length and which groups appear towards the center. These are not necessarily things we want to influence our estimates. If you remain interested in how weights are derived, I encourage you to read Goodman-Bacon (2019).
Solutions
So, now what?
There have been many estimators proposed that solve this problem. In a blog post, Scott Cunningham notes three categories of these estimators. While this is a helpful taxonomy to have, these estimators are all quite similar in practice. Of course, each has its own strengths and weaknesses, so it’s important to be familiar with more than just one, but the main thread through all of them is fixing the issues associated with wonky comparisons.
Some of the proposed estimators, just to name a few, are as follows:
We do not have enough time to cover each method, so instead we are going to focus on two: Gardner (2022) and Dube et al. (2023).
Gardner (2022) proposes something called “Two-Stage Difference-in-Differences”, while Dube et al. (2023) propose a method they call a “Local Projections” approach to Difference-in-Differences. While these two are less well known relative to, say, Callaway & Sant’Anna (2021), I think their methodologies are more intuitive.
Two-Stage DiD
The first solution we will discuss is, in my opinion, the most intuitive. This is one of those papers that make you think, “I wish I had thought of that!” Gardner (2022) takes a clever approach towards estimating the TWFE model. The idea starts like this:
If the parallel trends assumption holds (is true), then the untreated potential outcome \(Y_{it}^0\) is simply the sum of unit \(i\)’s unit-specific effect and time period \(t\)’s common time-specific effect.10 Perhaps we should re-frame the problem and switch our priority to accurately measuring these parameters, since they provide information about the unobservable counterfactual.
Gardner goes through quite a bit of math to effectively (albeit more elegantly) say the following: since we only have a single, static \(\delta^{DD}\) in our equation, any heterogeneity in treatment effects gets absorbed by the fixed effects \(\alpha_i\) and \(\lambda_t\). Informally, OLS is trying to fit a square peg in a round hole. Forcing the peg produces biased estimates of \(\alpha_i\) and \(\lambda_t\), which in turn bias \(\widehat{\delta^{DD}}\)!
The solution Gardner proposes is rather simple: if estimating \(\alpha_i\) and \(\lambda_t\) causes us problems, let’s remove them. Wait… how can we just remove them?! If we can somehow get unbiased estimates of our fixed effects, we can simply subtract them from our outcome before estimating \(\delta^{DD}\)!
Before moving on, let’s think about what fixed effects are. Suppose the data we have contain Virginia from 2010 to 2019. A Virginia-fixed-effect is a time-invariant effect. So, if we estimate a Virginia-fixed-effect, it shouldn’t matter whether we use data from 2010-2014, 2015-2019, strictly odd years, or any other combination of years. In theory, all estimations should yield the same (or similar) result. The same logic holds for time-period-fixed-effects: it shouldn’t matter which units you use to estimate them since the effect is invariant across units.
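We can verify this invariance claim on our (noiseless) simulated data: the estimated unit effects are identical whether we use odd or even years, so long as we stick to untreated observations. A base-R sketch, rebuilding the untreated data inline so it runs standalone:

```r
# Untreated outcomes in the simulation are y = id + tt exactly, so the
# unit fixed effects should be recoverable from any subset of years.
df <- data.frame(id = rep(1:3, times = 10), tt = rep(1:10, each = 3))
df$D <- ifelse((df$id == 2 & df$tt >= 5) | (df$id == 3 & df$tt >= 8), 1, 0)
df$y <- df$id + df$tt
untr <- df[df$D == 0, ]

fe_odd  <- coef(lm(y ~ factor(id) + factor(tt), untr[untr$tt %% 2 == 1, ]))
fe_even <- coef(lm(y ~ factor(id) + factor(tt), untr[untr$tt %% 2 == 0, ]))
fe_odd["factor(id)2"]   # 1: the id = 2 unit effect from odd years
fe_even["factor(id)2"]  # 1 again from even years
```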
Since we know treatment is messing up our fixed effects, we can estimate \(\alpha_i\) and \(\lambda_t\) using only untreated observations. We will call this our first stage. The resulting fixed effect estimates will necessarily be unbiased (assuming parallel trends) since there is no treatment involved. Then, in a second stage, we can fit the following model using the entire dataset:
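With first-stage estimates \(\hat{\alpha}_i\) and \(\hat{\lambda}_t\) in hand, the second stage regresses the adjusted outcome on treatment alone (this is my rendering of Gardner’s two stages):

\[Y_{it} - \hat{\alpha}_i - \hat{\lambda}_t = \delta^{DD} D_{it} + \epsilon_{it}\]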
Let’s take a look at the mechanics of this. We are going to start with the first stage and regress our outcome variable on the fixed effects, using only untreated observations.
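A base-R sketch of the first stage (the dynamic data are rebuilt inline so the snippet is self-contained): fit the fixed effects on the D == 0 observations, predict for every observation, and subtract.

```r
# Dynamic simulation: effect grows by 3/period for group 2 (from t = 5)
# and by 5/period for group 3 (from t = 8).
df <- data.frame(id = rep(1:3, times = 10), tt = rep(1:10, each = 3))
df$D <- ifelse((df$id == 2 & df$tt >= 5) | (df$id == 3 & df$tt >= 8), 1, 0)
df$effect <- ifelse(df$D == 1 & df$id == 2, 3 * (df$tt - 5),
                    ifelse(df$D == 1 & df$id == 3, 5 * (df$tt - 8), 0))
df$y <- df$id + df$tt + df$effect

# First stage: fixed effects estimated on untreated observations only
first <- lm(y ~ factor(id) + factor(tt), data = df[df$D == 0, ])

# Subtract the predicted fixed effects from the outcome
df$y_2sdid <- df$y - predict(first, newdata = df)
```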
Finally, estimate the second-stage regression without fixed effects.
Code
lm(y_2sdid ~ D, df)
Output
Call:
lm(formula = y_2sdid ~ D, data = df)
Coefficients:
(Intercept) D
-3.243e-16 6.667e+00
Notice how this is exactly equivalent to our average treatment effect of 6.67 for the dynamic group! The only issue with doing this two-step procedure is that the standard errors can get weird. This is true for all two-stage procedures. There’s an R package called did2s that can help implement two-stage DiD for you. I highly recommend this package.
In addition to estimating \(\delta^{DD}\), we can also estimate an event study using 2SDiD. To do this, simply replace the treatment variable with the fixest notation for an event study. See Module 3.1 for this information.
As a final note, keep in mind that in order to estimate the unit and time fixed effects, you need to observe some untreated data in both dimensions. What I mean is that there cannot be any units that are always treated, and there cannot be any time periods that contain only treated units. Such fixed effects will, of course, not be estimated in the first stage, so those observations automatically get removed from the second stage.
Stacked Difference-in-Differences
The next approach we’ll discuss is stacked difference-in-differences. There is no single paper associated with this estimator, but the intuition has been used in multiple studies. One of the most famous studies that uses a variation of this method is Cengiz et al. (2019).
First, let’s define some terms:
Event: an event is going to be any time some policy is changed or some shock is experienced within a unit (e.g. state, etc.) For example, New Jersey changing its minimum wage would constitute an event.
Sub-experiment: a sub-experiment is a collection of treatment and clean control groups. We’ll define “clean” in a moment. An event might be New Jersey changing its minimum wage. However, if New Jersey and Michigan change their minimum wages at the same time, then their events, in addition to their controls, would constitute a single sub-experiment. Together, New Jersey’s and Michigan’s events would be referred to as a cohort.
Clean control: a clean control group is made up of units that do not experience any treatment, or residual treatment dynamics, during a sub-experiment’s treatment periods.
Let’s start with a fresh df so we can experiment with stacking.
Code
df <- create_data(TRUE)
# Since this dataset is so small, I need to increase
# the number of observations for this demo.
# You would **not** do this in real life.
df <- rbind(df, df)
df <- df[order(df$id, df$tt),]
Since there are two treated units in this example, we would have two sub-experiments (one for each event). To focus in on the treatment, we can create a window of \(\pm 2\) time periods relative to treatment for each sub-experiment.
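The subsetting step itself can be sketched as follows (the exact windows and clean-control logic here are my reading of the setup, and the doubled data are rebuilt inline so the snippet runs standalone):

```r
# Rebuild the doubled dynamic data
df <- data.frame(id = rep(1:3, times = 10), tt = rep(1:10, each = 3))
df$D <- ifelse((df$id == 2 & df$tt >= 5) | (df$id == 3 & df$tt >= 8), 1, 0)
df$y <- df$id + df$tt + ifelse(df$D == 1 & df$id == 2, 3 * (df$tt - 5),
                               ifelse(df$D == 1 & df$id == 3, 5 * (df$tt - 8), 0))
df <- rbind(df, df)
df <- df[order(df$id, df$tt), ]

# Sub-experiment 1: event at t = 5 (id 2), window t = 3..7.
# ids 1 and 3 are untreated throughout this window, so both are clean.
df1 <- df[df$tt %in% 3:7, ]

# Sub-experiment 2: event at t = 8 (id 3), window t = 6..10.
# id 2 is already treated by t = 6, so only id 1 is a clean control.
df2 <- df[df$tt %in% 6:10 & df$id != 2, ]

c(nrow(df1), nrow(df2))  # 30 and 20
```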
Now that we have our two separately defined sub-experiments, with clean control units, we can create variables to enumerate each experiment and denote relative time within each experiment. Finally, we’ll stack the two subsets of data on top of each other in relative time. By “stack”, I mean that we will literally be vertically concatenating the data. In R, this is done via rbind().
Code
# Enumerate the sub-experiments
df1$experiment <- 1
df2$experiment <- 2
# Add in relative time
df1$rel_time <- df1$tt - 5
df2$rel_time <- df2$tt - 8
# Stack the data with rbind()
stacks <- rbind(df1, df2)
# Create a treatment variable
stacks$ever_treat <- ave(stacks$D, paste0(stacks$id, stacks$experiment), FUN = max)
Using these stacked data, we can now estimate a simple 2x2 TWFE specification. Note: the fixed effects should be interacted with the sub-experiment identifier. With fixest, we can do this by using FE1^subexperiment (here, id^experiment and tt^experiment).
Code
feols(y ~ D | id^experiment + tt^experiment, stacks)
Output
OLS estimation, Dep. Var.: y
Observations: 50
Fixed-effects: id^experiment: 5, tt^experiment: 10
Standard-errors: Clustered (id^experiment)
Estimate Std. Error t value Pr(>|t|)
D 3.85714 0.613813 6.28391 0.0032753 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 1.23889 Adj. R2: 0.870041
Within R2: 0.351834
As I see it, there are two main advantages of stacking. First, after stacking, we’re back to the familiar 2x2 case that we’re already comfortable with. Moreover, and this is the second benefit, we can also estimate \(\delta^{DD}\) for each sub-experiment and summarize/combine the results however we’d like. See below for an example of how to do this with fixest.
Code
# Estimate an event study by sub-experiment using split
feols(y ~ i(rel_time, ever_treat, -1) | id^experiment + tt^experiment,
      stacks, split = ~experiment) -> r2
iplot(r2, grid = F)
# Estimate the aggregate event study
feols(y ~ i(rel_time, ever_treat, -1) | id^experiment + tt^experiment,
      stacks) -> r1
iplot(r1, add = TRUE, col = "dodgerblue")
Plot
While this seems easy enough with just three groups, the complexity increases when the number of sub-experiments and/or the size of the data becomes large. This becomes even more difficult when treatment is non-absorbing. Treatments that are absorbing forever alter the trajectory of the unit. Non-absorbing treatments, by contrast, “wear off”: after some time, the effect of the treatment levels off, a new equilibrium is reached, and the unit effectively returns to being untreated. Whether a treatment is absorbing or non-absorbing says nothing about the magnitude of the effect; rather, it determines whether the treated units can eventually return to the control group after some time. This is an assumption that necessarily depends on the institutional setting.
An example of non-absorbing treatment would be a pipe in my apartment bursting. I would likely have to move out for a few days or so while the pipe is fixed, but eventually I would be able to return to my normal life. An example of absorbing treatment would be my landlord evicting me from my apartment. I would need to find a new place to live and (not to be dramatic) my life would be changed forever (in a causal inference sense).
More concrete examples would be minimum wage laws compared to texting-and-driving laws. Minimum wages are increased every few years to account for inflation, which necessarily makes the changes non-absorbing. On the other hand, once texting-and-driving has been made illegal, it will always be illegal.
Unfortunately, there are not any well-developed R packages that address stacked difference-in-differences. However, that sentence may be outdated by the time you read it, given how quickly theory and software are developing around this topic. At the time of writing, I am working on developing a package to do this.
Local Projections DiD
While there are, at the time of writing, no packages that handle stacked DiD in the way I have outlined, there is a closely related method: Dube et al. (2023). The general idea of this estimator is as follows. Let’s consider a simple 2x2 TWFE model. The equation would, of course, look like the following:
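Written with the time-effect symbol \(\tau_t\) used in the generalization below, the 2x2 TWFE equation is:

\[Y_{it} = \alpha_i + \tau_t + \delta^{DD} D_{it} + \epsilon_{it}\]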
Instead of estimating this equation, we can first do some algebra. We are going to subtract \(Y_{i,t-1}\) from \(Y_{i,t}\). This will yield the following:
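The differenced equation (note that the unit effect \(\alpha_i\) drops out entirely, which is the point):

\[Y_{i,t} - Y_{i,t-1} = \delta^{DD} \Delta D_{it} + \tau_{t} + \epsilon_{it}\]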
Estimating \(\delta^{DD}\) in this equation yields a parameter numerically equivalent to the simple TWFE estimate when there is exactly one pre-period and one post-period. In fact, this is the way Card and Krueger (1994) set up their data.
What if we have more pre- and post-periods than just one on both sides of treatment? If we want to include a total of \(H\) additional post-periods, we can adjust this formula to look like the following:
\[Y_{i,t+h} - Y_{i,t-1} = \delta_h^{DD}\Delta D_{it} + \tau_{t}^h + \epsilon_{it}^h\] where \(h \in \{0, 1, ..., H\}\). Of course, this also will work for \(h \in \{-H, ..., -3, -2\}\). We are unable to estimate a value for \(h = -1\) because this is our baseline, and all values would be \(0\).
So, what is this equation saying? \(\delta_{h}^{DD}\) is akin to an event-study estimate of the treatment at time \(t + h\), while \(\tau_{t}^h\) controls for the effect of treatment going into place at time \(t\). However, to avoid the wonky comparison problem, we are going to restrict our estimation sample for each \(h\). We are only going to use observations that meet one of the following criteria:
Newly treated: \(\Delta D_{it} = 1\) (treated units at time \(t\))
Clean control: \(D_{i,t+h} = 0\) (control units that have not been treated at \(t+h\))
This is a bit math-y, so let’s take a look at how to manually estimate this. Warning: estimation is, in fact, a non-trivial procedure. Moreover, even after we estimate the model, summarizing/plotting the results can become even more tedious.
Code
library("plm") # Use this package for leads and lags.

# We need to set rnd = TRUE to introduce some randomness.
# This is because of the small sample.
df <- create_data(TRUE, TRUE)
df <- pdata.frame(df, index = c("id", "tt"))

window <- -2:2
# Drop -1 because this is going to be mechanically set
# to zero like in an event study.
window <- window[window != -1]

coefz <- list()
for(time in window){
  # Figure out treated obs.
  treat <- which(df$D - lag(df$D) == 1)
  # Figure out control obs.
  ctrl <- which(lag(df$D, -time) == 0)
  lim <- unique(c(treat, ctrl))
  # Create outcome and treatment.
  df$y_lp <- lag(df$y, -time) - lag(df$y)
  df$D_lp <- df$D - lag(df$D)
  r <- feols(y_lp ~ D_lp | tt, df[lim,])$coeftable
  r$h <- time
  coefz[[length(coefz) + 1]] <- r
}
coefz <- do.call(rbind, coefz)
coefz
Again, while there is no stacking package, there is an LPDiD package available called lpdid. Unfortunately, this cannot be installed by install.packages() because the developer is lazy and did not want to be bothered with putting it on CRAN.11 To install this package, you will need to run the following code:
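Installation is from GitHub, something like the following (the repository path below is my assumption; check the package’s page before running it):

```r
# install.packages("remotes")  # once, if you don't already have it
remotes::install_github("alexCardazzi/lpdid")  # assumed repository path
library("lpdid")
```

Among other things, lpdid can: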
Find an equally weighted treatment effect (rather than perhaps odd OLS weights) using reweight = TRUE
Remove units that are treated soon after from the control group using composition_correction = TRUE
Estimate non-absorbing treatments using nonabsorbing = TRUE, setting nonabsorbing_lag (which tells how long after treatment a unit can be considered a control again) and nonabsorbing_treat_status.
Wait a minute – this package does exactly what a stacking package would do and more! Why can’t we just always use LPDiD? Unfortunately, lpdid, as written, can only handle panel data and not pooled cross sections. For example, this package works for data that contains one observation per state-year. However, it does not work if there are many observations per state-year, such as survey data. This is because the first-differencing is ambiguous in a pooled cross section environment.
Comparisons
Finally, let’s revisit that dataset where all the treatment effects were clearly positive but the TWFE estimate turned out negative. We can compare the results of 2SDiD, LPDiD, and TWFE. We did not discuss it, but Sun and Abraham (2021) is also very easy to implement in fixest using the sunab() function. This might also be something you’re interested in toying with once you’ve read up on it. First, let’s try all of these with a pooled effect.
Note: it’s not the case that static \(\rightarrow\) too small and dynamic \(\rightarrow\) too large. This is just an artifact of our simulated data. My point is that the coefficient estimate can be either too large or too small and it’s not necessarily clear which will happen.↩︎
Note: in real life, things are hardly ever as clear!↩︎
This is not to say that some parts cannot be negative. In real life, this will likely be the case. However, in this illustration, it’s clear that the negative is unintended.↩︎
Note: in these data, there is no never-treated group. Therefore, the decomposition only contains two groups: early vs late and late vs early.↩︎
This is an excellent blog post, and would be worth a read.↩︎
I am using \(Y^0\) to denote \(Y\) as untreated.↩︎
Other developers who host packages via GitHub are not necessarily lazy, but this one is.↩︎