While we love pretending to be guidance counselors, we really love pretending to be real estate agents. For (maybe?) the last time, let’s read in our Ames housing data (data; documentation). We are going to keep all of the variables we used last time plus one more: BsmtFin.Type.1.
Definition/Details of BsmtFin.Type.1
BsmtFin Type 1 (Ordinal): Rating of basement finished area
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
Unf Unfinished
NA No Basement
Let’s take a peek at the bsmnt variable to see what it contains. To do this, we are going to use table(), but add in useNA = "always" just in case there are any missing data elements in the column.
According to this tabulation, there appear to be 80 observations with a blank input, and 0 observations recorded as NA. According to this resource, NA values (here, they are blank) indicate no basement. Therefore, we can convert these blank values to say "No Basement".
Code
ames$bsmnt <- ifelse(ames$bsmnt == "", "No Basement", ames$bsmnt)
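The tabulate-then-recode step can be sketched on a small made-up vector. The values in `toy` are invented for illustration and are not the Ames data, so the counts will not match the ones quoted above.

```r
# Toy stand-in for ames$bsmnt; the values are made up for illustration.
toy <- c("GLQ", "Unf", "", "ALQ", "", "Unf")

# useNA = "always" adds an <NA> column even when its count is zero.
table(toy, useNA = "always")

# Recode blank strings to an explicit label, leaving other values alone.
toy <- ifelse(toy == "", "No Basement", toy)
table(toy, useNA = "always")
```

Note that this only works because the missing entries are blank strings. If the column held true NA values, `toy == ""` would evaluate to NA for them and the recode would need `is.na()` instead.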
Housing Price Models
If we were real estate agents, it would be really nice to have some way to quantify this basement variable so we could use it in our model. To start, let’s just create a dummy variable equal to 1 if the property has a basement and a zero otherwise.
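One way such a dummy could be built is sketched below. The `bsmnt` vector here is a made-up stand-in for the recoded column, and `has_bsmnt` is the name the rest of this section uses for the dummy.

```r
# Toy stand-in for the recoded basement column (made-up values).
bsmnt <- c("GLQ", "No Basement", "Unf", "ALQ", "No Basement")

# 1 if the property has any kind of basement, 0 otherwise.
has_bsmnt <- ifelse(bsmnt == "No Basement", 0, 1)
has_bsmnt
```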
Just as we controlled for gender differences, where gender is a categorical variable, we can include this variable in our housing price model. For simplicity, we are going to just use square footage as our other explanatory variable. Students are encouraged to explore the inclusion of other variables by themselves.
Estimating our model:
This model suggests that a 1% increase in square footage increases expected sale price by 0.89%, controlling for whether the property has a basement. Second, this model says that having a basement (controlling for square footage) increases price by 38.8% relative to no basement.
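Reading a dummy coefficient directly as a percent change is an approximation when the outcome is in logs. Assuming the model regresses log price on log square footage and the dummy (the specification is not shown here, so this is an assumption), and assuming the 38.8% figure comes from a coefficient of 0.388, the exact implied change is \(100 \times (e^{\beta} - 1)\). A small sketch of the comparison:

```r
# Coefficient on has_bsmnt as reported in the text (assumed to be in log points).
b <- 0.388

approx_pct <- 100 * b              # the usual shortcut reading
exact_pct  <- 100 * (exp(b) - 1)   # exact change implied by a log outcome

round(c(approx = approx_pct, exact = exact_pct), 1)
```

The shortcut is close for small coefficients and drifts as they grow, which is worth keeping in mind for dummies of this size.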
Let’s think about this for a second. A home that is 2,000 square feet including a basement is worth more than a 2,000 square foot home without a basement? Put differently, a 1,600 square foot home with a 400 square foot basement is worth more than a 2,000 square foot home?
If this sounds fishy to you, check out the definition of the area column.
It seems that area only measures the non-basement square footage. Therefore, this coefficient represents a composite effect of both adding a basement as well as adding the square footage that comes with the basement.
In other words, this regression is really saying that a 2,000 square foot home with a 400 square foot basement (for a total of 2,400 ft²) is worth more than a 2,000 square foot home without a basement.
This should sound awfully similar to the omitted variable bias we had in the bedroom situation! However, let’s turn a blind eye to this for now, and address it on the homework.
Let’s return to the original model that said adding a basement increases prices by 38.8%. This is helpful, but of course there are many different “ratings” of basement quality reported in the bsmnt variable that we are ignoring. The lowest quality is “Unf”, or unfinished. Let’s make a dummy variable for this type of basement.
We find that having a basement, relative to no basement, increases the expected price by 43.6%. Now, what if that basement was unfinished? The coefficient suggests -16.8%, but this is not the end of the story. Notice that has_bsmnt is always equal to one whenever unfinished is equal to one. Therefore, the overall effect is 43.6% - 16.8% = 26.8%. In other words, an unfinished basement is worth more than no basement, but a finished basement is worth more than an unfinished basement (\(0 < 26.8 < 43.6\)).
This interpretation is a bit clunky because it requires us to sum two coefficients to get our total estimate. To make the interpretation easier, we can redefine has_bsmnt to exclude unfinished basements:
This interpretation is much easier. An unfinished basement, relative to no basement, is worth a 26.8% increase. A finished basement, relative to no basement, is worth a 43.6% increase.
If we wanted to estimate the effect of some other basement quality level, we would have to again redefine has_bsmnt to exclude that basement type, and create a new dummy variable for it.
If we take this logic to the finish line, we’d end up with \(K-1\) dummy variables, where \(K\) is the number of possible categories. We subtract one because one group must be the reference group (no basement in this case).
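The equivalence between the two ways of coding the dummies can be checked on simulated data. Everything below is made up for illustration: the data, the coefficients used to generate it, and the name `finished`, which stands in for the redefined has_bsmnt so that both versions can sit side by side.

```r
set.seed(1)

# Simulated data; names and numbers are made up for illustration.
n <- 300
type <- sample(c("No Basement", "Unf", "GLQ"), n, replace = TRUE)
log_area <- rnorm(n, mean = 7.2, sd = 0.3)

has_bsmnt  <- as.numeric(type != "No Basement")
unfinished <- as.numeric(type == "Unf")
log_price  <- 5 + 0.9 * log_area + 0.45 * has_bsmnt - 0.15 * unfinished +
  rnorm(n, sd = 0.1)

# Version 1: has_bsmnt includes unfinished basements.
m1 <- lm(log_price ~ log_area + has_bsmnt + unfinished)

# Version 2: the basement dummy excludes unfinished basements.
finished <- as.numeric(has_bsmnt == 1 & unfinished == 0)
m2 <- lm(log_price ~ log_area + finished + unfinished)

# The sum of the two coefficients in version 1 ...
sum_v1 <- unname(coef(m1)["has_bsmnt"] + coef(m1)["unfinished"])
# ... matches the single unfinished coefficient in version 2.
unf_v2 <- unname(coef(m2)["unfinished"])

c(sum_v1 = sum_v1, unf_v2 = unf_v2)
```

Both versions span the same set of regressors, so they fit the data identically. Only the labeling of the coefficients changes.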
I’m sure you’ve heard this one before: work smarter, not harder. In programming, they say: A good programmer is a lazy programmer. While one of these is earnest and the other sarcastic, they mean similar things. In our case, making a bunch of dummy variables can be a lot of work! Luckily, R has a lazy (er, efficient) solution for us.
To create many dummies at once, we can use the factor() command. Moreover, we can even select which factor becomes our reference group! To do this in R, we would do the following:
Code
# I am adding a "_" to differentiate this from the original variable.
ames$bsmnt_ <- as.factor(ames$bsmnt)
head(ames$bsmnt_)
cat("\n")
# Make "No Basement" the reference group.
ames$bsmnt_ <- relevel(ames$bsmnt_, ref = "No Basement")
head(ames$bsmnt_)
Then, if we throw this into a regression, R will create all of the dummies for us. I will also include bedrooms and age into this model for completeness.
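To see which dummies R would build from a factor, `model.matrix()` can be called on a one-sided formula. The sketch below uses a short made-up vector in place of ames$bsmnt; `f` and `X` are illustrative names.

```r
# Made-up values standing in for ames$bsmnt.
x <- c("GLQ", "No Basement", "Unf", "ALQ", "GLQ")
f <- relevel(as.factor(x), ref = "No Basement")

# model.matrix() shows the columns a formula such as y ~ f would use:
# an intercept plus one 0/1 column per non-reference level.
X <- model.matrix(~ f)
colnames(X)
X
```

The reference level gets no column of its own. Its effect is absorbed into the intercept, which is why each remaining coefficient reads as a comparison against that group.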
The model spits out a bunch of numbers for bsmnt_, but it’s important to remember that they are all interpreted as having that “type” of basement relative to not having a basement, controlling for the property’s age, size and number of bedrooms. Moreover, we can plot the coefficients and show that they’re generally increasing in the order that we’d expect.
Plot
Basement Rating Reminder
BsmtFin Type 1 (Ordinal): Rating of basement finished area
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
Unf Unfinished
NA No Basement
It is important to point out the assumptions of this model. We are assuming that changing the square footage (or age or bedrooms) of a home impacts sale price independently of the type of basement. Of course, we could estimate a fully interacted model, but the output would be huge and uninterpretable.
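A rough sense of how quickly an interacted model grows can be had from simulated data. The sketch below is not the model from this section: the data are made up, only four basement categories are used, and only one regressor is interacted.

```r
set.seed(2)

# Simulated data; names and values are made up for illustration.
n <- 400
bsmnt_ <- factor(sample(c("No Basement", "Unf", "Rec", "GLQ"), n, replace = TRUE))
bsmnt_ <- relevel(bsmnt_, ref = "No Basement")
log_area  <- rnorm(n, mean = 7.2, sd = 0.3)
log_price <- 5 + 0.9 * log_area + rnorm(n, sd = 0.1)

# Additive model: one slope on log_area shared by every basement type.
additive <- lm(log_price ~ log_area + bsmnt_)

# Interacted model: the slope on log_area may differ by basement type.
interacted <- lm(log_price ~ log_area * bsmnt_)

length(coef(additive))    # intercept + slope + (K - 1) dummies
length(coef(interacted))  # adds (K - 1) slope shifts on top of that
```

With seven basement categories and three other regressors interacted with all of them, the same logic would give \(4 \times 7 = 28\) coefficients, which is the sense in which the output becomes hard to read.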