Example with R: Hedonics

A linear hedonic price index takes only a few lines of R, using the lm() function for linear regression.

# Bring in some data
df <- read.csv("csv/data.csv")
df

##    period price chicken liver salmon
## 1       0     2       1     0      0
## 2       0     4       1     0      0
## 3       0     3       1     0      0
## 4       0     4       1     0      0
## 5       0     5       1     0      0
## 6       0     1       0     1      0
## 7       0     3       0     1      0
## 8       0     2       0     1      0
## 9       0     1       0     1      0
## 10      0     1       0     1      0
## 11      0     5       0     0      1
## 12      0     7       0     0      1
## 13      0     8       0     0      1
## 14      0     4       0     0      1
## 15      1     5       1     0      0
## 16      1     3       1     0      0
## 17      1     2       0     1      0
## 18      1     2       0     1      0
## 19      1     1       0     1      0
## 20      1     3       0     1      0
## 21      1     1       0     1      0
## 22      1     1       0     1      0
## 23      1     9       0     0      1
## 24      1     8       0     0      1
## 25      1     5       0     0      1

This is a simple data set of cat-food transactions over two periods, with three mutually exclusive characteristics: whether the product is chicken, liver, or salmon cat food. To see what adjusting for these characteristics does to the index, it helps to first calculate a simple transaction-price index that pools the transactions for all three types of cat food together.

# Calculate pooled price index
pooled <- lm(log(price) ~ period, df)
exp(coef(pooled)[2]) * 100

##   period 
## 93.86748
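
Because the pooled model contains only an intercept and a period dummy, the coefficient on period is the difference in mean log price between the two periods, so the index is just the ratio of unweighted geometric mean prices. The following is a minimal check of that, assuming the data frame is rebuilt inline from the printout above rather than read from csv/data.csv; the helper geo_mean() is a name introduced here for illustration.

```r
# Rebuild the period and price columns from the printout above
# (assumption: the original reads these from csv/data.csv)
df <- data.frame(
  period = rep(0:1, times = c(14, 11)),
  price  = c(2, 4, 3, 4, 5, 1, 3, 2, 1, 1, 5, 7, 8, 4,
             5, 3, 2, 2, 1, 3, 1, 1, 9, 8, 5)
)

# Regression-based pooled index, as in the text
pooled <- lm(log(price) ~ period, df)
pooled_index <- unname(exp(coef(pooled)["period"])) * 100

# Ratio of geometric mean prices, computed directly
# (geo_mean is a hypothetical helper, not part of any package)
geo_mean <- function(x) exp(mean(log(x)))
ratio_index <- with(df, geo_mean(price[period == 1]) /
                        geo_mean(price[period == 0])) * 100

# The two calculations should agree
all.equal(pooled_index, ratio_index)
```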

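The general linear hedonic imputation index can be calculated with a single linear regression. The only trick is to subtract the sample average of each characteristic in the interaction terms, so that the coefficient on period measures the average price change over all transactions rather than the price change for the omitted type (chicken).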

# Calculate hedonic imputation index
hedonic_imputation <- lm(log(price) ~ period + liver + salmon + 
                           period:(I(liver - mean(liver)) + I(salmon - mean(salmon))), 
                         df)
exp(coef(hedonic_imputation)[2]) * 100

##   period 
## 112.2817
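
To see what the centering does, it may help to fit the same model without it. The sketch below assumes the data are rebuilt inline from the printout above (the type vector is a helper introduced here). Without centering, the coefficient on period should be the log price change for chicken alone, and adding back the interaction coefficients evaluated at the sample means should recover the centered result.

```r
# Rebuild the data from the printout above (assumption: stands in for csv/data.csv)
type <- rep(c("chicken", "liver", "salmon", "chicken", "liver", "salmon"),
            times = c(5, 5, 4, 2, 6, 3))
df <- data.frame(
  period = rep(0:1, times = c(14, 11)),
  price  = c(2, 4, 3, 4, 5, 1, 3, 2, 1, 1, 5, 7, 8, 4,
             5, 3, 2, 2, 1, 3, 1, 1, 9, 8, 5),
  liver  = as.numeric(type == "liver"),
  salmon = as.numeric(type == "salmon")
)

# Uncentered interactions: `period` is the log price change for chicken only
uncentered <- lm(log(price) ~ period * (liver + salmon), df)
b <- coef(uncentered)
uncentered_index <- unname(exp(b["period"])) * 100

# Geometric-mean price relative for chicken, computed directly
chicken <- df[df$liver == 0 & df$salmon == 0, ]
chicken_index <- with(chicken, exp(mean(log(price[period == 1])) -
                                   mean(log(price[period == 0])))) * 100

# Centered interactions, as in the text
centered <- lm(log(price) ~ period + liver + salmon +
                 period:(I(liver - mean(liver)) + I(salmon - mean(salmon))),
               df)
centered_index <- unname(exp(coef(centered)["period"])) * 100

# Evaluating the uncentered model at the sample means gives the same index
recentered_index <- unname(exp(b["period"] +
                               mean(df$liver) * b["period:liver"] +
                               mean(df$salmon) * b["period:salmon"])) * 100

all.equal(uncentered_index, chicken_index)
all.equal(centered_index, recentered_index)
```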

This is the same as the two-step calculation that is normally done for a hedonic imputation index.

# Period-0 regression
hi0 <- lm(log(price) ~ liver + salmon, df, subset = period == 0)

# Period-1 regression
hi1 <- lm(log(price) ~ liver + salmon, df, subset = period == 1)

# Calculate index using predicted prices
exp(mean(predict(hi1, df) - predict(hi0, df))) * 100

## [1] 112.2817
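
The agreement between the one-step and two-step calculations can be sketched as follows. The notation is introduced here and is not from the original: $t_i$ is the period dummy, $x_i$ is the vector of characteristic dummies (liver, salmon), $\bar{x}$ is its average over all $n$ transactions, and $\widehat{\log p}_i^{\,s}$ is the predicted log price of transaction $i$ from the period-$s$ regression.

```latex
\begin{align*}
\log p_i
  &= \alpha + \delta t_i + \beta^\top x_i
     + t_i \, \gamma^\top (x_i - \bar{x}) + \varepsilon_i \\
\widehat{\log p}_i^{\,1} - \widehat{\log p}_i^{\,0}
  &= \hat{\delta} + \hat{\gamma}^\top (x_i - \bar{x}) \\
\frac{1}{n} \sum_{i=1}^{n}
  \left( \widehat{\log p}_i^{\,1} - \widehat{\log p}_i^{\,0} \right)
  &= \hat{\delta}
     + \hat{\gamma}^\top \left( \frac{1}{n} \sum_{i=1}^{n} x_i - \bar{x} \right)
   = \hat{\delta}
\end{align*}
```

Since every coefficient in the pooled model is allowed to differ by period, its fitted values match those of the two separate period regressions, so the average predicted log price change over all transactions is the coefficient on period.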

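Because each transaction belongs to exactly one type of cat food, it is also the same as manually calculating a stratified index, in which a geometric-mean price relative for each type is aggregated using the number of transactions of that type as the weight.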

# Bring in gpindex library
library(gpindex)

# Weight cat food by its frequency
weights <- colSums(df[-(1:2)])

# Calculate an index for each type of cat food
strata_indices <- 
  sapply(list(subset(df, chicken == 1), 
              subset(df, liver == 1), 
              subset(df, salmon == 1)),
         function(x) with(x, geometric_mean(price[period == 1]) / geometric_mean(price[period == 0]))
         )

# Aggregate
geometric_mean(strata_indices, weights) * 100

## [1] 112.2817
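
For readers without gpindex installed, the same stratified calculation can be sketched in base R by working with mean log prices. This assumes the data are rebuilt inline from the printout above, with a type column introduced here in place of the three dummy columns.

```r
# Rebuild the data from the printout above, with one `type` column
# (assumption: stands in for the chicken/liver/salmon dummies in csv/data.csv)
df <- data.frame(
  period = rep(0:1, times = c(14, 11)),
  price  = c(2, 4, 3, 4, 5, 1, 3, 2, 1, 1, 5, 7, 8, 4,
             5, 3, 2, 2, 1, 3, 1, 1, 9, 8, 5),
  type   = rep(c("chicken", "liver", "salmon", "chicken", "liver", "salmon"),
               times = c(5, 5, 4, 2, 6, 3))
)

# Mean log price in each (type, period) cell; rows are types, columns are periods
cell_means <- with(df, tapply(log(price), list(type, period), mean))

# Log of the geometric-mean price relative for each type of cat food
log_relatives <- cell_means[, "1"] - cell_means[, "0"]

# Weight each type by its number of transactions across both periods
type_counts <- as.numeric(table(df$type)[names(log_relatives)])

# Aggregate with a weighted geometric mean
stratified_index <- exp(weighted.mean(log_relatives, type_counts)) * 100
stratified_index
```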

The result is very similar if a time-dummy model, which holds the coefficients on the characteristics fixed across periods, is used instead.

# Time-dummy index
time_dummy <- lm(log(price) ~ period + liver + salmon, df)
exp(coef(time_dummy)[2]) * 100

##   period 
## 112.2246
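
The small difference from the imputation index comes from how the strata are weighted. With mutually exclusive dummies, the time-dummy coefficient is a weighted average of the log price relatives for each type, with weights proportional to n0 * n1 / (n0 + n1), where n0 and n1 are the number of transactions of that type in each period (a standard result for a regression on a dummy plus group effects). A sketch that checks this on the data above, rebuilt inline with a helper type column introduced here:

```r
# Rebuild the data from the printout above (assumption: stands in for csv/data.csv)
type <- rep(c("chicken", "liver", "salmon", "chicken", "liver", "salmon"),
            times = c(5, 5, 4, 2, 6, 3))
df <- data.frame(
  period = rep(0:1, times = c(14, 11)),
  price  = c(2, 4, 3, 4, 5, 1, 3, 2, 1, 1, 5, 7, 8, 4,
             5, 3, 2, 2, 1, 3, 1, 1, 9, 8, 5),
  type   = type,
  liver  = as.numeric(type == "liver"),
  salmon = as.numeric(type == "salmon")
)

# Time-dummy index, as in the text
time_dummy <- lm(log(price) ~ period + liver + salmon, df)
td_index <- unname(exp(coef(time_dummy)["period"])) * 100

# Log price relative for each type of cat food
cell_means <- with(df, tapply(log(price), list(type, period), mean))
log_relatives <- cell_means[, "1"] - cell_means[, "0"]

# Implicit time-dummy weights: n0 * n1 / (n0 + n1) for each type
n <- with(df, table(type, period))
td_weights <- as.numeric(n[, "0"] * n[, "1"] / (n[, "0"] + n[, "1"]))

# Stratified index using the implicit weights
implied_index <- exp(weighted.mean(log_relatives, td_weights)) * 100

all.equal(td_index, implied_index)
```

The imputation index instead weights each type by n0 + n1, which is why the two indexes are close but not identical here.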

Transaction-level weights can be added with a weighted regression, replacing each mean in the interaction terms with the corresponding weighted mean.

# Make some weights
df$weights <- 1:25 / sum(1:25)

# Calculate weighted hedonic imputation index
hedonic_imputation <- lm(log(price) ~ period + liver + salmon + 
                           period:(I(liver - weighted.mean(liver, weights)) + 
                                   I(salmon - weighted.mean(salmon, weights))), 
                         df, weights = weights)
exp(coef(hedonic_imputation)[2]) * 100

##   period 
## 111.1855

This gives the same answer as the two-step calculation, provided the predicted price changes are averaged using the same weights.

# Period-0 regression
hi0 <- lm(log(price) ~ liver + salmon, df, subset = period == 0, weights = weights)

# Period-1 regression
hi1 <- lm(log(price) ~ liver + salmon, df, subset = period == 1, weights = weights)

# Calculate index using weighted predicted prices
exp(weighted.mean(predict(hi1, df) - predict(hi0, df), df$weights)) * 100

## [1] 111.1855
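
As in the unweighted case, the weighted index should also match a stratified calculation: a weighted geometric-mean price relative for each type, aggregated using the total weight of that type across both periods. The sketch below checks this against the one-step weighted regression, assuming the data are rebuilt inline from the printout above with a helper type column introduced here.

```r
# Rebuild the data from the printout above (assumption: stands in for csv/data.csv)
type <- rep(c("chicken", "liver", "salmon", "chicken", "liver", "salmon"),
            times = c(5, 5, 4, 2, 6, 3))
df <- data.frame(
  period = rep(0:1, times = c(14, 11)),
  price  = c(2, 4, 3, 4, 5, 1, 3, 2, 1, 1, 5, 7, 8, 4,
             5, 3, 2, 2, 1, 3, 1, 1, 9, 8, 5),
  type   = type,
  liver  = as.numeric(type == "liver"),
  salmon = as.numeric(type == "salmon")
)
df$weights <- 1:25 / sum(1:25)

# One-step weighted hedonic imputation index, as in the text
fit <- lm(log(price) ~ period + liver + salmon +
            period:(I(liver - weighted.mean(liver, weights)) +
                    I(salmon - weighted.mean(salmon, weights))),
          df, weights = weights)
regression_index <- unname(exp(coef(fit)["period"])) * 100

# Weighted log price relative for each type of cat food
types <- unique(df$type)
log_relatives <- sapply(types, function(k) {
  in0 <- df$type == k & df$period == 0
  in1 <- df$type == k & df$period == 1
  weighted.mean(log(df$price[in1]), df$weights[in1]) -
    weighted.mean(log(df$price[in0]), df$weights[in0])
})

# Total weight of each type across both periods
stratum_weights <- sapply(types, function(k) sum(df$weights[df$type == k]))

# Aggregate with a weighted geometric mean
stratified_index <- exp(weighted.mean(log_relatives, stratum_weights)) * 100

all.equal(regression_index, stratified_index)
```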