Linear hedonic indices

The canonical form for the hedonic model \(h\) is a linear function of a good’s characteristics, so that \(h(X, 0) = \alpha_{0} + X\beta_{0}\) and \(h(X, 1) = \alpha_{1} + X\beta_{1}\), and thus \[\begin{align*} E(\rho | X, t) = \alpha_{0} + t(\alpha_{1} - \alpha_{0}) + X\beta_{0} + t \cdot X (\beta_{1} - \beta_{0}). \end{align*}\] With a linear form for the hedonic model, transaction prices are expressed as a linear regression on the characteristics of a good: the parameters in a hedonic model are just population regression coefficients. The parameter vector \(\beta_{t}\) is usually interpreted as a vector of implicit prices for the non-market characteristics of a good. This interpretation stems from a structural view of a hedonic model; here the goal is simply to specify a parametric model for the conditional price function, \(E(\rho | X, t)\), so \(\beta_{t}\) need not be given any special interpretation.[36]
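As a minimal sketch of this parametrization, assuming a single characteristic \(x\), hypothetical data, and ordinary least squares in each period as the sample analogue of the population regressions, the Python below estimates \((\alpha_{t}, \beta_{t})\) period by period and checks that the pooled, interacted form of \(E(\rho | X, t)\) reproduces each period's fitted line. The names `ols_line` and `conditional_mean` are illustrative, not taken from any library.

```python
from statistics import fmean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def conditional_mean(x, t, alpha0, beta0, alpha1, beta1):
    """E(rho | X = x, t) written in the pooled, interacted form of the linear model."""
    return alpha0 + t * (alpha1 - alpha0) + x * beta0 + t * x * (beta1 - beta0)


# Hypothetical sample: characteristic x and log price rho for goods sold in each period.
x0, rho0 = [1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.3]
x1, rho1 = [2.0, 3.0, 4.0, 5.0, 6.0], [1.7, 2.2, 2.6, 3.3, 3.6]

# A separate regression in each period estimates (alpha_t, beta_t).
alpha0, beta0 = ols_line(x0, rho0)
alpha1, beta1 = ols_line(x1, rho1)

# At t = 0 the interacted form is the period-0 line; at t = 1 it is the period-1 line.
for x in (2.0, 4.0):
    print(x, conditional_mean(x, 0, alpha0, beta0, alpha1, beta1),
          conditional_mean(x, 1, alpha0, beta0, alpha1, beta1))
```

For these made-up data the estimates are approximately \(\alpha_{0} = 0.55\), \(\beta_{0} = 0.46\), \(\alpha_{1} = 0.72\) and \(\beta_{1} = 0.49\).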

With a linear hedonic model and conditional independence between potential prices and time, \[\begin{align*} \log(I^{Q}) = E(h(X, 1) - h(X, 0)) = \alpha_{1} - \alpha_{0} + E(X)(\beta_{1} - \beta_{0}). \end{align*}\] Specifying a model for \(E(\rho | X, t)\) gives the constant-quality price index a simple parametric form, which can be used to relax the overlap condition.[37] Nothing precludes using a more complex non-linear model, but in practice the hedonic model is usually linear.[38]
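A minimal sketch of the sample analogue of this formula, assuming one characteristic, hypothetical data, and per-period least-squares fits, with \(E(X)\) estimated by the mean of \(x\) over the pooled sample from both periods. The function name `linear_hedonic_log_index` is illustrative.

```python
from statistics import fmean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def linear_hedonic_log_index(x0, rho0, x1, rho1):
    """Sample analogue of log(I^Q) = alpha1 - alpha0 + E(X)(beta1 - beta0)."""
    alpha0, beta0 = ols_line(x0, rho0)
    alpha1, beta1 = ols_line(x1, rho1)
    x_bar = fmean(x0 + x1)  # E(X), estimated over goods sold in either period
    return alpha1 - alpha0 + x_bar * (beta1 - beta0)


# Hypothetical sample: characteristic x and log price rho in each period.
x0, rho0 = [1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.3]
x1, rho1 = [2.0, 3.0, 4.0, 5.0, 6.0], [1.7, 2.2, 2.6, 3.3, 3.6]

log_index = linear_hedonic_log_index(x0, rho0, x1, rho1)
print(log_index)
```

Here \(\hat{\alpha}_{1} - \hat{\alpha}_{0} = 0.17\), \(\hat{\beta}_{1} - \hat{\beta}_{0} = 0.03\) and the pooled mean of \(x\) is \(10/3\), so the log index is about 0.27. When the two slopes are equal, the index reduces to the intercept shift whatever the distribution of \(x\).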

This type of hedonic index is sometimes called a hedonic imputation index, because it is the geometric index formed by using the linear regressions of (log) price on characteristics at the two points in time to predict prices for the same distribution of characteristics (that of \(X\) pooled over both periods), and then taking the ratio of the geometric means of the predicted prices, or equivalently exponentiating the difference between the average predicted log prices. As mentioned in the previous section, however, all hedonic price indices have this form, whether or not average price is modeled as a linear function of characteristics. Consequently, the label “hedonic imputation” is not very useful, and it leads to a variety of nominally “different” hedonic price indices that are really all the same thing.
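The imputation reading can be checked with a short sketch, assuming one characteristic, hypothetical data, and per-period least-squares fits: log prices are imputed for every good in the pooled sample under each period's fitted line, and the ratio of the geometric means of the imputed prices is compared with the difference of the average imputed log prices. `imputed_log_prices` is an illustrative name.

```python
from math import exp, log
from statistics import fmean, geometric_mean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def imputed_log_prices(x0, rho0, x1, rho1):
    """Predicted log prices for every good in the pooled sample under each period's fit."""
    alpha0, beta0 = ols_line(x0, rho0)
    alpha1, beta1 = ols_line(x1, rho1)
    pooled = x0 + x1
    pred0 = [alpha0 + x * beta0 for x in pooled]  # imputed with period-0 coefficients
    pred1 = [alpha1 + x * beta1 for x in pooled]  # imputed with period-1 coefficients
    return pred0, pred1


# Hypothetical sample: characteristic x and log price rho in each period.
x0, rho0 = [1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.3]
x1, rho1 = [2.0, 3.0, 4.0, 5.0, 6.0], [1.7, 2.2, 2.6, 3.3, 3.6]

pred0, pred1 = imputed_log_prices(x0, rho0, x1, rho1)
# Ratio of the geometric means of the imputed prices, in levels.
index = geometric_mean([exp(p) for p in pred1]) / geometric_mean([exp(p) for p in pred0])
# Difference of the average imputed log prices.
log_index = fmean(pred1) - fmean(pred0)
print(log(index), log_index)
```

Both routes give the same number (a log index of about 0.27 for these data), which is the formula for \(\log(I^{Q})\) evaluated at the pooled mean of \(x\).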

One advantage of specifying a linear model for \(E(\rho | X, t)\) is that the resulting index has a very intuitive form:[39] \[\begin{align*} \log(I^{Q}) =& \underbrace{E(\rho | t = 1) - E(\rho | t = 0)}_{\text{transaction-price index}}\\ &+ \underbrace{[E(X | t = 0) - E(X | t = 1)][\beta_{0} P(t = 1) + \beta_{1} P(t = 0)]}_{\text{correction term}}. \end{align*}\] Provided that conditional independence holds and that average transaction price is a linear function of the characteristics of the goods sold, the constant-quality index can be decomposed into a geometric transaction-price index and a correction term that captures the change in the composition of product characteristics over time. The correction term takes the average change in characteristics over time and uses it to adjust the transaction-price index. The term \(\beta_{0} P(t = 1) + \beta_{1} P(t = 0)\), a weighted average of the implicit prices in the two periods, governs whether having more of a characteristic on average raises prices or not. For example, if products in period 1 have, on average, more of characteristics that are positively associated with price, then the correction term is negative and pulls the transaction-price index down to arrive at a constant-quality index. Part of the rise in the transaction-price index reflects the fact that the goods selling in period 1 are of higher quality than those selling in period 0, and would have sold for more in period 0 than the goods that actually sold in period 0. Consequently, the transaction-price index overstates the pure change in price, hence the negative correction term.
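The decomposition also holds exactly for sample analogues, because a least-squares line with an intercept passes through the period means of \(x\) and \(\rho\). A minimal sketch, assuming one characteristic, hypothetical data, and \(P(t = 1)\) estimated by the share of pooled observations that come from period 1 (`decompose` is an illustrative name):

```python
from statistics import fmean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def decompose(x0, rho0, x1, rho1):
    """Return (transaction-price index, correction term, linear hedonic log index)."""
    alpha0, beta0 = ols_line(x0, rho0)
    alpha1, beta1 = ols_line(x1, rho1)
    p1 = len(x1) / (len(x0) + len(x1))  # estimate of P(t = 1)
    p0 = 1.0 - p1                        # estimate of P(t = 0)
    transaction = fmean(rho1) - fmean(rho0)
    correction = (fmean(x0) - fmean(x1)) * (beta0 * p1 + beta1 * p0)
    direct = alpha1 - alpha0 + fmean(x0 + x1) * (beta1 - beta0)
    return transaction, correction, direct


# Hypothetical sample: characteristic x and log price rho in each period.
x0, rho0 = [1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.3]
x1, rho1 = [2.0, 3.0, 4.0, 5.0, 6.0], [1.7, 2.2, 2.6, 3.3, 3.6]

transaction, correction, direct = decompose(x0, rho0, x1, rho1)
print(transaction, correction, transaction + correction, direct)
```

For these data the transaction-price index is 0.98. Period-1 goods have more of \(x\) and both fitted slopes are positive, so the correction term is about −0.71, and the two sum to the constant-quality log index of about 0.27.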

Calculating a linear hedonic price index is easy, since \[\begin{align*} E(\rho | X, t) &= \alpha_{0} + t (\alpha_{1} - \alpha_{0}) + X \beta_{0} + t \cdot X (\beta_{1} - \beta_{0}) \\ &= \alpha_{0} + t (\alpha_{1} - \alpha_{0} + E(X)(\beta_{1} - \beta_{0})) + X \beta_{0} + t (X - E(X)) (\beta_{1} - \beta_{0}) \\ &= \alpha_{0} + t \log(I^{Q}) + X \beta_{0} + t (X - E(X)) (\beta_{1} - \beta_{0}). \end{align*}\] The hedonic index is then just the coefficient on the time dummy \(t\) in a linear regression of \(\rho\) on an intercept, \(t\), \(X\), and the interactions \(t (X - E(X))\). The centering matters: if \(t\) were interacted with \(X\) itself, the coefficient on \(t\) would be \(\alpha_{1} - \alpha_{0}\) rather than \(\log(I^{Q})\).
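A sketch of the regression route, assuming one characteristic and hypothetical data. It builds the regressors \(1\), \(t\), \(x\) and \(t(x - \bar{x})\), with the pooled sample mean \(\bar{x}\) standing in for \(E(X)\), solves the least-squares normal equations with a small hand-written Gaussian elimination (used only to keep the sketch dependency-free; a linear-algebra library would normally do this), and reads the index off the coefficient on \(t\). All helper names are illustrative.

```python
from statistics import fmean  # Python 3.8+


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_time_dummy(x0, rho0, x1, rho1):
    """Regress rho on 1, t, x and t * (x - pooled mean of x); coefficient 1 is on t."""
    x_bar = fmean(x0 + x1)
    rows = [[1.0, 0.0, x, 0.0] for x in x0] + [[1.0, 1.0, x, x - x_bar] for x in x1]
    return ols(rows, rho0 + rho1)


# Hypothetical sample: characteristic x and log price rho in each period.
x0, rho0 = [1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.3]
x1, rho1 = [2.0, 3.0, 4.0, 5.0, 6.0], [1.7, 2.2, 2.6, 3.3, 3.6]

coef = fit_time_dummy(x0, rho0, x1, rho1)
print(coef[1])  # coefficient on the time dummy t
```

Because the regression is fully interacted, it reproduces the separate period-by-period fits: the coefficient on \(t\) is about 0.27, equal to \(\hat{\alpha}_{1} - \hat{\alpha}_{0} + \bar{x}(\hat{\beta}_{1} - \hat{\beta}_{0})\), and the other coefficients are about 0.55, 0.46 and 0.03, that is \(\hat{\alpha}_{0}\), \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1} - \hat{\beta}_{0}\). Interacting \(t\) with the uncentered \(x\) would instead put \(\hat{\alpha}_{1} - \hat{\alpha}_{0}\) (0.17 here) on \(t\).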

It is worth concluding this section by noting that in almost all cases the linearity of average transaction price in the characteristics is an assumption. There is, however, one interesting case where this assumption is always correct. If goods are partitioned by their characteristics, so that each value \(x\) of \(X\) defines its own group and gets its own parameter in the hedonic model at each point in time (a saturated model with an indicator for every group), then \(E(\rho | X, t)\) is necessarily linear. In this case the hedonic price index is simply a stratified price index. In this sense the linear hedonic price index is a more general type of price index than the stratified index, and it is most useful when the stratified index cannot be calculated because overlap fails.
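The simplest instance is a single 0/1 characteristic \(d\), whose two values are the two strata. A sketch, assuming hypothetical data and a stratified index that weights each stratum's change in mean log price by that stratum's share of the pooled sample, shows the linear hedonic index with \(d\) as the only regressor reproducing the stratified index exactly: a regression on an intercept and one dummy returns the two group means. Function names are illustrative.

```python
from statistics import fmean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def hedonic_log_index(d0, rho0, d1, rho1):
    """Linear hedonic log index with a single 0/1 characteristic d."""
    alpha0, beta0 = ols_line(d0, rho0)
    alpha1, beta1 = ols_line(d1, rho1)
    return alpha1 - alpha0 + fmean(d0 + d1) * (beta1 - beta0)


def stratified_log_index(d0, rho0, d1, rho1):
    """Pooled-share-weighted average of within-stratum changes in mean log price."""
    pooled = d0 + d1
    total = 0.0
    for g in (0, 1):
        share = pooled.count(g) / len(pooled)  # share of stratum g in the pooled sample
        mean1 = fmean([r for d, r in zip(d1, rho1) if d == g])
        mean0 = fmean([r for d, r in zip(d0, rho0) if d == g])
        total += share * (mean1 - mean0)
    return total


# Hypothetical sample: 0/1 characteristic d and log price rho in each period.
d0, rho0 = [0, 0, 1, 0, 1], [1.0, 1.2, 2.0, 0.8, 2.2]
d1, rho1 = [0, 1, 1, 1], [1.3, 2.4, 2.7, 2.4]

print(hedonic_log_index(d0, rho0, d1, rho1), stratified_log_index(d0, rho0, d1, rho1))
```

Here mean log price rises by 0.3 in stratum 0 and by 0.4 in stratum 1, with pooled shares 4/9 and 5/9, so both functions give \(3.2/9 \approx 0.356\).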


  36. There are some theoretical issues with specifying a linear hedonic model: see Hausman (2003) for an example when a CPI is supposed to measure changes in the cost of living, and Rosen (1974), which is usually taken as giving the conceptual foundation for the hedonic approach.

  37. Overlap is not gone entirely: goods still need to sell in both periods and, at each point in time, no characteristic in \(X\) can be a linear combination of the others. It is, however, not as onerous as with a stratified index.

  38. It is easy to show that the coefficients from the linear regression of \(y\) on \(X\) solve the problem \(\min_{b \in \mathbb{R}^{k}} E[(E(y | X) - Xb)^{2}]\). That is, a linear regression gives the minimum mean-square error linear approximation to the conditional expectation function. Hence, even if average price is a non-linear function of characteristics, the linear hedonic model may still give a good approximation.

  39. To show this directly, note that \(E(\rho | t) = E(E(\rho | X, t) | t) = E(\alpha_{t} + X\beta_{t} | t) = \alpha_{t} + E(X | t) \beta_{t}\). Substituting this into \(\log(I^{Q}) = \alpha_{1} - \alpha_{0} + E(X)(\beta_{1} - \beta_{0})\), with \(E(X) = E(X | t = 0) P(t = 0) + E(X | t = 1) P(t = 1)\), and collecting terms gives the decomposition.
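The best-linear-approximation property of least squares noted in these footnotes has an exact sample counterpart when the characteristic is discrete: replacing each outcome with the sample mean of the outcome among observations with the same \(x\) (the sample conditional expectation function) leaves the least-squares coefficients unchanged. A sketch with hypothetical data and illustrative function names:

```python
from statistics import fmean  # Python 3.8+


def ols_line(x, y):
    """Intercept and slope of a least-squares regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def cell_means(x, y):
    """Replace each y with the sample mean of y over observations sharing its value of x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    means = {xi: fmean(ys) for xi, ys in groups.items()}
    return [means[xi] for xi in x]


# Hypothetical sample with a discrete characteristic and a non-linear conditional mean.
x = [1, 1, 2, 2, 2, 3, 3, 4]
y = [0.9, 1.3, 1.6, 2.2, 1.9, 2.8, 2.4, 4.1]

print(ols_line(x, y))                 # regression of y on x
print(ols_line(x, cell_means(x, y)))  # regression of the sample CEF on x
```

The residuals from the cell means sum to zero within every cell, so they are uncorrelated with \(x\) and do not move the fitted line, even though the conditional means here (1.1, 1.9, 2.6, 4.1) are not linear in \(x\).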