Conditional independence
The setup for the stratified index closely follows the setup for the general constant-quality index in the previous section, except that the characteristics of a good need to be explicitly modeled. To do so, let \(X\) be a (random) vector of observable characteristics on which goods can be stratified. In the context of housing, for example, \(X\) may include square footage, number of bedrooms, and the age of the house. Each stratum corresponds to a different realization \(x\) of \(X\) (e.g., houses with 1500-2500 square feet and 3 bedrooms that are 10-25 years old).
The geometric transaction-price index between period 0 and period 1 for stratum \(x\) compares average transaction prices over time for the goods in stratum \(x\), and is given by \[\begin{align*} \log(I^{T}_{x}) = E(\rho | X = x, t = 1) - E(\rho | X = x, t = 0). \end{align*}\] If potential price is independent of time, conditional on the characteristics used to stratify goods, then the sub-indices for each stratum, computed with transaction prices, have a constant-quality interpretation. Formally, this is the conditional independence assumption \(\{p(1), p(0)\} \perp t | X\), the within-stratum analogue of the independence assumption from the previous section. Within each stratum, whether a good sells in period 0 or period 1 is essentially due to chance, and so there are no systematic price-relevant differences between the goods that sell in period 0 and those that sell in period 1. Put differently, potential prices can differ between the goods sold in each period only because the composition of the characteristics \(X\) changes; once these characteristics are held fixed, any change in observed prices over time must be a pure price change.
Under conditional independence, it must be that \(E(\rho(t) | X, t) = E(\rho(t) | X)\) for \(t = 0,1\), and therefore \[\begin{align*} \log(I^{T}_{x}) &= E(\rho(1) | X = x, t = 1) - E(\rho(0) | X = x, t = 0) \\ &= E(\rho(1) | X = x) - E(\rho(0) | X = x). \end{align*}\] The first line holds because the price observed for a good that sells in period \(t\) is its potential price for that period; the second line applies conditional independence. Conditional independence therefore gives a constant-quality index for each stratum that coincides with the transaction-price index for that stratum.
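The stratum-level calculation can be illustrated with a minimal Python sketch. The transaction records, the stratum labels (`"small"`, `"large"`), and the function names are hypothetical and chosen only for illustration; sample means of log prices stand in for the conditional expectations. The data are constructed so that prices rise by 10 percent within each stratum while period 1 sales shift toward the more expensive stratum.

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical transactions: (stratum, period, price).
transactions = [
    ("small", 0, 100.0), ("small", 0, 100.0), ("small", 1, 110.0),
    ("large", 0, 200.0),
    ("large", 1, 220.0), ("large", 1, 220.0), ("large", 1, 220.0),
]

def stratum_log_indexes(records):
    """Return {stratum: log(I_x^T)}, the difference in mean log price
    between period 1 and period 0 within each stratum."""
    logs = defaultdict(lambda: {0: [], 1: []})
    for stratum, period, price in records:
        logs[stratum][period].append(math.log(price))
    result = {}
    for stratum, by_period in logs.items():
        # A sub-index needs at least one sale in each period.
        if by_period[0] and by_period[1]:
            result[stratum] = fmean(by_period[1]) - fmean(by_period[0])
    return result

def pooled_log_index(records):
    """Log transaction-price index that ignores strata entirely."""
    logs = {0: [], 1: []}
    for _, period, price in records:
        logs[period].append(math.log(price))
    return fmean(logs[1]) - fmean(logs[0])

sub_indexes = stratum_log_indexes(transactions)
pooled = pooled_log_index(transactions)

for stratum, log_index in sub_indexes.items():
    print(stratum, round(math.exp(log_index), 4))
print("pooled", round(math.exp(pooled), 4))
```

In this toy data each sub-index is about 1.10, whereas the pooled index is well above 1.10 because it also picks up the change in the mix of goods sold. Whether the sub-indices have a constant-quality interpretation still rests on conditional independence holding within each stratum, which the code cannot check.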
In the extreme case when goods are grouped into pairs across time, so that each stratum contains two goods, one sold in each period, the stratified index is just a pure matched-model index. To see this, enumerate the population of pairs by \(i = 1,\ldots, n_{p}\) so that \[\begin{align*} I^{Q} = \prod_{i = 1}^{n_{p}} \left(\frac{p_{i1}}{p_{i0}}\right)^{\omega_{i}}, \end{align*}\] where \(\omega_{i} = P(X = x_{i})\) is the probability of the stratum \(x_{i}\) that defines pair \(i\), and \(p_{it}\) is the price of the good in pair \(i\) that sells in period \(t = 0,1\). Conditional independence holds for a matched-model index if the goods in each pair are sufficiently similar across time that the transaction price for the good that sells in period 0 gives a good baseline for what the good that sells in period 1 would have sold for in period 0. This also shows that the standard index-number formulas are special cases of the stratified index, and implicitly make an assumption of conditional independence.
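The matched-model formula is a weighted geometric mean of price relatives, which a short Python sketch makes concrete. The function name `matched_model_index` and the prices are hypothetical; the weights are assumed to be non-negative and to sum to one, as probabilities must. With equal weights the result is the unweighted geometric mean of price relatives, i.e. the Jevons index.

```python
import math

def matched_model_index(pairs, weights=None):
    """Weighted geometric mean of price relatives p_i1 / p_i0.

    pairs   : list of (p0, p1) prices for each matched pair
    weights : weights omega_i summing to one; equal weights if omitted
    """
    n = len(pairs)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per pair")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to one")
    # Work in logs: log(I) = sum_i omega_i * log(p_i1 / p_i0).
    log_index = sum(w * math.log(p1 / p0) for (p0, p1), w in zip(pairs, weights))
    return math.exp(log_index)

# Hypothetical matched pairs: (price in period 0, price in period 1).
pairs = [(100.0, 110.0), (200.0, 210.0), (50.0, 60.0)]

equal_weighted = matched_model_index(pairs)
share_weighted = matched_model_index(pairs, weights=[0.5, 0.3, 0.2])
print(equal_weighted, share_weighted)
```

Changing the weights changes which standard formula the function reproduces, but not the underlying assumption: each period 0 price is treated as the baseline for its paired period 1 good.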
It is worth emphasizing that conditional independence is not a testable assumption. It is inherently an economic assumption about how goods sell over time. But it is what allows a constant-quality index to be identified from a transaction-price index.