CHAPTER 16
AN INTRODUCTION TO STATISTICS USED IN MASS APPRAISAL
This chapter is a summary of the statistics that are used in real property appraisal and assessment, in particular
those necessary for creating and testing mass appraisal models. This chapter has been prepared to give you a
basic overview of the statistics that will be used in the Statistical and Computer Applications in Valuation course
BUSI 344. It is hoped that when you come across these statistics in this more advanced course, this chapter will
help you to understand what the statistics mean and how they are interpreted.
This chapter is organized into two main sections. The first is an overview of multiple regression analysis,
focussing on the statistics used in creating and evaluating regression results. The second section describes how
ratio studies are used to evaluate regression results in an assessment context.
Multiple Regression Analysis
Introduction
Multiple regression analysis (MRA) is a statistical technique for estimating unknown data on the basis of known
and available data. In mass appraisal, the unknown data are market values. The known and available data are
sales prices and property characteristics.
MRA models can be additive, multiplicative, or hybrid. Additive models are the simplest and most common.
The general structure of an additive MRA model is:
S = b0 + b1X1 + b2X2 + ... + bpXp
(Equation 16.1)
where
S is estimated sale price (dependent variable);
X1, X2, ..., Xp are the independent variables;
b1, b2, ..., bp are coefficients or prices per unit assigned by the algorithm to the independent variables;
and
b0 is a constant determined by the algorithm.
(This general model can be used for any dependent variable Y. S is used throughout this discussion because it
is the dependent variable of interest to property appraisers.)
As a simplified illustration, consider the equation:
S = $7,800 + $32.10X1 − $746X2
(Equation 16.2)
where
X1 is square feet of living area; and
X2 is effective age.
In this case, b0 is $7,800, b1 is $32.10, and b2 is -$746. For a house with 2,000 square feet and an effective age
of 5 years, the predicted value is:
S = $7,800 + $32.10(2,000) − $746(5)
(Equation 16.3)
= $7,800 + $64,200 − $3,730
= $68,270
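The arithmetic in Equation 16.3 can be verified with a short program; the function name here is illustrative only, not part of the chapter:

```python
# Predicted sale price from the illustrative additive model (Equation 16.2):
# S = $7,800 + $32.10 * (square feet) - $746 * (effective age)
def predict_sale_price(sq_ft, effective_age):
    """Apply the two-variable additive MRA model from Equation 16.2."""
    return 7800 + 32.10 * sq_ft - 746 * effective_age

# A 2,000 square foot house with an effective age of 5 years:
print(round(predict_sale_price(2000, 5), 2))  # 68270.0
```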
A more realistic example would contain additional independent variables. In any case, the coefficients calculated for the variables are derived from analysis of sale prices and state each variable's contribution to (or influence on) price.
Theory and Method
The objective of MRA, as applied to mass appraisal, is to model the relationship between property characteristics
and sale price, so that the sale price can be estimated from the property characteristics. For example, the
relationship between housing size and sale price can be investigated from data on square feet of living area and
sale price. Table 1 shows such data for twenty-five recently sold single-family residences. These data are
graphed and a straight line fitted to the points with a ruler, as was done to generate line AA′ in Figure 1. (For
ease of illustration, the horizontal axis is truncated at 1,000 square feet, because all twenty-five properties sold
were larger than 1,000 square feet.) The sale price of an unsold property can be estimated by noting its square
footage and reading the corresponding estimated sale price from the line. For example, to estimate the sale price
of an unsold house with 1,500 square feet, draw a vertical line upward from 1,500 square feet to line AA′.
Then draw a second line horizontally from line AA′ to the vertical axis. This process is illustrated by the dashed
lines in Figure 1. The estimated sale price of the house is approximately $84,000.
Table 1
Regression Data

Sale number    Square feet    Sale price
     1            1,050       $ 59,000
     2            1,090         66,900
     3            1,150         68,000
     4            1,220         75,000
     5            1,250         59,800
     6            1,275         74,500
     7            1,300         79,000
     8            1,300         75,900
     9            1,340         67,000
    10            1,360         80,000
    11            1,400         86,000
    12            1,425         87,000
    13            1,480         73,900
    14            1,520         84,000
    15            1,590         74,000
    16            1,610         79,600
    17            1,650         93,500
    18            1,680        102,000
    19            1,700         94,900
    20            1,750         98,200
    21            1,790        100,500
    22            1,830        118,000
    23            1,880         82,000
    24            1,980         93,000
    25            2,140        124,500
Figure 1
Plot of Regression Data
Regression analysis, a more scientific, objective, and efficient method of fitting the line, uses the principle that
a straight line can be determined by one point and the slope. In fact, the regression equation for one independent
variable:
S = b0 + b1X1
(Equation 16.4)
is simply the equation of a straight line, where b1 is the slope and b0 the point at which the line intersects the
vertical axis. The slope of line AA′ in Figure 1 thus corresponds to b1. The major difference is that the slope
of AA′ was "eyeballed," whereas b1 is calculated.
Consider point B in Figure 1, which corresponds to sale 23 in Table 1. The property has 1,880 square feet and
sold for $82,000. Based on line AA′, the estimated sale price is approximately $100,000. In statistical terms,
this difference is called the amount of error (ei) in the estimate, although calling this an "error" is misleading in
that nothing has been done wrong! Regression analysis calculates b0 and b1 in a manner that minimizes the sum
of squared differences between actual and predicted prices; that is, MRA minimizes:
Σei² = Σ(Si − Ŝi)²
(Equation 16.5)
where Si is the actual sale price of property i and Ŝi is the estimated price of property i.
In the present example, regression of sales prices on square feet produces the equation:
Si = $11,493 + $47.90Xi
(Equation 16.6)
where Xi is square feet of living area. That is, on average, sales prices increase at a rate of $47.90 per square
foot of living area. The "constant," $11,493, is the value at which the regression line intersects the vertical axis
when X = 0 (not shown in Figure 1). This equation minimizes the sum of the squared differences between Si
and Ŝi; any other equation would produce a larger sum of squared differences. In the example of an unsold
house of 1,500 square feet, the regression estimated sale price is:
Si = $11,493 + $47.90(1,500) = $83,343
which agrees closely with the previous "eyeball" estimate.
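The least-squares fit of Equation 16.6 can be reproduced from the Table 1 data with the textbook formulas b1 = Sxy/Sxx and b0 = ȳ − b1x̄. This is a plain-Python sketch; rounding in the chapter's reported figures means the computed values match only approximately:

```python
# Least-squares fit of sale price on living area for the Table 1 data,
# reproducing (approximately) Equation 16.6.
sq_ft = [1050, 1090, 1150, 1220, 1250, 1275, 1300, 1300, 1340, 1360,
         1400, 1425, 1480, 1520, 1590, 1610, 1650, 1680, 1700, 1750,
         1790, 1830, 1880, 1980, 2140]
price = [59000, 66900, 68000, 75000, 59800, 74500, 79000, 75900, 67000,
         80000, 86000, 87000, 73900, 84000, 74000, 79600, 93500, 102000,
         94900, 98200, 100500, 118000, 82000, 93000, 124500]

n = len(sq_ft)
xbar = sum(sq_ft) / n   # mean living area: 1,510.4 sq ft
ybar = sum(price) / n   # mean sale price: $83,848
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(sq_ft, price))
sxx = sum((x - xbar) ** 2 for x in sq_ft)

b1 = sxy / sxx           # slope: about $47.90 per square foot
b0 = ybar - b1 * xbar    # intercept: about $11,493

print(round(b1, 2), round(b0))   # approx. 47.9 and 11493
print(round(b0 + b1 * 1500))     # approx. 83343 for a 1,500 sq ft house
```

Note that the fitted line always passes through the point of means (x̄, ȳ), which is why b0 can be recovered from b1 and the two averages.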
How well does this equation estimate sale prices? Consider Figure 2, which illustrates a regression of sales
prices on square feet of living area for three different neighbourhoods (in each case, models are developed from
sales in only that neighbourhood). In plots for Neighbourhoods A and B, the regression line has the same slope
and intersects the vertical axis at the same point. However, this equation does not estimate property sale prices
with equal precision in all three. In Neighbourhood A, actual sales prices lie very close to the sale prices
predicted by the regression line. In other words, the sum of squared differences, Σei², is small, and it appears
that we can be confident of the regression-estimated sale prices. In Neighbourhood B, actual sales prices are
loosely fitted by the regression line. In Neighbourhood C, there is virtually no relationship between living area
and sale price, making it impossible to draw a meaningful line of best fit. Σei² is very large and the average sale
price would produce almost equally good sale price estimates. We conclude that regression analysis is a useful
predictor of property sale prices when Σei² is small, but not when it is large.
One means of minimizing Σei² is to use additional variables. In Figure 1, many points probably lie below AA′
because they represent properties with some negative features, such as poor physical condition. Other points
probably lie above AA′ because they represent properties with generally positive features, such as good physical
condition.
The model might be respecified, then, as:
S = b0 + b1X1 + b2X2
(Equation 16.7)
where X2 is a number that represents physical condition relative to some norm, and may be either positive (better
than normal condition) or negative (poorer than normal condition). Again, MRA would calculate the regression
coefficients b0, b1, and b2 in such a way as to minimize Σei², where, in this case, the predicted values are a
function of both living area and physical condition. Note that the importance of any one variable in the
regression equation is directly related to its contribution in reducing Σei².
Figure 2
Comparison of Regression Results
Evaluating Regression Results
This section of the chapter will cover six statistics used in evaluating regression results. Four are measures of
goodness of fit and relate to evaluation of the predictive accuracy of the equation. They are the coefficient of
determination (R2), the standard error of the estimate (SEE), the coefficient of variation (COV), and the F-ratio.
In different ways, each indicates how well the equation succeeds in minimizing Σei² and predicting sales prices.
The other two statistics relate to the importance of individual variables in the model. They are the correlation
coefficient (r) and the t-statistic.
1. Coefficient of Determination
The coefficient of determination, R2, is the percentage of the variance in sales prices explained by the regression
model. Possible values of R2 range from 0 to 1. When R2 = 0, none of the variation in sales prices is explained
by the model. In this case, the average sale price (S̄) provides overall estimates of individual sales prices just
as good as those from the regression model (see Neighbourhood C in Figure 2). On the other hand, when R2
= 1, all deviations from the average sale price S̄ are explained by the regression equation and Σei² = 0. In a
one-variable model, this implies that all sales prices lie on a straight line (Neighbourhood A in Figure 2 best
approximates this condition). For the twenty-five observations in Table 1 and Figure 1, R2 is 0.704.
In another example, consider a data set which lists sale prices and living areas of houses. The R2 statistic
measures the percentage of variation in the dependent variable (sale price) explained by the independent variable
(living area). If the R2 is 0.6286, this means that the regression line is able to explain about 63% of the variation
of the sales prices (“variation” refers to the squared differences between sales prices and the average sale price).
Loosely speaking, living area explains about 5/8ths of the variation in sales prices; the remaining 3/8ths would
be explained by other characteristics or by random variations in price.
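The definition of R2 as explained variation over total variation can be computed from first principles. The data below are made up for illustration and do not come from the chapter:

```python
# R2 computed from first principles on a small, made-up data set:
# R2 = 1 - (sum of squared errors) / (total squared variation about the mean).
x = [1, 2, 3]
y = [2, 4, 5]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
sst = sum((yi - ybar) ** 2 for yi in y)                        # total variation
r2 = 1 - sse / sst
print(round(r2, 4))  # 0.9643
```

Here the regression explains about 96% of the variation; the remaining 4% is left to factors not in the model.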
The following is a further illustration of calculating and interpreting the R2. Consider a regression model which
has carport market value as the dependent variable and quality of carport as the independent variable. This
model attempts to use carport quality to predict carport market value, and the statistical measures shown later
in this section describe how successful this attempt will be.
According to the regression equation calculated, the estimated value of a low quality carport is¹:
Yi = 2,419.51 - 1,415.29 (1) + 1,915.83 (0)
= $1,004.22
The estimated value of an average quality carport is:
Yi = 2,419.51 - 1,415.29 (0) + 1,915.83 (0)
= $2,419.51
The estimated value of an above average quality carport is:
Yi = 2,419.51 - 1,415.29 (0) + 1,915.83 (1)
= $4,335.34
¹ Carport quality in this model is described using three binary variables: low, average, and above average quality. A binary variable has
two values, 0 for no or 1 for yes. For example, a property with a low quality carport would have these three variables: LOW=1,
AVERAGE=0, ABOVE_AVERAGE=0. In the regression formula shown, average quality is the base condition, shown to have a value
of $2,419.51. This value is adjusted up or down for higher or lower quality.
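The binary-variable scheme in the footnote can be sketched as a small function (the function name is illustrative; the coefficients are those given in the carport example):

```python
# Carport value from the regression coefficients in the text, using the
# binary (dummy) variable coding described in the footnote. Average quality
# is the base condition, so both dummies are 0 for an average carport.
def carport_value(low, above_average):
    """low and above_average are 0/1 dummy variables; average quality
    is the base case (both zero)."""
    return 2419.51 - 1415.29 * low + 1915.83 * above_average

print(round(carport_value(1, 0), 2))  # low quality: 1004.22
print(round(carport_value(0, 0), 2))  # average quality: 2419.51
print(round(carport_value(0, 1), 2))  # above average quality: 4335.34
```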
The coefficients calculated would be used to estimate the carport's market value; e.g., an above average quality
carport is shown to add $1,916 to value. The first check in a regression calculation is to see if the coefficients have the
expected signs (i.e., whether they are positive, adding to value, or negative, subtracting from value) and
expected magnitude (i.e., a very high quality carport should have greater value than a moderately high quality
carport).
In general, the higher the R2, the better the model is at explaining variation in the dependent variable. The R2
for this model is only .291, which indicates that only 29% of the variation in carport market value is explained
by the regression model. This indicates that there are factors other than quality which influence carport market
value, but have not been included in this analysis.
The use of R2 has two shortcomings. First, as we add more regression variables, R2 can only increase or stay
the same, which can overstate goodness of fit when insignificant variables are included or the number of
variables is large relative to the number of sales. Assume that we have regressed sales prices on eighteen
independent variables and obtained an R2 of 0.920. Now suppose we rerun the model with a nineteenth variable,
number of windows. As long as number of windows has any correlation whatsoever with sale price, R2 will
rise above 0.920.
Fortunately, R2 can be adjusted to account for the number of independent variables, resulting in its sister
statistic, adjusted R2 (R̄2). In the present example, the addition of number of windows as a nineteenth variable
will cause adjusted R2 to fall unless the variable makes some minimum contribution to the predictive power of
the equation.
The second shortcoming of R2 (shared also by R̄2) is more a matter of care in interpretation. There can be no
specified universal critical value of R2; i.e., you cannot say "acceptable results have an R2 of 85%" or any other
value. The critical value of the R2 statistic will vary with several factors and there are several non-mathematical
reasons for variations in R2 which make setting a specific target for this statistic inadvisable.
In mass appraisal, we often divide properties into homogeneous strata and develop separate equations for each.
Because this reduces the variance among sales prices within each stratum, we should not expect MRA to explain
as large a percentage as when one equation is fit to the entire jurisdiction. As you will see in the course BUSI
344, when a sample is stratified before the model is created, there is usually less variation in the dependent
variable. If there is less variation in the data, then a model may produce a better estimate of the dependent
variable even when the R2 statistic indicates that a lesser percentage is explained by the model. For example,
if one model is developed to estimate sale price for all neighbourhoods in a sales database, there may be
$300,000 in variation among the sales prices. A model that explains 80% of the variation still leaves 20%, or
$60,000, unexplained. A model for a single neighbourhood, with only $50,000 variation in sale price, may have
an adjusted R2 of only .60 but will produce better estimates of sales prices in that neighbourhood, because 40%
of the variation is only $20,000. The standard error and COV (discussed later) will show this improvement.
In general in regression models, improving the standard error and COV is more important than increasing the
adjusted R2, but you should generally try to have the adjusted R2 as high as possible and the standard error and
COV as low as possible.
2. Standard Error of the Estimate
The assessor must not only be able to estimate the equation for the regression line, he or she must also be able
to measure how well the regression line fits the points. The techniques provided so far enable the assessor to
determine a best fit regression line and measure its overall goodness of fit using R2. However, it is also
desirable to find out how well the regression equation fits each individual observation. It may be that the best
fit line is very accurate at representing the data, or alternatively, if the data points are highly dispersed, the best
fit line may be very poor. The standard error of the estimate is one measure of how good the best fit is. The
standard error of the estimate (SEE) measures the amount of deviation between actual and predicted sales prices.
If the standard error of the estimate is small, the observed values Yi are tightly scattered around the regression
line. If it is large, the Yi are widely scattered around the regression line. The smaller the standard error,
the better the fit.
The SEE is calculated in a manner analogous to the standard deviation, and indeed can be viewed as the standard
deviation of the regression errors. Thus, if the errors are normally distributed, two-thirds of actual sales prices
will fall within about 1 SEE of their predicted values, 95 percent within about 2 SEEs, and so on (see example
below). Note that whereas R2 is a percentage figure, the SEE is a dollar figure if the dependent variable is price.
For the twenty-five observations in Table 1, the SEE is $9,107. The SEE is free from the second interpretive
shortcoming of R2 mentioned above. In other words, whereas R2 evaluates the seriousness of the errors
indirectly by comparing them with the variation of the sales prices, the SEE evaluates them directly in dollar
terms.
Continuing with the carport example, the standard error of the estimate (SEE) is found to be $707.79. This
statistic, also known as the root mean square error (RMSE), represents the standard deviation of the regression
errors. Assuming that these errors are normally distributed, approximately 68% of the errors will be $707.79
or less and approximately 95% will be $1,415.58 or less. Since the SEE is expressed in the same units of
measurement as the dependent variable (in this case, dollars of carport value), its size can only be viewed in
relation to the size of the dependent variable. To put the SEE in relative terms, it is often divided by the mean
of the dependent variable to derive a percentage SEE. This division compensates for price level differences
between samples and allows for comparisons among regressions on different data. When the standard error is
converted to a percentage, this new measure is termed the coefficient of variation (COV).
3. Coefficient of Variation
In regression analysis, the coefficient of variation (COV) is the SEE divided by the average sale price and
multiplied by 100; that is, the SEE expressed as a percentage of the average sale price.
Most MRA software reports the SEE but not the COV. Nevertheless, the COV is preferred for assessment
purposes, because its interpretation is independent of the average sale price. The example based on the data
from Table 1 has a SEE of $9,107. This would indicate a good predictive model when average property values
are high, but not when they are low. Expressing the SEE as a percentage of the mean sale price removes this
source of confusion. In the present example, the mean sale price is $83,848 (not shown), resulting in a COV
of 10.86%:

COV = (100)($9,107) / $83,848 = 10.86%
This implies that, given a normal distribution, roughly two-thirds of sales prices lie within 10.86 percent of their
MRA-predicted values.
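The COV calculation for the Table 1 regression reduces to a one-line computation, using the SEE and mean sale price reported in the text:

```python
# COV for the Table 1 regression: SEE expressed as a percentage of the
# mean sale price (values as reported in the text).
see = 9107
mean_sale_price = 83848
cov = 100 * see / mean_sale_price
print(round(cov, 2))  # 10.86
```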
Continuing with the carport example, the SEE can be difficult to interpret because it is expressed in the same
units of measurement as the dependent variable (in this case, dollars of carport value), and therefore its size can
only be viewed in relation to the size of the dependent variable. The standard error can be converted to a
percentage, termed the coefficient of variation (COV), by dividing by the average of the dependent variable.
If the average carport value is $2,342, the COV is 30.2%:
COV = 707.79 ÷ 2,342 × 100 = 30.2%
This relatively large value indicates that the observations are widely dispersed along the estimated regression
line and that carport quality, while definitely related to carport value, is not by itself a good predictor of carport
market value. In general residential models which have sale price as the dependent variable, a COV of
approximately 20% is acceptable, while a COV of approximately 10% indicates a very good result.
4. Correlation Coefficient
The correlation coefficient, abbreviated r, is the first of two statistics that relate to individual regression
variables. The linear correlation coefficient r is a measure of the strength of the linear relationship between two
variables X and Y. It is a summary measure, just as the mean and standard deviation are summary measures.
The correlation coefficient may be positive, negative, or zero.
- If the correlation coefficient is positive, as one variable increases (decreases), the other variable will
  increase (decrease).
- If the correlation coefficient is negative, as one variable increases (decreases), the other variable will
  decrease (increase).
- If the correlation coefficient is zero, there is no relationship; there is no tendency for one variable to
  increase or decrease as the other changes.
The mathematical definition of the correlation coefficient means that it takes on values only from -1.0 to +1.0.
If the correlation coefficient equals +1.0, then the two variables are perfectly positively correlated. In this case,
when the data pairs are graphed, they will all lie on a straight line sloping upwards and to the right. Similarly,
if the correlation coefficient is -1.0, then the two variables are said to be perfectly negatively correlated. In this
case, when the data pairs are graphed, they will all lie on a straight line sloping downwards and to the right.
As the correlation coefficient gets close to +1.0 or -1.0, the points become more compactly distributed around
a straight line. If the correlation coefficient is 0, then the two variables are said to be uncorrelated. There are
numerous configurations which would yield a correlation coefficient of 0.
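The definition of r can be sketched from first principles; the two perfectly correlated data sets below are made up to illustrate the boundary values ±1.0:

```python
import math

# The linear correlation coefficient r = Sxy / sqrt(Sxx * Syy), computed
# from first principles (illustrative data, not from the chapter).
def correlation(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3], [2, 4, 6]))   # perfectly positive: 1.0
print(correlation([1, 2, 3], [6, 4, 2]))   # perfectly negative: -1.0
```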
A strong correlation does not mean there is a causal relationship between the variables. Suppose, for example,
that the correlation between number of rooms and sale price for a series of houses is 0.72.
This does not mean an increase in the number of rooms causes an increase in sale price; nor does it
mean an increase in the sale price causes an increase in the number of rooms. Another example will illustrate
this point. Studies have shown there is a large and positive correlation between the number of bars and the
number of churches in a town. However, it is unlikely there is any causal relationship between the two variables.
If the number of churches increases, the number of bars will not necessarily increase, and vice versa. There is
probably a high correlation between the number of bars and churches because both are related to the population
of the city: as the population increases, the number of bars and the number of churches both increase.
Furthermore, r = .5 does not mean the strength of the relationship between X and Y is "halfway" between
perfect correlation and no correlation. In general, r > .8 indicates a strong relationship, .4 < r < .8 indicates
a moderate relationship, and r < .4 indicates a weak relationship.
MRA software usually includes an optional correlation matrix showing the correlation coefficient between each
pair of variables. In analysing correlations with the dependent variable, remember that the correlation coefficient
is a dimensionless figure, indicating only whether two variables are linearly related. The
correlation coefficient measures how strongly two variables have a straight-line relation to each other, but does
not give the exact relationship. Two sets of data (x, y) yielding exactly the same regression equation (straight
line) may have very different correlation coefficients between x and y. Regression coefficients, on the other
hand, indicate how variables are related; that is, how many units (dollars) the dependent variable changes when
the independent variable changes by one unit (for example, one square foot) with other variables in the equation
held constant. For the data in Table 1, the correlation coefficient between square feet and sale price is 0.839;
b1, the regression coefficient, is $47.90. Thus, there is a strong positive linear relationship between square feet
and sale price. As the number of square feet increases by 1, the estimated sale price increases by $47.90.
5. t-Statistic
The t-statistic is a measure of the significance or importance of a regression variable in explaining differences
in the dependent variable (sale price). It tests whether the slope of the regression line is equal to zero.
The t-values and their associated significance levels indicate the degree of confidence one can place on the
regression coefficients. The significance of the t-values varies with the number of observations, so the
significance level is more useful for determining the relevance of the variables. Higher t-values and lower
significance levels increase the reliance the model builder can place on the statistical significance of the
coefficients. A high t-value leads to the conclusion that the coefficient is significantly different
from zero. Normally in assessment work, a significance level of less than .10 is desired, and often .05 or less.²
A significance level of .10 suggests that one can be at least 90% confident that the variable coefficient is
significantly different from 0. A significance level of less than .05 would indicate that the probability of the
coefficient being equal to zero is 5% or less, which indicates a reliable result.
A t-statistic in excess of ±2.58 indicates that one can be 99 percent confident that the independent variable Xj
is significant in the prediction of sale price S. For the data in Table 1, the t-statistic for square footage is 7.40.
Referring to a t-table, when the t-statistic exceeds ±3.767, one can be 99.9 percent confident that the coefficient
does not equal 0. Therefore, in this case we can conclude with confidence that square feet of living area is
significant in estimating residential values.
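For a one-variable regression, the slope's t-statistic can be recovered from the correlation coefficient and sample size via t = r√(n − 2)/√(1 − r²). As a consistency check against the values reported in the text (r = 0.839, n = 25):

```python
import math

# For a one-variable regression, the slope's t-statistic can be computed
# from the correlation coefficient: t = r * sqrt(n - 2) / sqrt(1 - r^2).
# Using the Table 1 values reported in the text (r = 0.839, n = 25):
r = 0.839
n = 25
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 1))  # approx. 7.4, matching the reported t-statistic
```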
Note that the t-statistic depends on the number of observations, and as a rule, a specific universal value for
acceptance or rejection cannot be given. However, the usual criterion is a t-value over 1.6 (90%
confidence) or 2.0 (95% confidence). If the probability is high that the coefficient is equal to zero, this would
indicate that the variable provides no useful information to the model. A t-value of approximately 1.6 indicates
significance at the 90% confidence level, which means that there is less than a 10% probability that the coefficient
is equal to zero. If a regression model estimating sale price had independent variables number of
bedrooms and family rooms with significance values of 0.842 and 0.919 respectively, this would show a high
probability that neither of these variables is useful in the model.
Continuing with the carport example, the coefficients in this model indicate that carport values are, on average,
$1,415 less for low quality carports and $1,916 more for above average quality carports. In this case, the
significance levels for both the above average and low quality variables were .0000, indicating that one can be
virtually 100% confident that carport quality affects carport value.
6. F-Value
The F-Value (or F-Ratio) is related to the correlation coefficient (r). It measures whether the overall regression
relationship is significant; that is, it tests whether the model is useful in representing the sample data. The
F-value is a ratio showing the portion of the total variation of the dependent variable that is explained by the
regression divided by the remaining variation that is left unexplained by the model.
² Note that the choice of significance level depends on a number of factors and no one limit is applicable to all situations; for example, in
some assessment work, a significance level of 20 to 25% may be appropriate.
F = (variance explained by the regression) / (variance unexplained)
If explained variation is small relative to unexplained variation, the regression equation does not fit the data well
and the regression results are not considered statistically significant. A small value of F (generally less than 4)
leads to acceptance of the hypothesis that the regression relationship is not significant. If F is large, the
hypothesis that the derived regression model is not significant is rejected and it is concluded that the overall
regression results are statistically significant.
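The F-ratio can be computed from R2 and the degrees of freedom: F = (R2/k) / ((1 − R2)/(n − k − 1)), where k is the number of independent variables. As a check using the Table 1 values from the text:

```python
# F-ratio computed from R2 for a regression with k independent variables
# and n observations: F = (R2 / k) / ((1 - R2) / (n - k - 1)).
# Using the Table 1 values from the text (R2 = 0.704, n = 25, k = 1):
r2 = 0.704
n, k = 25, 1
f = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f, 1))  # approx. 54.7 -- well above the critical value of 4
```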
Continuing with the carport example, the F-Ratio shows the overall quality of the regression, as opposed to the
usefulness of the individual variables as reported by the t-statistic. An F-Ratio which is greater than 4 indicates
that the estimates produced by the regression model provide a better representation of the sample data than the
mean of the observations. In other words, the regression estimates fit the data well and the results are
statistically significant. However, the size of the F-Ratio above the critical value of 4 must be viewed with
caution. At larger magnitudes, the F-Ratio is useful mostly as a relative measure; for example, if two models
are identical in all respects other than their F-Ratios, the model with the larger F-Ratio is probably the better
one. The absolute measure of the F-Ratio is less meaningful because F-Ratios are sensitive to the number of
observations and the number of variables in the model. Few observations together with a relatively large number
of variables will generally produce a low F-Ratio. The F-Ratio in this example is 37.89, which is greater than
the critical value of 4 and indicates that the estimates produced by the model are better predictors of value than
the mean. However, the large number of observations and few variables in the model would be expected to
produce a very high F-Ratio. Thus, the actual F-Ratio of 37.89 is not necessarily a good result, as an F-Ratio
of 150 or 200 could be more appropriate in this case.
In evaluating regression models, it is important to evaluate both how well the regression model captures the
observed variation in the dependent variable (price), using adjusted R2, as well as the error generated from the
model, using SEE or COV. In the carport model, there is a low R2 combined with a high COV. This indicates
that the carport quality does not "explain" the variation in value well. It is likely that the carport value is more
dependent on carport size than on quality. If information on carport size were available, it could potentially form
part of the final model in addition to carport quality. The F statistic measures performance of the model overall
when compared to the result that would be obtained by estimating the carport value by simply using the mean
carport value.
Summary
When checking the regression output, the following points are important:
- the coefficients have the expected sign (positive or negative);
- the t-statistics are significant, i.e., greater than 1.64 (significance level less than .10);
- the F-statistic is "large" and the probability provided with the F-statistic should be less than .05;
- the standard error of the estimate or SEE (also termed the "root mean square error" or RMSE) should
  be small;
- the Coefficient of Variation (COV = SEE / Mean Sale Price) should be small; and
- the adjusted R2 should be large.
Note that these are just general guidelines and cannot be applied universally in all cases. Regression analysis
is extremely complex and there are many interrelated factors that can affect results. Because of this complexity,
the analyst must be careful not to rely on universal measures or "cookbook" procedures.
Introduction to Ratio Studies
If a property tax is to be fair and provide adequate revenue for local government, mass appraisal must produce
accurate appraisals and equitable assessments. The primary tool used to measure mass appraisal performance
is the ratio study.
A ratio study compares appraised values to market values. Market value is the most probable price in cash that
a property would bring in a competitive and open market, assuming that the buyer and seller are knowledgeable,
sufficient time is allowed for the sale, and price is not affected by special influences. In a ratio study, market
values are usually represented by sales prices.
The ratios used in a ratio study are formed by dividing appraised values made for tax purposes by other estimates
of market value, such as sales prices or independent appraisals. For example, a property appraised for tax
purposes at $40,000 and sold for $50,000 has a ratio of 0.80, or 80 percent:
ASR = A ÷ S = $40,000 ÷ $50,000 = 0.80
where ASR is the assessment-sales ratio (also shown as A/S), A is the appraised value, and S is the sale price.
Ratio studies measure two primary aspects of mass appraisal accuracy: level and uniformity. Appraisal level
refers to the overall, or typical, ratio at which properties are appraised. In mass appraisal, appraised values do
not always equal their indicators of market value (sales prices or independent appraisals), but over-appraisals
should balance under-appraisals, so that the typical ratio is near 100 percent. Measures of appraisal level are
treated later in this chapter.
Appraisal uniformity relates to the fair and equitable treatment of individual properties. Uniformity requires,
first, that properties be appraised equitably within groups or categories (use classes, neighbourhoods, and so
forth) and, second, that each of these groups be appraised at the same level, or ratio, of market value. That is,
appraisal uniformity requires equity within groups and between groups.
Measures of Appraisal Level
Appraisal level measures describe the typical level of appraisal as a percentage of sale price among the
observations. The assessment sales ratio (ASR) is the ratio of assessed value to sale price for each property.
Ideally, the ASR should be close to 1 indicating that properties are assessed at 100 percent of value. In practice
it would be extremely difficult, and probably inefficient, to assess all properties at exactly 100 percent of market
value. Therefore, average ASRs and other measures are used to determine an overall appraisal level.
Measures of appraisal level are calculated statistically by measures of central tendency, which describe the
typical level of appraisal by a single number or statistic. The four such measures applicable to ratio studies are
the median; the mean (also known as the average, arithmetic average, or unweighted average); the aggregate
ratio (sometimes referred to in Assessment literature as weighted mean); and the geometric mean. Only the
mean, median, and aggregate ratio will be described in detail in this chapter.
Because each measure has relative advantages and disadvantages, it is good practice to compute several or even
all of them in a ratio study. Comparing them provides useful information about the distribution of ratios. Wide
differences among the measures indicate undesirable patterns of appraisal performance.
1. Median
The median is the midpoint, or middle ratio, when the ratios are arrayed in order of magnitude. It divides the
ratios into two equal groups and is therefore little affected by extreme ratios.
Figure 3 shows the application of the median rank formula (rank = 0.5n + 0.5, where n is the number of
ratios) for three data sets. In Example A, the computed rank is 3, which corresponds to a ratio of 0.900. In
Example B, with an even number of ratios, the two middle ratios, 0.900 and 0.950, are averaged to produce a
median of 0.925. Example C illustrates the negligible effect of outliers on the median. Although the sixth ratio
(2.000) is much greater than in Example B (1.050), the median is unchanged.
Figure 3
Computing the Median and Mean

Example A
Sale number    Ratio (ASR)
1              0.800
2              0.850
3              0.900
4              0.950
5              1.000
Total          4.500
Median rank = 0.5(5) + 0.5 = 3
Median = 0.900
Mean = 4.500 ÷ 5 = 0.900

Example B
Sale number    Ratio (ASR)
1              0.800
2              0.850
3              0.900
4              0.950
5              1.000
6              1.050
Total          5.550
Median rank = 0.5(6) + 0.5 = 3.5
Median = (0.900 + 0.950) ÷ 2 = 0.925
Mean = 5.550 ÷ 6 = 0.925

Example C
Sale number    Ratio (ASR)
1              0.800
2              0.850
3              0.900
4              0.950
5              1.000
6              2.000
Total          6.500
Median rank = 0.5(6) + 0.5 = 3.5
Median = (0.900 + 0.950) ÷ 2 = 0.925
Mean = 6.500 ÷ 6 = 1.083

ASR is the ratio of appraised value to sale price.
The median has several advantages in ratio studies. It is relatively easy to compute and interpret. Because it
discounts the effects of extreme ratios, the median is little affected by data errors, unlike other measures of
central tendency. The median is also the base from which the coefficient of dispersion, the primary measure
of appraisal uniformity, is calculated. Finally, the sample median provides an unbiased estimate of the
population median. Accordingly, the median is the preferred measure of central tendency in many ratio study
applications.
A possible disadvantage of the median is that it gives no added weight to legitimate outliers. Some outliers are
valid and may need to be given extra weight in a ratio study. Also, the median does not lend itself to certain
statistical calculations as readily as the mean.
2. Mean
The mean is the average ratio. It is found by summing the ratios and then dividing by the number of ratios.
Figure 3 illustrates calculation of the mean. A comparison of Examples B and C demonstrates the pronounced
effect of outliers on the mean. The medians in the two examples are equal, but the mean ratio is 1.083 in
Example C and 0.925 in Example B. The mean accurately reflects the full magnitude of every ratio, which is
desirable only if outliers are based on valid data and occur within the same frequency in both the sample and
the population. Outliers particularly affect the mean in small samples.
Like the median, the mean is easy to compute and explain. It is widely used in statistics and is the basis for
many other mathematical calculations. When the sample has been properly obtained and the data carefully
screened and processed, the mean provides a valid measure of appraisal level.
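The Figure 3 calculations can be reproduced with Python's standard statistics module (a sketch for verification only):

```python
import statistics

# Ratios (ASRs) from Figure 3
example_a = [0.800, 0.850, 0.900, 0.950, 1.000]
example_b = [0.800, 0.850, 0.900, 0.950, 1.000, 1.050]
example_c = [0.800, 0.850, 0.900, 0.950, 1.000, 2.000]

median_c = statistics.median(example_c)  # the outlier leaves the median unchanged
mean_c = statistics.mean(example_c)      # the outlier pulls the mean upward
```

As in Figure 3, the medians of Examples B and C are identical while the means diverge.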
3. Aggregate Ratio
The aggregate ratio weights each ratio in proportion to its sale price, whereas the mean and median give equal
weight to each sale price. Table 2 demonstrates calculation of the aggregate ratio and illustrates this weighting
feature. In Example A, the mean is 0.720 and the aggregate ratio, 0.600. In effect, the single $100,000 sale
has as much weight in calculation of the aggregate ratio as the four $25,000 sales. By contrast, the mean assigns
equal weight to each ratio.
Table 2
Calculating the Aggregate Ratio

Example A
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $20,000       $25,000       0.800
2         20,000        25,000        0.800
3         20,000        25,000        0.800
4         20,000        25,000        0.800
5         40,000        100,000       0.400
Total     $120,000      $200,000      3.600
Mean = 3.600 ÷ 5 = 0.720
Aggregate ratio = $120,000 ÷ $200,000 = 0.600

Example B
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $10,000       $25,000       0.400
2         20,000        25,000        0.800
3         20,000        25,000        0.800
4         20,000        25,000        0.800
5         80,000        100,000       0.800
Total     $150,000      $200,000      3.600
Mean = 3.600 ÷ 5 = 0.720
Aggregate ratio = $150,000 ÷ $200,000 = 0.750
In Example B, the mean again is 0.720, but the aggregate ratio is 0.750, somewhat higher than the mean and
very different from the aggregate ratio in Example A.
Because of this weighting feature, the aggregate ratio is the appropriate measure of central tendency for
estimating the total dollar value of a population of parcels. If, for example, the total appraised value of a class
of property is $100 million, and the aggregate ratio is 0.800, then the best estimate of the total market value of
the class is $125 million ($100 million divided by 0.800). The aggregate ratio is also required in calculation of
the price-related differential.
The major disadvantage of the aggregate ratio is its susceptibility to sampling error; for example, when a sample
contains several properties of high value appraised at a different level from other properties in the sample. The
aggregate ratio can also mask problems in the appraisal of properties of low value, which have minimal effect
on this statistic.
Measures of Appraisal Uniformity
Determining the quality of mass appraisal also requires measuring uniformity: uniformity between groups of
properties and uniformity within groups. Uniformity between groups can be evaluated by comparing measures
of appraisal level calculated for each group. Measuring uniformity within groups is more complex.
The need for measuring intragroup uniformity, not just the level of appraisal, is shown in Table 3. In both
Examples A and B the overall level of appraisal is perfect, with all three measures of central tendency at or near
1.000. If uniformity were also perfect, each ratio would be 1.000. In Example A, the ratios all lie within 16
percent of 1.000: from 0.840 to 1.160. In Example B, however, the range is much wider: from 0.400 to
1.600. Uniformity, and therefore tax equity, although not perfect in either case, is clearly better in Example
A.
Table 3
Appraisal Level v. Uniformity

Example A: Good Uniformity
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $21,000       $25,000       0.840
2         44,000        50,000        0.880
3         28,000        30,000        0.933
4         60,000        60,000        1.000
5         32,000        30,000        1.067
6         56,000        50,000        1.120
7         29,000        25,000        1.160
Total     $270,000      $270,000      7.000
Median = 1.000
Mean = 7.000 ÷ 7 = 1.000
Aggregate ratio = $270,000 ÷ $270,000 = 1.000

Example B: Poor Uniformity
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $10,000       $25,000       0.400
2         30,000        50,000        0.600
3         22,500        30,000        0.750
4         60,000        60,000        1.000
5         37,500        30,000        1.250
6         70,000        50,000        1.400
7         40,000        25,000        1.600
Total     $270,000      $270,000      7.000
Median = 1.000
Mean = 7.000 ÷ 7 = 1.000
Aggregate ratio = $270,000 ÷ $270,000 = 1.000
In small samples, such as those in Table 3, the degree of uniformity can be seen by direct observation of the
array. In larger samples, this is not possible, and one must quantify the degree of uniformity to evaluate the
seriousness of any problems. Six measures of appraisal uniformity are in common use:
•
•
•
•
•
•
range, quartiles, and percentiles
average absolute deviation
coefficient of dispersion
standard deviation
coefficient of variation
price-related differential
1. Range, Quartiles, and Percentiles
The range, quartiles, and percentiles offer simple measures of data uniformity based on an array of the data.
The range is simply the difference between the highest and lowest ratios in the sample. In Example A (Table
3), the range is 0.32 (1.160 - 0.840); in Example B, 1.200 (1.600 - 0.400). Larger ranges may indicate poorer
uniformity, but because the extreme outliers completely control the range, it can be a misleading indicator of
overall uniformity.
Table 4 demonstrates the insensitivity of the range to all but the most extreme ratios. Although the ranges in
samples A and B are identical, uniformity is better in sample B than in sample A. Because the two most extreme
ratios control the range, the statistic is inadequate for larger samples.
Table 4
Influence of Outliers on the Range

Sale number    Sample A    Sample B
1              0.100       0.100
2              0.100       1.000
3              0.100       1.000
4              0.100       1.000
5              0.100       1.000
6              2.000       1.000
7              2.000       1.000
8              2.000       1.000
9              2.000       1.000
10             2.000       2.000
Range = 2.000 - 0.100 = 1.900 (both samples)
Percentiles and quartiles are dividing points between specific percentages of the data. The median, for example,
is the 50th percentile and second quartile. It exceeds 50 percent, or two quarters, of the ratios. The first
quartile corresponds to the 25th percentile and exceeds one quarter, or 25 percent, of the ratios. Similarly, the
third quartile corresponds to the 75th percentile and exceeds three quarters, or 75 percent, of the ratios.
Several measures of data uniformity based on quartiles and percentiles have been developed. The interquartile
range is the difference between the third and first quartiles. For example, if the first quartile is 0.660 and the
third quartile is 0.988, the interquartile range, the range within which 50 percent of the ratios lie, is 0.328 (0.988
- 0.660). Dividing the interquartile range by the median permits it to be interpreted as a percentage. Hence,
if the median in the present example were 0.800, then 50 percent of the ratios would lie within 41.0 percent of
the median (0.328/0.800).
Other interpercentile ranges can be developed in the same way. For example, to develop the 90/10
interpercentile range, find the ratios corresponding to the 90th and 10th percentiles and subtract the latter from
the former. Again, interpretation is improved by dividing the result by the median ratio.
Below is another example of the calculation and interpretation of interpercentile ranges, where the median is 97%:

25th to 75th percentiles: 87.85 to 108.13, so 50% of ASRs lie within approximately ±11% of the median
10th to 90th percentiles: 80.63 to 123.25, so 80% of ASRs lie within approximately ±26% of the median
Quartiles, percentiles, and related statistics can be useful, but they convey no information about the degree of
uniformity outside the chosen range. Unlike other measures, they do not measure overall uniformity using every
ratio in the sample.
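Quartiles and interpercentile ranges can be computed with the standard statistics module. Note that several interpolation conventions exist; the "inclusive" method below is one common choice, and other methods give slightly different quartiles. This sketch uses the Table 3, Example A ratios:

```python
import statistics

asrs = [0.840, 0.880, 0.933, 1.000, 1.067, 1.120, 1.160]  # Table 3, Example A

q1, median, q3 = statistics.quantiles(asrs, n=4, method="inclusive")
iqr = q3 - q1                  # spread of the middle 50 percent of ratios
iqr_pct = 100 * iqr / median   # interquartile range as a percent of the median
```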
2. Average Absolute Deviation
The average absolute deviation, often referred to simply as the average deviation, measures the average spread,
or difference, between each ratio and the median ratio. The average absolute deviation is calculated by
subtracting the median from each ratio, summing the absolute values of the computed differences, and dividing
this sum by the number of ratios.
Table 5 shows how to calculate the average absolute deviation for two data sets presented in Table 3. Although
the medians are the same in the two examples, the average deviation is much larger in Example B (0.357) than
in Example A (0.099). In fact, the results indicate that appraisal uniformity is more than three times worse in
Example B than in Example A:
Table 5
Calculating the Average Absolute Deviation

Example A (from Table 3)
Sale      Ratio     Absolute difference
number    (ASR)     from median
1         0.840     0.160
2         0.880     0.120
3         0.933     0.067
4         1.000     0.000
5         1.067     0.067
6         1.120     0.120
7         1.160     0.160
Total               0.694
Median = 1.000
Average absolute deviation = 0.694 ÷ 7 = 0.099

Example B (from Table 3)
Sale      Ratio     Absolute difference
number    (ASR)     from median
1         0.400     0.600
2         0.600     0.400
3         0.750     0.250
4         1.000     0.000
5         1.250     0.250
6         1.400     0.400
7         1.600     0.600
Total               2.500
Median = 1.000
Average absolute deviation = 2.500 ÷ 7 = 0.357

ASR is the ratio of appraised value to sale price.
A major drawback to the average absolute deviation is that it measures appraisal uniformity in raw percentage
points rather than in relative terms. This limits the usefulness of the statistic because, for example, an average
deviation of 0.10 is good if the median is near 1.000, but not if the median is, say, 0.300. Similarly, average
absolute deviations cannot be compared between property groups unless the measures of central tendency are
similar.
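The Table 5 figures can be reproduced with a short function (an illustrative sketch, not from the course software):

```python
import statistics

def average_absolute_deviation(ratios):
    """Average absolute difference between each ratio and the median ratio."""
    med = statistics.median(ratios)
    return sum(abs(r - med) for r in ratios) / len(ratios)

example_a = [0.840, 0.880, 0.933, 1.000, 1.067, 1.120, 1.160]  # AAD about 0.099
example_b = [0.400, 0.600, 0.750, 1.000, 1.250, 1.400, 1.600]  # AAD about 0.357
```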
3. Coefficient of Dispersion
The coefficient of dispersion (COD) is the most widely used measure of uniformity in ratio studies. The COD is based
on the average absolute deviation, but expresses it as a percentage of the median. Thus, the COD is generally
a more useful measure of appraisal uniformity than the average absolute deviation because it is independent of
the level of appraisal and permits direct comparisons between property groups.
The COD is calculated by dividing the average absolute deviation by the median and multiplying by 100 to
convert the ratio to a percentage. Table 6 shows how to calculate the COD. Note that the average absolute
deviation is the same as in Example A of Table 5, but that the COD is twice as large. The median in Example
A of Table 5 is 1.000, so the COD equals the average absolute deviation. By contrast, in Table 6 the median
is only 0.500, so that the COD is twice as large. In general, the lower the level of appraisal (median ASR),
the greater will be the COD relative to the average absolute deviation.
Table 6
Calculating the Coefficient of Dispersion

Sale      Appraised     Sale          Ratio     Absolute difference
number    value (A)     price (S)     (ASR)     from median
1         $ 8,500       $25,000       0.340     0.160
2         19,000        50,000        0.380     0.120
3         13,000        30,000        0.433     0.067
4         30,000        60,000        0.500     0.000
5         17,000        30,000        0.567     0.067
6         31,000        50,000        0.620     0.120
7         16,500        25,000        0.660     0.160
Total                                           0.694
Median = 0.500
Average absolute deviation = 0.694 ÷ 7 = 0.099
COD = (0.099 ÷ 0.500)(100) = 19.8
Although the COD measures the average percentage deviation from the median, it does not measure the typical
or median deviation. In a normal distribution, 57 percent of the ratios will fall within one COD of the median.
Low CODs (15.0 or less) tend to be associated with good appraisal uniformity. (Specific mass appraisal
performance standards based on the COD are discussed later in this chapter.) CODs of less than 5.0 are very
rare except in:
(1) subdivisions in which lot prices are strictly controlled by the developer;
(2) extremely homogeneous property groups, such as condominium units all located in the same complex;
(3) appraisal ratio studies in which the assessor's values and the independent appraisals reflect the same
appraisal manuals and procedures; or
(4) appraisals that have been adjusted to match sales prices.
One drawback to the COD is that it does not provide a basis for probability statements concerning appraisal
uniformity. We cannot use the statistic, for example, to evaluate the chance, or probability, that a given
property will be appraised above a given level (say, 1.000) or within a given range (say, 0.900 to 1.100).
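The COD calculation in Table 6 reduces to a one-step extension of the average absolute deviation (illustrative sketch only):

```python
import statistics

def cod(ratios):
    """Coefficient of dispersion: the average absolute deviation from the
    median, expressed as a percentage of the median."""
    med = statistics.median(ratios)
    aad = sum(abs(r - med) for r in ratios) / len(ratios)
    return 100 * aad / med

table_6 = [0.340, 0.380, 0.433, 0.500, 0.567, 0.620, 0.660]  # COD about 19.8
```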
4. Standard Deviation
The standard deviation is the primary measure of dispersion in scientific research and, under certain
assumptions, can be a powerful measure of appraisal uniformity. Table 7 shows how to calculate the standard
deviation. The standard deviation is almost four times larger in Example B than in Example A, reflecting the
much greater dispersion of the ratios.
Table 7
Calculating the Standard Deviation

Example A (from Table 3)
Sale      Ratio     Difference    Difference
number    (ASR)     from mean     squared
1         0.840     -0.160        0.0256
2         0.880     -0.120        0.0144
3         0.933     -0.067        0.0045
4         1.000     0.000         0.0000
5         1.067     0.067         0.0045
6         1.120     0.120         0.0144
7         1.160     0.160         0.0256
Total     7.000                   0.0890
Mean = 7.000 ÷ 7 = 1.000
Variance = 0.0890 ÷ 6 = 0.0148
Standard deviation = √0.0148 = 0.122

Example B (from Table 3)
Sale      Ratio     Difference    Difference
number    (ASR)     from mean     squared
1         0.400     -0.600        0.3600
2         0.600     -0.400        0.1600
3         0.750     -0.250        0.0625
4         1.000     0.000         0.0000
5         1.250     0.250         0.0625
6         1.400     0.400         0.1600
7         1.600     0.600         0.3600
Total     7.000                   1.1650
Mean = 7.000 ÷ 7 = 1.000
Variance = 1.1650 ÷ 6 = 0.1942
Standard deviation = √0.1942 = 0.441

ASR is the ratio of appraised value to sale price.
Interpretation of the standard deviation depends on an unbiased, representative sample in which the data are
normally distributed. A normal distribution is characterized by a symmetrical, bell-shaped curve, in which the
mean and median are identical; they should at least be similar for normality to be assumed. Non-normal
distributions are skewed either to the left or right of the median.
In ratio studies, the larger the standard deviation, the wider the range within which a given portion of properties
are appraised relative to market value. The example in Table 9 clearly illustrates this principle. Assume that
the ratios in all three property groups are normally distributed about a mean ratio of 0.950. In Group 1, with
a standard deviation of 0.100, 95 percent of parcels can be assumed to be appraised between 75 and 115 percent
of market value (two standard deviations away from the mean: .95 - 2(.1) = .75 and .95 + 2(.1) = 1.15). In
Group 2, with a standard deviation of 0.200, 95 percent of ratios will fall between 55 and 135 percent of market
value [(.95 - 2(.2) and .95 + 2(.2)]. In Group 3, with a standard deviation of 0.300, the corresponding range
is 35-155 percent of market value [(.95 - 2(.3) and .95 + 2(.3)].
If the ASRs are normally distributed, 68% of the ASRs will be within one standard deviation of the mean, 95%
within two standard deviations, and 99% within three standard deviations. As an illustration of this, consider
a regression model with a mean ASR of 100% and a standard deviation of 17.50%. This would provide the
following intervals if the data were normally distributed:

68% between (100 - 17.5) and (100 + 17.5): 82.50 and 117.50
95% between (100 - 2(17.5)) and (100 + 2(17.5)): 65.00 and 135.00
99% between (100 - 3(17.5)) and (100 + 3(17.5)): 47.50 and 152.50
Table 8
Percent of Data within One, Two, or Three Standard Deviations from the Mean

Number of standard       Percent of data          Percent of data
deviations from mean     (normal distribution)    (non-normal distribution)
±1                       68%                      unknown
±2                       95%                      75%
±3                       99%                      89%
Table 9
Range of Appraisal Levels for Parcels within Specified Standard Deviations

                                 Appraisal level for indicated percent of parcels
Property     Standard      68 percent      95 percent      99 percent
group        deviation     of parcels      of parcels      of parcels
1            0.100         0.85-1.05       0.75-1.15       0.65-1.25
2            0.200         0.75-1.15       0.55-1.35       0.35-1.55
3            0.300         0.65-1.25       0.35-1.55       0.05-1.85
Note also that estimates of this kind apply to the entire population of properties in the group, not just those in
the sample. That is, results obtained for the sample apply to the population as well, provided, of course, that
the sample is representative of the population. If the data do not approximate a normal distribution, the standard
deviation is less useful.
Depending on the representativeness of the sample and distribution of the data, the standard deviation can be
either a powerful or a misleading measure of appraisal uniformity. Accordingly, the analyst must verify that
the data approximate a normal distribution before placing credence in the statistic. Frequency distributions and
histograms are good tools for this.
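The Table 7 figures, and the normal-distribution intervals above, can be checked with the sample standard deviation (n - 1 divisor, matching the tables); the sketch below is illustrative only:

```python
import statistics

example_a = [0.840, 0.880, 0.933, 1.000, 1.067, 1.120, 1.160]
example_b = [0.400, 0.600, 0.750, 1.000, 1.250, 1.400, 1.600]

sd_a = statistics.stdev(example_a)  # sample standard deviation, about 0.122
sd_b = statistics.stdev(example_b)  # about 0.441

# If ratios are normally distributed, roughly 95 percent fall within
# two standard deviations of the mean:
mean_a = statistics.mean(example_a)
interval_95 = (mean_a - 2 * sd_a, mean_a + 2 * sd_a)
```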
5. Coefficient of Variation
The coefficient of variation (COV) expresses the standard deviation as a percentage, just as the COD does with
the average absolute deviation. Expression as a percentage makes comparisons of appraisal levels between
groups easier. The COV of ratios is computed by dividing the standard deviation (s) by the mean ASR and
multiplying the result by 100.
Table 10 illustrates calculation of the COV. It is instructive to compare Table 10 with Table 7, Example A.
In both cases the standard deviation is .122 (rounded to three decimal places), but the COV in Table 10 is twice
as high, because the mean is only 0.500, versus 1.000 in Table 7.
Table 10
Calculating the Coefficient of Variation (COV)

Sale      Ratio     Difference    Difference
number    (ASR)     from mean     squared
1         0.340     -0.160        0.0256
2         0.380     -0.120        0.0144
3         0.433     -0.067        0.0045
4         0.500     0.000         0.0000
5         0.567     0.067         0.0045
6         0.620     0.120         0.0144
7         0.660     0.160         0.0256
Total     3.500                   0.0890
Mean = 3.500 ÷ 7 = 0.500
Variance = 0.0890 ÷ 6 = 0.0148
Standard deviation = √0.0148 = 0.1218
COV = (0.1218 ÷ 0.500)(100) = 24.36

Note: Data are from Table 6.
The COV is interpreted in the same manner as the standard deviation except that it is a percentage of the mean
rather than a raw decimal or ratio. The rules in Table 8 will hold in a normal distribution.
For example, if the mean is 0.800 and the COV is 25.0, then 68 percent of the ratios will lie between 0.600
[0.80 - (0.25 x 0.80)] and 1.000 [0.80 + (0.25 x 0.80)].
Like the standard deviation, the predictive power of the COV depends on the extent to which the data are
normally distributed. When the normality assumption is met, however, the COV can be a powerful measure
of uniformity. However, the coefficient of variation is rarely used in assessment studies; the COD is much more
commonly used to measure dispersion.
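The COV in Table 10 follows directly from the standard deviation and the mean (illustrative sketch only):

```python
import statistics

ratios = [0.340, 0.380, 0.433, 0.500, 0.567, 0.620, 0.660]  # Table 10

mean_ratio = statistics.mean(ratios)               # 0.500
cov = 100 * statistics.stdev(ratios) / mean_ratio  # about 24.36
```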
6. Price-Related Differential
Property appraisals sometimes result in unequal tax burdens between high- and low-value properties in the same
property group. Appraisals are considered regressive if high-value properties are under-appraised relative to
low-value properties and progressive if high-value properties are relatively over-appraised. Ideally, appraisals
should be neither regressive nor progressive, but proportional, with all properties appraised at 100 percent
of market value.
The price-related differential (PRD) is a statistic for measuring assessment regressivity or progressivity. It is
calculated by dividing the mean by the aggregate ratio. The mean weights the ratios equally, whereas the
aggregate ratio weights them in proportion to their sales prices. A PRD greater than 1.00 suggests that the
high-value parcels are under-appraised, thus pulling the aggregate ratio below the mean. On the other hand, if
the PRD is less than 1.00, high-value parcels are relatively over-appraised, pulling the aggregate ratio above the
mean.
Table 11
Calculating the Price-Related Differential (PRD)

Example A - No Bias
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $ 25,000      $ 20,000      1.250
2         24,000        30,000        0.800
3         31,000        40,000        0.775
4         40,000        50,000        0.800
5         60,000        60,000        1.000
6         79,000        70,000        1.129
Total     $259,000      $270,000      5.754
Mean = 5.754 ÷ 6 = 0.959
Aggregate ratio = $259,000 ÷ $270,000 = 0.959
PRD = 0.959 ÷ 0.959 = 1.00

Example B - Regressivity
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $ 30,000      $ 20,000      1.500
2         40,000        30,000        1.333
3         45,000        40,000        1.125
4         50,000        50,000        1.000
5         40,000        60,000        0.667
6         45,000        70,000        0.643
Total     $250,000      $270,000      6.268
Mean = 6.268 ÷ 6 = 1.045
Aggregate ratio = $250,000 ÷ $270,000 = 0.926
PRD = 1.045 ÷ 0.926 = 1.13

Example C - Progressivity
Sale      Appraised     Sale          Ratio
number    value (A)     price (S)     (ASR)
1         $ 6,000       $ 20,000      0.300
2         12,000        30,000        0.400
3         30,000        40,000        0.750
4         60,000        50,000        1.200
5         75,000        60,000        1.250
6         90,000        70,000        1.286
Total     $273,000      $270,000      5.186
Mean = 5.186 ÷ 6 = 0.864
Aggregate ratio = $273,000 ÷ $270,000 = 1.011
PRD = 0.864 ÷ 1.011 = 0.85
PRD              Interpretation                            Favours                  Type of bias
0.98 to 1.03     Low- and high-value properties            Neither                  None
                 appraised equally
Less than 0.98   High-value properties over-appraised      Low-value properties     Progressive
More than 1.03   High-value properties under-appraised     High-value properties    Regressive
In practice, PRDs have an upward bias. Recall that, as an estimator of the population mean, the sample mean
has a slight upward bias, but the aggregate ratio does not (except for very small samples). Therefore, the PRD
has a slight upward bias. Assessment time lags can also contribute to regressivity. In addition to measurement
bias, one must leave a reasonable margin for sampling error in interpreting the PRD. As a general rule, except
for small samples, PRDs should range between 0.98 and 1.03. Lower PRDs suggest significant assessment
progressivity; higher ones suggest significant regressivity.
Table 11 above illustrates three conditions. In Example A, the PRD is exactly 1.00 and indicates no bias
between low- and high-value properties, in Example B the PRD of 1.13 indicates assessment regressivity, and
in Example C the PRD of 0.85 suggests assessment progressivity. Figure 4 is a scatter diagram of the three
cases.
Figure 4
Price-Related Differential (PRD)
Both regressive and progressive PRDs can result from misclassifications or systematic problems in appraisal
schedules or techniques. The PRD provides only an indication, not proof, of appraisal bias. When sample sizes
are small, PRDs outside the acceptable range may occur simply because of random sampling error.
A final note on the PRD: generally speaking, the PRD has uncertain foundations. It is not found in any standard
statistical work; its confidence limits are unknown. It is used for consistency with other jurisdictions and
comparability to records of past performance, but it may not be the best way to measure, or identify the presence
of, vertical inequity.
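The three PRD examples in Table 11 can be verified with a small function (a sketch only; the helper name is my own):

```python
def prd(appraised, prices):
    """Price-related differential: the mean ratio divided by the aggregate ratio."""
    ratios = [a / s for a, s in zip(appraised, prices)]
    mean_ratio = sum(ratios) / len(ratios)
    aggregate_ratio = sum(appraised) / sum(prices)
    return mean_ratio / aggregate_ratio

# Sale prices are the same in Examples B and C of Table 11
sales = [20_000, 30_000, 40_000, 50_000, 60_000, 70_000]
prd_b = prd([30_000, 40_000, 45_000, 50_000, 40_000, 45_000], sales)  # about 1.13
prd_c = prd([6_000, 12_000, 30_000, 60_000, 75_000, 90_000], sales)   # about 0.85
```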
Ratio Study Standards
Assessment agencies should maintain standards for evaluating ratio study results. Such standards promote
improvement in the appraisal process and ensure consistency in reappraisal or equalization actions. They can
also be used to set goals for individual appraisers or field crews.
The IAAO Standard on Ratio Studies (1990) recommends the following standards for jurisdictions in which
current market value is the legal basis of assessment.
Appraisal Level
The overall level of appraisal for all parcels in a jurisdiction should be within 10 percent of the legal level; that
is, between 0.90 and 1.10.
Appraisal Uniformity
•	Uniformity among strata: Each major stratum should be appraised within 5 percent of the overall level of appraisal for the jurisdiction. Thus, if the overall level is 0.90, each property class and area should be appraised between 0.855 [0.90 - (0.05 × 0.90)] and 0.945 [0.90 + (0.05 × 0.90)].
•	Single-family residences: CODs should generally be 15.0 or less, and for newer and fairly homogeneous areas, 10.0 or less.
•	Income-producing property: CODs should be 20.0 or less, and in larger, urban jurisdictions, 15.0 or less.
•	Vacant land and other property: CODs should be 20.0 or less.
•	Other real property and personal property: Target CODs should reflect the nature of the properties, market conditions, and the availability of reliable market indicators.
In addition, PRDs should be between 0.98 and 1.03. This range is centred slightly above 1.00 to allow for the
measurement bias inherent in the PRD.
It can be difficult to conclude with confidence that these or other mandated standards have been met for all
parcels, not just for those in the sample. When sample sizes are small, calculated statistics may fall outside the
requirements because of sampling error. Appraised values made for assessment purposes should not be
selectively adjusted so as to match sales prices. Doing so will invalidate the ratio study by making the sample
results unrepresentative of true performance.
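As a rough illustration, the guidelines above can be encoded as simple threshold checks (the function name and dictionary keys below are hypothetical; cod_limit defaults to the single-family standard and would be 20.0 for income property or vacant land):

```python
def check_ratio_study(median_asr, cod, prd, cod_limit=15.0):
    """Flag ratio study results against the IAAO (1990) guidelines."""
    return {
        "level_ok": 0.90 <= median_asr <= 1.10,  # overall level within 10 percent
        "cod_ok": cod <= cod_limit,              # uniformity standard
        "prd_ok": 0.98 <= prd <= 1.03,           # no significant vertical bias
    }
```

Such flags are only a starting point; as noted above, small samples can fail these thresholds through sampling error alone.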
Summary
This chapter summarizes the terminology, techniques, and statistical measures used by assessors in carrying out
computer assisted mass appraisal using multiple regression analysis.
The first section of the chapter covers multiple regression analysis in general, explaining the theory and method
as well as the measures used to evaluate regression results: coefficient of determination (R2), standard error of
the estimate (SEE), coefficient of variation (COV), correlation coefficient (r), t-statistic, and F-statistic.
The second section of the chapter covers ratio studies. Ratio studies are the principal tool used by appraisers
to assure the quality of mass appraisals. Measures of appraisal level covered included the median, mean, and
aggregate ratio. Measures of appraisal uniformity included ranges, quartiles, and percentiles; average absolute
deviation (AAD); coefficient of dispersion (COD); standard deviation (s); coefficient of variation (COV); and
price-related differential (PRD).