BG 2200 Statistics II
Notes 1
Topic 0: Important Preliminaries
Statistics refers to the body of techniques used for collecting, organizing, analyzing, and
interpreting data (information or facts). The data may be quantitative, with values expressed
numerically, or qualitative, with the characteristics of observations being tabulated. Data in
business can usually be classified into three types, namely nominal (just names), ordinal (just
ordering or ranks), and interval (measurements of size).
Statistics is used in business to help make better decisions by understanding the sources of
variation and by uncovering patterns and relationships in business data. Descriptive statistics
include the techniques that are used to summarize and describe numerical data. These methods
can either be graphical or involve computational analysis. Inferential statistics include those
techniques by which decisions about a statistical population or process are made based only on a
sample having been observed. Sample data can be obtained by means of surveys, observation,
interviews, or experiments.
An experiment is a process that leads to the occurrence of one (and only one) of several
possible outcomes. A random experiment is an experiment the outcome of which cannot be
predicted with certainty, but all possible outcomes of which can be described prior to its
performance, and which can be repeated under the same conditions. The collection of all possible
outcomes is called the sample space. For a random experiment with a sample space, a function X,
which assigns to each outcome of the sample space one and only one real number, is called a
random variable. A discrete random variable can have observed values only at isolated points along
a scale of values. In business statistics, such data typically occur through the process of counting;
hence, the values generally are expressed as integers (whole numbers). A continuous random
variable can assume a value at any point along a specified interval of values. Continuous data are
generated by the process of measuring. A collection of every value of a random variable is called a
population. A part of a population is called a sample. A characteristic of a population is called a
parameter. A characteristic of a sample is called a statistic. The difference between a sample
statistic and its corresponding population parameter is called a sampling error.
A set of pairs (X, f), where X is a value of a random variable and f is the frequency of value X,
is called a frequency distribution. The set of pairs (X, f/n), where X is a value of the random variable
of interest, f is the frequency of value X, and n = Σf is the total frequency of all values (the
number of all data, i.e. all measurements), is called a relative frequency distribution.
Probability of an event E (classic definition), P(E):
P(E) = (Number of favorable outcomes) / (Number of all possible outcomes)
Probability Distribution: The set of pairs (X, P(X)), where X is a value of the random variable of
interest and P(X) is the probability of X.
Binomial probability distribution: A particular experiment has two possible outcomes, usually called
success ( + ) and failure ( – ). The experiment is repeated n times. Then the probability that x
successes occur is P(X = x) = nCx π^x (1 − π)^(n−x) = [n! / (x!(n − x)!)] π^x (1 − π)^(n−x), where n
is the number of experiments and π is the probability of a success on each trial. The expectation or
mean of the binomial distribution is E(X) = nπ and the variance is σ² = nπ(1 − π).
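As a quick sketch of the formula above (the example values 2, 4, and 0.5 are illustrative, not from the notes), the binomial probability, mean, and variance can be computed directly:

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def binomial_mean(n, pi):
    """E(X) = n * pi."""
    return n * pi

def binomial_var(n, pi):
    """Variance = n * pi * (1 - pi)."""
    return n * pi * (1 - pi)

# Example: n = 4 trials, pi = 0.5
print(binomial_pmf(2, 4, 0.5))   # 6 * 0.25 * 0.25 = 0.375
```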
The sampling distribution of sample means for samples of a fixed size is the set of pairs of the
type (X̄, P(X̄)), where X̄ is a sample mean and P(X̄) is the probability of sample mean X̄.
The probability for a continuous random variable X is usually given by means of a continuous
curve, where the area between any two points under the curve indicates the probability of a value
between these two points occurring by chance. The function represented by that curve is called the
probability distribution of the continuous random variable X. The most popular continuous probability
distribution is the normal probability distribution.
The normal probability distribution is important in statistical inference for three distinct
reasons: (1) The measurements obtained in many random experiments are known to follow this
distribution. (2) Normal probabilities can often be used to approximate other probability distributions,
such as the binomial and Poisson distributions. (3) Distributions of such statistics as the sample
mean and sample proportion are normally distributed when the sample size is large, regardless of
the distribution. (See Central Limit Theorem)
Central Limit Theorem: If all samples of a particular size are selected from any population
with mean μ and standard deviation σ, the sampling distribution of the sample means is
approximately a normal distribution with mean μX̄ = μ and standard deviation σX̄ = σ/√n. The
approximation improves with larger samples.
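The theorem can be illustrated by simulation (the uniform population and the sizes below are assumptions chosen for the sketch, not from the notes): sample means from a decidedly non-normal population still have standard deviation close to σ/√n.

```python
import random
import statistics

random.seed(1)

# Population: uniform on [0, 1], which is not normal; mean 0.5, sd = sqrt(1/12)
population_sd = (1 / 12) ** 0.5

n = 30          # fixed sample size
trials = 5000   # number of samples drawn

# Draw many samples and record each sample mean
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(trials)]

print(statistics.fmean(sample_means))   # close to mu = 0.5
print(statistics.stdev(sample_means))   # close to sigma / sqrt(n)
print(population_sd / n ** 0.5)
```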
Point estimate for a population mean μ: the sample mean X̄.
Confidence interval estimate for a population mean: X̄ ± z σ/√n, where X̄ is a sample mean, σ is
the population standard deviation, and n is the sample size (the number of data in the sample). If σ is
unknown, use s. If σ is unknown and n < 30, use s instead of σ and t instead of z. The value of t is
determined by the level of confidence (1 – α) and degree of freedom n – 1. Note that the value of t is
to be read in the two-tailed column of the t table.
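A minimal sketch of the known-σ (z) interval, using only the standard library; the data values and σ = 0.3 are invented for illustration:

```python
import statistics
from statistics import NormalDist

# Hypothetical sample; sigma is the assumed known population standard deviation
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0]
sigma = 0.3
n = len(data)
x_bar = statistics.fmean(data)

# Two-tailed z value for 95% confidence: P(Z <= z) = 0.975
z = NormalDist().inv_cdf(0.975)

margin = z * sigma / n ** 0.5
print((x_bar - margin, x_bar + margin))
```

When σ is unknown and n < 30, the notes say to replace z by t with n – 1 degrees of freedom; the t quantile would then come from a t table or a statistics package.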
One-Sample t-Test: Testing a possible value of the mean μ, using the sample mean X̄, sample
standard deviation s, and sample size n; DF = n – 1, t = Formula 14 (the first one).
Independent-Samples t-Test: Testing the value of μ1 – μ2, DF = n1 + n2 – 2.
Use the formula t = [(X̄1 – X̄2) – (μ1 – μ2)] / √(s1²/n1 + s2²/n2)
if the two populations have unequal variances,
or the formula t = [(X̄1 – X̄2) – (μ1 – μ2)] / √( [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2) × (1/n1 + 1/n2) )
if the two populations have equal variances.
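The equal-variances (pooled) formula can be sketched as follows; the two small samples are made up for illustration:

```python
import statistics

def pooled_t(sample1, sample2, mu_diff=0.0):
    """Independent-samples t statistic assuming equal population variances.

    Implements t = [(X1bar - X2bar) - mu_diff] / sqrt(pooled * (1/n1 + 1/n2)),
    with DF = n1 + n2 - 2.
    """
    n1, n2 = len(sample1), len(sample2)
    x1, x2 = statistics.fmean(sample1), statistics.fmean(sample2)
    s1sq, s2sq = statistics.variance(sample1), statistics.variance(sample2)
    pooled = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return (x1 - x2 - mu_diff) / se, n1 + n2 - 2

t, df = pooled_t([5, 6, 7, 8], [1, 2, 3, 4])
print(t, df)
```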
Topic 1: Simple Linear Regression and Correlation
To find out whether one interval variable depends upon another interval variable, and to
construct a linear relationship (a linear equation) between them, one uses the Simple Linear
Regression method. In using it, one should follow these steps.
1. Determine the independent variable and dependent variable.
The variable we want to estimate (compute) is called dependent. If u and v are variables and
we want to find a formula like v = a + bu, then v is called dependent and u is independent.
2. Draw a scatter diagram to study whether there is a linear relationship between the two variables.
To draw a scatter diagram, draw a horizontal real line and a vertical real line meeting at a point.
Write the name of the independent variable (predictor) and the name of its unit at the horizontal
line. Write the name of the dependent variable (predicted variable) and the name of its unit at the
vertical line. For each pair of data, plot a dot with horizontal coordinate equal to the value of the
independent variable and vertical coordinate equal to the value of the dependent variable.
If all scatter points are on a straight line, we say that X and Y have a perfect linear
relationship. If the scatter points are roughly around a straight line, it is quite reasonable to
construct a simple linear regression formula for estimation purposes.
3. Determine the Pearson coefficient of correlation r (Formula 2) and interpret it.
The scatter points are usually not on a straight line. But there is a measurement of how close
the scatter points are to a straight line. The standard measurement is called Pearson's
coefficient of correlation.
Computation of r by a scientific calculator: 1) Choose LR mode. 2) Check the value of n (=
number of data already entered in the calculator). If n is 0, you can enter a new data file. 3) If n ≠
0, there are old data in the calculator; erase them. 4) Enter data: type a value of X
(independent variable), then the comma, next the corresponding value of Y (dependent variable),
and finally the enter button.
Until you erase the data, the data file will remain in the calculator and you can check at any time
summary statistics such as X̄, ΣX, ΣX², σXn, σXn–1, Ȳ, ΣY, ΣY², σYn, σYn–1, ΣXY, a, b, r, and n by
pressing the corresponding buttons of the calculator.
In some calculators like the Casio fx3800p, (1) the list of modes and (2) the functions of keys are
mentioned on the face. (3) To erase data, press Shift and AC, or Shift, AC, and =. The second
functions are used by pressing Shift first. For example, Shift 1 = X̄ and Shift 9 = r. The third
functions are used by pressing Kout or RCL first. For instance, Kout 3 = n. The comma button is
marked by XD, YD below the key.
In some calculators like the Casio fx350MS, (1) the list of modes and (2) the functions of keys are
mentioned in the menu inside. Call them by Shift 1 or Shift 2. (3) To erase data, press Shift CLR 1
=.
The enter-data button is usually denoted by DATA or DT below the key.
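The same r that the calculator produces can be sketched from the raw sums (this is the usual computational form of Pearson's r; the short data lists are illustrative):

```python
def pearson_r(x, y):
    """Pearson's coefficient of correlation from the raw sums:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    num = n * sxy - sx * sy
    den = ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (perfect direct)
print(pearson_r([1, 2, 3], [6, 4, 2]))         # -1.0 (perfect inverse)
```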
Interpretation of r: The value of r is always at least –1 and at most 1. If r is either 1 or –1, all
scatter points are on a straight line. We can say that the scatter points are in a 100% straight-line
shape, or that there is a perfect linear relationship between the two variables.
"Perfect" means that all points are exactly on a straight line. If r is 0.9 or –0.9, we can say that the
scatter points are nearly in a straight-line shape, and the straight-line shape they form is 90%
straight. Similarly, if r is either 0.8 or –0.8, the scatter points are quite close to a straight line, but
not as close as when r is 0.9 or –0.9. In this case, we can say that the scatter points
are 80% in the straight-line shape. If the scatter points are in a straight-line shape to the extent of
50% or above, we say that the linear relationship between the two variables is strong. The
closer r is to 1 or –1, the stronger the relationship between the two variables. "Strong" means that
the scatter points are close to a straight line; in that case, if we know the value of one
variable, then we can estimate the value of the other variable quite accurately. Otherwise, the
relationship is weak. If r is positive, the relationship is direct: the straight-line shape goes
upward to the right, i.e. it has a positive slope. "Direct" means that if the value of one variable
increases, the corresponding value of the other variable increases. If r is negative, the relationship
is inverse: the straight-line shape falls down to the right, i.e. it has a negative slope. "Inverse"
means that if the value of one variable increases, the corresponding value of the other variable
decreases.
Thus, we can see the following possible interpretations of the value of r:
There is a strong/weak, positive/negative linear relationship between X and Y.
r = 0 → no linear relationship
r ≠ 0 → there is a linear relationship between X and Y (if the value of one
variable changes, then the value of the other variable changes)
r = ±1 → perfect linear relationship
0 < r < 0.5 → weak, direct / positive linear relationship
–0.5 < r < 0 → weak, inverse / negative linear relationship
0.5 ≤ r < 1 → strong, direct / positive linear relationship
–1 < r ≤ –0.5 → strong, inverse / negative linear relationship
4. Test the significance of correlation  between X and Y.
“Significant” means “True in the population”.
Two-tailed test: H0: ρ = 0 (There is no linear relationship between X and Y in the population.)
Ha: ρ ≠ 0 (There is a linear relationship between X and Y in the population.)
Right-tailed test: H0: ρ ≤ 0 (There is no direct/positive linear relationship between X and Y in
the population.)
Ha: ρ > 0 (There is a direct/positive linear relationship between X and Y in
the population.)
Left-tailed test: H0: ρ ≥ 0 (There is no inverse/negative linear relationship between X and Y in
the population.)
Ha: ρ < 0 (There is an inverse/negative linear relationship between X and Y in
the population.)
Critical value: Read it in the t distribution table for the given α value, in the correct choice of
one-tailed or two-tailed column, and degree of freedom n – 2. Assign – for a left-tailed test, + for a
right-tailed test, ± for a two-tailed test.
Test statistic: The original test statistic is r. Change r to standard test statistic t by Formula
10.
Decision Rule: In a right-tailed test, the rejection region is on the right of the critical value. In
a left-tailed test, the rejection region is on the left of the critical value. In a two-tailed test, the
rejection region is on the right of the positive critical value and on the left of the negative critical
value. If the test statistic is in the rejection region, reject H0. Otherwise, do not reject H0.
Conclusion: Conclusion and decision are the same in substance. Rejecting H0 means that we
decide H0 to be false and Ha to be true. Not rejecting H0 means that the sample does not give
enough evidence against H0. The decision is the end of the statistical process; the conclusion is the
report to the original questioner. Therefore, the conclusion is to be written in the format of the
original question.
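The standard test statistic for this test is t = r √(n – 2) / √(1 – r²) with DF = n – 2 (this is the usual form of the conversion the notes call Formula 10); a small sketch with assumed values r = 0.8, n = 12:

```python
def t_from_r(r, n):
    """Standard test statistic for H0: rho = 0, with DF = n - 2:
    t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * ((n - 2) / (1 - r * r)) ** 0.5

# Example: r = 0.8 observed from n = 12 pairs
t = t_from_r(0.8, 12)
print(t)   # compare with the two-tailed t critical value at DF = 10
```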
5. If  is significant, construct the regression or least-squares equation Ŷ = a + bX.
b = Formula 1 (the first one), a = Formula 1 (the second one)
Practically, read the values of a and b in the calculator.
Interpretation of a: If X is 0 unit then estimated Y is a units.
Interpretation of b: For one more unit in X, the estimated Y will be b units more. Or, If X
increases by 1 unit, Y is estimated to increase by b units. If b is negative (b = – c), then we say,
for one more unit in X, the estimated Y will be c units less. Or, If X increases by 1 unit, Y is
estimated to decrease by b units.
Least-Squares Concept: The least-squares equation produces estimated Y values ( Ŷ )
which are closest to all sample values of Y in total. Therefore, the regression line (graph of the
regression equation) is the straight line which has the least sum of squares of vertical distances
to the scatter points. It has the minimum Sum of Squared Errors SSE = Σ (Yi – Ŷi)², summed over
i = 1, …, n. Since it has the minimum SSE, it also has the minimum Mean of Squared Errors MSE
among all straight lines, since MSE is obtained by dividing SSE by the fixed number n – 2. Hence, it
has the minimum Se, too, since Se is obtained by taking the square root of MSE.
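The least-squares b and a can be sketched from the raw sums (this is the standard computational form; the data, which lie exactly on Y = 1 + 2X, are illustrative):

```python
import statistics

def least_squares(x, y):
    """Slope b and intercept a of the least-squares line Y-hat = a + bX:
    b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),  a = Ybar - b*Xbar."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(v * v for v in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return a, b

a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])   # data lie on Y = 1 + 2X
print(a, b)
```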
6. Determine the standard error of the estimate Se (Formula 7): It is the standard deviation of the Y
values around the estimated Y value for any fixed value of X. It is used to construct confidence-
interval estimates for the Y values for any fixed value of X.
7. Determine the standard error of the slope Sb (Formula 4): It is the standard deviation of the
distribution of sample slopes. Sb is used to construct a confidence-interval estimate for the
population slope β.
8. Construct a confidence interval for the slope or regression coefficient β: b ± tSb, DF = n – 2,
two-tailed.
9. Construct the ANOVA table.

Source              SS                               DF     MS                   F
Regression          SSR (numerator in Formula 9)     1      MSR = SSR / 1        F = MSR / MSE
Error or Residual   SSE = SST – SSR                  n – 2  MSE = SSE / (n – 2)
Total               SST (denominator in Formula 9)   n – 1

Concept
SST is the total squared deviation of the sample values Y from Ȳ (mean Y): SST = Σ (Yi – Ȳ)²,
summed over i = 1, …, n.
SST measures the total difference between the Y values and Ȳ. If we use knowledge of variable Y
alone, when we want to estimate (guess) Y we have to use Ȳ. In that case, in the sample at hand,
we will make the total amount of error SST.
SSR is the total squared deviation of Ŷ (regression estimates) from Ȳ (mean Y): SSR = Σ (Ŷi – Ȳ)².
SSE is the total squared deviation of the sample values Y from Ŷ (estimated Y values): SSE = Σ (Yi – Ŷi)².
SSE measures the total difference between the actual values of Y and the values Ŷ estimated by the
regression equation. Therefore, the smaller the value of SSE, the better the regression equation. As
SSR = SST – SSE by Formula 3, SSR will be large if SSE is small. Therefore, the larger the value of
SSR, the better the regression equation. SSR is the total variation removed by the regression
equation, or by the independent variables. MSR is the average variation removed by one
independent variable. Hence, SSR divided by 1 is MSR, as there is only one independent variable.
SSE is the total variation made at the n sample points. On average, at a point, the error of
estimated Y (the difference between Y and Ŷ) is MSE = SSE / (n – 2). In fact, SSE should be divided
by n; however, probability theory suggests using n – 2 instead of n because two parameters (a and b)
are estimated from the sample.
Now we can see that the bigger MSR is, the better the regression estimates are; and the smaller
MSE is, the better the regression estimates are. As F is the ratio of MSR to MSE, the larger the F
value is, the better the regression estimates are.
There are three important values in the ANOVA table: F, r² = SSR / SST, and Se = √MSE (Formula 7).
10. Coefficient of determination r²: It is another measurement of the strength of the relationship. It
is the square of r: r² = SSR / SST. It is the proportion of squared deviation removed by the regression
equation. Hence, we have the following interpretation of r²: 100 × r² % of the variation in Y is
explained by the regression (or by the variation in X).
11. Test of significance of the regression equation with β: t = b / Sb, DF = n – 2 (the same CV and TS
as in the ρ test).
This test can be used instead of the ρ test mentioned in paragraph 4. The two tests are exactly
the same here, but not the same if we use more than one independent variable (as in the next
topic).
The following are interpretations of the slope β in the unknown regression equation Ŷ = a + bX of
the population:
β ≠ 0: Y depends upon X. ( If X changes, then Y changes. )
β > 0: Y depends upon X directly / positively. ( If X increases, then Y increases. )
β < 0: Y depends upon X inversely / negatively. ( If X increases, then Y decreases. )
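The whole ANOVA decomposition of Topic 1 (SST, SSR, SSE, F, r²) can be sketched in one function; the five data pairs are invented to lie close to a straight line:

```python
import statistics

def slr_anova(x, y):
    """SST, SSR, SSE, F = MSR/MSE and r^2 = SSR/SST for simple linear regression."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    # Least-squares slope and intercept
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)          # total variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation
    ssr = sst - sse                                   # variation removed (Formula 3)
    msr, mse = ssr / 1, sse / (n - 2)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "F": msr / mse, "r2": ssr / sst}

res = slr_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(res)
```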
Topic 2: Multiple Linear Regression and Correlation
Multiple linear regression is the generalization of simple linear regression to more than one
independent variable. Two-valued (0, 1) dummy variables can also be included. The number of
independent variables is denoted by k. The total sample size (the number of paired data) is denoted
by n. Therefore, the regression equation in this case is of the form Ŷ = a + b1X1 + b2X2 + … + bkXk.
A procedure for using Multiple Linear Regression is as follows.
1. Study Pearson's coefficient of correlation between the dependent variable Y and all independent
variables Xi. Test the significance of the correlation between Y and each Xi. Drop the variables Xi
which are insignificant for Y. After selecting the independent variables which are significantly
correlated with Y, check whether any multicollinearity problem exists. If two independent
variables are changing in a straight line, we say that there is multicollinearity. A multicollinearity
problem exists when there is a pair of independent variables Xi and Xj which are too strongly
correlated. Usually, if the Pearson coefficient of correlation r between Xi and Xj is stronger than
±0.7, then we decide that a multicollinearity problem occurs between Xi and Xj. If a multicollinearity
problem occurs between a pair of significant independent variables, drop the one of the two
that has the weaker r with Y.
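The |r| > 0.7 screen can be sketched as a pairwise scan over the predictors; the variable names X1–X3 and their values are assumptions, with X3 built to nearly duplicate X1 so that one pair is flagged:

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson's r from the raw sums."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx, syy = sum(v * v for v in x), sum(v * v for v in y)
    return (n * sxy - sx * sy) / (((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5)

# Hypothetical independent variables; X3 nearly duplicates X1
predictors = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [5, 3, 8, 1, 9, 2],
    "X3": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
}

# Flag every pair whose correlation is stronger than +/- 0.7
flagged = [(p, q) for p, q in combinations(predictors, 2)
           if abs(pearson_r(predictors[p], predictors[q])) > 0.7]
print(flagged)
```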
2. Construct the regression equation for Y on the selected variables X1, X2, …, Xk. Conduct the
global test of significance of the regression equation:
H0: β1 = β2 = … = βk = 0
Ha: Not all βi's are 0. (Or) At least one βi ≠ 0.
If the p-value for the F statistic in the ANOVA table is less than α, reject H0 and write the
conclusion: "The regression model is significant as a whole". If the p-value is not less than α, do not
reject H0 and write the conclusion: "The regression model is not significant". The critical value is
found in the F distribution table at column (degree of freedom for numerator) k and row (degree of
freedom for denominator) n – k – 1. It is always a right-tailed test.
3. If the regression model is significant as a whole, conduct the individual test of significance of
each independent variable Xi:
H0: βi = 0,
Ha: βi ≠ 0.
If the p-value for the t test statistic in the coefficient table (written Sig., which is two-tailed) is less
than α, reject H0 and write the conclusion: "Xi is a significant explanatory variable for Y". Otherwise,
write "Xi is not a significant explanatory variable for Y." Test statistic t = bi / Sbi, DF = n – k – 1.
4. Drop insignificant independent variables.
5. Rerun SPSS. Confirm that Global test and individual tests show significance of the regression
equation and each independent variable. Use the regression equation to estimate. Read the
point estimate for Y, confidence interval estimate for mean Y and prediction interval for individual
Y in the data viewer.
6. Interpret the values of a, bi, R and R².
a: If all Xi are 0 units, then estimated Y is a units.
bi: Holding the other independent variables constant, for one more unit in Xi, estimated Y will be bi
units more. (Or) Holding the other independent variables constant, if Xi increases by 1 unit, Y is
estimated to increase by bi units.
R: There is a strong/weak linear relationship between X1, X2, …, Xk and Y. (No sign.)
R = 0: no linear relationship
R = 1: perfect linear relationship
0 < R < 0.5: weak linear relationship
0.5 ≤ R < 1: strong linear relationship
R²: 100R²% of the variation in Y is explained by the regression (by the variations in X1, X2, …,
Xk).
7. Construct a confidence interval for βi: bi ± t Sbi, DF = n – k – 1, two-tailed.
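Outside SPSS, fitting Ŷ = a + b1X1 + b2X2 can be sketched with NumPy's least-squares solver; the data here are simulated from an assumed relationship Y = 2 + 3X1 – 1.5X2 plus small noise, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated from Y = 2 + 3*X1 - 1.5*X2 (an assumption)
X1 = rng.uniform(0, 10, 40)
X2 = rng.uniform(0, 10, 40)
Y = 2 + 3 * X1 - 1.5 * X2 + rng.normal(0, 0.1, 40)

# Design matrix with a leading column of ones for the intercept a
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)   # close to 2, 3, -1.5
```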
Topic 3: ANOVA (Analysis of Variance)
1. Variance Test (comparing two population variances):
Two-tailed test: H0: σ1² = σ2², Ha: σ1² ≠ σ2²
Right-tailed test: H0: σ1² ≤ σ2², Ha: σ1² > σ2²
Left-tailed test: H0: σ1² ≥ σ2², Ha: σ1² < σ2²
Get the critical value from the F table at column (degree of freedom for numerator) n1 – 1 and row
(degree of freedom for denominator) n2 – 1. Divide α by 2 for a two-tailed test. Compute the test
statistic (Formula 15).
Notice: In computing the test statistic, always put the larger variance above and the smaller variance
below. The size of the sample with the larger variance minus 1 is the degrees of freedom for the
numerator, and the size of the sample with the smaller variance minus 1 is the degrees of freedom
for the denominator. The signs of the critical value and the test statistic are always positive. It is a
right-tailed test.
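The larger-variance-on-top rule can be sketched directly; the two small samples are made up for illustration:

```python
import statistics

def variance_f(sample1, sample2):
    """F test statistic for comparing two variances, larger variance on top.

    Returns (F, df_numerator, df_denominator); F is always >= 1,
    so the test is carried out right-tailed.
    """
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

f, df_num, df_den = variance_f([3, 5, 7, 9, 11], [6, 7, 8])
print(f, df_num, df_den)   # variances are 10 and 1, so F = 10 with DF (4, 2)
```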
2. One-Way ANOVA (comparing more than two population means)
There are two variables. One variable is of interval level (numbers) and the other is of nominal
level (names) with k values.
H0: μ1 = μ2 = … = μk
Ha: Not all k population means of "name of the variable" are equal.
Get the critical value from the F table at column k – 1 (df for numerator = k – 1) and row n – k (df
for denominator = n – k). Construct the ANOVA table to get the test statistic F. This is a right-tailed
test.
How to construct the ANOVA table
Using SD mode, fill in the following summary-statistics table:

              Sample 1    Sample 2    Sample 3    Total
ΣXi²
Ti = ΣXi
ni                                                N = Σni

Fill in the following ANOVA table. Use Formula 12 to compute SST (Sum of Squares Total),
Formula 13 to compute SSB (Sum of Squares Between), and Formula 3 (the first one) to compute
SSW (Sum of Squares Within).
Source    SS                DF     MS                   F
Between   SSB               k – 1  MSB = SSB / (k – 1)  F = MSB / MSW
Within    SSW = SST – SSB   n – k  MSW = SSW / (n – k)
Total     SST               n – 1
Idea of the test
We here use the following notation: X̄i denotes the mean of sample i. X̄ denotes the grand
mean, which is the mean of all N data and, at the same time, the mean of all k sample means
weighted with the corresponding sample sizes.
SST is the total variation of all data: SST = ΣΣ (Xij – X̄)², summed over i = 1, …, k and
j = 1, …, ni. It is the total squared difference between the N data and their mean X̄. To find the
average squared difference, called the variance, we would have to divide SST by N – 1 (instead of
N, as suggested by probability theory). But we do not need the variance here.
SSB is the total variation between the sample means and their weighted mean X̄:
SSB = Σ ni (X̄i – X̄)², summed over i = 1, …, k. It is the total variation between the k sample means
and their mean, so the degree of freedom is k – 1. If we divide SSB by k – 1, we get MSB (Mean
Squares Between).
SSW is the sum of the total variations (SSTs) within the k samples: SSW = ΣΣ (Xij – X̄i)², summed
over i = 1, …, k and j = 1, …, ni. The degree of freedom is ni – 1 in each sample i, so the total is
N – k. Dividing SSW by N – k, we get MSW (Mean Squares Within).
Now it can be seen that, if MSB is large, the difference among the k sample means is big, and if
MSW is small, the data are close to each other within each sample. Therefore, if F = MSB / MSW is
big, there is a great difference among the sample means, while values of the test variable are not
very different within each sample. Hence, a large value of F is strong evidence that the population
means are different.
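The SSB/SSW decomposition above can be sketched as one function; the three samples are made up so the sums are easy to check by hand:

```python
import statistics

def one_way_anova(*samples):
    """SSB, SSW and F = MSB / MSW for k independent samples."""
    k = len(samples)
    all_data = [x for s in samples for x in s]
    n = len(all_data)
    grand_mean = statistics.fmean(all_data)
    # Between-samples variation, each sample mean weighted by its size
    ssb = sum(len(s) * (statistics.fmean(s) - grand_mean) ** 2 for s in samples)
    # Within-samples variation around each sample's own mean
    ssw = sum((x - statistics.fmean(s)) ** 2 for s in samples for x in s)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "F": msb / msw, "df": (k - 1, n - k)}

res = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(res)   # SSB = 54, SSW = 6, F = 27 with DF (2, 6)
```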
SPSS contains, inside ANOVA, several statistical tests which can be used to see which pairs
of means are significantly different. The Scheffe test is one of them.
SPSS Scheffe interpretation: Two population means are unequal if the corresponding
sample means are found in different columns and not in the same column. For each unequal
pair of means, the population mean with the larger sample mean will be the larger.
Topic 4: Index Numbers
In this topic, we study the basic idea of an economic measurement called an index, and the
nine most widely used types of indexes.
An index is a percent: I = (a / b) × 100. But it is customary not to write the percentage sign "%".
An index compares the current value and the base-period value of the same variable.
The nine indexes should be studied in three steps: (1) three basic indexes, (2) simple or
unweighted, (3) weighted.
(1) Three Basic Indexes: Price, Quantity, and Value
Price index compares prices: Ip = (pt / p0) × 100 (0 = base period, t = current period)
Quantity index compares quantities: Iq = (qt / q0) × 100
Value index compares values, where Value = Price × Quantity (v = pq):
Iv = (vt / v0) × 100 = (pt qt / p0 q0) × 100
(2) Simple or Unweighted: Aggregates and Average
Aggregates = comparing group prices
Average = average of the price indexes for all items
(3) Weighted: Aggregates (Laspeyres, Paasche, and others), and Average

Price indexes:
Unweighted or simple aggregates price index: (Σpt / Σp0) × 100
Weighted aggregates price index: (Σpt w / Σp0 w) × 100
Special cases: 1) w = q0 → Laspeyres; 2) w = qt → Paasche
Unweighted average of relatives price index: (Σ Ip) / N = Σ[(pt / p0) × 100] / N
Weighted average of relatives price index: (Σ Ip w) / (Σw) = Σ[(pt / p0) × 100 · w] / (Σw)
How to compute: Write down the formula. Enter the values. Check the values again. Use simple
step-by-step computation and write down every step. Show two decimal places in the answer.

Summary — an index is a percent (don't show the % sign): Ip, Iq, Iv
Unweighted aggregate: (Σpt / Σp0) × 100
Unweighted average: (ΣI) / N
Weighted aggregate: (Σpt w / Σp0 w) × 100 (Laspeyres: w = q0; Paasche: w = qt)
Weighted average: (ΣIw) / (Σw)
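The two weighted-aggregates special cases can be sketched directly from the formulas above; the three-item basket of prices and quantities is a made-up example:

```python
def laspeyres(p0, pt, q0):
    """Weighted aggregates price index with base-period quantities as weights:
    (sum of pt*q0 / sum of p0*q0) * 100."""
    return sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100

def paasche(p0, pt, qt):
    """Weighted aggregates price index with current-period quantities as weights:
    (sum of pt*qt / sum of p0*qt) * 100."""
    return sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt)) * 100

# Hypothetical basket of three items (prices and quantities are assumptions)
p0, pt = [2.0, 5.0, 1.0], [2.5, 6.0, 1.2]
q0, qt = [10, 4, 20], [12, 3, 18]
print(round(laspeyres(p0, pt, q0), 2))   # ≈ 121.67
print(round(paasche(p0, pt, qt), 2))     # ≈ 122.11
```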
SPSS Commands: Topics 1 & 2: Graphs – Scatter, Analyze – Correlate – Bivariate (Pearson),
Analyze – Regression – Linear (Statistics and Save), Topic 3: Analyze – Compare Means – One-way
ANOVA (Post-Hoc – Scheffe)
Total 9 topics and 11 pages
© 2010-1 Dr. Min Aung
Basics-SLR_MLR_ANOVA_INDEX