Department of Business Administrative IVE-HW
QT1-Exam revision note
Quantitative Techniques 1 2004-2005
Revision Note
(1st Term)
Chapter 1: Introduction to Statistics
Statistics is the science of:
Collecting (收集), Organizing (組織), Presenting (陳述), Analyzing (分析) and
Interpreting (解釋) data
Purpose of statistics:
Decision making, decision understanding and prediction
Descriptive Statistics (敘述統計學):
Methods of organizing, summarizing and presenting data in an
informative way.
Purpose: to present data using tables, charts and statistical measures.
Inferential Statistics (推論統計學):
Draw inferences about a population from a sample.
Make statements about the population's characteristics from the information
contained in the sample.
Purpose: to make decisions about population characteristics, which involves
estimation and hypothesis testing.
Data Source:
Data Type:
Key words to remember:
Population - The set or collection of items of interest
Population Parameters/Parameters - Numerical characteristics of a population
Sample - A portion of the population selected for study
Numerical characteristics of samples - Referred to as sample statistics, or simply statistics
Data Set (Data File) - A collection of data organized to facilitate data analysis
Item - Any entity of interest (e.g. a person, a company …). In practice, items are often referred to as elements, objects or units
Data value - A measurement of a variable. A data set may include several variables. See the example in the lecture notes
Observation - Made up of the data values of a single item for all of the variables. Also referred to as a case, a row or a record in statistical practice. Refer to the example in the lecture notes
Qualitative or Categorical variable - Nonnumeric; each item is classified into a single category. E.g. Good, Bad, Average
Quantitative variable - Measured on a numerical scale. E.g. Age, Weight, Height
Experimental Studies - The investigator directly controls/determines which subjects or experimental items or materials receive treatments that are thought to
affect the variables of interest.
Observational Studies - The variables of interest are studied using observed or historical data. The investigator does not directly control/determine which subjects/items
receive the treatment that is thought to affect the variables of interest in the study.
Levels of measurement:
1. Nominal level (grouping) - The lowest level of measurement. The data can only be
grouped or categorized in some way. E.g. [Man; Woman].
2. Ordinal level (grouping & ranking) - The measure shows information about
order, but no exact value is presented. E.g. [Superior; Good]
3. Interval level - Includes the exact distance between measures, but there is
no meaningful (true) zero point; zero is arbitrary, so ratios of values are
not meaningful. E.g. [A > B by 1]; temperature in °C.
4. Ratio level - The data have a meaningful zero point, so the ratio of two data values
is meaningful. Exact values are presented. E.g. [John's height is 5'8"]
Chapter 2: Sampling Methods
Reasons for not using the whole population: it is time-consuming, costly and inefficient; testing may destroy the items (destructive nature of the data); and the full membership may be unknown
Sample design - The plan for obtaining a representative sample from the population.
Survey method - The procedure for collecting useful information from the selected sample.
Representative sample - Contains the relevant characteristics of the population in the same proportions as in the population
Probability Sampling:
Characteristics:
- Each possible item has a known (non-zero) probability of being included in the sample; in simple random sampling this probability is equal for every item.
- The selection of one item does not affect the chance of any other item being selected.
Random Sampling:
Use a random number table (refer to the lecture notes).
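Where a computer is available, the random number table can be replaced by a pseudo-random generator. The sketch below is a minimal Python illustration, not a prescribed method; the population of 100 numbered items and the seed are hypothetical. `random.Random(seed).sample` draws without replacement, so each item has the same chance of being chosen:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of selection."""
    rng = random.Random(seed)          # seeded generator, so the draw can be repeated
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: items numbered 1 to 100 (e.g. a numbered customer list).
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=2004)
print(sorted(sample))
```

Fixing the seed makes the draw repeatable, which plays the same role as recording the starting point used in a random number table.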
Systematic Random Sampling:
Elements are arranged in some order, a starting point is randomly selected (using the random number table), and then the other elements are selected from the population at a uniform interval.
Adv: The sample is spread more evenly over the entire population.
Disadv: A hidden periodicity in the arrangement of the elements may bias the sample.
Date: 16 Dec, 2004
Edited by: Jacky Wong
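The systematic procedure above can be sketched in Python, assuming the population list is already arranged in some order and that the interval is taken as k = N // n (a simplifying assumption; N is the population size and n the sample size). Only the starting point is random:

```python
import random

def systematic_sample(population, n, seed=None):
    """Every k-th element after a random start, with interval k = N // n."""
    k = len(population) // n                   # uniform sampling interval
    start = random.Random(seed).randrange(k)   # random starting position 0 .. k-1
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))   # hypothetical ordered list of 100 items
sample = systematic_sample(population, 10, seed=7)
print(sample)   # ten items, each 10 positions apart
```

If the list had a hidden cycle of length 10 (for example, every 10th item being of the same type), every selected item would fall at the same point in the cycle, which is the periodicity problem noted above.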
Stratified Random Sampling:
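The population is divided into a number of non-overlapping homogeneous groups, called strata; elements with similar characteristics are grouped
together in the same stratum.
Proportional: the number of items selected from each stratum is in the same proportion as in the population, so the overall mean is the ordinary sample mean:
Mean = $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
Non-Proportional: equal numbers of elements are selected from each stratum, and the results are weighted according to each stratum's proportion of
the whole population:
Mean = $\dfrac{N_1\bar{X}_1 + N_2\bar{X}_2 + \cdots + N_k\bar{X}_k}{N_1 + N_2 + \cdots + N_k}$, where k is the number of strata, $N_i$ is the population size of stratum i and $\bar{X}_i$ is the mean of the sample taken from stratum i
Adv: Stratified random sampling is used to reduce bias in sampling, since every stratum is represented in the sample.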
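As a worked illustration of the non-proportional formula, the sketch below weights each stratum's sample mean by its population size N_i. The three strata, their sizes and the sample values are hypothetical:

```python
def stratified_mean(strata):
    """strata: list of (N_i, sample_values) pairs, where N_i is the stratum's
    population size. Returns sum(N_i * mean_i) / sum(N_i)."""
    total_N = sum(N for N, _ in strata)
    weighted = sum(N * (sum(values) / len(values)) for N, values in strata)
    return weighted / total_N

# Hypothetical example: 3 strata of sizes 600, 300 and 100, with 4 items sampled from each.
strata = [
    (600, [20, 22, 19, 23]),   # stratum sample mean 21
    (300, [30, 28, 32, 30]),   # stratum sample mean 30
    (100, [50, 46, 54, 50]),   # stratum sample mean 50
]
print(stratified_mean(strata))
```

Here the weighted mean is (600 x 21 + 300 x 30 + 100 x 50) / 1000 = 26.6, whereas simply averaging all 12 sampled values would give about 33.67, because the small third stratum is over-represented in an equal-size sample.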
Cluster Random Sampling:
The total area of interest (the population) is divided into a number of small, non-overlapping blocks (or clusters). A number of these blocks are then
randomly selected for inclusion in the overall sample, on the assumption that the individual blocks are representative of the population as a whole.
Adv: Although estimates based on cluster sampling are usually not as reliable as those based on a simple random sample of the same size, they are usually more reliable per
unit cost.
Comparison between stratified and cluster sampling:
Stratified random sampling: small variation within each group, but wide variation between the groups.
Cluster random sampling: considerable variation within each group, but the groups are similar to each other.
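A minimal sketch of cluster selection, assuming the clusters are already defined (here, eight hypothetical housing blocks of five households each). Whole clusters are chosen at random and every element in a chosen cluster enters the sample:

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters and include every element in them."""
    chosen = random.Random(seed).sample(sorted(clusters), m)   # pick cluster labels at random
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 8 housing blocks (clusters) of 5 households each.
clusters = {f"Block {c}": [f"{c}{i}" for i in range(1, 6)] for c in "ABCDEFGH"}
sample = cluster_sample(clusters, 3, seed=1)
for label, households in sample.items():
    print(label, households)
```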
Non-Probability Sampling:
Characteristics:
Used primarily as a matter of convenience. It may produce quite accurate estimates of population parameters, but since the sample
is not chosen using probability methods, there is no valid way of determining the reliability of the resulting estimates.
Judgement Sampling:
Personal judgement plays a significant role in selecting the sample. E.g. test markets for new products
Quota Sampling:
Interviewers are simply given quotas to be filled. Once the quotas are set, interviewers are granted flexibility in the choice of sample members.
Biased samples: occur when an unsuitable sampling method is chosen, or when an insufficient number of items is sampled.
Chapter 3: Survey Methods
Primary Data - Data that are used for the specific purpose for which they were collected
Secondary Data - Data that are being used for some purpose other than that for which they were originally collected
Internal Data - Data generated from the activities within a firm.
External Data - Data obtained from sources outside the firm.
Different kinds of survey methods:
Direct Observation:
Observe a phenomenon with your own eyes. It is concerned with what
people do rather than why they do it. It provides accurate data, free of the biases
introduced by interviewers, and is useful for continuously collecting data about
routine consumer behaviour.
Adv:
1. The actual actions or habits of a person are observed.
2. Applicable when it is undesirable for people to know an experiment is
taking place.
3. Provides one of the most reliable methods of data collection.
Disadv:
1. The result of an observation depends on the skill of the observer.
2. Opinions and attitudes cannot be obtained by observation.
3. Some forms of behaviour cannot be captured by a 'one-time' observation.
4. Expensive, since it ties up personnel.
Interview:
Adv:
1. Generates very rich data, both quantitative and
qualitative
2. Normally achieves a high response rate
3. The interviewer may assess the person being interviewed in terms of age and social
class, and sometimes even assess the accuracy of the information
given
Disadv:
1. Probably the most expensive method
2. Interviewers must be well trained
3. People may not like to give embarrassing information
4. Some types of people are more difficult to locate and interview
Phone Interview:
Adv:
1. Speed and relative economy when only a limited amount of information is
required
2. Computer-assisted telephone interviewing increases data input
accuracy and saves on labour costs
Disadv:
1. It is easier for respondents to refuse to answer questions
2. Time may be wasted phoning people who are not at home
Postal Questionnaires:
Adv:
1. Speed and low cost
2. No interviewer bias
3. Respondent has enough time to consult
Disadv:
1. Design of questionnaires requires great care
2. Poor response rate and incomplete or wrongly completed forms
3. Spontaneous answers cannot be collected
4. "Wrong" person may complete the questionnaire
Questionnaires
Brevity - The questionnaire should be as brief as possible.
Simplicity - A complicated form may well conceal the real point of the questionnaire. It is not necessary to use four or five words when one would
suffice.
Ambiguity - The respondent must be in no doubt as to what a question means. E.g. "Have you ever been involved in an accident in the past?" is ambiguous (what kind of accident, and over what period?).
Leading Questions - It is unwise to lead the respondent towards a certain response to a question you have posed. E.g. "Responsible jewellers always
use machine guards; do you use guards?"
Personal Questions - Avoid personal questions unless they are absolutely necessary
Important points for survey studies:
- Decide your objectives, your target interviewees and your questions.
- Try to use closed-ended questions or multiple-choice answers.
- Try to collect personal data at the end of your questionnaire.
- State that "all information is for statistical purposes only".
- People are reluctant to think and write, so work out all the possible answers (or the most common ones) and set them out as choices.
- Identify yourself before you talk to your target interviewees.
- If the necessary information is either already available or impossible to obtain, there is no point in carrying out the survey.
- Check whether the relevant population is available.
- There is no unique way to go about providing the "best" sampling scheme.
- The investigator will want to obtain answers from as high a proportion of the sample members as possible.
- Collect answers that are as accurate and as honest as possible. There is an art in designing questions.
The response rate can be improved by including a covering letter, a post-paid envelope and a gift.
Sampling Error: arises because information is available on only a subset of all the population members.
Non-Sampling Error: error unconnected with the kind of sampling procedure used.
Reasons for Non-Sampling Error:
1. The population sampled is not the relevant one.
2. Survey subjects may give inaccurate or dishonest answers or, in the worst case, not respond at all (non-response).
Actions for non-response:
- Use a good approach to conduct the survey.
- Compare the characteristics of respondents and non-respondents, in such matters as age, sex and race, to see if there are obvious
differences between the two groups.
- Try to contact non-respondents, some of whom may well be prepared to provide answers to a few key questions.
Chapter 4: Graphical Presentation
Understand the following diagrams:
- Scatter diagram - provides insights into the nature of the relationship between two variables.
- Line Chart - shows the magnitudes/trends of two quantitative variables, or of one variable over time.
- Bar Chart - shows the magnitude of data for different qualitative categories.
- Grouped Bar Chart - shows the magnitudes of two or more grouped data items for different qualitative categories or over time.
- Multiple Bars - a number of single bars superimposed on top of each other.
- Component Bar Charts - different shading is used to distinguish one set of bars from another.
- Combination Charts - use both lines and bars to show the magnitudes of two or more data values.
- Pie Chart - shows the proportions or percentages of a total quantity.
- Exploded Pies - a pie chart that has one or more segments slightly removed.
- Three-dimensional Pie - using 3D in an exploded pie makes the picture much more eye-catching.
- Comparative Pies - compare relative proportions at two different times.
Characteristics of different charts:
1. To show the relative sizes of data: Bar Chart
2. To show proportional sizes: Pie Chart
3. To show the change in data over time: Line Chart
4. For the casual reader: Pictorial Charts
Chapter 5: Frequency Distributions
Frequency distribution (or frequency table):
A tabular summary of a set of data that shows the frequency, or number of data
items, falling in each of several distinct classes
Cumulative Frequency Distributions:
Enable us to see how many observations lie below or above certain
values
Relative frequency distribution:
Expresses the frequency as a fraction or a percentage of the total number of
observations. The sum of all the relative frequencies equals 1.00 or
100%.
Cumulative Relative Frequency Distributions:
Enable us to see the cumulative fraction or percentage of
observations lying below or above certain values, rather than recording
the percentage of items within each interval.
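The four kinds of distribution can be built from one pass over the data. This Python sketch assumes hypothetical integer marks and closed classes of width 10; the cumulative columns are of the "less than or equal to the upper limit" type:

```python
def frequency_table(data, classes):
    """Frequency, relative, cumulative and cumulative relative frequency
    for each closed class given as (lower, upper) limits."""
    n = len(data)
    rows, cumulative = [], 0
    for lower, upper in classes:
        f = sum(lower <= x <= upper for x in data)   # items falling in this class
        cumulative += f                               # items up to this class's upper limit
        rows.append((f"{lower}-{upper}", f, f / n, cumulative, cumulative / n))
    return rows

# Hypothetical marks of 20 students, grouped into classes of width 10.
marks = [12, 15, 22, 25, 27, 31, 33, 34, 38, 41,
         43, 45, 47, 52, 55, 58, 61, 64, 68, 75]
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]

table = frequency_table(marks, classes)
print(f"{'Class':>7} {'f':>3} {'rel f':>6} {'cum f':>6} {'cum rel':>8}")
for label, f, rel, cum, cum_rel in table:
    print(f"{label:>7} {f:>3} {rel:>6.2f} {cum:>6} {cum_rel:>8.2f}")
```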
Quantitative class: a class that can be measured on a numerical scale (e.g. height).
Qualitative class: a class that classifies information according to a qualitative characteristic (e.g. feelings).
Open-ended class: a class with only one limit, either an upper or a lower limit but not both (e.g. "60 and over").
Closed-ended class: a class with BOTH an upper and a lower limit.
Discrete class: separate entities that progress from one class to the next with a break.
Continuous class: progresses from one class to the next without a break.
Stem-and-leaf display:
Use "leading digits" (the stem) and "trailing digits" (the leaf) to split each data value. Both can be single digits or multiple digits. (You MUST be able to draw the stem-and-leaf diagram!)
In the basic display, the leaves are NOT sorted in order.
Revised Stem-and-leaf Display: the leaves are sorted in order.
A stem-and-leaf display shows the following information:
the shape of the data distribution, the maximum and minimum values, the central tendency and dispersion, and the actual data values.
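A short sketch of both displays, assuming non-negative integer data with the tens digit as the stem and the units digit as the leaf (the data are hypothetical). Empty stems are kept so that gaps in the distribution remain visible:

```python
def stem_and_leaf(data, ordered=True):
    """Stems are the leading digits (tens), leaves the trailing digit (units).
    Assumes non-negative integers. ordered=False keeps leaves in raw-data order."""
    stems = {}
    for x in data:
        stem, leaf = divmod(x, 10)
        stems.setdefault(stem, []).append(leaf)
    # Include empty stems so that gaps in the distribution remain visible.
    display = {s: list(stems.get(s, [])) for s in range(min(stems), max(stems) + 1)}
    if ordered:
        for leaves in display.values():
            leaves.sort()
    return display

def show(display):
    for stem, leaves in display.items():
        print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

data = [43, 27, 51, 38, 45, 29, 33, 47, 52, 41, 36, 24]   # hypothetical data
show(stem_and_leaf(data, ordered=False))   # stem-and-leaf display (leaves unsorted)
show(stem_and_leaf(data))                  # revised display (leaves sorted)
```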
Class Limits: Lower and upper values of the classes (e.g. 5-10)
Class Boundaries: the values marking the common points between adjacent classes (e.g. 4.5-10.5)
Class Width/Interval: upper class boundary - lower class boundary
Class Mid-points: midway between the upper and lower class boundaries
Approximate number of classes: 1 + 3.322 log10(number of data values) (Sturges' rule)
Approximate class width/interval: (largest value - smallest value) / number of classes
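The two approximations can be checked numerically. In this sketch the number of classes is rounded up (an assumption; some texts round to the nearest whole number, which would give 5 classes here), and class boundaries are taken half a measurement unit outside the class limits, which reproduces the 5-10 to 4.5-10.5 example above. The marks are hypothetical:

```python
import math

def approx_number_of_classes(n):
    """1 + 3.322 log10(n), rounded up to a whole number of classes."""
    return math.ceil(1 + 3.322 * math.log10(n))

def approx_class_width(data, k):
    return (max(data) - min(data)) / k

def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a measurement unit outside the class limits."""
    return lower_limit - unit / 2, upper_limit + unit / 2

marks = [12, 15, 22, 25, 27, 31, 33, 34, 38, 41,
         43, 45, 47, 52, 55, 58, 61, 64, 68, 75]   # hypothetical data (n = 20)
k = approx_number_of_classes(len(marks))           # 1 + 3.322 x 1.301 = 5.32, rounded up to 6
width = approx_class_width(marks, k)               # (75 - 12) / 6 = 10.5
low, high = class_boundaries(5, 10)                # class 5-10 has boundaries 4.5-10.5
print(k, width, (low, high), high - low, (low + high) / 2)
```

In practice the width of 10.5 would usually be rounded up to a convenient value such as 11.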
**Please note that:
1. If raw data are grouped into classes, a certain amount of information is lost, since no distinction is made between observations falling in the
same class.
2. The larger the class interval, the greater the amount of information lost.
3. The smaller the class interval, the less information is lost.
4. If the class interval is too small, the small irregularities in the histogram may merely reflect the accidents of sampling.
Histogram: for unequal class intervals, the area of each bar (not its height) must be proportional to the frequency of the class.
Frequency Polygon: plot the class frequencies against the class mid-points. The polygon should touch the horizontal axis at both ends of the
distribution.
Ogive: a graph of a cumulative frequency distribution, of either the "less than" or the "more than" type.
Chapter 6: Descriptive Statistics
Know how to find the following measures for both grouped and ungrouped data:
Mean/Weighted Mean:
It is calculated by summing all the observations in a batch of data and then dividing the total by the number of items involved. (For a weighted mean, each observation is first multiplied by its weight, and the total is divided by the sum of the weights.)
Adv:
1. One number represents the whole data set
2. Each data set has one and only one mean
3. Every observation is taken into account
4. Useful for comparing the means of several data sets
Disadv:
1. Affected by extreme values
2. Takes time to compute
3. A mean cannot be computed for data with an open-ended class
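Median:
The middle value in an ordered sequence of data.
Adv:
1. Extreme values do not affect the median.
2. Easy to understand and can be calculated from any kind of data.
3. The median can be found even when the data are qualitative descriptions that can be ranked.
Disadv:
1. Statistical procedures based on the median are more complex than those based on the mean.
2. The data must be arranged in order first, which is time-consuming for any data set with a large number of elements.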
Mode:
A measure of central tendency: the value that is repeated most often in the data set.
A data set with two modes has a "bimodal distribution".
Adv:
1. Can be used as a central location for qualitative as well as quantitative data.
2. Not affected by extreme values.
3. Can be used even when one or more of the classes are open-ended.
Disadv:
1. Not used as often as the mean or median.
2. A data set may have no modal value, or every value may be a mode, in which case the mode is a useless measure.
3. Data sets with several modes are difficult to interpret and compare.
4. Grouped data cannot show the exact mode.
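The three measures of central tendency can be compared using Python's standard `statistics` module (`statistics.multimode` needs Python 3.8 or later). The data, module marks and credit weights below are hypothetical; the grouped-data line estimates the mean from class mid-points, as is usual when only a frequency table is available:

```python
from statistics import mean, median, multimode

data = [3, 7, 7, 2, 9, 7, 4, 100]   # hypothetical data with one extreme value (100)
print(mean(data))        # 17.375: pulled upwards by the extreme value
print(median(data))      # 7.0: middle of the ordered data, unaffected by 100
print(multimode(data))   # [7]: the most frequent value(s)

# The mode also works for qualitative data.
ratings = ["Good", "Average", "Good", "Bad", "Good"]
print(multimode(ratings))   # ['Good']

def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical: marks 70, 80, 90 on modules worth 3, 2 and 1 credits.
print(weighted_mean([70, 80, 90], [3, 2, 1]))   # 460 / 6, about 76.67

# Grouped data: the mean is estimated from class mid-points weighted by frequencies.
midpoints, freqs = [15, 25, 35], [2, 5, 3]
print(weighted_mean(midpoints, freqs))           # 260 / 10 = 26.0
```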
Range:
The difference between the largest and smallest values.
Adv:
1. The range is easy to understand and to calculate.
Disadv:
1. It ignores the nature of the variation among all the other observations, and it is heavily influenced by extreme values.
2. Open-ended distributions have no range.
3. The range is a less stable measure than the other measures of dispersion.
4. As the number of observations increases, the range generally tends to become larger.
Midrange:
The average of the largest and smallest values: (largest + smallest) / 2.
Adv:
1. The midrange is easy to understand and to calculate.
Disadv:
1. It is heavily influenced by extreme values.
2. Open-ended distributions have no midrange, because no "highest" or "lowest" value exists in an open-ended class.
3. The midrange is a less stable measure: for example, in repeated samples taken from the same source, the midrange will
exhibit more variation from sample to sample than the other measures.
Midhinge:
The average of the first and third quartiles: (Q1 + Q3) / 2.
Adv:
1. Ignores extreme values by using only the middle half of the data, a distinct advantage over the midrange, which is affected
by extreme values.
Disadv:
1. Like the midrange, the midhinge is based on only two values from the data set.
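Mean Absolute Deviation (MAD):
The average of the absolute deviations of the observations from the mean.
Adv:
1. Takes every observation into account.
2. Weights each item equally.
Disadv:
1. It is difficult to use in further mathematical operations (because of the absolute values).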
Standard Deviation (SD):
The square root of the variance (the average squared deviation from the mean).
Adv:
1. It takes into account every observation in the data set.
Disadv:
1. Not as easy to calculate as the range.
2. Cannot be computed for open-ended distributions.
3. Extreme values in the data set distort the value of the standard deviation, although
to a lesser extent than they distort the range.
Interquartile Range:
Q3 - Q1. It measures approximately how far from the median we must go on either side before we can
include one half of the values of the data set.
Adv:
1. Ignores extreme values by using only the middle half of the data.
Disadv:
1. More complicated to calculate than the range.
2. Based on only two values from the data set.
Coefficient of Variation:
A relative measure of dispersion, expressed as a percentage rather than in terms of the units of
the particular data: CV = (standard deviation / mean) x 100%.
Adv:
1. Useful when comparing the variability of two or more batches of data that are expressed in
different units of measurement.
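A sketch of the dispersion measures above, applied to a small hypothetical data set. It treats the data as a whole population and divides by N; for a sample, the standard deviation is normally computed with n - 1 in the denominator (as `statistics.stdev` does):

```python
import math

def dispersion(data):
    """Range, midrange, MAD, standard deviation and coefficient of variation,
    treating the data as a whole population (dividing by N)."""
    n = len(data)
    mu = sum(data) / n
    largest, smallest = max(data), min(data)
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return {
        "range": largest - smallest,
        "midrange": (largest + smallest) / 2,
        "MAD": sum(abs(x - mu) for x in data) / n,   # mean absolute deviation
        "SD": sd,
        "CV%": sd / mu * 100,                         # relative dispersion, free of units
    }

stats = dispersion([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data, mean = 5
print(stats)
```

For these data the range is 7, the midrange 5.5, the MAD 1.5, the SD 2 and the CV 40%.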
Coefficient of Skewness < 0: negatively (left) skewed.
Coefficient of Skewness > 0: positively (right) skewed.
Quartiles (with the N observations arranged in ascending order):
First quartile, Q1:
- 25% of the observations are smaller and 75% of the observations are larger
- Q1 = the value corresponding to the (N+1)/4 th observation
Second quartile, Q2:
- 50% of the observations are smaller and 50% of the observations are larger
- Q2 = median = the value corresponding to the (N+1)/2 th observation
Third quartile, Q3:
- 75% of the observations are smaller and 25% of the observations are larger
- Q3 = the value corresponding to the 3(N+1)/4 th observation
Outlier: a value that is more than 1.5 times the interquartile range above Q3 or below Q1, i.e.
Outlier < (Q1 - 1.5 x interquartile range) or Outlier > (Q3 + 1.5 x interquartile range)
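The quartile positions and the outlier rule can be combined in one sketch. The helper interpolates linearly when (N+1)/4 is not a whole number, which is an assumption about how fractional positions are handled; Python's `statistics.quantiles(data, n=4)` with its default `method="exclusive"` should give the same values. It assumes at least 3 observations, and the data are hypothetical:

```python
import math

def value_at_position(sorted_data, pos):
    """Value of the pos-th ordered observation (1-based), interpolating
    linearly when pos falls between two observations."""
    i = math.floor(pos)
    fraction = pos - i
    if fraction == 0:
        return sorted_data[i - 1]
    return sorted_data[i - 1] + fraction * (sorted_data[i] - sorted_data[i - 1])

def quartiles(data):
    """Q1, Q2, Q3 at the (N+1)/4, (N+1)/2 and 3(N+1)/4 positions (needs N >= 3)."""
    x = sorted(data)
    n = len(x)
    return tuple(value_at_position(x, k * (n + 1) / 4) for k in (1, 2, 3))

def outlier_report(data):
    q1, q2, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {"Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
            "midhinge": (q1 + q3) / 2, "fences": (lower_fence, upper_fence),
            "outliers": outliers}

data = [21, 25, 36, 39, 40, 41, 42, 43, 47, 49, 95]   # hypothetical data, N = 11
print(outlier_report(data))
```

Here Q1 = 36, Q3 = 47 and the IQR is 11, so the fences are 19.5 and 63.5 and only 95 is flagged as an outlier; the midhinge is 41.5.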
Box-and-Whisker Plot: a plot of the five-number summary (smallest value, Q1, median, Q3 and largest value). (Outliers are not included in this plot.)
Two important rules about the standard deviation:
Chebyshev's Theorem (no matter what the shape of the distribution):
- The interval $\mu \pm 2\sigma$ will contain at least 75% of the measurements.
- The interval $\mu \pm 3\sigma$ will contain at least 89% of the measurements.
- The interval $\mu \pm 4\sigma$ will contain at least 94% of the measurements.
Empirical Rule (given a symmetrical, bell-shaped distribution):
- The interval $\mu \pm \sigma$ will contain approximately 68% (68.26%) of the measurements.
- The interval $\mu \pm 2\sigma$ will contain approximately 95% (95.44%) of the measurements.
- The interval $\mu \pm 3\sigma$ will contain all or almost all (99.73%) of the measurements.
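The two sets of percentages can be compared directly. Chebyshev's bound for k standard deviations is 1 - 1/k^2, and for a normal distribution the proportion within k standard deviations is erf(k/sqrt(2)); the sketch below uses only Python's `math` module:

```python
import math

def chebyshev_minimum(k):
    """Chebyshev: at least 1 - 1/k^2 of ANY distribution lies within k SDs of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def normal_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3, 4):
    print(f"Chebyshev, k = {k}: at least {chebyshev_minimum(k):.2%}")
for k in (1, 2, 3):
    print(f"Normal, k = {k}: {normal_within(k):.2%}")
```

The exact normal values are 68.27% and 95.45%; the 68.26% and 95.44% quoted above come from doubling four-decimal table areas (0.3413 and 0.4772). For k = 4, Chebyshev gives 93.75%, which rounds to the 94% above.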
Chapter 7: Basic Probability
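Basic concepts:
- Complement: A', the outcomes not in A
- Union: $A \cup B$, A or B (or both) occurs
- Intersection: $A \cap B$, A and B both occur
- Mutually Exclusive: A and B cannot occur together, $A \cap B = \emptyset$
- Mutually Exclusive & Collectively Exhaustive: the events cannot occur together, and together they cover the whole sample space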
Three approaches to probability:
1. Classical Approach:
Probability of an event = number of outcomes favourable to the occurrence of the event / total number of possible outcomes (e.g. tossing a coin or rolling a die)
2. Relative Frequency Approach (Empirical Concept):
Probability of an event = the proportion of times the event occurs in the long run under uniform conditions (e.g. statistics of a ball game)
3. Subjective Approach:
Probability of an event = the degree of belief or confidence placed in the occurrence of the event by a particular individual, based on the
evidence available (e.g. a judge deciding whether to allow the construction of a nuclear power plant)
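The relative frequency approach can be illustrated by simulation: as the number of tosses grows, the proportion of heads should settle near the classical value of 1/2. This is a sketch with a simulated fair coin (an assumption) and a fixed seed:

```python
import random

def relative_frequency_of_heads(tosses, seed=0):
    """Empirical probability: proportion of heads in repeated simulated tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))   # fair coin assumed
    return heads / tosses

classical = 1 / 2   # classical approach: 1 favourable outcome / 2 possible outcomes
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n), classical)
```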
Counting Rules
- If there are k1 mutually exclusive and collectively exhaustive events on the first trial, k2 events on the second trial, ..., and kn events on the nth
trial, then the number of possible outcomes is (k1)(k2) … (kn).
- Factorial: n! = n (n-1) (n-2) … (2) (1); 0! is defined as 1.
- Permutations: nPr = n! / (n-r)!
- Combinations: nCr = n! / [r! (n-r)!]
Given 4 students, Peter, John, Sue and Mary, three students are randomly selected from them.
How many arrangements are there (order matters)? Ans: 4P3 = 4!/1! = 24
How many combinations are there (order does NOT matter)? Ans: 4C3 = 4!/(3! 1!) = 4
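The student example can be checked with Python's `math.perm` and `math.comb` (Python 3.8 or later), and by listing every case with `itertools`. The coin-then-die line is a hypothetical illustration of the first counting rule:

```python
import math
from itertools import combinations, permutations, product

students = ["Peter", "John", "Sue", "Mary"]

# Formulas
print(math.perm(4, 3))   # 4!/(4-3)! = 24 ordered arrangements
print(math.comb(4, 3))   # 4!/(3! 1!) = 4 unordered selections

# Brute-force check by listing every case
print(len(list(permutations(students, 3))))   # 24
print(len(list(combinations(students, 3))))   # 4

# Counting rule: toss a coin (k1 = 2 outcomes), then roll a die (k2 = 6 outcomes)
print(len(list(product("HT", range(1, 7)))))  # 2 x 6 = 12
```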
Probability rules:
$0 \le P(A) \le 1$ for any event A
$P(S) = 1$, where S is the sample space
$P(A) + P(A') = 1$, or $P(A') = 1 - P(A)$, for any event A
Addition Rule:
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$ when A and B are not mutually exclusive
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$ when A and B are mutually exclusive
Conditional Probability
P(A|B) means the probability that event A will occur, given that event B has occurred; or simply, the probability of A given B.
Formulas for conditional probability:
1. $P(A \mid B) = P(A \cap B) / P(B)$
2. $P(A \cap B) = P(A \mid B) \times P(B)$
3. $P(B) = P(A \cap B) / P(A \mid B)$
Independent events:
E and F are independent events if P(E|F) = P(E) or P(F|E) = P(F).
Multiplication Rule (by formula 2):
For dependent events:
$P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\,P(B)$, or equivalently $P(A \cap B) = P(B \mid A)\,P(A)$
For independent events:
$P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B)$, since $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
Law of Total Probabilities:
Suppose that the sample space S consists of n mutually exclusive and collectively exhaustive events B1, B2, ..., Bn. Then the probability of any event A
is the sum of the joint probabilities of A occurring with B1, with B2, and so on up to Bn:
$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_n)P(B_n)$
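Bayes' Theorem:
$P(B_i \mid A) = \dfrac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_n)P(B_n)}$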
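A worked sketch of the law of total probability and Bayes' theorem, using exact fractions. The three-machine scenario and its percentages are hypothetical; B1, B2 and B3 play the role of the mutually exclusive and collectively exhaustive events, and A is "the item is defective":

```python
from fractions import Fraction as F

# Hypothetical example: machines B1, B2, B3 make 50%, 30% and 20% of output,
# with defect rates of 2%, 3% and 5% respectively.
prior = {"B1": F(1, 2), "B2": F(3, 10), "B3": F(1, 5)}             # P(Bi)
likelihood = {"B1": F(2, 100), "B2": F(3, 100), "B3": F(5, 100)}   # P(A | Bi)

def total_probability(prior, likelihood):
    """P(A) = sum of P(A | Bi) P(Bi) over all i."""
    return sum(likelihood[b] * prior[b] for b in prior)

def bayes(prior, likelihood):
    """P(Bi | A) = P(A | Bi) P(Bi) / P(A) for each i."""
    p_a = total_probability(prior, likelihood)
    return {b: likelihood[b] * prior[b] / p_a for b in prior}

print(total_probability(prior, likelihood))   # 29/1000
print(bayes(prior, likelihood))               # posterior probabilities 10/29, 9/29, 10/29
```

So although B3 makes only 20% of the output, it accounts for 10/29 (about 34%) of the defective items.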
Chapter 8: Probability Distributions
A probability distribution: a specification (in the form of a graph, a table or a function) of the probability associated with each value of the random variable.
Probability Mass Function (p.m.f.): a probability distribution involving only discrete values of x.
Cumulative Mass Function (c.m.f.): the sum of the values of the probability mass function for all values of the random variable X that are less than or equal to
x.
Expected value of X: $E[X] = \sum_{\text{all } x} x\,p(x)$
The variance of a discrete random variable: $V[X] = \sigma^2 = E[(X-\mu)^2] = \sum_{\text{all } x} (x-\mu)^2\,p(x)$, S.D. $\sigma = \sqrt{V[X]}$
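A sketch of the expected value, variance and c.m.f. formulas for a fair die (a hypothetical example), using exact fractions so the results can be compared with a hand calculation:

```python
import math
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x p(x) over all x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V[X] = sum of (x - mu)^2 p(x) over all x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die: p(x) = 1/6 for x = 1..6
assert sum(die.values()) == 1                     # a valid p.m.f. sums to 1

mu, var = expected_value(die), variance(die)
print(mu, var, math.sqrt(var))   # 7/2, 35/12 and an S.D. of about 1.708

# c.m.f. at x = 2: P(X <= 2) = p(1) + p(2)
print(sum(p for x, p in die.items() if x <= 2))   # 1/3
```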
Binomial Distribution:
Conditions:
- Each observation can be classified as one of two mutually exclusive events (i.e. success or failure).
- The probability of each of the two possible outcomes must be constant from observation to observation.
- The result of any observation is independent of the result of any other observation.
$P(x \text{ successes in } n \text{ trials}) = P(X = x \mid n, p) = {}^{n}C_{x}\, p^{x} q^{n-x}$, where $q = 1 - p$. Notation: $X \sim B(n, p)$ or $b(x; n, p)$
Mean $\mu$ = expected value = $E[X] = np$
Variance = $V[X] = \sigma^2 = npq$, S.D. $\sigma = \sqrt{npq}$
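A sketch that evaluates the binomial formula with `math.comb` and checks the mean and variance formulas numerically; n = 10 and p = 0.3 are hypothetical values:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x | n, p) = nCx p^x q^(n-x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.3                                    # hypothetical: 10 trials, P(success) = 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))                # should equal np = 3
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))   # should equal npq = 2.1
print(round(binomial_pmf(2, n, p), 4))            # 0.2335
print(round(mean, 6), round(var, 6), round(math.sqrt(var), 4))
```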
Poisson Distribution
Determines the probability of x occurrences per unit of time. The only parameter is the mean rate lambda ($\lambda$).
Four basic assumptions:
- It is possible to divide the time interval of interest into many sub-intervals.
- The probability of an occurrence remains constant throughout the time interval.
- The probability of two or more occurrences in a sub-interval is small enough to be ignored.
- Occurrences are independent of one another.
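Mean = $\mu = \lambda$
Variance = $\sigma^2 = \lambda$, S.D. $\sigma = \sqrt{\lambda}$
General formula (over an interval of length t):
Mean = $\mu = \lambda t$
Variance = $\sigma^2 = \lambda t$, S.D. $\sigma = \sqrt{\lambda t}$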
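The mean and variance above can be checked with the standard Poisson probability function P(X = x) = e^(-μ) μ^x / x!, with μ = λt. In this sketch the arrival rate and interval are hypothetical, and the infinite sum is truncated at x = 59, which is ample for μ = 6:

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) mu^x / x!, where mu = lambda * t."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

lam, t = 3, 2      # hypothetical: 3 arrivals per hour, observed for 2 hours
mu = lam * t       # mean number of arrivals in the interval = 6
probs = [poisson_pmf(x, mu) for x in range(60)]   # terms beyond x = 59 are negligible here
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print(round(poisson_pmf(0, mu), 5))   # P(no arrivals) = e^-6, about 0.00248
print(round(mean, 6), round(var, 6))  # both equal lambda * t = 6
```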
Poisson Approximation to Binomial Distribution
Necessary condition: n is large (normally greater than 100) and p is small (preferably close to zero).
If the condition holds, we can approximate the binomial distribution by the Poisson distribution using $\lambda = \mu = np$.
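A quick numerical check of the approximation, with hypothetical values n = 200 and p = 0.01 (so λ = np = 2):

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 200, 0.01   # hypothetical: n large, p small
lam = n * p        # lambda = mu = np = 2
for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

For these values the two columns differ by less than 0.002 for every x shown.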
Normal Distribution
The curve is completely symmetrical about the mean.
Two parameters describe the normal distribution: $\mu$, representing the mean, and $\sigma$, representing the standard deviation.
Notation: $X \sim N(\mu, \sigma^2)$ or $n(x; \mu, \sigma^2)$
Standard Normal Distribution: $\mu = 0$, $\sigma = 1$
We can transform all the observations of any normal random variable X into a new set of observations of a normal random variable Z with mean 0
and variance 1, by the transformation $Z = (X - \mu)/\sigma$.
Normal Approximation to the Binomial Distribution: condition $np \ge 5$ and $nq \ge 5$; $\mu = np$, $\sigma = \sqrt{npq}$
Continuity correction factor: add or subtract 0.5 to or from the discrete value before standardizing (e.g. $P(X \le x) \approx P(Z \le (x + 0.5 - \mu)/\sigma)$)
Normal Approximation to the Poisson Distribution: condition $\lambda \ge 5$; $\mu = \lambda$, $\sigma = \sqrt{\lambda}$
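A sketch of the standardization Z = (X - μ)/σ and of the normal approximation to the binomial, using `math.erf` for the standard normal distribution function. The N(100, 15^2) example and B(50, 0.4) are hypothetical; the continuity correction replaces P(X ≤ 18) by P(X < 18.5):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), computed via Z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binomial_cdf(x, n, p):
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

# Hypothetical: X ~ N(100, 15^2), so P(X < 130) = P(Z < 2)
print(round(normal_cdf(130, 100, 15), 4))   # 0.9772

# Normal approximation to B(50, 0.4): np = 20 >= 5 and nq = 30 >= 5
n, p = 50, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
exact = binomial_cdf(18, n, p)
with_cc = normal_cdf(18 + 0.5, mu, sigma)   # with continuity correction
without_cc = normal_cdf(18, mu, sigma)      # without correction
print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```

For these values the corrected approximation is much closer to the exact binomial probability than the uncorrected one.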
Chapter 9: Linear Regression & Correlation Analysis
Linear regression: concentrates on describing the nature of the relationship between two variables. Understand the difference between the dependent variable (Y) and the independent
variable (X).
The regression equation:
**Remarks: The estimated regression equation is valid only over the same range as the one from which the sample was taken initially.
Even if there is a relationship between X and Y, it does not imply that X causes Y.
The Standard Error of the Estimate:
SEE:
Linear Correlation: determines the strength of the linear relationship between two variables.
Correlation Coefficient (r):
- r must range from -1 to +1.
- Negative values correspond to lines with negative slopes.
- Positive values correspond to lines with positive slopes.
- If $|r| \ge 0.7$, a strong linear relationship can be concluded; otherwise a weak relationship is concluded.
Coefficient of Determination ($r^2$):
- $r^2$ is the proportion (percentage) of data variation explained by the regression equation.
- If $r^2 \ge 0.5$, at least 50% of the data variation is explained by the estimated regression line, and the regression equation is
concluded to be a good fit for the sample data.
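A sketch of the least-squares line ŷ = a + bX and the related measures, using the standard formulas b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)^2, a = ȳ - b x̄, r = Sxy/√(Sxx Syy) and SEE = √(Σ(y - ŷ)^2/(n - 2)). The advertising and sales figures are hypothetical:

```python
import math

def simple_regression(x, y):
    """Least-squares line y-hat = a + b x, with r, r^2 and the standard error of the estimate."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                       # slope
    a = y_bar - b * x_bar               # intercept
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))      # standard error of the estimate
    return a, b, r, r ** 2, see

# Hypothetical data: advertising spend (X) and sales (Y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r, r2, see = simple_regression(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}, r^2 = {r2:.2f}, SEE = {see:.4f}")
```

For these data b = 0.6, a = 2.2, r is about 0.77 and r^2 = 0.6, so by the rules of thumb above the linear relationship would be called strong and the line a good fit.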
Spearman’s Coefficient of Rank Correlation
Rank Correlation Coefficient, $r_s$, measures the degree of correlation between two sets of ranks rather than between their actual numerical values.
To calculate $r_s$:
1. Rank the X's among themselves, giving rank 1 to the largest (or smallest), rank 2 to the second largest (or smallest), and so on.
2. Then rank the Y's similarly.
3. Find the difference, d, between the ranks of each X and its paired Y, and compute the sum of the squared differences, $\sum d^2$.
4. $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where n is the number of pairs.
$r_s$ ranges from -1 to 1 and is interpreted in the same way as r.
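A sketch of the four steps, assuming no tied values (ties would need average ranks). It gives rank 1 to the smallest value; ranking from the largest gives the same r_s as long as X and Y are ranked the same way. The judges' scores are hypothetical:

```python
def ranks(values):
    """Rank 1 = smallest value. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), where d is the difference between paired ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical example: two judges' scores for five candidates.
judge_a = [10, 20, 30, 40, 50]
judge_b = [12, 25, 21, 40, 45]
print(ranks(judge_a), ranks(judge_b))   # [1, 2, 3, 4, 5] [1, 3, 2, 4, 5]
print(spearman_rs(judge_a, judge_b))    # sum d^2 = 2, so r_s = 1 - 12/120 = 0.9
```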
Chapter 10: Index Numbers
An index number measures the change in a time-series variable in comparison with a base period (year).
Price Index: compares levels of prices from one period to another.
Consumer Price Index (CPI): measures the overall price change of a variety of consumer goods and services, and is used to define the cost of living.
Quantity Index: measures how much the quantity of a variable changes over time.
Value Index: measures changes in total monetary worth.
Composite Index: a single index that reflects a composite, or group, of changing variables (e.g. CPI).
Objectives of using Index Numbers:
1. Show changes in a series of data values over time
2. Compare data values for different periods
3. Compare the growth of manufacturing output
Index: ( (Value of that year) / (Value of base year) ) X 100
Market basket: the items (e.g. food) included in an index, together with the quantities in which they were purchased.
Criteria for determining the base period:
- The base period should be fairly recent, since an index number should help people compare present values with past values. For the comparison to
be meaningful, the base period should be recent enough that people remember its conditions; it is meaningless to say that prices are
200% above what they were in the Middle Ages.
- The base period should be a period of normal conditions for the series whose index is sought. If a year of war is chosen as the base year, the
consumption pattern may be abnormal in that year.
- Select a base period that allows comparability. For comparisons to be valid, the indexes should have the same base period. E.g. if Company A says its
index of material cost is 105 and Company B says its index is 120, the comparison is meaningless unless the base period is the same.
- Select a base period for which data are available. The base period should be a period for which accurate and complete data are available;
sometimes the census year is chosen as the base year.
Comparing the Laspeyres and Paasche indexes:
(i) The Paasche index requires the quantities to be measured each year, which can be a costly exercise; the Laspeyres index only requires them for the base
year.
(ii) The denominator $\sum p_0 q_n$ in the Paasche index changes each year, so we can only compare one year's Paasche index with the base year. For the
Laspeyres index the denominator $\sum p_0 q_0$ is fixed, so each year's index can be compared with any other year's index.
(iii) Because of (ii) above, Laspeyres index numbers for several different years can be compared directly, whereas with the Paasche index comparisons
can only be drawn directly between the current year and the base year.
(iv) The Paasche index keeps up with current purchasing patterns, since its weights (quantities) are updated each year. The weights of the Laspeyres
index become out of date.
Limitations of index numbers
- Index numbers are usually only approximations of changes in price or quantity over time, and must be interpreted with care.
- Weightings become out of date as time passes. Unless a Paasche index is used, the weightings will gradually cease to reflect current reality.
- New products or items may appear, and old ones cease to be significant. For example, spending has changed in recent years to include new items
such as domestic computers and video recorders, whereas demand for black & white televisions has declined. These changes would make the
weightings of a retail prices index for consumer goods out of date, and the base of the index would need revision.
- Sometimes the data used to calculate index numbers may be incomplete, out of date or inaccurate. For example, the quantity indices of imports
and exports are based on records supplied by traders, which may be prone to error or even falsification.
- The base year of an index should be a normal year, but there is probably no such thing as a perfectly normal year. Some error in the index will be
caused by untypical values in the base period.
- The "basket of items" in an index is often selective. For example, the Retail Prices Index (RPI) is constructed from a sample of households and, more
importantly, from a basket of only about 600 items.
- A national index cannot necessarily be applied to an individual town or region. For example, if the national index of wages rises from 100 to 115, we
cannot assume that the wages of people in Glasgow have gone up by 15%.
- It does not reflect changes in the quality of products.
Different kinds of index
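Laspeyres Index: $\dfrac{\sum p_n q_0}{\sum p_0 q_0} \times 100$
Paasche Index: $\dfrac{\sum p_n q_n}{\sum p_0 q_n} \times 100$
Laspeyres Quantity Index: $\dfrac{\sum q_n p_0}{\sum q_0 p_0} \times 100$
Paasche Quantity Index: $\dfrac{\sum q_n p_n}{\sum q_0 p_n} \times 100$
where $p_0, q_0$ are base-period prices and quantities, and $p_n, q_n$ are current-period prices and quantities.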
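A sketch computing all four indexes for a hypothetical three-item basket. The formulas in the comments are the standard weighted-aggregate definitions (base-period weights for Laspeyres, current-period weights for Paasche); every function takes the same four lists so they are easy to compare, even where one list is not used:

```python
def weighted_sum(prices, quantities):
    """Sum of price x quantity over the items in the basket."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price(p0, q0, pn, qn):
    # sum(pn q0) / sum(p0 q0) x 100: base-period quantities as weights
    return weighted_sum(pn, q0) / weighted_sum(p0, q0) * 100

def paasche_price(p0, q0, pn, qn):
    # sum(pn qn) / sum(p0 qn) x 100: current-period quantities as weights
    return weighted_sum(pn, qn) / weighted_sum(p0, qn) * 100

def laspeyres_quantity(p0, q0, pn, qn):
    # sum(qn p0) / sum(q0 p0) x 100: base-period prices as weights
    return weighted_sum(p0, qn) / weighted_sum(p0, q0) * 100

def paasche_quantity(p0, q0, pn, qn):
    # sum(qn pn) / sum(q0 pn) x 100: current-period prices as weights
    return weighted_sum(pn, qn) / weighted_sum(pn, q0) * 100

# Hypothetical basket of three items (rice, milk, bread).
p0, q0 = [10, 8, 5], [5, 10, 8]   # base-period prices and quantities
pn, qn = [12, 9, 6], [4, 12, 8]   # current-period prices and quantities

print(round(laspeyres_price(p0, q0, pn, qn), 2))     # 198/170 x 100 = 116.47
print(round(paasche_price(p0, q0, pn, qn), 2))       # 204/176 x 100 = 115.91
print(round(laspeyres_quantity(p0, q0, pn, qn), 2))  # 176/170 x 100 = 103.53
print(round(paasche_quantity(p0, q0, pn, qn), 2))    # 204/198 x 100 = 103.03
```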
Chain base index: the base year progresses one year at a time, so that each index is measured relative to the previous year, i.e. it is calculated with
respect to the immediately preceding time point. It shows how the rate of change is changing, as well as the extent of the change over the previous period.
This approach must be used when the basic nature of the commodity (or the components of the index) changes over the whole time period.
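A sketch contrasting fixed-base and chain-base indexes on a hypothetical price series. Multiplying the chain links together recovers the fixed-base index for the last year, which shows how the two are related:

```python
def fixed_base_index(values, base=0):
    """Each value relative to the chosen base period, x 100."""
    return [v / values[base] * 100 for v in values]

def chain_base_index(values):
    """Each value relative to the immediately preceding period, x 100."""
    return [curr / prev * 100 for prev, curr in zip(values, values[1:])]

prices = [50, 52, 55, 57.75]      # hypothetical average price in four successive years
print(fixed_base_index(prices))   # approximately [100, 104, 110, 115.5]
print(chain_base_index(prices))   # approximately [104, 105.77, 105]

# Chaining the links back together gives the fixed-base index for the last year.
product = 1.0
for link in chain_base_index(prices):
    product *= link / 100
print(product * 100)              # approximately 115.5
```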