INFERENTIAL STATISTICS
[Diagram: information from a sample (a statistic) is used to draw conclusions about the corresponding information in the population.]
Statistical inference falls into 2 branches:
(i) Statistical estimation
(ii) Hypothesis/significance testing
In estimation, a sample statistic is used to provide an approximation (estimate) of the population
parameter, e.g. the sample mean (x̄) can be used to estimate the population mean (µ).
Similarly, the sample standard deviation (s) estimates the population standard deviation (σ).
In hypothesis testing, the population characteristic(s), in the form of parameter(s), is/are known or
assumed. Sample result(s) are used to verify or confirm the population characteristics.
ESTIMATION
[Diagram: Sample and Population]
Estimation takes two forms:
(a) Point estimation
(b) Interval estimation
Point estimation
In point estimation we calculate a single value, or single point, from sample data to
approximate some unknown population parameter. The single value is calculated using some
formula referred to as a point estimator, e.g.
Point estimator for the population mean µ: x̄ = ∑fx / ∑f
Point estimator for the population standard deviation σ: s = √[∑f(x − x̄)² / ∑f]
The value obtained by substituting the data values into the estimator is known as the point estimate,
i.e. a specific value, e.g. x̄ = 15.
An estimator θ̂ of a parameter θ is derived on the basis of the characteristics of a good estimator.
These characteristics include:

Unbiasedness: E(θ̂) = θ, e.g. E(x̄) = µ.

Efficiency: the most efficient estimator has the smallest variance, i.e. if var(θ̂1) < var(θ̂2), then θ̂1
is more efficient than θ̂2.

Consistency: the estimate gets closer to the parameter as the sample size increases.

Sufficiency: a sufficient estimator uses all the values in the data.
Situations suitable for point estimate
1. Comparative studies e.g. comparing wealth of nations (per capita)
2. Where an almost exact estimate is required, e.g. the approximate order size for an expensive item.
3. Where error may lead to dangerous implications e.g. approximation of voltages.
Interval Estimation
A point estimate may be right or wrong. It does not incorporate the possible error or precision
of the estimate, and yet a sample value can never be an exact representation of the population
value. It is always associated with some kind of error or uncertainty. Interval estimation
incorporates the degree of error or precision in the estimator.
An interval estimator consists of two values, a lower value L and an upper value U, within which
some unknown population parameter θ lies with some specified probability (1 − α)100%:
P(L < θ < U) = 1 − α
The interval limits L and U are called the (1 − α)100% confidence limits, and the interval (L, U) is called the (1 − α)100% confidence interval.
It simply indicates that we are (1 - α) 100% confident or sure that the unknown population
parameter lies between L and U.
The theory of interval estimation is based on the concept of a sampling distribution. For instance, if:
Population size, N = 6
Sample size, n = 3
How many possible samples can be selected?
Let the number be S = 6C3 = 6! / [3! (6 − 3)!] = 20
Number of possible sample means (or θ̂ values) = 20
i.e. x̄1, x̄2, ..., x̄20
or θ̂1, θ̂2, ..., θ̂20
The θ̂'s are numerical values, so they can be arranged as a frequency table, which can be developed into a
probability distribution.
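The counting argument above can be checked by listing every sample. This is a minimal sketch: the six population values are hypothetical (the notes give only N = 6 and n = 3), and sampling is assumed to be without replacement.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical population of N = 6 values (not from the notes).
population = [2, 4, 6, 8, 10, 12]
n = 3

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
sample_means = [sum(s) / n for s in samples]

# Frequency table of the sample means, turned into a probability distribution.
freq = Counter(sample_means)
distribution = {m: f / len(samples) for m, f in sorted(freq.items())}

print("number of samples:", len(samples), "= 6C3 =", comb(6, 3))
for m, p in distribution.items():
    print(f"mean = {m:6.3f}  probability = {p:.2f}")
```

The probabilities sum to 1, and the average of the 20 sample means equals the population mean, which illustrates the unbiasedness of the sample mean.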
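For an estimator θ̂ of a parameter θ whose sampling distribution is approximately normal, with mean µ(θ̂) = θ and standard error SE(θ̂):
P(θ − Z(α/2) SE(θ̂) < θ̂ < θ + Z(α/2) SE(θ̂)) = 1 − α
[Diagram: normal curve of θ̂ centred at µ(θ̂), with limits µ(θ̂) − Z(α/2) SE(θ̂) and µ(θ̂) + Z(α/2) SE(θ̂)]
But since the intention is to estimate θ, the expression is rearranged algebraically to
isolate θ. It becomes:
P(θ̂ − Z(α/2) SE(θ̂) < θ < θ̂ + Z(α/2) SE(θ̂)) = 1 − α
The interval θ̂ ± Z(α/2) SE(θ̂) is therefore the (1 − α)100% confidence interval for θ.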
Interval estimation of the population mean, µ
Here θ = µ, θ̂ = x̄ and SE(x̄) = σx̄ = s/√n
Confidence limits are: µ = x̄ ± Z(α/2) s/√n
Suppose (a) α = 5% = 0.05 => 1 − α = 0.95 (95%)
Each tail has area α/2 = 0.025, so Z = 1.96
µ = x̄ ± 1.96 s/√n
P[x̄ − 1.96 s/√n ≤ µ ≤ x̄ + 1.96 s/√n] = 0.95
(b) α = 0.01 (1%) => 1 − α = 0.99 (99%)
Using the reasoning in (a) above, Z = 2.58
µ = x̄ ± 2.58 s/√n
P[x̄ − 2.58 s/√n ≤ µ ≤ x̄ + 2.58 s/√n] = 0.99
Z s/√n is called the error of estimate, which can be reduced by increasing the sample size n.
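The Z values and confidence limits above can be reproduced with Python's standard library. This is only a sketch: the function names are illustrative, and the sample figures (x̄ = 50, s = 10, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """Return (lower, upper) limits x_bar -/+ Z * s / sqrt(n)."""
    error = z_critical(confidence) * s / sqrt(n)  # error of estimate
    return x_bar - error, x_bar + error

print(round(z_critical(0.95), 2))  # 1.96
print(round(z_critical(0.99), 2))  # 2.58

# Hypothetical sample: mean 50, standard deviation 10, size 100.
lower, upper = mean_confidence_interval(50, 10, 100, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Raising n shrinks the error term Z s/√n, which is why a larger sample gives a narrower interval at the same confidence level.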
In real-life applications the maximum error allowed is usually specified, and the task becomes
that of determining the sample size that will guarantee this maximum error.
Let the maximum allowed error be E
=> Z s/√n ≤ E
=> Z² s² ≤ n E²
=> n ≥ Z² s² / E²
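The sample-size rule can be sketched as a small function. The name min_sample_size and the inputs below (s = 12, E = 3) are hypothetical and chosen only for illustration; the result is rounded up because n must be a whole number.

```python
from math import ceil

def min_sample_size(z, s, max_error):
    """Smallest whole n satisfying n >= z^2 * s^2 / E^2."""
    return ceil((z * s / max_error) ** 2)

# Hypothetical inputs: s = 12, maximum error E = 3.
print(min_sample_size(1.96, 12, 3))  # 95% level
print(min_sample_size(2.58, 12, 3))  # 99% level
```

A higher confidence level (larger Z) or a smaller allowed error E both push the required sample size up.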
Illustrations
1. An organization wishes to estimate its average monthly profit. The accountant picks a
random sample of 35 monthly profits from previous records. The sample indicates a mean
value of Ksh 105M and a standard deviation of Ksh 25M.
(a) Determine (i) 95% (ii) 99% confidence intervals for the actual mean monthly
profit.
(b) The company policy states that any estimate of the mean monthly profit should
be within a maximum error of Ksh 30M. What sample size should the accountant
pick to meet the company policy at the (i) 95% (ii) 99% confidence level?
µ = x̄ ± Z s/√n
(a) (i) µ = 105 ± 1.96 × 25/√35 = 105 ± 8.28
=> P[96.72 ≤ µ ≤ 113.28] = 0.95
=> (96.72, 113.28). This means that we are 95% confident that the actual mean
monthly profit lies between Ksh 96.72M and Ksh 113.28M.
(ii) µ = 105 ± 2.58 × 25/√35 = 105 ± 10.9
=> P[94.1 ≤ µ ≤ 115.9] = 0.99
=> (94.1, 115.9). This means that we are 99% confident that the actual mean
monthly profit lies between Ksh 94.1M and Ksh 115.9M.
(b) n ≥ Z² s² / E²
(i) n ≥ (1.96² × 25²) / 30² = 2.67 => n min = 3
(ii) n ≥ (2.58² × 25²) / 30² = 4.6225 => n min = 5
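SAMPLING DISTRIBUTION
For each population parameter, the sample statistic, its standard error and the confidence limits are:
1. Population mean, µ
Sample statistic: x̄
Standard error: s/√n
Confidence limits: x̄ ± Z s/√n
2. Population proportion, π
Sample statistic: p
Standard error: √[π(1 − π)/n], estimated by √[p(1 − p)/n] (or √[p(100 − p)/n] when p is a percentage)
Confidence limits: p ± Z √[p(1 − p)/n]
3. Difference between 2 population means, i.e. µ1 − µ2
Sample statistic: x̄1 − x̄2
Standard error: √[s1²/n1 + s2²/n2]
Confidence limits: (x̄1 − x̄2) ± Z √[s1²/n1 + s2²/n2]
4. Difference between 2 population proportions, i.e. π1 − π2
Sample statistic: p1 − p2
Standard error: √[p1(1 − p1)/n1 + p2(1 − p2)/n2]
Confidence limits: (p1 − p2) ± Z √[p1(1 − p1)/n1 + p2(1 − p2)/n2]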
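The standard errors listed for a proportion and for the two differences can be sketched as plain functions. The function names and the numbers in the usage lines are hypothetical, and Z = 1.96 assumes a 95% confidence level.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion p from a sample of size n."""
    return sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def confidence_limits(estimate, se, z=1.96):
    """Return (lower, upper) = estimate -/+ z * se."""
    return estimate - z * se, estimate + z * se

# Hypothetical sample: 80 of 400 items have the attribute, so p = 0.2.
p = 80 / 400
print(confidence_limits(p, se_proportion(p, 400)))

# Hypothetical two-sample case: means 52 and 50, s1 = 3, n1 = 36, s2 = 4, n2 = 64.
print(confidence_limits(52 - 50, se_diff_means(3, 36, 4, 64)))
```

Each case reuses the same pattern, estimate ± Z × standard error; only the standard error formula changes.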
Exercise
1. A manufacturing company wished to estimate the proportion of defective output from their latest
production line. For this purpose, a random sample of 800 units of output from the line was taken,
of which 150 units were defective.
(a) Determine the estimate of the defective level of the new production line at the (i) 95% (ii) 99% (iii)
90% confidence levels.
(b) Determine the minimum sample size to guarantee a maximum error of estimate of 5% at the (i)
95% (ii) 99% confidence levels.
2. National Supermarket Co. Ltd operates supermarkets in 2 regions, i.e. the CBD and the outskirts of
the city. They wish to estimate the average difference in their daily sales between the 2 regions.
A random sample of 55 daily sales from one region gave a mean of Ksh 2M and a standard
deviation of Ksh 200,000. A random sample of 40 daily sales from the other region gave a mean of
Ksh 1.6M and a standard deviation of Ksh 150,000. Determine an estimate for the difference
between the mean daily sales for the 2 regions at the 95% confidence level.
3. A presidential candidate wishes to determine the difference in his popularity between the
Central and Western regions of the country. His campaign advisors picked a random
sample of 2,000 registered voters from the Central region, of whom 1,200 favored his candidature.
A random sample of 1,400 voters from the Western region indicated that 800 were in favor of the
candidate. Based on these results, determine the difference in the candidate's popularity in
the 2 regions at the (i) 95% (ii) 99% confidence levels.
HYPOTHESIS/SIGNIFICANCE TESTING
[Diagram: sample statistics (x̄, s) compared with population parameters (µ, σ)]
In hypothesis testing the sample statistic is used to confirm/ascertain the population parameter.
Does the sample value/statistic differ significantly from the population value?
If they differ significantly, then the population is not the one we had in mind, it has changed.
e.g. Suppose it is claimed that a coin is fair, i.e. P(H) = ½.
Toss the coin 100 times.
Suppose:
(a) n(H) = 48
Accept that the coin is fair: the sample value is not significantly different from the expected population value.
(b) n(H) = 30
Reject that the coin is fair.
Reason: the sample value is significantly different from the expected population value.
The claim being tested is the null hypothesis, e.g. Ho: P(H) = ½. When Ho is rejected or nullified,
the alternative hypothesis, denoted as HA or H1, is accepted,
e.g. HA: µ < 2kg
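The coin example can be made concrete by standardizing the observed proportion of heads. This is a sketch under two assumptions not stated for this example in the notes: the normal approximation to the proportion of heads, and a two-tail cut-off of 1.96 (5% significance level).

```python
from math import sqrt

def coin_z(heads, tosses, p0=0.5):
    """Standardized difference between the observed and expected proportion of heads."""
    p_hat = heads / tosses
    se = sqrt(p0 * (1 - p0) / tosses)  # standard error if the coin is fair
    return (p_hat - p0) / se

for heads in (48, 30):
    z = coin_z(heads, 100)
    verdict = "reject Ho" if abs(z) > 1.96 else "fail to reject Ho"
    print(f"n(H) = {heads}: Z = {z:.2f} -> {verdict}")
```

With 48 heads the standardized difference is small (about −0.4), while 30 heads is about 4 standard errors below the expected value, which is why only the second result counts as significant.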
Steps in hypothesis testing
1. State the hypothesis. This takes two forms:
(a) Null hypothesis, denoted as Ho, and
(b) Alternative hypothesis, denoted as H1 or HA
2. Choose significance level
[Diagram: normal curve centred at µ with limits µ − Zσ and µ + Zσ; the rejection region in each tail has area α/2]
Common significance levels, α
(i) α = 0.05 (5%), confidence level 1 − α = 0.95 (95%).
If Ho is rejected at 5%, then the difference (sample − population) is said to be significant.
(ii) α = 0.01 (1%), confidence level = 0.99 (99%).
If Ho is rejected at 1%, the difference is said to be very significant.
(iii) α = 0.001 (0.1%), confidence level = 0.999 (99.9%).
If Ho is rejected at 0.1%, the difference is said to be very highly significant.
The level chosen depends on the implications of rejecting the null hypothesis. If the implications
are serious, e.g. dismissing an employee, then the significance level should be small, like 0.01 or
0.001.
3. Decision rule
Decide whether to use a two-tail test or a one-tail test, depending on the alternative hypothesis, e.g. for Ho: µ = µo = 2kg
(a) HA: µ ≠ µo, i.e. either too high or too low
=> Two-tail test
If α = 0.05, each tail has area 0.025 and the critical values are µo − 1.96 σx̄ and µo + 1.96 σx̄
=> Reject Ho when the sample statistic, in this case x̄, satisfies
x̄ > µo + 1.96 σx̄ or x̄ < µo − 1.96 σx̄
OR
In standardized form, reject Ho when |Z| = |x̄ − µo| / σx̄ > 1.96
(b) HA: µ > µo, i.e. only too high
=> One-tail test (right tail)
If α = 0.05, the area between µo and the critical value is 0.45, so Z = 1.645
=> Reject Ho when x̄ > µo + 1.645 σx̄
In standardized form, reject Ho when Z > 1.645
(c) HA: µ < µo, i.e. only too low
=> One-tail test (left tail)
=> Reject Ho when x̄ < µo − 1.645 σx̄
In standardized form, reject Ho when Z < −1.645
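The three decision rules can be sketched in standardized form as follows. The function names and the tail labels "two", "right" and "left" are illustrative choices, not notation from the notes.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Positive critical Z: alpha is split across both tails only for a two-tail test."""
    tail_area = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def reject_h0(z, alpha=0.05, tail="two"):
    """Apply the decision rule to a standardized test statistic z."""
    if tail not in ("two", "right", "left"):
        raise ValueError("tail must be 'two', 'right' or 'left'")
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "right":
        return z > c
    return z < -c

print(round(critical_value(0.05, "two"), 2))    # 1.96
print(round(critical_value(0.05, "right"), 3))  # 1.645

# The same Z = 1.8 leads to different decisions depending on the alternative hypothesis.
print(reject_h0(1.8, tail="two"), reject_h0(1.8, tail="right"))
```

Because a one-tail test puts all of α in one tail, its critical value is smaller, so a result can be significant in a one-tail test but not in a two-tail test at the same α.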
4. Obtain the necessary sample data and then determine the required test value/test statistic
in either (i) absolute form, i.e. θ̂, or
(ii) standardized form, Z = (θ̂ − θo) / SE(θ̂), where θo is the value of the parameter under Ho
e.g. for
(a) Ho: µ = µo
Test statistic, Z = (x̄ − µo) / (s/√n)
(b) Ho: π = πo
Test statistic, Z = (p − πo) / √[πo(1 − πo)/n]
(c) Ho: µ1 = µ2 or µ1 − µ2 = 0
Sample statistic = x̄1 − x̄2
Test statistic, Z = [(x̄1 − x̄2) − (µ1 − µ2)] / √[s1²/n1 + s2²/n2]
Since µ1 − µ2 = 0, then
=> Z = (x̄1 − x̄2) / √[s1²/n1 + s2²/n2]
(d) Ho: π1 = π2 or π1 − π2 = 0
Sample statistic = p1 − p2
Test statistic, Z = [(p1 − p2) − (π1 − π2)] / √[p1(1 − p1)/n1 + p2(1 − p2)/n2]
Since π1 − π2 = 0, then
=> Z = (p1 − p2) / √[p1(1 − p1)/n1 + p2(1 − p2)/n2]
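The four test statistics can be written as short functions. This is a sketch that follows the formulas as given in these notes (for instance, the two-proportion statistic uses the unpooled standard error); the function names and the sample figures in the usage lines are hypothetical.

```python
from math import sqrt

def z_mean(x_bar, mu0, s, n):
    """(a) Ho: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

def z_proportion(p, pi0, n):
    """(b) Ho: pi = pi0, with the standard error computed under Ho."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

def z_two_means(x1, s1, n1, x2, s2, n2):
    """(c) Ho: mu1 - mu2 = 0."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

def z_two_proportions(p1, n1, p2, n2):
    """(d) Ho: pi1 - pi2 = 0, unpooled standard error as in the notes."""
    return (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical figures, chosen only to show the calls.
print(round(z_mean(52, 50, 10, 100), 3))
print(round(z_proportion(0.55, 0.5, 400), 3))
print(round(z_two_means(52, 3, 36, 50, 4, 64), 3))
print(round(z_two_proportions(0.6, 100, 0.5, 100), 3))
```

Each value is then compared with the critical Z from the decision rule in step 3.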
5. Conclusion/inference
It is categorized into two stages:
i. Statistical conclusion
(a) Reject Ho
(b) Fail to reject/nullify Ho, i.e. evidence from the sample is not sufficient to reject it
ii. Managerial conclusion
Express the statistical conclusion in a language that will be understood by other stakeholders,
i.e. layman's language.
EXERCISE
1. The mean time taken for the setting of jam is 70 minutes with a standard deviation of 8
minutes. It is known that the quality of the jam can be improved by changing one of the
ingredients, but it is not clear whether this would affect the setting time. To investigate
this concern, 40 batches of the new jam are produced and their measurements indicate an
average setting time of 78 minutes. Respond to the concern at (i) α = 0.05 (5%) (ii) α =
0.01.
2. The manufacturer of a certain pelt believes that the pelt has a 45% share of the market. A
market research survey was conducted and showed that, out of a random sample of 1000
consumers, 400 bought the manufacturer's pelt and the rest bought other brands. Test
the manufacturer's belief.
3. In the testing of cake recipes, a quality assessment is made by awarding marks for the
cake quality. Two recipes are tested to find out if they differ in quality. The 1st recipe is
tested by 40 people, whose average mark was 50 with a standard deviation of 10. The 2nd
recipe was tested by 30 people, and the average mark was 45 with a standard deviation of 8.
Do these results indicate any difference in appeal between the 2 recipes?
4. The Ministry of Transport has recently carried out an intensive advertising campaign to
encourage vehicle drivers to use their seat belts. A survey was carried out in Nairobi
and Kisumu with a view to determining the percentage of vehicle drivers who use seat belts after
the campaign. A random sample of 1000 drivers was observed in Nairobi and 260 of
them used seat belts. In Kisumu, 500 drivers were observed and 100 of them used seat
belts. Do these results indicate that a greater proportion of drivers in Nairobi use seat belts
than drivers in Kisumu?