Statistical inference:
confidence intervals and hypothesis
testing
Objective
The objective of this session is to cover:
 Inferential statistics
 Sampling theory
 Estimation and confidence intervals
 Hypothesis testing
Statistical analysis
 Descriptive
calculate various types of descriptive statistics in order to
summarize certain qualities of the data
 Inferential
use information gained from the descriptive statistics of
sample data to generalize to the characteristics of the
whole population
Inferential statistics applications
2 broad areas
 Estimation
 create confidence intervals to estimate the true
population parameter
 Hypothesis testing
 test the hypothesis that the population parameter lies within
a specified range
Population & Sample
                     population   sample
mean:                $\mu$        $\bar{X}$
standard deviation:  $\sigma$     $s$
Sampling theory
 When working with samples of data, we rely on sampling
theory to give us the probability distribution of the
particular sample statistic
 This probability distribution is known as
“the sampling distribution”
Sampling distributions
 Assume there is a population of four individuals: A, B, C, D
 Population size N=4
 Random variable, X, is age of individuals
 Values of X: 18, 20, 22, 24 measured in years
Sampling distributions
Summary measures for the Population Distribution
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 2.236$
[Figure: population distribution. Vertical axis: P(X); horizontal axis: A (18), B (20), C (22), D (24)]
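The population summary measures can be checked with a few lines of Python. This is a minimal sketch using only the four ages given on the slide; the variable names are illustrative.

```python
import math

# Ages of the four individuals A, B, C, D (the whole population)
ages = [18, 20, 22, 24]
N = len(ages)

# Population mean: sum of all values divided by the population size
mu = sum(ages) / N

# Population standard deviation: divide by N (not N - 1),
# because this is the full population rather than a sample
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)

print(f"mu = {mu}, sigma = {sigma:.3f}")
```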
Sampling distributions
Summary measures of sampling distribution
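$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{18 + 19 + 19 + \dots + 24}{16} = 21$
$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \dots + (24 - 21)^2}{16}} = 1.58$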
Properties of summary measures
Sampling distribution of the sample arithmetic mean
$\mu_{\bar{X}} = \mu$
Sampling distribution of the standard deviation of the
sample means
$SE = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
$\bar{X} \sim N(\mu, s^2 / n)$
Estimation and confidence intervals
 Estimation of the population parameters:
 point estimates
 confidence intervals or interval estimators
 Confidence intervals for:
 Means
Large or Small samples ???
 Variance
Confidence intervals for means
large samples (n >= 30)
 apply Z-distribution
[Figure: probability distribution with a central $1 - \alpha$ confidence interval around $\mu$ and $\alpha/2$ in each tail]
Confidence intervals for means
large samples (n >= 30)
 For a normally distributed variable, 95% of
the observations lie within plus or minus 1.96
standard deviations of the mean
Confidence intervals for means
large samples (n >= 30)
 The confidence interval is given as
$\mu \pm 1.96 \frac{s}{\sqrt{n}}$
[Figure: probability distribution centred on $\mu$; 95% confidence interval from -1.96 SE to +1.96 SE, with 2.5% in each tail]
Confidence intervals for means
large samples (n >= 30)
$p\left(\bar{X} - 1.96 \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{s}{\sqrt{n}}\right) = 0.95$
[Figure: probability distribution centred on $\mu$; 95% confidence interval from -1.96 SE to +1.96 SE, with 2.5% in each tail]
Confidence intervals for means
large samples (n >= 30)
 Thus, we can state that:
“the sample mean will lie within an interval plus
or minus 1.96 standard errors of the population
mean 95% of the time”
Confidence intervals for means
large samples (n >= 30)
Example
We have data on 60 monthly observations of the
returns to the SET 100 index. The sample mean
monthly return is 1.125% with a standard
deviation of 2.5%. What is the 95% confidence
interval for the mean ???
Confidence intervals for means
large samples (n >= 30)
Example (cont’d)
 Standard error is calculated as
$SE = \frac{2.5}{\sqrt{60}} = 0.3227$
 the confidence interval would be
$1.125 - 0.6325 \le \mu \le 1.125 + 0.6325$
$0.4925 \le \mu \le 1.7575$
 The probability statement would be
$p(0.4925 \le \mu \le 1.7575) = 0.95$
Confidence intervals for means
large samples (n >= 30)
Example (cont’d)
 The probability statement would be
p0.4925    1.7575   0.95
 How does the analyst use this information ???
Confidence intervals for means
What about small samples (n < 30)
 apply t-distribution
[Figure: probability distribution with a central $1 - \alpha$ region and $\alpha/2$ in each tail]
Confidence intervals for means
What about small sample ??? (n < 30)
 Apply t-distribution
 The confidence interval becomes
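$\bar{X} - t_{n-1,\alpha/2} \frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2} \frac{S_{n-1}}{\sqrt{n}}$
 The probability statement pertaining to this confidence
interval is
$p\left(\bar{X} - t_{n-1,\alpha/2} \frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2} \frac{S_{n-1}}{\sqrt{n}}\right) = 1 - \alpha$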
Confidence intervals for means
Example
 From 20 observations, the sample mean is calculated as
4.5%. The sample standard deviation is 5%. At the 95%
level of confidence:
the confidence interval is …
the probability statement is …
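One way to work through this example is sketched below. The critical value 2.093 is the standard t-table entry for 19 degrees of freedom with 2.5% in each tail; it is hard-coded here as an assumption instead of being computed. The function name is illustrative.

```python
import math

# t-table value for n - 1 = 19 degrees of freedom, 2.5% in each tail (assumed from tables)
T_CRIT_19_DF = 2.093

def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper) for a small-sample CI of the mean."""
    se = sample_sd / math.sqrt(n)   # standard error using the sample standard deviation
    margin = t_crit * se
    return sample_mean - margin, sample_mean + margin

# 20 observations, sample mean 4.5%, sample standard deviation 5%
lower, upper = t_confidence_interval(4.5, 5.0, 20, T_CRIT_19_DF)

print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```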
Confidence intervals for variances
 Apply a $\chi^2$ distribution
 The confidence interval is given as
$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$
 The probability statement pertaining to this confidence
interval is
$p\left(\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right) = 1 - \alpha$
Confidence intervals for variances
Example
 From a sample of 30 monthly observations the variance
of the FTSE 100 index is 0.0225. With n-1 = 29 degrees
of freedom (leaving 2.5% in each tail)
the confidence interval is …
the probability statement is …
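A possible way to work through this example is sketched below. The two chi-square critical values for 29 degrees of freedom (45.722 with 2.5% in the upper tail, 16.047 with 2.5% in the lower tail) are taken from standard tables and hard-coded as assumptions. The function name is illustrative.

```python
# Chi-square table values for 29 degrees of freedom (assumed from tables)
CHI2_UPPER_2_5PCT = 45.722   # leaves 2.5% in the upper tail
CHI2_LOWER_2_5PCT = 16.047   # leaves 2.5% in the lower tail

def variance_confidence_interval(sample_var, n, chi2_upper, chi2_lower):
    """Return (lower, upper) bounds for the population variance."""
    scaled = (n - 1) * sample_var
    # Dividing by the larger critical value gives the lower bound, and vice versa
    return scaled / chi2_upper, scaled / chi2_lower

# FTSE 100 example: 30 observations, sample variance 0.0225
lower, upper = variance_confidence_interval(
    0.0225, 30, CHI2_UPPER_2_5PCT, CHI2_LOWER_2_5PCT
)

print(f"95% CI: {lower:.4f} <= sigma^2 <= {upper:.4f}")
```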
Hypothesis testing
2 Broad approaches
 Classical approach
 P-value approach
 A hypothesis is an assumption about the value of a
population parameter of the probability
distribution under consideration
Hypothesis testing
 When testing, 2 hypotheses are established
 the null hypothesis
 the alternative hypothesis
 The exact formulation of the hypothesis depends upon
what we are trying to establish
 e.g. we wish to know whether or not a population
parameter, $\theta$, has a value of $\theta_0$
$H_0: \theta = \theta_0$
$H_1: \theta \ne \theta_0$
Hypothesis testing
 If we wish to know whether or not a population
parameter, $\theta$, is greater than a given figure $\theta_0$, the
hypothesis would then be …
 And if we wish to know whether or not a population
parameter is less than a given figure $\theta_0$, the
hypothesis would then be …
The standardized test statistic
 In hypothesis testing we have to standardize the test
statistic so that a meaningful comparison can be made
with the
MEAN
 Standard normal (z-distribution)
 t-distribution
VARIANCE
 $\chi^2$ distribution
 The hypothesis test may be
 One-tailed test
 Two-tailed test
Hypothesis test of the population mean
Two-tailed test of the mean
 Set up the hypotheses as
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$
 Decide on the level of significance for the test (10, 5, 1%
level etc.) and establish 5, 2.5, 0.5% in each tail
 Set the value of $\mu_0$ in the null hypothesis
 Identify the appropriate critical value of z (or t) from the
tables (reflect the percentages in the tails according to
the level of significance chosen)
Hypothesis test of the population mean
Two-tailed test of the mean
 Applying the following decision rule:
Accept H0 if $-z \le \frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}} \le z$
Reject H0 otherwise
Hypothesis test of the population mean
Example
 Consider a test of whether or not the mean of a portfolio
manager’s monthly returns of 2.3% is statistically
significantly different from the industry average of 2.4%.
(from 36 observations with a standard deviation of 1.7%)
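The two-tailed decision rule can be applied to this example as sketched below. The 5% significance level, and hence the critical z-value of 1.96, is an assumption, since the example does not state a level. The function name is illustrative.

```python
import math

def z_statistic(sample_mean, mu_0, sample_sd, n):
    """Standardized test statistic (X-bar - mu_0) / sqrt(s^2 / n)."""
    return (sample_mean - mu_0) / math.sqrt(sample_sd ** 2 / n)

Z_CRIT = 1.96  # assumed 5% significance level, 2.5% in each tail

# Portfolio manager: mean 2.3%, industry average 2.4%, s = 1.7%, n = 36
z = z_statistic(2.3, 2.4, 1.7, 36)

# Two-tailed rule: accept H0 when the statistic lies between -z and +z
accept_h0 = -Z_CRIT <= z <= Z_CRIT

print(f"z = {z:.4f}")
print("accept H0" if accept_h0 else "reject H0")
```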
Hypothesis test of the population mean
Example
 An analyst claims that the average annual rate of return generated
by a technical stock selection service is 15% and recommends that
his firm use the services as an input for its research product. The
analyst’s supervisor is skeptical of this claim and decides to test its
accuracy by randomly selecting 16 stocks covered by the service
and computing the rate of return that would have been earned by
following the service’s recommendations with regards to them over
the previous 10-year period. The results of this sample are as follows:
 The average annual rate of return produced by following the service’s
advice on the 16 sample stocks over the past 10 years was 11%
 The standard deviation in these sample results was 9%
Determine whether the analyst’s claim should be accepted or
rejected at the 5% level of significance ???
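Because this sample has only 16 observations, a sketch of the test would use the t-distribution. The critical value 2.131 is the standard t-table entry for 15 degrees of freedom with 2.5% in each tail, hard-coded here as an assumption; treating the test as two-tailed is also an assumption. The function name is illustrative.

```python
import math

def t_statistic(sample_mean, mu_0, sample_sd, n):
    """Standardized test statistic (X-bar - mu_0) / sqrt(s^2 / n)."""
    return (sample_mean - mu_0) / math.sqrt(sample_sd ** 2 / n)

# t-table value for n - 1 = 15 degrees of freedom, 2.5% in each tail (assumed from tables)
T_CRIT_15_DF = 2.131

# Claimed return 15%, sample mean 11%, sample standard deviation 9%, n = 16
t = t_statistic(11.0, 15.0, 9.0, 16)

# Two-tailed rule: accept H0 (mu = 15) when the statistic lies between -t and +t
accept_h0 = -T_CRIT_15_DF <= t <= T_CRIT_15_DF

print(f"t = {t:.4f}")
print("accept H0" if accept_h0 else "reject H0")
```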
Hypothesis test of the population mean
One-tailed test of the mean (Right-tailed tests)
 Set up the hypotheses as
$H_0: \mu \le \mu_0$
$H_1: \mu > \mu_0$
 Applying the following decision rule:
Accept H0 if $\frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}} \le z$
Reject H0 if $\frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}} > z$
Hypothesis test of the population mean
Example
 We wish to test whether the mean monthly return on the
FTSE 100 index for a given period is more than 1.2%.
From 60 observations we calculate the mean as 1.25%
and the standard deviation as 2.5%.
Hypothesis test of the population mean
Example
 We wish to test that the mean monthly return on the
S&P500 index is less than 1.30%. Assume also that the
mean return from 75 observations is 1.18%, with a
standard deviation of 2.2%.
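Both one-tailed examples can be sketched with the same test statistic. The 5% significance level (critical z-value 1.645) is an assumption, since neither example states a level. The left-tailed hypotheses for the S&P500 case (H0: mean at least 1.30, H1: mean below 1.30) are likewise assumed by mirroring the right-tailed setup.

```python
import math

def z_statistic(sample_mean, mu_0, sample_sd, n):
    """Standardized test statistic (X-bar - mu_0) / sqrt(s^2 / n)."""
    return (sample_mean - mu_0) / math.sqrt(sample_sd ** 2 / n)

Z_CRIT_ONE_TAIL = 1.645  # assumed 5% significance level, all 5% in one tail

# Right-tailed test: is the FTSE 100 mean monthly return more than 1.2%?
z_ftse = z_statistic(1.25, 1.2, 2.5, 60)
reject_ftse = z_ftse > Z_CRIT_ONE_TAIL

# Left-tailed test: is the S&P500 mean monthly return less than 1.30%?
z_sp = z_statistic(1.18, 1.30, 2.2, 75)
reject_sp = z_sp < -Z_CRIT_ONE_TAIL

print(f"FTSE 100: z = {z_ftse:.4f}, reject H0: {reject_ftse}")
print(f"S&P500:   z = {z_sp:.4f}, reject H0: {reject_sp}")
```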
Hypothesis test of the population variance
Two-tailed test
 Applying the following decision rule:
Accept H0 if $\chi^2_{(1-(\alpha/2))} \le \frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{(\alpha/2)}$
Reject H0 otherwise
One-tailed test
 Applying the following decision rule:
Accept H0 if $\frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{(1-\alpha)}$
Reject H0 if $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{(1-\alpha)}$
Left or right tailed test ???
How about the other ???
Hypothesis testing of the variance
Two-tailed test
 The standardized test statistic for the population
variance is
$\frac{(n-1)s^2}{\sigma_0^2}$
 This standardized test statistic has a
$\chi^2$ distribution
Hypothesis testing of the variance
Example
 We wish to test whether the variance of share B is below 25.
The sample variance is 23 and the number of
observations is 40
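A sketch of this left-tailed variance test follows. The 5% significance level is an assumption, and the critical value of roughly 25.70 (chi-square with 39 degrees of freedom, 95% of the distribution above it) is an approximate table value hard-coded for illustration. The function name is illustrative.

```python
def chi2_statistic(sample_var, sigma0_sq, n):
    """Standardized test statistic (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * sample_var / sigma0_sq

# Approximate chi-square table value for 39 degrees of freedom with
# 5% in the lower tail (assumed significance level and table value)
CHI2_CRIT_LOWER = 25.70

# Share B: hypothesized variance 25, sample variance 23, n = 40
stat = chi2_statistic(23.0, 25.0, 40)

# Left-tailed rule: reject H0 only if the statistic falls below the lower critical value
reject_h0 = stat < CHI2_CRIT_LOWER

print(f"test statistic = {stat:.2f}")
print("reject H0" if reject_h0 else "accept H0")
```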
The p-value method of
hypothesis testing
 The p-value is the lowest level of significance at which
the null hypothesis is rejected
 If the p-value ≥ the level of significance (α)
accept null hypothesis
 If the p-value < the level of significance (α)
reject null hypothesis

accept $H_0$ if p-value $\ge \alpha$
reject $H_0$ otherwise
Calculating the p-value
 We wish to find an investment that gives at least 13.2%.
Assume that the mean annualized monthly return of a
given bond index is 14.4% and the sample standard
deviation of those returns is 2.915%; there were 30
observations and the returns are normally distributed.
Calculating the p-value
$H_0: \mu \le 13.2$
$H_1: \mu > 13.2$
 The test statistic is:
$\frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}} = \frac{14.4 - 13.2}{\sqrt{2.915^2 / 30}} = 2.255$
 With degrees of freedom = 29
a t-value of 2.045 leaves 2.5% in the tail
a t-value of 2.462 leaves 1% in the tail
Calculating the p-value
 Calculate p-value from interpolation
$\frac{2.255 - 2.045}{2.462 - 2.045} = \frac{0.21}{0.417} = 0.50$
 P-value = 0.025 - (0.50 x (0.025 - 0.01)) = 0.0175 = 1.75%
 P-value (1.75%) < α (5%), thus reject null hypothesis
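The interpolation can be sketched in Python using the two t-table entries quoted for 29 degrees of freedom. Variable names are illustrative. Linear interpolation between table entries is only an approximation to the exact p-value.

```python
import math

# Test statistic for the bond index example
x_bar, mu_0, s, n = 14.4, 13.2, 2.915, 30
t_stat = (x_bar - mu_0) / math.sqrt(s ** 2 / n)

# t-table entries for 29 degrees of freedom, as quoted on the slide
t_low, tail_low = 2.045, 0.025    # leaves 2.5% in the tail
t_high, tail_high = 2.462, 0.01   # leaves 1% in the tail

# Linear interpolation between the two table entries
fraction = (t_stat - t_low) / (t_high - t_low)
p_value = tail_low - fraction * (tail_low - tail_high)

alpha = 0.05
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "accept H0")
```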
Conclusion
 Meaning of statistical inference
 Sampling theory
 Application of statistical inference
 Confidence intervals
 Estimation
means
variance
 Hypothesis testing
 Two-tailed
 One-tailed
Z-distribution
t-distribution
$\chi^2$-distribution
Conclusion
The appropriate reliability factor for determining confidence
intervals for a population mean, under the following circumstances:
1. The data in the population are normally distributed with a known
   standard deviation.
   Reliability factor: Z-value
2. The data in the population are normally distributed; the standard
   deviation is unknown, but can be estimated from sample data.
   Reliability factor: t-value
3. The data in the population are not normally distributed, the standard
   deviation is known, and the sample size is large.
   Reliability factor: Z-value
4. The data in the population are not normally distributed, the standard
   deviation is unknown, and the sample size is large.
   Reliability factor: no good reliability factor exists. However, a
   Z-value can be used as an approximation of the t-value, if the
   sample is large.