Inferential Statistics &
Hypothesis Testing
Heibatollah Baghi, and
Mastee Badii
Objectives

Conduct one sample mean test
– Using Z statistics
– Using t statistics
Inferential Statistics
Usage

Researchers use inferential statistics to
address two broad goals:
– Estimate the value of population
parameters
– Hypothesis testing
Distribution of Coin Tosses
Toss No.   Possible Outcomes and Probabilities                          Total Probability
1          H=.500  T=.500                                               1.000
2          HH=.250  HT=.250  TH=.250  TT=.250                           1.000
3          HHH=.125  HHT=.125  TTH=.125  TTT=.125
4          HHHH=.063  HHHT=.063  HHTT=.063  HTTT=.063  TTTT=.063
5          HHHHH=.031
6          HHHHHH=.016
7          HHHHHHH=.008
8          HHHHHHHH=.004
9          HHHHHHHHH=.002
10         HHHHHHHHHH=.001
If you see 10 heads in a row,
is it a fair coin?
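The probabilities of an unbroken run of heads can be checked with a few lines of Python. This is a minimal sketch that assumes a fair coin (probability of heads 0.5) and independent tosses; the function name `prob_all_heads` is only illustrative.

```python
# Probability that every one of n independent tosses comes up heads.
def prob_all_heads(n_tosses: int, p_heads: float = 0.5) -> float:
    return p_heads ** n_tosses

for n in range(1, 11):
    # For a fair coin this is 1 chance in 2**n.
    print(f"{n:2d} tosses: 1 in {2 ** n:4d}  (p = {prob_all_heads(n):.6f})")
```

Ten heads in a row from a fair coin happens about once in 1024 tries (p ≈ .001), which is why such a run makes us doubt the coin.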
Sample & Population



Think of any sequence of throws as a sample
from all possible throws
Think of all possible throws as the entire
population.
One-Sample Inferential Tests estimate the
probability that a sample is representative of
the total population (within +/- ~2 standard
deviations of the mean, or the middle 95% of
the distribution).
Logic of Hypothesis
Testing

Is the value observed consistent with the
expected distribution?
– On average, 100 tosses of a fair coin should
produce about 50 heads and 50 tails.
– Some coin tosses will be outliers, giving
significantly different results.
– Are differences significant or merely random
variations?
Statistics is the art of
making sense of distributions
Logic of Hypothesis
Testing
The further the observed value is
from the mean of the expected
distribution, the more significant
the difference
[Figure: frequency of outcomes (vertical axis, 0 to 10) by heads/tails split (horizontal axis, 10/90 through 50/50 to 99/01), with one observed point marked]
What about this point?
Is this point part of the
distribution?
– Mean
– Variance
It is a chance event
Depends on
location
Probability of Membership
in a Distribution
[Figures: distribution plots on the same axes]
One-Sample Tests

We observe a sample and
infer information about
the population

We set a standard beyond
which results would be
rare (outside the
expected sampling
error)

If the observation is
outside the standard, we
reject the hypothesis that
the sample is
representative of the
population
[Figures: distribution plots on the same axes]
Random Sampling



A simple random sampling procedure is one in
which every possible sample of n objects is
equally likely to be chosen.
The principle of randomness in the selection of
the sample members provides some protection
against the sample being unrepresentative of the
population.
If the population were repeatedly sampled in
this fashion, no particular subgroup would be
consistently overrepresented in the samples.
Sampling Distribution


The concept of a sampling distribution allows
us to determine the probability that the
particular sample obtained will be
unrepresentative.
On the basis of sample information, we can
make inferences about the parent population.
Sampling Distribution

Sampling error
– No sample will have exactly the same mean and
standard deviation as the population
– In research, sampling error is often unknown
since we do not have the population parameters

Sampling distribution of the mean
– A distribution of the means of several different
samples from our population
– Less widely distributed than the population
– Usually Normal
Population of IQ scores, 10-year-olds: µ = 100, σ = 16
Samples of size n = 64:
Sample 1: X̄1 = 103.70
Sample 2: X̄2 = 98.58
Sample 3: X̄3 = 100.11
Etc.
Is sample 2 a likely representation
of our population?
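One way to answer the question is to express each sample mean as a z score, using the standard error σ/√n. The sketch below uses the slide's values (µ = 100, σ = 16, n = 64); the 1.96 cutoff for the middle 95% of a Normal distribution and the helper name `z_score` are assumptions added for illustration.

```python
import math

mu, sigma, n = 100.0, 16.0, 64           # population values from the slide
sample_means = {1: 103.70, 2: 98.58, 3: 100.11}

standard_error = sigma / math.sqrt(n)    # 16 / 8 = 2.0

def z_score(sample_mean: float) -> float:
    """Distance of a sample mean from mu, in standard-error units."""
    return (sample_mean - mu) / standard_error

for label, mean in sample_means.items():
    z = z_score(mean)
    typical = abs(z) <= 1.96             # inside the middle 95%
    print(f"Sample {label}: mean = {mean:.2f}, z = {z:+.2f}, within 95% band: {typical}")
```

Sample 2 lies about 0.71 standard errors below µ, well inside the middle 95%, so it is a plausible sample from this population.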
Distribution of Sample Means
1. The mean of a sampling distribution
is identical to the mean of raw scores in
the population (µ)
2. If the population is Normal, the
distribution of sample means is also
Normal
3. If the population is not Normal, the
distribution of sample means
approaches a Normal distribution as
the size of the sample on which it is
based gets larger (Central Limit Theorem)
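The first and third properties can be illustrated with a small simulation. This is a sketch under assumed settings, not part of the original slides: the population is a skewed exponential distribution with mean 1 and standard deviation 1, each sample has 64 observations, and 2000 samples are drawn.

```python
import random
import statistics

random.seed(0)          # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 64        # observations per sample (assumed)
NUM_SAMPLES = 2000      # number of repeated samples (assumed)

def draw_sample_mean(n: int) -> float:
    # Exponential population: mean 1, standard deviation 1, clearly not Normal.
    sample = [random.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(sample)

sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)   # should sit near the population mean, 1
sd_of_means = statistics.stdev(sample_means)     # should sit near 1 / sqrt(64) = 0.125

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.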
Standard Error of the Mean
The standard deviation of means in a
sampling distribution is known as the
standard error of the mean.
It can be calculated from the
standard deviation of observations.
Standard error of X̄: σ_X̄ = σ / √n
The larger our sample size, the
smaller our standard error.

Worked example (t statistic):
S = √( Σ(X − X̄)² / (n − 1) ) = √(148.90 / 9) = 4.07
S_X̄ = S / √n = 4.07 / 3.16 = 1.29
t_c = (X̄ − µ) / S_X̄ = (9.90 − 6.75) / 1.29 = 2.44
t_α = 2.262
2.44 > 2.262
[Diagram: random selection draws a sample of observations from the entire population of observations. Through statistical inference, the sample statistic X̄ is used to estimate the unknown population parameter (µ = ?).]
Estimation Procedures

Point estimates
– For example, the mean of a sample of 25 patients
– No information regarding probability of accuracy

Interval estimates
– Estimate a range of values that is likely
to include the population value
– Confidence interval between two limit values
– The degree of confidence depends on the
probability of including the population mean
When Sample Size Is Small …
95% CI = X̄ ± t · S_X̄
Here t is a constant from the
Student t distribution
that depends on the confidence
level and the sample size.
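A small-sample confidence interval can be sketched as follows. The inputs are illustrative assumptions rather than data from the slides: sample mean 9.90, sample standard deviation 4.07, n = 10, and t = 2.262 (the two-tailed 95% value for 9 degrees of freedom). The function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper) limits of mean +/- t * standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin

# Illustrative values; t_critical must match the chosen confidence level and df = n - 1.
lower, upper = t_confidence_interval(9.90, 4.07, 10, 2.262)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The critical t is passed in by hand here, mirroring a table lookup. A statistics library could supply it instead.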
HYPOTHESIS TESTING





Research claim: hygiene procedures are effective in
preventing colds.
State 2 hypotheses:
Null: H0 : Hand-washing has no effect on
bacteria counts.
Alternative: Ha : Hand-washing reduces
bacteria counts.
The null hypothesis is assumed true, just as
a defendant is assumed to be innocent.
TWO TYPES OF ERROR
                     H0 True             H0 False
Reject H0            error               correct decision
Fail to Reject H0    correct decision    error
Alpha & Beta Errors
Decision             H0 is True    H0 is False
Reject H0            α             1-β
Fail to Reject H0    1-α           β
Two Types of Error in
Admission to ICU

Correct decisions
– Patients admitted to ICU who would have failed
otherwise
– Patients denied admission who do fine in a
step-down unit

Errors
– Patient admitted who does not need to be there
– Patient denied admission who needs to be there
Two Types of Error

Alpha: α
– Probability of Type I Error
– P (Rejecting Ho when Ho is true)

Beta: β
– Probability of Type II Error
– P (Failing to reject Ho when Ho is false)
Power & Confidence Level

Power
– 1- β
– Probability of rejecting Ho when Ho is
false

Confidence level
– 1- α
– Probability of failing to reject Ho when Ho
is true
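The meaning of α can be checked by simulation: when H0 is true, a one-tailed z test at α = .05 should reject in roughly 5% of repeated samples. The sketch below assumes illustrative values (µ = 250, σ = 50, n = 36, 5000 trials) and a Normal population. None of these settings is prescribed by the slides.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)                     # fixed seed so the sketch is repeatable

MU, SIGMA, N = 250.0, 50.0, 36     # population under H0 and sample size (assumed)
ALPHA = 0.05
TRIALS = 5000

z_critical = NormalDist().inv_cdf(1 - ALPHA)   # one-tailed critical value
standard_error = SIGMA / math.sqrt(N)

def one_trial_rejects() -> bool:
    # Draw a sample from a population where H0 really is true.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU) / standard_error
    return z > z_critical

type_i_rate = sum(one_trial_rejects() for _ in range(TRIALS)) / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

Every rejection in this simulation is a Type I error, because the samples really do come from the null population.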
Steps in Test of
Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Determine whether to use a one tail
or two tail test
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic
against a tabled value
1. Determine Appropriate
Test




Level of measurement
Number of groups being compared
Sample size
Extent to which assumptions for
parametric tests have been met
– Relatively Normal distribution
– Approximately interval level variable
2. Establish Level of
Significance





α is a predetermined value
The convention
α = .05
α = .01
3. Determine Whether to
Use One or Two Tailed Test


If the alternative hypothesis specifies the
direction of the difference, use a one tailed test
Otherwise, use a two tailed test
– Most cases
4. Calculating Test
Statistics

For one sample tests, use the Z test
statistic if the population is Normal,
σ is known, or if the sample size is
large:
z_c = (X̄ − µ) / σ_X̄, where σ_X̄ = σ / √n
For one sample tests, use the t
statistic if the population distribution is
not known or if the sample size is
small (less than 30)
5. Determine Degrees of
Freedom


Number of components that are free
to vary about a parameter
Df = Sample size – Number of
parameters estimated
– Df is n-1 for one sample test of mean
6. Compare the Computed
Test Statistic Against a
Tabled Value
Test statistic    Theoretical distribution    Table
Z statistic       Normal distribution         Areas of the Normal distribution for selected z scores
t statistic       Student t distribution      Critical values of Student t distribution
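For the Z statistic, the tabled value can also be computed directly. A minimal sketch using Python's standard-library `statistics.NormalDist` follows; the helper name `critical_z` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool = False) -> float:
    """Critical z that leaves `alpha` in one tail, or alpha/2 in each of two tails."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

print(f"one-tailed, alpha = .05: {critical_z(0.05):.3f}")
print(f"two-tailed, alpha = .05: {critical_z(0.05, two_tailed=True):.3f}")
```

The one-tailed value is about 1.645, which printed tables often round to 1.65, and the two-tailed value is about 1.96. Critical t values are not available in the standard library and would need a table or a statistics package.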
Example of Testing
Statistical Hypotheses
About µ When σ is Known
(Large Sample Test for
Population Mean).
Research Question
“Does Home Schooling Affect
Educational Outcomes?”
Statistical Hypotheses



Dr. Tate, a researcher at GMU, decided to
conduct a study to explore this question. He
found out that every fourth-grade student
attending school in Virginia takes the CAT.
Scores on the CAT are normally distributed with
µ = 250 and σ = 50.
Home-schooled children are not required
to take this test.
Statistical Hypotheses


Dr. Tate selects a random sample of 36
home-schooled fourth graders and has
each child complete the test. (It would be
too expensive and time-consuming to test
the entire population of home-schooled
fourth-grade students in the state.)
Step 1: Specify Hypotheses
H0: µ = 250
Ha: µ > 250
α = 0.05
Calculated Z

Select the sample, calculate the
necessary sample statistics
n = 36
σ = 50
X̄ = 265
σ_X̄ = σ / √n = 50 / √36 = 8.33
z_c = (X̄ − µ) / σ_X̄ = (265 − 250) / 8.33 = 1.80
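The calculation can be reproduced in a few lines of Python, using the values from the example (n = 36, σ = 50, X̄ = 265, µ = 250 under H0). The variable names are illustrative.

```python
import math

n = 36                 # sample size
sigma = 50.0           # known population standard deviation
sample_mean = 265.0    # observed mean of the home-schooled sample
mu_0 = 250.0           # population mean under H0

standard_error = sigma / math.sqrt(n)              # 50 / 6, about 8.33
z_calc = (sample_mean - mu_0) / standard_error     # 15 / 8.33, about 1.80

print(f"standard error = {standard_error:.2f}, z = {z_calc:.2f}")
```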
Critical Z

Determine zα
– α = 0.05, one sided
– CI of 95%
– Refer to the Z table and find the
corresponding Z score:
– Z = 1.65
Make Decisions
Regarding Ho


Because the calculated z is greater
than the critical z (1.80 > 1.65), Ho is rejected
and Ha is accepted.
The mean of the population of home-schooled
fourth graders is greater than 250.
Alternative Steps

Step 1: Specify Hypotheses
Ho: µ = 250 Ha: µ > 250 α = .05

Step 2: Select the sample, calculate
sample statistics: n = 36, σ = 50, X̄ = 265
σ_X̄ = σ / √n = 50 / √36 = 8.33
z_c = (X̄ − µ) / σ_X̄ = (265 − 250) / 8.33 = 1.80
Using P value to Reject
Hypothesis



Step 3: Determine the p-value. A z of
+1.80 corresponds to a one tailed
probability of 0.036.
Step 4: Make decision regarding Ho.
Because the p-value of 0.036 is less than
α = 0.05, H0 is rejected. The mean of the
population of home-schooled fourth graders
is greater than 250.
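The one-tailed p-value for z = 1.80 can be computed with the standard-library `statistics.NormalDist` instead of a table. This is a sketch with illustrative variable names.

```python
from statistics import NormalDist

z_calc = 1.80
alpha = 0.05

# Upper-tail area beyond z_calc under the standard Normal curve.
p_one_tailed = 1 - NormalDist().cdf(z_calc)
reject_h0 = p_one_tailed < alpha

print(f"p = {p_one_tailed:.3f}, reject H0: {reject_h0}")
```

The result, about 0.036, matches the tabled probability, and because it is below α = 0.05 the decision is to reject H0.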
DECISION RULES


In terms of z scores:
If Zc > Zα, reject H0
In terms of p-value:
If p value < α, reject H0
The One-sample Z Test


One-sample tests of significance are used to
compare a sample mean to a (hypothesized)
population mean and determine how likely it
is that the sample came from that
population. We will determine the extent to
which the observed difference could occur by chance.
We will compare the probability associated
with our statistical results (i.e. probability of
chance) with a predetermined alpha level.
The One-sample Z Test


If the probability is equal to or less
than our alpha level, we will reject the
null hypothesis and conclude that the
difference is not due to chance.
If the probability of chance is greater
than our alpha level, we will retain the
null hypothesis and conclude that the
difference could be due to chance.
Take Home Lesson
Procedures for Hypothesis Testing and
Use of These Procedures in One
Sample Mean Test for Normal
Distribution