Z-scores
5/6/12
Empirical Loop
[Figure: the empirical loop, with stages labeled Hypothesis, Research Design, Collect Data, Descriptive Statistics, Probability, Inferential Statistics]

Z-scores: apples vs. oranges
[Figure: distributions of apple sizes and orange sizes; x-axis: fruit diameter in decimeters]
[Figure: the same comparison on the standard normal curve; x-axis: fruit diameter in z-scores]
Mathematically:
z = (X - µ) / σ
Result: the orange is bigger!
(Well, it’s bigger relative to other oranges than the apple is relative to other apples.)

Z-scores
[Figure: standard normal curve; x-axis: z-scores from -4 to 4]
68% of data points fall within 1 SD of the mean.
95% of data points fall between z = -1.96 and z = +1.96.
Values beyond that are really rare (really unlikely), generally speaking.
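As a rough illustration of the apples-vs-oranges comparison, the z-score formula can be sketched in Python. The means, standard deviations and fruit diameters below are hypothetical values chosen for the example; they are not taken from the slides.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical population parameters and measurements (diameters in decimeters).
apple_z = z_score(x=0.85, mu=0.75, sigma=0.10)   # apple vs. other apples
orange_z = z_score(x=0.92, mu=0.80, sigma=0.06)  # orange vs. other oranges

print(f"apple z = {apple_z:.2f}, orange z = {orange_z:.2f}")
if orange_z > apple_z:
    print("The orange is bigger relative to its own kind.")
```

Because each fruit is scaled by its own population's mean and standard deviation, the two z-scores are directly comparable even though the raw sizes are not.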
So you can calculate a z-score for individual data points.
Mathematically:
z = (X - µ) / σ
Mean-centered score divided by standard deviation

You can also calculate a z-distribution for a sample, but it means something different.
Mathematically:
z = (X̄ - µ_X̄) / σ_X̄
Mean-centered sample average divided by standard error

Z-distribution of samples
[Figure: sampling distribution of the mean on the standard normal curve; x-axis: z-score of the # of heads in samples of 100 coin tosses. 95% of data points fall in the central region; values in the tails are really unlikely to occur for this distribution.]
IMPORTANT: We can do this because we know what the mean and standard error should be for 100 tosses of a fair coin.
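A minimal sketch of the coin-toss case, assuming the number of heads follows a binomial distribution, so that its mean is N·p and its standard deviation is √(N·p·(1-p)). The function name and the example count of 60 heads are illustrative, not from the slides.

```python
import math

def heads_z_score(heads, n_tosses, p=0.5):
    """z-score of an observed head count under a coin with P(heads) = p."""
    mean = n_tosses * p                       # expected number of heads
    sd = math.sqrt(n_tosses * p * (1 - p))    # spread of the head count
    return (heads - mean) / sd

# 100 tosses of a fair coin: mean 50 heads, standard deviation 5 heads.
print(heads_z_score(60, 100))  # 2.0
```

A count of 60 heads sits 2 standard deviations above what a fair coin is expected to produce, just past the ±1.96 boundary of the central 95%.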
Z-test
Z-value for z-test:
z = (X̄ - µ_X̄) / σ_X̄
Standard error of estimate:
σ_X̄ = σ / √N
•  For the same population distribution, a larger sample size results in a smaller standard error.
•  (The more observations, the more accurate your estimate is.)

When you might use a z-test
•  I’m a farmer. I have developed a new breed of Granny Smith apples, the Great Granny. I want to be able to say that my apples are notably bigger than regular Granny Smiths.
•  You want to know whether UCSD undergrads’ GRE scores are higher than the national average.
You have to know sigma (σ)!
•  Agricultural data: probably
•  Standardized tests: definitely
•  Reaction times in a lexical decision experiment? …
•  Spatial frames of reference in residents of Papua New Guinea? …
•  BOLD activation when looking at faces, houses, robots
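To see how the standard error σ / √N shrinks as the sample grows, here is a small sketch. The population standard deviation of 10 is an arbitrary value assumed for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for a sample of size n."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (4, 25, 100):
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size only halves the standard error, because N enters through its square root.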
[Figure: normal-curve table with columns B and C, showing the area from 0 to z and the area from z to ∞]
What would B be when z=0?
What would C be when z=0?
What if you don’t know sigma?
•  Most of the time.
•  Statistics to the rescue!
•  If you don’t know sigma, you can estimate it from your own sample.
•  You have to correct for it, of course. (df)
•  There’s a sampling distribution like the z distribution, but for unknown population σ: the t-distribution!

The t distribution
•  Unlike z, there is a different t-distribution for each sample size.
[Figure: t-distributions for several values of df]
•  For extremely large samples, it approaches the z-distribution.
•  Usually, we don’t have samples big enough to get to the z-distribution, so we use t.
•  Use just like z-distribution, but it has heavier tails.
More on this next week.
Empirical Loop
[Figure: the empirical loop, with stages labeled Hypothesis, Research Design, Collect Data, Descriptive Statistics, Probability, Inferential Statistics]

Hypothesis testing
[Figure: standard normal curve. Blue line = your “baseline” (some probability distribution of the mean, µ & σ, in z-scores); green = your sample. The central region is labeled “Way.” and the tail region “No way.”]
Introduction to Hypothesis Testing:
The Binomial Test

Binomial Distribution
How many outcomes of “heads” will we get from N flips of a coin with weighting p?
Let’s try it ourselves…

Hypothesis Testing:
Binomial Distribution
Try it yourself at:
http://www.adsciengineering.com/bpdcalc/index.php
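The question "how many heads from N flips of a coin with weighting p?" can also be explored offline. The sketch below uses the standard binomial probability formula, P(k heads) = C(N, k) · p^k · (1-p)^(N-k); the choice of N = 10 and p = 0.5 is only an example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # example values, not from the slides
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))
```

The probabilities over k = 0 … N sum to 1, and for a fair coin they peak at N/2 heads.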
Neyman-Pearson paradigm for hypothesis testing:
1. Assume a probabilistic model for the data (the “null hypothesis”, H0)
2. Define an “alternative hypothesis” H1 (this can be very vague)
3. Define a “decision rule” which specifies which future observations will lead you to reject the null hypothesis
4. Collect observations (data)
5. If the data are highly unlikely in a way that favors your alternative hypothesis, then reject the null hypothesis
6. Otherwise, “retain” the null hypothesis

Hypothesis Testing, translated:
Neyman-Pearson paradigm for hypothesis testing:
1. Null hypothesis H0: What would be the situation if there’s no difference?
2. Alternative hypothesis H1: What would be the situation if there is a difference?
3. Define what numeric outcome would convince you that there is a difference
4. Collect observations (data)
5. If the data are highly unlikely given the no-difference scenario, reject the null hypothesis (Yay! Usually.)
6. Otherwise, “retain” the null hypothesis
Hypothesis Testing:
NOTE: Retaining the null hypothesis H0 is NOT the same as proving that H0 is true. It simply means that we didn’t have enough evidence to reject it (e.g., we might have, given more data).
This is analogous to when a jury declares someone “not guilty.” It does not mean that the person is innocent, only that there is not enough evidence to show that she/he is guilty.
Hypothesis Testing:
Possible outcomes:

                    True state of the world
Your decision       H0 is CORRECT             H0 is INCORRECT
Accept H0           Correct decision (1-α)    Type II Error (β)
Reject H0           Type I Error (α)          Correct decision (1-β)

Type I Error: We reject the null hypothesis even though it’s true
Type II Error: We don’t reject the null hypothesis even though it is NOT true
Hypothesis testing
[Figure, built up over several slides: the null hypothesis (H0) distribution (some probability distribution with µ & σ, in z-scores), with your sample in green. A criterion separates the “Way” region from the “No way” regions; the figure labels the Type I error (α) and Type II error (β) areas.]
Hypothesis Testing:
Type I Error: We reject the null hypothesis even though it’s true (“False positive”)
Alpha (α): P(Type I Error)
p-Value: The smallest α you could have used and rejected the null hypothesis given your data
P is for probability
Hypothesis Testing:
Type II Error: We retain the null hypothesis even though it’s false (“False negative”)
Beta (β): P(Type II Error)
Power: 1-β
We often don’t know what beta is because our alternative hypotheses are too vague.

Hypothesis testing
[Figure: the null hypothesis (H0) distribution (some probability distribution with µ & σ, in z-scores) and your sample (green), divided by the criterion into areas labeled 1-α, α, β, and 1-β (power)]
Hypothesis Testing:
Trade off between Type I & Type II Error:
The smaller your α, the larger your β
[You can achieve more power by accepting a greater chance of making a false positive]
Effect of Sample Size on Type I and Type II Error:
The bigger your sample size, the more power you can achieve with a fixed α

Question: Is the simulated coin toss at an online casino biased?
A Test:
1. Flip the coin twice
2. If the coin comes up heads both times, we decide it’s a biased magic store coin.
Assume a Fair Coin:
[Figure: probability of each # of heads in two flips of a fair coin, split into a decide “Fair” region and a decide “Biased” region]
α = P(Type I error) = .25

A Better Test:
1. Flip the coin four times
2. If the coin comes up heads all four times, we decide it’s a biased coin.
Assume a Fair Coin:
[Figure: probability of each # of heads in four flips of a fair coin, split into a decide “Fair” region and a decide “Cheat” region]
α = P(Type I error) = .06
Assessing power:
We can only assess the actual power of a statistical test by imagining what the world might be like (other than like H0)

Assume an unfair coin:
Suppose P(Heads) = 0.6
[Figure: probability of each # of heads in four flips of the unfair coin, split into a decide “Fair” region and a decide “Cheat” region]
α = P(Type I error) = .06
β = P(Type II error) = .87
Power = 1-β = .13

A Less Conservative Decision Rule:
[Figure: probability of each # of heads, with a larger decide “Cheat” region]
α = P(Type I error) = .31

Hypothesis Testing:
Effect of Increasing Sample Size: By increasing your sample size you can decrease beta without increasing alpha.
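The α, β and power values quoted for the coin tests can be checked with a short binomial calculation. This is a sketch under two assumptions: each rule is "decide biased when at least some minimum number of heads appears", and the less conservative rule means at least 3 heads out of 4 (inferred from α = .31; the slides do not spell the rule out). The helper names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def reject_probability(n, min_heads, p):
    """Probability that the rule 'at least min_heads heads in n flips' fires."""
    return sum(binomial_pmf(k, n, p) for k in range(min_heads, n + 1))

# Type I error rates: the coin is actually fair (p = 0.5).
alpha_two_flips = reject_probability(2, 2, 0.5)    # both of 2 flips are heads
alpha_four_flips = reject_probability(4, 4, 0.5)   # all 4 flips are heads
alpha_loose_rule = reject_probability(4, 3, 0.5)   # assumed rule: 3 or 4 heads

# Power and Type II error rate: the coin actually has P(heads) = 0.6.
power_four_flips = reject_probability(4, 4, 0.6)
beta_four_flips = 1 - power_four_flips

print(f"alpha (2 flips)    = {alpha_two_flips:.2f}")
print(f"alpha (4 flips)    = {alpha_four_flips:.2f}")
print(f"alpha (loose rule) = {alpha_loose_rule:.2f}")
print(f"beta  (4 flips)    = {beta_four_flips:.2f}")
print(f"power (4 flips)    = {power_four_flips:.2f}")
```

Loosening the rule raises α from about .06 to about .31, which is the trade-off the slides describe: more chances to catch a biased coin at the cost of more false alarms.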
Hypothesis Testing:
Trade off between Type I and Type II Error: You can adjust your decision rule such that it decreases alpha, but it will increase beta (and vice versa).

Empirical Loop
[Figure: the empirical loop, with stages labeled Hypothesis, Research Design, Collect Data, Descriptive Statistics, Probability, Inferential Statistics. Under Inferential Statistics: hypothesis testing with the binomial test and the z-test.]
Lecture Outline
•  z-test Review & Cohen’s d
•  How many samples do we need?
•  How good is my estimate of the mean?
•  What do we do when we don’t know the standard deviation of the null hypothesis?
Is there a reason why some people perceive clockwise vs. counterclockwise rotation?
http://www.news.com.au/perthnow/story/0,21598,22492511-5005375,00.html

Hypothesis Testing Form (n=56)
Null Hyp. (H0):      Data come from a normal distribution with μ=0.5, σ=0.5
Alt. Hyp. (H1):      μ≠0.5
Tail of Test:        two-tailed
Type of Test:        z-test
Alpha Level:         α=0.05
Critical Value(s):   mean S to N > .631 or mean S to N < .369
Observed Value:      37/56 = .661 S to N
Decision:            Reject H0
p-value:             p=.0164
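The entries on the hypothesis testing form can be reproduced with Python's standard library. This is a sketch, not the lecture's own computation; it uses `statistics.NormalDist` for the normal curve, and because it does not round z to 2.40 first, the p-value comes out slightly below the slide's .0164.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma0 = 0.5, 0.5      # null hypothesis mean and standard deviation
n = 56                      # number of participants
observed = 37 / n           # observed proportion of "S to N" responses
alpha = 0.05

std_normal = NormalDist()   # standard normal curve (mean 0, SD 1)
se = sigma0 / sqrt(n)       # standard error of the mean under H0

# Two-tailed test: split alpha between the two tails.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
lower_crit = mu0 - z_crit * se
upper_crit = mu0 + z_crit * se

z = (observed - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"critical values: {lower_crit:.3f}, {upper_crit:.3f}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Retain H0")
```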
Cohen’s d: A unitless measure of effect size
d = (x̄ - µ) / σ
µ = mean of H0
σ = standard deviation of H0
x̄ = sample mean (i.e., estimate of population mean)

d = 0.20: small
d = 0.50: medium
d = 0.80: large
see pg. 299
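A minimal sketch of Cohen's d for the spinning-dancer data (37 of 56 responses, H0 mean 0.5, H0 standard deviation 0.5). The function name is illustrative. Using the unrounded proportion gives a value a shade below the .322 reported in the lecture, which starts from the rounded .661.

```python
def cohens_d(sample_mean, mu0, sigma0):
    """Effect size: distance of the sample mean from the H0 mean, in H0 SD units."""
    return (sample_mean - mu0) / sigma0

d = cohens_d(sample_mean=37 / 56, mu0=0.5, sigma0=0.5)
print(f"d = {d:.3f}")
```

Note that n does not appear anywhere in the function, which is why the magnitude of d does not depend on sample size.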
Reporting our results:
We found that participants were significantly more likely to perceive the dancer as spinning south to north (z(n=56)=2.40, p=.0164, d=.322).

Central Limit Theorem
For large n (e.g., 25-100), the sum (or mean) of n independent samples of random variable X is approximately normally distributed.
[Figure: panel labeled “1 Flip”]

Do UCSD students do better on the SAT than average?
The mean and standard deviation of all SAT scores in the USA for 2007 are 1050 and 70, respectively. To find out if UCSD students do better than average on the SAT, I randomly select 49 students and collect their SAT scores. The mean of those 49 scores is 1090. Can I be 95% sure that UCSD students really are better?
z-Test
Useful for testing hypotheses when:
•  you know the mean and the standard deviation of the null hypothesis
•  your data are normally distributed OR you have a sufficiently large sample size (e.g., 25-100)

z-Test
1.  Decide if you need to do a two-tailed, upper-tailed, or lower-tailed test.
2.  Compute the mean of your data, X̄.
3.  Compute the standard error of the mean of the distribution of the null hypothesis.
4.  Convert X̄ into a z-score.
5.  If the z-score of X̄ is more extreme than your critical z-score, then reject the null hypothesis.
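The five z-test steps can be walked through with the SAT question from the lecture (H0 mean 1050, standard deviation 70, a sample of 49 students with mean 1090). This sketch assumes an upper-tailed test at α = 0.05, reading "95% sure" as that alpha level, and uses `statistics.NormalDist` for the critical value.

```python
from math import sqrt
from statistics import NormalDist

# Known parameters of the null hypothesis (all SAT scores in the USA, 2007).
mu0, sigma0 = 1050, 70

# Step 1: "do better than average" suggests an upper-tailed test (assumption).
alpha = 0.05

# Step 2: the mean of the data (given in the lecture rather than computed here).
n, sample_mean = 49, 1090

# Step 3: standard error of the mean under the null hypothesis.
se = sigma0 / sqrt(n)             # 70 / 7 = 10.0

# Step 4: convert the sample mean into a z-score.
z = (sample_mean - mu0) / se      # (1090 - 1050) / 10 = 4.0

# Step 5: compare with the critical z-score for the chosen tail and alpha.
z_crit = NormalDist().inv_cdf(1 - alpha)
print(f"z = {z}, critical z = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Retain H0")
```

With z = 4.0 well beyond the critical value of roughly 1.645, the sketch rejects the null hypothesis under these assumptions.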
Cohen’s d
A unitless measure of effect size:
•  magnitude of d doesn’t depend on sample size (unlike p-values)
•  useful for getting a sense of how big an effect is, whereas p-values give you a sense of how reliable an effect is
Each of the following statements could inspire a hypothesis test. For each statement, would you use a two-tailed, upper-tailed, or lower-tailed test? State H0 and H1.
a) To increase rainfall, extensive cloud-seeding experiments are to be conducted and the results are to be compared with a baseline figure of 0.54 inches (SD=.11) of rainfall (the amount of rain when cloud seeding wasn’t done).
b) Public health statistics indicate that American males gain an average of 23 pounds (SD=10) during the 20-year period after age 40. An ambitious weight-loss program, spanning 20 years, is being tested with a random sample of 40-year-old men.
c) A basketball coach wonders if listening to CDs of positive comments during sleep will affect a player’s performance. On the one hand, it may boost self-confidence and subsequently boost performance. On the other hand, it may disturb their sleep and hinder their performance.