Stats 3: Standardized Scores, Hypothesis Testing
Cohen, Chapters 4 and 5
Joe got 70% on his physics test. Is that good?
Class A: s.d. about 5.
Class B: s.d. about 20.
How many standard deviations is X from the mean? z = (X - mean) / s.d.
Class A: s.d. = 5, z=…..
Class B: s.d. = 20, z=….
What are the units of z?
z is a standardized score.
Which is better: 70 in a class with mean 60 and s.d. 20, or 60 in a class with mean 50 and s.d. 10?
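A quick way to check the comparison is to compute z = (X - mean) / s.d. for both scores, reading each pair as (mean, s.d.) as the slides do elsewhere. The helper name `z_score` is hypothetical, used only for illustration.

```r
# Standardized score: how many s.d.s a raw score lies from the mean
# (z_score is a hypothetical helper name, not a base-R function)
z_score <- function(x, mean, sd) {
  (x - mean) / sd
}

z_a <- z_score(70, mean = 60, sd = 20)  # 70 in a class with mean 60, s.d. 20
z_b <- z_score(60, mean = 50, sd = 10)  # 60 in a class with mean 50, s.d. 10
c(z_a = z_a, z_b = z_b)                  # 0.5 and 1.0
```

The 60 is the stronger result: it sits a full s.d. above its class mean, while the 70 is only half an s.d. above its own.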
Properties of Standardized Scores:
The mean of the distribution of z-scores is 0 and its s.d. is 1,
regardless of whether the original distribution was normal.
If the original distribution is normal, so is the standardized distribution:
converting to standard scores does not change the shape of a distribution.
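A small simulation can make these properties concrete. It assumes an exponential population, chosen only because it is obviously not normal; the seed and sample size are arbitrary, and `skew` is a simple illustrative helper rather than a library function.

```r
set.seed(1)
# A clearly non-normal, right-skewed sample (exponential draws)
x <- rexp(1000, rate = 1)

# Standardize: subtract the mean, then divide by the s.d.
z <- (x - mean(x)) / sd(x)

round(c(mean = mean(z), sd = sd(z)), 10)   # mean 0, s.d. 1

# Sample skewness (mean cubed z-score); a simple version for illustration
skew <- function(v) mean(((v - mean(v)) / sd(v))^3)

# Standardizing is a linear change of scale: every value keeps its rank
# and the skewness is identical, so the shape is unchanged
all(rank(x) == rank(z))
c(skew_x = skew(x), skew_z = skew(z))
```

The z-scores still have the long right tail of the exponential: standardizing moves and rescales the distribution but does not make it normal.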
The Standard Normal Distribution
Symmetrical
Mean 0, s.d. 1

z       Mean to z   Beyond z
0.98    .3365       .1635
0.99    .3389       .1611
1.00    .3413       .1587
1.01    .3438       .1562
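The table rows can be reproduced with R's `pnorm`, which returns the area below a given z. "Mean to z" is that area minus the 0.5 below the mean, and "Beyond z" is the upper-tail area. This is a sketch for checking table lookups, not a replacement for understanding the table.

```r
z <- c(0.98, 0.99, 1.00, 1.01)
mean_to_z <- pnorm(z) - 0.5                # area between the mean (0) and z
beyond_z  <- pnorm(z, lower.tail = FALSE)  # area in the tail beyond z
tab <- round(data.frame(z, mean_to_z, beyond_z), 4)
tab
```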
Area under the normal curve translates directly into probability.
The probability of X is a number between 0 and 1 (1 is certainty).
For continuous scales (real numbers), we can only give probabilities
for ranges, not for specific values (e.g. height between 60 and 62 inches).
If we regard the total area under the curve as 1, we can pick out some
range of values and ask how likely it is that a sample from this
distribution will lie within that range: that is, what proportion of the
area under the curve lies within that range.
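In R, the area for a range is the area below the upper limit minus the area below the lower limit. The sketch below wraps that in a hypothetical helper, `area_between`; `pnorm`'s `mean` and `sd` arguments handle raw scores, shown here with the IQ scale (mean 100, s.d. 16) used later in these notes.

```r
# P(a < X < b) for a normal variable (standard normal by default)
# (area_between is a hypothetical helper, not a base-R function)
area_between <- function(a, b, mean = 0, sd = 1) {
  pnorm(b, mean = mean, sd = sd) - pnorm(a, mean = mean, sd = sd)
}

area_between(-1, 1)                       # about 0.683: within 1 s.d. of the mean
area_between(-1.96, 1.96)                 # about 0.950
area_between(0, 1)                        # about 0.3413: "mean to z" for z = 1.0
area_between(84, 116, mean = 100, sd = 16)  # raw IQ scores 1 s.d. either side
```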
Using tables to calculate area under the normal curve
Sampling Distribution of the Mean
Typically, we work with groups of subjects rather than single subjects
(when might this not be true?)
We want our sample to stand in for the population: it should be typical
of the underlying population.
How typical is the mean of our sample? If we use a sample of 10 subjects
and repeat our experiment 100 times, with a different 10 subjects each
time, how much variability will there be in our results?
Does sample size matter?
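The thought experiment can be simulated directly. This sketch assumes a normal population with mean 100 and s.d. 16 (an IQ-like scale, chosen only for illustration) and compares the spread of 100 sample means for samples of 10 and of 40 with σ/√N, the standard error of the mean.

```r
set.seed(42)
pop_mean <- 100   # assumed population mean (IQ-like scale)
pop_sd   <- 16    # assumed population s.d.

# Run n_reps experiments, each with a fresh random sample of n subjects,
# and return the mean of each sample
sample_means <- function(n, n_reps = 100) {
  replicate(n_reps, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))
}

means_10 <- sample_means(10)
means_40 <- sample_means(40)

# Spread of the 100 sample means vs. sigma / sqrt(N)
rbind(
  observed  = c(n10 = sd(means_10),      n40 = sd(means_40)),
  predicted = c(n10 = pop_sd / sqrt(10), n40 = pop_sd / sqrt(40))
)
```

Quadrupling the sample size roughly halves the variability of the sample mean, because the spread shrinks with √N.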
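Standard Error of the Mean
The s.d. of the sampling distribution of the mean is the standard error
of the mean (SEM): σ/√N, where σ is the population s.d. and N is the sample size.
The sampling distribution of the mean approaches a normal distribution
as N gets large (the Central Limit Theorem).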
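The Central Limit Theorem can be watched in action with a skewed population. This sketch assumes an exponential population (mean 1, s.d. 1, skewness 2); for each N it draws 5000 samples and records the s.d. and skewness of their means. The `skew` helper is a simple illustrative measure, not a library function.

```r
set.seed(7)
# Sample skewness (mean cubed z-score): 0 for a symmetric distribution
skew <- function(v) mean(((v - mean(v)) / sd(v))^3)

# Population: exponential with mean 1 and s.d. 1, strongly right-skewed.
# For each N, draw 5000 samples of size N and keep their means.
ns <- c(1, 5, 30, 100)
results <- t(sapply(ns, function(n) {
  means <- replicate(5000, mean(rexp(n, rate = 1)))
  c(N = n, mean = mean(means), sd = sd(means),
    sem = 1 / sqrt(n), skew = skew(means))
}))
round(results, 3)
```

The `sd` column tracks `sem` (σ/√N with σ = 1), and the skewness falls toward 0 (for this population, roughly 2/√N) as the distribution of means becomes more nearly normal.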
Finding raw scores corresponding to a given area
Joining Mensa requires an IQ in the top 2% of the
population (and an arrogance quota in the top 1%). If IQs
are normally distributed, mean=100, sd=16, what IQ is
required for entry?
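Going from an area back to a raw score uses `qnorm`, the inverse of `pnorm`. The top 2% starts at the 98th percentile, so the cut-off can be found directly or via its z-score.

```r
# IQ cut-off for the top 2%: the 98th percentile of a normal(100, 16)
cutoff <- qnorm(0.98, mean = 100, sd = 16)
cutoff                 # about 132.9

# Same answer via the z-score: z for the top 2% is about 2.05
z_crit <- qnorm(0.98)
100 + z_crit * 16
```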
Groups and Individuals
How typical is group G1 compared with some population of groups
(teams, subject pools, etc.)?
This works the same way as standard scores for individuals, but we must
use the SEM instead of the standard deviation: z = (X̄ - μ) / (σ/√N).
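The difference between the two z-scores is easiest to see side by side. The numbers below are hypothetical (a score or group mean of 106 on the IQ scale, mean 100, s.d. 16, group of 25), and both function names are illustrative helpers.

```r
# z for a single score: (X - mu) / sigma
z_individual <- function(x, mu, sigma) (x - mu) / sigma

# z for the mean of a group of N scores: (Xbar - mu) / (sigma / sqrt(N))
z_group <- function(xbar, mu, sigma, n) (xbar - mu) / (sigma / sqrt(n))

# Hypothetical numbers on the IQ scale (mean 100, s.d. 16)
z_individual(106, mu = 100, sigma = 16)       # 0.375
z_group(106, mu = 100, sigma = 16, n = 25)    # 1.875
```

The same 6-point difference is unremarkable for one person but much more unusual for the mean of 25 people, because group means vary less than individuals.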
Intuition: Any individual is liable to be ‘off the scale’. A
group ought to be more reliable.
Why do casinos like you to place a lot of little bets?
Z-scores: some caveats….
We can go from a z-score to an area (probability) only if we know that
the underlying distribution is normal (and we usually don't know this).
Samples must be independent and random.
A note on probabilities
P(A or B) = P(A) + P(B)   [mutually exclusive events]
P(A or B) = P(A) + P(B) - P(A and B)   [overlapping events]
P(A and B) = P(A) * P(B)   [independent events]
If one tenth of the people in the world are Chinese, one twentieth are
Indian, and half are male, what is
P(Chinese)?
P(Chinese or Indian)? P(Chinese and Indian)?
P(Chinese and Male)?
P(Chinese or Male)?
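One way to work the exercise, under two assumptions that the question leaves implicit: nobody is counted as both Chinese and Indian (mutually exclusive), and sex is independent of nationality.

```r
p_chinese <- 1 / 10
p_indian  <- 1 / 20
p_male    <- 1 / 2

# Assumption: nationalities are mutually exclusive
p_chinese_and_indian <- 0
p_chinese_or_indian  <- p_chinese + p_indian                    # 0.15

# Assumption: sex is independent of nationality
p_chinese_and_male <- p_chinese * p_male                        # 0.05
p_chinese_or_male  <- p_chinese + p_male - p_chinese_and_male   # 0.55
```

"Chinese or Male" needs the overlapping-events rule: adding 0.1 and 0.5 would count Chinese men twice.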
Independent Events
Two events are independent if p(A and B) = p(A)*p(B)
e.g. Two successive flips of a coin:
p(HH) = 0.5*0.5 = 0.25
The gambler's fallacy wrongly assumes non-independence (e.g. that heads is "due" after a run of tails)
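A simulation of many pairs of fair-coin flips can check both claims: P(HH) comes out near 0.25, and a head is no more likely after a tail than at any other time. The seed and number of pairs are arbitrary.

```r
set.seed(123)
n <- 100000
first  <- sample(c("H", "T"), n, replace = TRUE)
second <- sample(c("H", "T"), n, replace = TRUE)

mean(first == "H" & second == "H")   # close to 0.5 * 0.5 = 0.25

# Gambler's fallacy check: after a tail, is a head "due"?
mean(second[first == "T"] == "H")    # still close to 0.5
```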
Non-independence: Conditional Probability
P(A and B) = P(A) * P(B|A)
P(B|A) is the probability of B given A
e.g. the probability of being dealt 2 hearts in a row
(compare this with sampling with replacement)
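For a standard 52-card deck with 13 hearts, the conditional rule gives the no-replacement answer, and squaring 13/52 gives the with-replacement answer. A short simulation of dealing two cards is included as a sanity check; the seed and number of deals are arbitrary.

```r
# Without replacement: the second card comes from 51 cards, 12 of them hearts
p_two_hearts <- (13 / 52) * (12 / 51)
p_two_hearts                  # about 0.0588

# With replacement (card returned and reshuffled): independent draws
p_two_hearts_repl <- (13 / 52)^2
p_two_hearts_repl             # 0.0625

# Simulation check: deal 2 cards (sample() draws without replacement by default)
set.seed(99)
deck  <- rep(c("hearts", "other"), times = c(13, 39))
draws <- replicate(50000, all(sample(deck, 2) == "hearts"))
mean(draws)                   # close to 0.0588
```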
End of Chapter 4
Basic Hypothesis Testing
We do an experiment and obtain a result, x. We would
like to know what the probability is that this arose by
chance.
Fictitious example: Mathematical aptitude is measured in
the USA using SAT scores, which have a mean of 500,
and an s.d. of 100.
I have a psychic who can predict mathematical aptitude
based on reading auras. He selects 25 people who he
claims will have higher average math aptitude. The
average aptitude in this group is 530.
Wow?
Meet the Skeptic: Dr Null
Dr Null is always the first to examine your results. He
always claims that you obtained your result by chance.
It is highly unlikely that any sample of 25 will have a mean
SAT of exactly 500. About half the time it will be higher,
and half the time it will be lower.
You got 530 by chance.
How do we decide?
Dr Null could always be right.
How much risk do we take in rejecting Dr Null’s case?
Peculiarities of the present case:
We grant that our sampling is random (all members of the
population equally likely to be selected).
We grant that our samples are independent (choosing P1
does not affect our choice of P2)
We grant that we know the mean and s.d. of the
population (500, 100).
Dr Null’s plan of attack
Dr Null decides to go out and sample 25 people too. He gets a mean SAT
of 490. Rats.
So he does it again, and gets a mean SAT of 540. Aha! I told you it was
just chance.
What can we do?
Dr Null is sampling from the population, 25 people at a time.
His sample means have an expected mean of 500 (the population mean) and
an expected standard deviation (the standard error of the mean) of
100/√25 = 20.
Our score was 530, which is a z-score of (530 - 500)/20 = 1.5.
How likely is it that we got a score this large (or larger) by chance,
if Dr Null is correct?
We would expect a result this large or larger about 6.68% of the time
(p = 0.0668).
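The psychic calculation in R, using only the numbers from the example (population mean 500, s.d. 100, sample of 25 with mean 530) and comparing the one-tailed p value with α = 0.05:

```r
mu <- 500; sigma <- 100   # population SAT mean and s.d.
n <- 25; xbar <- 530      # psychic's sample size and sample mean

sem <- sigma / sqrt(n)    # standard error of the mean: 20
z <- (xbar - mu) / sem    # 1.5

# One-tailed p: area beyond z in the upper tail
p <- pnorm(z, lower.tail = FALSE)
p                         # about 0.0668

alpha <- 0.05
p < alpha                 # FALSE: we cannot reject the null
```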
Selecting alpha: Choosing your risk level
Scientists have a rule of thumb: if the chance of Dr Null beating you is
less than 1 in 20, we will take you seriously and ignore his protests
(for now).
We set α = 0.05.
Here, p > 0.05, so we cannot reject the null hypothesis (drat).
We do not have a statistically significant result.
Caution: statistically significant does not mean important!!
Using a test statistic
Here, z is a test statistic:
It is based on one or more sample statistics.
It follows a well-known distribution.
A larger |z| gives a smaller p value.
We usually report only the threshold reached: p < 0.05, p < 0.01, or p < 0.001.
We do not report exact p values.
Large z-scores allow us to confidently reject the null hypothesis…
…but they do not guarantee that the alternative is interestingly different.
Large z-scores are easier to get with large samples (why?)
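A sketch of the "why": keep the psychic's 30-point advantage and the population s.d. of 100 fixed, and vary only the sample size. The larger sample sizes are hypothetical, chosen for illustration.

```r
# Same 30-point SAT advantage, population s.d. 100, different sample sizes
n <- c(25, 100, 400)
z <- (530 - 500) / (100 / sqrt(n))   # the SEM shrinks as sqrt(n) grows
p <- pnorm(z, lower.tail = FALSE)    # one-tailed p
data.frame(n, z, p = signif(p, 3))
```

The effect itself never changes (30 points is 0.3 of a population s.d.), but z goes from 1.5 to 3 to 6. This is why a large z-score says little on its own about whether a difference is interesting.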
Two Types of Error
                         The null hypothesis is true     The null hypothesis is false
Reject the null: yes     Type I error (false alarm)      Correct rejection (hit)
                         p = α                           p = 1 - β
Reject the null: no      Correct failure to reject       Type II error (miss)
                         p = 1 - α                       p = β
The choice of α is a trade-off between Type I and Type II
errors.
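The trade-off can be computed for the psychic design (one-tailed, N = 25, σ = 100) once we assume a true alternative. The sketch below assumes, purely hypothetically, that the psychic's groups really do average 530, and shows β rising as α is made stricter. `type_error_rates` is an illustrative helper name.

```r
mu0 <- 500; sigma <- 100; n <- 25
sem <- sigma / sqrt(n)
mu_true <- 530   # hypothetical: suppose the psychic's groups really average 530

# (type_error_rates is a hypothetical helper for illustration)
type_error_rates <- function(alpha) {
  # Critical sample mean for a one-tailed (upper) test at this alpha
  crit <- mu0 + qnorm(1 - alpha) * sem
  # Type II error: the sample mean falls below the cut-off although mu = mu_true
  beta <- pnorm(crit, mean = mu_true, sd = sem)
  c(alpha = alpha, crit = crit, beta = beta, power = 1 - beta)
}

rates <- t(sapply(c(0.10, 0.05, 0.01), type_error_rates))
round(rates, 3)
```

Lowering α from 0.10 to 0.01 pushes the critical mean up, so misses (β) become more likely under the assumed true mean.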
What are the costs of a false alarm and of a miss for the
following:
• A pilot emerges from the fog and estimates whether her position is suitable for landing
• A doctor estimates whether a fuzzy spot could be a tumor
• You receive a letter with some white powder inside
With our psychic, we had a clear directional hypothesis:
we were only looking for a large SAT score (a one-tailed test).
Often, we must be open to scores that are either larger or smaller than
expected under the null hypothesis: a two-tailed test.
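In a two-tailed test, α is split between the two tails, so the critical value moves out and the p value doubles. Applied to the psychic's z of 1.5:

```r
alpha <- 0.05
z_crit_two <- qnorm(1 - alpha / 2)   # two-tailed: about 1.96
z_crit_one <- qnorm(1 - alpha)       # one-tailed (upper): about 1.645

z <- 1.5                             # the psychic's z-score
p_one <- pnorm(z, lower.tail = FALSE)           # about 0.0668
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)  # about 0.1336
```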
Doing a simple hypothesis test:
1. State your hypothesis
   a. Define the null
   b. and its alternative
2. Select a test and significance level
3. Get some data
4. Find the region of rejection (critical value)
5. Calculate the test statistic (z) and compare it to the critical value
6. Interpret the result
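Steps 4 and 5 can be wrapped in a small function for the case where the population s.d. is known. This is a sketch: `one_sample_z_test` and its arguments are hypothetical names (base R has no such function), and the interpretation in step 6 is still up to you.

```r
# One-sample z-test with known sigma (hypothetical helper, not base R)
one_sample_z_test <- function(xbar, mu0, sigma, n, alpha = 0.05,
                              alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  # Step 4: region of rejection (critical value on the z scale)
  z_crit <- switch(alternative,
                   two.sided = qnorm(1 - alpha / 2),
                   greater   = qnorm(1 - alpha),
                   less      = qnorm(alpha))
  # Step 5: test statistic and its p value
  z <- (xbar - mu0) / (sigma / sqrt(n))
  p <- switch(alternative,
              two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),
              greater   = pnorm(z, lower.tail = FALSE),
              less      = pnorm(z))
  reject <- switch(alternative,
                   two.sided = abs(z) >= z_crit,
                   greater   = z >= z_crit,
                   less      = z <= z_crit)
  list(z = z, z_crit = z_crit, p = p, reject_null = reject)
}

# Psychic example. Steps 1-3: H0: mu = 500, H1: mu > 500, alpha = 0.05, data
one_sample_z_test(xbar = 530, mu0 = 500, sigma = 100, n = 25,
                  alternative = "greater")
```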
Assumptions of the One-Sample z-Test
The sample is drawn randomly.
The variable measured has a normal distribution in the population.
The s.d. of the sampled population is the same as the known s.d. of the
comparison population (impossible to test?)
…so the one-sample z-test is rarely used.
Reporting:
“A one-tailed test showed that SAT scores were less than the
population mean (z=-2.76, p<0.05).”
One-sample z-Test in R
Base R does not provide a one-sample z-test function.
To find the area under the normal curve corresponding to all scores
below a given z, use pnorm:
pnorm(1.96)
[1] 0.9750021
pnorm(-1.96)
[1] 0.02499790
Labs and R…… feedback?
Basic skill list:
1. Start R, enter commands
2. Change working directory
3. Get help(!!!)
4. Read in commands (source…)
5. Read in data (read.table)
6. Data subsetting: dat$x, dat[i], dat[i,j], dat[i,], dat[1:5,], etc.
7. Vector creation: c(a, b, c)
8. Basic plotting: hist(…, breaks), boxplot(x, y, z, names=….)
9. Generation: rnorm(n, mean, sd), runif(n, min, max)
10. Useful functions: seq, rep, summary, par…..
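A short practice script touching several items on the list. The tiny data set is made up for illustration, and it is read with `read.table(text = ...)` so the script needs no data file; in the lab you would normally pass a file name instead. The plots are drawn only in an interactive session, so the script also runs non-interactively.

```r
# Item 5: read data (from a string here, so no file is needed)
dat <- read.table(header = TRUE, text = "
  group score
  A     12
  A     15
  B     9
  B     11
  B     14
")

# Item 6: subsetting
dat$score                        # one column as a vector
dat[1:2, ]                       # first two rows, all columns
dat[dat$group == "B", "score"]   # scores for group B only

# Item 7: c() combines values into a vector
v <- c(1, 2, 3)

# Item 9: random numbers; rnorm(n, mean, sd) and runif(n, min, max)
set.seed(2)
x <- rnorm(100, mean = 50, sd = 10)
u <- runif(100, min = 0, max = 1)

# Item 10: seq, rep, summary
seq(0, 1, by = 0.25)
rep(c("a", "b"), times = 3)
summary(x)

# Item 8: basic plotting (drawn only when running interactively)
if (interactive()) {
  hist(x, breaks = 20)
  boxplot(x, u * 100, names = c("normal", "uniform x 100"))
}
```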