Hypothesis Testing
CJ 526
Probability






Review
P = number of times an event can occur /
Total number of possible events
Bounding rule of probability
Minimum value is 0
Maximum value is 1
Probability






Probability of an event NOT occurring is
the complement of an event
Probability of an illness = .2
Probability that illness will not occur =
1 - probability of event, or 1 - .2 = .8
Odds of an event is the ratio of the probability that it occurs
to the probability that it does not occur
Odds of illness = .2/.8, or 1 to 4; odds of
not getting ill are 4 to 1
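The complement and odds calculations above can be sketched in a few lines of Python. The function names are illustrative and not from the slides.

```python
def complement(p):
    """Probability that an event with probability p does NOT occur."""
    return 1 - p

def odds(p):
    """Odds in favor of an event: its probability divided by its complement."""
    return p / (1 - p)

p_ill = 0.2
p_not_ill = complement(p_ill)    # 0.8
odds_ill = odds(p_ill)           # 0.25, i.e. 1 to 4
odds_not_ill = odds(p_not_ill)   # about 4, i.e. 4 to 1
```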
Addition rule of p




What is the probability of either one event
OR another occurring?
If the events are mutually exclusive,
simply add the probabilities (Venn
diagram)
What is the p of having a boy or a girl?
P = .5 + .5 = 1
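A minimal sketch of the addition rule, using the boy-or-girl example. The function name is illustrative, and the rule as written holds only when the two events are mutually exclusive.

```python
def p_either(p_a, p_b):
    """P(A or B) for two mutually exclusive events A and B."""
    return p_a + p_b

p_boy, p_girl = 0.5, 0.5
p_boy_or_girl = p_either(p_boy, p_girl)  # 1.0
```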
Multiplication rule



What is the probability of A and B
occurring?
If the events are independent of one
another, their probabilities can be multiplied
What is the p of having both schizophrenia
and epilepsy?
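A minimal sketch of the multiplication rule. The slides give no probabilities for this example, so the two values below are placeholders chosen only for illustration, not real prevalence figures, and the calculation assumes the two conditions are independent.

```python
def p_both(p_a, p_b):
    """P(A and B), valid only when A and B are independent."""
    return p_a * p_b

# Placeholder probabilities for illustration only (not real prevalence data).
p_condition_a = 0.01
p_condition_b = 0.02

p_a_and_b = p_both(p_condition_a, p_condition_b)  # about 0.0002
```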
Probability distributions


A probability distribution is theoretical—we
expect it based on the laws of probability
That is different from an empirical
distribution—one which we actually
observe
Normal probability distribution




Probability distribution for continuous
events
Probability of an event occurring is higher
in the center of the curve
Declines for events at each of the two
ends (tails) of the distribution
Neither of the tails touches the x axis
(they extend to infinity)
Normal distribution




Theoretical probability distribution
Unimodal, symmetrical, bell-shaped curve
Symmetrical: draw a line down the center,
left and right halves would be mirror
images
Can be expressed as a mathematical
formula (p. 220)
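For reference, the usual form of the normal density is given below. The symbols are the conventional ones and are an assumption here, since the textbook page is not reproduced in these slides: \mu is the mean and \sigma is the SD.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```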
Normal distribution





Family of normal distributions
Dependent on mean and SD
(Illustrate)
More spread out: larger SD
Narrower: smaller SD
Variations



Skewness
Skewed to the right or the left, as
opposed to symmetry
Kurtosis: degree of “peakedness” or
“flatness”
Area under the normal curve





Remember that for any continuous
distribution there is a mean and SD
Example: Mean = 10 and SD = 2
If the distribution is normal, the
majority (about 2/3) of scores will be from 8 to
12
8 and 12 are each one SD from the mean
See p. 225
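The "about 2/3" figure can be checked with Python's standard library. This is a sketch that assumes the scores are normally distributed with mean 10 and SD 2; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=10, sigma=2)

# Area under the curve between one SD below and one SD above the mean
share_within_one_sd = scores.cdf(12) - scores.cdf(8)  # about 0.68
```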
Area under the normal curve




If a distribution is normal, we can express
standard deviation in terms of z scores
A z score = (a score – the mean)/SD
If we convert all our raw scores to z
scores, then we get what is called the
standard normal distribution
It STANDARDIZES our scores
Standard normal distribution




Then distributions of different measures
can be compared against one another
The standard normal distribution has a
mean of 0 and an SD of one
If you use the formula for z scores, all the
scores can be converted
If a distribution has a mean of 10, the z
score for 10 will be (10-10)/SD = 0
Standard normal distribution



If a distribution has a mean of 10 and an
SD of 2, the z score for 12 would be z =
(12-10)/2 = 1
The z score for 8 would be z = (8-10)/2 =
-1
The negative and positive signs have
meaning: a + sign means a score is
above the mean
Standard normal distribution


A minus sign means the score is less than
the mean
The z score also tells us about magnitude: the
larger the absolute value of the z score, the further from the
mean, and the smaller the absolute value, the
closer to the mean
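The z score examples worked above (mean 10, SD 2) can be reproduced with a small helper. The function name is illustrative.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, measured in SD units."""
    return (score - mean) / sd

z_at_mean = z_score(10, 10, 2)  # 0.0: a score at the mean
z_above = z_score(12, 10, 2)    # 1.0: one SD above the mean
z_below = z_score(8, 10, 2)     # -1.0: one SD below the mean
```

The sign gives the direction from the mean, and the absolute value gives the distance.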
Standard normal distribution



We can also make statements about
where an individual score is in relation to
the rest of the distribution
.3413 (or 34.13%) of scores will fall
between the mean and 1 SD
.3413 (or 34.13%) of scores will fall
between the mean and -1 SD
Standard normal distribution


.6826 (or 68.26%) of scores will be between
-1 and +1 SD on a normal distribution
Thus, when we see a mean and SD, if it is
normally distributed, about 2/3 of the
scores will fall between the mean – the SD
and the mean + the SD
Standard normal distribution





50% of the scores will be above the mean
50% of the scores will be below the mean
.1359 (13.59%) will fall between -1 and -2
SD, and the same share between +1 and +2 SD
.0215 (2.15%) will fall between -2 and -3
SD, and the same share between +2 and +3 SD
See p. 223, illustrate
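The areas quoted for the standard normal curve can be recomputed as a check. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name is illustrative. Computed values may differ from the slide figures in the last digit (for example about .0214 rather than .0215), most likely because printed z tables round each entry to four places.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def area_between(z_low, z_high):
    """Share of scores falling between two z scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

mean_to_plus_1 = area_between(0, 1)   # about 0.3413
within_1_sd = area_between(-1, 1)     # about 0.6827
plus_1_to_2 = area_between(1, 2)      # about 0.1359
plus_2_to_3 = area_between(2, 3)      # about 0.0214
above_mean = 1 - std_normal.cdf(0)    # 0.5
```

Because the curve is symmetrical, the same areas apply on the negative side.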
Standardized normal distribution





Tells us about any distribution
Example of IQ scores, mean = 100, SD =
15
About 2/3 between 85 and 115
About 13.5% each between 115 and 130, and
70 and 85
About 2% each between 130 and 145, and 55
and 70
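The IQ bands above can be checked the same way. This is a sketch assuming IQ scores are exactly normal with mean 100 and SD 15; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

share_85_to_115 = iq.cdf(115) - iq.cdf(85)     # about 0.68 (within 1 SD)
share_115_to_130 = iq.cdf(130) - iq.cdf(115)   # about 0.136 (+1 to +2 SD)
share_130_to_145 = iq.cdf(145) - iq.cdf(130)   # about 0.021 (+2 to +3 SD)
```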
Standardized normal




SAT scores, mean = 500, SD = 100
Illustrate
Use of z table, p. 724
Reading the table
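A z table lookup can be mirrored in code. The sketch below assumes SAT scores are normal with mean 500 and SD 100; the score of 650 is a hypothetical example, not one from the slides.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

score = 650                          # hypothetical score for illustration
z = (score - sat.mean) / sat.stdev   # 1.5
share_below = sat.cdf(score)         # about 0.9332
share_above = 1 - share_below        # about 0.0668

# Reading the table in reverse: the score at the 90th percentile
score_at_90th = sat.inv_cdf(0.90)    # about 628
```

Note that many printed z tables give the area between the mean and z (here about .4332), so .5 must be added to get the share below a positive z.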
Utility of the normal distribution



Use of the normal distribution underlies
many statistical tests
Many variables are not normally distributed
However, the normal distribution is useful
anyway because of the apparent validity
of the Central Limit Theorem
Sampling distributions



To understand the Central Limit Theorem, we
need to understand sampling distributions
Say we draw many samples, and calculate
a statistic for each sample, such as a
mean
When we draw the samples, the mean will
not be the same each time—there will be
variation
Sampling distributions


If you were to obtain some measure on
several samples of patients with the same
disorder, there would be variation in the
mean of the measure for each sample.
There is an actual mean for the entire
population of patients that have the
disorder, but that is not known, because
we don’t have measures for the whole
population
Sampling distributions


However, we could obtain means based on
a large number of samples
Central limit theorem: if an infinite
number of random samples of size n are
drawn from a population, the sampling
distribution of the sample means will itself
approach being normally distributed
(even if the measure is not itself normally
distributed)
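The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is exponential (strongly skewed, with mean 1 and SD 1), a large finite number of samples stands in for "infinite", and the seed, sample size, and names are all illustrative choices.

```python
import random
import statistics

rng = random.Random(526)  # fixed seed so the run is repeatable

def sample_means(n, num_samples):
    """Means of num_samples random samples of size n from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

means = sample_means(n=100, num_samples=5000)

mean_of_means = statistics.fmean(means)  # close to the population mean, 1.0
sd_of_means = statistics.stdev(means)    # close to 1.0 / sqrt(100) = 0.1

# If the sample means are roughly normal, about 2/3 of them should fall
# within one SD of their own mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)
```

Even though individual exponential scores are far from normal, the share of sample means within one SD comes out near the 68% expected for a normal distribution.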
Number of subjects




With sample sizes greater than 100, the
Central Limit Theorem can be used
If the measure is not terribly skewed, then
samples could be around 50
With sample sizes of less than 50, the
central limit theorem probably should not
be used.
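The effect of sample size can be seen by measuring how skewed the sampling distribution of the mean still is at different n. The sketch below again assumes an exponential population; the skewness helper, seed, and sample sizes are illustrative. It shows only that skew shrinks as n grows and does not by itself justify the specific cutoffs of 50 and 100.

```python
import random
import statistics

rng = random.Random(526)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def skew_of_sample_means(n, num_samples=4000):
    """Skewness of the means of num_samples samples of size n."""
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]
    return skewness(means)

# Skewness near 0 indicates a symmetrical, more normal-looking distribution.
skew_by_n = {n: skew_of_sample_means(n) for n in (5, 50, 100)}
```

For a perfectly symmetrical distribution the skewness would be 0, so smaller values here mean the normal approximation is working better.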
Application of the central limit theorem
(ex)