Download Population

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Psychometrics wikipedia , lookup

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript



Summary from last week
The normal distribution
Hypothesis testing
 Type I and II errors
 Statistical power

Exercises
 Exercises on SD etc.
 Descriptive data analysis in SPSS
We covered the following:





Populations and samples
Frequenzy distributions
Mode, Median, Mean
Standard Deviation
Confidence intervals

We are [often] interested in answering questions
about populations
 i.e. going beyond the samples

Samples are used to make a guess about what
results we would get, if we used the entire
population

One of the first operations we perform having
obtained new data from a sample of people, is to
summarize them

This is done to figure out the general patterns
within the data

Two choices:
 Calculate a summary statistic, which tells us something
about the scores collected
 Draw a graph – for the same purpose
The simplest graph summarizes how many
times each score collected occurs: A
frequency distribution (or histogram)
Frequency of errors made
9
Histogram
shows that
most people
had 6+ errors
8
7
Frequency of errors

6
5
4
3
2
1
0
1
2
3
4
5
6
Number of errors made
7
8
9
10
Different ways of summarizing data: The Mean:
 Add all the scores together and divide by the total
number of scores.
 e.g. (3+4+4+5+6) / 5 =
22 / 5 = 4.4
X
X

N

Standard deviation shows the accuracy of
the mean

Often we have more than one sample of a population
This permits the calculation different sample means,
whose value will vary, giving us a sampling distribution
Sampling distribution
 = 10
Mean = 10
SD = 1.22
4
3
M = 10
M=9
M = 11
M=9
2
1
M = 10
M=8
Frequency

M = 12
0
6
M = 10
M = 11
7
8
9
10
11
Sample Mean
12
13
14

The distribution of sample means is normally distributed

... No matter what the shape of the original distribution of
raw scores in the population.

This is due to the
Central Limit Theorem

This holds true only for
sample sizes of 30 and greater
Means: odds of sample means being similar is very high

If we obtain a sample mean that is much higher or lower
than the population mean, there are two possible reasons:

(1) Our sample mean is a rare "fluke" (a quirk of sampling
variation);

(2) Our sample has not come from the population we
thought it did, but from some other, different, population.

The greater the difference between the sample and
population means, the more plausible (2) becomes
Sample means from populations tend to be
similar. If not, there are two explanations:

(1) the sample is a fluke: By chance our
random samples contained people with very
different properties

(2) the sample does not come from the
population we thought they did
We can decide between these alternatives as follows:

The differences between any two sample means from the same
population are normally distributed, around a mean difference
of zero.

Most differences will be relatively small, since the Central Limit
Theorem tells us that most samples will have similar means to
the population mean (similar means to each other).

If we obtain a very large difference between our sample means,
it could have occurred by chance, but this is very unlikely - it is
more likely that the two samples come from different
populations.

The term does not necessarily refer to a set of
individuals or items (e.g. cars). Rather, it refers
to a state of individuals or items.

Example: After a major earthquake in a city (in
which no one died) the actual set of individuals
remains the same. But the anxiety level, for
example, may change. The anxiety level of the
individuals before and after the quake defines
them as two populations.
Frequency of errors
Frequency of errors made
9
8
7
6
5
4
3
2
1
0
1
2
3
4
5
6
7
Number of errors made

The Normal curve is a mathematical abstraction which
conveniently describes ("models") many frequency
distributions of scores in real-life.
8
9
10
length of time before someone
looks away in a staring contest:
length of pickled gherkins:
Francis Galton (1876) 'On the height and weight of boys aged 14, in town and
country public schools.' Journal of the Anthropological Institute, 5, 174-180:
Francis Galton (1876) 'On the height and weight of boys aged 14, in town and
country public schools.' Journal of the Anthropological Institute, 5, 174-180:
Height of 14 year-old children
16
country
14
town
10
8
6
4
2
0
51
-5
2
53
-5
4
55
-5
6
57
-5
8
59
-6
0
61
-6
2
63
-6
4
65
-6
6
67
-6
8
69
-7
0
frequency (%)
12
height (inches)
Properties of the Normal Distribution:
1. It is bell-shaped and asymptotic at the extremes.
2. It's symmetrical around the mean.
3. The mean, median and mode all have same value.
4. It can be specified completely, once mean and
s.d. are known.
5. The area under the curve is directly proportional
to the relative frequency of observations.
e.g. here, 50% of scores fall below the mean, as
does 50% of the area under the curve.
e.g. here, 85% of scores fall below score X,
corresponding to 85% of the area under the curve.
Relationship between the normal curve and the
standard deviation:
frequency
All normal curves share this property: the s.d. cuts off
a constant proportion of the distribution of scores:-
68%
95%
99.7%
-3
-2
-1
mean
+1
+2
+3
Number of standard deviations either side of mean

About 68% of scores will fall in the range of the mean plus and minus 1 s.d.;

95% in the range of the mean +/- 2 s.d.'s;

99.7% in the range of the mean +/- 3 s.d.'s.

e.g.: I.Q. is normally distributed, with a mean of 100 and s.d. of 15.

Therefore, 68% of people have I.Q's between 85 and 115 (100 +/- 15).

95% have I.Q.'s between 70 and 130 (100 +/- (2*15).

99.7% have I.Q's between 55 and 145 (100 +/- (3*15).
68%
85 (mean - 1 s.d.)
115 (mean + 1 s.d.)

Just by knowing the mean, SD, and that scores are normally
distributed, we can tell a lot about a population.

If we encounter someone with a particular score, we can
assess how they stand in relation to the rest of their
group.

e.g.: someone with an I.Q. of 145 is quite unusual: This is 3
SD's above the mean. I.Q.'s of 3 SD's or above occur in only
0.15% of the population [ (100-99.7) / 2 ].
 Note: divide with 2 as there are 2 sides to the normal distribution!
Conclusions:

Many psychological/biological properties are
normally distributed.

This is very important for statistical
inference (extrapolating from samples to
populations)

Scientists are interested in testing hypotheses
 Testing the scientific question they are interested in

With most experimental work, we have a prediction that our
manipulation of the IV will lead to a result – this is the
experimental hypothesis

The reverse of the experimental hypothesis is called the null
hypothesis – this specifies that our prediction was wrong
and that the experiment did not have an effect

Example: Alchohol makes you fall over

The experimental hypothesis (H) is that those
that drink alchohol will fall over more than those
that do not

The null hypothesis (H0) is that people
will fall over the same amount
regardless of how much alchohol they
have drunk

Inferential statistics are used to discover whether the
experimental hypothesis is likely to be true

We can never be 100% sure– so we deal with probabilities

We calculate the probability that the result we have obtained are
a chance result – typically 5% (0.05)

As this probability decreases, we become more confident that
out experimental hypothesis is correct (and the null hypothesis
can be rejected)

Working with humans, we normally work with a 95% threshold for
confidence
Example:
 Two groups of dinosaurs
 We hit one of the groups over the head with meteors,
measuring how many hits it take before they get a headache.
 We would expect the means of the two groups to be similar
– i.e. require similar numbers of meteors
▪ Only different means if by random chance we got dinosaurs from the
extremes of the populations – unlikely given the normal distribution
 We would expect our manipulation to have an effect on the
mean of the experimental group
 We manipulate our experimental group
 Measure the mean of the DV.
 If the mean is different from the control
group, there are two possible explanations:
▪ The manipulation changed the thing we are measuring –
we now have samples from 2 different populations
▪ The manipulation did not have an effect, but we just two
samples of people who are by random chance very
different – the observed difference is a fluke of sampling

The bigger the difference between our sample
means, the more likely for our experiment to have
had an effect!

When means are similar between control groups
and experiment groups after experimentation, we
are less confident about our experiment having had
an effect

These ideas form the basis for hypothesis testing

We calculate the probability that two samples come from
the same population

When this is high, we conclude our experiment had no effect
(null hypothesis is true)
When it is low, we conclude the experiment had an effect
(experimental hypothesis is true)


If the propability of the two samples being from the same
population is 5% or less, we accept that the experimental
manipulation was succesful

How do we calculate propabilities that samples are from the
same population?


This depends on the experimental design and the test used
Some general principles:
 We know there are two types of variation in an
experiment:
▪ Systematic: Variation due to the experimental manipulation of the IV
▪ Unsystematic: Variation due to natural differences in people

We compare the amount of variation created by the experimental
manipulation with the amount of variance due to random factors – this
because we expect our experiment to have a greater effect than than
the random factors alone
 When trying to find our if the experimentally caused variance
is bigger than random variance, we calculate a test statistic
 A test statistic is a statistic that has a known frequency
distribution – by knowing this we can work out the
probability of obtaining a particular value
▪ E.g.: 2% chance of getting the value ”34”
 In general: Test statistic = systematic variance /
unsystematic variance
 Should be greater than 1!

Once we have calculated a test statistic, we can use its frequency
distribution to tell us how probable it was that we got this value

E.g.: ”A test statistic value of 34 has a 2% (0.02) chance of occuring”

The bigger the test statistic, the less likely to occur by chance

When the probability of a test statistic size falls below 0.05
(5%), we have enough confidence to assume the test statistic is as
large as it is because of our experimental manipulation – and we
accept our experimental hypothesis

Which test statistic to use? This depends on the experiment
design and the test we are using – more on this next week!
Two types errors can occur when testing hypothesis:


Type 1 error: Reject H0 when it is true
We think our experimental manipulation has had an effect,
when in fact it has not
 (Also known as α, "alpha“ error)

Type 2 error: Retain H0 when it is false
We think our experimental manipulation has not had an
effect, when in fact it has
 (Also known as β, "beta“ error)

Any observed difference between two sample means could
in principle be either "real" or due to chance - we can never
tell for certain

But:

Large differences between samples from the same
population are unlikely to arise by chance.

Small differences between samples are likely to have arisen
by chance.

Problem: Reducing the chances of making a Type 1 error
increases the chances of making a Type 2 error, and vice versa.

We therefore compromise between the chances of making a
Type 1 error, and the chances of making a Type 2 error:

We (generally) set the probability of making a Type 1 error at
0.05 (5%)

When we do an experiment, we accept a difference between
two samples as "real", if a difference of that size would be likely
to occur, by chance, 5% of the time, i.e. five times in every
hundred experiments performed
Non-directional (two-tailed) hypothesis
 Merely predicts that sample mean
X will be significantly different from
population mean (µ)
X < µ: Differences this extreme X > µ: Differences this extreme (or
(or more) occur by chance with
more) occur by chance with p =

p = 0.025.
0.025.

X<µ
X=µ
possible differences
X>µ
Directional (one-tailed) hypothesis
More precise - predicts the direction of difference (i.e.,
either predicts is bigger
than µ, or predicts is X
X
smaller than µ).
X > µ: Differences this extreme
(or more) occur
 by chance

with p = 0.05.

X <µ
X =µ
possible differences
X>µ

So: Whether a test is one-tailed or two-tailed is
related to whether our hypothesis is directional or
not
 Men and women eat different amounts of chocolate ->
non-directional
 Men eat more chocolate than women -> directional

Some statistical tests are run differently depending
on whether the hypothesis is directional or nondirectional