
Summary from last week
Descriptive Statistics

Exercises

 Descriptive data analysis in SPSS
 Mouse experiment continued

Exercise tomorrow Tuesday 13-15, room 4a58
(same as normal)
4 AIMS OF SCIENCE:

Reliability: Results can be replicated by others

Validity: Results show what we intend them to show

Generalizability: Results have a wider application than
merely the participants and the circumstances of the test

Importance: Results should be important (subjective).
 Results are never important if not reliable, valid and generalizable

Experiments are a useful tool for establishing cause
and effect - but other methods (e.g. observation)
are also important in science.

A good experimental design ensures that the only
variable that varies is the independent variable
chosen by the experimenter - the effects of
alternative confounding variables are eliminated (or
at least rendered unsystematic by randomisation).
Disadvantages of the experimental method:

Intrusive - participants know they are being observed, and this may
affect their behaviour.

Experimenter effects

Not all phenomena are amenable to experimentation, for practical or
ethical reasons (e.g. post-traumatic stress disorder, near-death
experiences, effects of physical and social deprivation, etc.)

Some phenomena (e.g. personality, age or sex differences) can only be
investigated by methods which are, strictly speaking, quasi-experimental.

Good experimental designs maximise validity

Internal validity:
Extent to which we can be sure that changes in the
dependent variable are due to changes in the
independent variable [meteor kills dinosaurs].

External validity (ecological validity, generalizability):
Extent to which we can generalise from our participants
to other groups (e.g. to real-life situations).
Research methods:

Observational methods
 No manipulation of variables

Quasi-experimental methods
 When we cannot do a real experiment

True experimental methods
 Manipulation of IVs, objective measurement of
effect of manipulation
TRUE EXPERIMENTAL DESIGNS
2 types: Between-groups versus within-subjects designs
Between-groups (independent measures)
Each subject participates in only one condition of the study.
Within-subjects (repeated measures)
Each subject does all of the conditions in a study.
Mixed designs
Mixture of both approaches


MULTI-FACTORIAL DESIGNS
If two or more Independent Variables [factor = IV]

Advantage: Can observe how IVs interact
 E.g. Meteors and Mad Sharks

Disadvantage: For between-groups =
lots of participants needed

4+ IVs = hugely complicated statistics
 Better to run several experiments instead ...

Whenever possible, use true experimental
designs
Get at least one score per participant
Get ratio data
Use repeated-measures design whenever
possible – need fewer participants than
between-groups designs
Include an extra independent variable if
possible – more data
Don't get too ambitious!

Populations and samples

Frequency distributions

Mode, Median, Mean

Standard Deviation

Confidence intervals

Descriptive statistics are used to describe
datasets

They form the first analyses that we do when
working with an unexplored dataset

We are interested in answering questions
about populations

A population is a collection of people, and can
range from very general to very specific
 Everyone on the planet
 Everyone with dark hair
 Everyone living in Copenhagen aged 21, playing
the cello

It is not practical to collect data from
everyone in our target population
 So we sample it
[Diagram: a sample drawn from the population]

Samples are used to make a guess about
what results we would get, if we used the
entire population

The smaller the sample, the higher the
chance of variation in their behavior
compared to the population

One of the first operations we perform having
obtained new data from a sample of people, is to
summarize them

This is done to figure out the general patterns
within the data

Two choices:
 Calculate a summary statistic, which tells us something
about the scores collected
 Draw a graph – for the same purpose
The simplest graph summarizes how many
times each score collected occurs: a
frequency distribution (or histogram)
[Histogram: frequency of errors made (y-axis, 0-9) against number of errors made (x-axis, 1-10); the histogram shows that most people made 6+ errors]

In this example we have the following scores. We can now
calculate the frequency of each score - e.g. 8 of 40
participants had 8 errors:

Number of errors    Frequency
        1               2
        2               4
        3               2
        4               2
        5               2
        6               4
        7               4
        8               8
        9               6
       10               6
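
As a minimal sketch, the same frequency table can be produced in Python; the scores list below is reconstructed from the frequencies above and is purely illustrative:

```python
from collections import Counter

# Illustrative raw scores for the 40 participants, reconstructed from
# the frequency table above (e.g. eight participants with 8 errors)
scores = ([1] * 2 + [2] * 4 + [3] * 2 + [4] * 2 + [5] * 2 +
          [6] * 4 + [7] * 4 + [8] * 8 + [9] * 6 + [10] * 6)

freq = Counter(scores)                 # count how often each score occurs
for errors in sorted(freq):
    print(f"{errors:2d} errors: {freq[errors]} participant(s)")
```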

Types of distributions:
Frequency distributions come in different
shapes and sizes

We need to be able to describe them

In an ideal world, all scores would be
distributed symmetrically around the centre
of all scores. This is called the normal
distribution


It is characterized by a symmetrical, bell-shaped curve

The majority of the scores lie around the
center of the distribution

The further away we get from the centre, the
lower the frequency of a score occurring

At the far ends, the odds of a score occurring are
very small indeed


Two main deviations from the normal distribution: Skewed
distributions
These are not symmetrical, and have the most frequent
scores clustered towards one end

[Figures: a positively skewed and a negatively skewed distribution]


Distributions also vary in their pointy-ness
This is called kurtosis – it reflects how scores cluster
towards either the tails of the distribution, or
towards the center

Apart from drawing graphs, we can calculate
summary statistics

Frequency distributions indicate that the
center of the scores is important

We want a single value to sum up our data (to
roughly tell us what the result was of our
experiment)
The Range:
 The difference between the highest and lowest scores. (i.e. range =
highest - lowest).
Advantages:
 Quick and easy to calculate, easy to understand.
Disadvantages:
 Unduly influenced by extreme scores: 3, 4, 4, 5, 100. Range = (100-3) = 97.
3, 4, 4, 5, 5. Range = (5-3) = 2.


Conveys no information about the spread of scores between the highest
and lowest scores.
e.g. 2, 2, 2, 2, 2, 20 and 2, 20, 20, 20, 20, 20 have exactly the same range
(18) but very different distributions.
The Mode:

The most frequent score in a set of scores.
6, 11, 22, 22, 96, 98. Mode = 22

Advantages of the mode:
(i) Simple to calculate, easy to understand.
(ii) The only average which can be used with nominal data.
Disadvantages of the mode:

(i) May be unrepresentative and hence misleading.
e.g.: 3, 4, 4, 5, 6, 7, 8, 8, 96, 96, 96.
Mode is 96 - but most of the scores are low numbers.

(ii) May be more than one mode in a set of scores.
e.g.: 3, 3, 3, 4, 4, 4, 5, 7, 9 has two modes!
Bimodal or multimodal distributions
The Median:
When scores are arranged in order of size, the median is
either

(a) the middle score (if there is an odd number of scores)
4, 5, 6, 7, 8, 8, 96. Median = 7.

or

(b) the average of the middle two scores (if there is an even
number of scores).
4, 5, 6, 7, 8, 8, 96, 96. Median = (7+8)/2 = 7.5.


Advantages of the median:
 (i) Resistant to the distorting effects of extreme high
or low scores.
Disadvantages of the median:
 (i) Ignores scores' numerical values, which is
wasteful if data are interval or ratio.
 (ii) More susceptible to sampling fluctuations than
the mean.
 (iii) Less mathematically useful than the mean.
Quartiles




The three values that split the sorted data into four equal parts.
Second Quartile = median.
Lower quartile = median of lower half of the data
Upper quartile = median of upper half of the data
The Mean:
 Add all the scores together and divide by the total
number of scores.
 e.g. (3+4+4+5+6) / 5 =
22 / 5 = 4.4
$\bar{X} = \frac{\sum X}{N}$
Advantages of the mean:

(i) Uses information from every single score.

(ii) Resistant to sampling fluctuation - i.e., varies the least from sample to
sample. (Important since we normally want to extrapolate from samples
to populations).
Disadvantages of the mean:

(i) Susceptible to distortion from extreme scores.
e.g.: 4, 5, 5, 6 : mean = 5. 4, 5, 5, 106: mean = 30.

(ii) Can only be used with interval or ratio data, not with ordinal or
nominal data.
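
A short sketch of the measures above (mode, median, quartiles, mean) using Python's standard statistics module; the data are illustrative:

```python
import statistics

data = [3, 4, 4, 5, 6, 7, 8, 8, 96]     # illustrative scores

print(statistics.mode(data))            # 4 (first of the two modes, 4 and 8)
print(statistics.median(data))          # 6 (middle score of the sorted data)
print(statistics.mean(data))            # sum of scores / N; pulled up by the 96
print(statistics.quantiles(data, n=4))  # lower quartile, median, upper quartile
```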

The mean is a model of what happens in the
real world: the typical score

It is not a perfect representation of the data

How can we assess how well the mean
represents reality?

How do we know if the mean is a good description
of our dataset?

Example:
10, 10, 10, 0.1, 0.1, 0.1 – mean = 5.05
This is not very descriptive of the frequency
distribution!
Problem: The mean can be influenced by extreme
scores

To evaluate the mean, we need to see how it relates
to the actually recorded scores, i.e. how scores
deviate from the mean
[Figure: individual scores plotted around the mean; the vertical distances are the deviations]

The deviation from the scores to the mean allows us to estimate the
accuracy of the mean as a representation of the scores

There are several ways of doing this.

Sum of squared errors (SS): all differences between mean and score,
squared and then summed


A good mean produces a low SS.
The problem is that the more scores we have, the larger SS becomes!

We divide by the number of scores: this gives the variance (s²)

We can use the variance to compare the accuracy of the
mean across samples with different numbers of observations

Problem: variance is in "units squared"

To get back to the unit of our original scores, we take the
square root of the variance: the standard deviation (s)

Standard deviation shows the accuracy of
the mean

Sum of squared errors, variance and standard
deviation all measure the same thing: The
accuracy of the mean

The scores are proportionate – a large SS will
result in large s2 which will result in a large s.

The mean is most accurate when the scores
are similar, less accurate when the scores are very
dissimilar.
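
A minimal sketch of the whole chain (SS → variance → SD) in Python, using the earlier example scores:

```python
import math

scores = [10, 10, 10, 0.1, 0.1, 0.1]       # the example from above
mean = sum(scores) / len(scores)           # 5.05

ss = sum((x - mean) ** 2 for x in scores)  # sum of squared errors (SS)
variance = ss / len(scores)                # s^2: SS divided by number of scores
sd = math.sqrt(variance)                   # s: back in the units of the raw scores

print(ss, variance, sd)                    # large SD: the mean fits these scores poorly
```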
Complications in using the mean and SD:
 We usually obtain the mean and SD from a sample –
not the parent population.

Sometimes we are content to describe our sample
per se, but sometimes we want to extrapolate to
the population from our sample.
[Diagram: extrapolating from a sample to its parent population]
A sample mean is a good estimate of the population
mean.
 A sample SD tends to underestimate the
population SD

Therefore, when using the sample SD as a
description of the sample, divide by n (number of
scores).
 When using the sample SD as an estimate of the
population SD, divide by n-1 (to make the SD larger
than it would otherwise have been).

sample SD as a description of a sample:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$

sample SD as an estimate of the population SD:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

population SD if you measure every member of the population:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

sample mean as a description of a sample:
$\bar{X} = \frac{\sum X}{n}$

sample mean as an estimate of the population mean:
$\bar{X} = \frac{\sum X}{n}$

population mean ("mu"):
$\mu = \frac{\sum X}{N}$
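
In numpy this n versus n-1 distinction is the ddof ("delta degrees of freedom") argument; a minimal sketch with illustrative data:

```python
import numpy as np

sample = np.array([4, 5, 5, 6, 7, 9])   # illustrative sample

print(sample.std(ddof=0))  # divide by n:   SD as a description of this sample
print(sample.std(ddof=1))  # divide by n-1: SD as an estimate of the population SD
# The ddof=1 value is always slightly larger, correcting the underestimate.
```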


The variance and SD tell us something about the frequency
distribution
The mean is the center of the distribution; the smaller the SD, the
closer the scores lie to the center.

Imagine we collect 1,000,000 samples of data
about how many meteors it takes to kill a
T-rex, calculating the mean for each

From the means and SDs, we can calculate
the boundaries within which those samples
lie - e.g. 2 to 25




We can now say that we are reasonably sure
that any other sample will have a mean
between 2 and 25
Often we want to describe how "sure" we are
- often we want to be 95% sure
Say that 95% of our samples fall between 3
and 24.
[3-24] is known as a confidence interval

Calculating the confidence interval

Lower boundary = mean-2*SE

Upper boundary = mean+2*SE

Mean is always at the centre of the confidence
interval

The more accurate the mean, the smaller the
confidence interval

Example
 Mean meteor count from our 1 million samples: 10
 Standard error = 2.5
 95% confidence interval:
 Lower boundary: = 10-(2*2.5) = 5
 Upper boundary: = 10+(2*2.5) = 15
 So 95% of all sample means should lie between 5-15
meteors
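
The same arithmetic as a sketch in Python (the 2*SE rule is the approximation used above; 1.96 is the more exact multiplier):

```python
mean = 10   # mean meteor count from the samples
se = 2.5    # standard error

lower = mean - 2 * se   # 5
upper = mean + 2 * se   # 15
print(f"95% confidence interval: [{lower}, {upper}]")
```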

We can now describe the sample, but:

How well does our sample represent the
population?
[Figure: height of all adult women in England - a normal distribution of raw scores with µ = 63 in. and σ = 2 in.]

[Figure: a sample of 100 adult women from England - X̄ = 64.2 in., s = 2.5 in., N = 100]

If we take repeated samples, each sample has a mean
height, a standard deviation (s), and a shape/distribution.
[Figure: several samples, each with its own mean (X̄₁, X̄₂, X̄₃) and standard deviation (s₁, s₂, s₃)]
 Due to random fluctuations, each sample is different - from
other samples and from the parent population.
 These differences are predictable - we can use samples to
make inferences about their parent populations.

[Figure: the means of five samples: X̄ = 30, 25, 33, 30, 29]

Often we have more than one sample of a population
This permits the calculation of different sample means,
whose values will vary, giving us a sampling distribution
[Histogram: a sampling distribution of nine sample means (M = 8 to 12); µ = 10, mean of the sample means = 10, SD = 1.22]

The sampling distribution informs about the
behavior of samples from the population

We can calculate SD for the sampling
distribution

This is called the Standard Error of the Mean
(SE)


SE shows how much variation there is within
a set of sample means
Therefore also how likely a specific sample
mean is to be erroneous, as an estimate of
the true population mean
[Figure: the means of different samples distributed around the actual population mean]

SE = SD of the sample means

We can estimate SE via one sample:

$SE \approx \frac{s}{\sqrt{n}}$

Estimated SE = the SD of the sample divided by
the square root of the sample size (n)

If the SE is small, our obtained sample mean is more likely to be
similar to the true population mean than if the SE is large
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Increasing n reduces the size of the SE
 A sample mean based on 100 scores is probably closer to the population
mean than a sample mean based on 10 scores (!)

Variation between samples decreases as sample size increases –
because extreme scores become less important to the mean
$\sigma_{\bar{X}} = \frac{2}{\sqrt{100}} = \frac{2}{10} = 0.20$

Suppose n = 16 instead of 100:

$\sigma_{\bar{X}} = \frac{2}{\sqrt{16}} = \frac{2}{4} = 0.50$
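
A sketch of how the standard error shrinks as n grows, with σ = 2 as in the example:

```python
import math

sigma = 2                       # population SD, as in the example
for n in (16, 100, 400):
    se = sigma / math.sqrt(n)   # standard error of the mean
    print(f"n = {n:3d}: SE = {se:.2f}")   # 0.50, 0.20, 0.10
```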

The distribution of sample means is normally distributed

... No matter what the shape of the original distribution of
raw scores in the population.

This is due to the
Central Limit Theorem

This holds true only for
sample sizes of 30 and greater
This means the odds of different sample means being similar to each other are very high

Example: Annual income of American citizens.

This distribution is positively skewed: many
people are in the lower and medium income brackets;
very few are ultra-rich.

Suppose we take many samples of size N = 50.

The sampling distribution
of the mean will be normal.
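
A small simulation sketch of the Central Limit Theorem; the exponential "income" distribution and its scale parameter are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5,000 samples of size N = 50 from a positively skewed,
# exponential "income-like" population
samples = rng.exponential(scale=30_000, size=(5_000, 50))
sample_means = samples.mean(axis=1)

# The raw scores are heavily skewed, but the sample means are
# approximately normal around the population mean (30,000)
print(sample_means.mean())       # close to 30,000
print(sample_means.std(ddof=1))  # close to the predicted SE: 30,000 / sqrt(50)
```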
Given the distribution is normal, we can do
interesting things
 This is because the normal distribution is
symmetrical


For example, various proportions of scores fall
within certain limits of the mean
 68% fall within the range of the mean +/- 1 standard
deviation
 95% within +/- 2 standard deviations
 Etc. - more on this next week!
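
A quick simulation sketch of those proportions:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=0, scale=1, size=100_000)  # standard normal scores

print(np.mean(np.abs(scores) <= 1))  # ~0.68 within +/- 1 SD
print(np.mean(np.abs(scores) <= 2))  # ~0.95 within +/- 2 SDs
```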
Z-scores
 Standardising a score with respect to the other scores
in the group.
 Expresses a score in terms of how many standard
deviations it is away from the mean.
 The distribution of z-scores has a mean of 0 and SD = 1.
$z = \frac{X - \bar{X}}{s}$

where X is the score, X̄ is the sample mean, and s is the SD
Going beyond the data: Z-scores
Using z-scores, we can represent a given score in terms of
how different it is from the mean of the group of scores.
SD = 2, μ = 63, Xᵢ = 64

How to calculate the z-score:

$z_X = \frac{X_i - \mu}{\sigma} = \frac{64 - 63}{2} = \frac{1}{2} = 0.50$ SD from the mean
2
We can do the same thing to calculate the relationship of a
sample mean to the population mean (μ = 63, X̄ = 64):

(1) we obtain a particular sample mean;
(2) we can represent this in terms of how different it is from the
mean of its parent population.

$z_{\bar{X}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} = \frac{64 - 63}{2 / \sqrt{16}} = \frac{1}{0.5} = 2.00$
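
Both z-score calculations as a sketch in Python, using the numbers from the example:

```python
import math

# z-score of a single score relative to its group
x, mu, sigma = 64, 63, 2
z_score = (x - mu) / sigma   # 0.50 SDs above the mean

# z-score of a sample mean relative to the population mean
x_bar, n = 64, 16
se = sigma / math.sqrt(n)    # 2 / 4 = 0.5
z_mean = (x_bar - mu) / se   # 2.00 SEs above the population mean

print(z_score, z_mean)
```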

If we obtain a sample mean that is much higher or lower
than the population mean, there are two possible reasons:

(1) Our sample mean is a rare "fluke" (a quirk of sampling
variation);

(2) Our sample has not come from the population we
thought it did, but from some other, different, population.

The greater the difference between the sample and
population means, the more plausible (2) becomes
Example: The human population I.Q. is 100.
 A random sample of people has a mean I.Q. of 170.
[Figure: distribution of sample means, centred on the population mean I.Q. (100), with the sample mean I.Q. (170) far out in the tail]
There are two explanations:
 (1) the sample is a fluke: By chance our
random sample contained a large number of
highly intelligent people.

(2) the sample does not come from the
population we thought it did: our sample
was actually from a different population -
e.g., aliens masquerading as humans.
This logic can be extended to the difference between two
samples from the same population:
We compare two groups of people:
 An experimental group and a control group.
 Experimental group get a "wolfman" drug.
 Control group get a harmless placebo.

Dependent Variable: Number of dog-biscuits consumed.

At the start of the experiment, they are two samples
from the same population ("humans").

At the end of the experiment, are they:

(a) still two samples from the same
population? (i.e., still two samples of
"humans" - our experimental treatment
has left them unchanged)


OR:
(b) now samples from two different populations - one from
the "population of humans" and one from the "population of wolfmen"?
We can decide between these alternatives as follows:

The differences between any two sample means from the same
population are normally distributed, around a mean difference
of zero.

Most differences will be relatively small, since the Central Limit
Theorem tells us that most samples will have similar means to
the population mean (similar means to each other).

If we obtain a very large difference between our sample means,
it could have occurred by chance, but this is very unlikely - it is
more likely that the two samples come from different
populations.
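
As a closing sketch, a simulation of this logic: differences between pairs of sample means drawn from the same population cluster normally around zero, so a very large observed difference is evidence for different parent populations. All parameters (biscuit counts, group size) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 10, 3, 20   # "human" biscuit consumption; illustrative values

# Many pairs of samples, all drawn from the SAME population
group_a = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
group_b = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
differences = group_a - group_b

print(differences.mean())                # close to 0
print(np.mean(np.abs(differences) > 2))  # large differences are rare (a few %)
```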