Statistics - remember the pebble mass distribution?

tom.h.wilson
[email protected]
Department of Geology and Geography
West Virginia University
Morgantown, WV
Back to statistics - remember the pebble mass distribution?
[Figure: Pebble masses collected from beach A - histogram of probability (0.00-0.40) vs. mass (150-550 grams).]
The 100 pebble masses (grams) in the sample:

224, 242, 256, 256, 265, 269, 277, 283, 283, 283, 284, 287, 290, 294, 301, 301, 302, 303, 307, 307,
311, 314, 317, 318, 318, 322, 324, 324, 326, 327, 329, 330, 331, 331, 331, 334, 335, 338, 338, 338,
340, 340, 341, 342, 342, 343, 346, 346, 350, 352, 353, 355, 355, 355, 357, 358, 359, 359, 364, 366,
367, 368, 369, 370, 370, 371, 373, 374, 374, 375, 379, 380, 383, 384, 384, 384, 386, 389, 389, 393,
394, 394, 395, 397, 400, 401, 403, 403, 403, 407, 408, 409, 420, 422, 423, 432, 433, 435, 450, 454
Mt. Aso, Japan (see Davis pages 179-181)
[Figure: histogram of number of eruptions (0-50) vs. years between successive eruptions (0-60).]
[Figure repeated: Pebble masses collected from beach A - probability vs. mass (grams).]
The probability of occurrence of specific
values in a sample often takes on that
bell-shaped Gaussian-like curve, as
illustrated by the pebble mass data.
[Figure: Probability Distribution of Pebble Masses - probability vs. pebble mass (0-800 grams), two series.]
The Gaussian (normal) distribution of pebble
masses looked a bit different from the
probability distribution we derived directly from
the sample, but it provided a close
approximation of the sample probabilities.
[Figure: Equivalent Gaussian Distribution of Pebble Masses - probability vs. pebble mass (0-800 grams), two series.]
Range (g)   Measured      Range              Gaussian (normal)
            probability   (multiples of s)   probability
201-250     0.02          -3.10 to -2.06     0.019
251-300     0.12          -2.06 to -1.02     0.134
301-350     0.35          -1.02 to  0.02     0.354
351-400     0.36           0.02 to  1.06     0.347
401-450     0.14           1.06 to  2.10     0.127
451-500     0.01           2.10 to  3.13     0.017
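The Gaussian column above can be reproduced from the standard normal CDF. A minimal sketch in Python, using the z-ranges from the table (the sample mean ≈ 348.8 g and standard deviation ≈ 47.7 g are implied by those ranges):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z-ranges from the table (multiples of the sample standard deviation)
ranges = [(-3.10, -2.06), (-2.06, -1.02), (-1.02, 0.02),
          (0.02, 1.06), (1.06, 2.10), (2.10, 3.13)]

gauss_probs = [phi(b) - phi(a) for a, b in ranges]
# e.g. the 301-350 g bin (-1.02 to 0.02) gives ~0.354,
# close to the measured probability of 0.35
```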
The pebble mass data represents just
one of a nearly infinite number of
possible samples that could be drawn
from the parent population of pebble
masses.
We obtained one estimate of the
population mean and this estimate is
almost certainly incorrect.
What might additional pebble mass
samples look like?
[Figures: histograms (N vs. mass, 200-500 grams) for five random samples:
Sample 1 <x>=348.84, Sample 2 <x>=350.6, Sample 3 <x>=356.43, Sample 4 <x>=354.5, Sample 5 <x>=348.42.]
These samples were drawn at random from a parent population having mean 350.18 and variance 2273 (standard deviation 47.68 g).
Note that each of the sample means differs from the population mean:

Sample   Mean     Variance   Standard deviation
1        348.84   2827.5     53.17
2        350.6    2192.59    46.82
3        356.43   2124.63    46.09
4        354.5    1977.63    44.47
5        348.42   2611.3     51.1
The distribution of 35 means
calculated from 35 samples drawn
at random from a parent population
with assumed mean of 350.18 and
variance of 2273.
[Figure: Distribution of Means - N vs. mean mass (330-365 grams).]
The mean of the above distribution of means is 350.45, and their variance is 21.51 (i.e., a standard deviation of 4.64).
Statistics of the distribution of sample means tell us something different from the statistics of the individual samples: they give us information about the variability we can anticipate in the mean of a 100-specimen sample.
Just as with the individual pebble mass
values observed in the sample,
probabilities can also be associated with
the possibility of drawing a sample with
a certain mean and standard deviation.
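The experiment above can be sketched in a few lines of code. A minimal simulation, assuming (as stated earlier) that the parent population is normal with mean 350.18 g and standard deviation 47.68 g:

```python
import random
import statistics

random.seed(1)  # for reproducibility

POP_MEAN, POP_SD = 350.18, 47.68  # parent population parameters from the text

# Draw 35 samples of 100 pebbles each and record each sample mean
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(100))
    for _ in range(35)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# sd_of_means should come out near 47.68 / sqrt(100) = 4.768,
# comparable to the 4.64 observed for the 35 means in the text
```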
This is how it works: you go out to your beach and take a bucket full of pebbles in one area, then go to another part of the beach and collect another bucket full of pebbles. You now have two samples, and each has its own mean and standard deviation.
You ask the question - Is the mean
determined for the one sample different from
that determined for the second sample?
To answer this question you use
probabilities determined from the
distribution of means (inferred
indirectly from those of an individual
sample).
The means of the samples may
differ by only 20 grams. If you look
at the range of individual masses
which is around 225 grams, you
might conclude that these two
samples are not really different.
However, you are dealing with means
derived from individual samples each
consisting of 100 specimens.
The distribution of means is different
from the distribution of specimens. The
range of possible means is much smaller.
[Figures: Histogram of pebble masses (number of occurrences vs. mass, 200-500 grams) and Distribution of means (number of occurrences vs. mean mass, 200-500 grams).]
Thus, when trying to estimate the
possibility that two means come from
the same parent population, you need
to examine probabilities based on the
standard deviation of the means and
not those of the specimens.
Number of standard deviations vs. area under the normal curve:

Std devs   Area     Std devs   Area     Std devs   Area
0.0        0.000    1.1        0.729    2.1        0.964
0.1        0.080    1.2        0.770    2.2        0.972
0.2        0.159    1.3        0.806    2.3        0.979
0.3        0.236    1.4        0.838    2.4        0.984
0.4        0.311    1.5        0.866    2.5        0.988
0.5        0.383    1.6        0.890    2.6        0.991
0.6        0.451    1.7        0.911    2.7        0.993
0.7        0.516    1.8        0.928    2.8        0.995
0.8        0.576    1.9        0.943    2.9        0.996
0.9        0.632    2.0        0.954    3.0        0.997
1.0        0.683
In the class example just presented we derived the mean and standard deviation of 35 samples drawn at random from a parent population having a standard deviation of 47.7.
Recall that the standard deviation of the means was only 4.64, just about 1/10th the standard deviation of the sample.
This is the standard deviation of the sample mean about the true mean and is referred to as the standard error.
The standard error, se, is estimated from the standard deviation of the sample as

    se = ŝ / √N
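For the pebble samples (N = 100, ŝ ≈ 47.7 g) this works out as a quick check in code:

```python
import math

s_hat = 47.7   # sample estimate of the standard deviation (grams)
N = 100        # specimens per sample

se = s_hat / math.sqrt(N)
# se = 4.77 g, close to the 4.64 g standard deviation
# observed for the 35 simulated sample means
```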
What is a significant difference?
To estimate the likelihood that a sample
having a specific calculated mean and
standard deviation comes from a parent
population with given mean and standard
deviation, one has to define some
limiting probabilities.
There is some probability, for example,
that you could draw a sample whose mean
might be 10 standard deviations from
the parent mean. It’s really small, but
still possible.
What chance of being wrong will you accept?
This decision about how different the
mean has to be in order to be
considered statistically different is
actually somewhat arbitrary.
In most cases we are willing to accept
a one in 20 chance of being wrong or a
one in 100 chance of being wrong.
The corresponding chance of being right, 19 out of 20 or 99 out of 100, is referred to as our "confidence limit."
Confidence limits used most often
are 95% or 99%. The 95%
confidence limit gives us a one in 20
chance of being wrong. The
confidence limit of 99% gives us a
one in 100 chance of being wrong.
The risk that we take (the chance of being wrong) is referred to as the alpha level (α).
If our confidence limit is 95%, our alpha level is 5%, or 0.05.
If our confidence limit is 99%, α is 1%, or 0.01.
Whatever your bias may be, whatever
your desired result, you can’t go
wrong in your presentation by clearly
stating the confidence limit you used.
[The table of areas under the normal curve is repeated here, with 1.96 standard deviations marked between the 1.9 (area 0.943) and 2.0 (area 0.954) entries.]
In the above table of probabilities (areas under the normal distribution curve), you can see that the 95% confidence limit extends out to 1.96 standard deviations from the mean.
The standard deviation we are interested in using when comparing means is the standard error, se.
Assuming that our standard error is 4.8 grams, 1.96 se corresponds to ±9.41 grams.
Notice that the 5% probability of being wrong is equally divided: 2.5% of the area lies more than 9.41 grams above the mean and 2.5% more than 9.41 grams below the mean.
You probably remember the discussions
of one- and two-tailed tests.
The 95% probability is a two-tailed probability.
So if your interest is only to make the general statement that a particular mean lies outside ±1.96 standard deviations from the assumed population mean, your test is a two-tailed test.
If you wish to be more specific in your conclusion and say that the estimate is significantly greater or less than the population mean, then your test is a one-tailed test.
The probability of error in your one-tailed test is 2.5% rather than 5%, i.e. α is 0.025 rather than 0.05.
Using our example dataset, we have
assumed that the parent population
has a mean of 350.18 grams, thus
all means greater than 359.6 grams
or less than 340.8 grams are
considered to come from a
different parent population at the
95% confidence level.
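The cutoff means quoted above follow directly from the population mean and the standard error. A quick check, assuming se = 4.8 g as above:

```python
pop_mean = 350.18  # assumed parent population mean (grams)
se = 4.8           # standard error of the mean (grams)
z_crit = 1.96      # two-tailed 95% cutoff from the normal table

half_width = z_crit * se          # about 9.41 grams
lower = pop_mean - half_width     # about 340.8 grams
upper = pop_mean + half_width     # about 359.6 grams
# Sample means outside (lower, upper) would be judged to come from a
# different parent population at the 95% confidence level.
```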
Sample   Mean     Variance   Standard deviation
1        348.84   2827.5     53.17
2        350.6    2192.59    46.82
3        356.43   2124.63    46.09
4        354.5    1977.63    44.47
5        348.42   2611.3     51.1

(i.e., the range 350.18 ± 9.41 grams)
Note that the samples we
drew at random from the
parent population have
means which lie inside this
range and are therefore
not statistically different
from the parent population.
It is worth noting that - we could very
easily have obtained a different “first
sample” which would have had a different
mean and standard deviation. Remember
that we designed our statistical test
assuming that the sample mean and
standard deviation correspond to those of
the population.
This would give us different confidence
limits and slightly different answers. Even
so, the method provides a fairly objective
quantitative basis for assessing statistical
differences between samples.
Sample   Mean     Variance   Standard deviation   95% C.L.
1        348.84   2827.5     53.17                338.29 - 359.39
2        350.6    2192.59    46.82                341.31 - 359.89
3        356.43   2124.63    46.09                347.28 - 365.57
4        354.5    1977.63    44.47                345.68 - 363.33
5        348.42   2611.3     51.1                 338.28 - 358.56
The method of testing we have just summarized is known as the z-test, because we use the z statistic to estimate probabilities, where

    z = (m2 - m1) / se

(not to be confused with the standardized variable).
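As an illustration, here is the z statistic for Sample 3 above (mean 356.43 g) against the assumed population mean, using the standard error ŝ/√N:

```python
pop_mean = 350.18         # assumed parent population mean (grams)
sample_mean = 356.43      # Sample 3 mean (grams)
se = 47.68 / 100 ** 0.5   # standard error: population sd / sqrt(N)

z = (sample_mean - pop_mean) / se
# z is about 1.31, well inside +/-1.96, so Sample 3 is not
# statistically different from the parent population at the 95% level
```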
Remember the t-test?
Tests for significance can be improved if
we account for the fact that estimates
of mean derived from small samples are
inherently sloppy estimates.
The t-test acknowledges this sloppiness
and compensates for it by making the
criterion for significant difference more
stringent when the sample size is smaller.
The z-test and t-test yield similar results for relatively large samples, larger than 100 or so.
The 95% confidence limit, for example, diverges considerably from 1.96 s for smaller sample sizes. The effect of sample size (N) is expressed in terms of degrees of freedom, which is N - 1.
[Figure: critical value vs. degrees of freedom, approaching 1.96 s at large N and about 2.6 s at small N.]
The t-test is made using tables of the critical values of t for various levels of confidence.
How do we compute t, the test statistic?

    t = (X1 - X2) / se

where X1 and X2 are the means of the two samples and se is the standard error. Here, however, se is computed differently from the single-sample standard error, which is just

    se = ŝ / √N

where ŝ is the unbiased estimate of the standard deviation.
For two samples,

    t = (X1 - X2) / se    with    se = sp √(1/n1 + 1/n2)

where sp is the pooled estimate of the standard deviation, derived as follows:

    sp² = [ (n1 - 1)s1² + (n2 - 1)s2² ] / (n1 + n2 - 2)
Going through this computation for the samples
of strike at locations A and B yields t ~ 5.2.
Strike, Location A: 11, 26, 29, 30, 34, 36, 37, 42, 44, 45, 47, 48, 48, 50, 51, 54, 54, 55, 61, 61
Mean 43.15, standard deviation 12.6959, variance 161.19
Strike, Location B: 60, 54, 69, 59, 58, 59, 66, 62, 48, 62, 53, 72, 62, 69, 70, 41, 59, 76, 54, 64
Mean 60.85, standard deviation 8.41224, variance 70.766

Is bedding strike measured at these two locations statistically different?
Evaluating the statistical significance
of differences in strike at locations A
and B using the t-test
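The pooled computation can be reproduced directly from the two strike samples. A minimal sketch:

```python
import math
import statistics

strike_a = [11, 26, 29, 30, 34, 36, 37, 42, 44, 45,
            47, 48, 48, 50, 51, 54, 54, 55, 61, 61]
strike_b = [60, 54, 69, 59, 58, 59, 66, 62, 48, 62,
            53, 72, 62, 69, 70, 41, 59, 76, 54, 64]

n1, n2 = len(strike_a), len(strike_b)
var1 = statistics.variance(strike_a)  # unbiased (n-1) variance, ~161.19
var2 = statistics.variance(strike_b)  # ~70.77

# Pooled variance and standard error
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

t = (statistics.mean(strike_b) - statistics.mean(strike_a)) / se
# t is about 5.2 with n1 + n2 - 2 = 38 degrees of freedom
```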
[Worksheet figure: Probability Distribution of Strikes at Location A - normal probability vs. strike; the worksheet columns of standardized strike values and their normal probabilities are omitted here.]
Strike A   Dip A   Strike B   Dip B
11         22      60         25
26         23      54         22
29         21      69         22
30         24      59         19
34         21      58         27
36         23      59         16
37         23      66         18
42         21      62         22
44         22      48         23
45         22      62         16
47         22      53         16
48         18      72         19
48         26      62         21
50         24      69         21
51         26      70         26
54         30      41         20
54         21      59         16
55         25      76         22
61         21      54         22
61         21      64         12

Mean       43.15     22.8       60.85     20.25
Std dev    12.69594  2.566997   8.41224   3.795773
The worksheet also reports the probability that the average strike, and the average dip, at locations A and B differ. The cells below refer to the explicit computation of the t statistic (see handout):

    Probability that the two strike means are actually equal:  3.58E-06
    Pooled variance:                                           115.9763
    Standard error, sp √(1/n1 + 1/n2):                         3.40553
    t:                                                         5.19743

The value of our test statistic is t ≈ 5.2. Degrees of freedom = n1 + n2 - 2 = 38; the closest value in the table is 40.
Note the similarities of the table below to Davis's Table A.2.
α = 0.1%, i.e. 0.001 as a one-tailed probability: 1 chance in 1000 that these two means are actually equal, that is, were drawn from the same parent population.
[Figure: Critical value of t (α = 0.001) vs. degrees of freedom (0-120).]
The variation of
critical value from
30 to 40 degrees of
freedom is small and
we could assume it
to be linear.
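That linear interpolation can be sketched as follows, assuming critical one-tailed t values of 3.385 at 30 degrees of freedom and 3.307 at 40, taken from a standard t table for α = 0.001:

```python
# Critical t values for alpha = 0.001 (one-tailed), from a standard t table
df_lo, t_lo = 30, 3.385
df_hi, t_hi = 40, 3.307

def t_critical(df):
    """Linearly interpolate the critical value between 30 and 40 df."""
    frac = (df - df_lo) / (df_hi - df_lo)
    return t_lo + frac * (t_hi - t_lo)

t_crit = t_critical(38)  # about 3.32 for our 38 degrees of freedom
# Our observed t of ~5.2 far exceeds this critical value, so the
# difference in strike is significant well beyond the 0.001 level.
```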
[Figure: t-distribution critical value (2.50-5.00) vs. degrees of freedom (0-80).]
Since t ≈ 5.2, we know that the actual probability of this happening by chance is much less than 0.001.