MA in English Linguistics
Experimental design and statistics II
Sean Wallis
Survey of English Usage
University College London
[email protected]
Outline
• Plotting data with Excel™
• The idea of a confidence interval
• Binomial → Normal → Wilson
• Interval types
– 1 observation
– The difference between 2 observations
• From intervals to significance tests
Plotting graphs with Excel™
• Microsoft Excel is a very useful tool for
– collecting data together in one place
– performing calculations
– plotting graphs
• Key concepts of spreadsheet programs:
– worksheet - a page of cells (rows x columns)
• you can use a part of a page for any table
– cell - a single item of data, a number or text string
• referred to by a letter (column), number (row), e.g. A15
• each cell can contain:
– a string: e.g. ‘Speakers
– a number: 0, 23, -15.2, 3.14159265
– a formula: =A15, =$A15+23, =SQRT($A$15), =SUM(A15:C15)
Plotting graphs with Excel™
• Importing data into Excel:
– Manually, by typing
– Exporting data from ICECUP
• Manipulating data in Excel to make it useful:
– Copy, paste: columns, rows, portions of tables
– Creating and copying functions
– Formatting cells
• Creating and editing graphs:
– Several different types (bar chart, line chart, scatter, etc)
– Can plot confidence intervals as well as points
• You can download a useful spreadsheet for
performing statistical tests:
– www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
Recap: the idea of probability
• A way of expressing chance
0 = cannot happen
1 = must happen
• Used in (at least) three ways last week
P = true probability (rate) in the population
p = observed probability in the sample
α = probability of p being different from P
– sometimes called probability of error, pe
– found in confidence intervals and significance tests
The idea of a confidence interval
• All observations are imprecise
– Randomness is a fact of life
– Our abilities are finite:
• to measure accurately or
• reliably classify into types
• We need to express caution in citing numbers
• Example (from Levin 2013):
– 77.27% of uses of think in 1920s data
have a literal (‘cogitate’) meaning
Really? Not 77.28, or 77.26?
– 77% of uses of think in 1920s data
have a literal (‘cogitate’) meaning
Sounds defensible. But how confident
can we be in this number?
– 77% (66-86%*) of uses of think in 1920s
data have a literal (‘cogitate’) meaning
Finally we have a credible range of values. It needs a footnote* to explain how it was calculated.
Binomial → Normal → Wilson
• Binomial distribution
– Expected pattern of observations found when repeating an
experiment for a given P (here, P = 0.5)
– Based on combinatorial mathematics
[Figure: Binomial distribution for P = 0.5; frequency F plotted against observed p]
– Other values of P have different
expected distribution patterns
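The combinatorial calculation behind the Binomial distribution can be sketched in a few lines of Python. This is only an illustration: the sample size n = 10 and the function name are assumptions made here, not values taken from the slides.

```python
from math import comb

def binomial_distribution(n, P):
    """Chance of seeing r = 0..n hits in n trials when the true rate is P."""
    return [comb(n, r) * P**r * (1 - P)**(n - r) for r in range(n + 1)]

n = 10  # assumed sample size, for illustration only
for P in (0.5, 0.3, 0.1):
    dist = binomial_distribution(n, P)
    # The peak of each distribution sits at (or next to) the population value P.
    most_likely_r = max(range(n + 1), key=lambda r: dist[r])
    print(f"P = {P}: most likely observation p = {most_likely_r / n}")
```

For P = 0.5 the pattern is symmetric. For other values of P the peak moves and the pattern becomes skewed.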
[Figure: Binomial distributions for P = 0.3, 0.1 and 0.05, shown alongside P = 0.5]
• Binomial → Normal
– Simplifies the Binomial distribution
(tricky to calculate) to two variables:
• mean P
– P is the most likely value
• standard deviation S
– S is a measure of spread
[Figure: Normal distribution with mean P and standard deviation S; frequency F plotted against p]
• Normal → Wilson
– The Normal distribution predicts observations p given a population value P
– We want to do the opposite: predict the true population value P from an observation p
– We need a different interval, the Wilson score interval
Binomial → Normal
• Any Normal distribution can be defined by only two variables and the Normal function z
– population mean P
– standard deviation S = √(P(1 – P) / n)
– With more data in the experiment, S will be smaller
[Figure: Normal distribution centred on P, with a distance of z.S marked on either side; frequency F plotted against p]
– 95% of the curve is within ~2 standard deviations of the expected mean
• the correct figure is 1.95996! This is the critical value of z for an error level of 0.05.
[Figure: the central 95% of the Normal distribution lies within z.S of P, with 2.5% in each tail]
– The ‘tail areas’
– For a 95%
interval, total 5%
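As a rough check on these figures, the critical value of z and the standard deviation S can be computed with Python's standard library. The values P = 0.5 and n = 100 or 400 are illustrative assumptions, not data from the slides, and the function name is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(P, n, alpha=0.05):
    """Return (lower, upper, z, S) for the Normal interval about P."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value of z
    S = sqrt(P * (1 - P) / n)                # standard deviation
    return P - z * S, P + z * S, z, S

# P and n are illustrative values only.
for n in (100, 400):
    lower, upper, z, S = normal_interval(0.5, n)
    print(f"n = {n}: z = {z:.5f}, S = {S:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

Quadrupling the amount of data halves S, so the interval about P narrows.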
The single-sample z test...
• Is an observation p > z standard deviations
from the expected (population) mean P?
• If yes, p is significantly different from P
[Figure: Normal distribution about P, with 2.5% tail areas beyond z.S on either side, and an observation p]
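The question this test asks can be written as a short function. This is a sketch under stated assumptions: the function name and the example values (P = 0.5, n = 100) are hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def single_sample_z_test(p, P, n, alpha=0.05):
    """True if observation p is more than z standard deviations from P."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    S = sqrt(P * (1 - P) / n)
    return abs(p - P) > z * S

# Hypothetical observations against an assumed population value P = 0.5, n = 100.
print(single_sample_z_test(0.62, 0.5, 100))  # True: significantly different
print(single_sample_z_test(0.55, 0.5, 100))  # False: within the interval
```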
...gives us a “confidence interval”
• The interval about p is called the Wilson score interval (w–, w+)
• This interval reflects the Normal interval about P:
– If P is at the upper limit of p, p is at the lower limit of P (Wallis, 2013)
[Figure: Wilson score interval (w–, w+) about an observation p, shown against the Normal distribution about P with 2.5% tails]
...gives us a “confidence interval”
• The Wilson score interval (w–, w+) has a difficult formula to remember
– p' = (p + z²/2n) / (1 + z²/n)
– s' = z√(p(1 – p)/n + z²/4n²) / (1 + z²/n)
– (w–, w+) = (p' – s', p' + s')
• You do not need to know this formula!
• You can use the 2x2 spreadsheet!
– www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
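For readers who prefer a script to a spreadsheet, the Wilson score interval can also be sketched in Python. The function name is an assumption of this sketch. The example uses the 1920s counts from the 2 x 2 table later in these slides (51 ‘cogitate’ uses out of 66), which reproduces the 66-86% range quoted earlier.

```python
from math import sqrt
from statistics import NormalDist

def wilson(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator                           # p'
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator  # s'
    return centre - spread, centre + spread

w_minus, w_plus = wilson(51 / 66, 66)
print(f"{w_minus:.2f}, {w_plus:.2f}")  # 0.66, 0.86
```

Note that the interval is not symmetric about p = 0.77: it stretches further towards 0.5 than towards 1.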
An example: uses of think
• Magnus Levin (2013) examined uses of think in the
TIME corpus in three time periods
– This is the graph we
created in Excel
[Figure: Wilson intervals without continuity correction. Series: ‘cogitate’, ‘intend’, quotative, interpretative. x-axis: 1920s, 1960s, 2000s. y-axis: 0 to 1]
– http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/
– Not an alternation study
• Categories are not “choices”
– The graph plots the probability of reading different uses of the word think (given the writer used the word)
– Has Wilson score intervals for each point
– It is easy to spot where intervals overlap
• A quick test for significant difference
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
A quick test for significant difference
• No overlap between the intervals = significant
• One interval overlaps the other point = not significant (ns)
• Otherwise test fully
[Figure: two observed probabilities p1 and p2, each with a Wilson interval running from its lower bound (w1–, w2–) to its upper bound (w1+, w2+)]
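The three rules of the quick test translate directly into code. In this sketch the function name is hypothetical, and the example intervals are 95% Wilson intervals worked out here for 51/66 and 108/181 (the counts in the 2 x 2 table later in these slides), rounded to four decimal places.

```python
def quick_test(p1, interval1, p2, interval2):
    """Compare two observations using their (lower, upper) Wilson intervals."""
    lower1, upper1 = interval1
    lower2, upper2 = interval2
    if upper1 < lower2 or upper2 < lower1:
        return "significant"   # the intervals do not overlap at all
    if lower1 <= p2 <= upper1 or lower2 <= p1 <= upper2:
        return "ns"            # one interval contains the other point
    return "test fully"        # partial overlap: a proper test is needed

# 1920s: p1 = 51/66; 1960s: p2 = 108/181 (interval bounds rounded).
print(quick_test(0.7727, (0.6583, 0.8571), 0.5967, (0.5239, 0.6654)))  # test fully
```

Here the two intervals overlap only slightly (0.6583 to 0.6654), and neither contains the other point, so the quick test cannot decide and a fuller test is required.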
Test 1: Newcombe’s test
• This test is used when data is drawn from different
populations (different years, groups, text categories)
– We calculate a new Newcombe-Wilson interval (W–, W+):
• W– = -(p1 – w1–)2 + (w2+ – p2)2
• W+ = (w1+ – p1)2 + (p2 – w2–)2
(Newcombe, 1998)
+
w1
0.8
p1
0.7
0.6
0.5
w1–
w2+
p2
w2–
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
– We then compare W– < (p2 – p1) < W+
• if the difference lies inside this interval, it is not significant
• (p2 – p1) < 0 = fall
– We only need to check the inner interval (for a fall, W–)
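A minimal Python sketch of the Newcombe-Wilson comparison follows. The function names are assumptions of this sketch, and the example uses the 1920s and 1960s counts from the 2 x 2 table in the next section (51/66 and 108/181).

```python
from math import sqrt
from statistics import NormalDist

def wilson(p, n, z):
    """Wilson score interval (w-, w+) for proportion p out of n."""
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - spread, centre + spread

def newcombe_wilson(p1, n1, p2, n2, alpha=0.05):
    """Return (d, W-, W+, significant) for the difference d = p2 - p1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    w1_minus, w1_plus = wilson(p1, n1, z)
    w2_minus, w2_plus = wilson(p2, n2, z)
    W_minus = -sqrt((p1 - w1_minus) ** 2 + (w2_plus - p2) ** 2)
    W_plus = sqrt((w1_plus - p1) ** 2 + (p2 - w2_minus) ** 2)
    d = p2 - p1
    # Significant if d falls outside the zero-based interval (W-, W+).
    return d, W_minus, W_plus, not (W_minus < d < W_plus)

d, W_minus, W_plus, significant = newcombe_wilson(51 / 66, 66, 108 / 181, 181)
print(f"d = {d:.3f}, W- = {W_minus:.3f}, W+ = {W_plus:.3f}, significant: {significant}")
```

With these counts d is about –0.176 and W– is about –0.133, so the fall lies outside the interval and is significant at the 0.05 error level.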
Test 2: 2 x 2 chi-square
• This test is used when data is drawn from the same
population of speakers (e.g. grammar -> grammar)
– We put the data into a 2 x 2 table
• www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
Observed frequencies (independent variable: time period)
            ‘cogitate’   other   total
1920s           51         15      66
1960s          108         73     181
total          159         88     247
(Wallis, 2013)
– The test uses the formula χ² = Σ (o – e)² / e
• where e = r × c / n (row total × column total / grand total)
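The chi-square calculation for the table above can be sketched as follows. The function name is an assumption of this sketch. The critical value is obtained as z², which holds for a 2 x 2 table (one degree of freedom).

```python
from statistics import NormalDist

def chi_square(table):
    """Chi-square for a table of observed frequencies (a list of rows)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n  # expected value e = r x c / n
            total += (o - e) ** 2 / e
    return total

observed = [[51, 15],    # 1920s: 'cogitate', other
            [108, 73]]   # 1960s: 'cogitate', other
chi2 = chi_square(observed)
critical = NormalDist().inv_cdf(0.975) ** 2  # about 3.841 for an error level of 0.05
print(f"chi-square = {chi2:.3f}, critical value = {critical:.3f}")
print("significant" if chi2 > critical else "ns")
```

For these counts chi-square is about 6.535, which exceeds the critical value, so the result is significant at the 0.05 error level.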
Expressing change
• Percentage difference is a very common idea:
– “X has grown by 50%” or “Y has fallen by 10%”
– We can calculate percentage difference by
• d% = d / p1 where d = p2 – p1
– We can put Wilson confidence intervals on d%
• BUT Percentage difference can be very misleading
– It depends heavily on the starting point p1 (might be 0)
– What does it mean to say
• something has increased by 100%?
• it has decreased by 100%?
• It is better to simply say that
– “the rate of ‘cogitate’ uses of think fell from 77% to 59%”
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
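A short sketch shows why percentage difference depends so heavily on the starting point. The function name is hypothetical. The first call uses the ‘cogitate’ proportions for the 1920s and 1960s (51/66 and 108/181), and the other two use made-up proportions with the same absolute fall of 0.1.

```python
def percentage_difference(p1, p2):
    """d% = d / p1, where d = p2 - p1. Returns None when p1 is 0."""
    if p1 == 0:
        return None  # undefined: there is no starting point to compare against
    return (p2 - p1) / p1

print(f"{percentage_difference(51 / 66, 108 / 181):+.1%}")  # -22.8%
print(f"{percentage_difference(0.2, 0.1):+.0%}")            # -50%
print(f"{percentage_difference(0.9, 0.8):+.0%}")            # -11%
```

The same absolute change is reported as a 50% fall or an 11% fall depending only on p1, which is one reason for simply quoting both rates.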
Summary
• We analyse results to help us report them
– Graphs are extremely useful!
• You can include graphs and tables in your essays
– If a result is not significant, say so and move on…
• Don’t say it is “nearly significant” or “indicative”
– An error level of 0.05 (or 95% correct) is OK
• Some people use 0.01 (99%) but this is not really better
• Wilson confidence intervals tell us
– Where the true value is likely to be
– Which differences between observations are likely
to be significant
• If intervals partially overlap, perform a more precise test
Summary
• Always say which test you used, e.g.
– “We compared ‘cogitate’ uses of think with other uses,
between the 1920s and 1960s periods, and this was
significant according to χ² at the 0.05 error level.”
• Tell your reader that you have plotted (e.g.) “95% Wilson
confidence intervals” in a footnote to the graph.
• For advice on deciding which test to use, see
– http://corplingstats.wordpress.com/2012/04/11/choosing-righttest/
• The tests you will need in one spreadsheet:
– www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
References
• Levin, M. 2013. The progressive in modern American English. In
Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). The Verb
Phrase in English: Investigating recent language change with
corpora. Cambridge: CUP.
• Newcombe, R.G. 1998. Interval estimation for the difference
between independent proportions: comparison of eleven
methods. Statistics in Medicine 17: 873-890.
• Wallis, S.A. 2013. z-squared: The origin and application of χ².
Journal of Quantitative Linguistics 20: 350-378.
• Wilson, E.B. 1927. Probable inference, the law of succession,
and statistical inference. Journal of the American Statistical
Association 22: 209-212.
• Assorted statistical tests:
– www.ucl.ac.uk/english-usage/staff/sean/resources/2x2chisq.xls