Download P - UCL

MA in English Linguistics
Experimental design and statistics II
Sean Wallis, Survey of English Usage, University College London
[email protected]

Outline
• Plotting data with Excel™
• The idea of a confidence interval
• Binomial → Normal → Wilson
• Interval types
  – 1 observation
  – The difference between 2 observations
• From intervals to significance tests

Plotting graphs with Excel™
• Microsoft Excel is a very useful tool for
  – collecting data together in one place
  – performing calculations
  – plotting graphs
• Key concepts of spreadsheet programs:
  – worksheet: a page of cells (rows x columns)
    • you can use a part of a page for any table
  – cell: a single item of data, a number or text string
    • referred to by a letter (column), number (row), e.g. A15
    • each cell can contain:
      – a string: e.g. ‘Speakers
      – a number: 0, 23, -15.2, 3.14159265
      – a formula: =A15, =$A15+23, =SQRT($A$15), =SUM(A15:C15)

Plotting graphs with Excel™
• Importing data into Excel:
  – Manually, by typing
  – Exporting data from ICECUP
• Manipulating data in Excel to make it useful:
  – Copy, paste: columns, rows, portions of tables
  – Creating and copying functions
  – Formatting cells
• Creating and editing graphs:
  – Several different types (bar chart, line chart, scatter, etc)
  – Can plot confidence intervals as well as points
• You can download a useful spreadsheet for performing statistical tests:
  – www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

Recap: the idea of probability
• A way of expressing chance
  – 0 = cannot happen
  – 1 = must happen
• Used in (at least) three ways last week
  – P = true probability (rate) in the population
  – p = observed probability in the sample
  – α = probability of p being different from P
    • sometimes called probability of error, pe
    • found in confidence intervals and significance tests

The idea of a confidence interval
• All observations are imprecise
  – Randomness is a fact of life
  – Our abilities are finite:
    • to measure accurately or
    • reliably classify into types
• We need to express caution in citing numbers
• Example (from Levin 2013):
  – 77.27% of uses of think in 1920s data have a literal (‘cogitate’) meaning
    Really? Not 77.28, or 77.26?
  – 77% of uses of think in 1920s data have a literal (‘cogitate’) meaning
    Sounds defensible. But how confident can we be in this number?
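One way to see why a single observed percentage deserves caution is to simulate it. The sketch below is an illustration only, not part of the original slides: it assumes a hypothetical true rate P = 0.77 and a sample size n = 66 (chosen to resemble the think example, not taken from Levin's analysis), draws repeated random samples, and reports how far the observed proportion p wanders from sample to sample. The function name `simulate_observed_rates` and the seed are arbitrary choices.

```python
import random

def simulate_observed_rates(P, n, repeats, seed=1):
    """Draw `repeats` samples of size n from a population with true rate P
    and return the observed proportion p found in each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(repeats):
        # Count how many of the n cases fall into the category of interest
        hits = sum(1 for _ in range(n) if rng.random() < P)
        rates.append(hits / n)
    return rates

# Assumed values, for illustration only
rates = simulate_observed_rates(P=0.77, n=66, repeats=1000)
mean_rate = sum(rates) / len(rates)
print(f"lowest p = {min(rates):.2f}, highest p = {max(rates):.2f}, "
      f"mean p = {mean_rate:.2f}")
```

Even with the true rate held fixed, individual samples give noticeably different values of p, which is why quoting two decimal places of a percentage suggests more precision than the data can support.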
The idea of a confidence interval
• Example (from Levin 2013):
  – 77% (66-86%*) of uses of think in 1920s data have a literal (‘cogitate’) meaning
    Finally we have a credible range of values. It needs a footnote* to explain how it was calculated.

Binomial → Normal → Wilson
• Binomial distribution
  – Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5)
  – Based on combinatorial mathematics
  – Other values of P have different expected distribution patterns
[Figure: Binomial distributions of observed p (x axis, 0.1 to 0.9) against frequency F (y axis), for P = 0.5 and for other values of P (0.3, 0.1, 0.05)]
• Binomial → Normal
  – Simplifies the Binomial distribution (tricky to calculate) to two variables:
    • mean P: P is the most likely value
    • standard deviation S: S is a measure of spread
• Normal → Wilson
  – The Normal distribution predicts observations p given a population value P
  – We want to do the opposite: predict the true population value P from an observation p
  – We need a different interval, the Wilson score interval

Binomial → Normal
• Any Normal distribution can be defined by only two variables and the Normal function z
  – population mean P
  – standard deviation $S = \sqrt{P(1 - P) / n}$
  – With more data in the experiment, S will be smaller
• 95% of the curve is within ~2 standard deviations of the expected mean
  – the correct figure is 1.95996!
  – this is the critical value of z for an error level of 0.05
• The ‘tail areas’
  – For a 95% interval, the two tails total 5% (2.5% each)
[Figure: Normal distribution of p about P, with the interval P ± z.S covering 95% of the curve and 2.5% in each tail]

The single-sample z test...
• Is an observation p more than z standard deviations from the expected (population) mean P?
[Figure: Normal distribution about P with the interval P ± z.S, 2.5% in each tail, and an observation p marked on the p axis]
• If yes, p is significantly different from P

...gives us a “confidence interval”
• The interval about p is called the Wilson score interval (w–, w+)
• This interval reflects the Normal interval about P:
  – If P is at the upper limit of p, p is at the lower limit of P (Wallis, 2013)
• The Wilson score interval (w–, w+) has a difficult formula to remember
  – $p' = \dfrac{p + z^2/2n}{1 + z^2/n}$
  – $s' = \dfrac{z\sqrt{p(1 - p)/n + z^2/4n^2}}{1 + z^2/n}$
  – $(w^-, w^+) = (p' - s', p' + s')$
• You do not need to know this formula!
• You can use the 2x2 spreadsheet!
  – www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
[Figure: observation p with its Wilson score interval (w–, w+), shown against the Normal distribution about P with 2.5% in each tail]

An example: uses of think
• Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods
  – This is the graph we created in Excel
  – Has Wilson score intervals for each point
[Figure: proportion (y axis, 0 to 1) of each use of think (‘cogitate’, ‘intend’, quotative, interpretative) in the 1920s, 1960s and 2000s, with Wilson intervals without continuity correction]
  – http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/
• Not an alternation study
  – Categories are not “choices”
  – The graph plots the probability of reading different uses of the word think (given the writer used the word)
• It is easy to spot where intervals overlap
  – A quick test for significant difference

A quick test for significant difference
• No overlap between intervals = significant
• One interval overlaps the other point = not significant (ns)
• Otherwise test fully
[Figure: two observations, each with an observed probability (p1, p2), an upper bound (w1+, w2+) and a lower bound (w1–, w2–)]
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

Test 1: Newcombe’s test
• This test is used when data is drawn from different populations (different years, groups, text categories)
  – We calculate a new Newcombe-Wilson interval (W–, W+):
    • $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$
    • $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ (Newcombe, 1998)
  – We then compare $W^- < (p_2 - p_1) < W^+$
    • a difference that falls outside this interval is significant
  – $(p_2 - p_1) < 0$ = fall
  – We only need to check the inner interval
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

Test 2: 2 x 2 chi-square
• This test is used when data is drawn from the same population of speakers (e.g. grammar -> grammar)
  – We put the data into a 2 x 2 table
    • www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

    observed      1920s   1960s   total
    ‘cogitate’       51     108     159
    other            15      73      88
    total            66     181     247
    (columns = independent variable) (Wallis, 2013)

  – The test uses the formula $\chi^2 = \sum \dfrac{(o - e)^2}{e}$
    • where $e = r \times c / n$ (row total x column total / grand total)
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

Expressing change
• Percentage difference is a very common idea:
  – “X has grown by 50%” or “Y has fallen by 10%”
  – We can calculate percentage difference by
    • d% = d / p1 where d = p2 – p1
  – We can put Wilson confidence intervals on d%
• BUT percentage difference can be very misleading
  – It depends heavily on the starting point p1 (might be 0)
  – What does it mean to say
    • something has increased by 100%?
    • it has decreased by 100%?
• It is better to simply say that
  – “the rate of ‘cogitate’ uses of think fell from 77% to 59%”
– http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

Summary
• We analyse results to help us report them
  – Graphs are extremely useful!
    • You can include graphs and tables in your essays
  – If a result is not significant, say so and move on…
    • Don’t say it is “nearly significant” or “indicative”
  – An error level of 0.05 (or 95% correct) is OK
    • Some people use 0.01 (99%) but this is not really better
• Wilson confidence intervals tell us
  – Where the true value is likely to be
  – Which differences between observations are likely to be significant
    • If intervals partially overlap, perform a more precise test

Summary
• Always say which test you used, e.g.
  – “We compared ‘cogitate’ uses of think with other uses, between the 1920s and 1960s periods, and this was significant according to χ² at the 0.05 error level.”
• Tell your reader that you have plotted (e.g.) “95% Wilson confidence intervals” in a footnote to the graph.
• For advice on deciding which test to use, see
  – http://corplingstats.wordpress.com/2012/04/11/choosing-righttest/
• The tests you will need in one spreadsheet:
  – www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

References
• Levin, M. 2013. The progressive in modern American English. In Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP.
• Newcombe, R.G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890.
• Wallis, S.A. 2013. z-squared: The origin and application of χ². Journal of Quantitative Linguistics 20: 350-378.
• Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.
• Assorted statistical tests:
  – www.ucl.ac.uk/english-usage/staff/sean/resources/2x2chisq.xls
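For readers who prefer code to a spreadsheet, the three calculations described in these slides (the Wilson score interval, the Newcombe-Wilson difference test and the 2 x 2 chi-square) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a port of the 2x2chisq.xls spreadsheet: the function names are hypothetical, no continuity correction is applied, and the chi-square critical value 3.841 (one degree of freedom, error level 0.05) is hard-coded rather than computed. The frequencies are the ‘cogitate’ figures from the 2 x 2 table in the slides.

```python
import math

Z_95 = 1.95996          # critical value of z for an error level of 0.05
CHI2_CRITICAL = 3.841   # chi-square critical value, 1 degree of freedom, 0.05

def wilson_interval(p, n, z=Z_95):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom                                 # p'
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom  # s'
    return centre - spread, centre + spread

def newcombe_wilson_test(p1, n1, p2, n2, z=Z_95):
    """Return (W-, W+, significant) for the difference d = p2 - p1."""
    w1_lo, w1_hi = wilson_interval(p1, n1, z)
    w2_lo, w2_hi = wilson_interval(p2, n2, z)
    lower = -math.sqrt((p1 - w1_lo) ** 2 + (w2_hi - p2) ** 2)  # W-
    upper = math.sqrt((w1_hi - p1) ** 2 + (p2 - w2_lo) ** 2)   # W+
    d = p2 - p1
    # Significant when d falls outside the zero-based interval (W-, W+)
    return lower, upper, not (lower < d < upper)

def chi_square_2x2(table):
    """Chi-square for observed frequencies [[a, b], [c, d]] (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e = r x c / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# 'cogitate' vs. other uses of think: 51 of 66 (1920s), 108 of 181 (1960s)
p1, n1 = 51 / 66, 66
p2, n2 = 108 / 181, 181

w_lo, w_hi = wilson_interval(p1, n1)
W_lo, W_hi, nw_significant = newcombe_wilson_test(p1, n1, p2, n2)
chi2 = chi_square_2x2([[51, 108], [15, 73]])

print(f"1920s: p = {p1:.4f}, Wilson interval = ({w_lo:.4f}, {w_hi:.4f})")
print(f"d = {p2 - p1:.4f}, Newcombe-Wilson = ({W_lo:.4f}, {W_hi:.4f}), "
      f"significant: {nw_significant}")
print(f"chi-square = {chi2:.2f}, significant: {chi2 > CHI2_CRITICAL}")
```

With these figures the 1920s interval comes out at roughly 66% to 86%, in line with the range quoted in the slides. The sketch runs both tests on the same data purely for illustration; which one is appropriate depends on the design, as described under Test 1 and Test 2.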
