populations vs. samples
• we want to describe both samples and populations
• the latter is a matter of inference…

“outliers”
• minority cases, so different from the majority that they merit separate consideration
  – are they errors?
  – are they indicative of a different pattern?
• think about possible outliers with care, but beware of mechanical treatments…
• significance of outliers depends on your research interests

summaries of distributions
• graphic vs. numeric
  – graphic may be better for visualization
  – numeric are better for statistical/inferential purposes
• resistance to outliers is usually an advantage in either case

general characteristics
• kurtosis [“peakedness”]
  [figure: two density curves, ‘leptokurtic’ and ‘platykurtic’]
• skew (skewness)
  [figure: two distributions, right (positive) skew and left (negative) skew]

central tendency
• measures of central tendency
  – provide a sense of the value expressed by multiple cases, over all…
• mean
• median
• mode

mean
• center of gravity
• evenly partitions the sum of all measurements among all cases; average of all measures

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

mean – pro and con
• crucial for inferential statistics
• mean is not very resistant to outliers
• a “trimmed mean” may be better for descriptive purposes

mean
rim diameter (cm)

            unit 1    unit 2
             12.6      16.2
             11.6      16.4
             16.3      13.8
             13.1      13.2
             12.1      11.3
             26.9      14.0
              9.7       9.0
             11.5      12.5
             14.8      15.6
             13.5      11.2
             12.4      12.2
             13.6      15.5
                       11.7
n            12        13
total       168.1     172.6
total/n      14.0      13.3

R: mean(x)

[figure: back-to-back stem-and-leaf plot of rim diameters, unit 1 vs. unit 2, with the means marked at 14.0 (unit 1) and 13.3 (unit 2)]

trimmed mean
rim diameter (cm), values sorted; the lowest and highest value in each unit are dropped before averaging

            unit 1    unit 2
              9.7       9.0     (dropped)
             11.5      11.2
             11.6      11.3
             12.1      11.7
             12.4      12.2
             12.6      12.5
             13.1      13.2
             13.5      13.8
             13.6      14.0
             14.8      15.5
             16.3      15.6
             26.9      16.2     (unit 1: 26.9 dropped)
                       16.4     (dropped)
n            10        11
total       131.5     147.2
total/n      13.2      13.4

R: mean(x, trim=.1)

[figure: back-to-back stem-and-leaf plot of rim diameters, unit 1 vs. unit 2, with the trimmed means marked at 13.2 (unit 1) and 13.4 (unit 2)]
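The mean and trimmed-mean figures for the rim diameters can be checked with a short script. This is only a Python sketch, not the R calls named on the slides; the names `unit1`, `unit2`, `mean`, and `trimmed_mean` are made up for illustration. The trimming rule (drop floor(n × trim) values from each end) is assumed to match what R's mean(x, trim=.1) does, and it reproduces the n and total rows of the trimmed-mean table.

```python
import math

# Rim diameters (cm) as listed in the tables for unit 1 and unit 2.
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def mean(xs):
    """Arithmetic mean: the sum of all measurements divided by the number of cases."""
    return sum(xs) / len(xs)


def trimmed_mean(xs, trim=0.1):
    """Mean after dropping floor(n * trim) values from each end of the sorted data.

    Assumes 0 <= trim < 0.5, so at least one value always remains.
    """
    k = math.floor(len(xs) * trim)
    kept = sorted(xs)[k:len(xs) - k]
    return sum(kept) / len(kept)


for name, xs in (("unit 1", unit1), ("unit 2", unit2)):
    print(f"{name}: n={len(xs)}  mean={mean(xs):.2f}  "
          f"trimmed mean={trimmed_mean(xs):.2f}")
```

With trim=0.1, one value is dropped from each end of both units, so the single large value in unit 1 (26.9) no longer pulls its average above that of unit 2.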
median
• 50th percentile…
• less useful for inferential purposes
• more resistant to effects of outliers…

median
rim diameter (cm), values sorted

            unit 1    unit 2
              9.7       9.0
             11.5      11.2
             11.6      11.3
             12.1      11.7
             12.4      12.2
             12.6      12.5
                       13.2
             13.1      13.8
             13.5      14.0
             13.6      15.5
             14.8      15.6
             16.3      16.2
             26.9      16.4
median       12.85     13.20

(unit 1: midpoint of 12.6 and 13.1; unit 2: the middle value, 13.2)

[figure: back-to-back stem-and-leaf plot of rim diameters, unit 1 vs. unit 2, with the medians marked at 12.85 (unit 1) and 13.20 (unit 2)]

mode
• the most numerous category
• for ratio data, often implies that data have been grouped in some way
• can be more or less created by the grouping procedure
• for theoretical distributions: simply the location of the peak on the frequency distribution

[figure: bar chart of site categories (regional centers, villages, hamlets, isolated scatters); modal class = ‘hamlets’]

dispersion
• measures of dispersion
  – summarize degree of clustering of cases, esp. with respect to central tendency…
• range
• variance
• standard deviation

range
rim diameter (cm), values sorted

            unit 1    unit 2
              9.7       9.0
             11.5      11.2
             11.6      11.3
             12.1      11.7
             12.4      12.2
             12.6      12.5
             13.1      13.2
             13.5      13.8
             13.6      14.0
             14.8      15.5
             16.3      15.6
             26.9      16.2
                       16.4

R: range(x)

[figure: back-to-back stem-and-leaf plot of rim diameters, unit 1 vs. unit 2, with the range of each unit marked]

• would be better to use midspread…

variance
R: var(x)
• analogous to average deviation of cases from mean
• in fact, based on sum of squared deviations from the mean (“sum-of-squares”)

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

variance
• computational form:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$

• note: units of variance are squared…
• this makes variance hard to interpret
• ex.: projectile point sample: mean = 22.6 mm; variance = 38 mm²
• what does this mean???
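The median, range, and both forms of the variance formula can be tried on the rim-diameter data. This is a hedged Python sketch using the standard `statistics` module rather than the R calls named on the slides; `sample_variance` and `sample_variance_computational` are illustrative names, not anything from the source.

```python
import statistics

# Rim diameters (cm) for unit 1 and unit 2, as in the tables.
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def sample_variance(xs):
    """Definitional form: sum of squared deviations from the mean, over n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_variance_computational(xs):
    """Computational form: (sum of x^2 minus (sum of x)^2 / n), over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


for name, xs in (("unit 1", unit1), ("unit 2", unit2)):
    print(f"{name}: median={statistics.median(xs):.2f}  "
          f"range=({min(xs)}, {max(xs)})  "
          f"variance={sample_variance(xs):.2f} cm^2")
```

Both variance functions should agree up to floating-point rounding, and both divide by n − 1 as in the formulas above. The result is in squared centimetres, which is the interpretive problem the slide points out.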
standard deviation
• square root of variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$

standard deviation
• units are in same units as base measurements
• ex.: projectile point sample: mean = 22.6 mm; standard deviation = 6.2 mm
• mean +/- sd (16.4 to 28.8 mm)
  – should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
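The mean +/- sd interval is easy to compute directly. The projectile point measurements themselves are not given, so this Python sketch only reproduces the arithmetic of that example and then applies the same idea to the unit 1 rim diameters. The helper name `mean_sd_interval` is hypothetical, and the standard `statistics` module stands in for the R workflow.

```python
import statistics


def mean_sd_interval(mean, sd):
    """Return the (low, high) bounds of mean - sd and mean + sd."""
    return mean - sd, mean + sd


# Projectile point example: only the summary values are known.
low, high = mean_sd_interval(22.6, 6.2)
print(f"projectile points: {low:.1f} to {high:.1f} mm")

# Same idea applied to the unit 1 rim diameters (cm).
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
m = statistics.mean(unit1)
s = statistics.stdev(unit1)  # sample standard deviation, n - 1 denominator
lo, hi = mean_sd_interval(m, s)
inside = [x for x in unit1 if lo <= x <= hi]
print(f"unit 1: mean={m:.2f} cm  sd={s:.2f} cm  "
      f"{len(inside)} of {len(unit1)} cases within one sd of the mean")
```

Because the standard deviation is in the same units as the measurements, the interval can be read straight off the measurement scale. A single extreme case such as 26.9 still inflates both the mean and the sd.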