Descriptive Statistics (Lecture 1)

Outline
• Measures of central tendency: mode, median, mean
• Measures of variability: range, variance & standard deviation

Measures of Central Tendency
• Measures of central tendency tell you what is true, on average, of a data set.
• Three such measures are used regularly: the mode, the median, and the mean.

The Mode
• The most common value(s) in a data set.
   5 6 6 6 7 7 8 9 10 10 10 11 12
• This is a bimodal distribution (it has two modes): 6 and 10, each of which occurs three times.

The Median
• The median of a data set is the value in the middle.
• That is, half of the scores in the set lie above it and half lie below it.
   5 6 6 6 7 7 8 9 10 10 10 11 12
• Here the median is 8: six scores lie below it and six lie above it.

The Mean
• The mean is the arithmetic average of a set of numbers.
• It is the most frequently used of the three measures.
• It is most useful when the distribution is not "skewed."

Note 1: Skewed distributions
• A skewed distribution is one in which the scores "pile up" at one end of the scale but are less frequent at the other end.
   4 5 7 8 11 14 (not skewed)
   2 4 8 22 300 14000 (skewed: scores are rare among the larger numbers)

Measures of Central Tendency
• We distinguish between SAMPLE and POPULATION values.
• Sample values are shown as English letters.
• Population values are shown as Greek letters.

Measures of Central Tendency
• A sample mean is the average of all values in the sample.
• Sample mean: X̄ (pronounced "X-bar")
• A population mean is the average of all values in the population.
• Population mean: μ ("mu")

Note 2: Sigma notation
• The Greek letter ∑ indicates the addition operation.
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4$$

The Mean: calculations
   4 5 7 8 11 14
• Σx ("the sum of x") = 49
• n = 6, because there are 6 observations in the data set
$$\bar{X} = \frac{\sum x}{n} = \frac{49}{6} = 8.17$$

The Mean: calculations
   4 5 7 8 11 14: Σx = 49, n = 6, X̄ = 49 / 6 = 8.17
   2 4 8 22 300 14000: Σx = 14336, n = 6, X̄ = 14336 / 6 = 2389.333

Measures of Variability
• With any sample mean, the question arises: "How useful is this number? To what extent does it describe the data set?"
• The answer depends on how variable the data set is, that is, how similar each data point is to the other data points in the set.

Measures of Variability
• The range: the distance between the highest and lowest numbers in the data set.
• The range is the simplest and least useful measure of variability.
   5 6 6 6 7 7 8 9 10 10 10 11 12
• Here, the range is 12 − 5 = 7.

Measures of Variability
• The variance measures how far, on average, the data points lie from the mean (in squared units). It is much more useful than the range.
• To compute the variance:
   1. Find the mean, X̄.
   2. Subtract X̄ from each data point Xᵢ.
   3. Square the differences and add the squared values up.
   4. Divide the total by n − 1.

The Variance
• Why do we square the differences before adding them up?
• Because if we didn't, the differences would always add up to zero: the positive and negative deviations from the mean cancel.
• Sample variance: S² ("S-squared")
• Population variance: σ² ("sigma squared")
• It is important to understand how S² differs from σ²: S² is computed from a sample, while σ² describes every value in the population.

Conceptual Formula for the Variance
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
• Why "n − 1"?
• S² is the sample variance because X̄ is the sample mean.

Note: degrees of freedom
• Suppose we have six scores, and we know that X̄ = 8.
• Suppose five of the six scores are: 3, 5, 9, 10, 12.
• What is the last score?
Note: degrees of freedom
• Sum of the five scores: 3 + 5 + 9 + 10 + 12 = 39
• X̄ = 8, so the sum of all six scores is 6 × 8 = 48.
• So the unknown score is 48 − 39 = 9.
• Given n − 1 scores and the sample mean, there is no uncertainty about what the remaining score is (that score is not free to vary).
• There are n − 1 degrees of freedom in n scores.

Note: degrees of freedom
   18 17 ? 10 8 11 14 19 12 16
• X̄ = 14.5 for the ten observations, so their sum is 10 × 14.5 = 145.
• The sum of the nine observations shown is 125.
• What is the missing observation? 145 − 125 = 20.
• Thus, there are nine degrees of freedom in the ten observations.

Computational Formula for the Variance
$$S^2 = \frac{\sum X^2 - \frac{\left(\sum X\right)^2}{n}}{n - 1}$$
• Note: these are Xs, not X-bars.
• Note: S = √S²
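The three measures of central tendency from the start of this lecture can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later (needed for `multimode`); it uses the data sets from the mode, median, and skewness slides. The median of the skewed set is printed only for contrast with its mean.

```python
from statistics import mean, median, multimode

# Data sets from the slides
scores = [5, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 12]
not_skewed = [4, 5, 7, 8, 11, 14]
skewed = [2, 4, 8, 22, 300, 14000]

modes = multimode(scores)      # every value tied for most common
mid = median(scores)           # middle value of the 13 sorted scores
x_bar_1 = mean(not_skewed)     # sum(x) / n = 49 / 6
x_bar_2 = mean(skewed)         # sum(x) / n = 14336 / 6

print("modes:", modes)
print("median:", mid)
print(f"mean (not skewed): {x_bar_1:.2f}")
print(f"mean (skewed): {x_bar_2:.3f}")
print("median (skewed):", median(skewed))
```

In the skewed set, the single value 14000 pulls the mean far above every other score, while the median stays between 8 and 22. This is why the mean is most useful when the distribution is not skewed.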
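The range and the four-step variance recipe from this lecture can be sketched directly in Python. This is an illustration, not a prescribed implementation: `sample_variance` is a hypothetical helper name that follows the conceptual formula step by step, and the result is cross-checked against `statistics.variance`. The `statistics.pvariance` call divides by N instead of n − 1. It is included only to contrast S² with σ², treating the 13 scores as if they were an entire population.

```python
import math
from statistics import pvariance, variance

# Data set from the range slide
scores = [5, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 12]


def sample_variance(xs):
    """Hypothetical helper: S^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n                            # step 1: find the mean
    deviations = [x - x_bar for x in xs]           # step 2: subtract it from each point
    squared_sum = sum(d * d for d in deviations)   # step 3: square and add up
    return squared_sum / (n - 1)                   # step 4: divide by n - 1


value_range = max(scores) - min(scores)            # highest minus lowest
s2 = sample_variance(scores)
s = math.sqrt(s2)                                  # standard deviation S

# Unsquared deviations cancel out (up to floating-point rounding).
x_bar = sum(scores) / len(scores)
deviation_total = sum(x - x_bar for x in scores)

# For contrast only: divides by N, as sigma^2 would if these
# 13 values were the whole population.
sigma2 = pvariance(scores)

print("range:", value_range)
print(f"S^2 = {s2:.4f}, S = {s:.4f}")
print("matches statistics.variance:", math.isclose(s2, variance(scores)))
print(f"sum of unsquared deviations: {deviation_total:.1e}")
print(f"divide-by-N version: {sigma2:.4f}")
```

For this data set the squared deviations sum to 784/13, so S² = 196/39 ≈ 5.03. Dividing by N gives a slightly smaller value.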
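The degrees-of-freedom argument and the computational formula can both be checked numerically. In this sketch, `missing_score`, `variance_conceptual`, and `variance_computational` are hypothetical helper names. The first recovers the one score that is not free to vary (n × X̄ minus the sum of the others). The other two compute S² from the conceptual and computational formulas, which should agree.

```python
import math


def missing_score(known, x_bar, n):
    """Hypothetical helper: the unlisted score equals n * x_bar - sum(known)."""
    return n * x_bar - sum(known)


def variance_conceptual(xs):
    """S^2 = sum((X_i - X_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """S^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1); uses raw Xs, no X-bar."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Six scores with mean 8; five are given (6 * 8 - 39 = 9).
last = missing_score([3, 5, 9, 10, 12], x_bar=8, n=6)

# Ten observations with mean 14.5; nine are given (145 - 125 = 20).
tenth = missing_score([18, 17, 10, 8, 11, 14, 19, 12, 16], x_bar=14.5, n=10)

six = [3, 5, 9, 10, 12, last]
print("last score:", last)
print("missing observation:", tenth)
print("conceptual:", variance_conceptual(six))
print("computational:", variance_computational(six))
print("formulas agree:", math.isclose(variance_conceptual(six), variance_computational(six)))
```

The computational formula is convenient by hand because it avoids computing each deviation. In floating-point arithmetic, however, it can lose precision when the values are large relative to their spread, since it subtracts two nearly equal quantities.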