Central Tendency and Variability
The two most essential features of a distribution

Questions
• Define
– Mean
– Median
– Mode
• What is the effect of distribution shape on measures of central tendency?
• When might we prefer one measure of central tendency to another?

Questions (2)
• Define
– Range
– Average Deviation
– Variance
– Standard Deviation
• When might we prefer one measure of variability to another?
• What is a z score?
• What is the point of Tchebycheff’s inequality?

Variables have distributions
• A variable is something that changes or has different values (e.g., anger).
• A distribution is a collection of measures, usually across people.
• Distributions of numbers can be summarized with numbers (called statistics or parameters).

Central Tendency refers to the Middle of the Distribution

Variability is about the Spread

1. Central Tendency: Mode, Median, & Mean
• The mode – the most frequently occurring score. Midpoint of most populous class interval. Can have bimodal and multimodal distributions.

Median
• Score that separates top 50% from bottom 50%
• Even number of scores, median is halfway between the two middle scores.
– 1 2 3 4 | 5 6 7 8
– Median is 4.5
• Odd number of scores, median is the middle number
– 1 2 3 4 5 6 7
– Median is 4

Mean
• Sum of scores divided by the number of people. Population mean is $\mu$ (mu) and sample mean is $\bar{X}$ (X-bar).
• We calculate the sample mean by: $\bar{X} = \frac{\sum X}{N}$
• We calculate the population mean by: $\mu = \frac{\sum X}{N}$

Deviation from the mean
• $x = X - \bar{X}$. Deviations sum to zero.
• Deviation score – deviation from the mean
• Raw scores: 7 8 8 9 9 9 10 10 11 (mean = 9)
• Deviation scores: -2 -1 -1 0 0 0 1 1 2

Comparison of mean, median and mode
• Mode
– Good for nominal variables
– Good if you need to know most frequent observation
– Quick and easy
• Median
– Good for “bad” distributions
– Good for distributions with arbitrary ceiling or floor

Comparison of mean, median & mode
• Mean
– Used for inference as well as description; best estimator of the parameter
– Based on all data in the distribution
– Generally preferred except for “bad” distributions. Most commonly used statistic for central tendency.

Best Guess interpretations
• Mean – average of signed error will be zero.
• Mode – will be absolutely right with greatest frequency
• Median – smallest absolute error

Expectation
• Discrete and continuous variables
• Mean is expected value either way
• Discrete: $E(X) = \sum x\,p(x)$ = mean of X
• Continuous: $E(X) = \int x\,f(x)\,dx$ = mean of X
• (The integral looks bad but just means take the average)

Influence of Distribution Shape

Review
• What is central tendency?
• Mode
• Median
• Mean

2. Variability aka Dispersion
• 4 Statistics: Range, Average Deviation, Variance, & Standard Deviation
• Range = high score minus low score.
– 12 14 14 16 16 18 20
– range = 20 - 12 = 8
• Average Deviation – mean of absolute deviations from the median: $AD = \frac{\sum |X - Md|}{N}$
Note the difference between this definition & the undergrad text: deviation from Median vs. Mean

Variance
• Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Where $\sigma^2$ means population variance, $\mu$ means population mean, and the other terms have their usual meaning.
• The variance is equal to the average squared deviation from the mean.
• To compute, take each score and subtract the mean. Square the result. Find the average over scores. Ta da! The variance.
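The three variability statistics defined so far (range, average deviation from the median, and population variance) can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides; the function names are hypothetical, and the scores are the ones from the range example.

```python
from statistics import median

def score_range(scores):
    """Range: high score minus low score."""
    return max(scores) - min(scores)

def average_deviation(scores):
    """Mean of absolute deviations from the median (the definition used here)."""
    md = median(scores)
    return sum(abs(x - md) for x in scores) / len(scores)

def population_variance(scores):
    """Average squared deviation from the mean (divides by N, not N - 1)."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores) / len(scores)

scores = [12, 14, 14, 16, 16, 18, 20]
print(score_range(scores))          # 8
print(average_deviation(scores))    # 2.0
print(population_variance(scores))
```

Dividing by N matches the population formula on the slide; a sample estimate would usually divide by N - 1 instead.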
Computing the Variance (N=5)

X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15           -10              100
10       15           -5               25
15       15           0                0
20       15           5                25
25       15           10               100
Total:   75           0                250
Mean of the squared deviations: 250 / 5 = 50. Variance is 50.

Standard Deviation
• Variance is average squared deviation from the mean.
• To return to original, unsquared units, we just take the square root of the variance. This is the standard deviation.
• Population formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Standard Deviation
• Sometimes called the root-mean-square deviation from the mean. This name says how to compute it from the inside out.
• Find the deviation (difference between the score and the mean).
• Find the deviations squared.
• Find their mean.
• Take the square root.

Computing the Standard Deviation (N=5)

X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15           -10              100
10       15           -5               25
15       15           0                0
20       15           5                25
25       15           10               100
Total:   75           0                250
Mean of the squared deviations (variance) is 50. SD is $\sqrt{50} = 7.07$.

Example: Age Distribution
[Figure: Distribution of Age – Central Tendency, Variability, and Shape. Histogram of frequency by age. Annotations: Mode = 21, Median = 23, Mean = 25.73, SD = 6.47 (average distance from mean).]

Review
• Range
• Average deviation
• Variance
• Standard Deviation

Standard or z score
• A z score indicates distance from the mean in standard deviation units. Formula: $z = \frac{X - \bar{X}}{S}$ (sample) or $z = \frac{X - \mu}{\sigma}$ (population)
• Converting to standard or z scores does not change the shape of the distribution. Z-scores are not normalized.

Tchebycheff’s Inequality (1)
• General form: $p(|X - \mu| \ge b) \le \frac{\sigma^2}{b^2}$
Suppose we know mean height in inches is 66 and SD is 4 inches. We assume nothing about the shape of the distribution of height. What is the probability of finding people taller than 74 inches? (Note that b is a deviation from the mean; in this case 74 - 66 = 8.) Also 74 inches is 2 SDs above the mean; therefore, z = 2.
$p \le \frac{4^2}{8^2} = \frac{16}{64} = .25$
The bound covers deviations of 8 inches or more in either direction, so the probability of being taller than 74 inches is at most .25.
[If we assume height is normally distributed, p is much smaller. But we will get to that later.]

Tchebycheff (2)
• Z-score form: $p\left(\frac{|X - \mu|}{\sigma} \ge k\right) \le \frac{1}{k^2}$
• Probability of a z score from any distribution being more than k SDs from the mean is at most $1/k^2$.
For the problem in the previous slide:
$p(|z| \ge 2) \le \frac{1}{k^2} = \frac{1}{2^2} = .25$
• Z-scores from the worst distributions are rarely more than 5 or less than -5.
• For symmetric, unimodal distributions, |z| is rarely more than 3: $p(|z| \ge k) \le \frac{4}{9} \cdot \frac{1}{k^2}$, so $p(|z| \ge 3) \le \frac{4}{9} \cdot \frac{1}{3^2} \approx .05$

Review
• Z-score in words
• Z-score in symbols
• Meaning of Tchebycheff’s theorem

Median House Price Data
• Find data
• Show Univariate
• Show plots
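The z score and the two Tchebycheff bounds from the slides above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and the inputs are the height example (mean 66, SD 4, cutoff 74).

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

def tchebycheff_bound(k):
    """Upper bound on p(|z| >= k) for any distribution: 1 / k^2."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / k ** 2

def symmetric_unimodal_bound(k):
    """Tighter bound quoted on the slide for symmetric, unimodal
    distributions: (4/9) * (1 / k^2)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 4.0 / (9.0 * k ** 2)

z = z_score(74, mean=66, sd=4)
print(z)                            # 2.0
print(tchebycheff_bound(z))         # 0.25
print(symmetric_unimodal_bound(3))  # about 0.049
```

Both functions return upper bounds rather than exact probabilities; for small k the bounds can exceed 1, in which case they say nothing useful.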