Univariate Descriptive Statistics
Heibatollah Baghi and Mastee Badii
George Mason University

Objectives
• Define measures of central tendency and dispersion.
• Select the appropriate measures to use for a particular dataset.

How to Summarize Data?
• Graphs may be useful, but the information they offer is often inexact.
• A frequency distribution provides many details, but often we want to condense a distribution further.

Two Characteristics of Distributions
1. Measures of Central Tendency.
2. Measures of Variability or Scatter.

Measures of Central Tendency: Mean
• The mean describes the center or the balance point of a frequency distribution.
• The sample mean: $\bar{X} = \frac{\sum X}{n}$
• Calculate the mean value for the following data: 23, 23, 24, 25, 25, 25, 26, 26, 27, 28.
• 25.2

Measures of Central Tendency: Mode
• The most frequent value or category in a distribution.
• Calculate the mode for the following set of values: 20, 21, 21, 22, 22, 22, 22, 23, 23, 24.
• 22

Measures of Central Tendency: Median
• The middle value of a set of ordered numbers.
• Calculate for an even number of cases: 21, 22, 22, 23, 24, 26, 26, 27, 28, 29.
• 25 (the average of the two middle values, 24 and 26)
• Calculate for an odd number of cases with no duplicates at the center: 22, 23, 23, 24, 25, 26, 27, 27, 28.
• 25
• The median changes when the values at the center repeat.
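The three worked examples above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the slides.

```python
import statistics

# Data copied from the Mean, Mode, and Median slides.
mean_data = [23, 23, 24, 25, 25, 25, 26, 26, 27, 28]
mode_data = [20, 21, 21, 22, 22, 22, 22, 23, 23, 24]
median_even_data = [21, 22, 22, 23, 24, 26, 26, 27, 28, 29]
median_odd_data = [22, 23, 23, 24, 25, 26, 27, 27, 28]

# Mean: sum of the values divided by the number of values.
sample_mean = statistics.mean(mean_data)

# Mode: the most frequent value.
sample_mode = statistics.mode(mode_data)

# Median: the middle value of the ordered data, or the average of
# the two middle values when the number of cases is even.
median_even = statistics.median(median_even_data)
median_odd = statistics.median(median_odd_data)

print("mean:", sample_mean)
print("mode:", sample_mode)
print("median (even n):", median_even)
print("median (odd n):", median_odd)
```

The results should agree with the slide answers: 25.2 for the mean, 22 for the mode, and 25 for both medians.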

Comparison of Measures of Central Tendency
• Mode: most frequently occurring value. Used with nominal, ordinal, and (sometimes) interval/ratio-level data.
• Median: exact center of rank-ordered data (when N is odd) or average of the two middle values (when N is even). Used with ordinal-level data and interval/ratio-level data (particularly when skewed).
• Mean: arithmetic average (sum of Xs / N). Used with interval/ratio-level data.

Comparison of Measures of Central Tendency in Normal Distribution
• Mean, median and mode are the same
• Shape is symmetric

Comparison of Measures of Central Tendency in Bimodal Distribution
• Mean & median are the same
• Two modes different from mean and median

Comparison of Measures of Central Tendency in Negatively Skewed Distributions
• Mean, median & mode are different
• Mode > Median > Mean
• Outliers pull the mean away from the median

Comparison of Measures of Central Tendency in Positively Skewed Distributions
• Mean, median & mode are different
• Mean > Median > Mode
• Outliers pull the mean away from the median

Comparison of Measures of Central Tendency in Uniform Distribution
• Mean, median & mode are the same point

Comparison of Measures of Central Tendency in J-shape Distribution
• Mode to extreme right
• Mean to the right of median

Measures of Variability or Scatter
• Reporting only an average without an accompanying measure of variability may misrepresent a set of data.
• Two datasets can have the same average but very different variability.
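The ordering of mean, median, and mode under skew, and the point that equal averages can hide different variability, can be illustrated with a small sketch. All four datasets below are made up for illustration and do not come from the slides; `summarize` is a hypothetical helper.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Made-up data with a long right tail (positively skewed).
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image of the same data: long left tail (negatively skewed).
negatively_skewed = [20 - x for x in positively_skewed]

pos_mean, pos_median, pos_mode = summarize(positively_skewed)
neg_mean, neg_median, neg_mode = summarize(negatively_skewed)

print("positive skew:", pos_mean, pos_median, pos_mode)  # mean > median > mode
print("negative skew:", neg_mean, neg_median, neg_mode)  # mode > median > mean

# Two made-up datasets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]
print("means:", statistics.mean(tight), statistics.mean(wide))
print("ranges:", max(tight) - min(tight), max(wide) - min(wide))
```

In the positively skewed list the single large value pulls the mean above the median, while the mode stays at the peak of the distribution; mirroring the data reverses the ordering.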

Measures of Variability or Scatter: Range
• The difference between the highest and lowest score
• Easy to calculate
• Highly unstable
• Calculate the range for the data: 110, 120, 130, 140, 150, 160, 170, 180, 190
• 190 – 110 = 80

Measures of Variability or Scatter: Semi Inter-quartile Range
• Half of the difference between the third quartile (75th percentile, Q3) and the first quartile (25th percentile, Q1)
• SQR = (Q3 – Q1) / 2
• More stable than the range

Measures of Variability: Sample Variance
• The sum of squared differences between observations and their mean, $SS = \sum (X - M)^2$, divided by n – 1.
• Sample variance: the standard deviation squared
• Formula for sample variance: $s^2 = \frac{SS}{n - 1}$

Measures of Variability or Scatter: Standard Deviation
• The square root of the variance: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
• Calculate the standard deviation for the data: 110, 120, 130, 140, 150, 160, 170, 180, 190.

Calculating Standard Deviation
• Sample sum of squares: $SS = \sum X^2 - \frac{(\sum X)^2}{n}$
• Sample variance: $s^2 = \frac{SS}{n - 1}$
• Sample standard deviation: $s = \sqrt{\frac{SS}{n - 1}}$
• SS is the key to many statistics

Calculating Standard Deviation
(150 appears twice in this table, so n = 10 and N – 1 = 9.)

Data     X - M    (X - M)^2
110      -40      1600
120      -30       900
130      -20       400
140      -10       100
150        0         0
150        0         0
160       10       100
170       20       400
180       30       900
190       40      1600
Total      0      6000 (SS)

N – 1 = 9
Sample Variance = 667
Standard Deviation = 25.8
• SS is the key to many statistics

Formula Variations
• Sum of squares. Calculating formula: $SS = \sum X^2 - \frac{(\sum X)^2}{n}$. Defining formula: $SS = \sum (X_i - \bar{X})^2$.
• Variance. Calculating formula: $s^2 = \frac{SS}{n - 1}$. Defining formula: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$.
• Standard deviation. Calculating formula: $s = \sqrt{\frac{SS}{n - 1}}$. Defining formula: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$.

Comparison of Measures of Variability and Scatter
• In a normal distribution, the range is approximately 6 standard deviations
• The standard deviation partitions the data in a normal distribution

Standardized Scores: Z Scores
• The mean and standard deviation are used to compute standard scores: $Z = \frac{X - M}{s}$
• Calculate the Z score for a blood pressure of 140 if the sample mean is 110 and the standard deviation is 10
• Z = (140 – 110) / 10 = 3

Value of Z Scores
• Allows comparison of the observed distribution to the expected distribution
Histogram
[Figure: Expected and Observed frequencies by bin; x-axis: Bin (-1.71, -0.95, -0.18, 0.59, 1.36, More); y-axis: Frequency (0 to 12)]

Take Home Lesson
• Measures of central tendency and variability can describe the distribution of data
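
The sum of squares, sample variance, standard deviation, and Z score calculations from the earlier slides can be reproduced with a short sketch. It assumes the ten values listed in the standard deviation table (150 appears twice); the function and variable names are illustrative and are not part of the slides.

```python
import math
import statistics

# The ten values from the "Calculating Standard Deviation" table.
data = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]
n = len(data)
mean = sum(data) / n

# Defining formula: sum of squared deviations from the mean.
ss_defining = sum((x - mean) ** 2 for x in data)

# Calculating formula: sum of X^2 minus (sum of X)^2 / n.
ss_calculating = sum(x * x for x in data) - sum(data) ** 2 / n

# Sample variance and standard deviation use n - 1 in the denominator.
variance = ss_defining / (n - 1)
std_dev = math.sqrt(variance)

def z_score(x, m, s):
    """Standard score: distance of x from the mean m in units of s."""
    return (x - m) / s

print("SS (defining):", ss_defining)
print("SS (calculating):", ss_calculating)
print("variance:", round(variance))
print("standard deviation:", round(std_dev, 1))
print("library check:", statistics.variance(data), statistics.stdev(data))
print("Z for 140 with mean 110, SD 10:", z_score(140, 110, 10))
```

Both formulas for SS give 6000, so they can be used interchangeably; the calculating formula avoids computing each deviation by hand, which is why the slides present it for manual work.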