Measures of Dispersion
How far the data is spread out

Range
• Difference between the largest and smallest values in a data set.
• Example: 3, 5, 8, 1, -7, 2, 5
  ▫ Smallest value = -7
  ▫ Largest value = 8
  ▫ Range = 8 - (-7) = 15

Variance
• Includes all data values
• Measures the average squared distance of the values from the mean
• Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Calculating the Variance
• Example: Consider the following data set: 4, 5, 6, 6, 9
• The sample mean x̄ = 6

  x    x̄    (x - x̄)    (x - x̄)²
  4    6    -2          4
  5    6    -1          1
  6    6     0          0
  6    6     0          0
  9    6     3          9

Calculating the Variance (continued)
• The 4th column of the table, (x - x̄)², holds the squared residuals: 4, 1, 0, 0, 9
• The sum of the squared residuals = 14
• Divide the sum of the squared residuals by the number in the sample minus one.
• 14/4 = 3.5

Standard Deviation
• An important value in the study of statistics
• The square root of the variance; measures the typical distance of values from the mean, in the same units as the data
• Population: $\sigma = \sqrt{\sigma^2}$
• Sample: $s = \sqrt{s^2}$

Quantiles
• Dividing data into equal groups
  ▫ Percentiles
  ▫ Quartiles
    First quartile – Q1 = 25th percentile
    Second quartile – Q2 = 50th percentile (the median)
    Third quartile – Q3 = 75th percentile
• Interquartile range
  ▫ IQR = Q3 – Q1

Percentiles
• Commonly used measure of relative position.
• Remember the median? It is the 50th percentile.
• For any data set, the pth percentile is a value x such that p percent of the data is less than x and (100 - p) percent of the data is greater than x.

Boxplots (Box and whisker plots)
• Visual display of data
• 5-number summary
  ▫ Minimum
  ▫ Q1
  ▫ Q2
  ▫ Q3
  ▫ Maximum
• IQR
• Outliers
[Figure: example boxplot drawn on an axis from 0 to 14]

z-scores
• The z-score tells us how far a data value is from the mean, measured in standard deviations.
• This is another one of the fundamental values in statistics that we will use again later.

Summary
• Mean, median, and mode
  ▫ What information does each convey?
  ▫ Which is the most resistant to outliers?

Summary
• Range
• Variance/Standard deviation
• Z-scores (more later)
• Quantiles
  ▫ IQR
  ▫ Boxplots

Which to use?
• Data is symmetric and unimodal
  ▫ Use the mean and s.d.
• Data is skewed
  ▫ Use the median and 5-number summary
• Mode?
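The variance example from the slides (4, 5, 6, 6, 9) can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic by hand; the variable names are illustrative, and the z-score line assumes the usual definition z = (x - mean) / s, which the slides describe in words but do not write out.

```python
import math

data = [4, 5, 6, 6, 9]  # data set from the variance example

n = len(data)
mean = sum(data) / n                              # 30 / 5 = 6.0

# Squared residuals: the 4th column of the table
squared_residuals = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_residuals)                   # 4 + 1 + 0 + 0 + 9 = 14.0

sample_variance = sum_sq / (n - 1)                # 14 / 4 = 3.5
population_variance = sum_sq / n                  # 14 / 5 = 2.8
sample_sd = math.sqrt(sample_variance)            # square root of 3.5

data_range = max(data) - min(data)                # 9 - 4 = 5

# z-score: distance from the mean in units of standard deviations
# (assumed definition, using the sample standard deviation)
z_scores = [(x - mean) / sample_sd for x in data]

print("mean:", mean)
print("sample variance:", sample_variance)
print("sample s.d.:", round(sample_sd, 4))
print("range:", data_range)
print("z-scores:", [round(z, 3) for z in z_scores])
```

Python's standard `statistics` module offers `variance`, `pvariance`, `stdev`, and `pstdev` for the same quantities; the manual version is shown here so that each step lines up with a column of the table.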
Practice Exercise
Consider the following data: 4, -8, 4, -12, 8, 4, -14
• What is the range?
• What are the variance and standard deviation?
• What are the values of Q1, Q2, and Q3? What is the value of the IQR?
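After working the practice exercise by hand, one possible way to check the answers is with Python's standard `statistics` module. This sketch assumes the data is a sample (so the variance divides by n - 1) and uses the `"exclusive"` quartile method; other quartile conventions, such as `method="inclusive"`, can give different values of Q1 and Q3 for the same data.

```python
import statistics

data = [4, -8, 4, -12, 8, 4, -14]  # practice exercise data

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Sample variance and standard deviation (divide by n - 1)
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Quartiles: n=4 splits the sorted data into four groups,
# returning the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print("range:", data_range)
print("sample variance:", sample_variance)
print("sample s.d.:", sample_sd)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```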