Welcome to Week 04 Tues
MAT135 Statistics
http://media.dcnews.ro/image/201109/w670/statistics.jpg

Review: Descriptive Statistics
Descriptive statistics
– describe our sample
– we’ll use this to make inferences about the population

Descriptive Statistics
– graphs
– n
– max, min
– each observation
– frequencies
– mean, median, mode
– range, variance, standard deviation, quartiles, IQR

Statistics vs Parameters
Statistic   Parameter
n           N
x̄           μ
s²          σ²
s           σ

Questions?

Exploring Data
We are using the descriptive statistics to summarize our sample (and, hopefully, our population) in just a few numbers.

The “five-number summary” is:
– the min
– Q1
– the median
– Q3
– the max

We know how to get all of these using our calculators!

Boxplots
There is a graph statisticians use to show this summary: the boxplot.

The boxplot (a.k.a. box-and-whisker diagram) is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum.

BOXPLOTS IN-CLASS PROBLEM
Daily high temperatures Feb 2008 for Fairbanks, Alaska:
14, 12, 17, 25, 10, -1, -8, -15, -7, 0, 5, 14, 18, 14, 16, 8, -15, -13, -17, -12, 0, 1, 9, 12, 14, 7, 6, 8
Create a boxplot.

BOXPLOTS IN-CLASS PROBLEM 1
What do we need for a boxplot?

BOXPLOTS IN-CLASS PROBLEM 2
Daily high temperatures Feb 2008 for Fairbanks, Alaska (same data as above).
Find the 5-number summary:
Min = -17
Q1 = -4
Median = 7.5
Q3 = 14
Max = 25
Notice they’re all in order at the bottom of your list! YAY!

BOXPLOTS IN-CLASS PROBLEM 3
Min = -17, Q1 = -4, Median = 7.5, Q3 = 14, Max = 25
Now for the box! Start with a number line:
-24 -20 -16 -12 -8 -4 0 4 8 12 16 20 24
Min! Mark the min, -17, on the number line.
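
If you want to check the five-number summary from Problem 2 without a calculator, here is a minimal Python sketch. Quartile rules differ between tools, so this assumes the "median of each half" rule, which reproduces the Q1 and Q3 shown on the slides. The function name `five_number_summary` is made up for this example.

```python
from statistics import median

# Daily high temperatures, Feb 2008, Fairbanks, Alaska (from the in-class problem).
temps = [14, 12, 17, 25, 10, -1, -8, -15, -7, 0, 5, 14, 18, 14,
         16, 8, -15, -13, -17, -12, 0, 1, 9, 12, 14, 7, 6, 8]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: Q1 is the median of the lower half of the sorted data and
    Q3 is the median of the upper half. With an odd count, the middle
    value belongs to neither half.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # values below the median position
    upper = data[len(data) - half:]  # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary(temps))
```

Other software may report slightly different quartiles for the same data because it interpolates between values instead of splitting the list in half.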
BOXPLOTS IN-CLASS PROBLEM 3
Min = -17, Q1 = -4, Median = 7.5, Q3 = 14, Max = 25
On the same number line (-24 to 24), add in turn:
– Q1!
– Median!
– Q3!
– Max!
– Box!
– Whiskers!

Questions?

Outliers
Because the min and max may be outliers, a variation on the boxplot includes “fences” to show where most of the data occurs.

Lower fence: Q1 - 1.5 * IQR
Upper fence: Q3 + 1.5 * IQR

OUTLIERS IN-CLASS PROBLEM 4
Min = -17, Q1 = -4, Median = 7.5, Q3 = 14, Max = 25
What is the IQR?
IQR = 14 - (-4) = 18

OUTLIERS IN-CLASS PROBLEM 5
What is the lower fence?
Lower fence = Q1 - 1.5 * IQR = -4 - 1.5(18) = -31

OUTLIERS IN-CLASS PROBLEM 6
What is the upper fence?
Upper fence = Q3 + 1.5 * IQR = 14 + 1.5(18) = 41

So, do we have any outliers?
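
The fence arithmetic above can be sketched in a few lines of Python. The helper names `fences` and `outliers` are hypothetical, and the quartiles are taken from the slides, not recomputed.

```python
def fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for quartiles q1 and q3."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


def outliers(values, lower, upper):
    """Return the values that fall outside the fences."""
    return [v for v in values if v < lower or v > upper]


# Daily high temperatures, Feb 2008, Fairbanks, Alaska (from the in-class problem).
temps = [14, 12, 17, 25, 10, -1, -8, -15, -7, 0, 5, 14, 18, 14,
         16, 8, -15, -13, -17, -12, 0, 1, 9, 12, 14, 7, 6, 8]

# Q1 = -4 and Q3 = 14 come from the five-number summary on the slides.
iqr, lower, upper = fences(q1=-4, q3=14)
print("IQR:", iqr, "lower fence:", lower, "upper fence:", upper)
print("outliers:", outliers(temps, lower, upper))
```

An empty list means every observation sits inside the fences.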
OUTLIERS IN-CLASS PROBLEM 7
Min = -17, Q1 = -4, Median = 7.5, Q3 = 14, Max = 25
IQR = 18, lower fence = -31, upper fence = 41
Max and Min are inside the fences!

Outliers
How outliers are shown in a boxplot

Types of Boxplots

Questions?

Boxplots
Boxplots are typically used to compare different groups.

Data Summary Table from a Ball-bouncing Experiment
          Super Ball   Wiffle Ball   Golf Ball   Splash Ball   Spongy Ball
Minimum   66           38            70          7             44
Q1        71           45            75          14            58
Median    76           48            78          16.5          60
Q3        78           50            80          23            62
Maximum   91           58            90          28            67

BOXPLOTS IN-CLASS PROBLEM 8
What differences do you see?

Boxplots
Unfortunately it is almost impossible to get a true boxplot using Excel (there are several YouTube videos showing how to get one… but they are all wrong…)

Questions?

Exploring Data
There actually IS a useful graph you can get out of Excel that includes both an average and a measure of dispersion.
I use the Hi/Low/Close graph.

BOXPLOTS IN-CLASS PROBLEM 9
What does this graph show?

BOXPLOTS IN-CLASS PROBLEM 10
What does this graph show?

Questions?

Normal Probability
The most popular continuous graph in statistics is the NORMAL DISTRIBUTION.

Empirical Rule
Two descriptive statistics completely define the shape of a normal distribution:
– Mean μ
– Standard deviation σ

Suppose we have a normal distribution with μ = 12 and σ = 2.
The curve is centered at μ = 12. With σ = 2, the axis is marked one standard deviation apart:
6 8 10 12 14 16 18

More sneaky stuff about the normal distribution:
So now you can calculate even more percentages!

EMPIRICAL RULE IN-CLASS PROBLEM 11
What % of the data is between the mean and +1 SD?
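
As a numerical check on the empirical rule, here is a small sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). It assumes the rule meant on these slides is the usual 68-95-99.7 rule and reuses the μ = 12, σ = 2 example. The helper name `pct_within` is made up for illustration.

```python
from statistics import NormalDist

mu, sigma = 12, 2             # the example distribution from the slides
dist = NormalDist(mu, sigma)


def pct_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * (dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma))


for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"within {k} SD ({low} to {high}): {pct_within(k):.1f}%")
```

The three totals come out at roughly 68%, 95%, and 99.7%. Because the curve is symmetric, halving a total or subtracting one total from another gives the percentage for any single band.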
EMPIRICAL RULE IN-CLASS PROBLEM 12
What % is between the mean and -1 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 13
What % of the data is between +1 SD and +2 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 14
What % is between -1 SD and -2 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 15
What % of the data is between +2 SD and +3 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 16
What % is between -2 SD and -3 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 17
What % of the data is above +3 SD?

EMPIRICAL RULE IN-CLASS PROBLEM 18
What % of the data is below -3 SD?

Questions?

z-scores
For the standard normal distribution, μ = 0 and σ = 1, so the axis reads:
-3 -2 -1 0 1 2 3
The standard normal is also called “z”.
z = (x - μ)/σ

EMPIRICAL RULE IN-CLASS PROBLEM 19
A dataset has a normal distribution with μ = 45 and σ = 13.
Find the z-score for a value of 65.

z-scores
With a bit of algebra, we can use z = (x - μ)/σ to solve for x given a z-score:
x = μ + zσ

EMPIRICAL RULE IN-CLASS PROBLEM 20
A data point from a normal distribution with μ = 45 and σ = 13 has a z-score of 2.3.
What is the data value?

In-class Project
Turn in your homework!
Don’t forget your homework due next class!
See you Thursday!
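
For checking Problems 19 and 20 after class, here is a minimal sketch of the two z-score conversions. The function names `z_score` and `value_from_z` are made up for this example; the formulas are z = (x - μ)/σ and its rearrangement x = μ + zσ.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def value_from_z(z, mu, sigma):
    """Data value that sits z standard deviations from the mean."""
    return mu + z * sigma


# Problem 19: mu = 45, sigma = 13, x = 65
print(round(z_score(65, 45, 13), 2))

# Problem 20: mu = 45, sigma = 13, z = 2.3
print(round(value_from_z(2.3, 45, 13), 1))
```

Converting a value to a z-score and back should return the value you started with, which makes a handy self-check.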