Review of Chapter 4
In this chapter, we learned how to display quantitative variables.
Graphic techniques: histogram, stem-and-leaf plot, dot plot
How to describe the shape of the distribution?
Unimodal / Bimodal / Multimodal / Uniform
Symmetric / Skewed to the left / Skewed to the right
Outliers
How to describe the center of a distribution?
Midrange: (max + min)/2
Median: depends on whether the number of values is odd or even
Mean (next)

• Center of a distribution
Measure of Center #3: Mean
For convenience of discussion, we introduce some notation from now on. In Statistics, notation is part of the vocabulary.
1) Variable (its values): x (can also be y, z, etc.)
2) Number of data values: n
3) Mean of variable x: $\bar{x}$ (pronounced "x-bar")

The mean is defined by the following formula (Σ means "sum"):
$$\bar{x} = \frac{\sum x}{n}$$
The formula says to add up all the values of the variable and divide that sum by the number of data values. In daily life, we call it the average.

Example: Find the mean of the dataset {12, 34, 45, 52}.
Solution: There are 4 data values in total, so n = 4.
$$\bar{x} = \frac{12 + 34 + 45 + 52}{4} = \frac{143}{4} = 35.75$$
So the mean of the dataset is 35.75.

Interpretation of the mean
First, look at the following simple example:
Dataset 1: {4, 5, 6}   median = 5, mean = 5
Dataset 2: {4, 5, 9}   median = 5, mean = 6
Dataset 3: {1, 5, 6}   median = 5, mean = 4
Changing one extreme value moves the mean but not the median. Therefore, we can see:
1) The median is more resistant to extreme values.
2) The mean is more sensitive to extreme values.

Interpretation of the mean (continued)
The mean feels like the center because it is the point where the histogram balances.
[Figure: GPA histogram with the mean and median marked]

Relative position of the mean and median
Case 1: When the distribution is symmetric, the median coincides with the mean.
Case 2: When the distribution is skewed to the left, the mean is to the left of the median.
Case 3: When the distribution is skewed to the right, the mean is to the right of the median.
Hint: How to judge the relative position of the mean and median? Compared to the median, the mean is usually closer to the longer tail (the extreme values).

Let's try the following example together. A researcher is studying the distribution of a quantitative variable using the histogram below. On the histogram, he marked two vertical lines indicating the positions of the mean and the median, but he carelessly forgot to label them. Can you help him identify which line represents the mean and which represents the median?
[Figure: histogram with two unlabeled vertical lines]

Q: When should we use the median and when the mean as the measure of center?
Case 1: If the distribution is skewed or has outliers, we are usually better off with the median because it is resistant to extreme values.
Case 2: If the distribution is symmetric and there are no outliers, we can report the mean and median together because they do not differ much. In practice, though, people usually report the mean.
Case 3: If you are not sure, report both and discuss why they might differ.
For example, to describe the center of the distribution displayed below, which do you prefer, the mean or the median?
[Figure: histogram for discussion]
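The mean formula, the resistance comparison, and the "longer tail" hint above can all be checked with a short Python sketch. The helper names `mean` and `median` are our own (hypothetical, not from the course materials), and the two skewed datasets are made up purely for illustration.

```python
# Sketch: mean and median computed by hand, mirroring the definitions above.

def mean(values):
    """x-bar = (sum of the values) / n."""
    return sum(values) / len(values)

def median(values):
    """Middle of the sorted values; average of the two middle values when n is even."""
    xs = sorted(values)
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

# The worked example: n = 4, so x-bar = 143 / 4.
print(mean([12, 34, 45, 52]))  # 35.75

# Changing one extreme value moves the mean but leaves the median at 5.
for data in ([4, 5, 6], [4, 5, 9], [1, 5, 6]):
    print(data, "median =", median(data), "mean =", mean(data))

# Made-up skewed datasets: the mean is pulled toward the longer tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail toward large values
left_skewed = [1, 16, 17, 17, 18, 18, 18, 19]  # long tail toward small values
for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    m, med = mean(data), median(data)
    side = "right of" if m > med else "left of" if m < med else "equal to"
    print(f"{name}: mean = {m}, median = {med}, so the mean is {side} the median")
```

Python's standard `statistics` module also provides `statistics.mean` and `statistics.median`, which should agree with these hand-written versions on the inputs above.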
• Spread of a distribution
When we describe a distribution numerically, we always report a measure of its spread along with its center. There are a number of measures of spread; we are going to introduce three of them.

Measure of spread #1: Range
Range = maximum value − minimum value
Example: Find the range of the GPA data:
3.9 3.0 2.7 4.0 3.6 3.2 4.0 2.2 3.2 3.7 4.0 3.9 1.6 3.8 1.9 2.8 2.9 3.6 3.5 2.0 1.2 3.7 3.3 2.9 3.5 1.6 2.4 3.7 3.9 3.2

Measure of spread #2: The Interquartile Range (IQR)
When we studied the definition of the median, we divided the dataset into two equal-size halves:
Low | Median | High
Furthermore, let's divide the dataset into four quarters. We call these new dividing points quartiles:
Low | Q1 (lower quartile, 1st quartile) | Median (2nd quartile) | Q3 (upper quartile, 3rd quartile) | High

How to find quartiles by hand? Always start by sorting the data (from low to high).
Case 1: When n (the number of data values) is even.
For example, dataset {1, 3, 5, 7, 9, 11} (n = 6). The median is the average of the middle two values, i.e., 6.
Lower quartile (Q1): focus on the first half of the numbers, {1, 3, 5}. The median of {1, 3, 5} gives Q1 = 3.
Upper quartile (Q3): focus on the second half of the numbers, {7, 9, 11}. The median of {7, 9, 11} gives Q3 = 9.
Let's try an example right away. Find the median, Q1, and Q3 of the following dataset: {64, 43, 64, 75}

Case 2: When n (the number of data values) is odd.
For example, dataset {1, 3, 5, 7, 9, 11, 13} (n = 7). The median is the middle value, 7.
Lower quartile (Q1): focus on the numbers before the median 7, which are {1, 3, 5}. The median of {1, 3, 5} gives Q1 = 3.
Upper quartile (Q3): focus on the numbers after the median 7, which are {9, 11, 13}. The median of {9, 11, 13} gives Q3 = 11.
Remark: Some statisticians include the median in both halves, which can give slightly different quartiles.
Let's try another example right away. Find the median, Q1, and Q3 of the following dataset: {14, 43, 64, 75, 72}

Now we are ready to define the IQR:
IQR = upper quartile − lower quartile = Q3 − Q1
For example, the IQR of the dataset {1, 3, 5, 7, 9, 11, 13} is IQR = Q3 − Q1 = 11 − 3 = 8.
Comments on IQR:
• Just like the median, the IQR is resistant to extraordinarily large or small values.
• So the IQR is a good choice of measure of spread when the distribution is skewed or has outliers.

5-Number Summary
The 5-number summary is commonly used to describe a quantitative variable. It reports the median, the quartiles, and the extremes (max and min). For example, the 5-number summary for the dataset {1, 3, 5, 7, 9, 11, 13} is:
Max      13
Q3       11
Median    7
Q1        3
Min       1

Measure of spread #3: The Standard Deviation (SD)
For each value x, the deviation $x - \bar{x}$ tells us how far, and in which direction, x lies from the mean. The standard deviation, denoted by s, is defined as
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Comments on standard deviation:
• Like the mean, the standard deviation is very sensitive to extraordinarily large or small values.
• So it's a good idea to report the SD as the measure of spread when the distribution is symmetric and has no outliers.
• $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ is called the variance.
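The range formula, the by-hand quartile rule (sort, split into halves, leaving the median out of both halves when n is odd, then take the median of each half), the IQR, and the 5-number summary can be sketched together in Python. The function names are hypothetical; calculators and statistical software may use other quartile conventions (such as the one in the Remark), so their Q1 and Q3 can differ slightly from these.

```python
# Sketch: range, by-hand quartiles, IQR and the 5-number summary.
# Quartile rule: sort, split into halves (median left out when n is odd),
# then take the median of each half. Assumes at least 2 data values.

def median_sorted(xs):
    """Median of an already-sorted list."""
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(values):
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]          # first half (middle value excluded when n is odd)
    upper = xs[(n + 1) // 2 :]    # second half (middle value excluded when n is odd)
    return {"min": xs[0], "Q1": median_sorted(lower), "median": median_sorted(xs),
            "Q3": median_sorted(upper), "max": xs[-1]}

# Range of the GPA data listed above.
gpa = [3.9, 3.0, 2.7, 4.0, 3.6, 3.2, 4.0, 2.2, 3.2, 3.7,
       4.0, 3.9, 1.6, 3.8, 1.9, 2.8, 2.9, 3.6, 3.5, 2.0,
       1.2, 3.7, 3.3, 2.9, 3.5, 1.6, 2.4, 3.7, 3.9, 3.2]
data_range = max(gpa) - min(gpa)
print(len(gpa), max(gpa), min(gpa), round(data_range, 2))  # 30 4.0 1.2 2.8

# Quartiles and IQR for the even (n = 6) and odd (n = 7) examples.
for data in ([1, 3, 5, 7, 9, 11], [1, 3, 5, 7, 9, 11, 13]):
    s = five_number_summary(data)
    print(data, s, "IQR =", s["Q3"] - s["Q1"])
```

The range is rounded only for display, since floating-point subtraction of decimals like 4.0 and 1.2 may not be exact.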
Example: Find the standard deviation of the dataset {1, 2, 3, 4}.
Solution: n = 4
Step 1: Find the mean: $\bar{x} = \frac{1 + 2 + 3 + 4}{4} = 2.5$
Step 2: Fill in the following table:
x     | Deviation $x - \bar{x}$ | Squared deviation $(x - \bar{x})^2$
1     | 1 − 2.5 = −1.5          | (−1.5)² = 2.25
2     | 2 − 2.5 = −0.5          | (−0.5)² = 0.25
3     | 3 − 2.5 = 0.5           | 0.5² = 0.25
4     | 4 − 2.5 = 1.5           | 1.5² = 2.25
SUM   |                         | 5
Step 3: Add up the squared deviations: 2.25 + 0.25 + 0.25 + 2.25 = 5
Step 4: Divide by n − 1 and take the square root: $s = \sqrt{\frac{5}{4 - 1}} = \sqrt{5/3} \approx 1.29$
Q: What is the variance?

Interpretation of SD
Brainstorm: Quickly compute the standard deviation of {1, 1, 1, 1, 1, 1, 1, 1, 1, 1}.
From it, we can see:
1) The SD equals zero when all the values in a dataset are the same (i.e., no spread in value).
2) The SD is large when the values in the dataset vary a lot from each other (i.e., a large spread in value).
Therefore, in this sense, we use the SD as a measure of spread.

TI instructions
How to find n, $\bar{x}$, s, median, max, min, Q1, Q3 using a TI calculator?
Step 1: Press STAT, choose 1: Edit under the EDIT menu, and press ENTER. Enter your dataset into L1.
Step 2: Press STAT again, go to the CALC menu, choose 1: 1-Var Stats, and press ENTER.
Step 3: On the main screen, enter L1 at the flashing cursor and press ENTER. The calculator then displays every value you need.
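The four steps of the SD example translate directly into a Python sketch. The function `sample_sd` is a hypothetical helper; it divides by n − 1 as in the definition of s, which is also what Python's standard-library `statistics.stdev` does, so the two should agree. (On the TI-83/84 family, 1-Var Stats reports this n − 1 version as Sx.)

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt(sum of squared deviations / (n - 1)); needs at least 2 values."""
    n = len(values)
    x_bar = sum(values) / n                                  # Step 1: the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # Step 2: the table
    total = sum(squared_deviations)                          # Step 3: add them up
    variance = total / (n - 1)                               # s^2, the variance
    return math.sqrt(variance)                               # Step 4: square root

data = [1, 2, 3, 4]
print(sample_sd(data))         # about 1.29, i.e. sqrt(5/3)
print(statistics.stdev(data))  # standard-library cross-check
print(sample_sd([1] * 10))     # 0.0: identical values have no spread
```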
Practice: {12, 34, 63, 723, 668, 593, 832, 774, 326, 753}
Practice: #7 and #8 in Suggested Problem Set 1

Review of Chapter 4
In this chapter, we learned:
Center of a distribution: Midrange; Median; Mean (definition, properties)
Spread of a distribution: Range; IQR; SD (definition, properties)

Ch5 Understanding and comparing distributions
To understand distributions:
Draw a boxplot by hand
Read information from a boxplot
To compare distributions:
Compare them using boxplots

Term 1: Boxplot
Why a boxplot? The numerical descriptions of a distribution, e.g., median, Q1, Q3, and IQR, are useful. However, we love plots! Boxplots are perfect tools to vividly display the median, Q1, Q3, IQR, and outliers on a single plot. We will discuss:
How to make a boxplot by hand
How to read information from a boxplot

Example: Draw a boxplot for {0, 6, 7, 8, 9, 10, 11, 15}.
Preparation: We need the 5-number summary.
Making a boxplot by hand (vertical boxplot):
Draw the box: Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.
Compute the fences: Upper fence = Q3 + 1.5 IQR; Lower fence = Q1 − 1.5 IQR. Caution: Don't draw the upper and lower fences on the boxplot!
Draw the whiskers: Draw lines from the ends of the box up and down to the most extreme data values found within the fences.
Draw the outliers: Mark any data values outside the fences with a special symbol (e.g., *).
Remark: Sometimes people prefer to construct a horizontal boxplot.

Interpretation of the boxplot
[Figure: annotated vertical boxplot. From top to bottom: upper whisker end (maximum), Q3, median, Q1, lower whisker end, and an outlier (minimum). Each of the four segments between min, Q1, median, Q3, and max holds about 25% of the data; the box spans the IQR, and the whole plot spans the range.]
Note: Whatever the shape of the boxplot, the maximum value is always at the top of the plot and the minimum value is always at the bottom (possibly as an outlier rather than the end of a whisker).
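The boxplot recipe above (5-number summary, fences at Q1 − 1.5 IQR and Q3 + 1.5 IQR, whiskers to the most extreme values inside the fences) can be sketched in Python. This assumes the by-hand quartile rule from Chapter 4 (median left out of both halves when n is odd); `boxplot_parts` is a hypothetical name, and plotting libraries may compute quartiles slightly differently. For the example data the fences are 0.5 and 16.5, so 0 is flagged as the only outlier.

```python
# Sketch: the numbers needed to draw a boxplot by hand.

def median_sorted(xs):
    """Median of an already-sorted list."""
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def boxplot_parts(values):
    xs = sorted(values)
    n = len(xs)
    q1 = median_sorted(xs[: n // 2])          # median of the lower half
    q3 = median_sorted(xs[(n + 1) // 2 :])    # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr              # computed, but not drawn
    upper_fence = q3 + 1.5 * iqr              # computed, but not drawn
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "Q1": q1, "median": median_sorted(xs), "Q3": q3, "IQR": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],  # whisker ends
        "outliers": outliers,                                  # marked with *
    }

print(boxplot_parts([0, 6, 7, 8, 9, 10, 11, 15]))
```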