Section 3.1 – Measures of Central Tendency

A measure of center is a value at the center (or middle) of a data set.

Notation:
x = variable representing data values
n = total number of values in the sample
∑ = sum, or "add up"
N = total number of values in the population (difficult to know)

Arithmetic Mean – Add the data values and divide by the total number of values.

Sample mean ("x-bar", a statistic): $\bar{x} = \frac{\sum x}{n}$
Population mean (μ, pronounced "mew", a parameter): $\mu = \frac{\sum x}{N}$

Median – The value that lies in the middle of the data set when arranged from smallest to largest.

If you have an ODD number of data points, the median is the middle data point, at position $\frac{n+1}{2}$.
If you have an EVEN number of data points, average the middle two data points, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 9 10

Mode – The data point that occurs most often. A data set can be bimodal, multimodal, or have no mode. (You can find the mode of qualitative data, too!)

Example: The following data represent the miles per gallon of 6 randomly selected 2013 Ford Fusion vehicles. Calculate the mean, median, and mode miles per gallon.

34.0  33.2  37.0  29.4  23.6  25.9

Finding Mean and Median on the calculator:
1. Enter the data into a list (L1).
2. Select STAT.
3. Arrow over to the menu labeled "CALC".
4. Highlight "1-VAR STATS" and push enter.
5. Type the name of the list (L1, L2, etc.).
6. Push enter and scroll down.
Note: The default list for 1-VarStats is L1.

Describing the Shape of a Distribution Based on Mean and Median

When data are skewed left or right, there are extreme values in the tail that pull the mean in the direction of the tail. (pg. 123)

Mean < Median        Mean = Median        Mean > Median

Example: Match the histograms shown to the summary statistics. (#18 pg. 127)

Data Set    I     II    III   IV
Mean        42    31    31    31
Median      42    36    26    32

Resistance

A numerical summary of data is resistant if extreme values (very large or small) do not affect its value substantially.
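The mean, median, and mode of the miles-per-gallon example, and the idea of resistance, can be explored with a short Python sketch. It uses the standard-library `statistics` module instead of the calculator steps in the handout. The extreme value 370.0 is made up purely for illustration and is not part of the original data.

```python
from statistics import mean, median, multimode

# Miles-per-gallon data from the Ford Fusion example above.
mpg = [34.0, 33.2, 37.0, 29.4, 23.6, 25.9]

sample_mean = mean(mpg)      # sum of the values divided by n
sample_median = median(mpg)  # n is even, so the two middle values are averaged
modes = multimode(mpg)       # returns every value tied for the highest count

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
# If every distinct value is tied for "most frequent", the data have no mode.
print("no mode" if len(modes) == len(set(mpg)) else f"mode(s) = {modes}")

# Hypothetical: swap the largest value for an extreme one (370.0 is made up).
mpg_extreme = [370.0 if x == 37.0 else x for x in mpg]
print(f"mean with extreme value   = {mean(mpg_extreme):.2f}")
print(f"median with extreme value = {median(mpg_extreme):.2f}")
```

Comparing the two pairs of printed values shows how each summary reacts when a single extreme observation is introduced.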
Which of the following is considered resistant: Mean or Median?

Section 3.2 – Measures of Dispersion

Dispersion is the degree to which data points are spread out.

Range = highest value - lowest value

Variance is based on how the data points deviate from the mean.

Variance of a sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard deviation of a sample: $s = \sqrt{s^2}$

Notation:

              Standard Deviation    Variance
Sample
Population

Properties and Interpretations of Standard Deviation:
1. The standard deviation is a measure of variation of all values from the mean. We say the standard deviation is the "typical deviation from the mean". The larger the standard deviation, the more dispersion the distribution has.
2. The value of the standard deviation is either zero or positive. (Zero means that all of the data points are the same number.)
3. Outliers cause the standard deviation to increase dramatically.
4. The units of the standard deviation are the same as the units of the original data points.

Example: The following data represent the miles per gallon of 6 randomly selected 2013 Ford Fusion vehicles. Calculate the range, variance, and standard deviation of the miles per gallon.

34.0  33.2  37.0  29.4  23.6  25.9

Example: The following data represent exam scores in a statistics class taught using traditional lecture and in a statistics class taught using a "flipped" classroom. Use your calculator (1-VarStats) to find the standard deviation for each sample. Which class has more dispersion in the exam scores?

Traditional    Flipped
70.8           76.4
69.1           71.6
79.4           63.4
67.6           72.4
85.3           77.9
78.2           91.8
56.2           78.9
81.3           76.8
80.9           82.1
71.5           70.2

Empirical Rule for Bell-shaped data: (pg. 139)
Approximately 68% of the data will lie within 1 standard deviation of the mean.
Approximately 95% of the data will lie within 2 standard deviations of the mean.
Approximately 99.7% of the data will lie within 3 standard deviations of the mean.

Example: IQ scores of normal adults have a bell-shaped distribution with a mean of 100 and a standard deviation of 15.
a) Approximately what percentage of adults have IQ scores between 55 and 145?
b) Approximately what percentage of adults have IQ scores greater than 145?
c) Approximately what percentage of adults have IQ scores between 85 and 130?

Example: SAT math scores have a bell-shaped distribution with a mean of 515 and a standard deviation of 114.
a) Approximately what percentage of math scores are between 401 and 629?
b) Approximately what percentage of math scores are lower than 401?

Section 3.3 – Measures of Central Tendency – Grouped Data

Sometimes data points are not equally weighted. Some data points might have a higher importance, or more weight, than others. The weighted mean of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing by the sum of the weights.

Weighted Average: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{\text{sum of each value times its weight}}{\text{sum of all weights}}$

Example: Marissa just completed her first semester in college. She earned an A in her 3-hour statistics class, a B in her 3-hour sociology course, a C in her 6-hour biology class, and an A in her 1-hour PE class. Calculate Marissa's GPA.

Course       Grade (points)    Weight (hours)
Stats
Sociology
Biology
PE

Example: In Marissa's statistics course, attendance counts for 5% of her grade, quizzes count 10%, exams count 60%, and the final exam counts 25%. Her grades are as follows. Calculate Marissa's course average.
Attendance: 100%
Quizzes: 93%
Exams: 86%
Final Exam: 85%

Course        Grade    Weight (% as a decimal)
Attendance
Quizzes
Exams
Final Exam

Section 3.4 – Measures of Position and Outliers

Z-score – The distance that a data value is from the mean in terms of the number of standard deviations.
Negative z-scores indicate that the data point is below the mean.
Positive z-scores indicate that the data point is above the mean.

Sample z-score: $z = \frac{x - \bar{x}}{s}$
Population z-score: $z = \frac{x - \mu}{\sigma}$

ALWAYS ROUND Z-SCORES TO 2 DECIMAL PLACES!!

Example: Determine whether the Los Angeles Angels or the Colorado Rockies had a relatively better run-producing season. The Angels scored 773 runs and play in the American League, where µ = 677.4 runs and σ = 51.7 runs. The Rockies scored 755 runs and play in the National League, where µ = 640.0 runs and σ = 55.9 runs.

Example: The average man in his twenties is 69.6 inches tall with a standard deviation of 3.0 inches. The average woman in her twenties is 64.1 inches tall with a standard deviation of 3.8 inches. Who is relatively taller, a 75-inch man or a 70-inch woman? Explain your choice using a complete sentence.

Quartiles and Percentiles

The kth percentile, denoted $P_k$, of a data set is a value such that k percent of the data points are less than or equal to that value.

Percentiles – separate the sorted data into 100 equal parts with 1% of the data values in each group.
P13 separates the lower 13% from the upper 87%.
P55 separates the lower 55% from the upper 45%.
etc.

Example: If someone's SAT math score is 600 and that score is at the 74th percentile, what does this mean?
It means that _______% of SAT math scores are less than or equal to 600, and _______% of SAT math scores are greater than 600.

Quartiles – separate the sorted data into 4 equal parts with 25% of the data values in each group.
Q1 separates the lower 25% from the upper 75%.
Q2 separates the lower 50% from the upper 50% (also called the median).
Q3 separates the lower 75% from the upper 25%.

BEFORE CALCULATING PERCENTILES OR QUARTILES, THE DATA MUST BE SORTED LOW TO HIGH!!!

Example: The following 13 data points are the sorted final exam grades of a sample of students who took the statistics final exam in 2005. Calculate Q1, Q2, and Q3.

53  58  64  66  71  74  75  77  83  84  87  92  93

Quartiles can also be found on your calculator using 1-VarStats (scroll down).

The Interquartile Range (IQR) – the range of the middle 50% of the values in a data set.

IQR = Q3 - Q1

Interpretation: The more spread out a data set is, the higher the IQR will be.

Example: Calculate the IQR for the last example (test grades).

Outlier – An extreme observation (high or low).

How to check for outliers:
1. Calculate Q1 and Q3, then calculate the IQR.
2. Determine the "fences".
   Lower fence = Q1 - (1.5 × IQR)
   Upper fence = Q3 + (1.5 × IQR)
3. If a data value is less than the lower fence or greater than the upper fence, it is an OUTLIER!

Example: The following data represent the hemoglobin levels for 20 randomly selected cats.

5.7   7.7   7.8   8.7   8.9   9.4   9.5   9.6   9.6   9.9
10.0  10.3  10.6  10.7  11.0  11.2  11.7  12.9  13.0  13.4

a) Calculate the mean and standard deviation of the hemoglobin level for this data set.
b) A cat named Daisy had a hemoglobin level of 7.8. Calculate her z-score and interpret.
c) Calculate the IQR and the fences.
d) Are there any outliers? If so, list them.

Which statistics should I use for my data set?

Shape of Distribution      Measure of Central Tendency    Measure of Dispersion
SYMMETRIC
SKEWED (left or right)

Section 3.5 – The Five-Number Summary and Boxplots

5-Number Summary:
1. Minimum value
2. Q1
3. Q2 (median)
4. Q3
5. Maximum value

How to Draw a Boxplot:
1. Draw a box with vertical lines at Q1, Q2, and Q3.
2. Draw brackets at the lower and upper fences.
3. Draw a horizontal line from Q1 to the smallest data point inside the lower fence.
4. Draw a horizontal line from Q3 to the largest data point inside the upper fence.
5. Label any outliers with asterisks (*).

Example: Create a Box & Whisker Plot for the "birth month" data for both classes. Compare.

               10am Class    11am Class
Min
Q1
Median (Q2)
Q3
Max

[Blank number line for the boxplots. Axis: Birth Month, 1 to 12]

How to determine the distribution (shape) of the data set by looking at the boxplot: (pg. 167)

Example: #6 on pg. 170
a) To the nearest integer, what is the median of variable x?
b) To the nearest integer, what is the first quartile of variable y?
c) Which variable has more dispersion? Why?
d) Does the variable x have any outliers? If so, what is the value of the outlier(s)?
e) Describe the shape of variable y.
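The outlier check from Section 3.4 and the five-number summary from Section 3.5 can be sketched in Python for the cat hemoglobin data. The helper `quartiles` is a hypothetical name, and it assumes the median-of-halves convention (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out when n is odd). Other software may use a different quartile convention and give slightly different values.

```python
from statistics import median

def quartiles(sorted_data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]           # values below the median position
    upper = sorted_data[half + n % 2:]   # skips the middle value when n is odd
    return median(lower), median(sorted_data), median(upper)

# Hemoglobin levels for the 20 cats in the Section 3.4 example (sorted first).
hemoglobin = sorted([5.7, 7.7, 7.8, 8.7, 8.9, 9.4, 9.5, 9.6, 9.6, 9.9,
                     10.0, 10.3, 10.6, 10.7, 11.0, 11.2, 11.7, 12.9, 13.0, 13.4])

q1, q2, q3 = quartiles(hemoglobin)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value beyond a fence is flagged as an outlier.
outliers = [x for x in hemoglobin if x < lower_fence or x > upper_fence]

# Boxplot whiskers end at the most extreme values still inside the fences.
inside = [x for x in hemoglobin if lower_fence <= x <= upper_fence]
whiskers = (inside[0], inside[-1])

five_number = (hemoglobin[0], q1, q2, q3, hemoglobin[-1])

print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print(f"outliers = {outliers}, whiskers end at {whiskers}")
print(f"five-number summary = {five_number}")
```

The same `quartiles` helper can be applied to the 13 sorted exam grades in Section 3.4, where n is odd and the middle value is excluded from both halves.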
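The z-score comparisons in Section 3.4 can be checked with a short Python sketch. The function name `z_score` is hypothetical. It implements the population formula z = (x - µ) / σ with the values stated in the two examples and rounds to 2 decimal places, as the handout requires.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Runs scored, with each league's mean and standard deviation from the example.
angels = round(z_score(773, 677.4, 51.7), 2)
rockies = round(z_score(755, 640.0, 55.9), 2)
print(f"Angels z = {angels}, Rockies z = {rockies}")

# Heights in inches for the 75-inch man and the 70-inch woman.
man = round(z_score(75, 69.6, 3.0), 2)
woman = round(z_score(70, 64.1, 3.8), 2)
print(f"man z = {man}, woman z = {woman}")
```

In each pair, the larger z-score marks the observation that sits further above its own group's mean, which is the sense of "relatively better" or "relatively taller" used in the examples.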