What is statistics?
Statistics is the science of:
  Collecting information
  Organizing and summarizing the information collected
  Analyzing the information collected in order to draw conclusions

Two Types of Statistics
  Descriptive statistics: organizing and summarizing the information collected.
  Inferential statistics: drawing conclusions from the information collected.

Chapter 1 Exploring Data
Lesson 1-1, Displaying Distributions with Graphs
Bar Graphs and Pie Charts

Data
  Individuals are the objects described by a set of data. Individuals may be people, animals, or things.
  A variable is any characteristic of an individual. A variable can take different values for different individuals.

Types of Variables
  A categorical variable allows individuals to be classified based on some attribute or characteristic.
  A quantitative variable provides numerical measures of individuals.

Example, Page 7, #1.2
Data from a medical study contain values of many variables for each of the people who were subjects of the study. Which of the following variables are categorical and which are quantitative?
  a) Gender (female or male): categorical
  b) Age (years): quantitative
  c) Race (Asian, black, white, or other): categorical
  d) Smoker (yes or no): categorical
  e) Systolic blood pressure (millimeters of mercury): quantitative
  f) Level of calcium in blood (micrograms per milliliter): quantitative

Distribution
  The distribution of a variable tells us what values the variable takes and how often it takes each value.

Displaying Distributions
  Categorical variables: bar graphs, pie charts
  Quantitative variables: dotplots, stemplots, histograms

Example – Page 11, #1.6
In 1997 there were 92,353 deaths from accidents in the United States. Among these were 42,340 deaths from motor vehicle accidents, 11,858 from falls, 10,163 from poisoning, 4,051 from drowning, and 3,601 from fires.
A) Find the percent of accidental deaths from each of these causes, rounded to the nearest percent.
What percent of accidental deaths were due to other causes?

  Accidents        Number    Percentage
  Motor Vehicle    42,340    42,340 / 92,353 = 45.8% ≈ 46%
  Falls            11,858    13%
  Poisoning        10,163    11%
  Drowning          4,051     4%
  Fires             3,601     4%
  Other Causes     20,340    22%
  Total            92,353   100%

B) Make a well-labeled bar graph of the distribution of causes of accidental deaths. Be sure to include an “other causes” bar.
[Figure: bar graph “US Accidental Death – 1997”; x-axis: Causes of Accidental Deaths (MV, Falls, Poison, Drown, Fires, OC); y-axis: Percentage of Accidental Deaths, marked 10 to 50.]

C) Would it also be correct to use a pie chart to display these data? If so, construct the pie chart. If not, explain why not.
Yes, since the categories represent parts of a whole.

  Accidents    Number    Percentage    Pie chart angle
  MV           42,340    46%           0.46 × 360° = 165.6° ≈ 166°
  Falls        11,858    13%           47°
  Poisoning    10,163    11%           40°
  Drowning      4,051     4%           14°
  Fires         3,601     4%           14°
  OC           20,340    22%           79°
  Total        92,353   100%           360°

[Figure: pie chart “US Accidental Deaths - 1997”; Motor Vehicle 46%, Falls 13%, Poisoning 11%, Drowning 4%, Fires 4%, Other Causes 22%.]

Lesson 1-1, Displaying Distributions with Graphs
Dot Plots and Stem-and-Leaf Plots

Overall Pattern of Distribution (Quantitative Variables)
  Center: divides the data in half
  Spread: smallest to largest values
  Shape: skewness of the data
  Outlier: data that falls outside of the pattern

Example – Page 16, #1.8
Are you driving a gas guzzler? Table 1.3 displays the highway gas mileage for 32 model year 2000 midsize cars.
A) Make a dot plot of these data.
[Figure: dot plot; axis: Highway Gas Mileage, marked 21 to 33.]
B) Describe the shape, center, and spread of the distribution of gas mileages. Are there any potential outliers?
The shape of the distribution is skewed to the left, with a major peak at 28 and a minor peak at 24.
The spread is relatively narrow (21 to 32 mpg). The two observations at 21 and the observation at 32 appear to be outliers. The center is 28 mpg.

Example – Page 35, #1.28
In 1798 the English scientist Henry Cavendish measured the density of the earth by careful work with a torsion balance. The variable recorded was the density of the earth as a multiple of the density of water. Here are Cavendish’s 29 measurements:

  5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65
  5.57 5.53 5.62 5.29 5.44 5.34 5.79 5.10 5.27 5.39
  5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85

Present these measurements graphically in a stemplot. Discuss the shape, center, and spread of the distribution. Are there any outliers? What is your estimate of the density of the earth based on these measurements?

Density of the Earth (48|8 = 4.88)
  48 | 8
  49 |
  50 | 7
  51 | 0
  52 | 6 7 9 9
  53 | 0 4 4 6 9
  54 | 2 4 6 7
  55 | 0 3 5 7 8
  56 | 1 2 3 5 8
  57 | 5 9
  58 | 5

The shape of the distribution is roughly symmetric, with one possible outlier at 4.88 that is somewhat low. The spread is from 4.88 to 5.85. The center of the distribution is between 5.4 and 5.5. Based on the plot, we would estimate the Earth’s density to be about halfway between 5.4 and 5.5.

Lesson 1-1, Displaying Distributions with Graphs
Histograms and Relative Frequency Graphs

Histogram and categories
[Figure: two histograms of Spring 1998 Stat 250 students (n = 92 each), one of GPA and one of age in years; y-axis: Frequency (Count); captions: “too few categories” and “too many categories”.]

Example – Histogram
Suppose you are considering investing in a Roth IRA. You collect the data in the table, which represent the three-year rate of return (in percent) for 40 small capitalization growth mutual funds.
  27.4 12.7 22.6 32.1 18.2 23.7 18.4 14.7 16.7 28.5
  29.6 47.7 32.0 14.7 21.3 37.0 10.8 22.2 11.6 10.9
  25.5 12.8 27.0 19.2 24.1 18.4 45.9 18.4 23.7 31.1
  19.6 18.5 35.9 17.4 16.6 23.3 38.1 21.9 18.5 29.1

A) Construct a histogram to display these data. Record your class intervals and counts.

Step 1 – Find the class intervals
  Locate the smallest number (10.8) and the largest number (47.7).
  The lower class limit will be 10.0, with a class width of 5.

  3-yr Rate of Return    Frequency
  10.0 – 14.9             7
  15.0 – 19.9            11
  20.0 – 24.9             8
  25.0 – 29.9             6
  30.0 – 34.9             3
  35.0 – 39.9             3
  40.0 – 44.9             0
  45.0 – 49.9             2
  Total                  40

Step 2 – Graph it using the TI
  Stat Plot (2nd Y=), Window, Graph, Trace

[Figure: histogram “3-Year Rate of Return of Mutual Funds”; x-axis: Rate of Return, 10 to 50; y-axis: Frequency, marked 4, 8, 12.]

B) Describe the distribution of the 3-year rate of return.
The shape of the distribution is skewed to the right, with the center in the class 15.0% – 19.9%. The observations in the class 45.0% – 49.9%, which is separated from the rest by an empty class, are potential outliers. The spread is from 10% to 50%.

Shape of a Distribution
  Uniform (symmetric)
  Bell-shaped (symmetric)
  Skewed right
  Skewed left
[Figures: example histograms of a uniform distribution, a symmetric bell-shaped distribution, a right-skewed distribution, and a left-skewed distribution.]

Example – Relative Cumulative Frequency
Suppose you are considering investing in a Roth IRA. You collect the data in the table, which represent the three-year rate of return (in percent) for 40 small capitalization growth mutual funds.
  27.4 12.7 22.6 32.1 18.2 23.7 18.4 14.7 16.7 28.5
  29.6 47.7 32.0 14.7 21.3 37.0 10.8 22.2 11.6 10.9
  25.5 12.8 27.0 19.2 24.1 18.4 45.9 18.4 23.7 31.1
  19.6 18.5 35.9 17.4 16.6 23.3 38.1 21.9 18.5 29.1

  Class          Freq   Relative Freq     Cumulative Freq   Relative Cumulative Freq
  10.0 – 14.9     7     7/40 = 0.175      7                 0.175
  15.0 – 19.9    11     0.275             7 + 11 = 18       0.175 + 0.275 = 0.45
  20.0 – 24.9     8     0.20              18 + 8 = 26       0.45 + 0.20 = 0.65
  25.0 – 29.9     6     0.15              32                0.80
  30.0 – 34.9     3     0.075             35                0.875
  35.0 – 39.9     3     0.075             38                0.95
  40.0 – 44.9     0     0                 38                0.95
  45.0 – 49.9     2     0.05              40                1
  Total          40     1

Reading the table:

  Class          Freq   Rel Freq   Cum Freq   Rel Cum Freq
  20.0 – 24.9     8     0.2        26         0.65
  45.0 – 49.9     2     0.05       40         1

  26 of the 40 mutual funds had a 3-year rate of return of 24.9% or less.
  65% of the mutual funds had a 3-year rate of return of 24.9% or less.
  A mutual fund with a 3-year rate of return of 45% or higher is outperforming 95% of its peers.

On the TI: L3 – Upper Class Limits, L4 – Relative Cumulative Frequency

[Figure: relative cumulative frequency graph “3 Year Rate of Return for Small Capitalization Mutual Funds”; x-axis: Rate of Return, upper class limits 10 to 49.9; y-axis: Cumulative Relative Frequency, 0 to 1.2.]

Lesson 1-2, Describing Distributions with Numbers
Measuring the Center

Mean
  To find the sample mean, add up all of the observations and divide by the number of observations:
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$
  Is affected by unusual values called outliers.

Median
  The median is the midpoint of a distribution, such that half the observations are smaller and the other half are larger.
  Another name for the 50th percentile.
  Is not affected by unusual values called outliers.

Center and Distribution
  Skewed left: Mean < Median
  Symmetric: Mean = Median
  Skewed right: Mean > Median

Measuring the Spread
  Range
  Quartiles
  Boxplots
  Standard Deviation
  Variance

Range
  The range is the difference between the largest and smallest observations.
  $R = x_{\max} - x_{\min}$

Quartiles
  Quartiles divide the observations into fourths, or four equal parts.
  Smallest data value | 25% of the data | Q1 | 25% of the data | Q2 | 25% of the data | Q3 | 25% of the data | Largest data value

Interquartile Range (IQR)
  The interquartile range (IQR) is the distance between the first and third quartiles.
  $IQR = Q_3 - Q_1$

Outliers
  Upper cutoff $= Q_3 + 1.5(IQR)$
  Lower cutoff $= Q_1 - 1.5(IQR)$

Five Number Summary
  Smallest observation (minimum)
  Quartile 1
  Quartile 2 (median)
  Quartile 3
  Largest observation (maximum)

Example – Page 41, #1.32
The Survey of Study Habits and Attitudes (SSHA) is a psychological test that evaluates college students’ motivation, study habits, and attitudes toward school. A private college gives the SSHA to a sample of 18 of its incoming first-year women students. Their scores are

  154 109 137 115 152 140 154 178 101
  103 126 126 137 165 165 129 200 148

A) Make a stemplot of these data. The overall shape of the distribution is irregular, as often happens when only a few observations are available. Are there any potential outliers? About where is the center of the distribution (the score with half the scores above it and half below)? What is the spread of the scores (ignoring any outliers)?

On the TI: STAT, EDIT, 1:Edit

  10 | 1 3 9
  11 | 5
  12 | 6 6 9
  13 | 7 7
  14 | 0 8
  15 | 2 4 4
  16 | 5 5
  17 | 8
  18 |
  19 |
  20 | 0

200 is a potential outlier. The center is approximately 140. The spread (excluding 200) is 178 – 101 = 77.

B) Find the mean.
  $\bar{x} = 2539 / 18 \approx 141.056$
C) Find the median of these scores.
Which is larger: the median or the mean? Explain why.
  Median = 138.5
The mean is larger than the median because of the outlier at 200, which pulls the mean toward the long right tail of the distribution.

Example – Page 47, #1.36
Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women:

  154 109 137 115 152 140 154 178 101
  103 126 126 137 165 165 129 200 148

and for 20 first-year college men:

  108 140 114  91 180 115 126  92 169 146
  109 132  75  88 113 151  70 115 187 104

A) Make side-by-side boxplots to compare the distributions.
[Figure: side-by-side boxplots “SSHA SCORES” for Women and Men; axis 0 to 200.]

B) Compute the numerical summaries for these two distributions.

           Mean     Min   Q1    Median   Q3    Max
  Women    141.06   101   126   138.5    154   200
  Men      121.25    70    98   114.5    143   187

C) Write a paragraph comparing SSHA scores for men and women.
All the displays and descriptions reveal that women generally score higher than men. The men’s scores (IQR = 45) are more spread out than the women’s (IQR = 28), even if we don’t ignore the outlier. The shapes of the distributions are reasonably similar, with each displaying right skewness.

Describing Distributions with Numbers
Standard Deviation and Variance

Standard Deviation
  The standard deviation (s) measures the average distance of the observations from their mean.

Example, Page 52, #1.40
The levels of various substances in the blood influence our health. Here are measurements of the level of phosphate in the blood of a patient, in milligrams of phosphate per deciliter of blood, made on 6 consecutive visits to a clinic.

  5.6 5.2 4.6 4.9 5.7 6.4

A) Find the mean.
  $\bar{x} = \frac{5.6 + 5.2 + 4.6 + 4.9 + 5.7 + 6.4}{6} = \frac{32.4}{6} = 5.4$

  Observation $x_i$    Deviation $x_i - \bar{x}$    Squared deviation $(x_i - \bar{x})^2$
  5.6                  5.6 – 5.4 =  0.2             (0.2)² = 0.04
  5.2                  5.2 – 5.4 = –0.2             0.04
  4.6                  4.6 – 5.4 = –0.8             0.64
  4.9                  4.9 – 5.4 = –0.5             0.25
  5.7                  5.7 – 5.4 =  0.3             0.09
  6.4                  6.4 – 5.4 =  1.0             1.00
  SUM                  0                            2.06

[Figure: number line from 4.5 to 6.5 marking the mean $\bar{x} = 5.4$ and the deviations of x = 4.6 (–0.8) and x = 6.4 (1.0).]

B) Find the standard deviation (s) from its definition.
  $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{6-1}(2.06) = \frac{2.06}{5} = 0.412$
  $s = \sqrt{s^2} = \sqrt{0.412} = 0.64187 \approx 0.6419$

C) Use your TI-83 to find $\bar{x}$ and s. Do the results agree with part B?

Standard Deviation
  The standard deviation (s) is the square root of the variance (s²).
  Its units are the original units.
  It measures spread about the mean and should only be used when the mean is chosen as the center.
  If s = 0, there is no spread: all observations have the same value.
  As s gets larger, the observations are more spread out.
  Highly affected by outliers.
  Best for symmetric data.

Variance
  The variance (s²) measures the average squared deviation of the observations from the mean.
  Its units are squared.
  Highly affected by outliers.

How to Choose?
  Skewed distribution or outliers: five number summary
  Symmetric distribution and no outliers: mean and standard deviation

Homework
  HW, page 52, #1.41, 1.43
  Read pages 53 – 61

Linear Transformation
  A linear transformation changes the original variable x into the new variable $x_{new}$ given by an equation of the form
  $x_{new} = a + bx$
  Adding the constant a shifts all values of x upward or downward by the same amount.
  Multiplying by the positive constant b changes the size of the unit of measurement.

Example – Page 56, #1.44
Maria measures the lengths of 5 cockroaches that she finds at school. Here are her results in inches:

  1.4 2.2 1.1 1.6 1.2

A) Find the mean and standard deviation.
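As a quick check on part A, the sketch below computes the sample mean and standard deviation of Maria's five measurements with Python's standard `statistics` module, then applies a linear transformation of the form x_new = a + bx to see how both summaries respond. The helper name `linear_transform` is hypothetical and introduced only for illustration (the slides themselves use a TI-83), and the choice a = 0, b = 2.54 assumes the usual inch-to-centimeter conversion.

```python
import statistics


def linear_transform(values, a, b):
    """Apply x_new = a + b*x to every observation (hypothetical helper)."""
    return [a + b * x for x in values]


lengths_in = [1.4, 2.2, 1.1, 1.6, 1.2]  # Maria's measurements, in inches

mean_in = statistics.mean(lengths_in)   # sample mean
sd_in = statistics.stdev(lengths_in)    # sample standard deviation (divides by n - 1)

# Assumed conversion: inches to centimeters, so a = 0 and b = 2.54.
lengths_cm = linear_transform(lengths_in, a=0.0, b=2.54)
mean_cm = statistics.mean(lengths_cm)
sd_cm = statistics.stdev(lengths_cm)

print(f"inches: mean = {mean_in:.3f}, s = {sd_in:.3f}")
print(f"cm:     mean = {mean_cm:.3f}, s = {sd_cm:.3f}")
```

Run as written, this should report a mean of about 1.5 and s of about 0.436 in inches, and both values multiplied by 2.54 in centimeters. Because a = 0 here, the mean and the standard deviation are scaled by the same factor b; a nonzero a would shift the mean but leave s unchanged.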
B) Maria’s science teacher is furious to discover that she has measured the cockroach lengths in inches rather than centimeters. (There are 2.54 cm in 1 inch.) Find the mean and standard deviation of the 5 cockroaches in centimeters.
  $\bar{x} = 1.5$ in, so $\bar{x}_{new} = 1.5(2.54) = 3.81$ cm
  $s = 0.436$ in, so $s_{new} = 0.436(2.54) \approx 1.107$ cm

C) Considering the 5 cockroaches that Maria found as a small sample from the population of all cockroaches at her school, what would you estimate as the average length of the population of cockroaches? How sure of your estimate are you?
The average cockroach length can be estimated by the mean length of the 5 sampled cockroaches, 1.5 inches. This is a questionable estimate, because the sample is so small.

Example – Page 63, #1.56
A change of units that multiplies each observation by b, such as the change $x_{new} = 0 + 2.54x$ from inches x to centimeters $x_{new}$, multiplies our usual measures of spread by b. This is true of the IQR and the standard deviation. What happens to the variance when we change units this way?
The variance is multiplied by b². For inches to centimeters, the factor is 2.54² = 6.4516.

Homework
  HW, Page 56, #1.45
  HW, Page 63, #1.55

Lesson 1-2, Describing Distributions with Numbers
Comparing Distributions

Example – Page 59, #1.48
The table below gives the distribution of grades earned by students taking the Calculus AB and Statistics exams in 2000.

  Grade         5       4       3       2       1
  Calculus      16.8%   23.2%   23.5%   19.6%   16.8%
  Statistics     9.8%   21.5%   22.4%   20.5%   25.8%

A) Make a graphical display to compare the AP exam grades for Calculus AB and Statistics.
[Figure: side-by-side bar graph “2000 AP Exam”; x-axis: Grade on Exam, 1 to 5; y-axis: % of students earning grade, 0.0 to 30.0; series: Calculus AB, Statistics.]

B) Write a few sentences comparing the two distributions of exam grades. Do you now know which exam is easier? Why or why not?
The distributions are very similar for grades 2, 3, and 4. The major difference occurs for grades 1 and 5.
A larger proportion of Statistics students receive a grade of 1, and a smaller proportion of Statistics students receive a grade of 5. This suggests that the Statistics exam is harder, in the sense that students are more likely to get a poor grade on the Statistics exam than on the Calculus AB exam.

Example – Page 63, #1.54
The mean $\bar{x}$ and standard deviation s measure the center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, use your calculator to find $\bar{x}$ and s for the following two small data sets. Then make a stemplot of each and comment on the shape of each distribution.

  Data A:  9.14  8.14  8.74  8.77  9.26  8.10  6.13  3.10  9.13  7.26   4.74
  Data B:  6.58  5.76  7.71  8.84  8.47  7.04  5.25  5.56  7.91  6.89  12.50

  Set A (3|1 = 3.1)        Set B
   3 | 1                    5 | 2 5 7
   4 | 7                    6 | 5 8
   5 |                      7 | 0 7 9
   6 | 1                    8 | 4 8
   7 | 2                    9 |
   8 | 1 1 7 7             10 |
   9 | 1 1 2               11 |
                           12 | 5

The means and standard deviations are basically the same. Set A is skewed to the left, while Set B has a high outlier.

Homework
  HW, Page 59, #1.47, #1.49
  HW, Page 62, #1.51, 1.57
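For Example 1.54 (Data A and Data B), the claim that two differently shaped data sets can share a mean and standard deviation can be checked with a few lines of Python in place of the calculator. This is only a sketch using the standard `statistics` module; the helper name `summarize` is hypothetical, and the median is included as an extra summary to show one number that does separate the two sets.

```python
import statistics

data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]


def summarize(values):
    """Return the mean, sample standard deviation, and median (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "s": statistics.stdev(values),      # sample standard deviation (n - 1)
        "median": statistics.median(values),
    }


summary_a = summarize(data_a)
summary_b = summarize(data_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(
        f"Data {name}: mean = {summary['mean']:.2f}, "
        f"s = {summary['s']:.2f}, median = {summary['median']:.2f}"
    )
```

Both sets should come out with a mean near 7.50 and s near 2.03, while the medians differ (8.14 for Data A, 7.04 for Data B). The numerical summaries alone do not reveal the left skew in Set A or the high value in Set B, which is why the stemplots are still needed.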