Chapter 1 Looking at Data

Types of variables

Looking at Data
  Be sure that each variable really does measure what you want it to.
  A poor choice of variables can lead to misleading conclusions! For example, in most situations, a rate is more meaningful than a simple count.
  Distributions can take the form of a:
    Table
    Graph
    Formula

Categorical variables
  Count = frequency (number of times that category was observed)
  Percent = relative frequency = proportion
  How to display categorical variables:
    Table
    Pie chart
      Uses the relative frequency to construct angles
      Relative frequency determines how big the "slice" of the pie is
    Bar graph
      Can show either relative frequency or frequency
      Height of the bar indicates the relative frequency or the frequency of that category

Categorical variables continued
  Figure 1.3 2002 Statistical Abstract of the United States
  Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company

Quantitative Variables
  Stemplot
    Determine stems and leaves
    Write down ALL stems from smallest to largest
    Write leaves alongside the corresponding stems
    Order leaves
  Histograms
    Break the range of a variable into intervals (called classes)
    Classes should be of equal length

Stemplot
  Table 1.2

Stemplot for Female
  Figure 1.5 Female

Example of histogram
  Table 1.3

Histogram
  Figure 1.7

Quantitative Variables continued
  Examining distributions of quantitative variables is best done by looking at graphs
    Overall pattern (shape, spread, center)
    Outliers (values outside the pattern of the data)
    Modes: the peaks in a distribution (unimodal, bimodal, no modes)

Shape of distribution
  Symmetric
  Right Skewed
  Left Skewed

Example of Outliers
  The two low outliers (at 0) occurred because the bonds between the wire and the wafer were not made.
  The high outlier at 3150 was a measurement error.
  Figure 1.9

Time Plot
  Shows how a variable changes over time (time is always on the horizontal axis)
  Seasonal variation: a systematic pattern that keeps reappearing
  Trend: a persistent long-term rise or fall

Example of Time plot
  Table 1.4 Volume of water discharged by Mississippi River into the Gulf of Mexico
  Figure 1.10

1.2 Describing Distributions with Numbers
  Measuring center
    Mean
    Median (see data next page)
    Mode
  In a symmetric distribution, the mean and median are close to each other
  Right skewed: mean is higher than median
  Left skewed: mean is lower than median

Comparing Mean and Median
  Figure 1.27
  Table 1.8

1.2 Continued
  If outliers are present in the data, or if the distribution is skewed, it is better to use the median.
  Why is spread so important?
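The relationship between mean and median described above can be checked with a short sketch. The numbers below are made up for illustration and do not come from the textbook tables; the helper name `center_summary` is likewise hypothetical. The sketch compares a roughly symmetric batch of values with the same batch after one large value is added.

```python
from statistics import mean, median

# Hypothetical, made-up values: a roughly symmetric batch of observations.
symmetric = [2, 3, 4, 5, 6, 7, 8]

# The same batch with one large value appended, giving a long right tail.
right_skewed = symmetric + [45]


def center_summary(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


sym_mean, sym_median = center_summary(symmetric)
skew_mean, skew_median = center_summary(right_skewed)

print(f"symmetric:    mean={sym_mean}, median={sym_median}")
print(f"right-skewed: mean={skew_mean}, median={skew_median}")
```

In this made-up batch the single large value pulls the mean well above the median, while the median barely moves. This is the usual reason for preferring the median when outliers or skewness are present.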
Measuring spread
  Range
  Standard deviation
  Quartiles

Measuring Spread
  Range
    Maximum - Minimum
  Standard deviation
    Roughly the average deviation of the observations from the mean (the square root of the average squared deviation)
    Properties of standard deviation
      Measures spread about the mean (should only be used when the mean is used as the measure of central tendency)
      s = 0 only when there is no spread
      Outliers affect s
  Quartiles

Quartiles
  pth percentile: p% of the observations fall at or below that value and (100 - p)% fall above it
  25th percentile = 1st Quartile (Q1)
  50th percentile = 2nd Quartile (Q2) = median
  75th percentile = 3rd Quartile (Q3)

To find Quartiles
  Order the data
  Find the median
  First Quartile is the median of the first half of the data
  Third Quartile is the median of the second half of the data
  Use the guinea pig example: the data are already ordered, n = 72

IQR
  Interquartile range (IQR) = Q3 - Q1

Five-number summary
  Minimum, Q1, Median, Q3, Maximum
  Boxplot: displays the five-number summary
    Box from Q1 to Q3
    Line at the median
    "Whiskers" to the maximum and minimum

Two-seater cars versus Minicompact cars
  Figure 1.19

Boxplot
  Example from textbook
  Five-number summary: 43, 82.5, 102.5, 151.5, 598

Modified boxplot (helps detect outliers)
  Calculate 1.5*IQR
  Calculate Q1 - 1.5*IQR and Q3 + 1.5*IQR
  Draw the box and line (as before).
  Draw whiskers to the minimum and maximum observations within (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
  Observations outside this range should be plotted separately.

Example of Modified Boxplot
  From textbook
  IQR = 151.5 - 82.5 = 69
  1.5*IQR = 103.5
  82.5 - 103.5 = -21 (truncated at 0)
  151.5 + 103.5 = 255
  Possible outliers?
  Draw boxplot

Choosing a Summary for Data Set
  If the distribution is skewed or has outliers, it is best to use the five-number summary.
  If the distribution is "reasonably" symmetric, use the mean and standard deviation.
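The quartile and 1.5*IQR steps above can be sketched in a few lines. This is a minimal illustration, not the textbook's own procedure in every detail: the function names `quartiles` and `outlier_fences` are hypothetical, the `sample` values are made up, and the sketch assumes that when n is odd the middle observation is left out of both halves. The fences are then computed for the textbook five-number summary quoted above (43, 82.5, 102.5, 151.5, 598).

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: when n is odd, the middle observation is excluded
    from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


def outlier_fences(q1, q3):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Made-up data, only to exercise the quartile steps.
sample = [1, 3, 4, 6, 7, 9, 12, 15]
print("Q1, median, Q3:", quartiles(sample))

# Five-number summary from the textbook example in the notes.
minimum, q1, med, q3, maximum = 43, 82.5, 102.5, 151.5, 598
low_fence, high_fence = outlier_fences(q1, q3)
print("fences:", low_fence, high_fence)
print("minimum is a possible outlier:", minimum < low_fence)
print("maximum is a possible outlier:", maximum > high_fence)
```

The raw lower fence is negative (-21), which the notes truncate at 0. The maximum of 598 lies above the upper fence of 255, so it would be plotted separately in a modified boxplot.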
ALWAYS PLOT DATA BEFORE DECIDING ON A NUMERICAL SUMMARY

1.3 The Normal Distribution
  Density curve
    Always on or above the horizontal axis
    Area under the curve is equal to 1
    Symmetric density curves have equal mean and median
  Normal distribution
    Mean = Median = Mode
    Symmetric, unimodal
    Area under the curve = 1

Mean and spread of the normal distribution
  Figure 1.28

Empirical Rule (68-95-99.7% Rule)
  Approximately 68% of the data will fall within one standard deviation of the mean
  Approximately 95% of the data will fall within two standard deviations of the mean
  Approximately 99.7% of the data will fall within three standard deviations of the mean
  Figure 1.29

Example
  Weights of apples are normally distributed with a mean of 10 oz and a standard deviation of 2 oz.
  The middle 68% of apples weigh between _____ and _____.
  Middle 95%
  Middle 99.7%
  Approximately what percent of apples weigh below 6 oz?
  Approximately what percent of apples weigh above 4 oz?

Z-scores
  A z-score tells the number of standard deviations an observation is from the mean.
  Negative z-score: observation is below the mean
  Positive z-score: observation is above the mean
  Z-score = 0: observation is equal to the mean
  z = (x - μ) / σ
  Find the z-score for an apple that weighs 11 oz. 15 oz? 5 oz?
  If we assume the distribution of the variable is normal, then the z-scores have a standard normal distribution.

Standard Normal Distribution
  The standard normal distribution has a mean of 0 and a standard deviation of 1.
  Can use Table A to get the area under the curve for a standard normal.
  Area under the curve = proportion (percent)
  Look at the table
  What percent of apples weigh below 7 oz?
  What percent of apples weigh more than 5 oz?
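The apple questions above can be worked through numerically. The sketch below is one possible way to do it, not the method used in the course. It uses Python's standard-library `statistics.NormalDist` in place of Table A, so the areas may differ slightly from table values because of rounding. The names `z_score` and `empirical_interval` are hypothetical helpers introduced for illustration.

```python
from statistics import NormalDist

# Apple weights from the example: mean 10 oz, standard deviation 2 oz.
MU, SIGMA = 10.0, 2.0
apples = NormalDist(mu=MU, sigma=SIGMA)


def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def empirical_interval(k, mu=MU, sigma=SIGMA):
    """Interval within k standard deviations of the mean."""
    return mu - k * sigma, mu + k * sigma


# Empirical rule: middle 68%, 95%, and 99.7% of apple weights.
for k, label in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = empirical_interval(k)
    print(f"middle {label}: {low} oz to {high} oz")

# z-scores for the three weights asked about in the notes.
for weight in (11, 15, 5):
    print(f"z-score for {weight} oz: {z_score(weight)}")

# Areas under the normal curve, playing the role of Table A.
below_7 = apples.cdf(7)      # proportion weighing below 7 oz
above_5 = 1 - apples.cdf(5)  # proportion weighing more than 5 oz
print(f"below 7 oz: {below_7:.4f}")
print(f"above 5 oz: {above_5:.4f}")
```

Because the normal curve is symmetric, the empirical-rule questions follow from the intervals: about 2.5% of apples weigh below 6 oz (half of the 5% outside two standard deviations), and about 99.85% weigh above 4 oz.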