Chapter Three: Numerical Descriptive Measures

| Variable | Mean | Standard Error | Median | Mode | Standard Deviation | Sample Variance | Kurtosis | Skewness | Range | Minimum | Maximum | Sum | Count | Largest(1) | Smallest(1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Age | 20.61 | 0.552 | 20 | 19 | 3.172 | 10.059 | 7.659 | 2.663 | 15 | 18 | 33 | 680 | 33 | 33 | 18 |
| 4. Study / Week (hrs) | 13.67 | 1.165 | 12.5 | 10 | 6.693 | 44.792 | 1.160 | 1.123 | 28 | 5 | 33 | 451 | 33 | 33 | 5 |
| 5. Auto Cost ($) | 21303.45 | 2971.803 | 18000 | 18000 | 16003.649 | 256116773.399 | 1.750 | 1.320 | 65000 | 1000 | 66000 | 617800 | 29 | 66000 | 1000 |
| 6. Alch bev / wk (#) | 8.83 | 1.957 | 6 | 0 | 11.071 | 122.558 | 5.879 | 2.242 | 50 | 0 | 50 | 282.5 | 32 | 50 | 0 |
| 7. Sodas / wk (#) | 3.56 | 0.798 | 2 | 0 | 4.584 | 21.012 | 4.415 | 1.967 | 20 | 0 | 20 | 117.5 | 33 | 20 | 0 |
| 8. Hrs. Paid / wk (hrs) | 9.53 | 2.140 | 4.5 | 0 | 12.105 | 146.531 | 0.969 | 1.291 | 40 | 0 | 40 | 305 | 32 | 40 | 0 |
| 9. No. units this sem (#) | 16.56 | 0.359 | 16 | 16 | 2.061 | 4.246 | 2.604 | 0.983 | 10 | 13 | 23 | 546.5 | 33 | 23 | 13 |
| 10. TV / video game / wk (hrs) | 7.41 | 0.917 | 5 | 4 | 5.269 | 27.757 | 0.072 | 1.020 | 18 | 2 | 20 | 244.5 | 33 | 20 | 2 |
| 11. Movie theater / wk (#) | 5.47 | 0.831 | 3.5 | 3 | 4.775 | 22.796 | 3.973 | 2.046 | 20 | 0 | 20 | 180.5 | 33 | 20 | 0 |
| 12. $/wk entertain ($) | 46.41 | 6.335 | 40 | 100 | 35.836 | 1284.249 | 0.840 | 1.091 | 150 | 0 | 150 | 1485 | 32 | 150 | 0 |
| 13. volunteer / year (hrs) | 16.91 | 4.333 | 5 | 0 | 24.889 | 619.460 | 1.784 | 1.674 | 80 | 0 | 80 | 558 | 33 | 80 | 0 |
| 14. $ in wallet ($) | 26.35 | 4.161 | 20 | 20 | 23.901 | 571.258 | 1.841 | 1.364 | 100 | 0 | 100 | 869.5 | 33 | 100 | 0 |
| 16. largest bal on cc ($) | 1143.79 | 296.888 | 700 | 0 | 1705.491 | 2908697.922 | 8.437 | 2.811 | 7300 | 0 | 7300 | 37745 | 33 | 7300 | 0 |
| 17. last semest bad class (#) | 1.09 | 0.159 | 1 | 1 | 0.914 | 0.835 | 1.589 | 0.858 | 4 | 0 | 4 | 36 | 33 | 4 | 0 |
| 18. GPA now | 3.21 | 0.075 | 3.3 | 3.3 | 0.418 | 0.175 | -0.445 | -0.164 | 1.62 | 2.38 | 4 | 99.49 | 31 | 4 | 2.38 |
| 19. Int'l trips (#) | 5.95 | 1.441 | 3 | 1 | 8.150 | 66.425 | 4.965 | 2.228 | 35 | 0 | 35 | 190.5 | 32 | 35 | 0 |
| 20. Gamble Indian Casino (#) | 1.34 | 0.634 | 0 | 0 | 3.589 | 12.878 | 8.294 | 3.006 | 15 | 0 | 15 | 43 | 32 | 15 | 0 |
| 21. Fly since 9/11 | 6.28 | 0.912 | 5 | 4 | 5.157 | 26.596 | 1.259 | 1.113 | 20 | 0 | 20 | 201 | 32 | 20 | 0 |
| 22. fast car (mph) | 120.71 | 5.000 | 120 | 100 | 27.837 | 774.880 | -0.912 | 0.272 | 110 | 70 | 180 | 3742 | 31 | 180 | 70 |

Commonly used Descriptive Measures:
1) Measures of Central Tendency
2) Measures of Variation
3) Measures of Position
4) Measures of Shape

Measures of Central Tendency
Purpose: To determine the “centre” of the data values.

Measures of Central Tendency answer the questions:
• Where is the middle of my data? {Mean, Median, Midrange}
• Which data value occurs most often? {Mode}

The Mean
The sample mean is denoted by x-bar.
The population mean is denoted by µ (mu).
x = individual data values
x-bar = Σx / n
µ = Σx / N

Example: The following are accident data for a 5-month period: 6, 9, 7, 23, & 5
To calculate the average number of accidents per month:
x-bar = Σx / n
x-bar = (6 + 9 + 7 + 23 + 5) ÷ 5
x-bar = 10.0

Statistic: What is the average person’s monetary value to society?

The Median
The Median is the centre value in a data set when the data are arranged from smallest to largest.
What do we call this ordered arrangement? An Ordered Array: 5, 6, 7, 9, & 23
With an odd number of observations, the value that has an equal number of items to the right and to the left is the Median.
Md = 7
If there is an odd number of observations, the median is the value in position (n + 1) ÷ 2 of the ordered array, where n = # of observations. Here (5 + 1) ÷ 2 = 3, so the median is the third value.
To calculate the median with an even number of observations, average the two centre values of the ordered array.
Example: With an ordered array: 5, 6, 7, & 9
Md = (6 + 7) ÷ 2 = 6.5
Remember: The median describes the centrally placed location of a value relative to the rest of the data.

Question: Is the mean or median more sensitive to extreme values (outliers)? Explain.
• The mean is affected by every value.
• The median is unaffected by extreme values.
• The mean is pulled toward extreme values.
• The median does not use all of the information available in the data.
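The mean and median comparison above can be checked with a short sketch. It uses Python's standard `statistics` module on the accident data from the example; the variable names are illustrative only.

```python
from statistics import mean, median

# Monthly accident counts from the example above.
accidents = [6, 9, 7, 23, 5]

ordered = sorted(accidents)       # the ordered array
x_bar = mean(accidents)           # (6 + 9 + 7 + 23 + 5) / 5
md = median(accidents)            # centre value of the ordered array

# The same data without the extreme value 23, which is the
# even-sized ordered array 5, 6, 7, 9 used in the median example.
trimmed = [x for x in accidents if x != 23]
x_bar_trimmed = mean(trimmed)
md_trimmed = median(trimmed)      # average of the two centre values

print("ordered array:", ordered)
print("mean:", x_bar, "median:", md)
print("without 23 -> mean:", x_bar_trimmed, "median:", md_trimmed)
```

Removing the single outlier moves the mean from 10 to 6.75 while the median only moves from 7 to 6.5, which is the sense in which the mean is pulled toward extreme values.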
Question: When dealing with data that are likely to contain outliers (personal income, ages, or prices of houses), would the Mean or Median be preferred as the measure of central tendency? Why?
Think of the Median as providing a more “typical” or “representative” value of the situation.

The Mode (Mo)
The value that occurs most frequently.
Questions:
1) Can there be more than one mode?
2) Is the mode affected by extreme values?
3) For continuous variables, is it possible that a mode does not exist? Explain.
4) Is the mode always a measure of central tendency?
Give an example of when the mode may provide more useful information than the mean or the median.
Example: From a purchaser’s standpoint, the most common hat or jeans size is what you would like to know, not the average hat or jeans size.

Measures of Central Tendency (Means, Medians, Modes) are useful.
However, the use of any single statistic to describe a complete distribution fails to reveal important facts.
Dig Deeper!

Measures of Variation
Answer the question: “How spread out are my data values?”

Consider Two Scenarios
Scenario 1: Jack buys a car & pays $1,000. Jill buys a car & pays $21,000. Average Price = $11,000
Scenario 2: Bob buys a car & pays $10,000. Mary buys a car & pays $12,000. Average Price = $11,000
Based on the data, both scenarios report the same “average price.” What’s the difference?

Quiz
Suppose you are a purchasing agent for a large manufacturing company. Your two suppliers each fill your orders in an average of 10 days. The following histograms plot the delivery times of the two suppliers. Do the two suppliers have the same reliability in terms of making deliveries on time?

Homogeneity: the degree of similarity within a set of data values.
The mean of a homogeneous data set is far more representative of the typical value than the mean of a heterogeneous data set.
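As a rough illustration of the two car-price scenarios above, the sketch below computes the mean alongside two measures of spread. It assumes each pair of prices is treated as a complete population, so it uses the population standard deviation (`pstdev`); the helper name `summarize` is hypothetical.

```python
from statistics import mean, pstdev

scenario_1 = [1_000, 21_000]    # Jack and Jill
scenario_2 = [10_000, 12_000]   # Bob and Mary

def summarize(prices):
    """Return the mean, range, and population standard deviation of prices."""
    return {
        "mean": mean(prices),
        "range": max(prices) - min(prices),
        "std_dev": pstdev(prices),  # assumption: the pair is the whole population
    }

summary_1 = summarize(scenario_1)
summary_2 = summarize(scenario_2)

print("Scenario 1:", summary_1)
print("Scenario 2:", summary_2)
```

Both scenarios share the same mean, but the range and standard deviation of Scenario 1 are ten times those of Scenario 2, so its mean is much less representative of either buyer.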
If all the data values in a sample are identical, then the mean provides perfect information, the variation is zero, and the data are perfectly homogeneous.

Variation: the tendency of data values to scatter about the mean, x-bar.

Commonly used Measures of Variation:
• Range
• Variance
• Standard Deviation
• Coefficient of Variation (CV)

The Range
Range = H - L, where H is the highest value and L is the lowest value.
The value of the range is strongly influenced by an outlier in the sample data.

Variance & Standard Deviation
During a five-week production period, a small company produced 5, 9, 16, 17, & 18 computers, respectively.
The average = 13 computers/wk
Describe the variability in these five weeks of production.

Formulas for Variance & Standard Deviation
Sample variance: s² = Σ(x - x-bar)² / (n - 1)
Sample standard deviation: s = √s²
Population variance: σ² = Σ(x - µ)² / N
Population standard deviation: σ = √σ²

Empirical Rule (for normally distributed data)
Approximately 68% of the values lie within 1σ of the mean, approximately 95% lie within 2σ, and approximately 99.7% lie within 3σ.

Example: Empirical Rule
A company produces a lightweight valve that is specified to weigh 1365 g. Unfortunately, because of imperfections in the manufacturing process, not all of the valves produced weigh exactly 1365 grams. In fact, the weights of the valves produced are normally distributed with a mean weight of 1365 grams and a standard deviation of 294 grams.
Questions:
1) Within what range of weights would approximately 95% of the valve weights fall?
2) Approximately 16% of the weights would be more than what value?
3) Approximately 0.15% of the weights would be less than what value?
Answers:
1) 1365 +/- 2σ = 777 to 1953
2) 1365 + 1σ = 1659
3) 1365 - 3σ = 483

Example 2: Standard Deviation & the Empirical Rule
A recent report states that for California the average statewide price of a gallon of regular gasoline is $1.52. Suppose regular gas prices across the state are normally distributed with a standard deviation of $0.08.
With µ = $1.52 & σ = $0.08:
1) Nearly all gas prices (99.7%) should fall between what prices?
2) Approximately 16% of the gas prices should be less than what price?
3) Approximately 2.5% of the gas prices should be more than what price?
Answers:
1) µ +/- 3σ = $1.28 and $1.76
2) $1.44 (Since 68% of the prices lie within 1σ of the mean, 32% lie outside this range: 16% in each tail.)
3) $1.68 (Since 95% of the prices lie within 2σ of the mean, 5% lie outside this range: 2.5% in each tail.)

Coefficient of Variation
Compares the variation between two data sets with different means and different standard deviations, and measures the variation in relative terms.
Coefficient of Variation (CV) formula: CV = (σ / µ) × 100

CV Example 1
Spot, the dog, weighs 65 pounds. Spot’s weight fluctuates 5 pounds depending on Spot’s exercise level.
Sea Biscuit, the horse, weighs 1200 pounds.
Sea Biscuit’s weight fluctuates 125 pounds depending on the number of rides Sea Biscuit goes on.
Question: Relatively speaking, which animal’s weight, Spot’s or Sea Biscuit’s, varies the most?

Coefficient of Variation vs. Standard Deviation
Some financial investors use the coefficient of variation, the standard deviation, or both as measures of risk.
What does the Coefficient of Variation tell us about the risk of a stock that the standard deviation does not?
The coefficient of variation expresses the risk of a stock relative to the amount invested: it states the size of the standard deviation as a percentage of the mean.

CV Example 2
SUPPOSE: Five weeks of average prices for stock A are: $57, $68, $64, $71, and $62, while five weeks of average prices for stock B are: $12, $17, $8, $15, and $13.
QUESTION: Relative to the amount of money invested in the stock, which stock, A or B, is riskier?

Stock A vs. Stock B in terms of Risk
Stock A: µ = 64.40, σ = 4.84, CV = (σ / µ) × 100 = 7.5%
Stock B: µ = 13, σ = 3.03, CV = (σ / µ) × 100 = 23.3%
Relative to its mean price, stock B varies more, so it is the riskier stock.

Measures of Position
Indicate how a particular value fits in with all the other data values.
Commonly used measures of position are:
• Percentiles
• Quartiles
• Z-scores

TO FIND THE LOCATION OF THE Pth PERCENTILE:
• Determine n ∙ P / 100 and use one of the following two location rules:
• Location rule 1. If n ∙ P / 100 is NOT a counting number, round up, and the Pth percentile will be the value in this position of the ordered data.
• Location rule 2. If n ∙ P / 100 is a counting number, the Pth percentile is the average of the value in this position (of the ordered data) and the value in the next position.

Use the two rules of percentiles and the following data to determine both the 85th and the 50th percentile for starting salary.
Starting Salary Data: 3130, 2940, 2920, 2710, 2850, 2880, 2755, 3050, 2880, 3325, 2950, 2890

Step 1: Arrange the data in ascending order:
2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325

Step 2: Use the formula for percentiles, n ∙ P / 100, and identify the 85th percentile given 12 observations:
i = n (P / 100) = 12 (85 / 100) = 10.2
Because i is not an integer, round up. The position of the 85th percentile is the next integer greater than 10.2, the 11th position.
From the ordered data, the 85th percentile is the value in the 11th position, or $3130.

To calculate the 50th percentile, apply step 2:
• n ∙ P / 100
• i = 12 (50 / 100) = 6
• Because i is an integer, the 50th percentile is the average of the sixth and seventh values: (2890 + 2920) / 2 = 2905.

Quartiles
• Quartiles are merely particular percentiles that divide the data into quarters:
• Q1 = 1st quartile = 25th percentile (P25)
• Q2 = 2nd quartile = 50th percentile (P50)
• Q3 = 3rd quartile = 75th percentile (P75)
• Quartiles are used as benchmarks, much like the use of A, B, C, D, and F on exam grades.

Z-Scores
A z-score determines the relative position of any particular data value x, and is expressed in terms of the number of standard deviations above or below the mean.
z = (x - x-bar) / s for a sample, or z = (x - µ) / σ for a population.

Measures of Shape
Measures of shape address skewness and kurtosis.

Skewness
• Symmetric: mean = median
• Right-skewed (positive): mean > median
• Left-skewed (negative): mean < median

Closing Example: The numbers of defects in 10 rolls of carpet are: 3, 2, 6, 0, 1, 3, 2, 1, 0, 4
• What are the 75th percentile and the 50th percentile?
• What are the mean, standard deviation, and coefficient of variation?
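The two percentile location rules and the CV formula can be sketched in a few lines of Python, which may help when checking the closing example. The function names `percentile` and `cv` are hypothetical, `p` is assumed to be a whole number strictly between 0 and 100, and the carpet data are assumed to be a sample (so the n - 1 standard deviation is used), whereas the stock example in the text uses the population standard deviation.

```python
from statistics import mean, stdev, pstdev

def percentile(data, p):
    """Pth percentile using the two location rules from the text.

    Assumes p is a whole number with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    # n * p / 100 split into whole part and remainder (avoids float rounding)
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder != 0:
        # Rule 1: not a counting number, so round up to position whole + 1.
        return ordered[whole]          # 0-based index of position whole + 1
    # Rule 2: average the values in positions whole and whole + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

def cv(data, sd=stdev):
    """Coefficient of variation in percent; sd picks sample or population SD."""
    return sd(data) / mean(data) * 100

salaries = [3130, 2940, 2920, 2710, 2850, 2880,
            2755, 3050, 2880, 3325, 2950, 2890]
defects = [3, 2, 6, 0, 1, 3, 2, 1, 0, 4]
stock_a = [57, 68, 64, 71, 62]
stock_b = [12, 17, 8, 15, 13]

print("salary P85:", percentile(salaries, 85))
print("salary P50:", percentile(salaries, 50))
print("defects P75:", percentile(defects, 75))
print("defects P50:", percentile(defects, 50))
print("defects mean:", mean(defects))
print("defects s:", round(stdev(defects), 3))
print("defects CV (%):", round(cv(defects), 1))
print("stock A CV (%):", round(cv(stock_a, pstdev), 1))
print("stock B CV (%):", round(cv(stock_b, pstdev), 1))
```

Other percentile conventions exist (spreadsheet and library functions often interpolate between positions), so results from this sketch may differ slightly from software output even on the same data.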