Methods to take large amounts of data and present them in a concise form
› Want to present the heights of females and males in STA 220
› Could measure everyone and graph the results
› More interested in a single number that describes the most likely representation of the height of the students in the class. This is called a measure of centrality.

Once you have your measure of centrality, you may want or need to know: is the data repeatable?
› This would be a measure of variability.

Three common measures of centrality:
› Mean
› Median
› Mode

Mean
› Mathematical average of all the data
  $$\text{Mean} = \frac{\text{sum of the data}}{\text{sample size}}$$

Example
› Suppose Suzy is taking Chemistry. There is a lab quiz every other week. Near the end of the semester, Suzy wants to determine her quiz average. Her quiz scores are: 78, 92, 83, 95, 98, 87 and 93.
  $$\text{Mean} = \frac{78 + 92 + 83 + 95 + 98 + 87 + 93}{7} = \frac{626}{7} \approx 89.43$$

Mathematical shorthand:
› Data points are often referred to as x_i, where i runs from 1 to n, n being the number of data points (the sample size).
› For Suzy's quiz scores, n = 7 and x1 = 78, x2 = 92, x3 = 83, x4 = 95, x5 = 98, x6 = 87, and x7 = 93.
› The mean is denoted by $\bar{x}$, called x-bar:
  $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
  For Suzy's quizzes, $\bar{x} = \frac{1}{7}\sum_{i=1}^{7} x_i \approx 89.43$.

The median is the middle value of the dataset, such that half of all data points are less than or equal to that value AND half of all data points are greater than or equal to that value.

To find the median:
1. Rearrange the data from smallest to largest.
2. If n is odd, calculate (n + 1)/2.
3. If n is even, calculate n/2.
4. Count through the sorted data set until you reach the data point in the position you calculated in step 2 or 3.
5. If the number of data points, n, is odd, you are done. If n is even, compute the mean of the data points in the (n/2)th position and the (n/2 + 1)th position.

Example
› Given the following salary information from a group of engineers, determine the median salary: $75,400; $83,600; $45,700; $43,900; $62,100; $90,500; $55,800.
› First, reorder the data in increasing order: 43,900; 45,700; 55,800; 62,100; 75,400; 83,600; 90,500
› Since n = 7 is odd, compute (n + 1)/2 = (7 + 1)/2 = 4
› The 4th data point is the median: 43,900; 45,700; 55,800; [62,100]; 75,400; 83,600; 90,500. The median salary is $62,100.

Example
› A group of students are taking the following numbers of credit hours: 12, 17, 15, 14, 9, 16, 18, 16, 14, 12. Find the median number of credit hours being taken by this group of students.
› Put the data in increasing order: 9, 12, 12, 14, 14, 15, 16, 16, 17, 18
› Since n = 10 is even, compute n/2 = 10/2 = 5
› Next, identify the data points in the fifth and sixth positions: 9, 12, 12, 14, [14], [15], 16, 16, 17, 18
› Compute the mean of the fifth and sixth data points: (14 + 15)/2 = 14.5

The mode is the number that appears most often in the data set.
Example: Here are the numbers of cavities found in a class of 1st graders:
› 0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1. Find the mode.
› 0 occurs 7 times, 1 occurs 4 times, 2, 3, 4, and 5 each occur twice, and 7 occurs once. As 0 occurs most often, it is the mode.

Comparing Mean, Median, and Mode
› Mean
  Strong points: Uses all of the data.
  Weak points: Sensitive to extremes. Test scores 34, 92, 95, 94, 89 have a mean of 80.8. If the professor dropped the lowest test score, 34, the mean would be 92.5. The mean also may not be an actual, observable value. For example, the average family has 1.6 children. What does it mean to have 0.6 of a child?
› Median and Mode
  Strong points: Not sensitive to extremes. In the test score example from before, the median would be 92 (sorted: 34, 89, 92, 94, 95). The mode is an observable value; the median is an observable value when n is odd (for even n it can fall between two data points, as with the 14.5 credit hours above).
  Weak points: The mode may not be unique: several values can tie for appearing most often. Neither uses all of the actual data values. The mode keys in on frequency, while the median only looks at the middle of the data set.

In 1995, the mean salary of an MLB player was $1,080,000, while the median salary of an MLB player was $275,000.
› Recall that the median is the point where half of the data points are above and half are below. Thus, at least half of the players in the MLB earned $275,000 or less.
› A mean of $1,080,000 tells you that there are players earning millions of dollars, but it is not representative of most players in the MLB.

The Corps of Engineers wants to dredge a harbor in Hackensack, NJ. The EPA has these guidelines for harbor dredging:
› The sediment is tested for the presence of PCBs.
› If PCBs < 25 parts per billion (ppb), then it's OK to dredge and dump.
› If 25 ppb ≤ PCBs < 50 ppb, then it's OK to dredge and dump, but a cap must be placed on the dump pile.
› If PCBs ≥ 50 ppb, then the harbor cannot be dredged and dumped.

6 samples are taken, and the average PCB level was 46.5 ppb. The Corps of Engineers should be allowed to dredge and dump the harbor, then cap the dump site…or should they? The actual samples were: 66, 74, 81, 55, 1, 2.
› The average is (66 + 74 + 81 + 55 + 1 + 2)/6 = 279/6 = 46.5 ppb
› The median is (55 + 66)/2 = 60.5 ppb (sorted: 1, 2, 55, 66, 74, 81), which is above the 50 ppb limit even though the mean is below it.

Measures of variability describe the spread of the data.
All measures of variability are greater than or equal to zero.
› Measures close to zero indicate that the data is highly consistent and repeatable.
4 measures of variability: range, average deviation, variance, and standard deviation.

Range
› Difference between the largest data point in the dataset and the smallest data point in the dataset
› Range = (largest value) − (smallest value), or $\text{Range} = x_{\max} - x_{\min}$
Example
› Suppose the daily low temperatures for the past week have been -3, -7, -2, 0, 2, 4. What is the range?
› Range = 4 − (−7) = 11

Average Deviation
› The average deviation of the data from its mean value.
› There are 4 steps:
  1. Compute the mean of the data set, x-bar.
  2. Calculate the absolute value of the difference between each data point, x_i, and the mean value, x-bar.
  3. Add up all of the values calculated in step 2.
  4. Divide the sum from step 3 by the number of data points, n.

Average Deviation, Example
› Suppose you have the following four data points in your dataset: 1, 2, 4, 5. Find the average deviation.
  1. x-bar = (1 + 2 + 4 + 5)/4 = 3
  2. |1 − 3| = 2; |2 − 3| = 1; |4 − 3| = 1; |5 − 3| = 2
  3. 2 + 1 + 1 + 2 = 6
  4. Divide the sum by n = 4:
     Average deviation = 6/4 = 1.5

Average Deviation
› In mathematical shorthand, the average deviation can be expressed as:
  $$\text{Average Deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
› A good method is to make a table:

  x_i                 |x_i − x-bar|    Result
  1                   |1 − 3|          2
  2                   |2 − 3|          1
  4                   |4 − 3|          1
  5                   |5 − 3|          2
  x-bar = 12/4 = 3                     6/4 = 1.5

Variance
› Similar to average deviation:
  1. Compute the mean of the dataset, x-bar.
  2. Calculate the difference between each data point, x_i, and the mean value, x-bar.
  3. Square all of the values in step 2.
  4. Add up all the values in step 3.
  5. Divide the sum in step 4 by the total number of data points.

Variance, Example
› A good idea is to make a table similar to the one we used for average deviation:

  x_i                 x_i − x-bar      (x_i − x-bar)^2
  1                   1 − 3 = −2       4
  2                   2 − 3 = −1       1
  4                   4 − 3 = 1        1
  5                   5 − 3 = 2        4
  x-bar = 12/4 = 3                     Variance = 10/4 = 2.5

Variance
› Mathematical shorthand:
  $$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard Deviation
› The standard deviation is just the square root of the variance.
› By taking the square root, the units of the standard deviation are the same as the original units of the data.
› In the previous example:
  $$\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{2.5} \approx 1.58 \text{ inches}$$
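The hand calculations in these slides can be checked mechanically. Below is a minimal Python sketch (not part of the original slides) that implements each measure by following the numbered steps and reruns the worked examples: Suzy's quizzes, the engineers' salaries, the credit hours, the cavities, the PCB samples, the low temperatures, and the 1, 2, 4, 5 dataset. The function names (`mean`, `median`, `modes`, `data_range`, `average_deviation`, `variance`, `standard_deviation`) are hypothetical, chosen for illustration. Variance divides by n, as in the variance steps above.

```python
# A sketch of the measures of centrality and variability from the slides.
# Hypothetical helper names chosen for this example; not from the slides.
import math
from collections import Counter


def mean(data):
    """Sum of the data divided by the sample size n."""
    return sum(data) / len(data)


def median(data):
    """Middle value, following the sort-then-count steps."""
    xs = sorted(data)                  # step 1: smallest to largest
    n = len(xs)
    if n % 2 == 1:                     # step 2: n odd -> position (n + 1)/2
        return xs[(n + 1) // 2 - 1]    # positions are 1-based, lists are 0-based
    k = n // 2                         # step 3: n even -> position n/2
    return (xs[k - 1] + xs[k]) / 2     # step 5: mean of positions n/2 and n/2 + 1


def modes(data):
    """Every value tied for the highest frequency (the mode may not be unique)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def data_range(data):
    """Largest data point minus smallest data point."""
    return max(data) - min(data)


def average_deviation(data):
    """Mean of |x_i - x-bar|."""
    xbar = mean(data)
    return sum(abs(x - xbar) for x in data) / len(data)


def variance(data):
    """Mean of (x_i - x-bar)^2, dividing by n as in the slides."""
    xbar = mean(data)
    return sum((x - xbar) ** 2 for x in data) / len(data)


def standard_deviation(data):
    """Square root of the variance; same units as the data."""
    return math.sqrt(variance(data))


if __name__ == "__main__":
    quizzes = [78, 92, 83, 95, 98, 87, 93]
    salaries = [75400, 83600, 45700, 43900, 62100, 90500, 55800]
    credit_hours = [12, 17, 15, 14, 9, 16, 18, 16, 14, 12]
    cavities = [0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1]
    pcbs = [66, 74, 81, 55, 1, 2]
    lows = [-3, -7, -2, 0, 2, 4]
    four_points = [1, 2, 4, 5]

    print(round(mean(quizzes), 2))                    # 89.43
    print(median(salaries))                           # 62100
    print(median(credit_hours))                       # 14.5
    print(modes(cavities))                            # [0]
    print(mean(pcbs), median(pcbs))                   # 46.5 60.5
    print(data_range(lows))                           # 11
    print(average_deviation(four_points))             # 1.5
    print(variance(four_points))                      # 2.5
    print(round(standard_deviation(four_points), 2))  # 1.58
```

`modes` returns a list rather than a single number because, as noted in the comparison of mean, median, and mode, several values can tie for most frequent. For a cross-check, Python's standard `statistics` module provides `mean`, `median`, `multimode`, `pvariance`, and `pstdev`, which divide by n like the slides; its `variance` and `stdev` divide by n − 1 instead, so they would give 10/3 ≈ 3.33 rather than 2.5 for the 1, 2, 4, 5 example.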