Statistics
CSCE 115
Revised: 10/27/04, 3/30/05, 10/3/05

Reference
  Mathematics and the Modern World by Mario F. Triola, Copyright 1973 by Cummings Publishing Company, Inc.

Some on-line references
  http://www.socialresearchmethods.net/kb/statdesc.htm
  http://www.tcd.ie/Economics/staff/fotoole/JF%20EC1030/Chapter%203.htm
  http://courses.umass.edu/resec312/labs/Lab%201%20%20Spring%202005.pdf
  http://www.it.bond.edu.au/ianl111/Lectures/week1.ppt (slides 1-23) (May load slowly or not at all)
  http://classweb.gmu.edu/mgabel/unit1_2001/lab_1_details.htm
  http://www.physics.csbsju.edu/stats/descriptive2.html
  http://www.socialresearchmethods.net/kb/statinf.htm

History of Statistics
  Founded by John Graunt (1620-1674)
  Studied “Bills of Mortality”
  Published “Natural and Political Observations Made upon the Bills of Mortality”

Graunt’s Observations
  More males are born than females (“thirteenth part”)
  More men die violent deaths than women
  The number of adult males approximately equals the number of adult females
  Few people starve

Other Leaders in Statistics
  1692 - Edmund Halley wrote “An Estimate of the Degrees of the Mortality of Mankind, drawn from curious Tables of the Births and Funerals at the City of Breslaw; with an Attempt to ascertain the Price of Annuities upon Lives” and “Some further Considerations on the Breslaw Bills of Mortality”

Other Leaders in Statistics (cont’d)
  Adolphe Jacques Quetelet planned and interpreted a census in Belgium in 1829
  Gregor Mendel (1822-1884) published a paper on the hybridization of peas relating principles of heredity to mathematics

The Fields of Probability and Statistics
  Probability
  Statistics
    Descriptive statistics
    Inferential statistics

Probability and Statistics
  Probability: We determine the chances of selecting a certain sample from a known population
  Statistics: We make estimates or projections about a whole population based on a sample

Example of Probability
  Suppose that there are 13 women and 7 men taking English 101 section 3. A student is picked at random to give a presentation. The probability that the student is a woman is 13/20 and the probability that the student is a man is 7/20.

Example of Statistics
  While walking across campus we observe 13 women and 7 men. We conclude that about 65% (13/20) of all students at PLU are female and 35% (7/20) are male.

Two Branches of Statistics
  Descriptive statistics
  Inferential statistics

Descriptive Statistics
  The methods or techniques designed to summarize or describe the main features of numerical data

Inferential Statistics
  The methods and techniques whereby estimates of a general nature are made on the basis of knowledge about a part, or sample, of the general population

Example of Descriptive Statistics
  We determine the age of all students in this class and find that the average age is 20.8 years, the minimum is 18 and the maximum is 42. Hence the range in ages is 42 - 18 = 24.

Example of Inferential Statistics
  The Mooring Mast calls 100 PLU students at random and asks them whether they are satisfied with the current student government. 48 say yes, 30 say no and 22 are not sure. They conclude that 48% of all students are satisfied, 30% are not and 22% are not sure.

Example of Descriptive Statistics
  The food service interviews every student with a meal plan and determines that 39% are very satisfied with their service.

Example of Inferential Statistics
  The food service picks 50 students with a meal plan at random and interviews them. They determine that 34% of the students in the sample were completely satisfied. They infer that 34% of all the students on a meal plan are completely satisfied.

Other Examples of Descriptive Statistics
  An instructor determines that the average exam score was 82 and that there were 5 A’s, 6 B’s, 10 C’s, 2 D’s and 1 E.
  After an election, it is determined that the winner received 54% of the vote.
  To test a new drug, it is given to 200 patients. It is found that the new drug helps 82% of those patients.

Other Examples of Inferential Statistics
  Before giving a new SAT test, the writers give the exam to 2000 students to help standardize the exam.
  Before an election, a pollster conducts a survey and predicts that a certain candidate will win the election with 54% of the vote.
  Based on an experiment where a new drug helped 80% of the patients in a sample of 200, it is decided the drug may be useful.

Some Descriptive Statistics
  Count: Number of values
  Mean: The sum of the values divided by the number of values
  Mode: The value(s) that occur most frequently
  Median: The middle value. Half the values are larger, half are smaller. (If the number of values is even, it is the average of the two middle values.)

Example 1: 10 point quiz scores: 8, 3, 8, 9, 6
  Count: 5
  Mean: (8 + 3 + 8 + 9 + 6)/5 = 34/5 = 6.8
  Mode: 8 occurs twice and the other values only once, so the mode is 8
  Median: order the values: 3, 6, 8, 8, 9, so the median is 8

Example 2: 10 point quiz scores: 8, 3, 9, 6
  Count: 4
  Mean: (8 + 3 + 9 + 6)/4 = 26/4 = 6.5
  Mode: No value occurs more than once, so there is no mode
  Median: order the values: 3, 6, 8, 9, so the median is (6 + 8)/2 = 7

Three ways to determine the “average”
  Mean
  Mode
  Median

Why all the different ways of calculating “Average”?
  Example: The assessed values in a certain neighborhood are as follows:
    $2,000,000
    $200,000
    $200,000
    $150,000
    $150,000
    $150,000
    $100,000
    $100,000
    $100,000
    $100,000

    No.   Value
    1     $2,000,000
    2     $200,000
    3     $150,000
    4     $100,000

  Count: 10
  Mean: $3,250,000/10 = $325,000
  Mode: $100,000
  Median: $150,000

Why all the different ways of calculating “Average”? (cont’d)
  Question: In determining the amount of tax collected from the neighborhood, which average is most meaningful to the tax collector?
  In trying to determine what a home buyer is likely to pay, which is most important?
  What is the cost of the most common home in the area?

Calculation of weighted averages
  Use the house assessed value example
    Count   Value        Product
    1       $2,000,000   $2,000,000
    2          200,000      400,000
    3          150,000      450,000
    4          100,000      400,000
    10                   $3,250,000
  Average is $3,250,000/10 = $325,000

Some Other Useful Statistics
  Max: Largest value
  Min: Smallest value
  Range: Max - Min

Ordering and arranging data
  Sometimes ordering data in ranges is useful
  Data values:
    81 74 89 85 79 76 72 84 68 66
    95 78 69 54 81 72 79 69 63 61
  Data ordered and arranged in ranges of 10:
    50-59:   54
    60-69:   61 63 66 68 69 69
    70-79:   72 72 74 76 78 79 79
    80-89:   81 81 84 85 89
    90-100:  95

Histograms
  Column charts showing the number of times a value or range of values appears

Histogram
  Data values:
    81 74 89 85 79 76 72 84 68 66
    95 78 69 54 81 72 79 69 63 61
  Frequency Table
    Range    Count
    50-59    1
    60-69    6
    70-79    7
    80-89    5
    90-100   1
  [Figure: Histogram of test scores. x-axis: Score (50-59 through 90-100); y-axis: Number in range]

Histogram with 5 point range
  Data values:
    81 74 89 85 79 76 72 84 68 66
    95 78 69 54 81 72 79 69 63 61
  Frequency Table
    Range    Count
    50-54    1
    55-59    0
    60-64    2
    65-69    4
    70-74    3
    75-79    4
    80-84    3
    85-89    2
    90-94    0
    95-100   1
  [Figure: Frequencies in ranges of 5. x-axis: Range (50-54 through 95-100); y-axis: Count]

Excel and Histograms
  Use the FREQUENCY function to help build frequency tables
  Use column charts to create histograms from frequency tables
  Use Data | Sort or the sort tool buttons to sort data

Comparing Two Sets of Numbers
  Set 1: 5 7 3 5 6 6 5 5 5 3 5 4 5 5 5 6 5 5 4
    Mean 4.9474 (94/19)   Median 5   Mode 5   Max 7   Min 3   Range 4
  Set 2: 3 5 3 7 7 4 6 4 5 3 6 5 5 3 4 5 6 7 6
    Mean 4.9474 (94/19)   Median 5   Mode 5   Max 7   Min 3   Range 4
  Are these sets essentially the same?
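
The descriptive statistics defined earlier (count, mean, median, mode, range), the weighted average, and the frequency table behind a histogram can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the helper names `modes`, `describe`, `weighted_mean` and `frequency_table` are made up for this example, and the data are the quiz scores, house values and test scores used above.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s); empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode, as in Example 2
    return sorted(v for v, c in counts.items() if c == top)


def describe(values):
    """Summarize a list of numbers with the statistics defined in the slides."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "modes": modes(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def weighted_mean(pairs):
    """Average of (count, value) pairs: sum of products over total count."""
    total = sum(count * value for count, value in pairs)
    n = sum(count for count, _ in pairs)
    return total / n


def frequency_table(values, bins):
    """Count the values that fall in each inclusive (low, high) range."""
    return {f"{lo}-{hi}": sum(lo <= v <= hi for v in values) for lo, hi in bins}


quiz1 = [8, 3, 8, 9, 6]   # Example 1
quiz2 = [8, 3, 9, 6]      # Example 2
houses = [(1, 2_000_000), (2, 200_000), (3, 150_000), (4, 100_000)]
scores = [81, 74, 89, 85, 79, 76, 72, 84, 68, 66,
          95, 78, 69, 54, 81, 72, 79, 69, 63, 61]
bins_of_10 = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]

summary1 = describe(quiz1)
summary2 = describe(quiz2)
house_average = weighted_mean(houses)
table = frequency_table(scores, bins_of_10)

print(summary1)
print(summary2)
print(house_average)
print(table)
```

The frequency table plays the role of Excel's FREQUENCY function here; passing ranges of 5 instead of 10 as `bins` gives the finer table.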

Frequencies
  Let's arrange the values in order
    Set 1          Set 2
    33             3333
    44             444
    55555555555    55555
    666            6666
    7              777

Frequency Table
    Value   Set 1 count   Set 2 count
    3       2             4
    4       2             3
    5       11            5
    6       3             4
    7       1             3
  Set 1 seems bell shaped, centered about 5
  Set 2 seems to be dispersed about equally

Histogram
  [Figure: Histogram of 2 data sets (Set 1 and Set 2). x-axis: Values (3 through 7); y-axis: Count]

Histogram
  The groups used in histograms may include a single value or several values
  Sometimes grouping several values in ranges may help hide “noise”

Standard Deviation
  A way to measure how close the numbers are to each other
  We assume we are using all the values in our calculation
  The standard deviation is the square root of the mean of the squares of the deviations of each number from the mean of the list
  This definition assumes we calculate the standard deviation of the entire population

Standard Deviation
  If the values are x1, x2, x3, ..., xn, and the mean is m, then the standard deviation is
  $$\sigma = \sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n}}$$

Variance
  The variance is the mean of the squared deviations
  The standard deviation is the square root of the variance

Calculating Standard Deviation - Step 1
  Example: Calculate the standard deviation of 4, 8, 3, 6, 9
    Values
    4
    8
    3
    6
    9
    sum 30
    mean 6

Calculating Standard Deviation - Step 2
  Example: Calculate the standard deviation of 4, 8, 3, 6, 9
    Values   Deviations
    4        -2
    8         2
    3        -3
    6         0
    9         3
    sum 30
    mean 6

Calculating Standard Deviation - Step 3
  Example: Calculate the standard deviation of 4, 8, 3, 6, 9
    Values   Deviations   Squared Deviations
    4        -2           4
    8         2           4
    3        -3           9
    6         0           0
    9         3           9
    sum 30                sum 26
    mean 6                mean 5.2 (variance)
                          square root 2.28 (st. dev.)

Another example
  Example: Calculate the standard deviation of 2, 4, 6
    Values   Deviations   Squared Deviations
    2        -2           4
    4         0           0
    6         2           4
    sum 12                sum 8
    mean 4                mean 2.667 (variance)
                          square root 1.633 (st. dev.)

Standard Deviation of a Sample
  The initial formulas assumed we calculated the standard deviation of the entire population
  If we calculate the standard deviation on only a sample, we divide by n-1 instead of n
  Dividing by n-1 allows our calculation to be “unbiased”

Standard Deviation of a Sample
  $$s = \sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n - 1}}$$

Standard Deviation of a Sample
  Example: Calculate the standard deviation of the sample 4, 8, 3, 6, 9 randomly picked from 2000
    Values   Deviations   Squared Deviations
    4        -2           4
    8         2           4
    3        -3           9
    6         0           0
    9         3           9
    sum 30                sum 26
    mean 6                “mean” 6.5 (variance)
                          square root 2.55 (st. dev.)
  The mean of the 5 values divides by 5; the variance divides by 5 - 1 = 4

Standard Deviation and Excel
  Excel provides functions to calculate standard deviation and variance
  Use STDEVP and VARP if the values represent the entire population
  Use STDEV and VAR if the values are for a sample from the entire population
  If you have a calculator with a “st. dev.” button, you will have to check which version it calculates
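
The step-by-step calculation above (deviations, squared deviations, divide by n or by n-1, square root) can be checked with a short Python sketch. The function name `variance_by_hand` is an illustrative choice, not something from the slides. The standard library's `statistics.pstdev` and `statistics.stdev` are used as the population and sample versions, in the same spirit as Excel's STDEVP and STDEV.

```python
from math import sqrt
from statistics import pstdev, stdev


def variance_by_hand(values, sample=False):
    """Mean of the squared deviations; divide by n-1 instead of n for a sample."""
    n = len(values)
    m = sum(values) / n                       # Step 1: the mean
    squared = [(x - m) ** 2 for x in values]  # Steps 2-3: squared deviations
    divisor = n - 1 if sample else n
    return sum(squared) / divisor


values = [4, 8, 3, 6, 9]

pop_var = variance_by_hand(values)                  # 26 / 5
pop_sd = sqrt(pop_var)
sample_var = variance_by_hand(values, sample=True)  # 26 / 4
sample_sd = sqrt(sample_var)

print("population:", round(pop_var, 3), round(pop_sd, 2))
print("sample:    ", round(sample_var, 3), round(sample_sd, 2))

# The library functions should agree with the hand calculation.
print(pstdev(values), stdev(values))

# The two sets compared earlier share mean, median, mode and range,
# but their standard deviations differ.
set1 = [5, 7, 3, 5, 6, 6, 5, 5, 5, 3, 5, 4, 5, 5, 5, 6, 5, 5, 4]
set2 = [3, 5, 3, 7, 7, 4, 6, 4, 5, 3, 6, 5, 5, 3, 4, 5, 6, 7, 6]
sd1, sd2 = pstdev(set1), pstdev(set2)
print(sd1, sd2)
```

Treating the two sets as whole populations (hence `pstdev`) is an assumption made for this sketch; with `stdev` the comparison comes out the same way, since both sets have the same number of values.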