Statistics
CSCI 115
1/21/2002

The Fields of Probability and Statistics
Probability
Statistics
  Descriptive statistics
  Inferential statistics

Probability and Statistics
Probability: We determine the chances of selecting a certain sample from a known population.
Statistics: We make estimates or projections about a whole population based on a sample.

Example of Probability
Suppose that there are 13 women and 7 men taking English 101 section 3. A student is picked at random to give a presentation. The probability that the student is a woman is 13/20 and the probability that the student is a man is 7/20.

Example of Statistics
While walking across campus we observe 13 women and 7 men. We conclude that about 65% (13/20) of all students at PLU are female and 35% (7/20) are male.

Two Branches of Statistics
Descriptive statistics
Inferential statistics

Descriptive Statistics
The methods or techniques designed to summarize or describe the main features of numerical data.

Inferential Statistics
The methods and techniques whereby estimates of a general nature are made on the basis of knowledge about a part, or sample, of the general population.

Example of Descriptive Statistics
We determine the age of all students in this class and find that the average age is 20.8 years, the minimum is 18, and the maximum is 42.

Example of Inferential Statistics
The Mooring Mast calls 100 PLU students at random and asks them whether they are satisfied with the current student government. 48 say yes, 30 say no, and 22 are not sure. They conclude that 48% of all students are satisfied, 30% are not, and 22% are not sure.

Example of Descriptive Statistics
The food service interviews every student with a meal plan and determines that 39% are very satisfied with their service.

Example of Inferential Statistics
The food service picks 50 students with a meal plan at random and interviews them.
They determine that 34% of the students in the sample were completely satisfied. They infer that 34% of all the students on a meal plan are completely satisfied.

Other Examples of Descriptive Statistics
An instructor determines that the average exam score was 82 and that there were 5 A’s, 6 B’s, 10 C’s, 2 D’s, and 1 E.
After an election, it is determined that the winner received 54% of the vote.
To test a new drug, it is given to 200 patients. It is found that the new drug helps 82% of those patients.

Other Examples of Inferential Statistics
Before giving a new SAT test, the writers give the exam to 2000 students to help standardize the exam.
Before an election, a pollster conducts a survey and predicts that a certain candidate will win the election with 54% of the vote.
Based on an experiment where a new drug helped 80% of the patients in a sample of 200, it is decided the drug may be useful.

Some Descriptive Statistics
Count: The number of values
Mean: The sum of the values divided by the number of values
Mode: The value(s) that occur most frequently
Median: The middle value. Half the values are larger, half are smaller. (If the number of values is even, it is the average of the two middle values.)

Example 1: 10-point quiz scores: 8, 3, 8, 9, 6
Count: 5
Mean: (8 + 3 + 8 + 9 + 6)/5 = 34/5 = 6.8
Mode: 8 occurs twice and the other values only once, so the mode is 8
Median: Order the values: 3, 6, 8, 8, 9, so the median is 8

Example 2: 10-point quiz scores: 8, 3, 9, 6
Count: 4
Mean: (8 + 3 + 9 + 6)/4 = 26/4 = 6.5
Mode: No value occurs more than once, so there is no mode
Median: Order the values: 3, 6, 8, 9. Median = (6 + 8)/2 = 7

Three ways to determine the average
Mean
Mode
Median

Why all the different ways of calculating “Average”?
Example: The assessed values in a certain neighborhood are as follows:

No. of homes    Value
1               $2,000,000
2               $200,000
3               $150,000
4               $100,000

Count: 10
Mean: $3,250,000/10 = $325,000
Mode: $100,000
Median: $150,000

Why all the different ways of calculating “Average”? (con’t)
Questions:
In determining the amount of tax collected from the neighborhood, which average is most meaningful to the tax collector?
In trying to determine what a home buyer is likely to pay, which is most important?
What is the cost of the most common home in the area?

Calculation of weighted averages
Using the assessed home value example:

Count    Value         Product
1        $2,000,000    $2,000,000
2        $200,000      $400,000
3        $150,000      $450,000
4        $100,000      $400,000
10                     $3,250,000

Average is $3,250,000/10 = $325,000

Some Other Useful Statistics
Max: Largest value
Min: Smallest value
Range: Max - Min

Ordering and arranging data
Sometimes ordering data in ranges is useful.

Data values:
81 74 89 85 79 76 72 84 68 66 95 78 69 54 81 72 79 69 63 61

Data ordered and arranged in ranges of 10:
50-59     54
60-69     61 63 66 68 69 69
70-79     72 72 74 76 78 79 79
80-89     81 81 84 85 89
90-100    95

Histograms
Column charts showing the number of times a value or range of values appears.

Histogram
Data values:
81 74 89 85 79 76 72 84 68 66 95 78 69 54 81 72 79 69 63 61

Frequency Table
Range     Count
50-59     1
60-69     6
70-79     7
80-89     5
90-100    1

[Figure: Histogram of test scores. Horizontal axis: Score, in ranges 50-59 through 90-100. Vertical axis: Number in range.]

Histogram with 5 point range
Data values:
81 74 89 85 79 76 72 84 68 66 95 78 69 54 81 72 79 69 63 61

Frequency Table
Range     Count
50-54     1
55-59     0
60-64     2
65-69     4
70-74     3
75-79     4
80-84     3
85-89     2
90-94     0
95-100    1

[Figure: Frequencies in ranges of 5. Horizontal axis: Range, from 50-54 through 95-100. Vertical axis: Count.]

Excel and Histograms
Use the FREQUENCY function to help build frequency tables
Use column charts to create histograms from frequency tables
Use Data | Sort or the sort tool buttons to sort data

Comparing Two Sets of Numbers
Set 1: 5 7 3 5 6 6 5 5 5 3 5 4 5 5 5 6 5 5 4
Mean 4.9474    Median 5    Mode 5    Max 7    Min 3    Range 4
Set 2: 3 5 3 7 7 4 6 4 5 3 6 5 5 3 4 5 6 7 6
Mean 4.9474    Median 5    Mode 5    Max 7    Min 3    Range 4
Are these sets essentially the same?

Frequencies
Let’s arrange the values in order:

Value    Set 1          Set 2
3        33             3333
4        44             444
5        55555555555    55555
6        666            6666
7        7              777

Frequency Table
Frequency count of each value:

Value    Set 1    Set 2
3        2        4
4        2        3
5        11       5
6        3        4
7        1        3

Set 1 seems bell shaped, centered about 5.
Set 2 seems to be dispersed about equally.

Histogram
[Figure: Histogram of 2 data sets. Horizontal axis: Values, 3 through 7. Vertical axis: Count. Series: Set 1 and Set 2.]

Histogram
The groups used in histograms may include a single value or several values.
Sometimes grouping several values in ranges may help hide “noise”.

Standard Deviation
A way to measure how close the numbers are to each other.
The standard deviation is the square root of the mean of the squares of the deviations of each number from the mean of the list.
This definition assumes we calculate the standard deviation of the entire population.

Standard Deviation
If the values are $x_1, x_2, x_3, \ldots, x_n$ and the mean is $m$, then the standard deviation is

$$\sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n}}$$

Variance
The variance is the mean of the squared deviations.
The standard deviation is the square root of the variance.

Calculating Standard Deviation - Step 1
Example: Calculate the standard deviation of 4, 8, 3, 6, 9

Values
4
8
3
6
9
sum 30
mean 6

Calculating Standard Deviation - Step 2
Example: Calculate the standard deviation of 4, 8, 3, 6, 9

Values    Deviations
4         -2
8          2
3         -3
6          0
9          3
sum 30
mean 6

Calculating Standard Deviation - Step 3
Example: Calculate the standard deviation of 4, 8, 3, 6, 9

Values    Deviations    Squared Deviations
4         -2            4
8          2            4
3         -3            9
6          0            0
9          3            9
sum 30                  sum 26
mean 6                  mean 5.2 (variance)
                        st. dev. 2.28

Another example
Example: Calculate the standard deviation of 2, 4, 6

Values    Deviations    Squared Deviations
2         -2            4
4          0            0
6          2            4
sum 12                  sum 8
mean 4                  mean 2.667 (variance)
                        st. dev. 1.633

Standard Deviation of a Sample
The initial formulas assumed we calculated the standard deviation of the entire population.
If we calculate the standard deviation from only a sample, we divide by n-1 instead of n.
Dividing by n-1 allows our calculation to be “unbiased”: the resulting sample variance is an unbiased estimate of the population variance.

Standard Deviation of a Sample

$$\sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n - 1}}$$

Standard Deviation of a Sample
Example: Calculate the standard deviation of the sample 4, 8, 3, 6, 9 randomly picked from 2000

Values    Deviations    Squared Deviations
4         -2            4
8          2            4
3         -3            9
6          0            0
9          3            9
sum 30                  sum 26
mean 6                  “mean” 6.5 (variance, 26/4)
                        st. dev. 2.55

Standard Deviation and Excel
Use STDEVP and VARP if the values represent the entire population
Use STDEV and VAR if the values are for a sample from the entire population
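
The population and sample calculations can also be checked outside Excel. The following is a minimal Python sketch, not part of the original slides, that repeats the worked example 4, 8, 3, 6, 9 step by step. The helper names `squared_deviations`, `population_variance`, and `sample_variance` are made up for this illustration. The standard library functions `statistics.pvariance`, `statistics.pstdev`, `statistics.variance`, and `statistics.stdev` are used only as a cross-check; they play roughly the same roles as VARP, STDEVP, VAR, and STDEV.

```python
import math
import statistics

# Worked example from the slides.
values = [4, 8, 3, 6, 9]


def squared_deviations(xs):
    """Return the squared deviation of each value from the mean of xs."""
    m = sum(xs) / len(xs)
    return [(x - m) ** 2 for x in xs]


def population_variance(xs):
    """Mean of the squared deviations: divide by n (entire population)."""
    return sum(squared_deviations(xs)) / len(xs)


def sample_variance(xs):
    """Divide by n - 1 instead of n (values are a sample)."""
    return sum(squared_deviations(xs)) / (len(xs) - 1)


pop_var = population_variance(values)   # 26 / 5
pop_sd = math.sqrt(pop_var)
samp_var = sample_variance(values)      # 26 / 4
samp_sd = math.sqrt(samp_var)

print("population variance:", pop_var, "st. dev.:", round(pop_sd, 2))
print("sample variance:", samp_var, "st. dev.:", round(samp_sd, 2))

# Cross-check the hand-written helpers against the standard library.
assert math.isclose(pop_var, statistics.pvariance(values))
assert math.isclose(pop_sd, statistics.pstdev(values))
assert math.isclose(samp_var, statistics.variance(values))
assert math.isclose(samp_sd, statistics.stdev(values))
```

The only difference between the two helpers is the divisor, which mirrors the choice between STDEVP/VARP and STDEV/VAR described above.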