Spring 02

Central Tendency and Dispersion
Ungrouped Data -- population

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Central Tendency and Dispersion
Ungrouped Data -- sample

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Here N is the number of observations in the population and n is the number of observations in the sample.

Central Tendency and Dispersion
Other measures of central tendency
  Median
  Mode
Relating central tendency and dispersion
  Coefficient of variation: $V = \sigma / \mu$

Problem
Salvatore 2.1 – quiz scores – ungrouped data

                                      Row sum
   7   5   6   2   8   7   6   7        48
   3   9  10   4   5   5   4   6        46
   7   4   8   2   3   5   6   7        42
   9   8   2   4   7   9   4   6        49
   7   8   3   6   7   9  10   5        55
                               Total   240

Problem
Salvatore 2.1 – quiz scores – grouped data (scores sorted in ascending order)

   2   2   2   3   3   3   4   4   4   4
   4   5   5   5   5   5   6   6   6   6
   6   6   7   7   7   7   7   7   7   7
   8   8   8   8   9   9   9   9  10  10

Problem
Salvatore 2.1 – quiz scores – grouped data

    X     f    f*X
    2     3      6
    3     3      9
    4     5     20
    5     5     25
    6     6     36
    7     8     56
    8     4     32
    9     4     36
   10     2     20
  Sum          240

Central Tendency and Dispersion
Grouped Data -- population

$$\mu = \frac{\sum_i f_i X_i}{N} \qquad \sigma^2 = \frac{\sum_i f_i (X_i - \mu)^2}{N}$$

Central Tendency and Dispersion
Grouped Data -- sample

$$\bar{X} = \frac{\sum_i f_i X_i}{n} \qquad s^2 = \frac{\sum_i f_i (X_i - \bar{X})^2}{n - 1}$$

In the grouped formulas the sum runs over the distinct values X_i, f_i is the frequency of X_i, and the number of observations is the sum of the f_i.

Problem
Salvatore 2.1 – quiz scores – grouped data (Xmean = 240/40 = 6)

    X     f    X-Xmean    (X-Xmean)^2    f*(X-Xmean)^2
    2     3      -4           16              48
    3     3      -3            9              27
    4     5      -2            4              20
    5     5      -1            1               5
    6     6       0            0               0
    7     8       1            1               8
    8     4       2            4              16
    9     4       3            9              36
   10     2       4           16              32

  Sum        192
  Variance   4.8    (192/40, the population formula)
  Stdev      2.19

Probability
If event A can occur in nA ways out of a total of N possible and equally likely outcomes, the probability that event A will occur is given by:
    P(A) = nA/N
  What is the probability of flipping a fair coin and getting a head? A tail?
  What is the probability of rolling a four with a fair die?
  What is the probability that a woman who is pregnant will have a boy?

Probability of Multiple Events
Probability of either of two events happening
  Mutually exclusive (one event precludes the occurrence of the other)
    What is the probability that when rolling two dice, I roll a seven or eleven?
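The seven-or-eleven question can be checked by brute-force enumeration using the counting definition P(A) = nA/N. The sketch below is only an illustration; the helper name `prob` is hypothetical and not part of the slides. A single roll cannot total both seven and eleven, so the two events are mutually exclusive and their counts simply add.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(A) = nA / N: count favorable outcomes among equally likely ones.

    `prob` is a hypothetical helper introduced for this illustration.
    """
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_seven = prob(lambda o: sum(o) == 7)          # 6 of the 36 outcomes
p_eleven = prob(lambda o: sum(o) == 11)        # 2 of the 36 outcomes
p_either = prob(lambda o: sum(o) in (7, 11))   # 8 of the 36 outcomes

print(p_seven, p_eleven, p_either)
```

Run as written, this should print `1/6 1/18 2/9`, and the third value equals the sum of the first two.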
    P(A or B) = P(A) + P(B)
  Not mutually exclusive (one event does not preclude the occurrence of the other)
    What is the probability that a flipped coin is heads or a rolled die is a 4?
    P(A or B) = P(A) + P(B) – P(A and B)

Probability of Multiple Events
Probability of two events happening at the same time
  Independent events (event A is not at all connected to event B)
    What is the probability that if I flip a coin twice, I will get heads both times?
    P(A and B) = P(A)*P(B)
  Dependent events (the occurrence of event A is connected in some way to the occurrence of event B)
    What is the probability that, drawing one card from a deck of 52 playing cards, I draw the ace of hearts?
    P(A and B) = P(A)*P(B/A)

Binomial Probability
Discrete vs. continuous distributions
The binomial distribution gives the probability of X occurrences or successes of an event, P(X), in n trials of the same experiment when
  1. There are only two possible and mutually exclusive outcomes.
  2. The n trials are independent.
  3. The probability of success remains constant in each trial.

Binomial Probability
The probability of X successes is given by:

$$P(X) = \frac{n!}{X!\,(n - X)!}\, p^{X} (1 - p)^{n - X}$$

Binomial Probability
What is the probability that in a family of four children, there are four boys?
    P(Boy and Boy and Boy and Boy) = ½ * ½ * ½ * ½ = 1/16 = .0625
According to the binomial formula:

$$P(X = 4) = \frac{4!}{4!\,0!}\,(.5)^{4} (.5)^{0} = .5^{4} = .0625$$

Normal Distribution
A continuous random variable is one that can assume an infinite number of values within any given interval.
The probability that X falls within any interval is given by the area under the probability distribution.
The normal distribution (bell shaped and symmetric) is the most commonly used.

Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of one.
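The area under the standard normal curve within k standard deviations of zero can be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `central_area` is hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def central_area(k):
    """Area under the standard normal curve between -k and +k.

    `central_area` is a hypothetical helper, not part of the slides.
    """
    return z.cdf(k) - z.cdf(-k)

areas = {k: central_area(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"P(-{k} < z < +{k}) = {area:.4f}")
```

The three areas should come out to roughly 0.6827, 0.9545, and 0.9973.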
    P(-1 < z < +1) ≈ 68%
    P(-2 < z < +2) ≈ 95%
    P(-3 < z < +3) ≈ 99.7%
http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
http://www.math.csusb.edu/faculty/stanton/m262/normal_distribution/normal_distribution.html

Problems
If grades are normally distributed with a mean of 75 and a variance of 25, what is the probability that a student's grade will fall between 80 and 90?
If the demand for ice cream cones in January is normally distributed with a mean of 100 cones/week and a standard deviation of 15, what is the probability that demand is less than 90? If the owner only wishes to turn customers away less than 5% of the time, how many cones should she be prepared to make?

Statistical Inference
Refers to estimation and hypothesis testing.
Estimation is the process of inferring or estimating a population parameter (μ, σ) from the corresponding statistic drawn from a sample.

Sampling Distribution of the Mean

$$\mu_{\bar{X}} = \mu$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Central Limit Theorem
  As n approaches infinity, the sampling distribution of the sample mean approaches the normal distribution regardless of the distribution of the original population.

Interval Estimates/Confidence Intervals
If n > 30 and n > .05N

$$\bar{X} \pm z\, \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

If n > 30 and n < .05N

$$\bar{X} \pm z\, \frac{s}{\sqrt{n}}$$

Interval Estimates/Confidence Intervals
If n < 30 and population is normally distributed

$$\bar{X} \pm t\, \frac{s}{\sqrt{n}}$$

If n < 30 and population is not normally distributed
  The probability of observations falling within k standard deviations of the mean is at least $1 - 1/k^2$, given that k ≥ 1 (Chebyshev's inequality).

Problems
A random sample of 49 with a mean of 80 and a standard deviation of 42 is taken from a population of 1000. Find the interval estimate for the population mean such that we are 80 percent confident that it includes the population mean.
A random sample of 64 with a mean of 50 and a standard deviation of 20 is taken from a population of 800.
Find the 90 percent confidence interval.

Problems
A random sample of 9 lighting components with a mean of 9 months and a standard deviation of 4 months is taken from a population which is known to be normally distributed.
  What is the 90% CI of the population mean?
  What is the 95% CI of the population mean?
  What is the 99% CI of the population mean?
  If n = 25, what is the 90% CI of the population mean?

Testing Hypothesis
Formal steps
  Set null and alternate hypotheses
    Ho: μ = μ0
    Ha: μ ≠ μ0
  Set level of significance (α)
  Take a random sample; compute the sample mean; test the null hypothesis.

Errors
Type I Error: rejecting a true hypothesis
  Probability of a Type I Error is α, the level of significance
  Level of confidence is 1 - α
Type II Error: accepting a false hypothesis
  Probability of a Type II Error is β
  For a given sample size, β can only be reduced at the expense of α

Errors

                  Accept Ho        Reject Ho
  Ho is True      Correct          Type I Error
  Ho is False     Type II Error    Correct

Problems
USP specifies that a certain drug be effective for at least 37 hours. The standard deviation is known to be 11 hours. A shipment of this drug will be accepted or rejected on the basis of a random sample of 100.
  What decision rule should be used if the maximum probability of erroneously rejecting the shipment is to be 10% (i.e., α = .10)?

Problems
Over the past 10 years, the Snow Mountain Ski Resort has averaged 120 skiers/day during the winter season (130 days) with a standard deviation of 10. In a random sample of 50 days during the most recent ski season, the mean number of skiers was 118/day.
  Assuming α = 0.05, would you conclude that the average number of skiers per day has changed?
  If the decision rule were as shown below, what would be the level of significance?
    $117 < \bar{X} < 123$

Differences between Means
Many times one is faced with determining whether the means of two populations are the same.
Take a random sample of each population.
You will accept the hypothesis that the means are equal only if the difference can be attributed to chance.

Differences between Means
If the two populations are normally distributed (or if n1 and n2 ≥ 30), then the sampling distribution of the difference between means is also normal with the standard error:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Difference between Means
One can test for a difference between means as follows
  Null and Alternate Hypothesis
    Ho: μ1 = μ2
    Ha: μ1 ≠ μ2
  Test statistic is (the second form holds under Ho, where μ1 - μ2 = 0)

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

Problems
The Dairy Fresh Milk Company felt that two of its markets exhibited equivalent sales patterns.

  Area    Mean Cons.    s      Sample Size
  1       1500          140    100
  2       1465          120    150

Justify the proposition at the 2% level of significance.

Problems
A random sample of 100 of the entering first-year students at a particular college in 2001 has a mean SAT score of 950 and s = 50. In 2000, a random sample of 100 had a mean SAT score of 975 and s = 58.
  Are the entering first-year students in 2001 academically better than those in 2000? Test at α = .05.

Chi-Squared Test
Goodness of Fit Tests are used to determine
  If observed frequency differs significantly from expected frequency when more than two outcomes are possible
  If sampled distribution is binomial, normal, or other
  If two variables are independent
The sum of squares of n independent standard normal random variables N(0,1) is distributed as Chi-Squared with n degrees of freedom.
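Returning to the Dairy Fresh problem in the difference-between-means material above, the test can be sketched numerically. This is a minimal illustration, not the deck's worked solution: it assumes the sample standard deviations stand in for the population values (both samples are large), treats the test as two-tailed, and uses a hypothetical helper name `two_sample_z`.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for Ho: mu1 = mu2.

    Hypothetical helper; assumes s1 and s2 approximate sigma1 and sigma2,
    which is reasonable only for large samples.
    """
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / std_error

# Dairy Fresh data from the slide: area 1 versus area 2.
z = two_sample_z(1500, 140, 100, 1465, 120, 150)

alpha = 0.02                                   # 2% level of significance
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical z = {z_crit:.3f}, reject Ho: {reject}")
```

Under these assumptions z is about 2.05 against a critical value of about 2.33, so the hypothesis of equal means is not rejected at the 2% level.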

Chi-Squared Test

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$

where
  f0 is the observed frequency
  fe is the expected frequency
  df = c - m - 1
  c = number of categories
  m = number of population parameters estimated from the sample

Test of Proportions
A survey was undertaken in 1994, 1997, and 2001 to determine the number of women with children in school who were working:

                  1994    1997    2001
  Working         410     412     409
  Not Working     252     176     151

On the basis of this study, should one reject the hypothesis that the proportion of women who worked has remained constant during the study? α = 0.05

Tests of Goodness of Fit
Suppose the Department of Defense believes that the probability distribution of the number of submarine parts of a certain type that will fail during a mission is as shown below, and the data for 500 missions are observed as follows:

  Number of Failures    Theoretical Probability    Observed Frequency
  0                     .368                       190
  1                     .368                       180
  2                     .184                        90
  3                     .061                        30
  4 or more             .019                        10

Do the data follow the expected distribution? Test at α = 0.05.

Analysis of Variance
Used to test the hypothesis that the means of more than two populations are equal or different when the populations are normally distributed with equal variance.

Analysis of Variance
Estimate the population variance from the variance between the sample means (MSA)
Estimate the population variance from the variance within the samples (MSE)
Compute the F ratio:
    F = MSA/MSE
If F > Fcrit, reject the null hypothesis
If F < Fcrit, accept the null hypothesis

Analysis of Variance

  Source of Variation    Sum of Squares    Df        MS                     F
  Between groups         SSA               c-1       MSA = SSA/(c-1)        MSA/MSE
  Within groups          SSE               (r-1)c    MSE = SSE/((r-1)c)
  Total                  SST               rc-1

$$SSA = r \sum_{j} (\bar{X}_j - \bar{\bar{X}})^2 \qquad SSE = \sum_{i} \sum_{j} (X_{ij} - \bar{X}_j)^2 \qquad SST = \sum_{i} \sum_{j} (X_{ij} - \bar{\bar{X}})^2 = SSA + SSE$$

Here c is the number of samples (groups) and r is the number of observations in each sample.

ANOVA Problem
A thread manufacturer wants to determine whether the mean strength of thread produced by 3 different types of machines differs when raw material A is used on each machine.
Four pieces of thread are produced on each machine, with the following results:

    I     II    III
    50    41    49
    51    40    47
    51    39    45
    52    40    47

Test whether the mean strength of thread is equal at α = .05.
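The thread problem can be worked through with the one-way ANOVA quantities defined earlier (SSA, SSE, MSA, MSE, F). The sketch below is an illustration under two assumptions: the twelve readings are grouped four per machine in the order given (I: 50, 51, 51, 52; II: 41, 40, 39, 40; III: 49, 47, 45, 47), and the critical value 4.26 for F with (2, 9) degrees of freedom at α = .05 is taken from a standard F table rather than computed.

```python
from statistics import mean

# Assumed grouping of the slide's twelve readings: four pieces per machine.
groups = {
    "I": [50, 51, 51, 52],
    "II": [41, 40, 39, 40],
    "III": [49, 47, 45, 47],
}

c = len(groups)          # number of samples (machines)
r = len(groups["I"])     # observations per sample

grand_mean = mean(x for g in groups.values() for x in g)

# Between-group and within-group sums of squares.
ssa = sum(r * (mean(g) - grand_mean) ** 2 for g in groups.values())
sse = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

msa = ssa / (c - 1)          # df = c - 1
mse = sse / ((r - 1) * c)    # df = (r - 1)c
f_ratio = msa / mse

F_CRIT = 4.26  # assumed table value for F(2, 9) at alpha = .05
reject = f_ratio > F_CRIT

print(f"SSA = {ssa}, SSE = {sse}, F = {f_ratio:.2f}, reject Ho: {reject}")
```

With this grouping SSA is 248, SSE is 12, and F is 93, far above the assumed critical value, so the hypothesis of equal mean strength would be rejected.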