156S05 sample final with solutions

1. The data below represents the ages of 30 randomly selected USC students:

20 18 26 21 24 38 19 24 27 17
23 19 20 19 20 48 22 25 22 25
18 21 18 18 22 42 17 27 21 23

a) Make a stem and leaf plot of this data.

1 |
1 | 778888999
2 | 0001112222344
2 | 55677
3 |
3 | 8
4 | 2
4 | 8

b) Use your calculator's mean and standard deviation functions to find the mean and standard deviation.
Enter the data into a list. Use 1var to find x-bar and s.

c) Give the five number summary (min Q1 median Q3 max) for this data (be sure to compute Q3 and Q1 according to the method described in the text).
With 30 students the median is the average of the 15th and 16th values: 21.5. The quartiles are at position 8 in each half of the data: Q1 = 19 and Q3 = 25. The five number summary is 17, 19, 21.5, 25, 48.

d) Make a boxplot of this data.
[Boxplot on an axis from 17 to 48: box from Q1 = 19 to Q3 = 25 with the median marked at 21.5, whiskers extending to the minimum 17 and the maximum 48.]

e) Which of the values computed above (mean or median) best represents the center of the data set?
Use the median since the data is skewed.

f) Which values (standard deviation or the 5 number summary) could be used to best describe the spread of the data?
5 number summary, since the data is skewed.

g) How would you describe the shape of the data set?
Skewed right.

2. Scores on the SAT follow a normal distribution with mean 500 and standard deviation 100.

a) Use the 68-95-99.7 rule to give a range that includes 68% of all scores.
500 ± 100 or (400, 600)

b) What fraction of the scores are between 480 and 675?
z = (480 - 500)/100 = -.2, p = .4207 and z = (675 - 500)/100 = 1.75, p = .9599
Answer: .9599 - .4207 = .5392

c) How high must a score be to be higher than 75% of all scores?
From the z-table z is about .67 (from the t-table we can find z = .674). Now the value is mu + z sigma = 500 + .674*100 = 567.4

3.
Consider the data below of 9 quiz scores, before and after a 20 minute review session:

student  before  after
1        11      12
2        13      14
3        10      11
4        11      12
5        12      14
6        9       11
7        8       10
8        11      13
9        11      11

         before   after
N        9        9
MEAN     10.667   12.000
STDEV    1.500    1.414
r = .88389   r-sq = .78125

a) Make a scatterplot of this data.
Done in class.

b) Describe the relationship (shape, strength, direction).
Linear, moderate-to-strong (r = .88), positive.

c) Find the equation of the least squares line y-hat = a + bx which would predict after scores from before scores using b = r sy/sx, a = y-bar - b x-bar and the values provided above.
b = .88389*1.414/1.5 = .8332
a = 12 - .8332*10.667 = 3.11
equation: y-hat = 3.11 + .8332 x

d) Plot this line on your graph. Indicate the coordinates of the points you chose to plot the line.
You need to show me (x, y) pairs. Pick x values on each side of the graph, say x = 8 and x = 13, and put them into the equation to get the y's: (8, 9.78) and (13, 13.94). Draw the line through the two points.

e) What does the line predict as an after score if before is 15?
y-hat = 3.11 + .8332*15 = 15.608
Would you be willing to accept this prediction? Why or why not?
I have some doubt about the predicted value. We have no data with x values as big as 15, so we do not know what will happen (this is extrapolation).

f) What percent of the variation in after scores is explained by the linear relationship determined by the line?
r-squared is .78125, so 78.125%.

4. a) Define the terms:
lurking variable
sampling distribution of a statistic
confounding
simple random sample
Look these up. Any term in a box is fair game on the exam.

b) Explain what the difference is between outliers and influential observations in regression.
Outliers are far from the line. Influential observations have extreme x values and, when added or removed, change the position of the regression line. They pull the regression line towards them and may not be far from the line.

5. It is suspected that first time freshmen and transfer students will react differently to a new on-line format for Math 156.
Outline a randomized block experiment that could be performed on 40 freshmen and 40 transfer test subjects, to determine if the new course is more effective at delivering the material than the old one.
Separate the students into freshmen and transfer student groups (the blocks). Within each group, randomly assign half of the students to the on-line class and the other half to the traditional class. Compare performance at the end of term within each group.

Explain what could go wrong if the researchers simply carried out a randomized comparative experiment without blocking.
If one group has better results with the on-line class and the other with traditional teaching, combining the groups could give the apparent result that the two methods produce similar results.

6. Consider the probabilities listed below for the eye color of a randomly chosen USC student:

blue   .19
brown  .58
green  ??
other  .12

a) What must be the green probability?
1 - (.19 + .58 + .12) = .11

b) What is the chance a randomly chosen student doesn't have blue eyes?
1 - .19 = .81

c) Consider the experiment of randomly selecting two USC students. What is the probability that both have blue eyes?
(.19)(.19) = .0361
Neither has blue eyes?
(1 - .19)(1 - .19) = .6561

7. Suppose 60% of USC students work more than 20 hours a week.

a) If you randomly sample 8 students and count the number that work more than 20 hours a week, what is the probability that this number is 5?
Use P(X = k) = nCk p^k (1 - p)^(n - k) with n = 8, k = 5, p = .6 to get .27869
What is the probability that this number is 5 or more?
Here you need to sum .27869 with the corresponding three probabilities coming from k = 6, 7 and 8 in the formula above: .27869 + .20902 + .08958 + .01680 = .5941

b) Compute the mean and standard deviation of the number in your sample of 8 students that work more than 20 hours a week.
mean = np = 8*.6 = 4.8
standard deviation = sqrt( np(1-p) ) = sqrt( 8 * .6 * .4 ) = 1.386

c) What is the approximate probability that in a sample of 60 students, more than 40 work more than 20 hours a week?
Use z = (x - np)/sqrt( np(1 - p) ) with x = 40, n = 60, p = .6 to get z = 1.05 and prob = 1 - .8531 = .1469

8. Weights of bags of Fritos coming out of the factory follow a normal distribution with mean 1.8 ounces and standard deviation .15 ounces.

a) What is the probability a randomly chosen bag weighs over 2 ounces?
z = (2 - 1.8)/.15 = 1.33
probability = 1 - .9082 = .0918

b) Compute the mean and standard deviation of the average weight of three randomly chosen bags.
mean = 1.8
standard deviation = .15/sqrt(3) = .0866

c) What is the probability that the average weight of three randomly chosen bags is above 2 ounces?
z = (2 - 1.8)/.0866 = 2.31
prob = 1 - .9896 = .0104

9. Suppose a 95% confidence interval for the mean is quoted to be (43.5, 183.9). Are sampled values likely to be in this interval? Why or why not?
Probably not. The interval describes where the population mean is likely to be, not where individual values fall. Large samples give narrow intervals, which would contain only a small percentage of the data values.

10. a) Compute a 95% confidence interval if your sample contains 345 items, the sample mean is 8.3 and the population standard deviation is 6.34.
Note: we are given the population standard deviation so it is a z-interval.
8.3 ± 1.96 * 6.34/sqrt(345) = 8.3 ± .669, or (7.63, 8.97)

b) What assumptions must be satisfied for the interval to be valid?
Since this is a large sample, we need only an SRS. For small samples we would need a normally distributed population.

11. Suppose you wish a 95% confidence interval to estimate the mean to within .175. If the population standard deviation is 1.89, how large a sample must be taken?
n = (1.96*1.89/.175)^2 = 448.08, so n = 449

12. Suppose a hypothesis test results in a p-value of .002. What can you say about the null and alternative hypotheses?
Reject Ho. Accept Ha. There is very strong evidence against Ho. Random variation alone almost never could produce the observed results.

13. Perform a hypothesis test of H0: µ = 2.5 vs. Ha: µ ≠ 2.5 if x-bar = 2.96, n = 12, and σ = 1.2.
Quote a p-value in your conclusions.
Note: this is a two-sided test. The p-value will be double the tail probability. Since the population standard deviation is given we use z as the test statistic. If the sample standard deviation were given we would compute t.
z = (2.96 - 2.5)/(1.2/sqrt(12)) = 1.33
tail prob = 1 - .9082 = .0918
p-value = 2(.0918) = .1836
Fail to reject Ho. No evidence in favor of Ha. Random variation can easily produce data of the type observed.

b) What assumptions need to be satisfied for your results to be valid?
With small samples such as this one we need to know the population has a normal distribution. As usual we also need to know the data was collected by an SRS.

14. Construct a 95% confidence interval for the difference of the two population means if the first sample has n = 25, mean 1.5 and standard deviation .25 and the second sample has n = 22, mean 1.85 and standard deviation .35.
Using (x1-bar - x2-bar) ± t* sqrt( s1^2/n1 + s2^2/n2 ) we get (1.5 - 1.85) ± t* sqrt( .25^2/25 + .35^2/22 ).
The t* is found using df = 21 (the smaller of n1 - 1 and n2 - 1) to be 2.080, so the interval is -.35 ± .187, or (-.537, -.163).

15. Recall the data from problem 3:

student  before  after
1        11      12
2        13      14
3        10      11
4        11      12
5        12      14
6        9       11
7        8       10
8        11      13
9        11      11

Perform a hypothesis test to determine if the review session was effective. You need to first determine if this test should be a matched pairs procedure or an independent sample procedure.

a) Explain your choice.
Since the data comes from sampling each student twice, the samples are not independent. It is a matched pairs design. We first find the differences: after - before. Positive values indicate the review worked.

student  before  after  after-before
1        11      12     1
2        13      14     1
3        10      11     1
4        11      12     1
5        12      14     2
6        9       11     2
7        8       10     2
8        11      13     2
9        11      11     0

Ho: mu = 0. Ha: mu > 0 (mu is the mean of the differences).
The differences have mean xd-bar = 1.333 and standard deviation sd = .7071.
Now compute t = xd-bar / (sd / sqrt(n)) = 1.333/(.7071/sqrt(9)) = 5.66

b) State your results using a p-value.
df = 8. The p-value is less than .0005 from tables. Reject Ho. Accept Ha. Very strong evidence against Ho. Random variation is almost surely not the cause of the observed data.
c) What assumptions need to be satisfied for your results to be valid?
SRS of students, normally distributed differences.

16. A recent poll resulted in 184 of 350 students in favor of a student fee to support the USC math learning center.

a) Produce a 95% confidence interval for the proportion of USC students as a whole that are in favor of the fee.
p-hat = 184/350 = .5257
interval: .5257 ± 1.96 * sqrt( .5257*.4743 / 350 ) = .5257 ± .0523, or (.4734, .5780)

b) Perform the hypothesis test of H0: p = .5 vs. Ha: p > .5. Report a p-value in your conclusions.
The test statistic is z = (p-hat - p0)/sqrt( p0(1 - p0)/n ) = .9621. Use .5 for p-zero and n = 350.
p-value = 1 - .8315 = .1685
Fail to reject Ho. No evidence in favor of Ha. Random variation can easily produce data of the type observed.

c) What information would you need to know to determine if the interval and hypothesis test are valid?
SRS, population at least 10 times the sample size.

17. What sample size is required so that a 95% confidence interval for the population proportion has a margin of error of .04:

a) If you think the proportion is about .8?
(1.96/.04)^2 * .8(1 - .8) = 384.16, so n = 385

b) If you have no idea about the proportion? (Using p = .5 gives the largest possible n.)
(1.96/.04)^2 * (.5)(.5) = 600.25, so n = 601

Formulas

z-score:        z = (x - µ)/σ
Binomial:       µ = np and σ = sqrt( np(1 - p) )
                z = (x - np)/sqrt( np(1 - p) )
z-procedure:    x-bar ± z* σ/sqrt(n)
                z = (x-bar - µ)/(σ/sqrt(n))
                n = (z* σ/m)^2
t-procedures:   x-bar ± t* s/sqrt(n)
                t = (x-bar - µ)/(s/sqrt(n))
Matched pairs:  xd-bar ± t* sd/sqrt(n)
                t = xd-bar/(sd/sqrt(n))
Two Sample:     (x1-bar - x2-bar) ± t* sqrt( s1^2/n1 + s2^2/n2 )
                t = (x1-bar - x2-bar)/sqrt( s1^2/n1 + s2^2/n2 )
Proportion:     p-hat ± z* sqrt( p-hat(1 - p-hat)/n )
                z = (p-hat - p0)/sqrt( p0(1 - p0)/n )
                n = (z*/m)^2 p*(1 - p*)
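
Several of the numerical answers above can be checked without tables. The following is a minimal Python sketch using only the standard library; the helper names normal_cdf and binomial_pmf are illustrative choices, not part of the course materials. It recomputes the binomial probabilities from problem 7a, the normal tail probability from problem 8c, and the sample sizes from problems 11 and 17b.

```python
from math import ceil, comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Problem 7a: n = 8 students, p = .6
p_exactly_5 = binomial_pmf(5, 8, 0.6)
p_5_or_more = sum(binomial_pmf(k, 8, 0.6) for k in range(5, 9))

# Problem 8c: mean of 3 bags, mu = 1.8, sigma = .15
z_8c = (2 - 1.8) / (0.15 / sqrt(3))
p_8c = 1 - normal_cdf(z_8c)

# Problem 11: margin of error .175 with sigma = 1.89, rounded up
n_11 = ceil((1.96 * 1.89 / 0.175) ** 2)

# Problem 17b: margin of error .04 with the conservative guess p* = .5
n_17b = ceil((1.96 / 0.04) ** 2 * 0.5 * 0.5)

print(round(p_exactly_5, 5), round(p_5_or_more, 4))
print(round(z_8c, 2), round(p_8c, 4))
print(n_11, n_17b)
```

The normal probability is computed from the unrounded z, so it can differ from the table-based answer in the last decimal place. Sample sizes are always rounded up, which is why ceil is used rather than ordinary rounding.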