Sample Final Exam

Data Sets

We observe a number of data sets, resulting in the following summaries:

Set #1
0.92154 0.95749 0.95969 0.96660 0.96837 0.98737 0.98897
1.03850 1.04007 1.04705 1.06594 1.06960 1.06964 1.08102
1.09246 1.09496 1.10273 1.14394 1.14707 1.15204 1.15276
1.15544 1.18723 1.1994 1.20593 1.21070 1.21613 1.30351
Sum: 30.5261   Count: 28   Sum of squares: 33.5317

Set #2
0.90357 2.66148 3.54599 5.28769 6.93558 7.02943 7.04880
7.24213 7.44643 7.72484 8.19351 8.65482 8.74767 8.82486
9.46781 10.1070 10.8687 11.5657 12.2587 12.7516 13.5501
14.6619 14.9376 15.5333 15.8570 16.8235 17.8379 18.5768
18.6690 18.8775 18.9959 19.2264 27.6086 40.1723 68.1279
Sum: 496.722   Count: 35   Sum of squares: 11929.8

Set #3
We report only the differences between the data in Sample #1 and this sample. The summaries for the differences are:
Sum: −3.0456   Count: 28   Sum of squares: 1.21637

Set #4
0.13234 0.15308 0.35685 0.40157 0.40893 0.43890 0.51273
0.54072 0.66466 0.71547 0.73718 0.78282 0.81730 0.83066
0.92685 0.93011 0.97048 1.07575 1.1858 1.19442 1.21328
1.25213 1.37178 1.40444 1.49122 1.49174 1.8339 1.94618
Sum: 25.7813   Count: 28   Sum of squares: 29.8625

Set #5
0.80214 0.82697 0.87572 0.89670 0.92259 0.93288 0.93394
0.93650 0.95775 0.97102 0.97303 1.00718 1.01767 1.03088
1.04796 1.05381 1.05695 1.05709 1.06174 1.08236 1.09798
1.12141 1.13258 1.13815 1.16667 1.21023 1.26489 1.27568
Sum: 28.8525   Count: 28   Sum of squares: 30.1252

1. Descriptive Statistics

Calculate the following statistics for data set #1 and data set #2:
1. Sample mean
2. Sample variance
3. Sample standard deviation
4. Median
5. Range

Optional (Extra Credit): If time allows, you may calculate these statistics for some other data set as well.

2. Confidence Intervals

Determine a confidence interval, at a confidence level of your choice, for the true mean of data set #1 and data set #2.

Optional (Extra Credit)
• If time allows, you may calculate confidence intervals for other data sets as well.
• If time allows, you may calculate confidence intervals for the variance of one or more data sets.

3. Testing

3.1. Testing for the mean

Data set #1 simulates measurements of a batch of resistors with a nominal resistance of 1 kOhm. Set up a two-tailed test (at a significance level of your choice) to verify that the resistors are up to specification.

1. (Optional - Extra Credit) Data set #2 simulates measurements of the voltage of a batch of nominal 9 V batteries. Set up a two-tailed test (at a significance level of your choice) to verify that the batteries are up to specification.
2. (Optional - Very Extra Credit) The resistors are sold as having a standard deviation from the nominal value of no more than 10% of their nominal specification. Set up a one-tailed test for their variance (which should be no more than 0.01, the square of a standard deviation of 0.1).

3.2. Testing Proportions

A coin is flipped 80 times, resulting in 36 “Heads”. Set up a test to determine whether the coin is fair.

3.2.1. (Optional - Extra Credit) The experiment is repeated four more times, resulting in, respectively, 42, 42, 41 and 48 “Heads”.
• Repeating the test for each experiment, what is the result?
• If you combine all five experiments (for a total of 400 tosses and 209 “Heads”), what is the outcome of a statistical test now?

3.3. Paired Samples

Data set #3 is supposed to measure the resistance of the items from Sample #1 after a stress test. Set up a paired sample test to verify whether the average resistance has changed.

3.4. Independent Samples

1. Unequal Variances
We compare sample #1 with sample #4, which comes from a different facility. We can no longer assume that the variances are equal.
Set up a test to verify that the two batches nonetheless come from populations with the same “true” mean.

2. Equal Variances
We compare the resistors in sample #1 with a second batch (sample #5) from the same production line, which we can assume produces items with the same variance as that of sample #1. Set up a test to verify that the two batches have the same “true” mean.

Remarks about the data sets

All data sets were simulated by computer.
• Set #1 should look like a simple random sample from a normal distribution with mean 1.06 and standard deviation 0.1.
• Set #2 is a reduced sample from a fat-tailed distribution (a Cauchy distribution) that has neither a true mean nor a true variance. Its median should be 9, but the data reported here came from excluding the negative values obtained in the simulation, including a whopping −26,035.8 (talk about really fat tails!).
• Proportions were all simulated assuming p = 0.52.
• The second set of paired data (used for set #3) was obtained by adding to each element of set #1 a random variable, uniformly distributed between -10 and 2, for an expected difference of -4.
• Set #4 should look like a simple random sample from a normal distribution with mean 0.9 and standard deviation 0.5 (hence, different from set #1).
• Set #5 should look like a simple random sample from a normal distribution with mean 1.03 and standard deviation 0.1 (hence, with the same variance as set #1).
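
Since each data set comes with its Sum, Count and Sum of squares, the sample mean, variance and standard deviation can be obtained from those three numbers alone. The following Python sketch is one possible way to do this, not an official solution. The function names (summary_stats, one_sample_t, proportion_z) are hypothetical, the proportion test assumes the usual normal approximation, and because the reported summaries are rounded, the results are only approximate.

```python
import math


def summary_stats(total, count, sum_sq):
    """Return (mean, sample variance, sample SD) from Sum, Count, Sum of squares."""
    mean = total / count
    # Shortcut formula: s^2 = (sum of squares - (sum)^2 / n) / (n - 1)
    variance = (sum_sq - total**2 / count) / (count - 1)
    return mean, variance, math.sqrt(variance)


def one_sample_t(total, count, sum_sq, mu0):
    """t statistic for H0: true mean == mu0 (refer to t with count - 1 d.o.f.)."""
    mean, _, sd = summary_stats(total, count, sum_sq)
    return (mean - mu0) / (sd / math.sqrt(count))


def proportion_z(successes, trials, p0=0.5):
    """z statistic for H0: true proportion == p0 (normal approximation, assumed)."""
    p_hat = successes / trials
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / trials)


if __name__ == "__main__":
    # Summaries copied from Set #1 above.
    mean, var, sd = summary_stats(30.5261, 28, 33.5317)
    print(f"Set #1: mean={mean:.4f} variance={var:.5f} sd={sd:.4f}")
    # Two-tailed test of the 1 kOhm specification (section 3.1).
    print(f"Set #1, t against 1.0: {one_sample_t(30.5261, 28, 33.5317, 1.0):.3f}")
    # Fair-coin test with 36 heads in 80 flips (section 3.2).
    print(f"Coin, 36 of 80: z={proportion_z(36, 80):.3f}")
```

The paired test of section 3.3 can reuse one_sample_t on the summaries of the differences with mu0 = 0, since a paired test is a one-sample test on the differences. The median and range cannot be recovered from the summaries and have to be read off the sorted data.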