ECE 341 - Homework #8
Hypothesis Testing - t tests
Due Tuesday, November 8th

Let X and Y be normally distributed:

    X ∼ N(1000, 100^2)
    Y ∼ N(1100, 200^2)

Problem 1-3: Assume you know the standard deviation of X (so you can use a standard normal table).

1) Let N=2 (a sample size of 2). Generate two random numbers for X. With this data, determine the 90% confidence interval for the mean of X.

    -->rand('normal');
    -->X = 100*rand(2,1) + 1000;
    -->x = mean(X)
       995.69523
    -->s = 100 / sqrt(2)
       70.7107

From the standard normal table, for 5% tails, you need to be 1.645 deviations out. The true mean is then bounded by:

    -->x_min = x - 1.645*s
     x_min = 879.37613
    -->x_max = x + 1.645*s
     x_max = 1112.0143

The 90% confidence interval for the mean:

    879 < mean < 1112

2) Let N = 20. With this data, determine the 90% confidence interval for the mean of X.

    -->X = 100*rand(20,1) + 1000;
    -->x = mean(X)
     x = 1005.1142
    -->s = 100 / sqrt(20)
     s = 22.3607

Since the standard deviation is known, use the standard normal table again: for 5% tails, you need to go 1.645 deviations out.

    -->x_min = x - 1.645*s
     x_min = 968.33085
    -->x_max = x + 1.645*s
     x_max = 1041.8976

So, you can be 90% certain that

    968 < mean < 1041

3) Let N = 200. With this data, determine the 90% confidence interval for the mean of X.

    -->X = 100*rand(200,1) + 1000;

The sample mean and the standard deviation of the mean are:

    -->x = mean(X)
       1003.1351
    -->s = 100 / sqrt(200)
       7.0710678

Since the standard deviation is known, use the standard normal table: you need to go out 1.645 deviations for 5% tails:

    -->x_min = x - 1.645*s
     x_min = 991.50319
    -->x_max = x + 1.645*s
     x_max = 1014.767

So, you can be 90% certain that

    991 < mean < 1014

Note: As the sample size increases, the uncertainty in the true mean gets smaller.

[Figure: 90% confidence interval for the mean with a known variance]

Problem 4-6: Assume you don't know the variance of X and Y (so you use a Student t-table).

4) Let N=2 (a sample size of 2). Generate two random numbers for X. With this data, determine the 90% confidence interval for the mean of X.
    -->rand('normal');
    -->X = 100*rand(2,1) + 1000;
    -->x = mean(X)
       995.69523
    -->s = stdev(X)
       101.62559

From the t-table with 1 degree of freedom and 5% tails, you need to go 6.314 deviations out:

    -->x_min = x - 6.314*s
       354.03128
    -->x_max = x + 6.314*s
       1637.3592

If you pick another random number from X, you can be 90% certain that it will be in the range (354.03, 1637.36).

As with a normal distribution, if you create a new random variable from the average of the two samples,

    W = (X1 + X2) / 2

then W will have a t-distribution (as well) with

    mean(W) = mean(X)
    variance(W) = 1/2 ( variance(X) )
    standard deviation(W) = standard deviation(X) / sqrt(2)

So, the mean is a t-distribution with

    mean = 995.69
    standard deviation = 101.63 / sqrt(2) = 71.86

The 90% confidence interval for the mean is 6.314 deviations from the mean:

    541.99 < mean < 1449

For comparison, from problem #1, if you knew the true standard deviation, this was:

    879 < mean < 1112

5) Let N = 20. With this data, determine the 90% confidence interval for the mean of X.

    -->X = 100*rand(20,1) + 1000;
    -->x = mean(X)
     x = 1005.1142
    -->s = stdev(X)
     s = 98.529981

This standard deviation describes the spread of the next sample selected from X. For the mean of 20 samples, divide by sqrt(20):

    -->s = s / sqrt(20)
       22.031974

From the t-table with 19 degrees of freedom and 5% tails, you need to go 1.729 deviations out.

    -->x_min = x - 1.729*s
       967.02092
    -->x_max = x + 1.729*s
       1043.2075

So, you can be 90% certain that

    967 < mean < 1043

For comparison, from problem #2, if you knew the true standard deviation, this was:

    968 < mean < 1041

6) Let N = 200. With this data, determine the 90% confidence interval for the mean of X.

    -->X = 100*rand(200,1) + 1000;

The sample mean and standard deviation are:

    -->x = mean(X)
       1003.1351
    -->s = stdev(X)
       101.64484

The mean has a standard deviation equal to:

    -->s = s / sqrt(200)
       7.1873759

From the t-table with 199 degrees of freedom, you need to go out 1.653 deviations for 5% tails:

    -->x_min = x - 1.653*s
       991.25432
    -->x_max = x + 1.653*s
       1015.0158

So, you can be 90% certain that

    991 < mean < 1015

Note: As the sample size increases, the uncertainty in the true mean gets smaller.
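The t-based intervals in Problems 4-6 all follow the same recipe: sample mean, plus or minus a table multiplier times s / sqrt(N). As a rough cross-check (a sketch in Python rather than the Scilab used in the solutions), the snippet below recomputes them from the sample statistics reported above and prints the known-sigma interval next to each one. The function name `mean_interval` and the dictionary names are illustrative assumptions; the t multipliers are simply the table values quoted in the text, and sigma = 100 is the known standard deviation of X.

```python
import math
from statistics import NormalDist

# Two-sided 90% multipliers quoted in the text (t-table, 5% in each tail),
# keyed by sample size N (degrees of freedom = N - 1).
T_CRIT_90 = {2: 6.314, 20: 1.729, 200: 1.653}

def mean_interval(xbar, s, n, crit):
    """Return (low, high) for the mean: xbar +/- crit * s / sqrt(n)."""
    half_width = crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Sample statistics reported in Problems 4-6: N -> (sample mean, sample std dev).
reported = {
    2: (995.69523, 101.62559),
    20: (1005.1142, 98.529981),
    200: (1003.1351, 101.64484),
}

# Standard normal multiplier for 5% tails (about 1.645), used when sigma is known.
z_crit = NormalDist().inv_cdf(0.95)

for n, (xbar, s) in reported.items():
    t_low, t_high = mean_interval(xbar, s, n, T_CRIT_90[n])   # unknown sigma
    z_low, z_high = mean_interval(xbar, 100.0, n, z_crit)     # known sigma = 100
    print(f"N={n:3d}  t: {t_low:8.2f} .. {t_high:8.2f}   z: {z_low:8.2f} .. {z_high:8.2f}")
```

For N=2 the t-based interval comes out at roughly 542 to 1449, far wider than the known-sigma interval, while for N=200 the two are nearly the same.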
For comparison with problem 6, from problem #3, if you knew the true standard deviation, the interval was:

    991 < mean < 1014

Also note that as the sample size increases, there isn't much difference between a t-distribution and a normal distribution.

Sidelight: The t-distribution converges to the normal distribution. If you assume the sample standard deviation is 100 (it will vary a little - but let's assume that for now), the 90% confidence interval vs. sample size for a known variance (normal) and unknown variance (t-distribution) looks like the following:

[Figure: 90% confidence interval for the population mean using a t-distribution (blue line) and normal distribution (red line).]

When the sample size is more than 20, there isn't a lot of difference. You can use a normal approximation with sample sizes of 20 or more.

Problem 7-9: Assume you again know the variance of X and Y (so you can use a standard normal table).

7) Let N=2. Generate two numbers for X and Y. What is the confidence level for the null hypothesis that the mean of X is more than the mean of Y?

    -->X = 100*rand(2,1) + 1000;
    -->Y = 200*rand(2,1) + 1100;

The estimated value of the mean of Y minus the mean of X is:

    -->mean(Y) - mean(X)
       112.27533

The standard deviation of this difference is:

    -->s = sqrt(variance(X)/2 + variance(Y)/2)
       47.724711

Determine what the difference is in terms of standard deviations:

    -->S = 112.27 / 47.72
       2.3525617

The mean of Y is 2.3525 deviations more than the mean of X. From a standard normal table, the right-hand tail has an area of about 0.01.

From the data, you can be 99% certain that Y has a larger mean than X.

Sidelight: Everyone should get a different answer.
Doing 100,000 runs and plotting the CDF for S

    // Monte Carlo: repeat the N=2 experiment 100,000 times
    S = zeros(100000,1);
    rand('normal');
    for i=1:100000
       X = 100*rand(2,1) + 1000;
       Y = 200*rand(2,1) + 1100;
       x = mean(Y) - mean(X);                       // difference in sample means
       s = sqrt(variance(X)/2 + variance(Y)/2);     // std dev of the difference
       S(i) = x / s;                                // difference in deviations
    end

gives:

[Figure: CDF of S for N = 2]

    2.5% of the time you'll (incorrectly) conclude that X is larger than Y with 99% confidence
    20% of the time you'll (correctly) conclude that Y is larger than X with 99% confidence

Your actual results were:

    0/26 concluded that X is more than Y
    23/26 had no conclusion at the 99% level of confidence
    3/26 concluded that Y is more than X with a 99% level of confidence

8) Let N=20. What is the confidence level for the null hypothesis that the mean of X is more than the mean of Y?

First, generate 20 random points:

    -->X = 100*rand(20,1) + 1000;
    -->Y = 200*rand(20,1) + 1100;

The estimated value of the mean of Y minus the mean of X is:

    -->mean(Y) - mean(X)
       110.50309

The standard deviation of the difference in sample means is:

    -->s = sqrt(variance(X)/20 + variance(Y)/20)
       46.365449

Express the difference in the means in terms of standard deviations:

    -->S = ( mean(Y) - mean(X) )/s
       2.3833069

From the standard normal table, 2.383 deviations has a tail of about 0.01.

From the data, you can be 99% certain that Y has a larger mean than X.

Repeating this 100,000 times gives the following CDF for S:

[Figure: CDF of S for N = 20]

    Almost no homework sets should (incorrectly) conclude that X is larger than Y (red line - left tail)
    40% of homework sets should conclude that Y is larger than X with a 99% level of confidence (green line - right tail)

Your results from the homework were, at a 99% level of confidence:

    0/26 concluded that X is more than Y
    16/26 had no conclusion
    10/26 concluded that Y was more than X

9) Let N=200. What is the confidence level for the null hypothesis that the mean of X is more than the mean of Y?
First, create 200 random data points:

    -->X = 100*rand(200,1) + 1000;
    -->Y = 200*rand(200,1) + 1100;

The difference in the sample means, and its standard deviation, are:

    -->mean(Y) - mean(X)
       106.26821
    -->s = sqrt(variance(X)/200 + variance(Y)/200)
       16.794926

Expressing the difference in standard deviations:

    -->S = ( mean(Y) - mean(X) )/s
       6.3273998

From the standard normal table, 6.33 deviations has a tail of less than 0.0005.

From the data, you can be more than 99.95% certain that Y has a larger mean than X.

Repeating this with 100,000 runs and plotting the CDF for S:

[Figure: CDF of S for N = 200]

    Nearly zero homework sets should (incorrectly) indicate that X is larger than Y (left tail of red line)
    Almost all homework sets should (correctly) indicate that Y is larger than X (right tail of green line)

Your homework sets were, at a 99% level of confidence:

    0/26 concluded that X was greater than Y
    10/26 had no conclusion
    16/26 concluded that Y was more than X
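The repeated-run experiment used in Problems 7-9 can also be sketched in Python as a rough cross-check (the solutions themselves use Scilab). The helper names `sample_variance`, `simulate_S` and `tail_fractions` are hypothetical, the seed and the run count of 2,000 (rather than 100,000) are arbitrary choices to keep the run short, and the 99% threshold is taken as the one-sided standard normal value of about 2.326.

```python
import math
import random
from statistics import NormalDist

def sample_variance(data):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / (len(data) - 1)

def simulate_S(n, runs, rng):
    """Draw `runs` values of S = (mean(Y) - mean(X)) / s for samples of size n."""
    values = []
    for _ in range(runs):
        x = [rng.gauss(1000, 100) for _ in range(n)]   # X ~ N(1000, 100^2)
        y = [rng.gauss(1100, 200) for _ in range(n)]   # Y ~ N(1100, 200^2)
        s = math.sqrt(sample_variance(x) / n + sample_variance(y) / n)
        values.append((sum(y) / n - sum(x) / n) / s)
    return values

def tail_fractions(values, level=0.99):
    """Fraction of runs concluding X > Y, and Y > X, at the given confidence."""
    z = NormalDist().inv_cdf(level)   # one-sided threshold, about 2.326 for 99%
    total = len(values)
    x_larger = sum(v < -z for v in values) / total
    y_larger = sum(v > z for v in values) / total
    return x_larger, y_larger

rng = random.Random(341)   # fixed, arbitrary seed so the run is repeatable
for n in (2, 20, 200):
    x_larger, y_larger = tail_fractions(simulate_S(n, 2000, rng))
    print(f"N={n:3d}  conclude X>Y: {x_larger:.3f}   conclude Y>X: {y_larger:.3f}")
```

The printed fractions can be compared against the percentages quoted for each sample size. With only 2,000 runs they will be noisier than the 100,000-run figures.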