Solution 8 - Bison Academy

ECE 341 - Homework #8
Hypothesis Testing - t tests
Due Tuesday, November 8th
Let X and Y be normally distributed:
X ∼ N(1000, 100^2)
Y ∼ N(1100, 200^2)
Problem 1-3: Assume you know the standard deviation of X (so you can use a standard normal table).
1) Let N=2 (a sample size of 2). Generate two random numbers for X. With this data, determine the
90% confidence interval for the mean of X.
-->rand('normal');
-->X = 100*rand(2,1) + 1000;
-->x = mean(X)
995.69523
-->s = 100 / sqrt(2)
70.7107
From the standard normal table, for 5% tails, you need to go 1.645 deviations out. The true mean lies between:
-->x_min = x - 1.645*s
x_min =
879.37613
-->x_max = x + 1.645*s
x_max =
1112.0143
The 90% confidence interval for the mean:
879 < mean < 1112
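The same interval can be cross-checked outside Scilab. A minimal Python sketch, assuming only the standard library; the function name z_interval is made up for this illustration, and the sample mean is the value from the session above:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.90):
    """Confidence interval for the mean when the population sigma is known."""
    # Two-sided interval: split the leftover probability between both tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Sample mean from the Scilab session, known sigma = 100, N = 2
lo, hi = z_interval(995.69523, 100, 2)
print(f"{lo:.1f} < mean < {hi:.1f}")
```

The result differs from the table-based answer only in the digits lost by rounding z to 1.645.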
2) Let N = 20. With this data, determine the 90% confidence interval for the mean of X.
-->X = 100*rand(20,1) + 1000;
-->x = mean(X)
x =
1005.1142
-->s = 100 / sqrt(20)
s =
22.3607
Since the standard deviation is known, use the standard normal table: for 5% tails, you need to go 1.645 deviations out.
-->x_min = x - 1.645*s
x_min =
968.33085
-->x_max = x + 1.645*s
x_max =
1041.8976
So, you can be 90% certain that
968 < mean < 1041
3) Let N = 200. With this data, determine the 90% confidence interval for the mean of X.
-->X = 100*rand(200,1) + 1000;
The sample mean and the standard deviation of the sample mean are:
-->x = mean(X)
1003.1351
-->s = 100 / sqrt(200)
7.0710678
Since the standard deviation is known, use the standard normal table: you need to go out 1.645 deviations for 5% tails:
-->x_min = x - 1.645*s
x_min =
991.50319
-->x_max = x + 1.645*s
x_max =
1014.767
So, you can be 90% certain that
991 < mean < 1014
Note: As the sample size increases, the uncertainty in the true mean gets smaller.
Figure: 90% confidence interval for the mean with a known variance
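To put numbers on the note about sample size: with a known standard deviation, the half-width of the interval is 1.645 * sigma / sqrt(N), so it shrinks as one over the square root of N. A short Python sketch (the variable names are illustrative):

```python
from math import sqrt

SIGMA = 100      # known standard deviation of X
Z_90 = 1.645     # standard normal table value for 5% tails

# Half-width of the 90% confidence interval for each sample size in Problems 1-3
half_widths = {n: Z_90 * SIGMA / sqrt(n) for n in (2, 20, 200)}
for n, hw in half_widths.items():
    print(f"N = {n:3d}: mean is within +/- {hw:.2f} of the sample mean")
```

Going from N = 2 to N = 200 is a factor of 100 in sample size, so the interval is 10 times narrower.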
Problem 4-6: Assume you don't know the variance of X and Y (so you use a Student's t-table).
4) Let N=2 (a sample size of 2). Generate two random numbers for X. With this data, determine the
90% confidence interval for the mean of X.
-->rand('normal');
-->X = 100*rand(2,1) + 1000;
-->x = mean(X)
995.69523
-->s = stdev(X)
101.62559
-->x_min = x - 6.314*s
354.03128
-->x_max = x + 6.314*s
1637.3592
If you pick another random number from X, you can be 90% certain that it will be in the range
(354.03, 1637.36).
Like a normal distribution, if you create a new random variable from the average of the two samples:
W = (X1 + X2) / 2
W will have a t-distribution (as well) with
mean(W) = mean(X)
variance(W) = 1/2( variance(X) )
standard deviation(W) = standard deviation(X) / sqrt(2)
So, the mean is a t-distribution with
mean = 995.69
standard deviation = 101.63 / sqrt(2) = 71.86
The 90% confidence interval for the mean is 6.314 deviations from the mean:
541.99 < mean < 1449
For comparison, from problem #1, if you knew the true standard deviation, this was:
879 < mean < 1112
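The 6.314 factor is the 95th percentile of a t-distribution with 1 degree of freedom. That particular case has a closed-form CDF, F(t) = 1/2 + atan(t)/pi, so the table value can be checked directly. A small Python sketch (variable names are illustrative; the numbers are the ones from the session above):

```python
from math import pi, sqrt, tan

# t-distribution with 1 degree of freedom: invert F(t) = 1/2 + atan(t)/pi at F = 0.95
t_95 = tan(pi * (0.95 - 0.5))

x_bar = 995.69523        # sample mean from the session above
s = 101.62559            # sample standard deviation from the session above
s_mean = s / sqrt(2)     # standard deviation of the mean of N = 2 samples

lo = x_bar - t_95 * s_mean
hi = x_bar + t_95 * s_mean
print(f"t = {t_95:.3f}: {lo:.1f} < mean < {hi:.1f}")
```

With only two samples, not knowing the standard deviation makes the interval roughly four times wider than in problem #1.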
5) Let N = 20. With this data, determine the 90% confidence interval for the mean of X.
-->X = 100*rand(20,1) + 1000;
-->x = mean(X)
x =
1005.1142
-->s = stdev(X)
s =
98.529981
This is the standard deviation of a single sample, which tells you the 90% confidence interval for the
next sample selected from X. For the mean of 20 samples, divide by sqrt(20):
-->s = s / sqrt(20)
22.031974
From the t-table with 19 degrees of freedom and 5% tails, you need to go 1.729 deviations out.
-->x_min = x - 1.729*s
967.02092
-->x_max = x + 1.729*s
1043.2075
So, you can be 90% certain that
967 < mean < 1043
For comparison, from problem #2, if you knew the true standard deviation, this was:
968 < mean < 1041
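The steps for the unknown-variance case (sample mean, sample standard deviation, divide by sqrt(N), multiply by the t-table value) can be collected into one function. A Python sketch under stated assumptions: the helper name t_interval is hypothetical, the data is freshly generated with an arbitrary seed (so the numbers will not match the Scilab run), and the critical value 1.729 is the table value quoted above:

```python
import random
from math import sqrt
from statistics import mean, stdev

T_TABLE_90 = {19: 1.729}   # t-table value quoted above (5% tails, 19 degrees of freedom)

def t_interval(data, t_crit):
    """90% confidence interval for the mean when sigma is estimated from the data."""
    n = len(data)
    x_bar = mean(data)
    s_mean = stdev(data) / sqrt(n)    # stdev() is the sample (N-1) standard deviation
    return x_bar - t_crit * s_mean, x_bar + t_crit * s_mean

random.seed(1)   # arbitrary seed so the run is repeatable
X = [random.gauss(1000, 100) for _ in range(20)]
lo, hi = t_interval(X, T_TABLE_90[len(X) - 1])
print(f"{lo:.1f} < mean < {hi:.1f}")
```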
6) Let N = 200. With this data, determine the 90% confidence interval for the mean of X.
-->X = 100*rand(200,1) + 1000;
The sample mean and standard deviation are:
-->x = mean(X)
1003.1351
-->s = stdev(X)
101.64484
The mean has a standard deviation equal to:
-->s = s / sqrt(200)
7.1873759
From the t-table with 199 degrees of freedom, you need to go out 1.653 deviations for 5% tails:
-->x_min = x - 1.653*s
991.25432
-->x_max = x + 1.653*s
1015.0158
So, you can be 90% certain that
991 < mean < 1015
Note: As the sample size increases, the uncertainty in the true mean gets smaller.
For comparison, from problem #3, if you knew the true standard deviation, this was:
991 < mean < 1014
Also note that as the sample size increases, there isn't much difference between a t-distribution and a
normal distribution.
Sidelight: The t-distribution converges to the normal distribution. If you assume the sample standard
deviation is 100 (it will vary a little - but let's assume that for now), the 90% confidence interval vs.
sample size for a known variance (normal) and unknown variance (t-distribution) looks like the following:
Figure: 90% confidence interval for the population mean using a t-distribution (blue line)
and a normal distribution (red line).
When the sample size is more than 20, there isn't a lot of difference.
You can use a normal approximation with sample sizes of 20 or more.
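The convergence can also be checked numerically without a t-table. The following Python sketch is one way to do it, not the method used to make the plot: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names t_cdf and t_quantile are made up for this illustration.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist

def t_cdf(t, dof, steps=1000):
    """CDF of Student's t for t >= 0, by Simpson's rule on the density."""
    log_c = lgamma((dof + 1) / 2) - lgamma(dof / 2) - 0.5 * log(dof * pi)
    pdf = lambda u: exp(log_c - (dof + 1) / 2 * log1p(u * u / dof))
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3          # half the area lies below t = 0

def t_quantile(p, dof):
    """Invert t_cdf by bisection; valid for 0.5 < p < 1 with the root below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_95 = NormalDist().inv_cdf(0.95)       # normal table value, about 1.645
t_95 = {n: t_quantile(0.95, n - 1) for n in (2, 5, 20, 200)}
for n, t in t_95.items():
    print(f"N = {n:3d}: t = {t:.3f}   normal = {z_95:.3f}")
```

The t value starts far above 1.645 at N = 2 and is within about 5% of it by N = 20, which is the basis for the rule of thumb above.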
Problem 7-9: Assume you again know the variance of X and Y (so you can use a standard normal table)
7) Let N=2. Generate two numbers for X and Y. What is the confidence level for the null hypothesis
that the mean of X is more than the mean of Y?
-->X = 100*rand(2,1) + 1000;
-->Y = 200*rand(2,1) + 1100;
The expected value for Y minus X is:
-->mean(Y) - mean(X)
112.27533
-->s = sqrt(variance(X)/2 + variance(Y)/2)
47.724711
Determine what the difference is in terms of standard deviations:
-->S = ( mean(Y) - mean(X) ) / s
2.3525617
The mean of Y is 2.3525 deviations more than the mean of X. From a standard normal table, the
right-hand tail has an area of about 0.01.
From the data, you can be 99% certain that Y has a larger mean than X.
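Instead of reading the tail area from a table, it can be computed from the normal CDF. A minimal Python sketch using the two numbers from the session above (note that s there was computed from the sample variances):

```python
from statistics import NormalDist

diff = 112.27533      # mean(Y) - mean(X) from the session above
s = 47.724711         # standard deviation of that difference, from the session above

S = diff / s                        # difference in units of standard deviations
tail = 1 - NormalDist().cdf(S)      # area to the right of S
print(f"S = {S:.4f}, tail = {tail:.4f}, confidence = {100 * (1 - tail):.1f}%")
```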
Sidelight: Everyone should get a different answer. Doing 100,000 runs and plotting the CDF for S:
S = zeros(100000,1);
rand('normal');
for i=1:100000
   // two samples each from X and Y
   X = 100*rand(2,1) + 1000;
   Y = 200*rand(2,1) + 1100;
   // difference in sample means, in standard deviations
   x = mean(Y) - mean(X);
   s = sqrt(variance(X)/2 + variance(Y)/2);
   S(i) = x / s;
end
gives:
2.5% of the time you'll (incorrectly) conclude that X is larger than Y with 99% confidence
20% of the time you'll (correctly) conclude that Y is larger than X with 99% confidence
Your actual results were:
0/26 concluded that X is more than Y
23/26 had no conclusion at the 99% level of confidence
3/26 concluded that Y is more than X with a 99% level of confidence
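For readers without Scilab, the same experiment can be sketched in Python. This is an illustrative port, not the script used for the plot: the function name simulate, the seed, and the reduced run count (20,000 instead of 100,000, to keep the runtime short) are all assumptions, and the 99% threshold is taken as the one-sided normal value of about 2.326.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def simulate(n, runs, seed=0):
    """Fraction of runs concluding X > Y, and Y > X, at the 99% level."""
    rng = random.Random(seed)
    z_99 = NormalDist().inv_cdf(0.99)       # about 2.326 for a 1% tail
    x_wins = y_wins = 0
    for _ in range(runs):
        X = [rng.gauss(1000, 100) for _ in range(n)]
        Y = [rng.gauss(1100, 200) for _ in range(n)]
        # Same statistic as the Scilab loop: difference over its estimated deviation
        s = sqrt(variance(X) / n + variance(Y) / n)
        S = (mean(Y) - mean(X)) / s
        if S > z_99:
            y_wins += 1
        elif S < -z_99:
            x_wins += 1
    return x_wins / runs, y_wins / runs

frac_x, frac_y = simulate(n=2, runs=20000)
print(f"X > Y concluded in {100 * frac_x:.1f}% of runs")
print(f"Y > X concluded in {100 * frac_y:.1f}% of runs")
```

Calling simulate with n=20 or n=200 repeats the experiment for the larger sample sizes.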
8) Let N=20. What is the confidence level for the null hypothesis that the mean of X is more than the
mean of Y?
First, generate 20 random points:
-->X = 100*rand(20,1) + 1000;
-->Y = 200*rand(20,1) + 1100;
The expected value of Y minus X is:
-->mean(Y) - mean(X)
110.50309
The standard deviation of the difference in the sample means is:
-->s = sqrt(variance(X)/20 + variance(Y)/20)
46.365449
Express the difference in the means in terms of standard deviations:
-->S = ( mean(Y) - mean(X) )/s
2.3833069
From the standard normal table, 2.383 deviations has a tail of 0.01.
From the data, you can be 99% certain that Y has a larger mean than X.
Repeating this 100,000 times gives the following CDF for S:
Almost no homework sets should conclude that X (incorrectly) is larger than Y (red line - left tail)
40% of homework sets should conclude that Y is larger than X with a 99% level of confidence
(green line right tail)
Your results from the homework were, at a 99% level of confidence:
0/26 concluded that X is more than Y
16/26 had no conclusion
10/26 concluded that Y was more than X
9) Let N=200. What is the confidence level for the null hypothesis that the mean of X is more than the
mean of Y?
First, create 200 random data points:
-->X = 100*rand(200,1) + 1000;
-->Y = 200*rand(200,1) + 1100;
The difference in the sample means and its standard deviation are:
-->mean(Y) - mean(X)
106.26821
-->s = sqrt(variance(X)/200 + variance(Y)/200)
16.794926
Expressing the difference in standard deviations:
-->S = ( mean(Y) - mean(X) )/s
6.3273998
From the standard normal table, 6.33 deviations has a tail less than 0.0005.
From the data, you can be more than 99.95% certain that Y has a larger mean than X.
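Most printed tables stop well before 6 deviations, which is why the answer above is only a bound. The actual tail area can be computed with the complementary error function, which stays accurate far out in the tail where 1 - CDF would lose precision. A short Python sketch using the value of S from the session above:

```python
from math import erfc, sqrt

S = 6.3273998                     # from the session above
tail = 0.5 * erfc(S / sqrt(2))    # P(Z > S) for a standard normal Z
print(f"tail = {tail:.2e}")
```

The tail is many orders of magnitude below 0.0005, so the 99.95% figure is a very conservative statement.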
Repeating this with 100,000 runs and plotting the CDF for S:
Nearly zero homework sets should indicate that X is (incorrectly) larger than Y (left tail of red line)
Almost all homework sets should indicate that Y is (correctly) larger than X (right tail of green line)
Your homework sets were, at a 99% level of confidence
0/26 concluded that X was greater than Y
10/26 had no conclusion
16/26 concluded that Y was more than X