CHAPTER 9
Section 9.2
Simulating the sampling distribution for a sample proportion–Example 9.2
Figure 9.2 displays a histogram for the simulated values of the sample proportion ( p̂ ) of
400 random samples based on n = 2400, p = .40. Simulation can be used to get an idea of
the shape, center, and spread of the sampling distribution of a statistic. We will illustrate
how to simulate your own random samples in Minitab. The material will be presented in
4 parts:
1. Understanding the commands to be used in the program.
2. Preparing the program.
3. Executing the program.
4. Analyzing the simulation results.
You can go directly to Part 2 if you prefer.
Part 1. Understanding the commands to be used in the program.
If the MTB> prompt is not displayed in the session window, place the cursor in the
session window and, from the menu, select Editor>Enable Commands. When the
MTB> prompt is present, Minitab shows in the session window the commands used to
generate the output. In other words, every action carried out by ‘point and click’ also
produces the corresponding commands in the session window.
In Example 9.2 each voter in the sample is asked if they favor Candidate X for President.
The voter can respond YES, symbolized by a 1, or NO, symbolized by a 0.
We are going to generate 2400 values that can be either 0 or 1. These values represent
responses from 2400 voters. We are assuming that the probability of getting a ‘YES’ is
0.4 because 40% of the population (voters) favor Candidate X. From the menu, select
Calc>Random Data>Bernoulli and fill in the dialog box: generate 2400 rows of data,
store them in C1, and enter 0.4 as the probability of a 1 (‘YES’).
After clicking OK, a column of zeros and ones will appear in the first column of the
worksheet and the following commands will appear in the session window.
MTB > Random 2400 c1;
SUBC> Bernoulli 0.4.
We need to calculate the proportion of voters (simulated) in the sample that favor
Candidate X. So we need to count the number of 1’s (YES) and divide that number by
2400 (Sample Size). From the menu, select Calc > Column Statistics and then indicate
that we want to calculate the Sum of the values in C1.
The following command will appear in the session window:
MTB > Sum C1
The sum of the values in C1 is the simulated number of voters who favor Candidate X, so
the corresponding proportion is sum(c1)/2400 .
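For readers who want to check the logic outside Minitab, the two steps above (generate 2400 values that are 0 or 1, then divide their sum by 2400) can be sketched in Python. This is only an illustration under stated assumptions, not part of the Minitab procedure: the function name simulate_sample_proportion is hypothetical, and only the Python standard library is used.

```python
import random

def simulate_sample_proportion(n=2400, p=0.4, rng=random):
    """Return the proportion of YES (1) answers in one simulated sample.

    Mirrors 'Random 2400 c1; Bernoulli 0.4.' followed by sum(c1)/2400.
    The name and defaults are illustrative assumptions.
    """
    # Each response is 1 with probability p and 0 otherwise.
    responses = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(responses) / n

# One simulated sample proportion; the value changes from run to run.
print(simulate_sample_proportion())
```

Because the sample is random, each call returns a slightly different value near 0.4.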
Part 2. Preparing the program
We want the program to do the following:
a. Select a random sample from a population where the probability of answering
‘yes’ is 0.4.
b. Calculate the proportion of people in the sample who answer ‘yes’ and store the
sample proportion in a cell of a column.
c. Repeat actions ‘a’ and ‘b’ many times.
Next, the commands seen in Part 1 will be put together to form the program. In each
execution of the program the following actions will be taken:


- A sample of size 2400 will be generated and stored in C1.
- The proportion of voters in the sample who vote for X will be stored in a cell of C2.
The commands in the following program perform these tasks. We will use k1 to label the
cell of C2 where the current sample proportion will be stored. After each run of the
program k1 needs to be incremented by 1 in order to go to the next cell of C2.
Random 2400 c1;
Bernoulli 0.4.
let c2(k1)=sum(c1)/2400
let k1=k1+1
This program needs to be typed in a text editor and saved with the file extension mtb.
The easiest option is Notepad (it comes with Windows and can be found in the Windows
menu by clicking on Start, All Programs, and Accessories). Type the commands in
Notepad and save the file under the name distphat.mtb. A very common problem is that
the file gets saved with another extension, such as txt, and then it cannot be run in
Minitab. To avoid this, choose the option ‘all files’ instead of ‘text file’ when saving
the file. Note the location of the saved file, since you will need to provide it to
execute the program.
Part 3. Executing the program
Assume the program has been saved with the filename distphat.mtb on drive C. Since
k1 is the counter for the samples and we need to start with the first sample, type
let k1=1 at the MTB> prompt. The program will be repeated 400 times (an arbitrary
number chosen in Example 9.2). From the menu, select File > Other Files > Run an Exec.
Indicate the number of times the program needs to be executed (400) and choose Select
File.
Use the window that appears to locate the file distphat.mtb. Use the mouse to select the
file and click on Open (another option is to double-click on the name of the file).
(Alternatively) Executing the program by typing a command.
First make sure that the MTB> prompt is displayed in the session window. At the prompt
type Execute 'C:\distphat.mtb' 400 (use straight quotation marks around the file name).
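The effect of running the program 400 times can be sketched in Python as a loop. This is a minimal sketch under stated assumptions, not a replacement for the Minitab exec: the function name simulate_proportions and the seed value are hypothetical, the list proportions plays the role of column C2, and the loop index replaces the counter k1.

```python
import random
import statistics

def simulate_proportions(reps=400, n=2400, p=0.4, seed=None):
    """Repeat the one-sample simulation reps times and collect the proportions."""
    rng = random.Random(seed)   # seed is optional; fix it to make a run repeatable
    proportions = []            # plays the role of column C2
    for _ in range(reps):       # each pass corresponds to one run of distphat.mtb
        sample = [1 if rng.random() < p else 0 for _ in range(n)]  # column C1
        proportions.append(sum(sample) / n)
    return proportions

props = simulate_proportions(seed=2024)
# Number of samples, then the mean and standard deviation of the proportions.
print(len(props), statistics.mean(props), statistics.stdev(props))
```

The mean should land close to 0.4 and the standard deviation close to 0.01, although the exact figures depend on the random numbers drawn.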
Part 4. Analyzing the simulation results
The shape of the distribution of the simulated sample proportions can be observed
through a histogram. Its center and spread can be summarized by the mean and
standard deviation. Selecting Stat>Basic Statistics>Display Descriptive Statistics
from the menu will give us a histogram with the superimposed normal curve,
as in Figure 9.2, along with the mean and standard deviation of the sample proportions. In
the Variables dialog box select C2 (sample proportion) and then click Graphs.
Clicking on Graphs produces more options. Select Histogram of data, with normal curve
and then click OK.
Do not be surprised if your histogram looks a little different from the one in Figure 9.2;
your 400 samples might be different from the 400 samples in the book.
[Figure: Histogram (with Normal Curve) of sample proportion. Horizontal axis: sample proportion; vertical axis: Frequency. Legend: Mean 0.4000, StDev 0.01027, N 400.]
The basic statistics will also be displayed. Your results will most likely be different from
ours.
Descriptive Statistics: sample proportion
Variable            N  N*     Mean   SE Mean    StDev  Minimum       Q1
sample proportio  400   0  0.40002  0.000514  0.01027  0.37208  0.39250

Variable           Median       Q3  Maximum
sample proportio  0.40000  0.40750  0.42542
Note that the mean of the sample proportions is very close to 0.4 (the population
proportion). The standard deviation is 0.01027, which is approximately 0.01, the
theoretical value for the standard deviation of the sample proportion given in the book.
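The theoretical value of 0.01 follows from the standard formula for the standard deviation of a sample proportion, evaluated at the values used in this simulation (n = 2400, p = 0.4). The notation s.d.(p̂) is an assumption here; the book may write it differently.

```latex
\[
\text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
  = \sqrt{\frac{0.4 \times 0.6}{2400}}
  = \sqrt{0.0001}
  = 0.01
\]
```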
Section 9.3
Simulating the sampling distribution for a sample mean
In Example 9.4, 400 sample means, each based on a sample of size 25, are simulated
from a hypothetical normal population (weight loss in pounds) with mean 8 and standard
deviation 5. Figure 9.5 displays a histogram and superimposed normal curve of these 400
simulated sample means. We will illustrate how to simulate sample means in Minitab.
You need to write your own program to do the simulation, but first you need to become
familiar with the commands it will use. Get the MTB>
prompt to appear in the session window by placing the cursor in the session window and
selecting Editor>Enable Commands.
Select Calc>Random Data> Normal and indicate that you want to generate 25 values
with a mean of 8 and a standard deviation of 5. Store the results in C1.
The commands that appear in the session window are:
MTB > Random 25 c1;
SUBC> Normal 8 5.
The program for generating sample means is very similar to the program that was written
for simulating sample proportions.
Use Notepad or another editor to type the following program:
Random 25 c1;
Normal 8 5.
let c2(k1)=mean(c1)
let k1=k1+1
Save the program with a name related to its purpose, such as
sampmean.mtb. In the session window of Minitab, start the counter by typing after the
prompt
MTB> Let k1=1
and then execute the program 400 times (assuming it was saved on drive C):
MTB> Execute 'C:\sampmean.mtb' 400
Or, after typing Let k1=1, select File>Other Files>Run an Exec from the menu,
enter 400, and then click Select File.
Enter the File name sampmean in the dialog box and then click Open.
The program will be executed 400 times. At the end, you will have 400 sample means
stored in column C2. Name the column sample means. Select Stat > Basic Statistics >
Display Descriptive Statistics and then click on Graphs to get the histogram with the
superimposed normal curve.
The histogram obtained might look a little different from the one in the book because
most likely the 400 sample means are different.
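The same simulation can be sketched in Python to see why results differ from run to run. This is an illustrative sketch under stated assumptions, not the Minitab program itself: the function name simulate_sample_means and the seed value are hypothetical, and the list means plays the role of column C2.

```python
import random
import statistics

def simulate_sample_means(reps=400, n=25, mu=8.0, sigma=5.0, seed=None):
    """Draw reps samples of size n from Normal(mu, sigma) and return their means."""
    rng = random.Random(seed)   # fix the seed to make a run repeatable
    means = []                  # plays the role of column C2
    for _ in range(reps):       # each pass corresponds to one run of sampmean.mtb
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # column C1
        means.append(statistics.mean(sample))
    return means

means = simulate_sample_means(seed=2024)
# Mean and standard deviation of the 400 simulated sample means.
print(statistics.mean(means), statistics.stdev(means))
```

The mean of the sample means should be near 8, and their standard deviation near 5/√25 = 1, with small differences from one run to the next.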
[Figure: Histogram (with Normal Curve) of sample means. Horizontal axis: sample means; vertical axis: Frequency. Legend: Mean 8.087, StDev 1.010, N 400.]
Section 9.7
1. Areas and probabilities for Student’s t-distribution
The area under the curve up to a given value k is P(t ≤ k). To calculate an area under the
curve for the Student’s t-distribution, select Calc > Probability Distributions > t, click
the option Cumulative Probability, and enter the number of Degrees of freedom and the
value of k as the Input constant. Note that in the most recent version of Minitab there is a
Noncentrality parameter dialog box. This should be set to 0, which is the default value.
To find P(t ≤ 0.34), enter 24 for the Degrees of freedom and 0.34 in the Input constant
dialog box.
The output is:
Cumulative Distribution Function
Student's t distribution with 24 DF
   x  P( X <= x )
0.34     0.631593
If we are interested in finding P(t > 0.34), as in Example 9.8, we need to calculate the
area under the curve to the right of the value 0.34.
P(t > 0.34) = 1 - P(t ≤ 0.34) = 1 - 0.631593 = 0.368407. (See Figure 9.12 in the textbook.)
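As a cross-check outside Minitab, the cumulative probability can be approximated in Python by integrating the t density numerically. This is a sketch under stated assumptions, not how Minitab computes the value: the function names t_pdf and t_cdf are hypothetical, and Simpson's rule with 2000 steps is just one convenient choice of integration method.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(k, df, steps=2000):
    """Approximate P(t <= k): 0.5 plus or minus the area between 0 and |k|.

    Uses the symmetry of the t distribution about 0 and Simpson's rule;
    steps must be even.
    """
    a = abs(k)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if k >= 0 else 0.5 - area

left = t_cdf(0.34, 24)   # P(t <= 0.34) with 24 degrees of freedom
right = 1 - left         # P(t > 0.34)
print(left, right)
```

The two printed values should agree closely with the Minitab results 0.631593 and 0.368407.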
(Alternatively) Use Graph > Probability Distribution Plot > View Probability and
select t distribution with 24 degrees of freedom.
Select Shaded Area, X value, and for X-value: enter .34
To find P(t ≤ 0.34) select Left Tail.
[Figure: Distribution Plot, T, df=24. Area of 0.632 shaded to the left of X = 0.34; vertical axis: Density.]
For P(t > .34) select Right Tail.
[Figure: Distribution Plot, T, df=24. Area of 0.368 shaded to the right of X = 0.34; vertical axis: Density.]
2. Finding the t-value corresponding to a given area
To find the t-value (k) corresponding to a specified area, such as the value k for
which P(t < k) = 0.975, use Calc > Probability Distributions > t from the menu.
In the t-distribution window, select Inverse cumulative probability, and enter the Degrees
of freedom and the area (probability) as the Input constant. (The
Noncentrality parameter is 0.)
For example, enter 24 for the Degrees of freedom and 0.975 for the Input
constant.
The output is:
Inverse Cumulative Distribution Function
Student's t distribution with 24 DF
P( X <= x )        x
      0.975  2.06390
So P(t ≤ 2.0639) = 0.975 and P(t > 2.0639) = 0.025. Since the t-distribution is
symmetric, P(-2.0639 ≤ t ≤ 2.0639) = 0.95. (The value 2.0639 is rounded to 2.06 in the
book.)
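The inverse calculation can also be sketched in Python: search for the k whose cumulative probability equals the target area. This is an illustrative sketch under stated assumptions, not Minitab's algorithm: the function names t_pdf, t_cdf, and t_inverse_cdf are hypothetical, the cumulative probability is approximated with Simpson's rule, and the search assumes the answer lies between -50 and 50.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(k, df, steps=2000):
    """Approximate P(t <= k) by Simpson's rule on [0, |k|]; steps must be even."""
    a = abs(k)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if k >= 0 else 0.5 - area

def t_inverse_cdf(prob, df, lo=-50.0, hi=50.0, tol=1e-8):
    """Find k with P(t <= k) = prob by bisection, assuming lo <= k <= hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid   # the target lies to the right of mid
        else:
            hi = mid   # the target lies at or to the left of mid
    return (lo + hi) / 2

k = t_inverse_cdf(0.975, 24)
# k, then the central area P(-k <= t <= k).
print(k, t_cdf(k, 24) - t_cdf(-k, 24))
```

The value of k should be close to the 2.0639 reported by Minitab, and the central area close to 0.95.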
(Alternatively) Use Graph > Probability Distribution Plot > View Probability and
select t distribution with 24 degrees of freedom and then select Shaded Area. Choose
Probability, Left Tail and enter .975 in the dialog box.
Click OK to get the result.
[Figure: Distribution Plot, T, df=24. Area of 0.975 shaded to the left of X = 2.06; vertical axis: Density.]