CHAPTER 9

Section 9.2 Simulating the sampling distribution for a sample proportion (Example 9.2)

Figure 9.2 displays a histogram of the simulated values of the sample proportion (p̂) from 400 random samples, each based on n = 2400 and p = 0.40. Simulation can be used to get an idea of the shape, center, and spread of the sampling distribution of a statistic. We will illustrate how to simulate your own random samples in Minitab. The material is presented in 4 parts:

1. Understanding the commands to be used in the program.
2. Preparing the program.
3. Executing the program.
4. Analyzing the simulation results.

You can go directly to Part 2 if you prefer.

Part 1. Understanding the commands to be used in the program

If the MTB> prompt is not displayed in the session window, place the cursor in the session window and select Editor > Enable Commands from the menu. When the MTB> prompt is present, Minitab shows in the session window the commands used to generate the output. In other words, every action that we get Minitab to do by 'point and click' also produces the associated commands in the session window.

In Example 9.2 each voter in the sample is asked whether they favor Candidate X for President. The voter can respond YES, coded as 1, or NO, coded as 0. We are going to generate 2400 values, each either 0 or 1. These values represent the responses of 2400 voters. We assume that the probability of getting a 'YES' is 0.4 because 40% of the population (voters) favor Candidate X.

From the menu, select Calc > Random Data > Bernoulli and fill in the dialog box so that 2400 rows of data are generated and stored in C1 with an event probability of 0.4. After clicking OK, a column of zeros and ones will appear in the first column of the worksheet and the following commands will appear in the session window:

    MTB > Random 2400 c1;
    SUBC> Bernoulli 0.4.

We need to calculate the proportion of (simulated) voters in the sample who favor Candidate X.
So we need to count the number of 1's (YES) and divide that number by 2400 (the sample size). From the menu, select Calc > Column Statistics and then indicate that we want to calculate the Sum of the values in C1. The following command will appear in the session window:

    MTB > Sum C1

The sum of the values in C1 is the simulated number of voters who favor Candidate X, so the corresponding proportion is sum(c1)/2400.

Part 2. Preparing the program

We want the program to do the following:

a. Select a random sample from a population where the probability of answering 'yes' is 0.4.
b. Calculate the proportion of people in the sample who answer 'yes' and store the sample proportion in a cell of a column.
c. Repeat actions 'a' and 'b' many times.

Next, the commands seen in Part 1 will be put together to form the program. In each execution of the program the following actions will be taken:

A sample of size 2400 will be generated and stored in C1.
The proportion of voters in the sample who vote for X will be stored in a cell of C2.

The commands in the following program perform these tasks. We will use k1 to label the cell of C2 where the current sample proportion will be stored. After each run of the program, k1 needs to be incremented by 1 in order to go to the next cell of C2.

    Random 2400 c1;
    Bernoulli 0.4.
    let c2(k1)=sum(c1)/2400
    let k1=k1+1

This program needs to be typed using an editor and saved with the file extension mtb. The easiest option is to use Notepad (this program comes with Windows and can be found in the Windows menu by clicking on Start, All Programs and Accessories). Type the commands in Notepad and save the file; make sure the extension is mtb. We will save the program under the name distphat.mtb. A very common problem is that the file gets saved with another extension, such as txt, and then it cannot be run in Minitab. To avoid this, make sure you choose the option 'all files' instead of 'text file' when saving the file.
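For readers who want to cross-check the logic of the program above outside Minitab, here is a minimal Python sketch of the same loop. It is not part of the Minitab workflow; the function name simulate_sample_proportions and its parameters are hypothetical names chosen for illustration. Each pass generates 2400 zeros and ones with probability 0.4 of a one (the role of C1) and appends the proportion of ones to a list (the role of C2).

```python
import random
import statistics


def simulate_sample_proportions(num_samples=400, n=2400, p=0.4, seed=None):
    """Return num_samples simulated sample proportions (hypothetical helper).

    The returned list plays the role of column C2 in the Minitab program.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Analogue of: Random 2400 c1; Bernoulli 0.4.
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        # Analogue of: let c2(k1)=sum(c1)/2400
        proportions.append(sum(sample) / n)
    return proportions


if __name__ == "__main__":
    p_hats = simulate_sample_proportions(seed=1)
    print("number of samples:", len(p_hats))
    print("mean of sample proportions:", statistics.mean(p_hats))
    print("standard deviation:", statistics.stdev(p_hats))
```

The list index takes over the job of the counter k1, so no explicit increment is needed. The random values will differ from those produced by Minitab, so only the overall shape, center, and spread should be compared.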
Note the location of the saved file, since you will need to provide this information to execute the program.

Part 3. Executing the program

Assume the program has been saved with the filename distphat.mtb on drive C. Since we need to start with the first sample and k1 is the counter for the samples, at the MTB> prompt type

    MTB > let k1=1

The program will be repeated 400 times (an arbitrary number chosen in Example 9.2). From the menu, select File > Other Files > Run an Exec. Indicate the number of times the program needs to be executed (400) and choose Select File. Use the window that appears to locate the file distphat.mtb. Use the mouse to select the file and click on Open (another option is to double-click on the name of the file).

(Alternatively) Executing the program by typing a command. First make sure that the MTB> prompt is displayed in the session window. At the prompt type

    Execute 'C:\distphat.mtb' 400

Part 4. Analyzing the simulation results

The shape of the distribution of the simulated sample proportions can be observed through a histogram. Measures of the center and spread of the distribution, such as the mean and standard deviation, can also be found. Selecting Stat > Basic Statistics > Display Descriptive Statistics from the menu will give us a histogram with the superimposed normal curve, as in Figure 9.2, along with the mean and standard deviation of the sample proportions. In the Variables box select C2 (sample proportion) and click Graphs. Clicking on Graphs produces more options. Select Histogram of data, with normal curve and then click OK.

Do not be surprised if your histogram looks a little different from the one in Figure 9.2; your 400 samples might be different from the 400 samples in the book.

[Figure: Histogram (with Normal Curve) of sample proportion. Legend: Mean 0.4000, StDev 0.01027, N 400. Horizontal axis: sample proportion; vertical axis: Frequency.]

The basic statistics will also be displayed.
Your results will most likely be different from ours.

Descriptive Statistics: sample proportion

Variable          N    N*  Mean     SE Mean   StDev    Minimum
sample proportio  400  0   0.40002  0.000514  0.01027  0.37208

Variable          Q1       Median   Q3       Maximum
sample proportio  0.39250  0.40000  0.40750  0.42542

Note that the mean of the sample proportions is very close to 0.4 (the population proportion). The standard deviation is 0.01027, close to 0.01, the theoretical value for the standard deviation of the sample proportion displayed in the book: sqrt(p(1 - p)/n) = sqrt(0.4 × 0.6 / 2400) = 0.01.

Section 9.3 Simulating the sampling distribution for a sample mean

In Example 9.4, 400 sample means, each based on a sample of size 25, are simulated from a hypothetical normal population (weight loss in pounds) with mean 8 and standard deviation 5. Figure 9.5 displays a histogram and superimposed normal curve of these 400 simulated sample means. We will illustrate how to simulate sample means in Minitab.

You need to write your own program to do the simulation, but first you need to get familiar with the commands that will be needed. Get the MTB> prompt to appear in the session window by placing the cursor in the session window and selecting Editor > Enable Commands. Select Calc > Random Data > Normal and indicate that you want to generate 25 values with a mean of 8 and a standard deviation of 5. Store the results in C1. The commands that appear in the session window are:

    MTB > Random 25 c1;
    SUBC> Normal 8 5.

The program for generating sample means is very similar to the program that was written for simulating sample proportions. Use Notepad or another editor to type the following program:

    Random 25 c1;
    Normal 8 5.
    let c2(k1)=mean(c1)
    let k1=k1+1

Save the program with a name related to the purpose of the program, such as sampmean.mtb.
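As a rough cross-check of what the program above computes, the same loop can be sketched in Python. This is only an illustration outside Minitab; the function name simulate_sample_means and its parameters are hypothetical. Each pass draws 25 values from a normal population with mean 8 and standard deviation 5 and stores their mean. In theory the standard deviation of such sample means is 5/sqrt(25) = 1.

```python
import random
import statistics


def simulate_sample_means(num_samples=400, n=25, mu=8.0, sigma=5.0, seed=None):
    """Return num_samples simulated sample means (hypothetical helper).

    The returned list plays the role of column C2 in the Minitab program.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Analogue of: Random 25 c1; Normal 8 5.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Analogue of: let c2(k1)=mean(c1)
        means.append(statistics.mean(sample))
    return means


if __name__ == "__main__":
    sample_means = simulate_sample_means(seed=1)
    print("number of sample means:", len(sample_means))
    print("mean of sample means:", statistics.mean(sample_means))
    print("standard deviation:", statistics.stdev(sample_means))
```

The simulated values will not match Minitab's, since the random number generators differ, but the mean should land near 8 and the standard deviation near 1.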
In the session window of Minitab, start the counter by typing after the prompt

    MTB > Let k1=1

and then execute the program (assumed here to be saved on drive C):

    MTB > Execute 'C:\sampmean.mtb' 400

Or, after typing Let k1=1, select File > Other Files > Run an Exec from the menu, enter 400, and then click Select File. Enter the file name sampmean in the dialog box and then click Open.

The program will be executed 400 times. At the end, you will have 400 sample means stored in column C2. Name the column sample means. Select Stat > Basic Statistics > Display Descriptive Statistics and then click on Graphs to get the histogram with the superimposed normal curve. The histogram obtained might look a little different from the one in the book because most likely the 400 sample means are different.

[Figure: Histogram (with Normal Curve) of sample means. Legend: Mean 8.087, StDev 1.010, N 400. Horizontal axis: sample means; vertical axis: Frequency.]

Section 9.7

1. Areas and probabilities for Student's t-distribution

The area under the curve up to a given value k is P(t ≤ k). To calculate an area under the curve for the Student's t-distribution, select Calc > Probability Distributions > t, click the option Cumulative probability, and indicate the number of Degrees of freedom and the value of k in Input constant. Note that in the most recent version of Minitab there is a Noncentrality parameter dialog box. This should be set to 0, which is the default value.

To find P(t ≤ 0.34), enter 24 for the Degrees of freedom and 0.34 in the Input constant dialog box. The output is:

Cumulative Distribution Function

Student's t distribution with 24 DF

   x  P( X <= x )
0.34     0.631593

If we are interested in finding P(t > 0.34), as in Example 9.8, we need to calculate the area under the curve to the right of the value 0.34:

P(t > 0.34) = 1 - P(t ≤ 0.34) = 1 - 0.631593 = 0.368407. (See Figure 9.12 in the textbook.)

(Alternatively) Use Graph > Probability Distribution Plot > View Probability and select t distribution with 24 degrees of freedom.
Select Shaded Area, X value, and for X-value enter .34.

To find P(t ≤ 0.34) select Left Tail.

[Figure: Distribution Plot, T, df=24. Shaded area to the left of X = 0.34: 0.632. Vertical axis: Density.]

For P(t > .34) select Right Tail.

[Figure: Distribution Plot, T, df=24. Shaded area to the right of X = 0.34: 0.368. Vertical axis: Density.]

2. Finding the t-value corresponding to a given area

To find a t-value (k) corresponding to a specified area, such as finding the value k such that P(t < k) = 0.975, use Calc > Probability Distributions > t from the menu. In the t-distribution window, select Inverse cumulative probability, and indicate the Degrees of freedom and the area (probability) in the dialog box of Input constant. (The Noncentrality parameter is 0.) For example, enter 24 for the Degrees of freedom and 0.975 in the dialog box of Input constant. The output is:

Inverse Cumulative Distribution Function

Student's t distribution with 24 DF

P( X <= x )        x
      0.975  2.06390

So P(t ≤ 2.0639) = 0.975 and P(t > 2.0639) = 0.025. Since the t-distribution is symmetric, P(-2.0639 ≤ t ≤ 2.0639) = 0.95. (The value 2.0639 is rounded to 2.06 in the book.)

(Alternatively) Use Graph > Probability Distribution Plot > View Probability and select t distribution with 24 degrees of freedom and then select Shaded Area. Choose Probability, Left Tail and enter .975 in the dialog box. Click OK to get the result.

[Figure: Distribution Plot, T, df=24. Shaded area 0.975 to the left of X = 2.06. Vertical axis: Density.]
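The two calculations in this section can be cross-checked outside Minitab with a short Python sketch. It is not how Minitab computes these values: it integrates the t density numerically with Simpson's rule and inverts the cumulative probability by bisection. The function names t_pdf, t_cdf, and t_inverse_cdf are hypothetical, and the noncentrality parameter is assumed to be 0.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(t <= x), using symmetry plus Simpson's rule on [0, |x|].

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(prob, df, lo=-50.0, hi=50.0, tol=1e-8):
    """Value k with P(t <= k) = prob, found by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    left = t_cdf(0.34, 24)
    print("P(t <= 0.34):", left)      # Minitab reports 0.631593
    print("P(t > 0.34):", 1 - left)   # the text computes 0.368407
    print("k with P(t <= k) = 0.975:", t_inverse_cdf(0.975, 24))  # Minitab reports 2.06390
```

Bisection is used for the inverse because the cumulative probability increases with k, so halving the search interval always keeps the answer inside it. The default search interval of -50 to 50 is an assumption that covers the probabilities used in this section.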