Beyond the Formula II
MCC June 18-19, 1998
Robert Johnson
Page 1 of 2
[AllSesn.doc]
The Computer: A Teaching Tool / A Learning Tool
The computer can be a very powerful and effective teaching/learning tool. A carefully planned
computer activity or demonstration, which includes student interaction, can and will bring life to
your statistics lessons. In order to be effective, a computer activity must start from familiar
territory, and build on information already known and understood. The computer classroom
activities being demonstrated (as part of this conference) start in class; each student receives
a printed copy of the activity, is expected to write notes on it, and should be able to complete
most of the computer activity during class. Anything unfinished, at the end of class, is to be
finished outside of class. When the activity is to result in a written report (to be turned in), the
printed copy they receive in class is their rough draft. After completing the activity, they
retrieve an electronic copy from the network, using a word processor, type in their results
and/or answers, and print the final copy that is turned in.
The recommended pattern of development used for a computer activity lesson is:
1. Individual (small amount of data),
2. Work group or Class (medium amount of data),
3. Computer simulation (large amount of data),
4. Written report.
Note: The initial parts of the activity should be organized in the same manner as the
computer part so the student will not become overwhelmed by the sudden changes
when the large amounts of data occur in the simulations.
Generally, computer activity lessons should do more than just teach a particular topic. In
addition to teaching the statistics lesson included in the activity, the activity should: 1) set up
later lessons, 2) stimulate the student’s inquisitive nature, 3) teach ‘new’ computer
commands, 4) teach computer ‘short-cuts’ and anything else you can pack into it.
I do not teach computer programming! The computer activities and computer assignments
are written to demonstrate the features of Minitab. Once students have mastered the basic
techniques, I expect them (and tell them so) to “adventure” on their own. That is how
everybody I know learns to use computer software. I strongly suggest that you “try out one new
command every time you start a program,” whether it be Minitab, Word, or whatever. Have a
pad of small Post-Its handy; make a note and stick it alongside the monitor or inside the back
page of your notebook. After you use a command a few times, you will have learned it,
painlessly.
The three computer activities that I am using for demonstrations are on the network’s M: drive.
You may copy the files onto your conference disk, if you so desire.
Session 1
“What 30 Data Can Tell About Thousands”
“How can you tell so much about a data set of thousands, when you only have so few data in your
sample?”
A frequent question asked by beginning statistics students or by people who read/hear
statistical information in the news, has to do with “how can so much be learned from so few
pieces of data?” This activity is one that I have used early in the semester to answer that
question, and to set the stage for several things to be studied throughout the course.
A large data set, with some distinctive characteristics, is randomly generated using the
computer. This large data set can be generated using any theoretical distribution or
combinations of distributions. Use of the computer allows us to see many samples and see how
they each relate to the sampled data set.
This lesson opens the way for future lessons about interpreting the variability in histograms,
numerical statistics and the idea of sampling distributions.
Taken from: Exercises 2.188-190 on page 127 of Elementary Statistics, 7th.
Session 4
“The Chances of Two Boys”
“Using simulation to show that relative frequencies of real life events agree with the probabilities
obtained from a sample space.”
Are your students having trouble with those “simple” sample space probability problems? You
say they’re having trouble with the Addition and Multiplication rules and knowing when to use
them?
Did you ever wonder how the relative frequency of an event in real life compared to the “sample
space” probability of that same event?
Using the computer we can: quickly and easily simulate real life events, calculate relative
frequencies of specific events, look at probability from an everyday point of view to determine
the probability of specific events, or test the truth of the sample space results.
Session 5
“How Important Is the Assumption?”
The meaning and importance of the assumption, “The sampled population is normally distributed” is
easily demonstrated by simulation.
What if the sampled population is not normal? Then what? How does the loss of normality
affect the results of a statistical procedure? Is the effect the same on all statistical tests?
The computer allows us to use simulations to look at concepts that are otherwise impossible to
deal with in an introductory statistics course. These simulations can be very informative for both
the students and the instructors.
Taken from: Exercise 9.42 on page 453 of Elementary Statistics, 7th.
Session 1
“What 30 Data Can Tell About Thousands”
[Elementary Statistics, 7th ed., Johnson, Duxbury Press, pg. 127, Exercise 2.188, 189, 190]
General question to be investigated:
How well does a random sample of 30 data represent a data set of several thousand?
Overview:
A. Create a large set of data.
1. Use the computer to generate the 5000 data.
2. Construct the histogram of the data and calculate several numerical statistics.
3. Describe the data set using ideas of shape, center, and spread.
B. Draw a random sample of 30 data from the large set generated in A.
4. Construct the histogram and find the same numerical statistics for the sample.
5. Describe the sample of data; shape, center, and spread.
C. Repeat Part B five (5) times.
6. Describe what appears to be the “typical” sample.
7. Compare the sample of 30 to the large sampled data set of 5000.
D. Conclusions
8. Write a paragraph answering the question, “How well does a random sample of 30 data
represent a data set of several thousand?”
E. Repeat parts A - D using a different large data set.
Start Minitab
Start > Programs > Departmental Applications > Mathematics > Minitab 12 for Windows > Minitab 12
PART A - A LARGE DATA SET
1. Generate 5000 data from the theoretical normal distribution with mean 100, standard deviation 20.
Calc > Random Data > Normal ...
Enter: Generate: 5000
Store in: C1
Mean: 100
Standard deviation: 20
> OK
2a. Construct a histogram of the 5000 data
Graph > Histogram ...
Enter: Graph variables: C1
Select: Options ...
Select: Type of Histogram: Frequency
Type of Interval: Cutpoint
Definition of Intervals:
Midpoint/Cutpoint positions: 20:180/5 > OK
Select: Annotation > Footnote ...
Enter: “Your Name”
> OK > OK
Optional:
Print the histogram:
Ctrl P > OK
2b. Inspect the histogram. Does the data seem to be normally distributed? (Y/N) Write a paragraph of
25+ words stating specific evidence seen in the histogram to support your answer.
2c. Find several numerical statistics describing the 5000 data and compare the normal distribution
curve to the histogram.
Stat > Basic Statistics > Display Descriptive Statistics ...
Enter: Variable: C1
Select: Graphs ...
Select: Histogram of data with normal curve
> OK > OK
Optional:
Print the histogram:
Ctrl P > OK
3. Does the normal curve seem to follow the distribution of the 5000 data? (Y/N) Write a paragraph
(50+ words) stating specific reasons why the numerical statistics and the normal curve demonstrate
that the 5000 data have a normal distribution.
PART B - A SAMPLE
Let us now consider the 5000 data in C1 to be an unknown data set. Further suppose that because of
some circumstances beyond our control, we can only access a sample of 30 data to be used to
describe the entire set of 5000 values.
4a. Randomly select a sample of 30 data from the 5000 data in C1.
Calc > Random Data > Sample from Columns ...
Enter: Sample: 30 rows from columns: C1
Store sample in: C2
Select: Sample with replacement > OK
4b. Find numerical statistics and construct a histogram of the 30 sample data.
Stat > Basic Statistics > Display Descriptive Statistics ...
Enter: Variable: C2
Select: Graphs ...
Select: Histogram of data with normal curve
> OK > OK
Optional:
Print the histogram:
Ctrl P > OK
5. Describe the sample of data; shape, center and spread. Write a paragraph (50+ words) stating
specific reasons why the numerical statistics, the histogram, and the normal curve demonstrate that
the 30 data have a normal distribution.
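For readers working outside Minitab, Parts A and B can be sketched in a few lines of Python. This is not part of the original activity: the seed, the variable names, and the `describe` helper are assumptions made for illustration, and only Python's standard library is used.

```python
import random
import statistics

random.seed(1998)  # arbitrary seed (an assumption) so the run is repeatable

# Part A: 5000 values from a normal distribution with mean 100, sd 20
population = [random.gauss(100, 20) for _ in range(5000)]

# Part B: 30 values drawn with replacement, mirroring "Sample with replacement"
sample = random.choices(population, k=30)

def describe(values):
    """Return (n, mean, median, standard deviation) for a list of numbers."""
    return (len(values), statistics.mean(values),
            statistics.median(values), statistics.stdev(values))

pop_stats = describe(population)
sample_stats = describe(sample)
print("population:", pop_stats)
print("sample:    ", sample_stats)
```

Running the last few lines again with a fresh `random.choices` call plays the same role as repeating the Sample and Describe commands.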
PART C - ADDITIONAL SAMPLES
6a. If you could randomly select another sample of 30 data, describe the distribution that you would
anticipate with regards to shape, center and spread. (25+ words)
6b. Repeat the instructions for 4a and 4b using copy and paste techniques.
Scroll up the Session window until you find the following block of commands:
MTB > Sample 30 C1 c2;
SUBC> Replace.
MTB > Describe C2;
SUBC> GNHist.
Highlight these commands
Copy [Ctrl C]
Go to MTB > prompt [Ctrl End]
Paste [Ctrl V]
Press: Enter
Optional:
Print the histogram: Ctrl P > OK
6c. Repeat (6b) three more times by returning to Session window and using:
Ctrl V; Enter
6d. To view all graphs at one time,
Window > Manage Graphs ... >
Highlight all graphs listed (click & drag across list & release)
Select: Tile > Done
6e. Use Ctrl M to return to Session window; use highlighting, deleting, copy and paste techniques to
organize the output from 2c, 4b, 6b & 6c into a table for easier comparison. Delete the SE Mean.
Descriptive Statistics
   N     Mean   Median   TrMean   StDev   Minimum   Maximum       Q1        Q3
5000   99.647  100.027   99.693   19.78    31.047   168.369   86.329   113.249
  30   100.87    99.85   100.66   20.32     53.70    148.63    88.05    111.91
  30    97.49    98.84    98.07   19.34     51.49    135.16    86.35    111.23
6f. Based on the results from #4 and # 6 (a) through (e), describe the “typical” sample of 30 data;
shape, center, and spread. (50+ words)
7. Compare the sample described in # 6(f) to the data set in Part A #2(c). Be specific about the
characteristics of shape, center and spread. (50+ words)
PART D - CONCLUSION
8. Write a paragraph answering the question, “How well does a random sample of 30 data represent a
data set of several thousand?” (50+ words) Be convincing.
PART E - A DIFFERENT DATA SET
9. Repeat Parts A through D using a different data set. Generate 5000 bimodal data by combining 3000 from
the theoretical normal distribution with mean 100, standard deviation 20 with 2000 from a normal distribution
with mean 160, standard deviation 10.
Calc > Random Data > Normal ...
Enter: Generate: 3000
Store in: C1
Mean: 100
Standard deviation: 20
> OK
Calc > Random Data > Normal ...
Enter: Generate: 2000
Store in: C2
Mean: 160
Standard deviation: 10
> OK
To combine the two sets, we must “Stack” them.
Manip > Stack/Unstack > Stack Columns ...
Enter: Stack the following columns: C1 C2
Store the stacked data in: C3
> OK
Graph > Histogram ...
Enter: Graph variables: C3
Select: Options ...
Select: Type of Histogram: Frequency
Type of Interval: Cutpoint
Definition of Intervals:
Midpoint/Cutpoint positions: 20:200/5 > OK > OK
To select the random samples and describe the sample, use the same block of commands as used in 6b, except
the column numbers must be changed.
MTB > Sample 30 C3 C4;
SUBC> Replace.
MTB > Describe C4;
SUBC> GNHist.
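The stacking step in Part E can also be sketched in Python. The sketch below is an illustration only, not the handout's method: the seed, the names `stacked` and `class_counts`, and the choice of 20-unit classes (rather than the 5-unit cutpoints used in the Minitab histogram) are assumptions.

```python
import random
import statistics

random.seed(7)  # arbitrary seed (an assumption) for a repeatable run

low = [random.gauss(100, 20) for _ in range(3000)]   # plays the role of C1
high = [random.gauss(160, 10) for _ in range(2000)]  # plays the role of C2
stacked = low + high                                 # plays the role of C3

sample = random.choices(stacked, k=30)               # plays the role of C4

def class_counts(values, start=20, stop=200, width=20):
    """Count values in each class [start, start+width), ..., up to stop."""
    counts = [0] * ((stop - start) // width)
    for v in values:
        if start <= v < stop:
            counts[int((v - start) // width)] += 1
    return counts

pop_counts = class_counts(stacked)
sample_counts = class_counts(sample)
print("5000 data:", pop_counts)
print("sample   :", sample_counts)
print("means    :", statistics.mean(stacked), statistics.mean(sample))
```

Comparing `pop_counts` with `sample_counts` gives a rough feel for how well 30 values can show two humps.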
PART F - A CLASS PROJECT
10. Repeat Parts A through D using different data sets or different sample sizes. For example, assign
a different distribution to each study group, or assign different sample sizes such as 10, 20, and
50. These projects could become individual or group, written or oral reports.
PART G - HELP
11. What is the Tr Mean? Use Minitab’s Help window to find its definition. Describe how the Tr Mean
is calculated.
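Once you have looked up the definition, a small sketch can make the idea of a trimmed mean concrete. The version below assumes a 5% trim from each end of the ordered data, with the number of trimmed values rounded to an integer; confirm that rule against Minitab's Help before relying on it. The function name and the example data are hypothetical.

```python
import statistics

def trimmed_mean(values, trim_fraction=0.05):
    """Mean after dropping trim_fraction of the values from each end.

    The 5% default is an assumption to be checked against Minitab's Help.
    Note that Python's round() sends exact halves to the nearest even integer.
    """
    ordered = sorted(values)
    k = round(len(ordered) * trim_fraction)  # values dropped from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)

data = list(range(1, 20)) + [1000]   # 19 ordinary values and one outlier
print(statistics.mean(data))         # ordinary mean, pulled up by the outlier
print(trimmed_mean(data))            # mean of 2 through 19
```

The point of the comparison is that one extreme value moves the ordinary mean a great deal and the trimmed mean hardly at all.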
Session 4
“The Chances of Two Boys!”
General question to be investigated: Do probabilistic events in real life occur with a relative frequency
that is comparable to the theoretical probability of that same event?
Situation to be studied: A woman and a man (unrelated) each have two children. At least one of the
woman’s children is a boy, and the man’s older child is a boy. Do the chances that the woman has two
boys equal the chances that the man has two boys?
In other words:
a) Find the probability that the woman has two boys, knowing she has at least one boy.
b) Find the probability that the man has two boys, knowing his oldest child is a boy.
Overview:
A. Find the theoretical probabilities of these events.
1. Construct the sample space for family of two children.
2. Reduce the sample space to show at least one boy and determine the P(2 boys)
3. Reduce the sample space to show oldest is a boy and determine the P(2 boys)
B. Survey by simulation using coins.
5. Simulate the gender of two children for 10 families using two coins.
6. Use the results to estimate the P(2 boys) for families of at least one boy.
7. Use the results to estimate the P(2 boys) for families where the oldest is a boy.
C. Survey by simulation using a computer.
9. Simulate the gender of two children for 200 families using Minitab
10. Use the results to estimate the P(2 boys) for families of at least one boy.
11. Use the results to estimate the P(2 boys) for families where the oldest is a boy.
D. Repeat Part C several times, keeping a record of the repeated results.
12. Statistically describe the two sets of observed proportions.
PART A - THE SAMPLE SPACE APPROACH
1. Construct a sample space showing the possible genders for an older and younger child for all
families with 2 children. {Use (older, younger) for ordered pair notation.}
Sample space:
S=
2. Show the reduced sample space satisfying the condition, “the woman has at least one boy,” and
find the probability the woman has 2 boys, knowing she has at least one boy.
Reduced sample space: Sw =
P(2 boys | at least 1 boy) = ______
3. Show the reduced sample space satisfying the condition, “the man’s oldest child is a boy,” and
find the probability the man has 2 boys, knowing his oldest child is a boy.
Reduced sample space: Sm =
P(2 boys | oldest is boy) = ______
4. Do the chances that the woman has two boys equal the chances that the man has two boys?
Explain why the chances are different.
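After completing Part A by hand, the reduced sample spaces can be checked with a short enumeration. This Python sketch is an addition for illustration; it assumes the four (older, younger) outcomes are equally likely, and the helper name `conditional_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) gender pairs for a family of two children
sample_space = list(product("BG", repeat=2))

def conditional_probability(event, condition):
    """P(event | condition), assuming equally likely outcomes."""
    reduced = [family for family in sample_space if condition(family)]
    favorable = [family for family in reduced if event(family)]
    return Fraction(len(favorable), len(reduced))

def two_boys(family):
    return family == ("B", "B")

p_woman = conditional_probability(two_boys, lambda f: "B" in f)     # at least one boy
p_man = conditional_probability(two_boys, lambda f: f[0] == "B")    # oldest is a boy
print(p_woman, p_man)
```

The two conditions keep different numbers of outcomes in the reduced sample space, which is why the two answers differ.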
STOP! Complete this page. Wait until instructed to continue on next page.
PART B - SIMULATION USING TWO COINS
It would be interesting to see if the probabilities found in Part A really occur in life. This would require
that we identify many women with two children, find out which of these have ‘at least one boy,’ and then
determine the relative frequency of ‘2 boys.’ This would be quite difficult. The same would be true
about identifying men who have two children where the oldest is a boy, and so on. So rather than
surveying actual people in a complex situation like this, we simulate such a survey.
5. Toss two coins of different denominations simultaneously, letting the coin of larger value
represent the older child. Record a 1 for a heads, and a 0 for a tails in the table below. Each toss
of the pair of coins will represent one family of two children. Let the 1 represent boy and 0
represent girl. [The choice is arbitrary, however I recommend the 1 represent the event that
is the object of the question; “boy” in this case.]
Family   Oldest   Youngest   n(boys)
1
2
3
4
5
6
7
8
9
10
6. a. How many boys in each family? Complete n(boys) column.
b. How many of the ten families have “at least one boy?” ____
(Cross out the families that do not.)
c. How many of the remaining families have “two boys?” ____
d. With what relative frequency did you observe families having
“2 boys, knowing the family had at least one boy?”
Toss your two coins 10 more times, following the instructions above, recording the results in the
table below:
Family   Oldest   Youngest
1
2
3
4
5
6
7
8
9
10
7. a. How many of the ten families have a “boy for the oldest child?” _____
(Cross out the families that do not.)
b. How many of the remaining families have “two boys?” ____
c. With what relative frequency did you observe families having
“2 boys, knowing the oldest was a boy?”
8. Do the chances that the woman has two boys equal the chances
that the man has two boys? Explain.
STOP! Complete this page. Wait until instructed to continue on next page.
Start Minitab
Start > Programs > Departmental Applications > Mathematics > Minitab 12 for Windows > Minitab 12
PART C - SIMULATION USING COMPUTER
9a. A Bernoulli process (or experiment) is any event that has exactly 2 outcomes, for example: yes or
no, success or failure, on or off, boy or girl. The outcomes are coded 0 or 1. Typically, x = 1
represents the outcome that is of interest (meaning the outcome being discussed, boy in our case).
We will use the Bernoulli probability function to place 0’s and 1’s in columns on Minitab’s
worksheet, constructing tables that show two simulated surveys of 200 families of 2 children, one
for the woman and one for the man in our problem.
9b. Use Minitab to generate the gender codes (1 for boys and 0 for girls) for 200 families of 2 children
for the woman placing the codes in C1 (Oldest Woman) and C2 (Youngest Woman); likewise,
generate 200 families for the man in C3 (Oldest Man) and C4 (Youngest Man).
Calc > Random Data > Bernoulli ...
Enter: Generate: 200 rows of data
Store in: OldW YoungW OldM YoungM
Probability of success: 0.5 > OK
10. a. Find the number of boys in each of the woman’s families.
Calc > Calculator ...
Enter: Store results in variable: ‘n(boys)’
Enter: Expression: OldW + YoungW
> OK
[To enter the expression, click in sequence: the expression window; dbl. click C1; +; dbl. click C2]
[It should now look like expression above.]
10. b. Separate the 200 woman’s families by number of boys in family.
Manip > Stack/Unstack > Unstack One Column ...
Enter: Unstack the data in: n(boys)
Store the unstacked data in: C6 C7 C8
Using subscripts in: n(boys)
> OK
10. c. Determine the proportion of families with 2 boys, knowing there is at least one boy.
Calc > Calculator ...
Enter: Store results in variable: K1
Expression:
COUNT(C8) / (COUNT(C7) + COUNT(C8))
> OK
[The COUNT function is listed under Functions: Statistics; N total.]
At the active MTB > prompt, type: PRINT K1
10. d. Explain the value K1. Describe how it was obtained.
STOP! Complete this page. Wait until instructed to continue on next page.
11. a. Separate the 200 man’s families according to gender of oldest child.
Manip > Stack/Unstack > Unstack One Column ...
Enter: Unstack the data in: YoungM
Store the unstacked data in: C9 C10
Using subscripts in: OldM
> OK
11. b. Determine the proportion of families with 2 boys, knowing the oldest is a boy.
Calc > Calculator ...
Enter: Store results in: K2
Expression:
SUM(C10) / COUNT(C10)
> OK
[The SUM and COUNT functions are listed under Functions: Statistics; Sum and N total.]
At the active MTB > prompt, type: PRINT K2
11. d. Explain the value K2. Describe how it was obtained.
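The logic behind K1 and K2 can be sketched outside Minitab as follows. This is an illustrative Python version, not the handout's procedure: the seed and function names are assumptions, and each family is stored as a pair (oldest, youngest) with 1 for boy and 0 for girl.

```python
import random

random.seed(4)  # arbitrary seed (an assumption) for a repeatable run

def simulate_families(n_families=200, p_boy=0.5):
    """Return a list of (oldest, youngest) pairs coded 1 = boy, 0 = girl."""
    return [(int(random.random() < p_boy), int(random.random() < p_boy))
            for _ in range(n_families)]

def prop_two_boys_given_at_least_one(families):
    """Counterpart of K1: COUNT(C8) / (COUNT(C7) + COUNT(C8))."""
    with_boy = [f for f in families if sum(f) >= 1]
    return sum(1 for f in with_boy if sum(f) == 2) / len(with_boy)

def prop_two_boys_given_oldest_boy(families):
    """Counterpart of K2: SUM(C10) / COUNT(C10)."""
    oldest_boy = [f for f in families if f[0] == 1]
    return sum(f[1] for f in oldest_boy) / len(oldest_boy)

women = simulate_families()
men = simulate_families()
k1 = prop_two_boys_given_at_least_one(women)
k2 = prop_two_boys_given_oldest_boy(men)
print(k1, k2)
```

In both functions the first line reduces the sample space and the second computes a relative frequency within what remains, which is the same two-step reasoning used in Part A.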
PART D - REPEAT THE SIMULATION SEVERAL TIMES
Let’s further demonstrate the truth of the probabilities found in questions (2) and (3) by repeating the
computer simulation of Part C to see if the same or similar values reoccur.
12. a. Use Copy & Paste techniques to run the commands used in (9, 10, 11) at least 20 times.
Scroll up the Session window until you find the following block of commands:
MTB > Name c1='OldestW' c2='YoungW' c3='OldestM' c4='YoungM'
MTB > Random 200 'OldestW' 'YoungW' 'OldestM' 'YoungM';
SUBC> Bernoulli .5.
MTB > Name C5 = 'n(boys)'
MTB > Let 'n(boys)' = OldestW + YoungW
MTB > Unstack 'n(boys)' c6 c7 c8;
SUBC> Subscripts 'n(boys)'.
MTB > Let k1 = COUNT(C8) / (COUNT(C7) + COUNT(C8))
MTB > Print k1
MTB > Unstack 'YoungM' c9 c10;
SUBC> Subscripts 'OldestM'.
MTB > Let k2 = SUM(C10) / COUNT(C10)
MTB > Print k2
[If you have unwanted lines in between any of these commands, highlight and delete the unwanted material.]
Click, drag, and release to highlight the “entire block” of commands shown above
Use Ctrl C to Copy
Use Ctrl End to go to the active (Red) MTB > prompt
Use Ctrl V to Paste
Press Enter
Then use Ctrl V to Paste and press Enter
Then use Ctrl V to Paste and press Enter; continue until you have your 20 sets of
results.
STOP! Complete this page. Wait until instructed to continue on next page.
You may “collect” the two results each time by deleting the “unwanted stuff” in between,
or you may delete all the “unwanted stuff” at the end; your choice.
When you have completed your 20 simulations, your results need to be listed in 2
columns similar to below:
0.34193   0.429907
0.32738   0.524752
0.33962   0.531250
0.27142   0.526316
0.35135   0.428571
.
.
.
Type: READ C11 C12 on the line above your results, and END on the line below the
results, as shown below:
READ C11 C12
0.34193   0.429907
0.32738   0.524752
.
.
.
0.27142   0.526316
0.35135   0.428571
END
To enter values onto the Minitab worksheet, highlight and Copy the “block” including
READ and END; go to the active MTB > prompt, Paste and press Enter.
Check columns C11 and C12 on the worksheet; your results should now be listed there.
12. b. Statistically describe the two sets of observed proportions. [Minitab will do both at one time.]
Stat > Basic Statistics > Display Descriptive Statistics ...
Enter: Variable: C11 C12
Select: Graphs ...
Select: Histogram of data with normal curve
> OK > OK
12. c. Write a paragraph answering the question, How closely do the two sets of simulation results
agree with the two sample space probabilities? (50+ words) Be convincing.
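The repeat-and-collect routine of Part D can be sketched as a loop. The Python below is an illustration under stated assumptions (the seed, the helper name `one_run`, and the list names `c11` and `c12`, chosen to echo the worksheet columns); it is not the handout's method.

```python
import random
import statistics

random.seed(12)  # arbitrary seed (an assumption) for a repeatable run

def one_run(n_families=200):
    """Return (k1, k2) for one simulated pair of 200-family surveys."""
    women = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(n_families)]
    men = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(n_families)]
    at_least_one = [f for f in women if sum(f) >= 1]
    k1 = sum(1 for f in at_least_one if sum(f) == 2) / len(at_least_one)
    oldest_boy = [f for f in men if f[0] == 1]
    k2 = sum(f[1] for f in oldest_boy) / len(oldest_boy)
    return k1, k2

runs = [one_run() for _ in range(20)]
c11 = [k1 for k1, _ in runs]   # 20 observed proportions for the woman
c12 = [k2 for _, k2 in runs]   # 20 observed proportions for the man

for name, column in (("C11", c11), ("C12", c12)):
    print(name, statistics.mean(column), statistics.stdev(column),
          min(column), max(column))
```

The printed mean and spread for each column are the raw material for the paragraph requested in 12c.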
STOP! Complete this page. Wait until instructed to continue on next page.
Session 5
“How Important Is the Assumption?”
General question to be investigated:
How important is the assumption “the sampled population is normally distributed” to the use of the Chi-Square distribution for statistical procedures about one standard deviation?
[Elementary Statistics, 7th ed., Johnson, Duxbury Press, pg. 484, Exercise 9.112]
Overview:
A. Generate a random sample of 5000 values from a non-normal probability distribution.
1. Construct the histogram of the 5000 data.
2. Visually inspect the distribution to verify “non-normally distributed data.”
B. Draw a random sample of 10 from the non-normal theoretical distribution.
3. Calculate the standard deviation for the sample.
4. Calculate the chi-square statistic $\chi^{2*} = (n-1)s^2/\sigma^2$.
C. Draw 100 random samples of 10 each from the non-normal theoretical distribution.
5. Calculate the standard deviation for each of the 100 samples.
6. Calculate the chi-square statistic $\chi^{2*} = (n-1)s^2/\sigma^2$ for each of the 100 samples.
D. Compare the empirical results to chi-square distribution with 9 degrees of freedom.
E. Repeat parts A - D sampling from a normal probability distribution.
F. Conclusions
Start Minitab
Start > Programs > Departmental Applications > Mathematics > Minitab 12 for Windows > Minitab 12
PART A - THE DATA BEING GENERATED
We will start by sampling a non-normal theoretical distribution. Let’s check the data produced by the
random number generator, “Does the data appear to be normally distributed?”
1a. Generate 5000 data from the exponential distribution with mean 100 and store in column C1.
Calc > Random Data > Exponential ...
Enter: Generate: 5000 rows of data
Store in column(s): ‘data’
Mean: 100
> OK
[Use of single quotes names the next available column.]
1b. Construct a histogram of the 5000 data
Graph > Histogram ...
Enter: Graph variables: C1 data
[Double click on variable or column name to enter it.]
Select: Options ...
Select: Type of Histogram: Frequency
Type of Interval: Cutpoint
Definition of Intervals:
Midpoint/Cutpoint positions: 0:700/50 > OK
[Read, “from 0 to 700 by 50’s”]
Select: Annotation > Title ...
Enter: 5000 Non-Normal Data
> OK
Select: Annotation > Footnote ...
Enter: “Your Name”
> OK > OK
2. Inspect the histogram. Does the data seem to be normally distributed? (Y/N) Write a
paragraph of 25+ words stating specific evidence seen in the histogram to support your answer.
Optional:
Print the histogram: Ctrl P > OK
STOP! Complete this page. Wait until instructed to continue on next page.
{Attend P. Kuby’s Section 6.2 to experience the graph editor.}
PART B - ONE SAMPLE
3a. Randomly select one sample of 10 data from the exponential distribution with mean 100 and place
it in row 1 of columns C2 through C11.
Calc > Random Data > Exponential ...
Enter: Generate: 1
Store in: C2 - C11
Mean: 100
> OK
Inspect your sample data.
3b. Calculate the standard deviation for the sample in row 1 of C2 - C11.
Calc > Row Statistics ...
Select: Statistic: Standard deviation
Enter: Input variables: C2 - C11
[Double click C2; type - ; double click C11]
Enter: Store results in: ‘stdev’ > OK
[Use of single quotes names the next available column.]
View the sample standard deviation; it’s in row 1 of C12.
4a. Find the calculated chi-square, $\chi^{2*} = (n-1)s^2/\sigma^2$, for sample in row 1, C2 thru C11 and put in
C13. [Note: The standard deviation for this theoretical distribution is 100.]
Calc > Calculator ...
[Use Shift 8 for *]
Enter: Store results in variable: ‘chisq*’
Enter: Expression: 9 * stdev ** 2 / 100 ** 2
> OK
[To enter the expression, click or type in sequence: the expression window; 9; * (multiply); dbl. click on
C12; **(power); 2; / (divide); 100; **(power); 2 It should now look like expression above.]
4b. View the standard deviation and chisq*; they’re in row 1 of C12 and C13. Write a paragraph of 25+
words describing what has occurred in 3a, 3b and 4a.
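The arithmetic of 3a through 4a can be sketched in Python. This is an illustration, not the handout's procedure: the seed and variable names are assumptions. Note that Python's `random.expovariate` takes the rate, which is 1 divided by the mean.

```python
import random
import statistics

random.seed(5)  # arbitrary seed (an assumption) for a repeatable run

sigma = 100  # standard deviation of an exponential distribution with mean 100

# 3a: one sample of 10 values from the exponential distribution, mean 100
sample = [random.expovariate(1 / 100) for _ in range(10)]

# 3b: sample standard deviation (n - 1 in the denominator)
s = statistics.stdev(sample)

# 4a: calculated chi-square, (n - 1) s^2 / sigma^2, the same as 9 * stdev**2 / 100**2
chisq_star = (len(sample) - 1) * s**2 / sigma**2
print(s, chisq_star)
```

A single value like this says little by itself; its meaning comes from seeing how many such values behave, which is the purpose of Part C.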
STOP! Complete this page. Wait until instructed to continue on next page.
PART C - ONE HUNDRED SAMPLES
Questions 5a, 5b, and 6a basically repeat questions 3a, 3b, and 4a; therefore they can all be
completed very quickly by following the procedure described below question 6a:
5a. Randomly select 100 samples of 10 data each from the exponential distribution with mean 100
and place them in rows 1 through 100 of columns C2 through C11.
5b. Find the standard deviation for each of the 100 samples, rows 1-100 of C2 - C11.
6a. Find the calculated chi-square, $\chi^{2*} = (n-1)s^2/\sigma^2$, for each of the 100 samples in rows 1-100
of C2 thru C11 and put in C13.
Scroll up the Session window until you find the following block of commands:
MTB > Random 1 c2-c11;
SUBC> Exponential 100.
MTB > Name c12 = 'stdev'
MTB > RStDev C2-C11 'stdev'.
MTB > Name C13 = 'chisq*'
MTB > Let 'chisq*' = 9 * stdev**2 / 100**2
First type in two “zeroes” to change the 1 in the first command to 100.
The first line should now read:
MTB > Random 100 c2-c11;
[If you have unwanted lines in between any of these commands, highlight and delete the unwanted material.]
Click, drag, and release to highlight the “entire block” of commands shown above
Use Ctrl C to Copy
Use Ctrl End to go to the active (Red) MTB > prompt
Use Ctrl V to Paste
Press Enter
6b. Use the histogram command to express the relative frequency distribution of the 100 observed
chisq* values.
Graph > Histogram ...
Enter: Graph variables: chisq*
Select: Options ...
Select: Type of Histogram: Frequency
Type of Interval: Cutpoint
Definition of Intervals: Midpoint/Cutpoint positions: 0:40/4
Select: Annotations
Select: Data Labels ...
Select: Show data labels
> OK
Select: Annotation > Title ...
Enter: 100 Calculated Chi-Square Values > OK
> OK
> OK
6c. Inspect the 100 standard deviations and chisq* values; they’re in C12 and C13. Write a paragraph
of 25+ words describing what has occurred in 5a, b and 6a.
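Steps 5a through 6b can be sketched in Python as a loop over 100 samples followed by a tally into the classes 0-4, 4-8, ..., 36-40. The seed, the helper name `chisq_star`, and the separate count for values of 40 or more are assumptions made for this illustration.

```python
import random
import statistics

random.seed(6)  # arbitrary seed (an assumption) for a repeatable run

def chisq_star(sample, sigma):
    """Calculated chi-square, (n - 1) s^2 / sigma^2, for one sample."""
    return (len(sample) - 1) * statistics.variance(sample) / sigma**2

# 5a-6a: 100 samples of 10 from the exponential distribution with mean 100
values = [chisq_star([random.expovariate(1 / 100) for _ in range(10)], 100)
          for _ in range(100)]

# 6b: frequencies for the classes 0-4, 4-8, ..., 36-40
counts = [0] * 10
above_40 = 0  # values that would fall off the right end of the histogram
for v in values:
    if v < 40:
        counts[int(v // 4)] += 1
    else:
        above_40 += 1

rel_freq = [c / len(values) for c in counts]
print(counts, above_40)
print(rel_freq)
```

Dividing each frequency by 100 gives the relative frequencies asked for in Part D.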
Optional: Print the graph: Ctrl P > OK
STOP! Complete this page. Wait until instructed to continue on next page.
PART D - COMPARISON OF RESULTS TO THE CHI-SQUARE DISTRIBUTION, df = 9
7a. Inspect the histogram (6b) and list the relative frequency distribution in the table below.
(Careful, the histogram shows frequencies.)
chisq*     Rel Freq    Prob        Difference     Rel Freq   Rel Freq   Rel Freq
           (from 6b)   (from 8b)   (6b) - (8b)    (10b.1)    (10b.2)    (10b.3)
0 - 4
4 - 8
8 - 12
12 - 16
16 - 20
20 - 24
24 - 28
28 - 32
32 - 36
36 - 40
7b. Write 25+ words describing the meaning of this relative frequency distribution.
8a. Construct the cumulative probability distribution for Chi-Square with df = 9.
1) List the possible chisq values corresponding to the class boundaries in the above table.
Calc > Make Patterned Data > Simple Set of Numbers ...
Enter: Store patterned data in: ‘chisq’
From first value: 0
To last value: 40
In steps of: 4
List each value: 1 times
List the whole sequence: 1 times > OK
2) Calculate the cumulative probabilities for chi-square with df = 9.
Calc > Probability Distributions > Chisquare ...
Select: Cumulative probability
Enter: Degrees of freedom: 9
Input column: C14 chisq
Optional storage: ‘CumProb’ > OK
8b. Determine the expected probability for each class of the distribution from (8a.2) and list it in the
Prob. column on table (7). [Hint: You will need to subtract to find each entry.]
8c. Write 25+ words describing the meanings of the information in C14 and C15 and its relationship to
the Prob. distribution column you listed on table in (7).
9a. Find the difference between the relative frequency and the probability for each class interval of
chisq*. (Write percentage in Difference column.)
9b. How closely do the relative frequency distribution (7) and the probability distribution (8b) agree?
Write a short paragraph comparing these two distributions.
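The subtraction hinted at in 8b can be sketched in Python. Since the standard library has no chi-square function, the sketch below computes the cumulative probability from the series expansion of the regularized incomplete gamma function; that choice, and the names `chi_square_cdf`, `boundaries`, and `class_probs`, are assumptions made for illustration and are not what Minitab does internally.

```python
import math

def chi_square_cdf(x, df):
    """P(chi-square <= x) by the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:       # stop once new terms no longer matter
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a + 1))

boundaries = list(range(0, 44, 4))                       # 0, 4, ..., 40
cum_prob = [chi_square_cdf(b, 9) for b in boundaries]    # counterpart of 'CumProb'

# 8b: probability of each class is the difference of neighboring cumulative values
class_probs = [hi - lo for lo, hi in zip(cum_prob, cum_prob[1:])]
for lo, p in zip(boundaries, class_probs):
    print(f"{lo:2d} - {lo + 4:2d}: {p:.4f}")
```

Subtracting each class probability from the matching relative frequency then fills the Difference column.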
STOP! Complete this page. Wait until instructed to continue on next page.
10a. Repeat the instructions for 5a, 5b, 6a and 7a to observe another set of 100 samples.
Scroll up the Session window until you find the following block of commands:
MTB > Random 100 c2-c11;
SUBC> Exponential 100.0.
MTB > Name c12 = 'stdev'
MTB > RStDev C2-C11 'stdev'.
MTB > Name C13 = 'chisq*'
MTB > Let 'chisq*' = 9 * stdev**2 / 100**2
MTB > Histogram 'chisq*';
SUBC> CutPoint 0:40/4;
SUBC> Bar;
SUBC> Title "100 Calculated Chi-Square Values";
SUBC> Footnote "Your Name";
SUBC> ScFrame;
SUBC> ScAnnotation;
SUBC> Symbol;
SUBC> Type 0;
SUBC> Label;
SUBC> Offset 0.0 0.0125;
SUBC> Placement 0 1.
[If you have unwanted lines in between, highlight and delete these unwanted lines.]
Click, drag, and release to highlight this “block” of commands
Use Ctrl C to Copy
Use Ctrl End to go to MTB > prompt
Use Ctrl V to Paste
Press Enter
Compare the results shown on the graph to the distribution in C14 and C15.
10b. Repeat three more times using only: Ctrl V; Enter each time.
To arrange graphs so you can see them all at once:
Window > Manage Graphs . . .
Highlight all graphs to be viewed (click & drag across their names & release)
Select: Tile > Done
10c. Write a short paragraph answering, “If the sampled population is non-normally distributed, does
the sampling distribution for the calculated chisq* seem to have a Chi-square distribution?”.
STOP! Complete this page. Wait until instructed to continue on next page.
PART E - SAMPLING NORMAL DISTRIBUTIONS
11. Repeat the instructions for 1a, 1b, 5a, 5b, 6a and 7a to observe a set of 5000 data and 100
samples from a normal probability distribution with mean 100 and standard deviation 50.
a. Scroll up the Session window until you find the following block of commands. Then replace
the underlined information with the corresponding information in the box.
MTB > Name c1 = 'data'
MTB > Random 5000 'data';
SUBC> Exponential 100.
MTB > Histogram 'data';
SUBC> CutPoint 0:700/50;
SUBC> Bar;
SUBC> Title "5000 Non-Normal Data";
SUBC> Footnote "Your Name";
SUBC> ScFrame;
SUBC> ScAnnotation.
[Underlined items and their boxed replacements:]
Exponential 100.   becomes   Normal 100 50.
0:700/50;   becomes   -100:300/25;
"5000 Non-Normal Data";   becomes   "5000 Normal Data";
Copy and Paste at the active (Red) MTB > prompt. Press Enter.
View 5000 values. Do they seem to have a normal distribution from -100 to 300? Describe the
distribution.
b. Scroll up the Session window until you find the following block of commands. Then replace
the underlined information with the corresponding information in the box.
MTB > Random 100 c2-c11;
SUBC> Exponential 100.0.
MTB > Name c12 = 'stdev'
MTB > RStDev C2-C11 'stdev'.
MTB > Name C13 = 'chisq*'
MTB > Let 'chisq*' = 9 * stdev**2 / 100**2
MTB > Histogram 'chisq*';
SUBC> CutPoint 0:40/4;
SUBC> Bar;
SUBC> Title "100 Calculated Chi-Square Values";
SUBC> Footnote "Your Name";
SUBC> ScFrame;
SUBC> ScAnnotation;
SUBC> Symbol;
SUBC> Type 0;
SUBC> Label;
SUBC> Offset 0.0 0.0125;
SUBC> Placement 0 1.
[Underlined items and their boxed replacements:]
Exponential 100.0.   becomes   Normal 100 50.
100 (in 100**2)   becomes   50
Copy and Paste at the active (Red) MTB > prompt. Press Enter.
STOP! Complete this page. Wait until instructed to continue on next page.
c. Inspect the graph. Complete the listing of the corresponding relative frequency distribution
(below). List the cumulative probability distribution for chi-square distribution with df = 9.
chisq*     Rel Freq    Prob        Difference     Rel Freq   Rel Freq   Rel Freq
           (from 6b)   (from 8b)   (6b) - (8b)    (10b.1)    (10b.2)    (10b.3)
0 - 4
4 - 8
8 - 12
12 - 16
16 - 20
20 - 24
24 - 28
28 - 32
32 - 36
36 - 40
Compare the two distributions.
(25+ words)
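One way to summarize the comparison between the two populations is to count how often the calculated chi-square exceeds a fixed critical value. The Python sketch below is an illustration, not part of the handout: it assumes the value 16.919 (the chi-square value with 5% in the right tail for df = 9), uses 2000 samples per population instead of 100 to steady the estimate, and the seed and the helper name `exceed_rate` are hypothetical.

```python
import random
import statistics

random.seed(9)  # arbitrary seed (an assumption) for a repeatable run

CRITICAL = 16.919  # chi-square value with 5% in the right tail, df = 9

def exceed_rate(draw, sigma, n_samples=2000, n=10):
    """Proportion of samples whose (n - 1) s^2 / sigma^2 exceeds CRITICAL."""
    hits = 0
    for _ in range(n_samples):
        sample = [draw() for _ in range(n)]
        if (n - 1) * statistics.variance(sample) / sigma**2 > CRITICAL:
            hits += 1
    return hits / n_samples

# Normal population: mean 100, standard deviation 50 (Part E)
normal_rate = exceed_rate(lambda: random.gauss(100, 50), 50)

# Exponential population: mean 100, so standard deviation 100 (Parts A-D)
expo_rate = exceed_rate(lambda: random.expovariate(1 / 100), 100)

print("normal     :", normal_rate)
print("exponential:", expo_rate)
```

If the chi-square distribution describes the calculated values well, the proportion should be near 0.05; a noticeably larger proportion is evidence that the assumption matters.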
PART F - CONCLUSIONS
13. Write a paragraph answering the question, How important is the assumption “the sampled
population is normally distributed” to the use of the Chi-Square distribution for statistical
procedures about one standard deviation? (50+ words) Be convincing.
STOP! Complete this page. Wait until instructed to continue on next page.