Download understanding probability and statistical inference through resampling

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Probability wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
IASE/ISI Satellite, 1993: Clifford Konold
From Brunelli, Lina & Cicchitelli, Giuseppe (editors). Proceedings of the First Scientific
Meeting (of the IASE). Universiti di Perugia (Italy), 1994. Pages 199-211. Copyright holder:
University of Perugia. Permission granted by Dipartmento di Scienze Statistiche to the IASE to
make this book freely available on the Internet. This pdf file is from the IASE website at
ht~:!!~~ww.stat.auckla~ld.nz!-iase!~ublica~ions!p-oc 1993. Copies of the complete Proceedings
are available for 10 Euros from the IS1 (International Statistical Institute). See
http:lljsi.cbs.n1isale-iase.htmfor details.
UNDERSTANDING PROBABILITY AND STATISTICAL
INFERENCE THROUGH RESAMPLING
Clifford Koriold
ScientificReasonirrg Resenrch Institute
Hnsbrourh Laboratory, Unzvrt-siiy of Mdssdchzae~s
Amberst, MA 01003, USA
1. Introdr~ction
During tile past four years, I have been developing curricztlic and
computer sofiware for teaclzing probability and data analysis at the
itltroductory high-school and college level. The approach I've taken
emphasizes thc use of red data, where "telling a story" ralces priority over
testing hypotheses, and in which mathenlatical for~nalismis ltcpr to a
minimum (see Cobb, 1992; Scheaffer, 1990; Watlcins ctltl., 1992).
A major question I have considered is how probability ought to be
integrated into rhis n&wdata-rich curriculum. There are two major reasons
for keeping probability in the data-analysis (or statistics) curriculum. First,
at some point in the process of constructing theories that accounr for
patterns in data, it is important that students consider alterilative
explanations. Among these is the possibility that some outcome of interest
resulted from chance. Second, probability is an important concept in its
own right (Falk and Konold, 1392). It comprises a world view and should
not be viewed a necessary evil that must be faced if students are to
understand statistical inference.
i n deciding how to relate probaLility and data at~al~sis,
T have adopted
an approach Julian Sitnon began &vacating in the late 1960s. Simon
describes his approach as having grown out of his frustrariol~watching
graduate students do siily things when trying to test a staristical hypothesis
(Simon and Bruce, 1992). H e began designing
experiments from
which he hoped they could build up sound probnbitistic understandings.
These eventually developed into a resampling approach that promised a
nlore intuitive take oil probability and data analysis, and which made the
connections between the two f elds more apparent. Of course, Simon didn't
invent Mollre Carlo methods, nor the randomizatioo tests he would come
ro employ, Lut he was among the first to see rheir educatiorlal potenrial, and
long before the computer was widely available.
IASE/ISI Satellite, 1993: Clifford Konold
200
UNDERSTANDING PRORARILI?YAND S TATIS11CS THROUGH HESAMPUXC
Rather than elaborate Simon's argument here, I briefly describe two
software tools we've developed, highlighting aspects rl-iat empilrtsize the
relation between probability and data analysis. I also report some results
from our primary test site, a high school in Hotyoke, R4assachusetts.
2. Modeling a problem with Prob Sim
Most educationai probability-simulation software con~priseseveral
ready-made models (e.g., coin model, die model). Snlde~ztsload the
appropriate model, draw samples, and t11cn see results displayed. The
software thus offers empirical dc~nolistrarionsof various facts and principles,
such as the Iaw of large numbers and the binomial distribution. The
software we have designed, "Prob Sim", inclrrdes no ready-made models.
Rarl~er,the student must build the model, specifying the appropriate
sampling procedure and analyses in order ro estimate the probabiiity of
some went. The process of buildi~xga simulation nlodel is at least as
important as, if not niore important rhan, drawing the appropriate
conclusions fronl rhe resdrs. T o illustrate, I'll describe one of our activities
entitled "LAPD" (see Xonold, 1933, for another example).
Students begin by reading excerpts from The Nezv Yoi-k Tinzes account
(March 18, 1991) of the beating of Rodney King by officers of the Los
Angeles Police Department. After reporting rlzat. "at least 15 officcrs in
patrol cars converged on King", the article broaches one of the issues that
made chis incident explosive: "In what other police officers called a chancc
deployment, all the pursuing officers were white. Tile force, which nurnbcrs
about 8,300, is 14% black?.
Scudcnts are asked to build a model of tliis situation LO estimate the
probability of finding no blaclrs in a random sample of I6 officers. 'Yhis
problem has generated lively discussions in our test sites. Wlzetz scude~lts
care deeply about a problem, they are more willing to persist through
difficulties. hbreover, they learn that applicarion of probability theory is
not limited to rolling dice arid blindly clrawing soclcs out of dresser drawers.
2.1 Building a madel
I demonstrate below the various stages through which students progress
in modeling this situation, illustracing the steps wid1 screen shots from f'rob
Sim. In Fig, 1, a mixer has been filled according to rile irrforznatiori
prot.ided in the article. There are a total of 8,300 elements, 1,162 of them
labeled B (Black) and 7,138 labeled N (Not blaclt). The non-replacement
option has been selected to preclude the possibility of having the sanic
IASE/ISI Satellite, 1993: Clifford Konold
C. KONOI D
clement (officer) in a sample more than once.
Figure 1. A snnplzi?gt~1odeIf.rthe Isil'Dpro6br~t
The Run controls on the far left shows the sample size set at 16, and
numbcr of repetitions set (somewhat arbitrarily) at 100. After tlte llun
button is pressed, the cornputer draws I6 elements from the mixer without
replacement, repeating chis a rota1 of 100 rimes. This is analogous to
lookiilg ar 100 occasions when IG randomly-selected officers appeared on a
crime scene. The sampling process is animated in the lower palt of the
Mixer window. The results of each repetition arc displayed in a Data
Record window (not shown).
After the block of 100 repetitions has been drawn, results can be
analyzed as shown in Fig. 2. The Analysis window in this case shows the
numbcr (and proportion) of occurrences of each of the 17 possible
unordered outcomes. Ten percent of the samples had 0 B's. Thus, a first
estimate of the probabjIiry that a "chance deployment" would include no
black police of&cersis .10.
2.2 Repeati~ythe txperivzent
111many textbook examples of simulation, the experimental component
ends here. But it is important not only to esrinlate the probability of sonic
event, bur to get sonie sense of [he range (or variability) in that estimate
given the number of reperitions. Indeed, the notion of chance is nor
IASE/ISI Satellite, 1993: Clifford Konold
value per se, but in the variability of result over
apparent iiz a
& File Edit Run flnalyze Windows
BBBBBBBBBBBBBBBN
BBBBBBBB8BBBBBNN
BBBBBEBB'BBBBSNNN
0
O
0
BBBBBBBBBBBBNRNN
o
BBBBBBBrfNNNNNt{NN
BBBBBBNNNNNNNNNN
BBBBBNNNNNNNNNNN
0
0
2
Figure 2. ?%r 1rtln4ih zutndow shozc!ing rhc ~rsziltsof 100 rtprt/tlot.zi
replications of an experi~nenr.To replicate ail cxprrimer~tirl l'rob Sim, thc
studcnr has only to press the run button again. Another random sample is
dmwn, and rbe Analysis window updates to show the new results.
Students replicate tlte experiment about 10 times, plorting the resulrs olt
a line plot whicll helps to emphasize the variability in the process. They
&en pool their results to come up wid1 a final probability estimate. For the
10 resulrs plottedi~~
Kg.-3, the pooIed esrinare ofP (0 b~aclcss)is .I 09.
x
X
X
X
X
X
X
X
X
X
5 6 7 8 9 10 11 12 13 14 15 IG 17
Percentage of samples with no BS
Figure 3. Lincpht of reszrlt of10 rrplicmonr
30 uo~ssnss!p Zun~oddnsrr! 1003 ~ueuodur!u-e s! W!S q o ~ d.as!p pue s u y o ~ j o
ruIea1 art3 puoXsq saAom uoysnssu! Ipun m s m woplas suoFssn3srp y s n s
-maria Jeqloj palelax uorspap
awos Surlleus u! PJOM ISET xo Lluo ayl xa'iau SF 3uaAa auros j o kq!qeqo~d
ayl 'sp.rO,?n.Jaylo u1 -auass ayl uo 3<uaXalM sSa3yjo y3eiq Xy*\l, r r ~ ~ d x a
ley] sa!Joaqx h ~ y d s u o liue
s j o L!pq!suas aql pue s r a s g o ayl j o Xuow!lsal
luanbasqns arp arz ~ m l ~ o d uaxom
r;
- u o ~ ~ c w ~ oj oj us! a y d IueAaIas a71 j o
auo lilrro s! 1na.s a q j~o 1b!l~9'e401day3 'asmosfo 'pny '93~19aJaM smsgjo
8u!puodsa~aql 30 auou zerp asucys lsn[ s e a 11 JaylayM cuor3ew.Iojrr! l!atji
u o pasaq <Su!p!sap jo ruayqo~datis ssnss!p euaprus 'qef aql JO pua a43 IQ
.sawosaq faporu lerp suo~.js z p ayl a~!lem.roju!slow
a111 c l qJ E~~ I I~~ ! I~eJ jo
E~
~ayur!l a y l j o axe .Cay] aleme alour ayl Gstro~durnsse
4~1!Ljilf!~dllr~s
~ O A fCm3
E
Xay t191101.p I E ~ Iaztiza.1 01 auros sluapnls kro!~ct.tz!s
1a83ei at11 01 1apow E % K I ! . I ~ ~ I L I O ~ssasold
~O
s!q$ q 8 n o . q ~-lapour pagydurfs
3!ayl Xq pa3npoauy sasc!q j o vopsarfp arp $s!pa~d uayo rres Xayl 'sjrp
oy ~ ; u e oXaxp j~ - ( s l i d asex-awes u! auass arp uo a g u e axjod aneq <.%-a)
zo~s-ejleuo!~ppeua mnomz o3rir aym 01lapow aql rasp uc3 Xay~sases aulos
u j .uo!zans!s Tea1 at11 13ajjE suo!lt.lap!suos sno?a.e,z ~ o r PIIBJ!
l
a p a p ~SRW
smapn$s 's1jnsa.i rro!l.elnur!s 3 q 3 30 anph anys!pald aql au!urJauy 01 JayJo
111 .S~A~OI\U!%~!lapoxu 30 ssaso~daqs I ~ C ~prrurrslapun
M
01 u$aq sxuapn3s
xcql ppout Xue o ~ u !pa~e-tod~osrr!
suo!~du~nssc
ayl Bu!ssarrppe qSno.rrp st
21 .sl[nsaa .rfaylj o suorlt.s!~druy arp s s n x p pue srro!~drunssenzau awyl r p p
JrraisIsuoa spgoru rrnJ pue pl!nq Xarfi 'ues Xatp 21
arues arp jo aq 01
pual S J ~ U I j!
J ~r~o!~sanb
II! L~!I!qcqo~d
ay3 ~ s a j j e1y8!w S J ! E ~ u! a~!jreusijo
sraq.~oa q o d 3erp 1 3 E j ayi ~ o J3p!SU03
q
01 L a ~ d t u ~~: ~
9a' $p a p a.""
s~uaprls~
.apzxu ale s u o ! ~ d r u u s ssno!.rxh
~
'%u!p~!nq rapom j o s s a l o ~ dayl u~
IASE/ISI Satellite, 1993: Clifford Konold
IASE/ISI Satellite, 1993: Clifford Konold
tl~istype, because it cat1 be used to model engaging problems too complex
for students to tackle fortnally. Prob Sirn's simplicity, and dze speed with
which suff~cientdata can be generated, gives students the time to design a
sampling model, collect adequate data, draw conc~usions,and discuss
implications of the Gndings. In rhe next section, I show Izotv we build on
ideas introduced in modeling probabilistic situations when we move on to
data analysis.
3. Data analysis usir~gDatascope
We designed a data-analysis program called "DaraScope" for use in
introductofy courses stressing exploratory data analysis in which studerlts
work with real data. Our primary objective is for students to learn basic
data-analysis techniques for exploring relationships anlong various variables.
By using mulriple-variable data sets, we give students the opporcr~tlityto
explore questions of particular interest to them. DataScope encourages
students to make initial judgments of relationship by visually comparing
plots. This is demonstrated below using data obrained from a tluestiorlrtaire
administered to 84 students in two high schools in western Massachusetts:
Amherst Regional and Holyolce High. Andlerst is a snxall coilege rown,
while )-IolyoIre is a larger, industrial city. The i~lfornzatioilcollected on tach
studenr included !gender, age, birth order, family size, inaritai status of
parents, religious activity, rating of school ~erformnnce,educational level of
parents, curfew times, working hours and wages, and time spent on
homeworlc. Students spend about two weeks exploring various questions irt
this data set, among them thc question of whether holding an &r-school
job adversely affects school perfornimce.
DataScope encourages exploration by allowing students to form
subgroups of some variable based on the values of some other,
related, variable. For example, Fig. 4 shows the box plots for hours of
homework (HWHRS) and hours spenr working (JOBHRS) for Amherst and
Hoiyokce students (in the case of JOBHRS at Amherst, the median is the
same as the 1st quartite, as indicated by the double-thick line). The nulnber
of cases in each box plot is displayed to the right of the plot. This is
imporrant to irrclude as students wiIl frequently draw concfusio~iswithout
considering the number of cases represented by each box plot. The plots
below suggest that Amherst students spend more time on homeworlc than
working apart-time job, and that the opposite is true of Holyoke students.
T%istmnd is consistent ryjtl, common (v-6efdsrereo qpes oLche
cawrrs.
l'c is tempci11g to ailclude from fig. 4 &at those wit& jobs spend
less
time on homework than those without jobs. However, whetl HWHRS are
IASE/ISI Satellite, 1993: Clifford Konold
"grouped" by the categorical variable JOB, students discover the reverse
appears to be true - students with a job studied an average of three hoursper-week mare t h m those without a job.
HWHRS, AMHERST
n=3 1
*@%$d+gq
<+
a. -6.
.".s\
j!
HWHRS, HOLYOKE
+=I-
n=S1
+
n=5 1
JOBHRS, WOLYOKE
e
Hours per week
Figure 4. Box plots oJbonzemrk. hocm nn~fjobhoztrsfir Amhcrst md FIo~okcsnldents
job = no
n=26
*
;.
-/f#&4&.7p@&3$$*iq
**,A
job = yes
.U>.'
c %q*"a<f,.i
..
n=56
Homework I-~oursper week
Figure 5. Box plots of honzetuor.4 hottrrfir strrdmt wrri~am1 zuzrl~wt
jobs
At this point, students reconsider their expectarions and develop theories
that might explain these data. One possibility that those who worlc also
study Inore because they have learned to eFfective1y manage their rime. Or
maybe the students with jobs are alder and have more homework assigned.
Sorrle of these explanations can be ir~ve-sti~aced
by Looking at other variabfes
IASE/ISI Satellite, 1993: Clifford Konold
in the data set. However, among the explanations to consider is the possibility that rhe difference was due to the "Xuck of tile draw" - chat with a better or larger sample, no difference \vould be found. In my cxpcriencc, stladents do not spontaneously raise this possibility, even when they have been
using resanipling techniques, as described above to estimate probabilities.
'I'hey will judge differelices between two medians as important in one case,
and uniniporralit in another, apparently rnalcing the judg~nentbased on the
distance between the two medians, as ir appears on the computer screen.
Thcy do not sponraneousty evaluate tliu difference with respect to rhe
variability (i.e., to the IQI3s as shown in thc box plots). TIrerefoi.e, before
of a difference
showing them how we might determine the
occurriilg by chance, I typically must remirld them that rhis is a possibility.
3.1 Rdn~i'omixatz'onm t s in DataScope
Below is a demonsrration of how Datascope is used ro estimate the onetailed probabiliry of observing a differellce at least as large as the one
observed in the sample above. The method is based on the rantfon~k~tion
procedure as origir~allydeveloped by Fisher (see Barbelfa ct n l , 1390). This
involves randomly reordering one of the variables (without replacement) to
estimate the probabiliry of the observed difference under the 1lul1 hypothesis. With Datascope, the studenr can first do this "manually", to
develop a sense of what the computer is doing. A "reorder" con~rnand
randomly reorders rhe values of one of the selccted variables (in chis case,
e
of HWHKS are ra~ldornly
the values for I-IWHRS). That is, d ~ values
reassigned to cases. Once reassigned, rbe variable name appears in the data
table with the synnbol O 011 both sides to remind the student that the
column has been randomized (another command will restore the original
order). A box plot of this randomly ordered variable, again grouped by the
job variable, can be viewed. Given that the values of HWHRS have been
randomly assigned, any difference between the medians of the two groups
(with or without jobs? is due ro chance. In the example shown in Fig. 6, this
difference is -1 (sub-tracting the tnedian of the tipper plot from the median
of the lower
Once students undersrand the procedure, the computer can be instructed to repeatedly reorder the variable and compure the difference
between medians of the job and no-job groups, recording these in a new
data table as iilusrrated in Fig. 7. lin this case the computer has been
instructed to draw 100 random samples. The first few differences that were
obtained are shown in Fig. 7 in the "Resampling" data table. The first value
is the observed difference, -3. In the background you can see the variables
IASE/ISI Satellite, 1993: Clifford Konold
selected (Vl and GI) in the primaly data rable. The values i t 1 HWHRS are
bcing randomly reordered.
job = no
-+-f
n=26
-2":-*t',5Pti.
")
W$$!FI
X
job = yes
n=56
*
---*ggJ
'
$ x d"" SF&+'aa*?
b
I
1
I
1
I
1
0
5
10
15
20
25
OWHESO
Figrrre 6. Horrrrzuo~khozir~mndom<~
nfr&t~ra'm job nnB nojob growpr
HS Survey 9D
Figure 7. 7irble oj+di~etvncrs
Itenuecn medidn IWNI?$fir job (nzd ~zo+b gror<ppslrn stlccefrive
resamplig mns.
After [he 100 r a ~ ~ d odifferences
m
have been obtained, the results can be
displayed in a histogram as shown in Fig. 8. In this instance, 25 of the
samples had a difference at Ieasr as large as -3. Thus, an estimate of idle onetaiied p value is .25. Additional repetitions could be cond~tcredfor a more
precise estimate. This same procedure is used in DaraScope to test the
IASE/ISI Satellite, 1993: Clifford Konold
statistical significance of a value of I; and of frequency counts i n a 2 X 2
table.
Difference between medians
Figtrre 8. Satfipling I~i~rogram
dzspliyig rrm/t.r of IOO resdmpIii?t+s
4. Educational outcomes
I have found, as have Simon and Bruce (1991), that students are
enthusiasric a b o ~ ~roba
t ability and statistical infererlce when approached
througl~resampling. Bur, do studenrs using this approach learn more than
they d o in a traditional course! Simon ft.ul. (1976) compared student
performance in cotlrses taught t~singresan~~ling
vs. conventional methods.
Give11 problems that could be solved using methods taught in either course,
students in courses using the resampling approach consistentiy outscored
students using the traditional approach.
Many or most students who rake an introductory course will never need
to collduct a statistical test or dererminc a probability precisely. They do,
however, as members of a complex and increasiligly technological society,
require a basic ur~derstaridin~
of uncertainty and the savvy to cvaluare
"research" claims in the mass media. Accordingly, tllough sttldents should
be able to solve problenls using methods they Iiave been taught, it is even
inore important that they understand basic concepts .rvhich underlie these
methods. Konofd and Garfield (1993) have developed items ro assess
understanding of these basic ideas. Below is orle of our problems, adapted
from an itern in Falk (1933, p. 11 1).
The Springfield Meteorologic~lCenrer wanted to derer~ninethe accuracy of
their weather forecasts. They searched their rccords for those days when the
forecaster had teporred a 70% chance of rain. They compared these faiecasrs ro
record? ofwhether or not it actually rained on rhose particular days.
IASE/ISI Satellite, 1993: Clifford Konold
cast of 70% chance of rain can be considered t'ety accurate if i t rained
blem as .I0 this means
ozttToPize of a single trial,
I
have referred ro this as the "outcome approach"
(Konolcf, 1989).
Figure 9. Frequency of reepomes 6&re iiztiuctiopt of193 s d e n t f ro the zo~atLvpro6lem
IASE/ISI Satellite, 1993: Clifford Konold
210
UNDI?RSliV.iDlKG PROBADILITYAND STATISTICS T1iROUCI-I RES&fPLINC
One of my instructional objectives, therefore, is for students to realize
that a probability value typically tells us little or notlling about resulrs in the
short run, but a great deal about results in the long run. Fig. 10 compares
the results on this same problem before (black) and after (gray) instructioit.
Correct responses increased only 6% with instruction. The results are
similar across the majority of our assessment items. I.it a deeper level, many
students after instruction using resampfillg appear tlnaware of the
fundamental nature of probability and data analysis.
Selected range
Figitre 10. Frequency ofrefponsesbefjrc (black) arzd njer CYrny) insirz~ctiono f f 99 students to tlx
wrn~berpro6lcnz.
formalisms. Perhaps lookirig at development over a singIe course is too srrtall
a unit of analysis in the case of
and we sltoutd be rhinliing
about series of courses over which we can expect to effect and observe
conceptual change.
IASE/ISI Satellite, 1993: Clifford Konold
Bibliography
Barbella P., Denby L. and Landwehr J.M. (1990), Beyond exploratory data
analysis: The randoznization test, Mathemtics Teacher, 83, pp. 144-149.
Cobb G.W. (1992), Teaching statistics: More data, less lecturing, in L.
Steen (ed.), Needing the CallforChunge, (MMA Notes # 22, pp. 3-43],
Matl~ematicalAssociation of America.
Faik R. (1993), Understandingprobabitity and st~~tistics:
Problems involving
J;k??&erztal concepts, Wellesley, MA.: AK Pc ters (to appear).
Falk R, and Konold C. (1392), The psychology of learning probability, in
F.S. Gordon and S.P. Gordon (&cis.),Stztisticsfir the tweudq-first ccntwy
(MMA Notes .# 26, pp. 1 5 1-J 64), Mathematicid Associz~ionof America.
l~h
real proble~ns,
Konold C. (1993), Teaching probability t l ~ r o ~ rnoddi~lg
Mrzthernrtticr Teacher (to appear).
Konold C. (1969), Informal conceptions of probability, Copition and
Instniction, 6, pp. 59-98.
KonoId C. and Garfield 1. (1 993), StatistiCdl Reasoning Assess7nent. Part I:
I~ztuilive7%izhjng,Scientific Reasoning Research Instittite, University of
Massachusetts, Amherst.
Scheaffer 11. L,. (1930), Toward a more quantitatively literate citizenry, The
A~lilericanStfitistichn, 44(1), pp. 2-3.
Simon J.L. and Bruce P. (1991), Resampling: h tool for everyday statistical
work, Chance: Nezu Directio~issforStatistics and Computirzg, 4(1), pp. 2232.
Simon J.T., Atkinson D.T. and Shcvokas C. (1976), Prol.ahitity and
statistics: Experirnenral results of a radically different teacil~ngmethod,
Anzcrica?~Mat/~einmticalMonthly,83, pp. 733-739.
Watkins A., Burril G., Landwehr J. and Schaeffer R. (1992), Rernedial
Sratistics?: Thc implications for colleges of the changing secondary
scl~oolcurriculum, in F.S. Gordon and S.P. Gordon (eds.), Strltisticsjir
the twenty-first certtury ( M M A Notes # 26, pp 45-55), Mathematical
Association of America.
Note
The curricuium materials and research described in this articie were
supported by a grant from the National Science Foundation (Grant #MDR8954626). The opinions expressed here are the author's and not necessarily
those of the Foundation.