Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Transcript

Some Statistical Heresies Author(s): J. K. Lindsey Source: Journal of the Royal Statistical Society. Series D (The Statistician), Vol. 48, No. 1 (1999), pp. 1-40 Published by: Blackwell Publishing for the Royal Statistical Society Stable URL: http://www.jstor.org/stable/2680893 . Accessed: 27/02/2011 12:19 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at . http://www.jstor.org/action/showPublisher?publisherCode=black. . Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org. Blackwell Publishing and Royal Statistical Society are collaborating with JSTOR to digitize, preserve and extend access to Journal of the Royal Statistical Society. Series D (The Statistician). http://www.jstor.org The Statistician(1999) 48, Part1, pp. 1-40 Some statisticalheresies J. K. Lindsey LimburgsUniversitair Centrum,Diepenbeek, Belgium [Read before The Royal Statistical Society on Wednesday, July 15th, 1998, the President, ProfessorR. N. Curnow,in the Chair] Summary. Shortcomingsof modernviews of statisticalinferencehave had negativeeffectson the image of statistics,whetherthroughstudents,clientsor the press. Here, I question the underlying foundationsof moderninference,includingthe existence of 'true'models, the need forprobability, whetherfrequentist or Bayesian, to make inferencestatements,the assumed continuity ofobserved data, the ideal oflargesamples and the need forproceduresto be insensitiveto assumptions.Inthe contextofexploratory inferences,I considerhow muchcan be done by usingminimalassumptions relatedto interpreting a likelihoodfunction.Questions addressed includethe appropriateprobabilisticbasis of models, ways of calibratinglikelihoodsinvolving differing numbersof parameters,the roles of model selectionand model checking,the precisionof parameterestimates,the use of prior and the relationshipof these to sample size. I compare thisdirectlikelihood empiricalinformation approach withclassical Bayesian and frequentistmethods in analysingthe evolutionof cases of acquired immunedeficiencysyndromeinthe presence of reporting delays. Keywords:Acquiredimmunedeficiencysyndrome;Akaike'sinformation criterion; Asymptotics; Compatibility; Consistency;Discretedata; Hypothesistest;Likelihood;Likelihoodprinciple;Model Poisson distribution; selection;Nonparametric models; Normaldistribution; Robustness; Sample size; Standarderror 1. Introduction Statisticians are greatlyconcernedaboutthe low publicesteemforstatistics. The disciplineis oftenviewedas difficult andunnecessary, orat bestas a necessaryevil.Statisticians tendto blame thepublicforitsignoranceof statistical butI believethatalmostall of thefaultlies procedures, withstatisticians andthewaythattheyuse,and teach,statistics. The twogroupswith themselves whom statisticians the have the most directcontactare studentsand clients.Unfortunately, and contradictions. Much of thisarisesfromthe messagethatthesereceiveis fullof conflicts inordinate emphasisthatstatisticians place on a certainnarrowviewof theproblemof inference withtheteachingofthefirstintroductory statistics course.Knowinga vastarray itself,beginning than of tests,or,morerecently, estimation is consideredto be muchmoreimportant procedures, whichwillbe suitableforeach particular beingawareof whatmodelsare available,and studying dataanalysis. questionat handandfortheaccompanying In whatfollows,I considersomeoftheshortcomings ofmodemviewsof statistical inference, This is necessarybecausetheseviewsare increasingly Bayesianand frequentist. beingignoredin statistics appliedstatistical practice,as thegap betweenit andtheoretical widens,and are having is difficult to find negativeeffectson studentsand clients.An adequatemeansof presentation because of the vast and internally natureof both approaches:to any critical contradictory J.K. Lindsey,Department of Biostatistics, Address 3590 Diepenforcorrespondence: LimburgsUniversitair Centrum, beek,Belgium. E-mail:jlindsey@luc.ac.be ? 1999 RoyalStatisticalSociety 0039-0526/99/48001 2 J.K Lindsey but statement, some Bayesianor frequentist is certainto say 'I am a Bayesianor frequentist of wouldneverdo that'.For example,withthegrowingrealizationof thepracticaldifficulties haveturnedto evaluating implementing a purelysubjectiveBayesianparadigm, somestatisticians criteria;I see themas techniquesderivedthroughthe Bayes formulaby long run frequentist probabilities forobseressentiallyfrequentists. Or considerhowmanyBayesiansuse frequency evaluatesinference methodsaccordingto theirlong vables.Forme,thequintessential frequentist in repeatedsampling, runproperties whereastheequivalentBayesiandependson methodsbeing thedata. coherent afterapplying theBayesformula topersonalpriorselicitedbeforeobserving has evolvedin manyand unforeseeable ways in the past few decades. Statisticalinference supremacy, in spiteof Previously, 'classical' frequentist proceduresseemedto hold uncontested themajorcontributions of R. A. Fisher(and Neyman'sdenialthatinference was possible).Much of statistical oftheimpetusforthischangehas comefromthecomputerization analysisand from of modernstatistical inference is thecentralrole theBayesiancritique.One ofthemajorfeatures whichwas introduced of the likelihoodfunction, of long beforeby Fisher.Withthe rewriting statistical fewmodernstatisticians realizethatFisherwas nevera frequentist forinference history, of probability), beingmuchcloserto the (althoughhe generallyused a frequency interpretation thanto theNeyman-Pearson Bayesians,suchas Jeffreys, school. In thefollowing text,I look at someof thesedominant ideas,old and new,especiallyas they in arecurrently letme clarify theareato be discussed.I practised appliedstatistics. However, first, with in the Fisherian of am concerned statistical sense obtaining themaximuminforinference, mationfromthegivendataaboutthequestionat hand,without incorporating priorknowledgeand availableas information, otherthanthatrequiredin theconstruction ofthemodel(or empirically theargument to follow),and likelihoodfunctions frompreviousstudies,butthatis anticipating be used (althoughthis withoutconcernfortheway in whichtheconclusionswill subsequently in howthequestionis posed). I thusexcludedecisionproceduresthatrequire maybe reflected modelofhumanbehaviour. somesupplementary I shalluse thetermknowledgeto referto availablescientific theories,expressibleas models, thosetheories. and information to referto empirically observabledata,alwaysfiltered through from(prior)personalbeliefs. Thus,bothofthesemustbe distinguished i.e. to exploratory rather than I restrict attention to modelselectionandestimation ofprecision, In current at leastin myexperience, morethan95% of inference. scientific confirmatory research, workof mostappliedstatisticians is exploratory, as opposedto testingone precise theinferential modelconceivedbeforethepresentcollectionof data.Scientists usuallycometo thestatistician to be filledoutby empiricaldata collection,notwitha specific witha vaguemodelformulation to test.Powerful and easilyaccessiblestatistical packageshavemade this hypothesis computers When to be necessary. appearto be a routinetaskwherethestatistician maynotevenbe thought to test,theyoftenno longerneed reachthestageof havinga clear simplehypothesis scientists the factsmayalmostseem to speak forthemselves.(Phase III trialsin statistics: sophisticated are required, at leastformally.) researchare a majorexceptionwherestatisticians pharmaceutical facts. Theoriesofstatistical inference havenotadjustedto theseelementary inference As I see it,two basic principlesshouldunderliethe choice amongstatistical procedures. to draw selectedaxioms can be used to provea theorem;in contrast, (a) In mathematics, in the availableempiricaldata abouta set of statisticalinferences, all the information models,i.e. aboutthe questionat hand,mustbe used if the conclusionsare not to be arbitrary. ofreality, withepistemological, butno ontological, crudesimplifications (b) Modelsarerather Statistical Heresies 3 status.If observablesare important, inferences mustbe invariant to theparameterizations inwhichsuchmodelsarespecified, theirapplication inprediction beinga typicalexample. Manystatisticians are willingto hedgethe firstprincipleby usingonlythe relevantempirical information, as maybe thecase whenan 'appropriate' teststatistic is chosen.This could leave inference opento thearbitrary: withouta cleardefinition, therelevantcan be chosento provea point.(Models can also be so chosen,buttheyare thereforcriticsto scrutinize.) It is generally at least agreedthatthisprinciplemeansthatinference mustbe based on thelikelihoodfunction, whencomparingmodels and estimating the precisionof parameters. (The problemof model forusingall theinformation arenotgenerally available.)Thus, checkingis morecomplex;criteria in the data; the likelihoodfunction modelsneed notbe constructed to use all the information determines whatis relevant to theproblem.In turn,thisfunction Both obeysthesecondprinciple. ones: for principlestogetherexclude manyfrequentist procedures,especiallythe asymptotic intervals based on standarderrorsdo notuse efficiently example,forsmallsamples,confidence in the data and are notparameterization invariant. The secondexcludesmany the information practicalBayesianprocedures: forcontinuous parameters, forexample,regionsof highestposteriordensityare notpreserved undernon-linear transformations whereasthosebased on tail areas are. Havingsaid this,we mustthenask whatis left.The commoncriticisms ofboththefrequentist and the Bayesianschoolsare sufficiently well knownand do not need to be restated.I rather considerjust how much can be done using minimalassumptionsrelatedto interpreting a likelihoodfunction. Manyof theideas presentedherearosewhileI was doingtheresearchfor Lindsey(1996a), althoughmostare notnecessarily explicitly statedin thattext.The pathbreaking,butlargelyignored, paperofBarnardetal. (1962) is fundamental reading. 1.1. Example I shalluse a simpleexampleto illustrate someofthepointsdiscussedlater,althoughobviouslyit will notbe appropriate forall of them.In suchlimitedspace, I can onlyprovidea parodyof a properanalysisofthescientific questionunderconsideration. The estimationof the incidenceof acquiredimmunedeficiencysyndrome(AIDS) is an modelthatwill describe taskin modernsocieties.We wouldliketo obtaina stochastic important untilthepresentandthat howtheepidemichas evolvedgloballywithina givencountry succinctly in relationto healthservicerequirements. Data are obtainedfrom maybe usefulforprediction not doctorswhomustreportall cases to a centralofficein theircountry. Thus,we studydetection, incidence.However,reporting maytaketimeso we have a secondstochasticprocessdescribing thearrivalofthedoctors'reports. One difficult problemarisesbecauseofthesedelays.Some reports maytakeseveralmonthsto over recentperiodsare incomplete.Thus, the arriveso, at any giventime,the observations table withthe periodsof can be represented as a two-waycontingency availableinformation in thebottomright-hand corneris interest as rowsandthereporting delaysas columns.A triangle missing.Forexamplesof suchtables,reported by quarteryear,see De Angelisand Gilks(1994) forEnglandandWalesandHayandWolak(1994) fortheUSA. In sucha table,therowtotalsgive butthemostrecentvaluesaretoo smallbecauseof thenumberofdetectedAIDS cases perquarter themissingtriangle.For thesetwo data sets,theobservedmarginaltotalsare plottedin Fig. 1 curvesfromsomemodelsthatI shalldiscussbelow.We see howthereported alongwiththefitted inthemostrecentquarters becauseofthemissingdata. numbers dropdramatically 4 J. K Lindsey 0 to 1984 1986 1988 Year (a) l1990 1992 1 982 1984 1986 1988 l1990 Year (b) Fig. 1. Observed totals (e) and several models to estimateAIDS cases in (a) England and Wales and (b) the USA: ,stationary nonparametric;------- non-stationary paranonparametric;....... non-stationary metric 2. True models Most statisticiansadmitthattheirmodels are in no sense 'true', although some argue thatthereis a true underlyingmodel; many like to cite a well-knownphrase of Box (1976, 1979) that all models are wrong but some are useful (which models depend on the question at hand, mightI add). Nevertheless,in practice,statisticians,when makinginferences,do not act as if thiswere so: almost all modern statisticalprocedures,Bayesian and frequentist,are implicitlybased on the assumption that some one of the models under consideration is correct. This is a necessary conditionfor decision-makingbut antitheticalto scientificinference,a firstfundamentaldistinctionbetweenthetwo. The assumption that a model is true is sometimes clearly and openly counter-intuitive; for testswhethera null hypothesisis correct,generallyhoping thatit is not, so example, a frequentist thatthe alternativehypothesiscan be retained.In contrast,the Fisherianinterpretation of a test is throughP-values providing weight of evidence against a hypothesis; they do not require an alternativehypothesis.(Alternativehypotheses,and the accompanyingpower calculations, were introducedby frequentiststo choose among Fisheriantests.) Asymptotictheoryis more subtle; it findsits rationale in an estimatedmodel convergingto the truemodel as the numberof observations increases: hence, forexample, the importanceof asymptoticconsistency,discussed below, in frequentist procedures. In a similar way, the Bayesian prior gives the probabilitiesthat each of the various models under considerationis the truemodel. The likelihood principle,also discussed below, carries the assumptionthatthe model functionis true and thatonly the correctparametervalues need to be determined. Many will argue thatthese are 'as if' situations:conditionalstatements,not to be taken literally. But all statisticalthoughtis permeatedby this attitudewhich, if we take modelling seriously,has far-reaching negativeimplications,fromteachingto applications. StatisticalHeresies 5 Thus,muchof statistical inference revolvesaroundtheidea thatthereis some 'correct'model is to findit. But,theverypossibility and thatthejob of the statistician of makinga statistical All differences inference, and indeedall scientific work,dependson marginalization. among observational to be pertinent unitsthatare notthought forthequestionat handare ignored.In to be possible,theobserveduniqueindividuals otherwords,forgeneralization mustbe assumedto be 'representative' withinthosesubgroups thatarebeingdistinguished. and exchangeable Thisis alwaysan arbitrary, butnecessary, assumption, whichis nevertrue.The only'model'thatone (not I) mightcall truewould be thatdescribingthe historyof everysubatomicparticlesince the of theuniverse, and thathistory is unobservable beginning becauseobservation wouldmodifyit. All othermodelsinvolvemarginalization thatassumesthatseemingly irrelevant differences can be ignored.(See LindseyandLambert(1998) forfurther discussion.) We shouldrather thinkin termsof modelsbeingappropriate whichare useful simplifications, fordetectingand understanding generalizablepatternsin data,and forprediction; the typeof to be detectedandunderstood, thepredictions patterns to be madeandthemodelsto be used will dependon thequestionsto be answered.The complexity of themodelwill determine howmuch dataneedto be collected.By definition, no model,in scienceor in statistics, is evercorrect;nor does an underlying of reality, truemodelexist.A modelis a simplification constructed to aid in itis an epistemological tool. understanding it;as statedinmysecondprinciple, 2. 1. Example In studying in theoveralltrend.An appropriate theevolutionof AIDS, we are interested model can assumea smoothcurveovertime,ignoring themore'correct'modelthatallowsforvariations in detection An even 'truer' overthedaysof theweek,withfewercases observedon week-ends. modelwould have to accountforthe doctors'officehoursduringeach day.(Of course,such information is notavailablefromthedataas presented above.) In a 'true'model,varioussubgroups, suchas homosexualsor drugusers,shouldalso be taken intoaccount.Butwe muststopsomewhere, beforeintroducing sufficient characteristics to identify individualsuniquely,or we shallbe unableto makeanygeneralizations, i.e. anyinferences. For thequestionat hand,theoverallevolutionof AIDS in a country, suchsubgroups be may ignored although modellingthemandthenmarginalizing mayproducemorepreciseresults. 3. Probability statements are convincedthatinferencestatements mustbe made in termsof Virtuallyall statisticians It providesa metricbywhichsuchstatements, arethought madein different probability. contexts, to be comparable.However,fora probability and anymodelbased on it,to exist,the distribution, notjust thoseeventsobservedor directly under space of all possibleeventsmustbe defined, In thissense,probability consideration. is fundamentally we mustknowexactlywhatis deductive: This is finefordecisionprocedures, but not forinference, whichis possiblebeforestarting. insiston usingprobability forinference? inductive. Whythendo statisticians arenotagreedabouteitherthedefinition of inference ortowhatsuch Statisticians probabilities statements shouldrefer. Twowidelyacceptedjustifications canbe foundintheliterature: inthelongrunand oferrorsin inference statements can be controlled (a) theprobability ofunknowns new can be elicited,beforeobtaining (b) ifcoherent personalpriorprobabilities and updatedby the Bayes formulathe posteriorwill also have a coherent information, probabilistic interpretation. 6 J.K. Lindsey The Fisherianenumeration ofall possibilities undera hypothesis, yieldinga P-valueindicating whetheror nottheobservations are rare,could conceivablybe used witheithera frequentist or Anotherapproach,Fisher'sfiducialprobability, is notwidely personalprobability interpretation. accepted. The case againstfrequentist is well known.Amongotherthings,they probability statements referto and dependon unobserved outcomesthatare notrelevantto theinformation and questionat hand.(This can be reasonableformodelcriticism, as withtheFisheriantest.)The related problemswithBayesianprobability statements havebeenlittlediscussed,exceptfortheobjection thattheyarenot'objective',no empiricalcheckbeingavailable. The FisherianP-valueis perhapsthemostwidelyused inference in the probability. Judging, theobserveddataarerareundera probability-based senseofa Fisheriansignificance test,whether statistical modelrelieson theexistenceof a space of observableevents,thesamplespace. The weak:forgivendata,it is notuniquebecauserarenesscan be definedin many judgmentis rather no ways.Such testsgive an indicationof absolutelack of fitof one completemodel,providing measureforcomparison amongmodels(thereis no alternative hypothesis). The moreobservations thereare,themorechancethereis of detecting lack of fitof thegivenmodel,a pointto whichI return later.In contrast, thelikelihoodfunction onlyallowsa comparison amongmodels,providingmeansofcheckingonlyrelative, andnotabsolute,adequacy. are not based on the likelihoodfunction(althoughsome Frequentist probability statements involvethelikelihoodratiostatistic). Theyuse goodness-of-fit criteriain an attempt to compare are not comparableoutsidethe contextin whichtheyarise,except models; such statements thatarenotpertinent through longrunarguments to thedataat hand.Thisis one basis ofthecase statements basedon frequentist againstinference probability. The factthata frequentist testhas beenchosento havepoweragainstsomealternative is notan testin themodelselectioncontext(wheretheFisheriantestwouldnot advantageovera Fisherian be used anyway).It is misleadingbecause the testwill also have poweragainstmanyother alternatives. chosengenerallyleads off Developinga model in the directionof the alternative finalmodel.For example,do autocorrelated residualsin a time along a pathto an inappropriate ornon-linearity? seriesmodelindicatethewrongMarkovian order,missingortoomanycovariates Indeed,a hypothesis mayeven be rejected,notbecause it is wrong,butbecause the stochastic ofthemodelinwhichitis embeddedis inappropriate. component Otherarguments inferenceincludethe assumptionthathypothesesare againstfrequentist formulated beforelookingat the data,whichis impossiblein an exploratory context,and the problemof sequentialtesting,bothdiscussedbelow.The probabilities arisingfromfrequentist inference statements do notadequatelymeasureuncertainty. in favourof subjectiveBayesian Let us now turnto the Bayesianapproach.The argument relies cruciallyon the principleof coherence.Unfortunately, thishas onlybeen probabilities shownto applyto finitely additivesetsof models;theextension, evento countably additivesets, it is simplywidelyassumed'formathematical has not been demonstrated; convenience'(Hill contentof most (1988) and Bernardoand Smith(1994), pages 44-45). Thus,theprobabilistic is no morethanan unproven statements Bayesianinference assumption. Besides coherence,theothercriteriaforthevalidityof Bayesianprobabilistic statements are thatthepriorbe personaland thatit be specifiedbeforeobtainingnew information. Bayesians butwhoproposeto use a flatprior,to try who do nothavetheirownpersonalpriordistribution, variouspriorsto checkrobustness, to use some mathematically diffuseprioror to convenient thedata,do notfulfilthesecriteria. In no case do theirposterior choosea 'prior'afterinspecting to have havea personalprobability 'probabilities' interpretation, althoughsomemaybe calibrated long run frequentist properties.Indeed,thesepeople do not have the courageto admitthat StatisticalHeresies 7 inferences canbe drawndirectly fromthelikelihoodfunction. Thisis whattheyare,in fact,doing, at least in thefirsttwo cases. For thosewho are capable of formulating or elicitinga personal prior,itsroleshouldbe, notto be combinedwith,butto be comparedwiththelikelihoodfunction to checktheiragreement, say in thediscussionsectionof a paper;'sceptical'and 'enthusiastic' priorscan also be usefulin thiscontext(see, forexample,Spiegelhalter et al. (1994)). Anyone whohas studiedcloselythedebateabouttheselectionofpriordistributions, whichhas been ably summarized byKass andWasserman (1996), mustsurelyhavesomedoubtsaboutusingBayesian forinference. techniques For a priorprobability distribution to exist,thespace,hereof all possiblemodelsto be condefined.Anymodelsthatare notincluded,or thathave zero prior sidered,mustbe completely probability, willalwayshavezeroposterior probability undertheapplication oftheBayesformula. No empiricalinformation can giveunforeseen modelspositiveposterior probability. This use of probability directly contradicts theveryfoundation of scientific inference, fromthespecificto the general,wherethegeneralcan neverbe completely knownor foreseen.UnderformalBayesian scientific is impossible. procedures, discovery In contrast, in decision-making, such a completeenumeration of all possibilitiesis a prerequisite.This is a second fundamental distinction betweenscientific inferenceand decisionmaking. At leasttheoretically, Bayesianscan easilyplace personalpriorson parameters withinmodels. However,as mentioned earlier,manyBayesianconclusions,e.g. aboutthecredibility regionin whicha set of parameters mostprobablylies, are not parameterization invariant (unlessthey whentheychangetheparameterization; see Wasserman changetheirpriorappropriately (1989)). As well,putting of thespace of priorson structural subspaces(i.e. on different modelfunctions) all possiblemodelsis extremely difficult. For example,how can modelsbased on logisticand be directly probitregression compared?Varioussolutionshavebeenproposed,thetwomostwell thatplace pointprobability knownbeingBayesfactors masseson thevarioussubspaces(see Kass in a widerclass,although thelatterwouldhaveto andRaftery (1995) fora review)andembedding be available,withtheappropriate priors,beforeobservingthe data,fortheposteriorto have a No solutionis widelyaccepted. probabilistic interpretation. in We mustconcludethat'probabilities' statements arisingfromalmostall Bayesianinference realisticappliedcontextsdo not have a rigorousprobabilistic interpretation; theycannotadequatelymeasureuncertainty. basedon thesamplespace,are critically on thechoiceofa Frequentist conclusions, dependent model'sstochasticstructure. The procedures are alwaysimplicitly testingthischoice,as well as theexplicithypothesis aboutparameters: a modelmaybe rejectedbecausesomeirrelevant partof thatare it is misspecified. mustadopta modelthatonlymakesassumptions Thus,a frequentist of nonparametric reallynecessaryto thequestionsto be answered(explainingtheattractiveness as discussedfurther methods, later).In contrast, Bayesianconclusionsaboutparameters depend model thattheoverallstochastic onlyon theobserveddataandtheprior,butalwaysconditionally thatforwhichpriorswere is true.Here,theglobal set of modelfunctions underconsideration, constructed beforeobservingthedata,cannotbe placed in question,whereasin the frequentist i.e. about the problemof interest, approachconclusionscannotbe drawnabout parameters, thevalidityofthemodelfunction itself. without simultaneously questioning If we concludethatthe 'probability' statements used in drawingstatistical conclucurrently sions,whether Bayesianor frequentist, generallyhaveno reasonableprobabilistic interpretation, forscientific at least in the foundation theycannotprovideus withan appropriate inference, context.We shallencounter further reasonsas we proceed.We are thenleftwiththe exploratory as we shallsee. Itprovidesthemeansof likelihoodfunction whichalwayshas a concretemeaning, 8 J.K. Lindsey incorporating priorknowledgeand information, obtainingpoint estimatesand precisionof modelcriticism parameters, comparing models,includingerrorpropagation, and prediction, all without probability statements, exceptabouttheobserveddata(Lindsey,1996a). The globalprobability modelunderconsideration represents thatpartof presenttheoretical whichis notin questionin thestudywhilepriorempiricalinformation knowledge can be incorfrompreviousstudies.Model selectionand precision poratedby usingthelikelihoodfunctions estimation will involvelookingat ratiosof likelihoods(Fisher(1959), pages 68-75, and Sprott andKalbfleisch (1965)). Becausethelikelihoodfunction ofmodels,fora onlyallowscomparisons givendata set, it is usuallynormedby dividingby its maximumvalue, so thatthisvalue is notas a pointestimate, important, butas a reference is used point.Whenthelikelihoodfunction in thiswayforinference, thereis no conceptualdifference betweencomparing theprobability of thedataforvariousparameter valueswithina givenmodelfunction andcomparing variousmodel functions (Lindsey,1974a). This providesa powerfulsense of coherenceto thisapproach.In contrast, exceptfortheAkaikeinformation criterion (AIC), no widelyacceptedmethodsfora directcomparisonof non-nested modelsare available,withineithertheBayesianor frequentist frameworks. 3. 1. Example The reporting of AIDS is a one-timeaffair, nota samplefromsomewell-defined population, so anyfrequentist samplespace can onlybe some arbitrary The modelsused conceptualconstruct. andreporting byDe AngelisandGilks(1994) assumethatAIDS detection delaysareindependent and thatthese two processescan jump arbitrarily betweentimepoints.Their firstmodel is to independence inthecontingency table: equivalent log(ij) = ai + f/j (1) where ij is themeannumber ofreported cases withi andj respectively indexingdiscretequarters and reporting of reporting overtime. delays.It assumesthatthedistribution delaysis stationary Thepredicted marginalcurveis plottedas thefullcurvein Fig. 1(a). Thismodelhas a devianceof The Fisheriantestof significance forindependence has a P716.5 with413 degreesof freedom. this valueofzeroto theprecisionofmycomputer, evidence model. This providing strong against to thatfroma frequentist has no clear run probability, althoughequivalent test, long interpretation inclassicalfrequentist inference. De Angelisand Gilks (1994) used reasonablynon-informative normaland inverseGaussian in a second independence model. They did not describehow these priorsforthe parameters (personal?)priorswereelicited.Theyonlyconsideredthesetworathersimilarmodels,givingno of anyotherpossibility, so theirpriorprobability priorsforthefirst, includingnon-stationarity, thetestjust performed, musthave been 0. This contradicts but,giventhispriorbelief,such a problemcannotbe detectedfromwithintheBayesianparadigm. 4. Likelihood principle In supportofbasinginferences on thelikelihoodfunction, one customarily invokesthelikelihood of thisis 'All information about 0 obtainablefroman experiment is principle.One statement inthelikelihoodfunction contained for0 givenx. Twolikelihoodfunctions for0 (fromthesameor different containthesameinformation about0 iftheyareproportional tooneanother.' experiments) about0' (BergerandWolpert (1988),p. 19; see also Birnbaum (1962)). Thephrase'all information mustbe takenin theverynarrowsenseof parameter The estimation, giventhemodelfunction. Statistical Heresies 9 principalconsequenceis thatsuchinferences shouldonlydependon theobserveddataandhence noton thesamplespace.The mostcommonexampleis theidentity ofthebinomialandnegative themodelshaveverydifferent binomiallikelihood functions although samplespaces. If thisprincipleis takenliterally, an interpretation of themodelunderconsideration becomes impossible.Few if anymodelshave meaningoutsidetheirsamplespace. An important use of modelsis to predictpossibleobservations, whichis impossiblewithoutthesamplespace. Also, an essentialpartof inference, the interpretation of modelparameters, dependson the sample ofthebinomialdistribution is notthesameas thatofthe space.Forexample,themeanparameter negativebinomialdistribution. is However,themajorproblemwiththisprincipleis thatit assumesthatthemodelfunction knownto be true.The onlyinference problemis thento decideon an appropriate set of values fortheparameter, i.e. a pointestimateand precision.It is simplynotcorrectthat'by likelihood function we generallymean a quantitythatis presumedto containall information, giventhe data, aboutthe problemat hand' (Bj0rnstad,1996). The likelihoodprincipleexcludesmodel criticism. In fact,applyingthelikelihoodprinciplealmostinvariably in involvesdiscarding information, abouttheparameter of interest. thewidersensethansimplyestimation, Forexample,thebinomial canbe performed, andtheprobability eveniftheonlyinformation experiment estimated, provided by a studyis the totalnumberof successes,giventhe totalnumberof trials(trialsmay be In contrast, the negativebinomialexperiment simultaneous). can onlybe performed, and the whenwe knowtheorderofthesuccessesandfailures, probability can onlybe estimated, whichis necessaryto knowwhento stop.This information, aboutwhetheror not the parameter being is constant overtime,is thrown forthenegative estimated awaywhentheusuallikelihoodfunction binomialmodel is constructed. is (For the binomialmodel, when the orderinginformation likelihoodfunctionfora negative available,it obviouslyalso mustbe used.) An appropriate binomialexperiment mustuse thisinformation andwillnotbe identicalwiththatfortheminimal binomialexperiment. Bayesianswhowishto criticizemodelsmusteitherresortto frequentist methods(Box, 1980; in somewidermodel(Box andTiao, 1973). Gelmanetal., 1996) orembedtheirmodelof interest to be trulyBayesian,itmust,however, be specified before Forthelatter, calledmodelelaboration, is excluded. lookingatthedata.Again,theunexpected The appropriate likelihood Thus,thelikelihoodprincipleis hollow,ofno realuse in inference. functionforan experiment musttake into accountthe sample space. Basically different exare performed suchas thebinomialand negativebinomialexperiments, fordifferent periments, so the corresponding reasonsand can alwaysyield different information likelihoodsmustbe detailanddiscussion, see Lindsey(1997a).) different. (Forfurther 4. 1. Example The likelihoodprinciple,appliedto the stationary model above,allows us to make inferential in thatmodel,assumingthatit is correct.But itprovidesno way statements abouttheparameters of criticizing thatmodel,in contrast withtheFisheriantestused above,norof searchingfora better alternative. 5. Likelihood function Let us now look more closely at the definition of likelihood.The likelihoodfunctionwas defined(Fisher,1922),and is stillusuallyinterpreted, to givethe(relative)probability originally 10 J.K. Lindsey of theobserveddataas a function of thevariousmodelsunderconsideration. Models thatmake thedatamoreprobablearesaidtobe morelikely. However,in almostall mathematical statistics texts,thelikelihoodfunction is directly defined as beingproportional totheassumedprobability oftheobservations density y, L(O; y) ocf(y; 0), (2) wheretheproportionality constantdoes notinvolvetheunknownparameters 0. For continuous likelihoodas therelativeprobability of the variables,thisis simplywrongifwe wishto interpret of anypointvalue of a continuous observeddata.The probability variableis 0. Thus,equation (2) does notconform to theoriginaldefinition. This changeof definition has had ramifications thetheoryof statistical Forexample,at themostbenignlevel,it has led to throughout inference. manyparadoxes. One signof themathematization of statistics, and of its distancefromappliedstatistics, is thatinference procedures, fromtheempiricalto thegeneral,do notallow fortheelementary fact thatall observations are measuredto finiteprecision.I am not speakingaboutrandommeasurementerrors,but about the fixedfinite,in principleknown,precisionor resolutionof any If thisis denotedby A, thenthecorrectlikelihoodforindependent instrument. observations of a continuous variable,accordingto Fisher's(1922) originaldefinition (he was verycarefulabout it),is J yi+Ai/2 L(O; y) -H f(ui; O)dui. (3) yi-Ai/2 In virtually all cases wherethelikelihoodfunction is accusedof beingunsuitableforinference, includingby Bayesianswho claimthata priorwill correcttheproblem,thereasonis themore recentalternative definition. (The fewothersarisefroman incorrect specification ofthemodelor datato studya model;see Lindsey(1996a),pages 118-120 and 139.) inappropriate A classicexampleofthesupposedproblemswiththelikelihoodfunction is thatit(actuallythe distribution density)can takeinfinite values,as forthethree-parameter log-normal (Hill, 1963). Definedas in equation(3), thelikelihoodcan neverbecomeinfinite. Indeed,anydatatransformationthatdoesnottakeintoaccountAi can lead totrouble. Let us be clear;I amnotarguingagainstmodelsusingcontinuous variables.Theseareessential inmanyapplications. withhowto drawcorrectinferences Theproblemis rather aboutthem,given theavailableobservabledata.Nor am I suggesting thatequation(2) shouldneverbe used; it has beena valuablenumerical of equation(3), although approximation, giventheanalyticcomplexity itis rarelyneededanymore,withmoderncomputing power.Thus,whenproblemsarise,andeven iftheydo notarise,we shouldneverforget thatthisisjustan approximation. Of course,thisconclusionis ratherembarrassing because suchbasic toolsas theexponential fallbythewaysideforcontinuous statistics familyand sufficient variables,exceptas approximations(see thediscussionin Heitjan(1989)). The sufficient statistics containonlyan approximation in the data about a parameterin real observationalconditions.Bayesian to the information procedures can also haveproblems:ifseveralindistinguishable valuesarerecorded(whichis impossiblefora 'continuous' variable),theposterior maynotbe defined. 5. 1. Example The detectionof AIDS cases is a processoccurringover continuoustime.Nevertheless, for it is discretized intoconvenient administrative reporting, intervals, quartersin ourcase. Because Statistical Heresies 11 in ofthehighrateofoccurrence ofevents,thisdoesnotcreateproblemsin fitting models,whether discreteor continuous time,forthequestionof interest. The samesituation also arisesin survival studies,buttheretheeventrateis muchlowerso timeintervals maybe ofmoreconcern. 6. Normal distribution Ifall observations areempirically discrete, we mustre-evaluate theroleofthenormaldistribution. Thatthisdistribution has longbeen centralbothin modellingand in inference is primarily the faultofFisherinhis conflict withPearson.Pearsonwas greatly withdevelopingapproconcerned In priatemodelsto describeobservedreality, buthe used crudemomentmethodsof estimation. Fisherconcentrated, noton finding themostappropriate contrast, models,buton thebiasedness (fromthedesign,notthefrequentist bias discussedbelow)andefficiency of statistical procedures. Fornecessarypracticalreasonsof tractability, thatsomebelieveto continueto hold to thisday, the normaldistribution was centralto developinganalysesof randomizedexperiments and, moregenerally, to efficient inferences (althoughtheexponential and location-scalefamiliesalso emergedfromhiswork,almostinpassing). in appliedstatistics Thus,a keyfactorin thecontinueddominanceof thenormaldistribution has beenthistractability forcomputations. Thiswas overthrown, in themodellingcontext, bythe introduction of survivaland generalizedlinearmodels. Developmentsin computer-intensive in inference. methodsare leadingto thesame sortof revolution Analytically complexlikelihood functions can nowbe explorednumerically, althoughthishas so farbeen dominated by Bayesian applications (see,however, Thompson(1994)). Here,I proposethat,just as thenormaldistribution is nottheprimary basis,althoughstillan it shouldnotbe thefoundation important one, of modelconstruction, of inference procedures. it shouldbe thePoissondistribution. Instead,I suggestthat,ifone distribution is to predominate, A Poissonprocessis themodelofpurerandomness fordiscreteevents.Thiscan be usedto profit as a generalframework withinwhichmodelsareconstructed andforthesubsequent inferences. All observations are discrete,whateverthe supposedunderlying data generating mechanism andaccompanying model.Thus,all observations, areevents.Current whether positiveornegative, modellingpracticesfor continuousvariablesdo not allow forthis. The Poisson distribution that providesan interface betweenthemodelforthedata generating mechanismand inferences takesintoaccounttheresolution ofthemeasuring instrument. A model based on any distribution, even the Poisson distribution, for a data generating can be fitted, mechanism to theprecisionof theobservations, by suchan approachbased on the Poissondistribution forthemeasurement modelwhen process.Thisdistribution yieldsa log-linear themodelunderstudyis a memberoftheexponential thelikelihoodbythe family(approximating forcontinuous stochastic density variables)andwhenitis a multiplicative intensity process(using a similarapproximation forcontinuous time),perhapsindicating whythesefamiliesare so widely used(LindseyandMersch,1992;Lindsey,1995a,b). A Poissonprocessdescribesrandomevents.One basic purposeof a modelis to separatethe of interest, thesignal,fromtherandomnoise.Supposethatthecombination of systematic pattern a distribution, whichever is chosen,and a systematic regression partcan be developedin sucha fortheoccurrence of waythatthemodeladequatelydescribesthemeanof a Poissondistribution theeventsactuallyobserved.Then,we knowthatthemaximuminformation aboutthepattern of interest has beenextracted, leavingonlyrandomnoise. A principle thathas notyetfailedme is thatanymodelorinference is untrustworthy procedure if it dependson theuniqueproperties of linearity i.e. if it cannot and/orthenormaldistribution, be generalized is centralto directly beyondthem.IfthepointofviewthatthePoissondistribution 12 J.K. Lindsey statistical inference is adopted,whatwe thenrequireare goodness-of-fit procedures tojudge how in thedatathatremains farfromtherandomness ofa Poissonprocessis thatpartofthevariability unexplained bya model. 6. 1. Example As is well known,thePoissondistribution formsthebasis forestimating log-linearmodelsfor tables.Thus,I implicitly used thisin fitting contingency my stationary model above and shall againforthosethatfollow.However,as in mostcases, thisprocedurehas anotherinterpretation (Lindsey,1995a). In thiscase, I am fitting risk(hazardor intensity) functions fora pairof nonhomogeneous (i.e. withrateschangingovertime)Poissonprocessesforincidenceand reporting delays.Above,theywerestepfunctions describedbyai and/gj. Below,I shallconsidermodelsthatallow bothAIDS detectionand reporting to varycontinestimated as Poisson uouslyovertimeinsteadoftakingdiscrete jumpsbutthatare,nevertheless, processes.In thecontextof such stochasticprocesses,a suitablemodelwill be a modelthatis Poissonat anydiscretely observedpointin time,giventheprevioushistoryof the conditionally process(Lindsey,1995b). Thatused below is a generalization of a (continuoustime)bivariate Weibulldistribution forthetimesbetweenevents(AIDS detection andreporting). 7. Model selection by testing One fundamental questionremainsopen:howcan ratiosoflikelihoods be comparedformodelsof different complexity, i.e. whendifferent numbers ofparameters areinvolved?How can likelihood intheprobability ofhypothesis ratiosbe calibrated? Thenumberofdegreesoffreedom statements this calibrationof the likelihood testshas traditionally fulfilled this role. Beforeconsidering moreclosely.Nester(1996) has function bysignificance levels,letus first lookat stepwisetesting describedsomeoftheproblems withhypothesis admirably testing. Everyoneagreesthat,theoretically fora testto be valid,thehypothesis mustbe formulated thedata.No one doesthis.(I exaggerate; beforeinspecting protocolsofphaseIII trialsdo,butthat is nota modelselectionproblem.)In theirinference course,students aretaughtthisbasicprinciple for of hypothesis testing;in theirregression course,theyare taughtto use testingsequentially, stepwiseselection. testto the Considerthecase wheretwostatisticians, independently, applythesamefrequentist it in advance,the otherdoes not.Bothwill obtainthe same value. same data. One formulates at his or However,accordingto standard inference theory, onlythefirstcan rejectthehypothesis herstatedprobability theconclusionby collectingfurther level.The secondmustconfirm data, and indeedthe level of uncertainty reportedabout a givenmodel shouldreflectwhatprior followformal workwas carriedout withthosedata. However,if two statisticians exploratory to arriveat thesamefinalteststarting fromthesedifferent initialconditions, Bayesianprocedures thereasonmustbe theirdiffering initialpriors. are suchas theX2-distribution, Testsat a givenprobability level,based on some distribution level evenifthatprobability assumedtobe comparableinthesamewayin anysituation, generally has no meaningin itself,forexamplebecausewe havepreviously lookedat thedata.However,in in unmeasurable a sequenceof tests,thedegreeof uncertainty is accumulating ways,becauseof on thesame data (evenlookingat thedatabeforea firsttestwill previousselectionsperformed in different levels so heretheseriesof significance modify uncertainty waysin different contexts), are certainly not comparable.The probability levels obtainedfromsuchtestsare meaningless, evenforinternal comparisons. Statistical Heresies 13 Bayesianmethodshave theirownproblemswithmodelselection,besidesthefactthatone's mustbe alteredas soon as one looksat thedata.(Whatare theallowpersonalpriordistribution able effectsof data cleaningand of studying on one'spersonalprior?)With descriptive statistics a priorprobability of theparameter densityon a continuous parameter, theprobability being0, andhencebeingeliminated fromthemodel,is 0. SomeBayesianshavearguedthatthequestionis ratherif theparameter is sufficiently close to 0 in practicalterms(see, forexample,Bergerand thishave been proposed.For Sellke (1987)) butno widelyacceptedmeansforoperationalizing invariant? example,whatsuitableintervals aroundthiszero are parameterization Placinga point priorprobability masson sucha model,as withBayesfactors, thesituation, aggravates leadingto Lindley's(1957) paradox:withincreasinginformation, the modelwithpointpriorprobability necessarily dominates thosehavingcontinuous priorprobability. 7. 1. Example A modelfordependencebetweenAIDS detectionand reporting delaysis equivalentto a nonstationary stochastic processforreporting delays:theshapeof thedelaycurveis changingover time.Because of themissingtriangleof observations, a saturated modelcannotusefully be fitted to thiscontingency table.Instead,considera slightly simplerinteraction modelthatis stillnonparametric (Lindsey,1996b) log(,uij)=ai + ,Bj+ bit + yju (4) wheret and u respectively are thequarterand thedelaytimes.This yieldsa devianceof 585.4 with364 degreesof freedom.The curveis plottedas thebrokencurvein Fig. l(a). The deviance is improved The P-valueof thecorresponding testis by 131.1with49 degreesof freedom. 2.0 X 10-9,apparently strong evidenceagainststationarity. Here,I have decidedon the necessityto findsuch a model afterseeingthepoor fitof the modelused by De Angelisand Gilks(1994). Whatthenis thelongrunfrequentist stationary (as thefactthatwe do not ofthissecondP-value,evenignoring opposedto frequency) interpretation havea sampleandthatthestudycan neverbe repeatedinthesameconditions? As a Bayesian,howcouldI handlesucha modelifI had foreseenitsneedbeforeobtaining the data?It contains101 interdependent parameters forwhicha personalmultivariate priordistributionwouldhavehad to be elicited.ShouldI workwithmultiplicative factorsfortheriskfunction or additivetermsforthelog-risk? The twocan givequitedifferent intervals ofhighest credibility How shouldI weightthe priorprobabilities of the two posteriordensityforthe parameters. models,withand withoutstationarity? How shouldI studythe possibleelimination of the 49 of theirbeing0 would dependence(interaction) parameters, giventhattheposterior probability alwaysbe 0 (unlesstheywereexcludedin advance)? 8. Stepwise and simultaneous testing I now return to thecalibration problem.A changein deviance(minustwicethelog-likelihood) betweentwo models of, say,fivehas a different meaningdependingon whetherone or 10 are fixedin the second model in relationto the first.It has been argued,and is parameters such as the %2-distribution, commonlyassumed,thata probability distribution, appropriately calibrates thelikelihoodfunction to allowcomparisons numbers ofdegreesof undersuchdifferent freedomeven thoughthe P-value has no probabilistic forthe reasonsdiscussed interpretation above.Thisassumption is notoftenmadeclearto students andclients. Let us considermorecloselya simpleexampleinvolvingdifferent numbersof degreesof 14 J.K. Lindsey by testing, applicable freedom, thatcomparingstepwiseand simultaneous parameter elimination credibility or conto bothfrequentist and Bayesianprobability statements. Noticethatexamining test.Takea modelforthe fidenceintervals wouldyieldthesameconclusionsas thecorresponding meanofa normaldistribution, havingunitvariance,defined by -, ai + fij witheach of two explanatory variablestakingvalues 1 and -1 in a balancedfactorialdesign. Witha likelihoodratiotest(here equivalentto a Wald test),the square of the ratioof each errorwill havea %2-distribution and, parameter estimateto itsstandard with1 degreeof freedom with2 degreesof because of orthogonality, thesumof theirsquareswill have a %2-distribution underthenullhypotheses thattheappropriate freedom parameters are 0. First,we are comparing a modelwithone parameter at a time0 withthatwithneither0; then,both0 withneither0. priorsforthetwo EquivalentBayesianprocedurescan be givenusingany desiredindependent parameters. or construct separateprobability Supposethatwe perform separatetestsforthe parameters forthem,and thatwe findthattheindividualteststatistics havevalues3 and 3.5. These intervals indicatethatneitherparameter is significantly different from0 at the5% level,or that0 lies in each interval.Because of orthogonality, one parametercan be eliminated,then the other, on thefirst a simultaneconditionally being0, in eitherorder.However,if,instead,we performed ous testforbothbeing0, or constructed a simultaneous regionat thesame level as probability wouldhavea value of 6.5 with2 degreesof freedom, indicating, againat the above,thestatistic arisesin practice,it is 5% level,thatat leastone couldnotbe eliminated. Whensucha situation difficult to explainto a client! extremely I call suchinference and to procedures incompatible. The criticism appliesbothto frequentist Bayesianprobability statements. Frequentist multiple-testing procedures,such as a Bonferroni thatare so oftenincorrectly worse.The onlymakematters correction, appliedin sucha context, two tests for individualeliminationwould be made at the 2.5% level, providingeven less testwouldremain indication thateitherparameter was different from0, whereasthesimultaneous unchanged. Whatexactlyis theproblem?Calibrationof the likelihoodfunction by fixinga probability forvaryingnumbersof parameters inducesan increasingdemand level,Bayesianor frequentist, as model complexity on theprecisionof each parameter increases.A givenglobal probability level necessarilytightensthe precisionintervalaroundeach individualparameteras the total is testedforelimination numberofparameters involvedincreases.Forexample,ifone parameter or confidence intervals modelwill generally at a time,or individualcredibility used,theresulting ora simultaneous aretestedforremovalsimultaneously, be muchsimplerthanifmanyparameters arebiasedtowardscomplexmodels. regionat thesamelevelused.Suchprocedures mustbe heldconstant Whatthenis thesolution?The precisionof each parameter (at leaston itcan easilybe demoninferences, average),insteadoftheglobalprecision.To avoidincompatible as proportional to aP, where0 < a < 1 and strated thatthelikelihoodfunction mustbe calibrated or on average, estimatedin themodel.(This is onlyapproximate, p is thenumberof parameters i.e. whenthelikelihoodfunction doesnotfactor.) arenotorthogonal, whenparameters In model selection,this constanta is the likelihoodratiothatwe considerto be on the for thatoneparameter can be setto a fixedvalue(usually0). Moregenerally, borderline indicating itis theheightofthenormedlikelihoodthatone choosesto definea likelihood interval estimation, intervalforone parameter. heightto definea simultaneous Then,aP will be thecorresponding thatis compatible withthatforoneparameter. regionforp parameters In theprocessof simplifying makethedataless probable(unlessthe a model,we necessarily Statistical Heresies 15 parameterbeing removedis completelyredundant).This value a is the maximumrelative of thedatathatwe are willingto acceptto simplify a modelby removing decreasedprobability one parameter. The smalleris a, thewiderare thelikelihoodregionsand thesimplerwill be the model chosen,so thatthismaybe called the smoothing constant.Well-known model selection criteria,such as theAIC (Akaike,1973) and the Bayes information criterion (BIC) (Schwarz, 1978),aredefined inthisway.Suchmethodsdemystify thewidelybelieved,andtaught, frequentist myth thatmodelsmustbe nestedto be comparable. How havesuchcontradictory methodssurvivedso long?In manycases ofmodelselection,the numberofdegreesoffreedom is smallandaboutthesameforall tests,so thedifferences between model selectionconclusionsare not thetwoprocedureswill thenbe minimal.Most important borderline cases so theresultswouldbe thesameby bothapproaches.In addition,mathematical statisticians have done an excellentjob of convincing appliedworkersthattheoverallerrorrate shouldbe heldconstant, insteadofthemoreappropriate individual parameter precision.Although theformer maybe necessaryin a decisiontestingsituation, especiallywithmultipleendpoints,it in a modelselectionprocess. is notuseful,orrelevant, I can treatone further difficult Aftermodelselectionhas beencompleted, it issueonlybriefly. of interest. This raisesthe maybe desirableto make inferences aboutone particular parameter in thepresenceof non-orthogonality questionof how to factorlikelihoodfunctions (Kalbfleisch and Sprott(1970), Sprott(1975a, 1980) and Lindsey(1996a), chapter6). Parameter transformationcan helpbutwe mayhaveto acceptthatthereis no simpleanswer.The Bayesiansolutionof out the 'nuisance' parametersis not acceptable:a non-orthogonal averagingby integrating likelihoodfunction is indicating thattheplausiblevaluesoftheparameter of interest change(and on thevaluesofthenuisanceparameters. thushavedifferent Undersuch interpretation) depending an averageis uninterpretable conditions, (Lindsey(1996a),pages349-352). 8. 1. Example modelexploration, transformations of timeand Aftersome further usingsome ratherarbitrary I a of of have found a continuous timeparanumber non-nested involving comparisons models, metricmodelallowingfornon-stationarity (Lindsey,1996b): + dI log(t)+ 42/t+ log([Uij)=0 + 01 log(u) + 02/u+ VIt/u+ v2tlog(u) V3 log(t)/u+ V4 log(t)log(u). thelog-transformation oftimealoneyieldstheWeibullmodel.)Thismodelhas a deviance (Fitting of 773.8 with456 degreesof freedom.It has onlynineparameters comparedwith101 in the model(1). It is a simplification ofthatmodelwitha deviance188.4 nonparametric non-stationary largerfor92 degreesoffreedomso theP-valueis 1.3 x 10-8,seemingto indicatea veryinferior model.However,its smoothcurvethroughthe data,plottedas the dottedcurvein Fig. l(a), appearsto indicatea good model both fordescribingthe evolutionof the epidemicand for prediction. Witha fairlylargeset of 6215 observations, a = 0.22 mightbe chosenas a reasonablevalue theintervalof precisionforone parameter; fortheheightof thenormedlikelihooddetermining thisimpliesthatthedeviancemustbe penalizedbyaddingthree(equal to -2 log(0.22))timesthe numberof parameters. (This a is smallerthanthatfromtheAIC: a = I/e = 0.37 so twicethe wouldbe addedto thedeviance.It is largerthanthatfroma X2-test at 5% numberof parameters a = 0.14, or adding3.84 timesthenumberofparameters, with1 degreeof freedom, but,withp nonthisdoes not changeto 0.14p.) For the stationary parameters, model,the nonparametric 16 J.K Lindsey modelandtheparametric 872.5,989.4 model,thepenalizeddeviancesarerespectively stationary forthe last. (Such comparisonsare onlyrelativeso and 800.8, indicatinga strongpreference wouldyieldthesame results.)This confirms theconclusionsfromour penalizedlog-likelihoods intuitive inspection ofthegraph. inno waythe'true'modelforthephenomenon giventhearbitrary Thisis certainly understudy, transformations of time.However,it appearsto be quiteadequateforthequestionat hand,given modelselectionexamples.) theavailabledata.(See LindseyandJones(1998) forfurther 9. Equivalence of inference procedures give aboutthe Some statisticians claimthat,in practice,thecompeting approachesto inference One mayarguethat,just as sameresultsso itis notimportant whichis used ortaughtto students. do a reasonably procedures, butthattheybothnevertheless modelsareimperfect, so areinference withouttrying to goodjob. However,just as we wouldnotstickwithone modelin all contexts, in all contexts, norrefuse to applyone modeof inference improveon it,so we shouldnotattempt are extremely difficult to checkempirically, procedures improvements. Unlikemodels,inference andthereis no agreement on theappropriate criteria to do this. it is truethattheconcluBayesianand frequentist procedures, Although, formanycompeting sionswill be similar,in ourgeneralexploratory contextall methodsare notalike.Forsmalland as describedabove,givevery propermodelselectioncriteria, largenumbers ofdegreesoffreedom, different resultsfromthosebased on probabilities. (The standardAIC is equivalentto a %2-test orinterval atthe5% levelwhenthereare7 degreesoffreedom.) can reasonablycalibratethe statements For thosewho would stillmaintainthatprobability considerthe situations, likelihoodfunction, especiallygiventheirbehaviourin highdimensional following questions. (a) The deviancecan providea measureof thedistancebetweena givenmodeland thedata modelsforcontingency %2-distribution. tables)andhas an asymptotic (e.g. withlog-linear We wouldexpectmodelsthatare morecomplexto be generallycloserto thedata.Why thendoestheprobability ofa smalldeviance,indicating a modelclose to thedata,become increases?(Thiswas a questionaskedbya verysmallas thenumberofdegreesoffreedom sociologystudent.) second-year undergraduate in multidimensional ofteninvolvesintegration spaces. (b) The calculationof probabilities unithypersphere changeas p How do the surfacearea and volumeof a p-dimensional mathematics increases?(Thisis a standard undergraduate question.) statements. formodelcomparison inferences without probability Herethenis mycompromise, thefrequentist likelihoodratioor Bayes Take theratioof likelihoodsfortwomodelsof interest, we haveno agreed factor. Maximizeeachoverunknown parameters, because,amongotherthings, invariant for and integration, withwhatever measureon parameters prior,is notparameterization aboutprecision.To avoidincompatibility, comintervalstatements makingeasilycommunicable to aP, with bine each likelihoodwitha 'prior'thatpenalizesforcomplexity, beingproportional overwhich a and p definedas in the previoussection.Here, p is the numberof parameters was performed; a shouldbe specifiedin theprotocolforthe studyand mustbe maximization withthesamplesize,as discussedbelow.Thisis the'non-informative, objectiveprior' congruent for The resulting oddsthenprovidesuitablecriteria foreachmodelunderconsideration. posterior witheach otherand All conclusionsaboutmodelselectionwillbe compatible modelcomparison. inthefinalmodel. likelihoodregionsforparameters with(simultaneous) or by Bayesiancoherency In contrast withevaluationby frequentist long runcharacteristics Statistical Heresies 17 thisprocedure will indicatethemostappropriate amongconclusions, setofmodelsforthedataat hand(including likelihoodfunctions frompreviousstudies,whereavailableandrelevant). Thisset of models,amongthoseconsidered, makestheobserveddata mostprobablegiventheimposed constraint on complexity. Hence,we havea thirdcriterion forinferential evaluation. Thisseemsto me to providea betterguarantee of scientific generalizability fromtheobserveddatathaneithera longrunora personalposterior probability. of inference Thus,we havea hierarchy procedures: modelfunction and thenumberof parameters (a) if theappropriate requiredare notknown beforedata collection,and one believesthatthereis no truemodel,use theabove procedureforsomefixedvalueofa (a generalization oftheAIC) to discoverthesetofmodels thatmostadequatelydescribesthedataforthequestionathand; modelfunction is notknownbeforedatacollectionbutone believesthat (b) iftheappropriate thereis a truemodel(fora Bayesian,it mustbe containedin the set of functions to be use theBIC totryto findit; considered), onehas a modelfunction (c) if,beforedatacollection, thatis knownto containthetruemodel, use probability-based Bayesian or frequentist proceduresderivedfromthe likelihood function to decideonparameter andtheiraccompanying estimates intervals ofprecision. The difference betweenBayesianand frequentist procedures, each based on its interpretation of is less thanthatbetweeneitherof theseand model selectionmethods inference probabilities, basedon appropriately penalizedlikelihoods. 9. 1. Example We saw,in theprevioussection,thatthepenalizeddevianceclearlyrejectsthenonparametric nonmodelcomparedwiththeothertwomodels.However,we havealso seenthatFisherian stationary mannerin thismodelselectioncontext)strongly tests(incorrectly used in a frequentist rejectboth the stationary and the parametric models in favourof nonparametric These non-stationarity. conclusionsare typicalof the differing inferences contradictory producedthroughdirectlikelihoodandprobability-based are involved.In such approacheswhenlargenumbersofparameters a penalizedlikelihood, suchas theAIC, will selectsimplermodelsthanclassical circumstances, inferences basedon probabilities, whether Bayesianorfrequentist. 10. Standarderrors Hypothesis testinghas recently come underheavyattackin severalscientific disciplinessuchas psychologyand medicine;see, forexample,Gardnerand Altman(1989). In some scientific whichare feltto be more journals,P-valuesare beingbannedin favourof confidence intervals, thisis nottrue;P-valuesand confidence informative. (In thefrequentist setting, intervals, based on standarderrors,are equivalentif the pointestimateis available.) Thus, the formulation, =5.1 (standarderror2.1), is replacingthe equivalent0 =5.1 (P= 0.015). Unfortunately, believethattheonlywaythata statistician is capableof providing a measureof manyscientists to convince estimateis through itsstandard error.It is oftenverydifficult precisionofa parameter evensomehighlytrainedappliedstatisticians thatparameter precisioncan be specifiedintermsof the(log-) likelihood.Also, standardstatistical packagesdo notsupplythelatterlikelihood-based andthesecan be difficult to extract. intervals, For non-normal models,standarderrors,withthe sizes of samplesusuallyrequired,can be itis oftenvery thelikelihoodfunction, extremely misleading;at thesametime,without inspecting 18 J.K. Lindsey approximations are difficult to detectthespecificcases,perhapsone timein 20, whenasymptotic errorprovidesa quadraticapproximation so faroffthatsucha problemhas occurred.The standard to theprecisionobtaineddirectlyfromthelog-likelihood function fora givenparameterization (Sprottand Kalbfleisch, 1969; Sprott,1973, 1975b).As is well known,it is notparameterization invariant. It is less widelyknownjust how muchtheprecisionintervaldependson theparamitcan be, especiallyiftheestimateis neartheedgeof eterization andhowpooran approximation theory. theparameter space,whichis another weaknessoffrequentist asymptotic whichparameters maynotbe Standard errorsarea usefultoolin modelselectionforindicating fordrawingfinalinference necessary.Outsidelinearnormalmodels,theyare not trustworthy conclusionsabouteitherselectionor parameter theyare demandedby precision.Nevertheless, manyreferees and editors,evenofreputable statistical journals.Iftheycometo replaceP-values, intervalsof precision they,in turn,will have to be replacedin a fewyearsby moreappropriate basedon thelikelihoodfunction. 10.1. Example standarderrorsprovidereasonableapproximations Because of thelargenumberof observations, a binarylogisticregression fortheAIDS data.Instead,considera (real) case of fitting involving thesoftware 1068 observations thathappensto be at hand.In a modelwithonlyfourparameters, errorgivento be 9.46; thatwouldapparently calculatesone ofthemtobe 5.97 withstandard yield of 0.40. However,whenthisparameter an asymptotic is removed, thedevianceincreases %2-value thisis a pathological by 3.06, a ratherdifferent asymptotic X2-value.Of course,fora frequentist, is on theboundary oftheparameter with examplebecausea parameter space,as someonefamiliar this logisticregression mightguessfromthesize of thecalculatedstandarderror.Nevertheless, is can catchtheunwaryandmuchmoresubtlecases arepossiblewhenever thelikelihoodfunction skewed. In contrast, changesin likelihoodare notmisleading(exceptwhenjudged asymptotically!), evenon theboundaryof theparameter abouttherelative space; theyare stillsimplystatements models.Thus,thevalueof3.06 is thepropervalue oftheobserveddataunderdifferent probability to use above(theother,0.40, notevenbeingcorrectly calculated).The normedprofilelikelihood is plottedin Fig. 2, showingthatan AIC-basedinterval(a =1 /e) excludes0. forthisparameter between HandandCrowder(1996),pages 101 and 113,providedotherexamplesofcontradictions from likelihood ratios and from standard errors. inferences 11. Sample size from of designinga studyand drawinginferences Let us nowconsidertheopposingphilosophies is premisedon theidea thatthelargerthesample it.The theoryofalmostall statistical procedures we shall knowthe correctmodel exactly.Appliedstatisthebetter;withenoughobservations, incorrect. Theywantthe smallest ticians,whenplanninga study,knowthatthisis completely to answerthequestion(s)underconsamplepossiblethatwill supplytherequiredinformation sideration at theleastcosts.The onlythingthata largesampledoes is to allow thedetectionof whilewastingresources. effects thatareofno practicalimportance, Samplesize calculationsare criticalbeforeundertaking anymajorstudy.However,thesesame who carefullycalculatea sample size in theirprotocoloftenseem to forgetthis statisticians important point,that samples shouldbe as small as possible,when theyproceedto apply to thedatathattheyhavecollected. inference asymptotic procedures controlled thecomplexity Exceptperhapsin certaintightly physicalandchemicalexperiments, StatisticalHeresies 19 0 0 / 0 E o - ------------------------ z o (DI C-J -2 0 4 2 6 p Fig. 2. Normedprofilelikelihood( interval of a = 1/e (-)---- ) ) forthe logisticregressionparameterwiththe standardAIC likelihood of themodelrequiredadequatelyto describea data generating mechanismincreaseswiththe samplesize. Considerfirstthecase whereonlya responsevariableis observed.For samplesof reasonablesize,one ormoresimpleparametric probability distributions can usuallybe foundto fit well.But,as thesamplesize grows,morecomplicatedfunctions will be requiredto accountfor the more complexformbeing revealed.The marginalization hypothesisproves increasingly inadequate.Thus,the secondpossibilityis to turnto subgroupanalysisthrough regression-type models.The moreobservations, themoreexplanatory variablescan be introduced, and themore At each step,themarginalization mustbe estimated. thefoundation ofthe parameters hypothesis, model,is modified. Thus, this growingcomplexitywill generallymake substantively uninteresting differences detectablewhile oftenhidingthe verypatternsthata studysets out to discover statistically (althoughit may,at thesame timeas makingtheoriginalquestionsimpossibleto answer,raise new questions).Social scientists withcontingency tablescontaining important largefrequencies are well awareof thisproblemwhentheyask statisticians how to adjustthe%2-values, and the resulting P-values,to givesensibleanswers.The solution,whenthesamplesize,forsomeunconin a propermodel trollablereason,is too large,is to use a smallera, the smoothing constant, selectionprocedure, as describedabove. The detectionof practically usefulparameter differences is directly relatedto the smoothing whichfixesthelikelihoodratioandthetwotogether determine thesamplesize. Withthe constant, difference of interest parameter specifiedby the scientific questionunderstudy,the smoothing constant andsamplesize shouldbe fixedintheprotocolto havevaluesinmutualagreement. Fora effectof interest, givensize of thesubstantive largersampleswill requiremoresmoothing, i.e. a This is the directlikesmallerlikelihoodratiocomparingmodelswithand withoutdifference. lihoodequivalent offrequentist powercalculations(Lindsey,1997b). inference Asymptotic procedures, assumingthatthe samplesize goes to oo, are misplaced. Theironlyrolecan be in supplying to thelikelihoodfunction, certainnumericalapproximations or itsfactorization, whenthesecan be shownto be reasonableforsufficiently smallsamplesizes. to conform moreclosely Asymptotic frequentist tests,iftheyare to be used,shouldbe corrected to the(log-) likelihoodfunction, whichshouldneverbe corrected to conform to a test.Bayesians 20 J.K. Lindsey in thelikelihooddominating cannottakecomfort theirpriorasymptotically (unlesstheyhavethe truemodelfunction). Unnecessarily largesamplesare bad. The demandsby appliedstatisticians foraccuratesamplesize calculationsand forexact small sampleinference proceduresconfirm this. 11.1. Example ConsidernowtheAIDS datafortheUSA forwhichthereare 132 170 cases,comparedwithonly 6215 forEnglandand Wales. Withso manyobservations, the mostminutevariationswill be detectedbyusingstandard inference The samethreemodels,as estimated fromthese procedures. data,are plottedin Fig. 1(b) (Hay and Wolak (1994) used the same nonparametric stationary modelas De Angelisand Gilks (1994), shownby thefullcurve).Again,theparametric model appearsto be mostsuitablefortheprediction problemat hand.The nonparametric non-stationary model,themostcomplexof thethree,fluctuates wildlyneartheend of theobservation period. in deviancebetweenthisandthestationary However, thedifference modelis 2670 with47 degrees in favourofthisunstablemodel.The parametric offreedom, modelis clearlyrejecting stationarity evenmoreclearlyrejected:5081 with88 degreesoffreedom. Withthepowerprovidedby thismanyobservations, a greatdetailcan be detected, requiring to the questionat hand. Hence, I would have to penalize the complexmodel,but irrelevant likelihoodverystrongly to obtaina simplemodel.Fortheparametric modelto be chosenoverthe othertwo,thedeviancewouldhaveto be penalizedbyadding60 timesthenumberofparameters tothedeviances,corresponding to a = 10-13. 12. Consistency The mosteffective fromusinga modelor methodis to wayto discouragean appliedstatistician inconsistent Thisis completely irrelevant for estimates. saythatitgivesasymptotically parameter a fixedsmallsample;theinterval ofplausiblevalues,nota pointestimate, is essential.Fora point themorepertinent Fisherconsistency is applicableto finitepopulations.If thesample estimate, so thequestionofa paramwerelarger,in a properly themodelcouldbe different, plannedstudy, to a 'true'valuedoesnotarise. eterestimatein somefixedmodelconverging asymptotically even a mean,has littleor no valid interpretation outsidethe specific However,a parameter, modelfromwhichitarises.Forcountsofevents,themeanofa Poissondistribution has a basically fromthe interpretation of the mean of its generalization to a negative different interpretation binomialdistribution; a givenregressioncoefficient changesmeaningdependingon the other leads variablesin themodel(unlesstheyare orthogonal). explanatory Ignoringthesedifferences in to contradictions andparadoxes.The choiceofa newcarbymeanautomobilefuelconsumption milespergalloncan be contradicted bythatmeasuredin litresper 100 kilometres (Hand,1994) if thedistributions towhichthemeansreferarenotspecified. Thus,ifthemodelis heldfixedas thesamplesize grows,it will notadequatelydescribethe of interest will generally observations. But,if it is allowedto becomemorecomplex,parameters be estimated will changeinterpretation. Information inconsistently, and,muchmoreimportantly, cannotbe accumulated abouta parameter. asymptotically The essentialquestion,in a properly themodeldescribeswell the designedstudy,is whether observeddatain a waythatis appropriate to thequestionbeingasked.Forexample,a fixedeffects modelcanbe moreinformative thanthecorresponding randomeffects theformer model,although suchas the estimates. The fixedeffects modelallowscheckingof assumptions yieldsinconsistent formof thedistribution of effectsand thepresenceof interaction betweenindividualsand treat- Heresies Statistical 21 ments.The relativefitof thetwoto thegivendata can be comparedby appropriate directlikelihoodmodelselectioncriteria. Ironically, virtually all thecommonly used estimates forparameters in modelswithcontinuous responsevariablesare inconsistent foranyreal databecausetheyare based on theapproximate likelihoodof equation(2). Theseincludealmostall thecommon,in factapproximate, maximum likelihoodestimates. The existenceof sucha lack of consistency has been knownfor100 years. Sheppard's(1898) correction providestheorderoftheinconsistency forthevarianceof a normal Fisherwas carefulaboutthis,it has been conveniently in recent distribution. Although forgotten work. asymptotic Frequentists havealso beenmuchconcernedwithstatistically unbiasedestimates(as opposed to designbiases thatare usuallyfarmoreimportant). The criticisms of thisemphasisare well known.For example,the unbiasedestimateof the varianceis no longerunbiasedwhentransformed to a standard thevalue mostoftenrequired.Interestingly, unbiasedness deviation, adjusts the maximumlikelihoodestimateof the normalvarianceso thatit becomes larger,whereas forconsistency makesit smaller.Fora fixedmodel,thebiasednessofmost Sheppard'scorrection estimates theinconsistency due to theapproxidisappearswithincreasedsamplesize; in contrast, mationof equation(2) increasesin relativeimportance, becauseoftheincreasedprecisionof the estimate, as do designbiases. 12.1. Example In contrast withmorestandardcontingency tableanalyses,herethesize of thetablegrowswith thenumberofobservations, as timegoes by.It willalso increaseifthetimeinterval is takento be smallerthanquarters. is notfixed, Hence,forthenonparametric models,thenumberofparameters raisingquestionsofconsistency. ForthedatafromtheUSA, thelargesamplesize wouldallowone to modelmuchmoredetail if thatweredesired.For example,seasonalfluctuations mightbe studied.However,by standard inference criteria, onlythesaturated modelfitswell,althoughitcannotprovidepredictions forthe andhencecannotanswerthequestionsof interest. recentperiodwhenthereis themissingtriangle 13. Nonparametric models is dominated Much of modernappliedstatistics by thefearthatthemodelassumptions maybe of thefactthatall suchassumptions wrong,ignoring necessarilyare wrong.This is a symptom the lack of adequatemodel comparisonand model checkingtechniques,forlookingat model This trendis fuelledby the fact,discussedabove, that appropriateness, in currentstatistics. values without testsand confidenceintervalscannotbe used to examineparameter frequentist thestochastic ofthemodelwhereasBayesianprocedures simultaneously questioning assumptions tobe correct. mustassumetheoverallmodelframework This fear has led to the widespreaduse of nonparametric and semiparametric models, especiallyin medical applicationssuch as survivalanalysis.These, supposedly,make fewer assumptions, meaningfewerthatare wrong.However,thisis an illusionbecausetheassumption thatwill formof a modelis completely unknownis just that:an assumption thattheparametric oftenitselfactuallybe wrong(Fisher(1966), pages 48-49, and Sprott(1978)). We studya suchas theCauchy in meansnonparametrically. Do we knowthata stabledistribution, difference is notinvolved? distribution, formofthe A nonparametric modelis generally understood to be a modelwherethefunctional has an infinite numberof parameters; fora stochasticpartis not specifiedso it theoretically 22 J.K. Lindsey frequentist, thisprovideschallenginginference problems.Paradoxically, a model in whichthe formof the systematic functional regression partis not completelyspecified,so it potentially containsan infinite numberofparameters, is notcallednonparametric; thefrequentist can handle it routinely (by conditioning) because it does not directlyinfluenceprobabilistic conclusions. Thus,a classicalanalysisofvariancewitha factorvariablerepresenting a continuous explanatory buta Cox (1972) proportional variableis parametric hazardsmodel,witha similarrepresentation forthebase-linehazard,is semiparametric. Fromthepointofviewofthelikelihoodfunction, that thereis no difference onlytakesintoaccounttheparameters actuallyestimated, betweenthetwo. One majoradvantageof inferences based on parametric modelsis thattheyallowall assumpIn contrast, tionsto be checked,theavailabledatapermitting, andto be improved. nonparametric modelsfollowthedatamoreclosely,leavinglittleorno information formodelcheckingthatcould indicateimprovements. However,the two can be complementary: a nonparametric model can serveas a basis forjudgingthegoodnessoffitofparametric models.Ifnoneofthelatteris found one can fallbackon usingtheformer untilfurther appropriate, knowledge has accumulated. It is commonly believedthatno meansexistfordirectly comparing parametric and nonparametricmodels,evenforthesamedata.At best,diagnostics can be compared.However,fromany properly specifiedstatistical model,theprobability of theobserveddata can be calculated;the most'nonparametric' model simplyhas equal probability masses forall observations. In other models can be directlycompared words,the likelihoodsof parametricand nonparametric (Lindsey,1974a,b). This,however, leavesasidetheproblemofthevastlydifferent numbers ofparameters estimated in modelsof the two types.If the likelihoodfunction is calibratedby means of a probability anditsnumber in sucha context, distribution ofdegreesoffreedom (andthefrequentist problems, resolved),nonparametric modelswillalmostinvariably win.Suchcomparisons usinga probability calibrationgreatlyfavourcomplexmodels forthe reasonsdiscussedabove. In contrast, with properly compatiblemodelselectionmethodsbased on al, parametric modelswill generally be and oftensuperior, competitive dependingon theamountof scientific knowledgethatis available to developthemodel,as well as on thesize of a chosen,i.e. thedegreeof smoothing required.A is greater model be in the its likelihood will judgedsuperior parametric if, simplestcase, penalized the n distinct observathana"-'I/nn; thisis likelihoodfor penalizednonparametric independent with at of information that models tions a pointmass each.Because theincreased goodparametric oftheirpossiblelackoffitcomparedwitha nonparametric an indication provide,including model, thecomparison is alwaysworthmaking. 13.1. Example the development of my example,I have been directlycomparingparametric and Throughout model models.We haveseenhowtestingled to rejectionofthesimplerparametric nonparametric withthesame delayj for forbothdata sets.In thesaturated model,all nij observations reported thesame quarteri haveprobability mass l /nij; combining theseyieldsitsmaximizedlikelihood. modelforthesedata.ForEnglandandWales, Thus,thismodelis themostgeneralnonparametric to as manyparameter thereare 465 entriesin thetablecorresponding estimatesin thesaturated modelthathas zerodeviance.Thusthepenalizeddevianceis 1395,whichis muchpoorerthanfor theothermodels,as seenfromthevaluesgivenabove. 14. Robustness A second answerto the fearof inappropriate model assumptionhas been to look forrobust StatisticalHeresies 23 and inference procedures. Much of thissearchhas occurredin thecontextof pointestimation, thenormaldistribution decisionproblems, thatarenotthesubjecthere.Thus,in manysituations, (Box and Tiao and the t-testare widelyconsideredto be robustundermoderatenon-normality (1973),p. 152). is thatparameterestimatesare not being undulyinfluenced by One symbolof robustness are hauntedby thepossibility of outliers.They extremeobservations. Some appliedstatisticians to be valid,insteadof modifying believethattheymustmodifyor eliminatethemforinferences Outliersare themodel(iftheobservations arenotin error)or acceptingthemas rareoccurrences. oftenthemostinformative partof a data set,whetherby tellingus thatthe data werepoorly mask procedures shouldnotautomatically collectedorthatourmodelsare inadequate.Inference oradjustforsuchpossibilities butshouldhighlight them. andotherdiagnosticresults,are definedonlyin termsof somemodel.Unfortunately, Outliers, in thesamewayas fora testrejectinga nullhypothesis, theyneverindicateone uniquewayof theproblem.Possiblescientific models correcting waysto proceedincludelookingat competing or havingsome moreglobal theoretical modelwithinwhichthe model underconsideration is in whichcases directlikelihoodmethodsare applicable.However,ad hoc patchingof embedded, thecurrent modelis probably morecommon. Manystatisticians, especiallyfrequentists forreasonsdevelopedearlier,professa preference forresultsthatare insensitive thatare notdirectly of interest. In contrast, I to anyassumptions claimthatbotha goodmodelanda good inference shouldbe as sensitiveas possibleto procedure that theassumptions beingmade.Thus,a robustmodelwill be a modelthatcontainsparameters allowsit to adaptto, and to explain,thesituation(Sprott,1982). This will thenprovideus with givingindications of meansbywhichthe waysto detectinadequaciesin ourapproach,hopefully can be further improved. However, model,andourunderstanding ofthephenomenon understudy, ifone'sattitude is simplyto answertheresearchquestionat hand,thenone mayarguein favourof to (apparently) incorrectness. modelsthatareinsensitive irrelevant 14.1. Example formoftheparametric modeldevelopedforEnglandandWalesalso describesthe The functional valuesare very datafromtheUSA reasonably well,as seen fromFig. 1, althoughtheparameter in thereporting-delay process.A similar,although different, especiallybecauseofthedifferences modelcan be developedspecifically forthesedata whichfitsverymuch functionally different, ofthe better. thereis littlevisibledifference whenit is graphed, showingtherobustness However, inrecently modelsforthequestionathand,at leastforpredictions pastquarters. parametric 15. Discussion theprincipal These comments havebeen made in thecontextof (scientific) modeldevelopment, ofmostappliedstatisticians. butnotless important, activity Manyarenotapplicablein thatrarer, However,in spiteof itsname,presentconfirmaprocessofmodeltesting, or in decision-making. thata null hypothesis can be rejected.This toryinferenceis essentiallynegative:confirming clinicaltrials,requiring tensof thousands perhapsexplainswhyconfirmatory phaseIII industrial of subjectswhiletestingone endpointas imposedbyregulatory agencies,do notyieldscientific information thatis proportional to theirsize and cost. Owingto theinadequacyof exploratory in phasesI and II forspecifying to be tested,if propergoodness-of-fit inference thehypothesis in phaseIII, it couldbe very criteria wereeverappliedto themodelforthealternative hypothesis embarrassing. 24 J.K. Lindsey in Certainof thepracticesthatI have criticized, such as theuse of the normaldistribution constructing modelsand in developingasymptotic inference criteria, originally arosethrough the practicalneedto overcomethelimitations of thetechnology of thetimes.Withtheadvancement oftechnology, especiallythatofcomputing power,suchrestrictions areno longernecessary. Otherpracticeshave arisendirectly through the development of mathematical out statistics, of touchwiththe realitiesof scientific researchand appliedstatistics. Based on well-reasoned arguments and deep mathematical proofs,theyare themoredifficult to fight.Reinforcing this, manypeople firstencounterstatistics throughintroductory coursespresenting littleotherthan theseinference principles, taughtby mathematically trainedinstructors and containing fewhints Aftera valiantstruggle aboutusefulmodelsor realapplications. to masterthecontradictions, they as theonlyonespossible,andthatstatistics cometo accepttheseprinciples is a difficult subjectof littlepracticaluse,a necessaryevil. Nevertheless,applied statisticsseems oftento be leading mathematicalstatisticsin the forexample,how,increasingly of inference development procedures. Consider, widely,non-nested theAIC is used deviancesare directly comparedin theappliedstatistics literature or,preferably, in place of tests,in spiteof its known'poor' asymptotic formodelcomparisons, for properties selectingthe'true'model. We practisestatistics without everquestioning, orevenseeing,thesebasiccontradictions. Thus, forexample,we tellstudents andclients areacceptable, (a) thatall modelsarewrong,butthatonlyconsistent estimates (b) thatwe can calculatetheminimum necessarysamplesize, butthatwe mustthendraw inferences as ifitshouldideallyhavebeeninfinite, mustbe formulated beforelookingat thedata,butthattestingis appro(c) thathypotheses priateas a stepwiseselectionprocedure, statements abouttheirlimits,but we interpret (d) thatconfidenceintervalsare probability themas probabilities abouttheparameters, and thenwe use Jeffreys orreference mustobeythelikelihoodprinciple, (e) thatinferences priors. inferencestatements, true models,the Unnecessaryunderlying principles-probability-based model likelihoodas probability accumulationof information, density,asymptotic consistency, effects. nesting, minimizing modelassumptions-havewide-ranging and oftenunnoticed harmful The roleofthestatistician is to questionandto criticize;letus do so withourownfoundations. Acknowledgements of at theSymposium on theFoundations This is a revisedversionof an invitedpaperpresented in HonourofDavid Sprott(University ofWaterloo, October3rd-4th,1996). Statistical Inference MarkBecker,Denis de Crombrugghe, DavidFirth, Dan Heitjan,David Hinkley, Murray Aitkin, Colin James,PhilippeLambert,TomLouis,PeterMcCullagh,JohnNelder,FranzPalm,Stephen Senn and JohnWhitehead,as well as fivereferees,includingDavid Draperand David Hand, criticisms andsuggestions. providedvaluablecomments, References theoryand an extensionof themaximumlikelihoodprinciple.In Proc. 2nd Int.Symp. Akaike,H. (1973) Information Inference Theory(eds B. N. PetrovandF. Csaki),pp.267-28 1. Budapest:AkademiaiKiado. andtimeseries.J R. Statist.Soc. A, 125, C. B. (1962) Likelihoodinference Barnard,G. A., Jenkins, G. M. and Winsten, 321-352. StatisticalHeresies 25 Berger,J.0. and Sellke,T. (1987) Testinga pointnullhypothesis: theirreconcilability of P valuesand evidence.J Am. Statist.Ass.,82, 112-139. Berger,J.0. andWolpert, R. L. (1988) TheLikelihoodPrinciple:a Review,Generalizations, and StatisticalImplications. Hayward:Institute ofMathematical Statistics. New York:Wiley. Bernardo, J.M. andSmith,A. F M. (1994) BayesianTheory. ofstatistical J Am.Statist. Bimbaum,A. (1962) On thefoundations inference Ass.,57, 269-306. J.F (1996) On thegeneralization ofthelikelihoodfunction Bj0rnstad, andthelikelihoodprinciple. J Am.Statist.Ass.,91, 791-806. J Am.Statist.Ass.,71, 791-799. Box, G. E. P. (1976) Scienceandstatistics. of scientific in thestrategy modelbuilding.In Robustness in Statistics (1979) Robustness (eds R. L. Launerand G. N. Wilkinson), pp. 201-236. New York:AcademicPress. in scientific (1980) SamplingandBayes' inference modellingandrobustness (withdiscussion).J R. Statist.Soc. A, 143,383-430. Box,G. E. P.andTiao,G. C. (1973) BayesianInference inStatistical Analysis.New York:Wiley. modelsandlife-tables Cox,D. R. (1972) Regression (withdiscussion).J R. Statist.Soc. B, 34, 187-220. De Angelis,D. and Gilks,W R. (1994) Estimating acquiredimmunedeficiencysyndromeincidenceaccountingfor reporting delay.J R. Statist.Soc. A, 157,31-40. Fisher,R. A. (1922) On themathematical foundations oftheoretical statistics. Phil. Trans.R. Soc. Lond.A, 222,309-368. 2ndedn.Edinburgh: OliverandBoyd. (1959) Statistical Methodsand Scientific Inference, (1966) DesignofExperiments. Edinburgh: OliverandBoyd. Gardner, M. J.andAltman,D. G. (1989) Statistics withConfidence. London:BritishMedicalJournal. Gelman,A., Meng,X. L. and Stern,H. (1996) Posterior predictive assessment ofmodelfitness via realizeddiscrepancies. Statist.Sin.,6, 733-807. statistical Hand,D. J.(1994) Deconstructing questions(withdiscussion).J R. Statist.Soc. A, 157,317-356. Data Analysis.London:ChapmanandHall. Hand,D. J.andCrowder, M. (1996) PracticalLongitudinal the unconditional cumulativeincidencecurveand its Hay,J.W and Wolak,F A. (1994) A procedureforestimating forthehumanimmunodeficiency virus.Appl.Statist., variability 43, 599-624. Heitjan,D. F (1989) Inference fromgroupedcontinuous data:a review.Statist.Sci.,4, 164-183. Hill,B. M. (1963) The three-parameter lognormaldistribution and Bayesiananalysisof a point-source epidemic.J Am. Statist. Ass.,58, 72-84. and A(,,)or Bayesiannonparametric inference. (1988) De Finetti's theorem, induction, predictive Bayes.Statist., 3, 211-241. J. D. and Sprott,D. A. (1970) Applicationof likelihoodmethodsto modelsinvolvinglargenumbersof Kalbfleisch, parameters (withdiscussion).J R. Statist.Soc. B, 32, 175-208. J Am.Statist. A. E. (1995) Bayesfactors. Kass,R. E. andRaftery, Ass.,90, 773-795. Kass, R. E. and Wasserman,L. (1996) The selectionof priordistributions by formalrules.J Am. Statist.Ass., 91, 1343-1370. D. V (1957) A statistical Lindley, paradox.Biometrika, 44, 187-192. ofprobability J R. Statist.Soc. B, 36, 38-47. distributions. Lindsey, J.K. (1974a) Comparison ofstatistical models.J R. Statist.Soc. B, 36, 418-425. andcomparison (1974b) Construction and CountData. Oxford:OxfordUniversity Press. (1995a) ModellingFrequency models.Appl.Statist., (1995b)Fitting parametric counting processesbyusinglog-linear 44, 201-212. Statistical Oxford:OxfordUniversity Press. (1996a) Parametric Inference. bivariateintensity withan application to modellingdelaysin reporting (1996b) Fitting functions, acquiredimmune J R. Statist.Soc. A, 159, 125-131. deficiency syndrome. J Statist.PlanngInf:,59, 167-177. (1997a) Stoppingrulesandthelikelihoodfunction. forexponential (1997b) Exactsamplesize calculations familymodels.Statistician, 46, 231-237. Lindsey,J.K. and Jones,B. (1998) Choosingamonggeneralizedlinearmodelsappliedto medicaldata.Statist.Med., 17, 59-68. of marginalmodelsforrepeatedmeasurements in clinical Lindsey,J.K. and Lambert,P. (1998) On theappropriateness trials.Statist.Med.,17,447-469. andcomparing withlog linearmodels.Comput.Statist. Lindsey,J.K. andMersch,G. (1992) Fitting probability distribution Data Anal.,13,373-384. creed.Appl.Statist., Nester,M. R. (1996) An appliedstatistician's 45, 401-410. ofa model.Ann.Statist., thedimension Schwarz,G. (1978) Estimating 6, 461-464. W F (1898) On thecalculationofthemostprobablevaluesoffrequency constants fordataarranged to Sheppard, according divisionsofa scale. Proc.Lond.Math.Soc., 29, 353-380. equidistant D. J.,Freedman,L. S. and Parmar,M. K. B. (1994) Bayesianapproachesto randomizedtrials(with Spiegelhalter, discussion).J R. Statist.Soc. A, 157,357-416. andtheirrelationto largesampletheory D. A. (1973) Normallikelihoods ofestimation. Sprott, Biometrika, 60,457-465. (1975a) Marginalandconditional sufficiency. Biometrika, 62, 599-605. ofmaximum likelihoodmethodsforfinite (1975b) Application samples.SankhyaB, 37, 259-270. to normality. Can. J (1978) Robustnessand non-parametric proceduresare nottheonlyor the safe alternatives 26 J. K. Lindsey Psychol.,32, 180-185. (1980) Maximumlikelihoodin smallsamples:estimation in thepresenceof nuisanceparameters. Biometrika, 67, 515-523. andmaximum (1982) Robustness likelihoodestimation. Communs Statist.Theory Meth.,11,2513-2529. Sprott, D. A. andKalbfleisch, J.D. (1969) Examplesof likelihoodsandcomparison withpointestimatesand largesample J Am.Statist. approximations. Ass.,64, 468-484. D. A. andKalbfleisch, Sprott, J.G. (1965) Theuse ofthelikelihoodfunction in inference. Psychol.Bull.,64, 15-22. Thompson, E. A. (1994) MonteCarlolikelihoodin geneticmapping.Statist.Sci.,9, 355-366. L. A. (1989) A robustBayesianinterpretation Wasserman, oflikelihoodregions.Ann.Statist.,17, 1387-1393. Discussion on the paper by Lindsey DavidJ.Hand (TheOpenUniversity, MiltonKeynes) I wouldlike to ask forsome clarification of how you see therelativerolesof modeland questionin data it seems to me thatthe emphasisis oftenon themodel,whereasit shouldbe on analysis.In particular, thequestion.You argue,on p. 23, thatmodelsand inference proceduresshouldbe as sensitiveas possible to the assumptions, while also sayingon thatpage thatone may argue in favourof models whichare insensitive to irrelevant incorrectness. Please can youclarifyyourpositionon this? The trickof modelbuildingis to choose a modelwhichdoes notoverfit thedata,in thesense thatthe models do not fitthe idiosyncrasiesdue to chance events,as well as fitting the underlying structure. Recentyears have witnessedthe developmentof manymethodsforthis. Statisticiansoftensolve the problemby choosing froma restrictedfamilyof models linear or generalizedlinear models, for measure. The neural networkpeople, buildingon the example or use a penalized goodness-of-fit extremeflexibility of theirmodels,have developedstrategiesforsmoothingoverfitted models,as have people workingon ruleinductionand treemethodsin machinelearning.Fromthefieldof computational learningtheorywe have methodswhich choose a model whichhas minimumvariancefroma set of overfitted models,and so on. Many of theseapproachesare ratherad hoc. One of thethingsthatI like aboutthispaper is thatit gives a principledcriterionto choose betweenmodels. You suggestthatthe criterion of compatibility is attractive, and I agree.Based on this,you suggestcalibratingthelikelihood functionin termsof aP. By choosinga smallera one increasesone's confidencethattheincludedeffects are real. However,you also suggestchoosinga smallera so thatonlythoseeffectswhichare practically relevantare includedin themodel.It seemsto me thatthisis mixingup twouses of thea-parameter:its use as a gauge to determine how confident we wantto be thatan effectis realbeforewe agreeto include it in themodel,and itsuse to forcethemodelto be simplerforpracticalreasons.The firstof theseis an whereasthe second forcesit away fromthistruth.It is notclear attemptto reflecttheunderlying truth, thatthisis thebestwayto achievethesecondobjective.Perhapsa betterwaywouldbe to buildthebest modelthatyou can and thento simplifyit relativeto theresearchquestion.This also allows one to take thesubstantive of different effectsintoaccount. importance On p. 20 you say 'a parameter, even a mean,has littleor no valid interpretation outsidethe specific model fromwhichit arises'. And you illustrate thiswithmyfuelconsumption example,sayingthatit is to whichthemean refersare specified.But thisis notthecase. The aspects resolvedif thedistributions of the model which matterhere are the empiricalrelationshipswhich the numberswere chosen to of fuel consumptionin represent.Takingaccountof the Jacobianin movingbetweenthe distribution miles per gallon and itsreciprocaldoes notsolve theproblemof whichof thetwo competingempirical systemswe wishto drawa conclusionabout. You say thatthemodelyoupresentin Section8.1 'is certainlyin no waythe"true"model',butsurely it is, in fact,fundamentally misspecifiedand cannotpossiblybe legitimate.In themodel,u is thedelay time.As suchit is a ratioscale and we mayexpressit in different units.If we do thiswe see thatan extra term,v5t is introducedinto the model. If I fita correctedmodel includinga termin t, I obtain a penalizeddevianceof 803.8, greaterthanthe devianceof 800.8 forthe incorrectmodel thatyou give, whichis presumablywhyyou rejectedthe former.However,if I use (fractionsof) yearsforthe delay time(approximated by dividingthe giventimesby 4), I obtaina penalizeddevianceforyourmodel of worsethanthatof mycorrectedmodel,which,of course,remainsat 803.8. 825.9, whichis substantially oftheunitsthatyou happened The misspecification of yourmodelmeansthatthedevianceis a function to use, which is absurd.In view of the elegantcompatibleformalismthatyou have introducedfor choice of model formhas led to 'incompatibility' comparingmodels,it is ironicthatan unfortunate acrossa choiceofunits! Discussion on thePaper by Lindsey 27 Of course,this is a detail which is easily corrected,and it does not detractfromwhat is a very interesting andvaluablepaper.It givesme genuinepleasureto proposethevoteofthanks. David Draper (University ofBath) Thereis muchto admireand agree within thispaper; in particularthe authordeservesour thanksfor tacklingpracticalissues in what mightbe termedthe theoryof applied statistics.I especially like Lindsey'sremarkson theirrelevenceof and (a) asymptotic consistency is notknownwithcertainty (b) thelikelihoodprinciplewhenmodel structure (i.e., in real problems, always). thatI will focus However,thereis also muchwithwhichto disagree,and it is on thesedisagreements here,ordering mycommentsfrommildto severecriticismoftheauthor'spositions. (a) In Section2 Lindseysays, 'Many will arguethatthese[pointsof view on theexistenceof a "true" model] are "as if" situations:conditionalstatements, notto be takenliterally'. In otherwords,he is sayingthatthisis mathematics, by whichI mean deductivereasoning:if I make assumptionA = {a truemodel exists}, thenconclusionC follows.But recalling(e.g. de Finetti(1974, 1975)) thatall statisticalinferenceproblemsmayusefullybe cast predictively, withtheingredients (i) (ii) (iii) (iv) alreadyobserveddata y, as yetunobserveddata y*, contextC abouthow thedataweregatheredand assumptionsA abouthow y and y* are related it seems to me thatthe best thatwe can ever do (no matterwhat our religiousviews on Bayes may be) is conditionalconclusions,e.g. of the formp(y* ly, C, A): assumption-free is notpossible.Thusto me statisticalinferenceis inherently inference deductive, just liketherest of mathematics; we cannotescape our assumptions.(Even Fisher,theorginal'likelihoodlum' in thispaper-is vague on thispoint,e.g. Fisher littleattention whoseviews receivesurprisingly (1955).) Withhis advocatedmethodsthe authorcertainlydrawsconclusionsbased on assumptions;so in whatsense is he reasoninginductively? undercutsmany of his criticisms.For example, (b) Lindsey'sfailureto address decision-making modelselectionis bestregardedas a decisionproblem:as I and othershave notedelsewhere(e.g. Bemardo and Smith(1994) and Draper (1998a)), to choose a model you have to say to what purposethe model will be put,forhow else will you knowwhetheryourmodel is sufficiently based on context)he would good? If the authortookthisseriously(specifyingutilityfunctions discoverthathe can dispensewiththead hoc natureofthechoice ofhis quantitya. (c) Lindseyclaimsin Section8 that of averagingby integrating out the 'The Bayesiansolution[to the problemof marginalization] "nuisance" parametersis notacceptable:a non-orthogonal likelihoodfunctionis indicatingthat theplausiblevalues of theparameterof interestchange(and thushave different interpretation) dependingon thevalues ofthenuisanceparameters' (emphasisadded). This is arrantnonsense.Forinstance,in thelocationmodel yi = ,u+ a ei IID ei -vtv,, it is truethata and v are typicallycorrelatedin the posteriordistribution (e.g. witha diffuse prior),but the meaningof v as an index of tail weightdoes not change fromone O-value to and theweightedaverage another, p(vly) Jp(vIa, y)p(a y) da of conditionaldistributions as a summaryof model uncerp(vla, y) is perfectlyinterpretable 28 Discussion on thePaper by Lindsey taintlyabouttheneed forthet-distribution insteadofthesimplerGaussiandistribution. (d) The author'sBayesianis a strawman who overemphasizescoherence,relieson highestposterior densityregions,has neverheardof cross-validation and is well out of date on methodology and outlook.As a practisingappliedBayesianI do notrecognizeLindsey'scaricaturein myselfor in (thevastmajorityof) mycolleagues,and I commendresourcessuchas Gatsoniset al. (1997) and Bernardoet al. (1998) to theauthor'sattention ifhe wishesto forma moreaccurateview.Several intofocus. quotesfromthepaperwill bringthisdisparity 'For me, the quintessential... Bayesian dependson methodsbeing coherentafterapplyingthe Bayes formulato personalpriorselicitedbeforeobservingthedata.' Whata narrow,old-fashioned view of Bayes thisis! Personalpriorsand coherenceneed notbe all; out-of-sample predictivecalibration(see below) can help to tune the choice of priorand modelstructure. ... regionsof highestposterior 'For continuousparameters, densityare notpreservedundernonlineartransformations whereasthosebased on tail areas are.' The fullposterioris invariant;and nobodyforceshighestposteriordensitysummarieson you is anotherdecisionproblem,forwhichtail areas may be an adequate (posteriorsummarization workingapproximate solution). 'The argumentin favourof subjectiveBayesianprobabilitiesreliescruciallyon theprincipleof coherence,[whichunfortunately] has onlybeen shownto applyto finitely additivesetsofmodels.' Coherenceis overratedas a justificationforBayes (in the sense thatit is necessarybut not in good appliedwork),and theauthoroveremphasizes sufficient it.The bestthatit can offeris an assuranceof internalconsistencyin probability judgments;to be usefulscientific(and decisionmaking)co-workers, we mustaspireto more as well: to externalconsistency(calibration).For me theargument fortheBayesianparadigmis thatit works:itallows theacknowledgement of an widerset of uncertainties, appropriately e.g. about model structure as well as about unknowns and itmakesthecrucialtopicofpredictioncentralto themodelling. conditionalon structure; 'For a priorprobability distribution to exist,thespace ... of all possiblemodelsto be considered, mustbe completelydefined.Anymodelsthatare notincluded,or thathave zero priorprobability, will always have zero posteriorprobabilityunderthe applicationof the Bayes formula.No ... [Thusunder] can giveunforeseen modelspositiveposteriorprobability. empiricalinformation formalBayesianprocedures,scientific discoveryis impossible.' The firstpartof thisis true,and well knownto Bayesiansas Cromwells rule (Lindley,1972). withBayesian nonparametric But, firstly, methodsbased on Markovchain Monte Carlo computations(e.g. Walkeret al. (1999) and Draper et al. (1998)) we can now put positiveprior probabilityon the space of all (scientifically defensible)models in interesting problemsof real calibration(e.g. Gelfandet al. complexity, and, secondly,we can use out-of-sample-predictive withoutusing (1992) and Draper(1998b)) to learnabout appropriatepriorson model structure take the stingout of Cromwell'srule in the data twice. Both of these approacheseffectively appliedproblems:we are freeto findout thatour initialset of model structural possibilitieswas notsufficiently richand to expanditwithoutcheating. To summarize,the authorhas clearlyshruggedoffthe shacklesof frequentist inferencesuccessfully of his belief in thevalue of the likelihoodfunctionhas (if ever he was so shackled),and the strength broughthim to the very doorstepof Bayes, so withthe same sortof evangelicalspiritwithwhich Lindseyhereproselytizesus about likelihood I urgehim to walk boldlythroughthe Bayesian door: withmoderncomputingand predictivevalidationto keep himcalibrated,I thinkthathe will findhimself rightat home. The voteofthankswas passed by acclamation. Chris Chatfield(University ofBath) This paperis about 'heresies'.The OxfordEnglishDictionarydefinesa heresyas: 'an opinioncontrary to orthodoxor accepted doctrine'.But what is accepted doctrinein statistics?It is surelynot the Discussion on thePaper by Lindsey 29 viewpointof a hardlineBayesianor an ardentfrequentist. Ratherwe need to look to theviews of the 'typicalapplied statistician'who has to solve real problems.While agreeingthatdifferences in philosophyexistin thenarrowfieldof inference, theviews of moststatisticians are muchmorein harmony thanmightbe thought fromtheliterature, withcommongroundon thetenetsof good statisticalpractice, suchas clarifying objectivesclearlyand collectinggood data. The papersaysmuchaboutwhatBayesiansand frequentists mightdo, butsuch labels can be divisive will use whicheverinferential and I preferthe label 'statistician'.Most applied statisticians approach to solve a particularproblem.Perhapsthe real hereticsare statisticians seems appropriate who cannot to applyone mode of inferencein all contexts'. agreethat'we shouldnotattempt The biggestfictionof statisticalinferenceis thatthereis a truemodelknowna priori.Surelyit is no longera heresyto say this?Thereis a growingliterature (e.g. Chatfield(1995) and Draper(1995)) on modeluncertainty topics,such as model mixingand sensitivity analysis.Withno 'true' model, most argumentsabout the foundationsof inferencecease to have much meaning and many inferential 'heresies'evaporate.Forexamplemodelcomparisonis clearlymoreimportant thanmodeltestingifone It mayhelp to expandinferenceexplicitlyto acceptsthatone is lookingforan adequateapproximation. include modelformulation, but even thenit would only includepartof what a statisticiandoes. The statistician's job is to solveproblems,and not(just) to estimateparameters. How should the paper change what I do? I would have liked more examples as the example on acquiredimmunedeficiencysyndromewas not helpfulto me. In Fig. 1, I thinkthatI can make better forecasts'by eye' thanthosegivenby anyofthemodels. Overallthepapermakesmanygood pointsbutis less controversial thanitthinksit is and I do nottell mystudentsor clientsany of thepoints(a)-(e) listedin Section 15. I do agree thatstatisticsteaching, and thestatistical shouldmoretruthfully reflectwhatstatisticians literature, actuallydo. MurrayAitkin(University ofNewcastle) I am so muchin agreement withJim'smainpointsthatI shallsimplylista fewpointsof disagreement. The likelihoodprinciple I findthe likelihoodprinciplepersuasivein its narrowform.The different confidencecoveragesof the referto referencesets of samples likelihoodintervalforthepositiveand negativebinomialexperiments The likelihoodintervalrefersto the experifromdifferent hypothetical replicationsof the experiment. mentactuallyperformed. Real replications,as in Jim'spredictions, requirethe explicitmodel forthe new data.Thereis no conflictwiththelikelihoodprinciplein this. Models The 'truemodel' is an oxymoron.Model specificationsare generallydeficientbecause theyomitthe neglectedpart.The usual normalregressionmodel iN(0, ga2), Y, = a +3xi +ci, failsto distinguish betweennormalpopulationvariationabout the model mean f,u forfixedx and the ofthemodelmeanfromtheactualpopulationmean.The representation thatwe need is departure Yilxi N(Ji, a 2), a + /xi + i, 0 thisoverdispersed modelcan be fitted where/i is theneglectedpart.Giving0 an arbitrary distribution, maximumlikelihoodas a finitemixture, as in Aitkin(1996). bynonparametric -i Model comparisons I have suggestedin Aitkin How shouldmodel comparisonsbe made, in a pure likelihoodframework? of the likelihoodratio.In the (1997) a generalizationof Dempster's(1974, 1997) posteriordistribution simplestsingle-parameter case, Ho specifies0 00,and HI is general.The likelihoodratio LR L(0o)/L(0) has a posteriordistribution derivedfromthatfor0. We can therefore computetheposteriorprobability thatLR < 0.1 or anyothervalue whichwouldsuggeststrongevidenceagainstHo. Stone(1997) givesan exampleof Bernoullitrialswithn = 527135 and r = 106298, withHo: p = 0.2. The observedproportion p = 0.20165 is 3 standarderrorsfromHo, withP-value 0.0027. For a uniformprioron p, the Bayes factorforHo to HI is 8 and thefractional Bayes factoris 4.88, whereastheposteriorBayes factor is 0.0156, in agreementwiththe P-value. The posteriordensityof p forthe same uniformprioris N(0.20165, 0.000552), so Ho is faroutsidethe 99% highestposteriordensityintervalforp, contra- 30 Discussion on thePaper by Lindsey theposteriorprobability dictingtheBayes factorconclusion.Equivalently, thatLR < 0.1 is 0.964. (The posteriorprobability thatLR < 1 is 1 - P = 0.9973.) This approachis focusedon likelihoodratiosas measuresof strength of evidence,and it expresses conclusionsin posteriorprobability termswhichrequireno morethanconventionaldiffuseor reference priors.At the same time P-values can be expressedexplicitlyas posteriorprobabilities,providinga of Bayes,likelihoodand repeatedsamplinginferential unification statements fornestedmodelcomparisons. D. R. Cox (University of Oxford) I was a littlebaffledby thispaper.Thereare certainlyimportant and constructive pointsto agree with and warningsto be heededbutI foundit ratherdisconcerting thatProfessorLindseyattributes views to In particular, of conventional othersso firmly. his descriptions statisticalthinking are notrecognizableto me,as a veryconventional statistician. For example,as a keen BayesianI was surprisedto read thatpriorsmustbe chosenbeforethe data have been seen. Nothingin the formalism demandsthis.Priordoes notreferto time,butto a situation, hypothetical whenwe have data,wherewe assess whatour evidencewould have been ifwe had had no data.This assessmentmayrationally be affectedbyhavingseen thedata,althoughthereare considerable dangersin this,rathersimilarto thosein frequentist theory. As an enthusiasticfrequentist, again some statements seem to me not partof a sensiblefrequential formulation. On a moretechnicalpoint,it is nottruethata frequentist approachis unableto deal with non-nestedhypotheses;thereis an extensiveliterature on thisovernearly40 years.Althoughthereare difficulties withthis,thereis no problemin principle. ProfessorLindsey'slikingfora directuse of the likelihoodis veryunderstandable, but the difficulties of nuisanceparametersare not reallyaddressed.We need some formof calibrationto avoid the of theNeyman-Scotttypeas well as misbehaviourof the profilelikelihoodin othermuch difficulties simplerproblems. Some of thedifficulties whichProfessorLindsey,and manyothers,have withfrequentist theoryas it is sometimespresentedare based on a failureto distinguishbetweenthe operationalhypothetical proceduresthatdefinetermslike significancelevel and confidenceinterval,and advice on how to use them.The narrower thegap betweenthesetwothebetterbutthereis a clear difference. Thus,defining a significancelevel via a Neyman-Pearsonapproachrequiresa decision-likeattitudewhich Neyman himselfdid notuse whenhe analyseddata. Two finalcommentsare thatall carefulaccountsof asymptotic theorystressthatthelimitingnotions involvedare mathematical devicesto generateapproximations. These deviceshave no physicalmeaning and discussionsthatsupposethatthereis sucha meaningare largelyirrelevant. thefirstsentenceof thepaperis in conflictwiththeempirical Finally,and perhapsmostimportantly, observationthatstatisticalmethodsare now used in manyfieldson a scale vastlygreaterthaneven 20 yearsago. Not all theseapplicationsare good and surelythereare manyareas wherestatisticalmethods are underusedor badly used and we should tryto addressthese. Althoughthereare no groundsfor thegeneralnoteofpessimismthatstartsthepaperseemsto me misplaced. complacency, J. A. Nelder (ImperialCollege ofScience,Technology and Medicine,London) I welcome thispaper withveryfew reservations, the title'The likelihood but I would have preferred school strikesback'. Many statisticians believe thatinferencemustbe eitherfrequentist or Bayesian, whichis nonsense.I hope thatthesestatisticians will come to realize thatenormoussimplifications can followfromLindsey'sapproach.For example,theadoptionof likelihood(support)intervalsfortheodds ratioin 2 x 2 tableswouldlead to ourdiscardinga large,messyand confusingliterature. I wish thatthe authorhad calibratedthe log-likelihood, ratherthanthe likelihood,function.I findit easier to deal withlog(a) =-1, ratherthana = 1/e. Furthermore, plots withlog-likelihood(or deviance) as theordinateare muchmoreinformative visuallythanthoseusinglikelihood,wheretheway in whichtheordinategoes to zero is hardto see. A topic missingfromthis paper is quasi-likelihood(QL). It is oftenintroducedby stressingthat, because it is specifiedby the mean and variancefunctiononly,those are the only assumptionsbeing ifwe normalizeit we findthat made: notso. Froma QL we can construct an unnormalizeddistribution; the normalizingconstantis eitherexactlyconstantor varies only slowlywiththe mean. Further, the at least up to level 4, have a patternclose to thatwhichan cumulantsof the normalizeddistribution, This is theassumptionthatwe exponentialfamilywouldhave ifone existedwiththatvariancestructure. Discussion on thePaper by Lindsey 31 makewhenusingQL. In theformof extendedquasi-likelihood(EQL) (Nelderand Pregibon,1987) it is identicalwiththe unnormalizedformof Efron'sdouble-exponential families(Efron,1986). Any EQL criterionand similarstatistics.In summary,QL is a gives a deviance and hence Akaike information valuableextensionto likelihoodinference. I am interested in the speaker'sreactionto the idea of the h-likelihood(Lee and Nelder,1996) as a to modelswithmultiplerandom way of generalizingtheFisherlikelihood,and so likelihoodinference, thatis necessaryfor effects.Maximizationof the h-likelihoodavoids the multidimensional integration theuse of marginallikelihood,and thearbitrary specification of priorsforthebottomlevel parameters. Markov chain Monte Carlo samplingis replaced by a generalizationof iterativeleast squares,with computingtime being reduced by ordersof magnitude.I hope thatthe speaker will contributeto to thiswiderclass ofmodels. extendinglikelihoodinference S. M. Rizvi (University of Westminster, London) I am delightedby thishighlyprovocativeand controversial paper,as it conformsto some of my own hereticalviews developed over the past 30 years of practisingand teachingstatistics.I had long suspectedthatSirRolandFisherwas a clandestineBayesian,as some timeago I derivedhis multivariate function fromtheBayesianstandpoint. discriminant I also believethattheauthor'ssecondprinciple(p. 2) is self-contradictory. It is necessaryto appreciate thatepistemology deals withquestionsof perception,questionsof inferenceand questionsabouttruth. Thus modelswhichbest conformwithdata best accordingto one's light are a simplification of the in that.Further, theauthor'sinsistence reality, and hencethatof ontologicaltruth.I see no contradiction in whichmodels are cast is not supported thatinferenceshouldbe invariantto the parameterizations by experience,as the authorhimselflaterindicates.Once you accept a model,yourinferencetakes a deductiveform.Anyinferenceis subservient to theparameterization, and notinvariant to it. I rejecthis secondprinciple. Thereis no suchthingas a truemodel or a truetheory.Thereare onlymodelsand theoriesthatwork or do notwork.Once we accepta modelbased on givendata,thereis no escape fromtheconclusions,at leastwithrespectto thesampleconcerned.We need to acceptthatmodelsare tentative and provisional. This is whollyconsistentwith scientificinference.That judgmentsof science are provisionalis not appreciatedevenby someNobel prize-winners. I questiontheintroduction of an 'intuitiveor pseudo-intuitive' as theauthorappearsto do argument, whenhe castigatestrue'models' as counter-intuitive. We need to appreciatethatintuitionis an aid to butit is nota groundfora discursiveargument. Russellwas to putit moreeloquentlywhen perception, with'animalinference'. he equatedintuition P-values are fine,but theydo not provideus witha clear dividingline foracceptingor rejectinga A significance testdoes. hypothesis. I take issue withthe author'sinsistencethatinferenceshouldnot be backed by probabilities.That assertionsshouldbe based on probabilities has respectableadvocates.It was thephilosopherJohnLocke who was the firstto assertthatthe degreeof assentwe give to a propositionshouldbe based on its probability. Finally,confidenceintervalsare fineif you are dealingwitha two-sidedalternative hypothesis;they witha one-sidedalternative. are misleadingwhenconfronted AndrewEhrenberg(SouthBank University, London) in the I am sure thatwe all laud ProfessorLindsey's effortto face up to the many contradictions theoretical of statisticalinference. treatment These contradictions are,as he says,one reasonwhystatisticsis so oftenviewedas bothdifficult and unnecessary. But even if statisticians raisesit would do face up to and resolvetheproblemswhichLindseyrightly notimprove'the low publicesteemforstatistics'.Statisticswould stillbe seen as an 'unnecessaryevil', because studentsand clientshardlyever actuallyuse the tools of statisticalinferencewhichtheyare Nor is thereanysoundreasonwhytheyshoulduse them. beingoffered. As I see it,statisticalor probabilisticinferenceis at best a relativelyminortechnicalissue. It arises whenone has just a singlerathersmall sample,whichhas been properlyselectedfroma well-defined population.Thatis rare. and clientsusuallyface Instead,appliedstatisticians (a) eitherquitea largedata set,or evena verylargeone, 32 Discussion on thePaper by Lindsey butbroadlycomparablepopulations,and hencesampleswhichare at (b) or data frommanydifferent leastcumulatively large,or verylarge, (c) or a small data set whichis not an explicitsample fromany definedlargerpopulationanyway (e.g. all theacquiredimmunedeficiencysyndrome(AIDS) cases in St Thomas'sHospitalin the autumnof 1992). But theseare not Such datahaveproblemsof analysis,interpretation and,broadlyspeaking,inference. statisticsin generalor in ProfessorLindsey'spaperin particular. addressedeitherbytheoretical To finishwitha specificpoint:ProfessorLindseysays thatin his experiencemorethan95% of the inferential workwhich is taken to statisticiansfor advice is exploratory, i.e. concernedwithmodel development. But whatappliedstatisticians and clientsmostlydo on theirown is to use existingmodels(formalor informal), ratherthanalwaysto developnew ones. For example,anyanalysisof AIDS data shouldnow use thepriorknowledge(or 'verbalmodel') thatthereare delaysin thereporting of cases of AIDS, and also in therecordingofthem. fromfirstdevelopingit. Usinga modelis verydifferent G. A. Barnard (University ofEssex, Colchester) Lindsey'spaper is verymuchto be welcomedforits emphasison the factthatstatistics,like mathematicalphysicsand fluidmechanics,is a branchof applied mathematics in whichresultsmaybe very valuabledespitebeingonlyapproximate. And his stresson therelevanceof new computingpossibilities to thelogic of ourfieldis mosttimely. forinference My onlyseriousqueryarises fromhis remark(p. 2) thatFisherwas nevera frequentist purposes.True, he made clear in his 1925 classic thatinferenceswere to be expressedin termsof likelihood,notin termsof probability. And he was muchcloserto Jeffreys thanto theNeyman-Pearson school. But likelihoodratioshave a long runfrequencyinterpretation, in termsof odds, thoughnot in termsofprobabilities. If E has been observed,thelikelihoodratioforH versusH' is L(E) = Pr(E IH) /Pr(EIH'). If we choose H onlywhen L(E) exceeds W,and H' onlywhen 1/L(E) exceeds W,makingno choice thenwhenevereitherH or H' is truethelong runfrequency ratioof rightchoices to wrong otherwise, choiceswill exceed W Limitationson computerpowermeantthatFishercould not implement thisview in his lifetime.But now deskcomputers withspreadsheetfacilitiescan exhibitvariouslikelihoodfunctions fullyconditioned on thegivendata on variousassumedmodels.We onlyneed to worryabout such differences whenthe spreadsheetsdifferseriouslyin relationto the questionsof interest.Such cases are much rarerthan unconditionalapproachesmightlead us to think.The Cushny-Peeblesdata quotedby Studentin 1908 and by Fisherin 1925 and moreaccuratelyby Senn (1993) illustrate thispoint.Cushnyand Peebleswere interestedin the evidence for or against chiralitywith the drugs used. Withoutchiralitythe data about0. Withchirality it wouldbe symmetrical aboutsome non-zero distribution wouldbe symmetrical value. Assumingexact normality and no chiralitythe odds on t fallingshortof ratherthanexceeding 4.03 are exactly698 to 1. But if we replace the false assumptionof normalitywiththe equally false assumptionof a Cauchydensitywe findthatfor thedata to hand the Cauchy spreadsheetdifferslittle fromthe normalspreadsheet.And littlechange resultsfromtakingpropernote of the factthatthe patientswould nothave been allowedto sleep formorethan24 hours.To relateour spreadsheetsto the butwe can see from questionat issue we mayneed to makeassumptionsabouttheirrelevant parameters, thespreadsheethow much,or how little,theseassumptions matter. StephenSenn (University College London) This stimulating and Bayesian papermakesmanyexcellentpointsnotleast of whichis thatfrequentist modelsbutthatthisrequirement is commonlyignored.However,I approachesbothrequireprespecified am notconvincedthattheauthor'simplicitdistinction betweenparameterestimationand model choice is fundamental. For example,he objectsto 'incorporating otherthan priorknowledgeand information, thatrequiredin theconstruction ofthemodel',butin practiceI thinkthattherewill be manycases where will be unsurewhatthisLindseyianprescription the applied statistician permitsand forbids.Take the AB/BA crossoverdesign (Grieve and Senn, 1998). We can considermodels withand withoutcarryover.For thischoice it seems thatthe authorwould allow priorknowledgeto play some role,but the lattermodelis merelya special case of theformer withA,thecarry-over effect,equal to 0. However,the Discussion on thePaper by Lindsey 33 modelwithcarry-over effectsincludedis uselessunlesscoupledwithan informative priorthatA is likely to be smallwhich,however,it seemshis prescription forbidsus to assume. Also the problem(Section 8) of what is called consonance in the technicalliterature on multiple testing(Bauer, 1991) is notuniqueto statisticsand thereis nottheslightestdifficulty in explainingit to non-statisticians. Two squabblingchildrenare separatedby theirmother.She knowsthatat least one of themis guiltyof aggressivebehaviourbutshe will oftenbe incapableof establishingdefinitely thatany givenchildis. Is thisincompatible withhumanlife? Finally,a small matter:contraryto Section 15, it is the exceptionratherthanthe normfordrug let alone individualclinicaltrials,to involvetensof thousandsof subjects.A development programmes, typicalprogramme will involve5000 or fewersubjectsand individualtrialsare usuallymeasuredin the hundreds.The author'spointaboutlostopportunities in measurement is, none-the-less, well taken. D. V. Lindley(Minehead) My commentsare mostlyon statements made in thepaperaboutBayesianideas. (a) 'Inferencestatements mustbe made in termsof probability'because of arguments by de Finetti and others.These take self-evident principlesand fromthemprovethatuncertainty satisfiesthe probabilitycalculus. It is wrongto say thatthe Bayesian methodis 'internallycontradictory' whenitis based on coherence theavoidanceof contradiction. is nottaken'formathematical convenience'.Thereare enoughexamplesto (b) Countableadditivity showthatsome consequencesof finiteadditivity are unacceptable.In anycase, conglomerability is self-evident formost. (c) Likelihood is not, alone, adequate for inferencebecause it violates one of the self-evident principlesin (a). It can happenthat,withsets of exclusiveevents{Ai}, {Bi}, Ai is morelikely thanBi forall i, yettheunionoftheAi is less likelythanthatof theBi. (d) Likelihoodis not additive,so thereis no generalway of eliminatingnuisanceparameters.13 varietiesof likelihoodtestify to thestruggleto overcomethishandicap. The joint distri(e) The old examplein Section 8 is not incompatiblein the Bayesian framework. butionof a, /3has marginsfora and /3separately.Is thesuggestionthatthelatterare incompatible withtheformer? (f) Probabilityis invariant.What may not be invariantare summarystatistics,like the mean or notof thedistribution. highestposteriordensityintervals.But thisis a criticismof thesummary, Similarcommentsapplyto thecomparisonofthenegativeand positivebinomials. is alwaysconditionalon whatis known,or assumed,whentheuncertainty statement (g) Probability is made.It is nottherefore truethat'all possibleeventsmustbe defined'. (h) All of us are continuallymakinginformalpriorjudgmentsafterseeing the data. Justbefore writingthis,I readof some data on lobbyists.A reactionwas to thinkof myprior(to thedata) on and to recallthe'cash-for-questions' affairin theHouse of Commons. lobbyists, it is not Like probability, (i) A model is yourdescriptionof a small worldin termsof probability. betweenyou and thatworld. objectivelytrue,butexpressesa relationship (j) The statement abouttheparadoxis wrong.The modelwithpointpriordoes notdominateif the alternative is true. said thata modelshouldbe as big as an elephant.Untilrecently (k) Manyyearsago Savage correctly we have not been able to act on the advice forcomputational reasons.It now looks as though but therestill remainsthe Markovchain Monte Carlo samplingwill overcomethe difficulty, distributions. unsolvedproblemoftheassessmentand understanding of multivariate Nick Longford(De Montfort University, Leicester) JimLindseyshouldbe congratulated on a thoughtful critiqueof manyconventionsdeeplyingrainedin statisticalpractice.Many of themdo have a rationalbasis, butonlyin a specificcontext.Our failureis thatwe applythemuniversally. I have specificcommentson Section11. I do not see the conflictbetween'the largerthe sample the better'and the needs of an applied All otherdesignfactorsbeing equal, largersamplesizes are associatedwithmoreinformastatistician. tion.The purposeof the sample size calculationsis to save resourcesand to ensureprecisionthatis at least as highas a presetstandard.When a model of certaincomplexity(smoothness)is requiredit is a faultof themodel selectionroutinethatlargerdata sets tendto yieldmorecomplexmodels.No model selectionprocedureis verygood if it functions well onlyfor'good' sample sizes. A moreconstructive approachthanthatimpliedby Lindsey'sdiscussionis to remove,or smoothout,fromthemodel selected 34 Discussion on thePaper by Lindsey by a formalstatisticalprocedureany effects(differences) thatare substantively unimportant. Then we can address,ifneedbe, theissue ofprecisiongreaterthantheminimumacceptable. Vaguelyspeaking,by the sample size calculationswe aim at matchingstatisticalsignificancewith substantive importance.The product,the sample size n, maybe way offthemarkbecause the formula used involvesresidualvarianceand otherparameters, some of whichare onlyguessed,oftenwithonly modestprecision.Also, in multipurpose studies,or studiesthatwill by analysedby a wide constituency of secondaryusers,thesamplesizes will be 'too large' forsome inferences and too smallforothers.It is withthe latter,but no problemwiththe former.If everything unfortunate else fails,simplyanalyse a simplerandomsubsampleofthecases. T. A. Louis (University ofMinnesota,Minneapolis) Lindseymakes manyexcellentand manyprovocativepoints.Most of my commentsderivefromthe view thatthelikelihoodis keyto developinginferences and thatno model is correct.ThoughI agreeon thelikelihood'scentralrole,effective designsand analysesalso dependon inferential goals and criteria (see Shenand Louis (1998) on thepoorperformance ofhistograms and ranksbased on posteriormeans). Lindsey supportsgoal- and criteria-driven proceduresin his considerationof Bayesian highest content posteriordensity(HPD) regions.An HPD regionhas thesmallestvolumefora givenprobability butis nottransformation equivariant.Posteriortail area intervalsfora scalar parameterare equivariant, and Lindseypromotestheiruse. However,transform equivarianceis onlyone criterionand I shall use for a skewed posterior,for a multimodal the HPD region when minimizingvolume is important, posterioror one withouta regularmode and fora multivariate parameter. Lindseyis correctthatno model is correct.Unfortunately, he concludesthatmodel complexitymust increaseas the sample size increasesand so asymptoticanalysesbased on an assumedtruemodel are irrelevant. Believingthatsome parametersretaintheirmeaningwithincreasingsample size, I strongly A largesample size allows us to disagreethatasymptotic propertiessuch as consistencyare irrelevant. increasecomplexity butshouldnot requireus to. Also, asymptotics can revealimportant propertiesand justifymodemmethodsoffinitesampleinference suchas thebootstrap. In choosingbetweenmodels, aP likelihoodcalibrationeffectively structures the trade-off between complexityand degreesof freedom.Selectinga is vitaland moreguidanceis needed (whatis thebasis in choosing fora = 0.22 in Section 8.1?). However,goals and priorknowledgecan be instrumental betweencandidatemodels (see Fosterand George (1994)). And, selectionproceduressuch as crossvalidationhave the advantagesof dealingdirectlywithpredictivegoals and of being relativelyfreeof assumptions. continuousprobability models.Continuousdistributions Lindseygoes overboardin criticizing provide to (discrete)realityand have theirquirks,but theycan be effectivein studying onlyan approximation propertiesand communicating insights.Furthermore, replacingtheGaussianby thePoissondistribution will generateas manyproblemsas iteliminates. Lindsey criticizesthe use of standarderrors(SEs). His criticismis old but worthrepeating.SEs summariesof thelikelihood(or shouldbe replacedby a set of likelihoodregionsand otherinformative or posteriordistribution). samplingdistribution statistical Finally,are the'heresies'in thetitlecurrent practicesor theauthor'sviews? Peter McCullagh (University ofChicago) In statistics, as in fashion,a model is an idealizationof reality.JoePublic knowsthisall too well, and, feelobligedto engagepublicallyin incoherent he does notordinarily althoughhe maygrumblesilently, does not live up to the image portrayedin glamour Quixotic ramblingswhen his wife or girlfriend magazinesand romanticnovels. Scientistsknowthisalso. To a large degree,science is the searchfor Mendel'slaws are good science,butit does not a searchformodelsas an aid forunderstanding. pattern, followthattheyare universallyobeyedin nature.Justas Mendel's laws do notcoverall of genetics,so too neithertheBayesiannorthe frequentist approachcoverseveryfacetof statisticalpractice.I do not see this as a bad thingforgeneticsor statistics;it is evidence of the richnessand diversityof both subjects. forvalid criticism.But sweepingunsubstanCurrentstatisticalpracticepresentsample opportunity tiatedclaims,undueemphasison thepicayuneand plain errorsof factdo not add up to a strongcase. in theBayesianapproach. I see no internalcontradictions Despitetheauthor'ssinceresermonizing issues is ordinarily welcome.Whilethispaper The opportunity to discussand to reassessfoundational is passionatelyprovocativeon minorpointssuch as likelihood(3), the discussionof major issues is Discussion on thePaper by Lindsey 35 irritating in tone and confusedon the facts.Despite this,the data analysisseems reasonablysensible,a triumphof commonsense overunsounddogma.In thediscussionof foundational matters, however,the partsof thepaperthatare trueare notnew,and partsthatare neware nottrue. R. Allan Reese (Hull University) Althoughagreeingthatthelack of sharedcredoand approachcontributes to thepoor publicperception of statisticsand statisticians, it is extremeto place theblameentirely on statisticians and teachers.Users of statisticalmethodsthemselvesapproachthesubjectwithfearand reluctance,treating it as somewhere betweenabstrusemathematics and witchcraft. An inabilityto relatequantitative data to realitymaystart well beforestudentsmeetstatistics(Greer,1997). Fromthatstarting point,and invariablyagainsttight to instilgood principlesof dataanalysis. deadlines,itis difficult Thereare shiningexamplesof good practiceand excellentsoftware, and I have taughtusingGenstat, GLIM, SPSS and Stata.However,I have knowna studentattenda whole courseon GLIM onlyto ask 'buthow do I do regressionlike in thebook?'. Mead's (1988) classic workcontainsan excellentchapter on 'Designingusefulexperiments', butthisis tuckedaway at theback because 'the standardformatof books on designis inevitable,and therefore correct'.I wishthathe had challengedthatview! Underlying thepaper appearsto be a philosophicalquestion.We make assumptionsand hence propose mathematicalmodels whichreflecta theorythatis externalto the data. We can testthe models and findthosewhichfitbest.However,we can nevertestor verifythe assumptions.Each real-lifedata set is unique.Even whenrepeatinga well-established protocolundercontrolledconditions,we need to allow forthepossibilityof outliers.Hence theargument is forevercircular we trustthemodelbecause it fitsthe data, and we believe the data because theyfollowthe model. In thatsense, 'all' statisticsis heuristicand all relieson inference. This dilemmais illustratedin Fig. 1(a). A Health Ministerseeing thatgraphwould ask, 'Is the incidenceof acquired immunedeficiencysyndromenow going up or down?'. Because of delays in reporting, we can 'only' answerthatquestionby relyingon a model. The model mustincludeassumptionsaboutthepatternof reporting. Choose yourassumptions, and you can give whicheveransweryou thinktheMinisterwouldprefer:lies, damnedlies,and .. .? The followingcontributions werereceivedin writingafterthemeeting. P. M. E. Altham(University ofCambridge) ProfessorLindseyhas givenus a discussionpaperwhichis scholarly, and thoughtprovoking. thoughtful I havejust a smallpointto make. The example about acquired immunedeficiencysyndromecorrespondsto a large and sparse (trithedevianceof 716.5, with413 degreesof freedom(DF) table.The sum forming angular)contingency quoted on p. 8, and the subsequentdeviancesforthe same data set includea highproportionof cells withfittedvalues around0 or 1. This must surelyhave a severe effecton the reliabilityof the x2A traditional approximation. remedyis to combinesome of the cells. For example,as a quick solution we combinethelasteightcolumnsofthedata setof De Angelisand Gilks(1994). Thenthedeviancefor model (1) is now 514.54 with254 DE Using thenegativebinomialfitting routinesuppliedin Venables and Ripley's(1997) library, I foundthatthecorresponding negativebinomialmodelhas deviance337.79 with254 DE Consideringthatthetotalsamplesize is 6160, thisdoes notseem such a severedeparture frommodel(1) and it allows some degreeof overdispersion relativeto thePoissonmodel. F. P. A. Coolen (University ofDurham) I shall mentiontwo approachesto statisticalinferenceand referto commentsby Lindsey.I strongly supportthe subjectiveapproachto inferencewheneverpracticalsituationsrequirethe use of knowthatis available in experimental or historicaldata. de Finetti(1974) used ledge otherthaninformation prevision(or 'expectation')insteadof probabilityas the main tool forinference(Lindsey,Section3), and indeed previsionfor observablerandomquantitiesprovidesa naturalconcept for dealing with This line of workhas been extendedsuccessfully uncertainty. by Goldstein;see theoverviewand further referencesin Goldstein(1998). One of the nice foundationalaspects of this work is that a Bayes but posterioris the expectationof whatone's beliefswill be afterobtainingthe additionalinformation, notnecessarilyequal to thosebeliefs(Goldstein,1997). Of course,scientific discovery(Lindsey,Section to a subjectivist.The factthatactual beliefsmightbe different froma Bayes posterior 3) is important is perfectlyacceptable.The subjectiveapproachallows the use of non-observablerandomquantities, or parameters,but well interpretable quantitiesare needed forassessment.Problemswith regardto 36 Discussion on thePaper by Lindsey inferenceon uninterpretable parameters(Lindsey,Section 8) are avoided if the actual problemsare expressedin termsofwell interpretable randomquantities; Notwithstanding my supportforthe subjectiveapproachto inference,I feel the need for a nonsubjectivemethodforinduction.I was delightedto see Lindsey'sreference to Hill (1988) butsad thatno further was paid to Hill's assumptionA(,,)and relatedinference. In manysituations(butnotfor attention examplein the case of timeseries),bothBayesiansand frequentists are happywitha finiteexchangeabilityassumptionon futurereal-valuedobservations, meaningthatall possibleorderingsare considered as equallylikely.Fornonparametric methodsthisassumptionof equallylikelyorderings shouldstillhold aftersome data have been observed.This post-dataassumptionis Hill's A(,,),whichis not sufficient to in manysituationsbutdoes providebounds(Coolen, 1998) which derivepreciseprobabilistic inferences are impreciseprobabilities(Walley,1991). I suggestthatA(,) providesan attractive, purelypredictive, thatcan be used as a 'base-linemethod'(Lindsey,Section13). Moreover, methodforstatistical inference morerealworlddata than A(,) can deal withdata describedby Zipf's law (Hill, 1988), whichrepresents any otherknownlaw, includingthe Gaussian law, and I doubtwhetherthisis trueforPoisson process models(Lindsey,Section6). The mainchallengeforA(,,)-basedstatisticalinferenceis generalization to multivariate data (Hill, 1993). JamesS. Hodges (University ofMinnesota,Minneapolis) ProfessorLindsey'spaperis mostwelcomeand greatfun.Hear,hear: 'Foundationsof statistics'is dead to practice;theneeds of applied statisticians because it is irrelevant shoulddrivethenew foundational on manyspecifics,e.g. debate.(Hodges (1996) takesa similartack.)Lindsey'spaperwas also delightful Bayes factors. None-the-less,two of ProfessorLindsey'skey premisesare unsatisfying. The firstis that 'model development[is] the principalactivityof most applied statisticians'.In 15 years of highlyvaried statisticalpractice,myprincipalactivityhas been helpingcolleagues to answersubstantivequestions. Contra Lindsey ('Scientistsusually come ... with a vague model formulation to be filled out by empiricaldata collection,not witha specifichypothesisto test'), my colleagues almostalwayshave a specificpurpose; thoughit is not always formulatedas testingsome , = 0, ProfessorLindsey's descriptionis stillinaccurate.Moreover,selectinga model rarelymatterseven to me. Instead,themain issue is whetherthe data give the same answerto the substantivequestion in alternativeanalyses, thatis internalto each analysis.(Oddly,Fig. 1 shows no uncertainty accountingforthe uncertainty bounds.) A minorityof such alternativeanalysesare motivatedby statisticalobsessionslike outliers. of variablesand other More often,theyinvolveexcludinglarge subsetsof the data,variantdefinitions in modelform. in likelihoodfunctions and notworthenshrining problemsthatare awkwardto represent The answerto thesubstantive questionusuallydoes notchangeacrossthealternative analysesbut,when it does, in myexperienceno meremodelselectioncriterion will convincea scepticalcolleague or editor on a particularanalysis. is compoundedby ProfessorLindsey'spreferred This difficulty criterion, forscepticsmustbuy the is criterion magicnumbera. But wheredoes a come from?SometimestheAkaikeor Bayes information acceptable,butotherwisewe mustpick a. In Section8.1, a = 0.22-not 0.21, not0.23 is presentedas 'reasonable'. Why should we preferProfessorLindsey's view of reasonableto anyone else's? The rationalein Section 11.1 is better:we need a = I0-'3 to choose the parametricmodel overthe others. This movesin a potentially usefuldirection-whatdoes it taketo changetheanswer? butis stillquite removedfromthequestionsforwhichsomeoneobtainsmoneyto collectdatato bringto a statistician. D. A. Sprott(University of Waterloo) This is an interesting, and well-written thoughtful paper.I have onlya fewminorpointsof disagreement or emphasis. It wouldbe nice to believethatthecriticismsof theemphasison unbiasedestimatesare well known. Butjournalsstillseem intenton publishingpaperscorrecting themaximumlikelihoodestimateforbias and estimatingthe varianceof the resultingestimateseven thoughthe resultingestimatingintervals containhighlyimplausible,ifnotimpossible,values oftheparameter. inference. But mymain commentis aboutthepositionrelegatedto confirmatory or non-exploratory is exploratory. But I thinkthattheadvanceof sciencealso rests Perhaps95% of theworkof statisticians on what happens afterthe exploratorystage the process of demonstrating repeatabilityand of the subsequentaccumulationof evidence.This is given shortshriftby the commentthatwhen this stage is reachedsophisticatedstatisticsis no longerneeded,the factsspeakingforthemselves.This is un- Discussion on thePaper by Lindsey 37 doubtedlythe finalresult,as discussedby Fisher(1953a) and by Barnard(1972). But much,possibly statistics sophisticated, precedesit.Fisher(1953b) was himselfan exampleof this.Perhapsthescientists who originatetheproblems,notstatisticians, will be thosewho are involvedin thisprocess,as suggested by MacGregor (1997). But this would merelyconfirmthe obvious low esteem advertedto in the aboutthislow esteemseemsless introductory sentenceofthepaper,althoughtheconcernof statisticians obvious. The authorrepliedlater,in writing, as follows. I am pleased to see thehighlevel of constructive criticismby mostdiscussants.This topiccan engender a strongemotions!Many questionthatI say anythingheretical.Yet, afterone earlierpresentation, theoreticalstatisticianasked me 'Do you actuallyteach yourstudentsthis stuff?'.One referee'sconclusion was 'This is a shallow,incomprehensible, unscholarlyofferingthatoughtnot to grace our is ofteneverydaypracticefortheappliedstatistician. Society'sjournals'.Heresyforthetheoretician In exploratory scientific as in muchof appliedstatistics, thegoal is to convinceyourpublic, inference, herethe scientificcommunity, thatthe empiricaldata supportyourscientifictheorybetterthanothers if a scientistapplies independent replicationto the statistical available.Replicabilityis key In contrast, he or she risksreceivingquitedifferent answers! analysisprocess,consultingseveralstatisticians, is thatwe oftenhave one favouritemodel, and inferenceproOne greatweaknessof statisticians to cedure,thatwe tendto applyeverywhere possible. In contrast,the likelihoodapproachis restricted scientific modelselection.It has nothingto offerin courtdecisions,industrial qualitycontrol exploratory or phase III clinicaltrialdrugacceptance. in mycaricaturesof Bayesians Severaldiscussantsobjectthattheydo notrecognizereal statisticians but these descriptionsreferto foundationprinciples,not to people. Should I have and frequentists, strictly followingBayesian replaced'Bayesiansor frequentists do' each timeby 'an applied statistician or frequentist momentdoes'? (Most theoreticians are less flexible.)If theproprinciplesat thisparticular ceduresof thesetwo schools are meantto describegood practice,whythendo certainargue('Neyman himselfdid notuse [it]whenhe analyseddata') thattheyare meantto be ignoredin application?Should we notexpectthe same highstandardsfromour professionas we do of scientistswho consultus? Will because ofthearbitrariness? notourpublicotherwisebe unconvinced, Virtuallyall the discussantsseem agreed thatno model is true.Few, however,address my more fundamentalargument,thatprobabilityis inappropriatefor exploratoryinference.Comparabilityof likelihoodcontourlevels withchangingparameterdimensionrequiressome functiong(p) forcalibrait is tion:L(O; y) xcg(p). Forlikelihoodinference, L(O; y) xcaP; (5) forthefrequentist approach,usingthedeviance, L(O; y) oc exp(-Xp/2) (6) whereasfortheBayesianapproachit is morecomplex, L(O; y) h{Pr(OIy)} f(y) Pr(O) (7) Althoughequations(5) and (6) superficially appearmostsimilar,theyare not.Ratherequations(6) and whereasequation(5) is a pointfunction. (7) bothrequireintegration, Several discussantsask formore detailsabout the choice of a. As witha confidenceor credibility level, it specifiesthe level of precisiondesired:how much less probablecan a model make the data, of comparedwithyourbestone,beforeyou concludethatit is implausible?Alongwiththespecification a scientifically useful effectin planninga study,this thendeterminesthe sample size. Justas 95% so has a = 1/e. intervalshavebeen foundreasonablein manypracticalsituations, I agreevirtuallycompletelywithmanyof the discussants(George Barnard,ChrisChatfield, Andrew Ehrenberg,David Hand, JamesHodges, Nick Longford,JohnNelder,Allan Reese, StephenSenn and David Sprott)and shallonlyreplyto some ofthen,verybriefly. The dangerof exploratory inferenceis that,ifenoughdifferent modelsare consideredfora givendata modelswill alwaysbe found.Thus, I agree withDavid Hand thatthe questionis set,some well fitting therestricted primary: groupof modelsto be considered,whetherbeforeor,ifnecessary,afterseeingthe data, must be sensitiveto the questionbut insensitiveto pointsthatare irrelevantto the question. 38 Discussion on thePaper by Lindsey However,I disagreethatforcinga model to be simple drivesit away fromthe truth:the art is to use to highlight, simplicity and to convinceyouraudienceof,thereal effects. I also maintainmysolutionto thefuelconsumption problem:thequestiondetermines whichunitsto use, but a unique model must apply for the two units or the answersto the two questionswill be fundamentally, notjust superficially, contradictory. As to theproblemwithunitsin myown model (John Nelder firstpointedthiserrorout to me), I can only plead guilty:the model mustbe hierarchicalcriticismbythescientific in operation. community Frequentist proceduresare notoriousforbeing ad hoc; David Draperbringsus up to date on recent in applied confirmatory patchesto Bayesiantheory.It will make its breakthrough statisticswhenphase III clinical trialprotocolsspecifypriordistributions. The beautyof the likelihoodapproachis thatit yieldssimpleclear interpretations, requiringno such arbitrariness: models arejudged by how probable theymaketheobserveddata. To convince your audience, probabilisticinferencestatementsrequireprior (to data collection) guarantees:a fixedmodel (hypothesis)and sample space forthe frequentist, a fixedset of modelswith theirprior probabilitiesfor the Bayesian (Senn, 1997). In this sense, they are deductive,making scientificdiscoverydifficultor impossible.Likelihood inferenceallows freedomto considermodels suggestedby the data,withthe same interpretable measuresof uncertainty, althoughtheywill be less convincingthanthosestatedbeforehand, hence requiringrepeatednesswithinthe scientific community. Scientificinferenceis nota (necessarilyindividual)decisionproblem. David Draperdownplaystheimportanceof coherence.But how can a posteriorBayesiancalculation have a probabilisticinterpretation withoutcoherence,exceptin theformalmathematicalsense thatany setofnon-negative numberssummingto 1 has? MurrayAitkinhas not yet convincedme thatposteriorBayes factorscan help, comparedwiththe of likelihoodratios,in communicating to studentsand clients. simpleinterpretation Nowheredo I claim thatfrequentists are unable to deal withnon-nestedhypotheses.As David Cox pointsout,methodsare available,although,as I stated,theyare notwidelyaccepted.Perhapsthereason is thatthe underlying models are unreasonablein most contexts.Most standardcourses(and all texts thatI have been able to consult) do not cover these methods.Studentsare only taughtthe nested approachand infer(iftheyare nottold!) thatitis theonlyone possible. The questionof nuisanceparametersarises afterselectinga model,whenwe make inferencesabout thisis notthetopicof thepaper.Nuisanceparameterscreateno problem of interest; specificparameters forlikelihood-basedexploratory inference.Profiles,as summariesof a complexlikelihoodsurface,are in the likelihoodapproach(farmorethanmarginalposteriors).The judgmentthat alwaysinterpretable theymisbehaveis asymptotic (e.g. consistency).Thus asymptotic theorydoes notjust generateapproximationsbut permeatescriteriaforchoosingprocedures.To be acceptable(publishable),a procedure must yield consistent,asymptotically normalestimatesnot on the parameterspace boundary,with calculable standarderrors,superiorasymptotic relativeefficiency, and so on, all irrelevant to likelihood inference, butneed notgivetheprobability ofthedata (estimating equations). I am more pessimisticthan David Cox about our image. Statisticsis more widely used through compulsion,notthe activedesireof manyactual users;wide use does not implyrespectability. (If you want to see the expressionof real disgust,withoutrevealingthatyou are a statisticiantalk to any laboratoryassistantor junior scientistwho once sufferedthroughintroductory statistics.)The main impulsionforthisexpansionappearsto be theneed forgood design,notinference. I entirely shouldbe calibrated.I do thatin theexample. agreewithJohnNelderthatthe log-likelihood However,in didactic communicationwith studentsand clients,speaking of the probabilityof the is oftenclearer. observeddataundervariousmodels,ratherthanitslogarithm, With modem computingpower, quasi-likelihoodscan easily be normalizednumericallyso this approachis now amenableto fulllikelihoodinference.I have nothad theoccasion to workwiththe hlikelihood,perhapsbecause I have not yet encounteredconsultingproblemsrequiringit. The large numberofparameters mustmakecompleteinferences (and notjust modelselection)complex. I agreewithAndrewEhrenbergthatmostexploratory workinvolvescomparingexistingmodels;I did notmeanto implythatthe95% referred to developingnewmodels. for inferencepurposes,as I definedit, is not the same as using a frequency Being a frequentist of probability in inference, interpretation somethingthatFisherobviouslydid. Indeed,I discussFisherian P-values GeorgeBarnardprovidesanotherexample and we shouldnotforgetfiducialprobability. But I can findno reference whereFisherrequiredlongrunguaranteesofhis procedures;he ratherargued thatsignificance oftheobserveddataundergivenhypotheses. testsprovidemeasuresofrarity Discussion on thePaper by Lindsey 39 If StephenSenn'smodelwithcarry-over is uselesswithoutan informative prior,thenwhatexactlycan the data tell us aboutit? Can we calculatetheprobability of thedata underthismodel and is theresult If not,and we believethata small carry-over effect distinctfromthatforthemodelwithoutcarry-over? is present,and important, perhapsthedesignshouldbe changedto obtaintherequiredinformation. Incompatibility meansthatthemotherwouldeitherdiscover,individually, thatneitherof thechildren or, collectively,thatat least one is guiltyof aggressivebehaviour.The point is thatthe criterionof whentheyarejudged separatelyand together. individualaggressiveness (precision)differs Dennis Lindley's 'self-evident Unfortunately, principles'are not evidentto me. He agrees thatthe alternative model mustbe trueto avoid theparadoxof pointpriorprobabilities.This is fundamental to of myrejectionofthisBayesianapproachto exploratory modelselection.Assessmentand understanding multivariate distributions of thedata are a sufficiently big problemforme withouttheadded complicationofprobabilities on theparameters. I would be interested to knowhow Nick Longfordproposesto remove,or smoothout,froma model This seems to be developinga new model,a further effectsthatare substantively step in unimportant. theexploratory process. quality Larger sample sizes are rarelybetter.Besides costs and time,problemsof homogeneity, and so on, increase.I strongly control,missingness, oppose analysinga randomsubsampleof thedata; if theyare available,use themall. I agreewithTom Louis thata largesampleperhapsshouldnotrequireus to increasecomplexitybut show me a statisticalprocedurethatdoes not have this characteristic forfixedprecisionlevel, unless thereis a truemodel. I do not criticizecontinuousprobabilitymodels,but ratherlikelihoodsbased on theirdensities.A theprobabilityof the observeddata,mustbe based on discreteprobabilitieseven likelihoodfunction, whenthemodelis continuous. I make no claim to newnessin the paper; indeedI statedthe oppositein mypresentation. It would havebeen morehelpfulifPeterMcCullaghhad pointedoutwhatwas nottrueratherthanmakingstrong unsubstantiated accusations.Nor have I claimed any internalcontradictions in the formalBayesian approachbased on coherentprobabilities.The problemis ratherwhenwe tryto implementit in a contextforwhich it was not designed:exploratoryscientificinference.As illustratedby David Draper's it is now in a stateof arbitrariness comments, rivallingfrequentist theory. As Pat Althampointsout,sparsecontingency tables are of concernforinferencesusing asymptotic such as the%2-distribution. approximations, However,I am arguingagainstthe latterin the contextof work.Such tablesdo notpose a problemforlikelihoodinference. exploratory Overdispersionmodels are meant to handle counts with dependentevents,usually on the same individualunit.Thatis notthecase herewherean assumptionof independently reportedacquiredimmune foroverdispersion. eventsshouldbe reasonable.Hence,I wouldopposecorrecting deficiency syndrome I certainlyagreewithJimHodges thatscientistscome witha specificpurpose(question),butto me is to formulateit in termsof models givingtheprobabilityof the data and to the role of a statistician select the most appropriateset. Nowheredo I argue forkeepingonlyone model. In likelihood-based model selection,theprocedureforselectingamongmodel functionsis identicalwiththatforchoosing parametervalues among parametervalues. Justas a likelihoodregioncontainsmodels withdifferent supportedbythedata,so you can have an envelopeofplausiblemodelfunctions. Dave Sprottis absolutelyrightabout the importanceof repeatabilityto scientificinference.My worthattempting a demonstration concernhereis withdiscoveringsomething ofrepeatability. Resortto emotionalaccusations,as withone Bayesian-inclined refereeand one frequentist-inclined and no reasonedreplyis available.In discussant,oftenmeansthatfundamental principlesare threatened currentstatistics, thefundamental divideis notbetweenfrequentists and Bayesiansbutbetweenapplied and journals. As one very statisticiansand the theoreticiansdominatingmany teachinginstitutions eminentstatistician told my son afterthemeeting,applied statisticsis a peculiarprofessionwhereyou haveto unlearnmuchofwhatyouweretaughtas a student. References in the discussion in generalized linearmodels.Statist.Comput., likelihoodanalysisofoverdispersion Aitkin,M. (1996) A generalmaximum 6, 251-262. of the (1997) The calibrationof P-values,posteriorBayes factorsand the AIC fromthe posteriordistribution 7, 253-272. likelihood(withdiscussion).Statist.Comput., 40 Discussion on thePaper by Lindsey Barnard, G. A. (1972) Theunityofstatistics. J R. Statist.Soc. A, 135, 1-15. in clinicaltrials.Statist.Med.,10, 871-890. Bauer,P. (1991) Multipletesting Bernardo, J.M., Berger,J.,Dawid,A. P. and Smith,A. F. M. (eds) (1998) BayesianStatistics6. Oxford:OxfordUniversity Press.To be published. J.M. andSmith,A. F. M. (1994) BayesianTheory. Wiley. Bernardo, Chichester: C. (1995) Model uncertainty, (withdiscussion).J R. Statist.Soc. A, 158, Chatfield, dataminingand statistical inference 419-466. inference forBayes' problem.Statist.Probab.Lett.,36, 349Coolen,F P. A. (1998) Low structure imprecisepredictive 357. incidenceaccountingfor De Angelis,D. and Gilks,W R. (1994) Estimating acquiredimmunedeficiencysyndrome reporting delay.J R. Statist.Soc. A, 157,31-40. A. P. (1974) The directuse of likelihoodforsignificance Dempster, testing.In Proc. Conf:FoundationalQuestionsin P. Blaesild and G. Sihon), pp. 335-352. Aarhus:University of StatisticalInference(eds 0. Barndorff-Nielsen, Aarhus. 7, 247-252. (1997) The directuse oflikelihoodforsignificance testing. Statist.Comput., andpropagation ofuncertainty Draper,D. (1995) Assessment (withdiscussion).J R. Statist.Soc. B, 57, 45-97. forbreastcancer"byG Parmigiani. In BayesianStatistics6 (1998a) Discussionof "Decisionmodelsin screening Press.To be published. (eds J.M. Bernardo, J.Berger,A. P. DawidandA. F. M. Smith).Oxford:OxfordUniversity calibration. TechnicalReport.Statistics Group, (1998b) 3CV: Bayesianmodelchoicevia out-of-sample predictive ofMathematical ofBath,Bath. Department Sciences,University Bayesiannon-parametric inference withskewed Draper,D., Cheal,R. and Sinclair,J.(1998) Fixingthebrokenbootstrap: and long-taileddata. TechnicalReport.StatisticsGroup,Department of Mathematical Sciences,University of Bath, Bath. familiesand theiruse in generalizedlinearregression. Efron,B. (1986) Double exponential J Am.Statist.Ass.,81, 709721. de Finetti, B. (1974) TheoiyofProbability, vol. 1. Chichester: Wiley. (1975) TheoryofProbability, vol. 2. Chichester: Wiley. Fisher,R. A. (1953a) The expansionofstatistics. J R. Statist.Soc. A, 116, 1-6. (1953b) Dispersionon a sphere.Proc.R. Soc. A, 217,295-305. (1955) Statistical methods andscientific induction. J R. Statist.Soc. B, 17,69-78. formultiple regression. Ann.Statist., 22, 1947-1975. Foster,D. P. andGeorge,E. I. (1994) Theriskinflation criterion vol. 3. New York: Gatsonis,C., Hodges,J.S., Kass,R. E. andMcCulloch,R. E. (1997) Case StudiesinBayesianStatistics, Springer. withimplementation usingpredictive distributions, Gelfand,A. E., Dey,D. K. andChang,H. (1992) Model determination via sampling-based methods(withdiscussion).In BayesianStatistics4 (eds J.M. Bernardo, J.0. Berger,A. P. Dawid Press. andA. F M. Smith),pp. 147-167. Oxford:OxfordUniversity forposteriorjudgements.In Proc. 10thInt. Congr.Logic, Methodologyand Goldstein,M. (1997) Priorinferences Kluwer. Philosophy ofScience(eds M. L. D. Chiara,K. Doets,D. MundiciandJ.vanBenthem).Dordrecht: (1998) Bayeslinearanalysis.In EncyclopaediaofStatisticalSciences,updatevol. 3. New York:Wiley.To be published. classrooms:thecase ofwordproblems.LearnngInstructn, 7, 293-307. Greer,B. (1997) Modellingrealityinmathematics in clinicalcross-over treatment trials.J Biopharm. effects Statist., 8, 191-233. Grieve,A. andSenn,S. J.(1998) Estimating inference and A(,) or Bayesiannonparametric predictive (withdisHill, B. M. (1988) De Finetti'stheorem, induction, 3 (eds J.M. Bernardo, M. H. DeGroot,D. V LindleyandA. F. M. Smith),pp. 211-241. cussion).In BayesianStatistics Press. Oxford:OxfordUniversity modelsforAn,:splitting J R. Statist.Soc. B, 55,423-433. (1993) Parametric processesandmixtures. In Modelingand a sketchof a theoryof appliedstatistics. Hodges,J. S. (1996) Statisticalpracticeas argumentation: Prediction Geisser(eds J.C. Lee, A. ZellnerandW 0. Johnson), pp. 19-45. New York:Springer. HonoringSeymour linearmodels.J R. Statist.Soc. B, 58, 619-656. Lee, Y. andNelder,J.A. (1996) Hierarchical generalized andAppliedMathematics. a Review.Philadelphia: Lindley,D. V (1972) BayesianStatistics, SocietyforIndustrial J.F (1997) Usingon-lineprocessdata to improvequality:challengesforstatisticians. Int.Statist.Rev.,65, MacGregor, 309-323. 2nd edn. Cambridge: StatisticalPrinciplesfor PracticalApplications, Mead, R. (1988) The Design of Experiments: Press. Cambridge University D. (1987) An extended function. Biometrika, 74, 221-232. Nelder,J.A. andPregibon, quasi-likelihood Trialsin ClinicalResearch.Chichester: Senn,S. (1993) Cross-over Wiley. ofpriorspastis notthesameas a trueprior.Br.Med.J, 314, 73. (1997) Presentremembrance intwo-stage models.J R. Statist.Soc. B, 60, 455-471. estimates hierarchical Shen,W andLouis,T. A. (1998) Triple-goal andAitkin.Statist.Comput., 7, 263-264. Stone,M. (1997) DiscussionofpapersbyDempster withS-Plus.New York:Springer. Venables,W N. andRipley,B. D. (1997) ModernAppliedStatistics inferencefor random Walker,S. G., Damien,P., Laud, P. W and Smith,A. F M. (1999) Bayesiannonparametric andrelatedfunctions distributions (withdiscussion).J R. Statist.Soc. B, 61, inthepress. London:ChapmanandHall. Probabilities. Walley,P. (1991) Statistical ReasoningwithImprecise