Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
8.1ConfidenceIntervals:TheBasics Pointestimator-Apointestimatorisastatisticthatprovidesanestimateofapopulationparameter.The valueofthatstatisticfromasampleiscalledapointestimate.Ideally,apointestimateisour“bestguess”at thevalueofanunknownparameter. Example–FromBatteriestoSmoking PointEstimators PROBLEM:Ineachofthefollowingsettings,determinethepointestimatoryouwoulduseandcalculatethe valueofthepointestimate. (a)QualitycontrolinspectorswanttoestimatethemeanlifetimeμoftheAAbatteriesproducedinanhour atafactory.Theyselectarandomsampleof30batteriesduringeachhourofproductionandthendrain themunderconditionsthatmimicnormaluse.Herearethelifetimes(inhours)ofthebatteriesfromone suchsample: Usethesamplemeanx-barasapointestimatorforthepopulationmeanμ.Forthesedata,ourpoint estimateis hours. (b)WhatproportionpofU.S.highschoolstudentssmoke?The2007YouthRiskBehavioralSurvey questionedarandomsampleof14,041studentsingrades9to12.Ofthese,2808saidtheyhadsmoked cigarettesatleastonedayinthepastmonth. Usethesampleproportion asapointestimatorforthepopulationproportionp.Forthissurvey,our pointestimateis . (c)Thequalitycontrolinspectorsinpart(a)wanttoinvestigatethevariabilityinbatterylifetimesby estimatingthepopulationvarianceσ2. SOLUTION: Usethesamplevariance data,ourpointestimateis asapointestimatorforthepopulationvarianceσ2.Forthebatterylife . ConfidenceInterval-Anintervalcalculatedfromthedata,whichhastheform: estimate±marginoferror Marginoferror-Themarginoferrortellshowclosetheestimatetendstobetotheunknownparameterin repeatedrandomsampling. ConfidenceLevelC–AconfidencelevelCgivestheoverallsuccessrateofthemethodforcalculatingthe confidenceinterval.Thatis,inC%ofallpossiblesamples,themethodwouldyieldanintervalthatcapturesthe trueparametervalue. Aconfidenceintervalissometimesreferredtoasanintervalestimate.Thisisconsistentwithourearlieruseof thetermpointestimate. Weusuallychooseaconfidencelevelof90%orhigherbecausewewanttobequitesureofour conclusions.Themostcommonconfidencelevelis95% 8.1.2InterpretingConfidenceLevelsandConfidenceIntervals Thefigurebelowillustratestheideaofaconfidenceintervalinadifferentform.Itshowstheresultofdrawing manySRSsfromthesamepopulationandcalculatinga95%confidenceintervalfromeachsample.Thecenter ofeachintervalisatx-barandthereforevariesfromsampletosample.Thesamplingdistributionofxbarappearsatthetopofthefiguretoshowthelong-termpatternofthisvariation.The95%confidence intervalsfrom25SRSsappearbelow. Here’swhatyoushouldnotice: • Thecenterx-barofeachintervalismarkedbyadot. • Thedistancefromthedottoeitherendpointofthe intervalisthemarginoferror. • 24ofthese25intervals(that’s96%)containthetrue valueofμ.Ifwetookallpossiblesamples,95%ofthe resultingconfidenceintervalswouldcaptureμ. Theconfidenceleveltellsushowlikelyitisthatthemethodweareusingwillproduceanintervalthat capturesthepopulationparameterifweuseitmanytimes.However,inpracticewetendtocalculateonlya singleconfidenceintervalforagivensituation. Theconfidenceleveldoesnottellusthechancethataparticularconfidenceintervalcapturesthepopulation parameter.Instead,theconfidenceintervalgivesusasetofplausiblevaluesfortheparameter. Weinterpretconfidencelevelsandconfidenceintervalsinmuchthesamewaywhetherweareestimatinga populationmean,proportion,orsomeotherparameter. Example–DoYouUseTwitter? Interpretingaconfidenceintervalandaconfidencelevel Inlate2009,thePewInternetandAmericanLifeProjectaskedarandomsampleof2253U.S.adults,“Doyou ever…useTwitteroranotherservicetoshareupdatesaboutyourselfortoseeupdatesaboutothers?”Ofthe sample,19%said“Yes.”AccordingtoPew,theresulting95%confidenceintervalis(0.167,0.213). Interprettheconfidenceintervalandtheconfidencelevel. AP EXAM TIP On a given problem, you may be asked to interpret the confidence interval, the confidence level, or both. Be sure you understand the difference: the confidence level describes the long-run capture rate of the method, and the confidence interval gives a set of plausible values for the parameter. Confidenceintervalsarestatementsaboutparameters.Inthepreviousexample,itwouldbewrongto say,“Weare95%confidentthattheintervalfrom0.167to0.213containsthesampleproportionofU.S.adults whouseTwitter.”Why?Becauseweknowthatthesampleproportion, ,isinthe interval.Likewise,inthemysterymeanexample,itwouldbewrongtosaythat“95%ofthevaluesare between230.79and250.79,”whetherwearereferringtothesampleorthepopulation.Allwecansay is,“BasedonMr.Schiel’ssample,webelievethatthepopulationmeanissomewherebetween230.79and 250.79.“ CHECKYOURUNDERSTANDING How much does the fat content of Brand X hot dogs vary? To find out, researchers measured the fat content (in grams) of a random sample of 10 Brand X hot dogs. A 95% confidence interval for the population standard deviation σ is 2.84 to 7.55. 1. Interpret the confidence interval. 2. Interpret the confidence level. 3. True or false: The interval from 2.84 to 7.55 has a 95% chance of containing the actual population standard deviation σ. Justify your answer. Amoregeneralformulaforaconfidenceinterval: statistic±(criticalvalue)·(standarddeviationofstatistic) ThecriticalvaluedependsonboththeconfidencelevelCandthesamplingdistributionofthestatistic. TheconfidenceintervalforthemysterymeanμofMr.Schiel’spopulationillustratesseveralimportant propertiesthataresharedbyallconfidenceintervalsincommonuse.Theuserchoosestheconfidence level,andthemarginoferrorfollowsfromthischoice.Wewouldlikehighconfidenceandalsoasmallmargin oferror.Highconfidencesaysthatourmethodalmostalwaysgivescorrectanswers.Asmallmarginoferror saysthatwehavepinneddowntheparameterquiteprecisely. Ourgeneralformulaforaconfidenceintervalis: Wecanseethatthemarginoferrordependsonthecriticalvalueandthestandarddeviationofthe statistic.Thecriticalvalueistieddirectlytotheconfidencelevel:greaterconfidencerequiresalargercritical value.Thestandarddeviationofthestatisticdependsonthesamplesizen:largersamplesgivemoreprecise estimates,whichmeanslessvariabilityinthestatistic. Sothemarginoferrorgetssmallerwhen: • Theconfidenceleveldecreases.Thereisatrade-offbetweentheconfidencelevelandthemarginof error.Toobtainasmallermarginoferrorfromthesamedata,youmustbewillingtoacceptlower • confidence.Earlier,wefoundthata95%confidenceintervalforMr.Schiel’smysterymeanμis230.79 to250.79.The80%confidenceintervalforμis234.39to247.19.Thefigurebelowcomparesthesetwo intervals. Thesamplesizenincreases.Increasingthesamplesizenreducesthemarginoferrorforanyfixed confidencelevel. 8.1.4UsingConfidenceIntervalsWisely Ourgoalinthissectionhasbeentointroduceyoutothebigideasofconfidenceintervalswithoutgetting boggeddownintechnicaldetails.Youmayhavenoticedthatweonlycalculatedintervalsinacontrived setting:estimatinganunknownpopulationmeanμwhenwesomehowknewthepopulationstandard deviationσ.Inpractice,whenwedon’tknowμ,wedon’tknowσeither.We’lllearntoconstructconfidence intervalsforapopulationmeaninthismorerealisticsettinginSection8.3.First,wewillstudyconfidence intervalsforapopulationproportionpinSection8.2.Althoughitispossibletoestimateother parameters,confidenceintervalsformeansandproportionsarethemostcommontoolsineverydayuse. Beforecalculatingaconfidenceintervalforμorp,therearethreeimportantconditionsthatyoushouldcheck. 1.Random:Thedatashouldcomefromawell-designedrandomsampleorrandomized experiment.Otherwise,there’snoscopeforinferencetoapopulation(sampling)orinferenceaboutcause andeffect(experiment).Ifwecan’tdrawconclusionsbeyondthedataathand,thenthere’snotmuchpoint constructingaconfidenceinterval!Anotherimportantreasonforrandomselectionorrandomassignmentis tointroducechanceintothedataproductionprocess. Theexperimentalequivalentofasamplingdistributioniscalledarandomizationdistribution. Wecanmodelchancebehaviorwithaprobabilitydistribution,likethesamplingdistributionsofchapter7.The probabilitydistributionhelpsuscalculateaconfidenceinterval. 2.Normal:Themethodsweusetoconstructconfidenceintervalsforμandpdependonthefactthatthe samplingdistributionofthestatistic(x-barorp-hat)isatleastapproximatelyNormal. Formeans:Thesamplingdistributionofx-barisexactlyNormalifthepopulationdistributionisNormal.When thepopulationdistributionisnotNormal,thecentrallimittheoremtellsusthatthesamplingdistributionofxbarwillbeapproximatelyNormalifnissufficientlylarge(say,atleast30). Forproportions:WecanusetheNormalapproximationtothesamplingdistributionofp-hataslongasnp≥10 andn(1−p)≥10. 3.Independent:Theproceduresforcalculatingconfidenceintervalsassumethatindividualobservationsare independent.Randomsamplingorrandomassignmentcanhelpensureindependent observations.However,ourformulasforthestandarddeviationofthesamplingdistributionactasifweare samplingwithreplacementfromapopulation.Thatisrarelythecase.Ifwe’resamplingwithoutreplacement fromafinitepopulation,weshouldcheckthe10%condition: Samplingmorethan10%ofapopulationcangiveusamorepreciseestimateofapopulationparameterbut wouldrequireustouseafinitepopulationcorrection.We’llavoidsuchsituationsinthetext. Here are two other important reminders for constructing and interpreting confidence intervals. • Our method of calculation assumes that the data come from an SRS of size n from the population of interest. Other types of random samples(stratified or cluster, say) might be preferable to an SRS in a given setting, but they require more complex calculations than the ones we’ll use. • The margin of error in a confidence interval covers only chance variation due to random sampling or random assignment. The margin of error is obtained from the sampling distribution (or randomization distribution in an experiment). It indicates how close our estimate is likely to be to the unknown parameter if we repeat the random sampling or random assignment process many times. Practical difficulties, such as undercoverage and nonresponse in a sample survey, can lead to additional errors that may be larger than this chance variation. Remember this unpleasant fact when reading the results of an opinion poll or other sample survey. The way in which a survey or experiment is conducted influences the trustworthiness of its results in ways that are not included in the announced margin of error.