STAT200 Guided Exercise 2 Answers

For on-line students, be sure to:
• Submit your answers in a Word file to Sakai at the same place you downloaded the file
• Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
• Put your name and the Assignment # on the file name: e.g. IlventoGuided2.doc

Key Topics
• Measures of Central Tendency
• Stem & Leaf Plot and describing distributions
• Measures of Variability

Answer as completely as you can and show your work. Then upload the file via Sakai to get credit.

1. Let's finish up the Academy Award winners for best actor (and actress) since 1996 that was given in Assignment 1, now that we have command of both central tendency and variability. Each year the Academy of Motion Picture Arts and Sciences gives an award for the best actor and actress in a motion picture. We have recorded the name and age of each winner since 1996. The data for males and females are given below (the sample size, n = 20). The sum of their ages and the sum of their ages squared are also given.

YEAR  ACTOR                      AGE   ACTRESS                AGE
1996  Geoffrey Rush               45   Frances McDormand       39
1997  Jack Nicholson              60   Helen Hunt              34
1998  Roberto Benigni             46   Gwyneth Paltrow         26
1999  Kevin Spacey                40   Hilary Swank            25
2000  Russell Crowe               36   Julia Roberts           33
2001  Denzel Washington           47   Halle Berry             35
2002  Adrien Brody                29   Nicole Kidman           35
2003  Sean Penn                   43   Charlize Theron         28
2004  Jamie Foxx                  37   Hilary Swank            30
2005  Philip Seymour Hoffman      38   Reese Witherspoon       29
2006  Forest Whitaker             45   Helen Mirren            61
2007  Daniel Day-Lewis            50   Marion Cotillard        32
2008  Sean Penn                   48   Kate Winslet            33
2009  Jeff Bridges                60   Sandra Bullock          45
2010  Colin Firth                 50   Natalie Portman         29
2011  Jean Dujardin               39   Meryl Streep            62
2012  Daniel Day-Lewis            55   Jennifer Lawrence       22
2013  Matthew McConaughey         44   Cate Blanchett          44
2014  Eddie Redmayne              32   Julianne Moore          54
2015  Leonardo DiCaprio           41   Brie Larson             26
      Sum X                      885   Sum X                  722
      Sum X-squared           40,465   Sum X-squared       28,598

a. Here is the Stem and Leaf plot for each group to compare the distributions.

Stem and Leaf Plot of Actors Winning Academy Award Since 1996

Males                  Females
Stem  Leaf             Stem  Leaf
2     9                2     2566899
3     26789            3     02334559
4     013455678        4     45
5     005              5     4
6     00               6     12
6|0 represents 60      6|0 represents 60

b. Calculate the measures of central tendency and variability for each group. The sum of X and the sum of X-squared for each group are given above.

Statistic                 Males                                 Females
Mean                      885/20 = 44.25                        722/20 = 36.10
Median                    10th ordered value = 44, 11th = 45;   10th ordered value = 33, 11th = 33;
                          average of the two = 44.5             average of the two = 33
Mode                      No unique mode                        No unique mode
Range                     60 - 29 = 31                          62 - 22 = 40
Variance                  [40,465 - (885)²/20]/(20 - 1)         [28,598 - (722)²/20]/(20 - 1)
                          = [40,465 - 39,161.25]/19             = [28,598 - 26,064.20]/19
                          = 1,303.75/19 = 68.62                 = 2,533.80/19 = 133.36
Standard Deviation        SQRT(68.62) = 8.28                    SQRT(133.36) = 11.55
Coefficient of Variation  8.28/44.25 × 100 = 18.72%             11.55/36.10 × 100 = 31.99%

c. Briefly compare the two distributions with an emphasis on the measures of Central Tendency and Variability.
For males, the distribution is roughly symmetric and centered around the mean of 44.25. There are no obvious outliers. The median, 44.5, is very close to the mean. The values vary from 29 to 60, for a range of 31 years. The standard deviation is 8.28 years, which is relatively small compared with the mean (CV = 18.72%).
For females, the mean is lower at 36.10, but it is higher than the median of 33. The distribution for females is influenced by two large outliers at 61 and 62, which pull the mean up. Otherwise, the female ages are concentrated from the mid 20s to the mid 30s. The range is larger for females than for males (62 - 22 = 40), as is the standard deviation (11.55 for females). The higher standard deviation also reflects the outliers. The CV for females, 31.99%, is much higher than that of males.

d. For both men and women there are a few outliers. For men there are two individuals with a value of 60. For women there is one winner aged 61 and another aged 62. Calculate z-scores for these values and interpret their meaning.
Zm = (60 - 44.25)/8.28 = 1.90
Zf1 = (62 - 36.10)/11.55 = 2.24
Zf2 = (61 - 36.10)/11.55 = 2.16
The two male winners aged 60 are about 1.90 standard deviations above the male mean. The female winners aged 62 and 61 are more than 2 standard deviations above the female mean, so they are more unusual within their group than the oldest male winners are within theirs.

e. Suppose we wanted to remove the two female outliers from the data. Calculate the new mean for women winners for the remaining 18 winners. Hint: subtract the values from the old sum and divide by 18. Did the outliers influence the mean age much?
(722 - 62 - 61) = 599
599/18 = 33.28
The mean for females decreased from 36.10 to 33.28 by removing the two outliers. This is a 7.8% decrease.

2. The following is some data from The Daily Beast on the 50 Most Stressful Universities in 2010. We are looking at the Acceptance rate for these 50 universities. The Acceptance rate is the percentage of applicants who were admitted. The Histogram and the Stem and Leaf Plot for this data are given below (note the Stem and Leaf Plot rounds the numbers to a whole number). Use the stem and leaf values for some calculations, such as the min and max. For other calculations, the Sum of (x) is 1574.70 and the Sum of (x²) is 62204.53. The Median for this data is 26.85.

a. Calculate the:
Mean = 1574.70/50 = 31.49
Median = 26.85
Mode = 22
Minimum = 8
Maximum = 73
Range = 73 - 8 = 65
Variance = [62204.53 - (1574.70)²/50]/(50 - 1) = (62204.53 - 49593.60)/49 = 12610.93/49 = 257.37
Standard Deviation = SQRT(257.37) = 16.04
Coefficient of Variation = 16.04/31.49 × 100 = 50.94%

b. What is the position of the median value for this data?
Since n = 50, the position is (50 + 1)/2 = 25.5, that is, between the 25th and 26th positions. We would take the average of these two values.

c. Does the mode make sense as a measure of Central Tendency for this data?
Based on the Stem and Leaf Plot, the mode is 22%. This marks the center of one bunching of the data, but there is much more spread and there are other groupings of the data, so the mode does not describe the center of the whole distribution well.

d. Calculate a z-score for an acceptance rate of 61%.
z = (61 - 31.49)/16.04 = 1.84. This value is 1.84 standard deviations above the mean.

e. Based on what you know about the different criteria used by different universities to judge students for admittance, why do you think this distribution looks the way it does? Think about the spread of the data and the measures of spread for the data, such as the range and standard deviation. Does the spread seem large? Hint: Harvard has the lowest acceptance rate at 7.9%. The Pennsylvania State University has an acceptance rate of 51.2%.
The spread is very large; the CV is 50.94%. It might reflect differences between public and private institutions. Private institutions generally have lower acceptance rates. Public schools may have, as part of their mission, higher rates of acceptance to provide educational opportunities to citizens of the state. Even among the most stressful universities, generally thought to be the most rigorous, the acceptance rate for public institutions should be higher. We could think of this data as coming from two populations.
The Box Plots show a difference between public and private universities. There is still a lot of spread for each type of university: some private universities have high acceptance rates and some public universities have low acceptance rates. But we can see two distinct groups.

3. Answer the following questions about variability of data sets:
a. How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable.
The variance is (approximately) the average squared deviation around the center (in this case the center is the mean); for a sample we divide by n - 1 rather than n. The standard deviation is the square root of the variance. It can be read as a typical distance of the values from the mean, expressed in the original units of the variable.

b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable?
The range uses only two values, the maximum and the minimum, so it can be very sensitive to outliers. The inter-quartile range is the range of the middle 50% of the values, so it is not affected by extreme values at either end.

c. Can the standard deviation ever be larger than the variance? Explain.
When the variance is greater than 1, the standard deviation is smaller than the variance, since it is the square root of the variance. However, in the special case where the variance is between 0 and 1, the standard deviation will be larger than the variance. For example, if s² = 0.5, then s = 0.71.

d. Can the variance ever be negative? Why or why not?
No. The variance is based on squared deviations, and a squared value cannot be negative. The variance equals 0 only when all values are identical.

e. Show the formula for the Coefficient of Variation and explain what it is and how it can be useful in comparing the variability of different variables.
CV = (s / |x̄|) × 100
The CV is the ratio of the standard deviation to the absolute value of the mean, usually multiplied by 100. It expresses the standard deviation in relation to the mean. This makes it easier to compare the spread of different variables, even if they are measured on different metrics.

4. Two banks use alternative methods of waiting in line for a teller. Both banks use three tellers. Bank A uses separate lines for each teller, so a customer must pick which line she or he thinks is best. This approach does allow a customer to pick his or her favorite teller. In contrast, Bank B uses a single waiting line, which sends each customer to the next available teller. We take a random sample of 15 customers from each bank and record the waiting time in minutes. We are asked to analyze the data and determine the differences we note between the approaches of the two banks. Use graphs and summary measures of central tendency and variability to explain the differences. In the end, I am asking that you summarize your findings in words and not just numbers.

The data are given below (not sorted), along with Sum(x) and Sum(x²):

                              Sum(x)   Sum(x²)
Bank A (separate lines)       71.50    360.19
Bank B (single line)          70.00    330.68

Bank A (separate lines): 5.3 2.5 5.9 4.1 5.4 3.8 5.1 4.1 4.1 5.0 5.1 5.7 3.0 5.3 7.1
Bank B (single line):    5.0 3.8 4.9 4.3 5.0 4.7 3.9 5.4 5.1 4.0 4.1 4.5 5.1 5.4 4.8

a. Graph the two banks using stem and leaf plots. Describe the results of your graphs.

Stem and Leaf Plot of Waiting Time at Two Banks

Bank A              Bank B
Stem  Leaf          Stem  Leaf
2     5             2
3     08            3     89
4     111           4     0135789
5     01133479      5     001144
6                   6
7     1             7
8                   8
5|0 represents 5.0 minutes

Bank A's waiting times spread from 2.5 to 7.1 minutes, while Bank B's waiting times are tightly clustered between 3.8 and 5.4 minutes.

b. Calculate the following for each bank:

Statistic                 Bank A                              Bank B
Mean                      71.50/15 = 4.77                     70.00/15 = 4.67
Median                    n is odd, so (15 + 1)/2 = 8th       n is odd, so (15 + 1)/2 = 8th
                          observation = 5.1                   observation = 4.8
Mode                      4.1 (occurs 3 times)                No unique value
Variance                  (360.19 - 71.50²/15)/(15 - 1)       (330.68 - 70.00²/15)/(15 - 1)
                          = 1.38                              = 0.29
Standard Deviation        SQRT(1.38) = 1.18                   SQRT(0.29) = 0.54
Maximum                   7.10                                5.40
Minimum                   2.50                                3.80
Range                     7.10 - 2.50 = 4.60                  5.40 - 3.80 = 1.60
Coefficient of Variation  1.18/4.77 × 100 = 24.68%            0.54/4.67 × 100 = 11.47%
(The CVs are computed from the unrounded standard deviations and means.)

c. Summarize your results in a paragraph.
The measures of center for the two banks are close to each other, but the measures of spread are not. The mean and median for Bank A are close, 4.77 and 5.1, respectively. Likewise, the mean and median for Bank B, 4.67 and 4.8, respectively, are close to each other and to those of Bank A. However, the spread for Bank A is much larger. The variances are 1.38 and 0.29, the standard deviations are 1.18 and 0.54, and the ranges are 4.60 and 1.60 minutes for Bank A and Bank B, respectively. This can also be seen in the Coefficients of Variation, with a value of 24.68% for Bank A and 11.47% for Bank B. Allowing customers to pick their line results in more variability in waiting time compared with a single line.
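The hand calculations in Questions 1, 2 and 4 all rely on the computational formula for the sample variance, [Sum(x²) - (Sum x)²/n]/(n - 1). As an optional cross-check, here is a minimal Python sketch (standard library only; Python 3.8+ is assumed for statistics.multimode) that recomputes the summary measures from the raw values listed above. The function names summarize, variance_from_sums and z_score are illustrative helpers, not part of the assignment, and the last digit can differ slightly from the tables because the script does not round intermediate results.

```python
import math
import statistics

# Ages of the best actor / best actress winners, 1996-2015 (Question 1).
actors = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38,
          45, 50, 48, 60, 50, 39, 55, 44, 32, 41]
actresses = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29,
             61, 32, 33, 45, 29, 62, 22, 44, 54, 26]

# Waiting times in minutes for the two banks (Question 4).
bank_a = [5.3, 2.5, 5.9, 4.1, 5.4, 3.8, 5.1, 4.1, 4.1, 5.0, 5.1, 5.7, 3.0, 5.3, 7.1]
bank_b = [5.0, 3.8, 4.9, 4.3, 5.0, 4.7, 3.9, 5.4, 5.1, 4.0, 4.1, 4.5, 5.1, 5.4, 4.8]


def variance_from_sums(n, sum_x, sum_x2):
    """Sample variance by the computational formula [Sum(x^2) - Sum(x)^2 / n] / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def summarize(values):
    """Central tendency and variability for a list of numbers (sample statistics)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    mean = sum_x / n
    variance = variance_from_sums(n, sum_x, sum_x2)
    sd = math.sqrt(variance)
    return {
        "n": n,
        "sum_x": sum_x,
        "sum_x2": sum_x2,
        "mean": mean,
        "median": statistics.median(values),
        # More than one value in this list means there is no unique mode.
        "modes": statistics.multimode(values),
        "variance": variance,
        "sd": sd,
        "range": max(values) - min(values),
        # Computed from the unrounded standard deviation and mean.
        "cv_percent": sd / abs(mean) * 100,
    }


for name, data in [("Actors", actors), ("Actresses", actresses),
                   ("Bank A", bank_a), ("Bank B", bank_b)]:
    s = summarize(data)
    print(f"{name:9s} mean={s['mean']:.2f} median={s['median']:.2f} "
          f"var={s['variance']:.2f} sd={s['sd']:.2f} "
          f"range={s['range']:.2f} CV={s['cv_percent']:.2f}%")

# Question 2 only gives n, Sum(x) and Sum(x^2), so use the computational formula directly.
mean_q2 = 1574.70 / 50
var_q2 = variance_from_sums(50, 1574.70, 62204.53)
sd_q2 = math.sqrt(var_q2)
print(f"Q2: mean={mean_q2:.2f} var={var_q2:.2f} sd={sd_q2:.2f} "
      f"z(61)={z_score(61, mean_q2, sd_q2):.2f}")
```

Passing any list of values to summarize() returns the mean, median, modes, sample variance, standard deviation, range and CV; variance_from_sums() covers the situation in Question 2, where only the sums are available rather than the raw data.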