STAT200 On-Line Guided Exercise 1

Be sure to:
• Submit your answers in a Word file to Sakai at the same place you downloaded the file.
• Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
• Put your name and the Assignment # in the file name: e.g., IlventoGuided1.doc

Answer as completely as you can and show your work.

1. A new popular statistical item is a poster of a collection of interesting statistics and graphics. The one on the right is a collection of numbers on aging in the U.S. It was put together by an insurance company to emphasize that we will be living longer and will therefore need to plan better for things like retirement, health care, and living arrangements. Some of the data are based on surveys, some on life tables, and some on models that project into the future (the general source is given on the graph; larger, more detailed pictures are given on the second and third pages). The statistics chosen and the way they are presented are designed to catch your interest and stimulate discussion. Please note that it is an insurance company presenting these ideas, and its goal is to sell products.

The details are better seen on the next two pages, and the source is given below (you can also search for "Live Longer Slate" to find it):
http://www.slate.com/articles/health_and_science/prudential/2013/08/why_you_need_to_start_thinking_about_the_big_truths_with_living_longer.html

Review the data and answer the following questions.

a. What figures stand out to you as being particularly well presented? In other words, which are effective in making their point?
No right or wrong answer here. I like comparing the probabilities of being left-handed, blonde, or playing an instrument with the probability of living to 100.

b. Do any of the figures seem suspect to you, or based on an agenda, or perhaps presented in a way that distorts or biases an issue? I don't mean to imply there is anything wrong with the numbers, but one might quibble with what is presented (or not) or the way in which it is presented.
No right or wrong answer here. I would have liked more information on the oldest cities. Did cities need to be a certain size? Why did they choose 60+? Jimmy Fallon already replaced Leno!!!!
2. A researcher in Delaware wanted to see the effect of an education program for hospitalized heart attack patients on their likelihood of returning to the hospital within 30 days (referred to as recidivism). The education program consisted of more involved training on diet, exercise, weight, and sticking to the recommendations of the physician. The education program was given to a random sample of patients during 2011, and the results were compared to a control group who did not receive the training. An analysis of the data shows that the group receiving the training had a significant reduction in recidivism compared to the control group.

a) What is the unit of analysis in this study?
The heart patient.

b) Identify the data collection method for this study.
Experimental design. There is a treatment group (patients given the education program) and a control group, and the program was given to a random sample of patients.

c) Would the study involve descriptive or inferential statistics?
The researcher wanted to describe the data, but she was also interested in making inferences about all heart patients admitted to a hospital. So it is inferential.

d) What is the population (or sample) of interest to the researchers?
All heart patients admitted to a hospital.

3. Todd Andrlik, founder and editor of the Journal of the American Revolution (allthingsliberty.com), wrote a piece about how young many of the founding fathers were when the Declaration of Independence was first signed in 1776. There were 56 signers of the Declaration of Independence, and their ages are given below, sorted by age.

OBS  Person                 Age  Gender  State
1    Thomas Lynch           26   Male    South Carolina
2    Edward Rutledge        26   Male    South Carolina
3    George Walton          27   Male    Georgia
4    Thomas Heyward         29   Male    South Carolina
5    Benjamin Rush          30   Male    Pennsylvania
6    Elbridge Gerry         31   Male    Massachusetts
7    Thomas Jefferson       33   Male    Virginia
8    Thomas Stone           33   Male    Maryland
9    William Hooper         34   Male    North Carolina
10   Arthur Middleton       34   Male    South Carolina
11   James Wilson           34   Male    Pennsylvania
12   Samuel Chase           35   Male    Maryland
13   William Paca           35   Male    Maryland
14   John Penn              35   Male    North Carolina
15   George Clymer          37   Male    Pennsylvania
16   Thomas Nelson, Jr.     37   Male    Virginia
17   Charles Carroll        38   Male    Maryland
18   Francis Hopkinson      38   Male    New Jersey
19   Carter Braxton         39   Male    Virginia
20   John Hancock           39   Male    Massachusetts
21   John Adams             40   Male    Massachusetts
22   William Floyd          41   Male    New York
23   Button Gwinnett        41   Male    Georgia
24   Francis Lightfoot Lee  41   Male    Virginia
25   Robert Morris          42   Male    Pennsylvania
26   Thomas McKean          42   Male    Delaware
27   George Read            42   Male    Delaware
28   Samuel Huntington      44   Male    Connecticut
29   Richard Henry Lee      44   Male    Virginia
30   Robert Treat Paine     45   Male    Massachusetts
31   Richard Stockton       45   Male    New Jersey
32   William Williams       45   Male    Connecticut
33   Josiah Bartlett        46   Male    New Hampshire
34   Joseph Hewes           46   Male    North Carolina
35   George Ross            46   Male    Pennsylvania
36   William Whipple        46   Male    New Hampshire
37   Caesar Rodney          47   Male    Delaware
38   William Ellery         48   Male    Rhode Island
39   Oliver Wolcott         49   Male    Connecticut
40   Abraham Clark          50   Male    New Jersey
41   Benjamin Harrison      50   Male    Virginia
42   Lewis Morris           50   Male    New York
43   George Wythe           50   Male    Virginia
44   John Morton            51   Male    Pennsylvania
45   Lyman Hall             52   Male    Georgia
46   Samuel Adams           53   Male    Massachusetts
47   John Witherspoon       53   Male    New Jersey
48   Roger Sherman          55   Male    Connecticut
49   James Smith            56   Male    Pennsylvania
50   Philip Livingston      60   Male    New York
51   George Taylor          60   Male    Pennsylvania
52   Matthew Thornton       62   Male    New Hampshire
53   Francis Lewis          63   Male    New York
54   John Hart              65   Male    New Jersey
55   Stephen Hopkins        69   Male    Rhode Island
56   Benjamin Franklin      70   Male    Pennsylvania

a. Create a stem-and-leaf plot of the data (you can do this by "hand" in Word in the table below). To do this you need to decide on the stems and then the leaves.

Stem  Leaf
1
2     6679
3     0133444555778899
4     0111222445556666789
5     0000123356
6     002359
7     0
8

Or, splitting each stem in two (a stem marked * holds leaves 5 to 9; an unmarked stem holds leaves 0 to 4):

Stem  Leaf
2*    6679
3     0133444
3*    555778899
4     011122244
4*    5556666789
5     00001233
5*    56
6     0023
6*    59
7     0

b. Calculate the mean, median, and mode for these data. The sum of all the values is Sum(X) = 2,479.

Mean = Sum(X)/n = 2479/56 = 44.27, or 44.3.
Median is the middle value. Since n = 56 is even, the median is the average of the 28th and 29th values: (44 + 44)/2 = 44.
Mode is the most frequent value: 46 and 50 each occur 4 times, so the data are bimodal. Or you could say the mode is undefined.

c. Briefly describe the distribution; focus on the shape of the distribution, and whether there are any outliers or strange values.

The distribution appears to be a roughly symmetrical, mound-shaped distribution with the center in the mid-40s. There are no large outliers. Below is the output from JMP software.

Age

[JMP distribution plot of Age; axis from 30 to 70]

Quantiles
100.0%  maximum    70.0
99.5%              70.0
97.5%              69.6
90.0%              60.6
75.0%   quartile   50.0
50.0%   median     44.0
25.0%   quartile   35.5
10.0%              30.7
2.5%               26.0
0.5%               26.0
0.0%    minimum    26.0

Summary Statistics
Mean                   44.3
Std Dev                10.7
Std Err Mean            1.4
Upper 95% Mean         47.1
Lower 95% Mean         41.4
N                      56.0
Sum                  2479.0
Variance              114.1
Skewness                0.5
Kurtosis               -0.2
CV                     24.1
N Missing               0.0
Median                 44.0
Range                  44.0
Interquartile Range    14.5

Stem and Leaf
Stem  Leaf        Count
7     0           1
6     59          2
6     0023        4
5     56          2
5     00001233    8
4     5556666789  10
4     011122244   9
3     555778899   9
3     0133444     7
2     6679        4
2|6 represents 26

4. For the signers of the Declaration of Independence data above, let's now focus on two nominal-level variables: Gender and State.

a. For gender, how would you summarize the distribution for this variable? Think in terms of how we might describe data to talk about this variable. Is it in fact a variable?
Gender is not a variable; it is a constant. All the signers were male.

b. For State, there were 13 original colonies. Use the table below to make a frequency table of the information. Then summarize the results in words. You can decide how you might organize the states: alphabetically, by north and south, or in frequency order. How you organize the states will affect how you can use cumulative frequencies to describe the data.
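A minimal Python sketch of how the relative and cumulative frequency columns are built, assuming the state counts from the signers' table. The names STATE_COUNTS and frequency_table and the two order options are illustrative only; they are not JMP features. Each relative frequency is count/56, and the cumulative column is a running total of those fractions, so it depends on the order in which the states are listed.

```python
# Signers per state, copied from the Question 4 frequency table.
STATE_COUNTS = {
    "Connecticut": 4, "Delaware": 3, "Georgia": 3, "Maryland": 4,
    "Massachusetts": 5, "New Hampshire": 3, "New Jersey": 5, "New York": 4,
    "North Carolina": 3, "Pennsylvania": 9, "Rhode Island": 2,
    "South Carolina": 4, "Virginia": 7,
}


def frequency_table(counts, order="alphabetical"):
    """Return rows of (level, count, relative freq, cumulative relative freq).

    order="alphabetical" sorts by state name; order="ascending" sorts by
    count, breaking ties alphabetically.
    """
    if order == "alphabetical":
        levels = sorted(counts)
    elif order == "ascending":
        levels = sorted(counts, key=lambda level: (counts[level], level))
    else:
        raise ValueError(f"unknown order: {order!r}")

    n = sum(counts.values())  # 56 signers in total
    rows, running_total = [], 0
    for level in levels:
        running_total += counts[level]
        rows.append((level, counts[level], counts[level] / n, running_total / n))
    return rows


if __name__ == "__main__":
    for level, count, prob, cum_prob in frequency_table(STATE_COUNTS, "ascending"):
        print(f"{level:<15} {count:>2}  {prob:.4f}  {cum_prob:.4f}")
```

Switching between "alphabetical" and "ascending" leaves the counts and relative frequencies unchanged and alters only the cumulative column, which is why the ordering choice matters when describing the data with cumulative frequencies.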
State            Frequency  Relative frequency  Cumulative relative frequency
Connecticut      4          4/56 = .0714        .0714
Delaware         3          .0536               .1250
Georgia          3          .0536               .1786
Maryland         4          .0714               .2500
Massachusetts    5          .0893               .3393
New Hampshire    3          .0536               .3929
New Jersey       5          .0893               .4821
New York         4          .0714               .5536
North Carolina   3          .0536               .6071
Pennsylvania     9          .1607               .7679
Rhode Island     2          .0357               .8036
South Carolina   4          .0714               .8750
Virginia         7          .1250               1.0000

JMP can organize it alphabetically or in ascending (or descending) order.

State (alphabetical order)

[JMP bar chart of State]

Frequencies
Level            Count  Prob    Cum Prob
Connecticut      4      0.0714  0.0714
Delaware         3      0.0536  0.1250
Georgia          3      0.0536  0.1786
Maryland         4      0.0714  0.2500
Massachusetts    5      0.0893  0.3393
New Hampshire    3      0.0536  0.3929
New Jersey       5      0.0893  0.4821
New York         4      0.0714  0.5536
North Carolina   3      0.0536  0.6071
Pennsylvania     9      0.1607  0.7679
Rhode Island     2      0.0357  0.8036
South Carolina   4      0.0714  0.8750
Virginia         7      0.1250  1.0000
Total            56     1.0000  1.0000
N Missing 0
13 Levels

State (ascending order)

[JMP bar chart of State, sorted by count]

Frequencies
Level            Count  Prob    Cum Prob
Rhode Island     2      0.0357  0.0357
Delaware         3      0.0536  0.0893
Georgia          3      0.0536  0.1429
New Hampshire    3      0.0536  0.1964
North Carolina   3      0.0536  0.2500
Connecticut      4      0.0714  0.3214
Maryland         4      0.0714  0.3929
New York         4      0.0714  0.4643
South Carolina   4      0.0714  0.5357
Massachusetts    5      0.0893  0.6250
New Jersey       5      0.0893  0.7143
Virginia         7      0.1250  0.8393
Pennsylvania     9      0.1607  1.0000
Total            56     1.0000  1.0000
N Missing 0
13 Levels

5. Below are the data on infant mortality for 44 countries, and the same data for the OECD countries. The Organisation for Economic Co-operation and Development (OECD) is an international economic organization of 34 countries, founded in 1961 to stimulate economic progress and world trade.
It is a forum of countries describing themselves as committed to democracy and the market economy, providing a platform to compare policy experiences, seek answers to common problems, identify good practices, and coordinate the domestic and international policies of its members (Wikipedia, https://en.wikipedia.org/wiki/Organisation_for_Economic_Co-operation_and_Development). The OECD's web site provided some data on infant mortality for 44 countries. Infant mortality (the rate of death of children under 1 year of age per 1,000 live births) is a measure of development. The tables below have the data for the 44 countries and for the 34 OECD countries.

a. Create a stem-and-leaf plot of the data (you can do this by "hand" in Word in the table below by typing in the stems and the leaves). Do this for the 44 countries and for the 34 OECD countries.
b. Calculate the mean, median, and mode for these data.
c. Briefly describe the distribution; focus on the shape of the distribution, and whether there are any outliers or strange values.

The sum of x, Sum(x), for all 44 countries is 292.30, and Sum(x) for the 34 OECD countries is 128.20.

Country          IM
Iceland          1.3
Finland          1.7
Slovenia         1.7
Estonia          2.0
Japan            2.0
Norway           2.3
Spain            2.4
Sweden           2.4
Czech Rep.       2.5
Denmark          2.5
Israel           2.5
Austria          2.6
Germany          2.8
Italy            2.9
Korea            2.9
Portugal         2.9
Australia        3.1
Switzerland      3.3
Belgium          3.5
Ireland          3.5
United Kingdom   3.5
France           3.6
Luxembourg       3.6
Greece           3.7
Lithuania        3.7
Netherlands      4.0
New Zealand      4.4
Latvia           4.4
Poland           4.5
Canada           4.8
Hungary          5.0
United States    5.0
Slovak Rep.      5.1
Chile            7.0
Russian Fed.     8.2
Costa Rica       8.4
Turkey           10.2
China            10.9
Brazil           12.3
Mexico           13.0
Colombia         17.5
Indonesia        24.5
South Africa     32.8
India            41.4

Stem  Leaf
1     377
2     0034455568999
3     135556677
4     04458
5     001
6
7     0
8     24
9
10    29
11
12    3
13    0
14
15
16
17    5
18
19
20
21
22
23
24    5
25
26
27
28
29
30
31
32    8
33
34
35
36
37
38
39
40
41    4

The mean for the 44 countries is 6.64 (292.30/44 = 6.64), while the median is 3.60. Since n = 44 is even, the median is the average of the two middle values: the 22nd value (3.6) and the 23rd value (3.6) give (3.6 + 3.6)/2 = 3.6. The mean is pulled upward by the extreme values in the data. The most extreme values are for Indonesia (24.5), South Africa (32.8), and India (41.4). There is no single modal value: three values (2.5, 2.9, and 3.5) each occur 3 times. This distribution is highly skewed to the right, with a few extreme outliers.

OECD Country     IM
Iceland          1.3
Finland          1.7
Slovenia         1.7
Estonia          2.0
Japan            2.0
Norway           2.3
Spain            2.4
Sweden           2.4
Czech Rep.       2.5
Denmark          2.5
Israel           2.5
Austria          2.6
Germany          2.8
Italy            2.9
Korea            2.9
Portugal         2.9
Australia        3.1
Switzerland      3.3
Belgium          3.5
Ireland          3.5
United Kingdom   3.5
France           3.6
Luxembourg       3.6
Greece           3.7
Netherlands      4.0
New Zealand      4.4
Poland           4.5
Canada           4.8
Hungary          5.0
United States    5.0
Slovak Rep.      5.1
Chile            7.0
Turkey           10.2
Mexico           13.0

Stem  Leaf
1     377
2     0034455568999
3     13555667
4     0458
5     001
6
7     0
8
9
10    2
11
12
13    0
14

The mean for the 34 OECD countries is 3.77 (128.20/34 = 3.77), while the median is 3.20. Since n = 34 is even, the median is the average of the two middle values: the 17th value (3.1) and the 18th value (3.3) give (3.1 + 3.3)/2 = 3.2. The two measures of center are close, but the mean is pulled up somewhat by a few extreme values in the data. The most extreme values are for Mexico (13.0), Turkey (10.2), and Chile (7.0). There is no single modal value: three values (2.5, 2.9, and 3.5) each occur 3 times. This distribution is slightly skewed to the right, with a few outliers. Compared with the data for all 44 countries, however, the skew is light.
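As a check on the hand calculations for Question 5, here is a minimal Python sketch (assuming Python 3.8 or later, which provides statistics.multimode). The values are copied from the two infant mortality tables; the names OECD_IM, NON_OECD_IM, stem_and_leaf, and summarize are hypothetical helpers, not part of the exercise. The stem is the whole-number part and the leaf is the tenths digit, the same choice used in the hand-drawn plots, and empty stems are kept so the gaps before the outliers stay visible.

```python
import statistics
from collections import defaultdict

# Infant mortality (deaths under age 1 per 1,000 live births) for the
# 34 OECD countries, copied from the OECD table.
OECD_IM = [1.3, 1.7, 1.7, 2.0, 2.0, 2.3, 2.4, 2.4, 2.5, 2.5, 2.5, 2.6,
           2.8, 2.9, 2.9, 2.9, 3.1, 3.3, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7,
           4.0, 4.4, 4.5, 4.8, 5.0, 5.0, 5.1, 7.0, 10.2, 13.0]
# The ten non-OECD countries in the 44-country table (Lithuania, Latvia,
# Russian Fed., Costa Rica, China, Brazil, Colombia, Indonesia,
# South Africa, India).
NON_OECD_IM = [3.7, 4.4, 8.2, 8.4, 10.9, 12.3, 17.5, 24.5, 32.8, 41.4]
ALL_IM = sorted(OECD_IM + NON_OECD_IM)


def stem_and_leaf(values):
    """Stem = whole-number part, leaf = tenths digit (one-decimal data assumed)."""
    leaves = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(round(x * 10), 10)  # e.g. 2.9 -> (2, 9)
        leaves[stem].append(str(leaf))
    # Emit every stem from the smallest to the largest, including empty ones.
    return [f"{stem:>3} | {''.join(leaves[stem])}".rstrip()
            for stem in range(min(leaves), max(leaves) + 1)]


def summarize(values):
    """Mean, median, and every mode (there may be more than one)."""
    return {
        "n": len(values),
        "sum": round(sum(values), 2),
        "mean": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
        "modes": sorted(statistics.multimode(values)),
    }


if __name__ == "__main__":
    for label, data in (("All 44 countries", ALL_IM), ("34 OECD countries", OECD_IM)):
        print(label, summarize(data))
        print("\n".join(stem_and_leaf(data)))
```

statistics.median averages the two middle values when n is even, matching the 22nd/23rd and 17th/18th value calculations done by hand, and multimode returns all three tied modes instead of picking one.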