STAT 200
Guided Exercise 2 Answers

For On-Line Students, be sure to:
• Submit your answers in a Word file to Sakai at the same place you downloaded the file.
• Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
• Put your name and the Assignment # on the filename: e.g. IlventoGuided2.doc

Key Topics
• Measures of Central Tendency
• Stem & Leaf Plot and describing distributions
• Measures of Variability

Answer as completely as you can and show your work. Then upload the file via Sakai to get credit.
1. Let's finish up the Academy Award winners for best actor (and actress) since 1996 that was given in Assignment 1, now that we have command of both central tendency and variability. Each year the Academy of the Screen Actors Guild gives an award for the best actor and actress in a motion picture. We have recorded the name and age of each since 1996. The data for males and females are given below (the sample size, n = 20). The sum of their ages and the sum of ages squared are also given.
YEAR  ACTOR                   AGE  ACTRESS            AGE
1996  Geoffrey Rush            45  Frances McDormand   39
1997  Jack Nicholson           60  Helen Hunt          34
1998  Roberto Benigni          46  Gwyneth Paltrow     26
1999  Kevin Spacey             40  Hilary Swank        25
2000  Russell Crowe            36  Julia Roberts       33
2001  Denzel Washington        47  Halle Berry         35
2002  Adrien Brody             29  Nicole Kidman       35
2003  Sean Penn                43  Charlize Theron     28
2004  Jamie Foxx               37  Hilary Swank        30
2005  Philip Seymour Hoffman   38  Reese Witherspoon   29
2006  Forest Whitaker          45  Helen Mirren        61
2007  Daniel Day-Lewis         50  Marion Cotillard    32
2008  Sean Penn                48  Kate Winslet        33
2009  Jeff Bridges             60  Sandra Bullock      45
2010  Colin Firth              50  Natalie Portman     29
2011  Jean Dujardin            39  Meryl Streep        62
2012  Daniel Day-Lewis         55  Jennifer Lawrence   22
2013  Matthew McConaughey      44  Cate Blanchett      44
2014  Eddie Redmayne           32  Julianne Moore      54
2015  Leonardo DiCaprio        41  Brie Larson         26

Sum X           885 (actors)      722 (actresses)
Sum X-squared   40,465 (actors)   28,598 (actresses)
a. Here is the Stem and Leaf plot for each group to compare the distributions.

Stem and Leaf Plot of Actors Winning Academy Award Since 1996

Males                    Females
Stem | Leaf              Stem | Leaf
  2  | 9                   2  | 2566899
  3  | 26789               3  | 02334559
  4  | 013455678           4  | 45
  5  | 005                 5  | 4
  6  | 00                  6  | 12
6|0 represents 60        6|0 represents 60
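
A stem and leaf plot like the one above can be rebuilt from the raw ages with a few lines of Python. This is only a minimal sketch: the helper name `stem_and_leaf` is made up for illustration, and it assumes two-digit whole-number ages so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its string of sorted leaves (ones digits).

    Illustrative helper; assumes non-negative whole numbers below 100.
    """
    plot = defaultdict(str)
    for v in sorted(values):
        plot[v // 10] += str(v % 10)
    return dict(plot)

# Ages copied from the table of winners, 1996 to 2015.
male_ages = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38,
             45, 50, 48, 60, 50, 39, 55, 44, 32, 41]
female_ages = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29,
               61, 32, 33, 45, 29, 62, 22, 44, 54, 26]

for label, ages in (("Males", male_ages), ("Females", female_ages)):
    print(label)
    for stem, leaves in sorted(stem_and_leaf(ages).items()):
        print(f"  {stem} | {leaves}")
```

Sorting the values first keeps the leaves in increasing order within each stem, which is what makes the shape of the distribution readable.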
b. Calculate the measures of central tendency and variability for each group. The sum of X and the sum of X-squared for each group are given above.

Mean
  Males:   885/20 = 44.25
  Females: 722/20 = 36.10
Median
  Males:   The 10th observation in ordered data = 44. The 11th observation is 45. The average of the two is 44.5.
  Females: The 10th observation in ordered data = 33. The 11th observation is also 33. The average of the two is 33.
Mode
  Males:   Not a unique mode
  Females: Not a unique mode
Range
  Males:   60 - 29 = 31
  Females: 62 - 22 = 40
Variance
  Males:   [40,465 - (885)^2/20]/(20-1) = [40,465 - 39,161.25]/19 = 1303.75/19 = 68.62
  Females: [28,598 - (722)^2/20]/(20-1) = [28,598 - 26,064.20]/19 = 2533.80/19 = 133.36
Standard Deviation
  Males:   SQRT(68.62) = 8.28
  Females: SQRT(133.36) = 11.55
Coefficient of Variation
  Males:   CV = 8.28/44.25*100 = 18.72%
  Females: CV = 11.55/36.10*100 = 31.99%
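
The calculations above use only n, the sum of X, and the sum of X-squared, so they are easy to check by machine. The sketch below is one way to do that; the function name `summary_from_sums` is an assumption for illustration, and it carries unrounded intermediate values, so a last digit can differ slightly from hand arithmetic done with rounded numbers.

```python
import math

def summary_from_sums(sum_x, sum_x2, n):
    """Return (mean, sample variance, standard deviation, CV in percent)
    from the shortcut formula: variance = [sum(x^2) - (sum x)^2 / n] / (n - 1).
    """
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / abs(mean) * 100
    return mean, variance, sd, cv

# Sums taken from the table of winners (n = 20 in each group).
males = summary_from_sums(885, 40465, 20)
females = summary_from_sums(722, 28598, 20)

for label, stats in (("Males", males), ("Females", females)):
    mean, variance, sd, cv = stats
    print(f"{label}: mean={mean:.2f} variance={variance:.2f} "
          f"sd={sd:.2f} cv={cv:.2f}%")
```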
c. Briefly compare the two distributions with an emphasis on the measures of Central Tendency and Variability.
For males, the distribution is symmetric and centered around the mean of 44.25. There are no obvious outliers. The median is very close to the mean at 44.50. The values vary from 29 to 60 for a range of 31 years. The standard deviation is 8.28 years, which is relatively small compared with the mean (CV = 18.72%).
For females, the mean is lower at 36.10, which is higher than the median of 33. The distribution for females is influenced by two large outliers at 61 and 62, which pulled the mean up. Otherwise the distribution for females is centered in the mid 20s to mid 30s. The range is larger for females compared with that for males (62 - 22 = 40), as is the standard deviation (11.55 for females). The higher standard deviation is also a reflection of the outliers. The CV for females is much higher than that of males at 31.99%.
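d. For both men and women there are a few outliers. For men there are two individuals with a value of 60. For women there is one winner aged 61 and another aged 62. Calculate z-scores for these values and interpret their meaning.
Zm = (60 - 44.25)/8.28 = 1.90
Zf1 = (62 - 36.10)/11.55 = 2.24
Zf2 = (61 - 36.10)/11.55 = 2.16
Each z-score gives the number of standard deviations the age lies above its group's mean: about 1.9 for the two men aged 60, and a little more than 2 for each of the two women.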
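
As a quick check on the z-scores, the formula z = (x - mean) / standard deviation can be written as a one-line function. This is a sketch only; `z_score` is a made-up name, and the means and standard deviations are the rounded values from part b.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Rounded summary values from part b: (label, age, group mean, group sd).
cases = [
    ("Male, age 60", 60, 44.25, 8.28),
    ("Female, age 62", 62, 36.10, 11.55),
    ("Female, age 61", 61, 36.10, 11.55),
]

for label, age, mean, sd in cases:
    print(f"{label}: z = {z_score(age, mean, sd):.2f}")
```

A positive z-score means the age is above the group mean. Values near 2 or beyond are commonly read as unusual, though not impossible, for that group.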
e. Suppose we wanted to remove the two female outliers from the data. Calculate the new mean for women winners for the remaining 18 winners. Hint: subtract the values from the old sum and divide by 18. Did the outliers influence the mean age much?
(722 - 62 - 61) = 599
599/18 = 33.28
The mean for females decreased from 36.10 to 33.28 by removing the two outliers. This is a 7.8% decrease.
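
The effect of dropping the two oldest winners can also be recomputed directly from the ages rather than from the sum. The following is a small sketch under that approach; the variable names are illustrative only.

```python
# Ages of the best actress winners, 1996 to 2015, from the table in question 1.
female_ages = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29,
               61, 32, 33, 45, 29, 62, 22, 44, 54, 26]

outliers = {61, 62}
trimmed = [age for age in female_ages if age not in outliers]

old_mean = sum(female_ages) / len(female_ages)
new_mean = sum(trimmed) / len(trimmed)
pct_decrease = (old_mean - new_mean) / old_mean * 100

print(f"n = {len(trimmed)}, sum = {sum(trimmed)}")
print(f"mean: {old_mean:.2f} -> {new_mean:.2f} ({pct_decrease:.1f}% decrease)")
```

Removing two of the 20 values leaves 18 observations, so the new sum is divided by 18.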
2. The following is some data from The Daily Beast on the 50 Most Stressful Universities in 2010. We are looking at the Acceptance rate for these 50 universities. The Acceptance rate is based on the percentage of applicants who were admitted. The Histogram and the Stem and Leaf Plot for this data are given below (note the Stem and Leaf Plot rounds the numbers to a whole number). Use the stem and leaf values for some calculations, such as the min and max. For other calculations, the Sum of (x) is 1574.70 and the Sum of (x^2) is 62204.53. The Median for this data is 26.85.
a. Calculate the:
Mean = 31.49
Median = 26.85
Mode = 22
Maximum = 73
Minimum = 8
Range = 65
Variance = 257.37
Standard Deviation = 16.04
Coefficient of Variation = 50.94
b. What is the position of the median value for this data? Since n = 50, the position is between the 25th and 26th positions. We would take the average of these two values.
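
The rule for locating the median (the middle observation when n is odd, the average of the two middle observations when n is even) can be sketched as below. The function names are hypothetical, and the male ages from question 1 are reused only as a small check.

```python
def median_positions(n):
    """1-based position(s) in the sorted data whose average is the median."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)

def median(values):
    """Median computed by averaging the value(s) at the median position(s)."""
    ordered = sorted(values)
    picks = [ordered[p - 1] for p in median_positions(len(ordered))]
    return sum(picks) / len(picks)

print(median_positions(50))  # even n: two middle positions
print(median_positions(15))  # odd n: one middle position

# Male ages from question 1 (n = 20).
male_ages = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38,
             45, 50, 48, 60, 50, 39, 55, 44, 32, 41]
print(median(male_ages))
```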
c. Does the mode make sense as a measure of Central Tendency for this data? Based on the Stem and Leaf Plot, the mode is 22%. This is a measure of center for one bunching of the data, but there is much more spread and there are other groupings of the data.
d. Calculate a z-score for an acceptance rate of 61%. z = (61 - 31.49)/16.04 = 1.84. This value is 1.84 standard deviations above the mean.
e. Based on what you know about the different criteria used by different universities to judge students for admittance, why do you think this distribution looks the way it does? Think about the spread of the data and the measures of spread for the data, such as the range and standard deviation. Does the spread seem large? Hint: Harvard has the lowest acceptance rate at 7.9%. The Pennsylvania State University has an acceptance rate of 51.2%.
The spread is very large. The CV is 50.94%. It might reflect differences between public and private institutions. Private institutions generally have lower acceptance rates. Public schools may have as part of their mission to have higher rates of acceptance to provide educational opportunities to citizens in the state. Even for the most stressful universities, generally thought to be the most rigorous, the acceptance rate for public institutions should be higher. We could think of this data as being two populations.
The Box Plots show a difference between public and private universities. There still is a lot of spread for each type of university: some private universities have high acceptance rates and some public universities have low acceptance rates. But we can see two distinct groups.
3. Answer the following questions about variability of data sets:
a. How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable.
The Variance is the average squared deviation around the center (in this case the center is the mean).
The standard deviation is the square root of the variance. It can be read as the typical deviation around the center (in this case the center is the mean), expressed in the original units of the variable.
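
To connect the verbal description with the arithmetic used in question 1, the sketch below computes the sample variance two ways: from the definition (squared deviations from the mean, summed and divided by n - 1) and from the shortcut formula based on the sum of X and the sum of X-squared. The function names are illustrative, and the male ages from question 1 serve as example data.

```python
import math

# Male ages from question 1 (n = 20).
male_ages = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38,
             45, 50, 48, 60, 50, 39, 55, 44, 32, 41]

def variance_by_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_by_shortcut(values):
    """Shortcut form: [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

by_definition = variance_by_definition(male_ages)
by_shortcut = variance_by_shortcut(male_ages)

print(round(by_definition, 2), round(by_shortcut, 2))
print(round(math.sqrt(by_definition), 2))  # standard deviation, in years
```

Both routes give the same variance. Taking the square root returns the measure to the original units (years), which is why the standard deviation is usually the easier of the two to interpret.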
b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable?
The range only uses two values, the maximum and the minimum, to calculate the range. It can be very sensitive to outliers. The inter-quartile range shows the range of the middle 50% of the values.
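
The sensitivity of the range to a single extreme value can be illustrated with a small experiment. In this sketch the female ages from question 1 are compared with a hypothetical copy in which the oldest winner is 92 instead of 62; that altered value is invented purely for illustration. Quartiles come from Python's `statistics.quantiles` with its default method, which is one of several quartile conventions, so other software may report slightly different quartiles.

```python
import statistics

def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Inter-quartile range, Q3 - Q1, using statistics.quantiles defaults."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Female ages from question 1 (n = 20).
female_ages = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29,
               61, 32, 33, 45, 29, 62, 22, 44, 54, 26]

# Hypothetical data: pretend the oldest winner had been 92 rather than 62.
stretched = [92 if age == 62 else age for age in female_ages]

for label, data in (("original", female_ages), ("stretched", stretched)):
    print(f"{label}: range = {value_range(data)}, IQR = {iqr(data)}")
```

Only the maximum changes between the two lists, so the range grows while the middle 50% of the values, and therefore the inter-quartile range, stays where it was.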
c. Can the standard deviation ever be larger than the variance? Explain.
In most cases the standard deviation is less than the variance since it is a square root of the variance. However, in the special case where the variance is between 0 and 1, the standard deviation will be more than the variance. For example, if s^2 = .5, then s = .71
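
A few trial values make the pattern easy to see. The snippet below is a minimal sketch; the helper name `sd_exceeds_variance` and the list of trial variances are chosen only for illustration.

```python
import math

def sd_exceeds_variance(variance):
    """True when the square root of the variance is larger than the variance."""
    return math.sqrt(variance) > variance

for variance in (0.25, 0.5, 1.0, 4.0, 68.62):
    sd = math.sqrt(variance)
    print(f"variance = {variance}, sd = {sd:.2f}, "
          f"sd larger? {sd_exceeds_variance(variance)}")
```

The square root of a number strictly between 0 and 1 is larger than the number itself, at exactly 1 the two are equal, and above 1 the square root is smaller.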
d. Can the variance ever be negative? Why or why not?
Since the variance is based on a squared measure, no, it cannot be negative.
e. Show the formula for the Coefficient of Variation and explain what it is and how it can be useful in comparing the variability of different variables.
CV = (s / |mean|) * 100
The ratio of the standard deviation to the absolute value of the mean, usually multiplied by 100. It expresses the standard deviation in relation to the mean. It makes it easier to compare the spread of different variables, even if they are measured on different metrics.
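
One way to see why the CV helps when variables are on different metrics is to express the same data in two units. In the sketch below the male ages from question 1 are converted from years to months; the conversion is an illustrative assumption, not part of the exercise, and `coefficient_of_variation` is a made-up helper name.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by |mean|, as a percentage."""
    return statistics.stdev(values) / abs(statistics.mean(values)) * 100

# Male ages from question 1, in years, and the same ages in months.
ages_years = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38,
              45, 50, 48, 60, 50, 39, 55, 44, 32, 41]
ages_months = [age * 12 for age in ages_years]

print(round(statistics.stdev(ages_years), 2),
      round(statistics.stdev(ages_months), 2))
print(round(coefficient_of_variation(ages_years), 2),
      round(coefficient_of_variation(ages_months), 2))
```

The standard deviation is 12 times larger in months than in years, but the CV is the same in both units because the mean scales by the same factor.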
4. Two banks use alternative methods of waiting in line for a teller. Both banks use three tellers. Bank
A uses separate lines for each teller so a customer must pick which line she or he thinks is best. This
approach does allow a customer to pick his/her favorite teller. In contrast, Bank B uses a single
waiting line which leads customers to the next available teller out of all tellers available.
We take a random sample of 15 customers from each bank and record the waiting time in minutes. We
are asked to analyze the data and determine the differences we note between the approaches of the
two banks. Use graphs and summary measures of central tendency and variability to explain the
differences. In the end, I am asking that you summarize your finding in words and not just numbers.
Here are the data. The data are given below (not sorted) and I provided the Sum(x) and the Sum(x^2):
                           Sum(x)   Sum(x^2)
Bank A (separate lines)     71.50     360.19
Bank B (single line)        70.00     330.68

Bank A: 5.3 2.5 5.9 4.1 5.4 3.8 5.1 4.1 4.1 5.0 5.1 5.7 3.0 5.3 7.1
Bank B: 5.0 3.8 4.9 4.3 5.0 4.7 3.9 5.4 5.1 4.0 4.1 4.5 5.1 5.4 4.8
a. Graph the two banks using stem and leaf plots. Describe the results of your graphs.

Stem and Leaf Plot of Waiting Time at Two Banks

Bank A                   Bank B
Stem | Leaf              Stem | Leaf
  2  | 5                   2  |
  3  | 08                  3  | 89
  4  | 111                 4  | 0135789
  5  | 01133479            5  | 001144
  6  |                     6  |
  7  | 1                   7  |
  8  |                     8  |
b. Calculate the following for each bank:

Mean
  Bank A: 71.5/15 = 4.77
  Bank B: 70.00/15 = 4.67
Median
  Bank A: N is odd, so (15+1)/2 = 8th observation = 5.1
  Bank B: N is odd, so (15+1)/2 = 8th observation = 4.8
Mode
  Bank A: 4.1, which occurs 3 times
  Bank B: No unique value
Variance
  Bank A: (360.19 - (71.50^2/15))/(15-1) = 1.38
  Bank B: (330.68 - (70.00^2/15))/(15-1) = .29
Std Deviation
  Bank A: SQRT(1.38) = 1.18
  Bank B: SQRT(.29) = .54
Maximum
  Bank A: 7.10
  Bank B: 5.40
Minimum
  Bank A: 2.50
  Bank B: 3.80
Range
  Bank A: 7.10 - 2.50 = 4.60
  Bank B: 5.40 - 3.80 = 1.60
Coefficient of Variation (computed from the unrounded standard deviation and mean)
  Bank A: 1.18/4.77*100 = 24.68
  Bank B: .54/4.67*100 = 11.47
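
The same summary measures can be checked from the raw waiting times with Python's standard `statistics` module. This is a sketch rather than the method used in the exercise; the `summarize` helper and its dictionary keys are assumptions made for illustration. It works with unrounded values throughout, so the CV may differ in the last digit from arithmetic done with rounded inputs.

```python
import statistics

# Waiting times in minutes, as listed in the data for question 4.
bank_a = [5.3, 2.5, 5.9, 4.1, 5.4, 3.8, 5.1, 4.1,
          4.1, 5.0, 5.1, 5.7, 3.0, 5.3, 7.1]
bank_b = [5.0, 3.8, 4.9, 4.3, 5.0, 4.7, 3.9, 5.4,
          5.1, 4.0, 4.1, 4.5, 5.1, 5.4, 4.8]

def summarize(times):
    """Central tendency and variability measures for one sample."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(times),
        "modes": statistics.multimode(times),  # more than one entry = no unique mode
        "variance": statistics.variance(times),
        "sd": sd,
        "range": max(times) - min(times),
        "cv": sd / abs(mean) * 100,
    }

for label, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(label)
    for name, value in summarize(times).items():
        shown = value if isinstance(value, list) else round(value, 2)
        print(f"  {name}: {shown}")
```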
c. Summarize your results in a paragraph
The measures of center for the two banks are close to each other, but the measures of spread are not. The
mean and median for Bank A are close, 4.77 and 5.1, respectively. Likewise the mean and median for
Bank B are close to each other and to those of Bank A, at 4.67 and 4.8, respectively. However, the spread for Bank
A is much larger. The Variances are 1.38 and .29, respectively. This can also be seen in the Coefficients
of Variation, with a value of 24.68 for Bank A and 11.47 for Bank B. Allowing customers to pick their
line results in more variability in waiting time compared with a single line.