1/30/2015
STAT 250
Dr. Kari Lock Morgan
Describing Data: One Quantitative Variable
SECTIONS 2.2, 2.3
• One quantitative variable (2.2, 2.3)
Statistics: Unlocking the Power of Data
Lock5

The Big Picture
[Diagram: Population, Sample, Sampling, Statistical Inference, Descriptive Statistics]
Descriptive Statistics
• In order to make sense of data, we need ways to summarize and visualize it
• Summarizing and visualizing variables and relationships between two variables is often known as descriptive statistics (also known as exploratory data analysis)
• The type of summary statistics and visualization methods depends on the type of variable(s) being analyzed (categorical or quantitative)
• Today: One quantitative variable

Obesity Trends* Among U.S. Adults
BRFSS, 1990, 2000, 2010
(*BMI ≥ 30, or about 30 lbs. overweight for 5’4” person)
[Maps: 1990, 2000, 2010. Legend: No Data, <10%, 10%–14%, 15%–19%, 20%–24%, 25%–29%, ≥30%]
Source: Behavioral Risk Factor Surveillance System, CDC.

Obesity in America
• Obesity is a HUGE problem in America
• We’ll explore this with two different types of data, both collected by the CDC:
  • Proportion of adults who are obese in each state
  • BMI for a random sample of Americans

Behavioral Risk Factor Surveillance System
http://www.cdc.gov/obesity/data/table-adults.html
Obesity by State

Dotplot
• In a dotplot, each case is represented by a dot and dots are stacked.
• Easy way to see each case
Minitab: Graph -> Dotplot -> One Y -> Simple

Histogram
• The height of each bar corresponds to the number of cases within that range of the variable
• Example from the plot: 5 states with obesity rate between 33.25 and 33.75
Minitab: Graph -> Histogram -> Simple

Shape
[Figures: Symmetric, Right-Skewed (long right tail), Left-Skewed]
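
The bar heights in a histogram are counts of cases per range. The following is a minimal Python sketch of that counting step. The rates are made-up illustrative values (not the CDC state data), and `bin_counts` is a hypothetical helper name chosen for this example.

```python
from collections import Counter
import math

def bin_counts(values, start, width):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width).

    Empty bins are omitted from the result.
    """
    counts = Counter()
    for v in values:
        k = math.floor((v - start) / width)  # index of the bin containing v
        counts[k] += 1
    # Return the non-empty bins in order as (left_edge, right_edge, count)
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in sorted(counts)]

# Hypothetical obesity rates (percent), NOT the actual state data
rates = [24.1, 24.3, 26.0, 26.2, 26.4, 28.9, 33.3, 33.4]

bins = bin_counts(rates, start=24.0, width=0.5)
for left, right, count in bins:
    # One '*' per case: the bar height is the number of cases in the range
    print(f"{left:5.2f} to {right:5.2f}: {'*' * count}")
```

A plotting tool would normally draw the bars; the text output here is only meant to show what each bar height represents.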

National Health and Nutrition Examination Survey

BMI of Americans
The distribution of BMI for American adults is
a) Symmetric
b) Left-skewed
c) Right-skewed

Notation
• The sample size, the number of cases in the sample, is denoted by $n$
• We often let $x$ or $y$ stand for any variable, and $x_1, x_2, \ldots, x_n$ represent the $n$ values of the variable $x$
• $x_1 = 32.4$, $x_2 = 28.4$, $x_3 = 26.8$, …

Mean
The mean or average of the data values is
$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
• Sample mean: $\bar{x}$
• Population mean: $\mu$ (“mu”)
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Mean
• The average obesity rate across the 50 states is $\mu = 28.606$.
• The average BMI for Americans in this sample is $\bar{x} = 24.887$.
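
As a small illustration of the mean (add up the values, divide by the number of cases $n$), here is a Python sketch. It uses only the first three BMI values listed on the Notation slide; the rest of the sample is not shown in these slides, so the result is not the sample mean of 24.887. The function name `mean` is just a local helper for this example.

```python
def mean(values):
    """Sum the values and divide by the number of cases n."""
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / n

# x1, x2, x3 from the Notation slide; the remaining values are not shown,
# so this is NOT the mean of the full BMI sample.
first_three = [32.4, 28.4, 26.8]
print(round(mean(first_three), 3))
```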

Median
The median, $m$, is the middle value when the data are ordered. If there are an even number of values, the median is the average of the two middle values.
• The median splits the data in half.
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Measures of Center
• For symmetric distributions, the mean and the median will be about the same
• For skewed distributions, the mean will be more pulled towards the direction of skewness
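
The median (the middle value of the ordered data, or the average of the two middle values when the number of values is even) can be sketched in Python as follows. The example values are hypothetical and chosen only to show the odd and even cases.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

# Hypothetical values, one odd-length and one even-length dataset
odd_case = median([7, 1, 5])
even_case = median([7, 1, 5, 3])
print(odd_case, even_case)
```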

Measures of Center
[Figure annotations: $m = 24.163$, $\bar{x} = 24.887$]
Mean is “pulled” in the direction of skewness

Skewness and Center
A distribution is left-skewed. Which measure of center would you expect to be higher?
a) Mean
b) Median
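
Because the mean is pulled in the direction of skewness, comparing the mean with the median gives a rough hint about shape. The sketch below encodes that rule of thumb in Python. `suggested_skew` and its `tolerance` parameter are hypothetical names introduced for illustration, and the rule is only a heuristic that a plot should confirm.

```python
def suggested_skew(mean, median, tolerance=0.0):
    """Rule-of-thumb guess at skew direction from the mean and median.

    This is a heuristic only; a histogram or dotplot should confirm it.
    """
    if mean - median > tolerance:
        return "right-skewed"   # mean pulled toward a long right tail
    if median - mean > tolerance:
        return "left-skewed"    # mean pulled toward a long left tail
    return "roughly symmetric"

# Mean and median reported on the slides for the BMI sample
print(suggested_skew(mean=24.887, median=24.163))
```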

Outliers
More info here

Outlier
An outlier is an observed value that is notably distinct from the other values in a dataset.

Resistance
A statistic is resistant if it is relatively unaffected by extreme values.
• The median is resistant while the mean is not.

          With Outlier   Without Outlier
Mean      105.22         102.56
Median    101.0          100.5

Outliers
• When using statistics that are not resistant to outliers, stop and think about whether the outlier is a mistake
• If not, you have to decide whether the outlier is part of your population of interest or not
• Usually, for outliers that are not a mistake, it’s best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) are affecting the results
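
Running an analysis twice, once with and once without an outlier, might look like the following Python sketch. The data are hypothetical (they are not the values behind the slide's table) and are chosen so that one value is clearly extreme.

```python
from statistics import mean, median

# Hypothetical values, NOT the data behind the slide's table
without_outlier = [98, 99, 100, 101, 102, 103]
with_outlier = without_outlier + [180]   # same data plus one extreme value

for label, data in [("With outlier", with_outlier),
                    ("Without outlier", without_outlier)]:
    print(f"{label:16s} mean = {mean(data):7.2f}  median = {median(data):6.1f}")

# How much each statistic moves when the outlier is included
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)
print(f"mean shift = {mean_shift:.2f}, median shift = {median_shift:.2f}")
```

In this made-up example the mean moves far more than the median, which is what resistance to outliers means in practice.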

Standard Deviation
The standard deviation for a quantitative variable measures the spread of the data
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
• Sample standard deviation: $s$
• Population standard deviation: $\sigma$ (“sigma”)
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Standard Deviation
• The standard deviation gives a rough estimate of the typical distance of a data value from the mean
• The larger the standard deviation, the more variability there is in the data and the more spread out the data are
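
The sample standard deviation is the square root of the sum of squared deviations from the sample mean, divided by $n - 1$. A minimal Python sketch of that calculation follows. The data are hypothetical, and `sample_sd` is a helper name introduced only for this example.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    x_bar = sum(values) / n                                   # sample mean
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data, chosen so the mean is a whole number (5)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(data), 3))
```

Python's standard library offers the same calculation as `statistics.stdev`, which can serve as a cross-check.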

Standard Deviation
[Figure: two histograms, vertical axis Frequency, horizontal axis from -15 to 15; one labeled s = 1, the other s = 4]
Both of these distributions are bell-shaped

95% Rule
If a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean.

The 95% Rule
• For a population, 95% of the data will be between $\mu - 2\sigma$ and $\mu + 2\sigma$
• For a sample, 95% of the data will be between $\bar{x} - 2s$ and $\bar{x} + 2s$

95% Rule
Give an interval that will likely contain 95% of obesity rates of states.
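
An interval of the form mean plus or minus two standard deviations can be computed as in the Python sketch below. The standard deviation of the state obesity rates does not appear in this transcript, so the example uses hypothetical, roughly bell-shaped data instead. `interval_95` and `fraction_inside` are helper names introduced for illustration.

```python
from statistics import mean, stdev

def interval_95(values):
    """Return (x_bar - 2s, x_bar + 2s) using the sample mean and sample SD."""
    x_bar = mean(values)
    s = stdev(values)
    return x_bar - 2 * s, x_bar + 2 * s

def fraction_inside(values, low, high):
    """Fraction of the values that fall within [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)

# Hypothetical, roughly symmetric and bell-shaped data (NOT the state rates)
data = [22, 24, 25, 26, 26, 27, 27, 27, 28, 28,
        28, 28, 29, 29, 29, 30, 30, 31, 32, 34]

low, high = interval_95(data)
inside = fraction_inside(data, low, high)
print(f"interval: ({low:.2f}, {high:.2f}), fraction inside: {inside:.2f}")
```

With a dataset this small the fraction inside the interval will not be exactly 95%; the rule is an approximation, and it applies only when the distribution is approximately symmetric and bell-shaped.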

95% Rule
Could we use the same method to get an interval that will contain 95% of BMIs of American adults?
a) Yes
b) No

The 95% Rule
[Figure: two histograms, vertical axis Frequency; one labeled s = 1 (horizontal axis from -3 to 3), the other s = 4 (horizontal axis from -15 to 15)]
• StatKey

The 95% Rule
The standard deviation for hours of sleep per night is closest to
a) ½
b) 1
c) 2
d) 4
e) I have no idea

To Do
• Read Sections 2.2 and 2.3
• Do Homework 2.2 (due Friday, 2/6)