Download 12.2.1 Transforming with Powers and Roots When you visit a pizza

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Choice modelling wikipedia , lookup

Data assimilation wikipedia , lookup

Regression analysis wikipedia , lookup

Time series wikipedia , lookup

Linear regression wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
12.2.1TransformingwithPowersandRoots
Whenyouvisitapizzaparlor,youorderapizzabyitsdiameter—say,10inches,12inches,or14inches.Buttheamountyougetto
eatdependsontheareaofthepizza.Theareaofacircleisπtimesthesquareofitsradiusr.Sotheareaofaroundpizzawith
diameterxis
p
Thisisapowermodeloftheformy=ax witha=π/4andp=2
p
Althoughapowermodeloftheformy=ax describestherelationshipbetweenxandyineachofthesesettings,thereis
p
alinearrelationshipbetweenx andy.Ifwetransformthevaluesoftheexplanatoryvariablexbyraisingthemtotheppower,and
p
graphthepoints(x ,y),thescatterplotshouldhavealinearform.Thefollowingexampleshowswhatwemean.
Example–GoFish
TransformingwithPowers
ImaginethatyouhavebeenputinchargeoforganizingafishingtournamentinwhichprizeswillbegivenfortheheaviestAtlantic
Oceanrockfishcaught.Youknowthatmanyofthefishcaughtduringthetournamentwillbemeasuredandreleased.Youarealso
awarethatusingdelicatescalestotrytoweighafishthatisfloppingaroundinamovingboatwillprobablynotyieldveryaccurate
results.Itwouldbemucheasiertomeasurethelengthofthefishwhileontheboat.Whatyouneedisawaytoconvertthelengthof
thefishtoitsweight.
Youcontactthenearbymarineresearchlaboratory,andtheyprovidereferencedataonthelength(incentimeters)andweight(in
grams)forAtlanticOceanrockfishofseveralsizes
Thefigurebelow(left)isascatterplotofthedata;notetheclearcurvedshape.Becauselengthisone-dimensionalandweight(like
3
volume)isthree-dimensional,apowermodeloftheformweight=a(length) shoulddescribetherelationship.Whathappensifwe
3
cubethelengthsinthedatatableandthengraphweightversuslength ?Thefigurebelow(right)givesustheanswer.This
transformationoftheexplanatoryvariablehelpsusproduceagraphthatisquitelinear.
There’sanotherwaytotransformthedataintheaboveexampletoachievelinearity.
Wecantakethecuberootoftheweightvaluesandgraph
versuslength.Thefigurebelowshowsthattheresulting
3
scatterplothasalinearform.Whydoesthistransformationwork?Startwithweight=a(length) andtakethecuberootofbothsides
oftheequation:
Thatis,thereisalinearrelationshipbetweenlengthand
.
Oncewestraightenoutthecurvedpatternintheoriginalscatterplot,wefitaleast-squareslinetothetransformeddata.Thislinear
2
modelcanbeusedtopredictvaluesoftheresponsevariabley.Theresidualsandther -valuetellushowwelltheregressionlinefits
thedata.
Example–GoFish
HereisMinitaboutputfromseparateregressionanalysesofthetwosetsoftransformedAtlanticOceanrockfishdata.
PROBLEM:Doeachofthefollowingforbothtransformations.
(a)Givetheequationoftheleast-squaresregressionline.Defineanyvariablesyouuse.
Transformation1:
Transformation2:
(b)SupposeacontestantinthefishingtournamentcatchesanAtlanticOceanrockfishthat’s36centimeterslong.Usethemodel
frompart(a)topredictthefish’sweight.Showyourwork.
Transformation1:
Transformation2:
grams
grams
(c)Interpretthevalueofsincontext.
ForTransformation1,thestandarddeviationoftheresidualsiss=18.841grams.Predictionsoffishweightusingthismodelwill
beoffbyanaverageofabout19grams.ForTransformation2,s=0.12.Thatis,predictionsofthecuberootoffishweightusing
thismodelwillbeoffbyanaverageofabout0.12.
Whenexperienceortheorysuggeststhattherelationshipbetweentwovariablesisdescribedbyapowermodelofthe
p
formy=ax ,younowhavetwostrategiesfortransformingthedatatoachievelinearity.
p
• 1.Raisethevaluesoftheexplanatoryvariablextotheppowerandplotthepoints(x ,y).
• 2.Takethepthrootofthevaluesoftheresponsevariableyandplotthepoints
.
Whatifyouhavenoideawhatpowertochoose?Youcouldguessandtestuntilyoufindatransformationthatworks.Some
technologycomeswithbuilt-inslidersthatallowyoutodynamicallyadjustthepowerandwatchthescatterplotchangeshapeas
youdo.
Itturnsoutthatthereisamuchmoreefficientmethodforlinearizingacurvedpatterninascatterplot.Insteadoftransformingwith
powersandroots,weuselogarithms.Thismoregeneralmethodworkswhenthedatafollowanunknownpowermodeloranyof
severalothercommonmathematicalmodels.
12.2.2TransformingwithLogarithms
Therelationshipbetweenlifeexpectancyandincomeperpersonisdescribedbyalogarithmicmodeloftheformy=a+blogx.We
canusethismodeltopredicthowlongacountry’scitizenswilllivefromhowmuchmoneytheymake.FortheUnitedStates,which
hasincomeperpersonof$41,256.10,
predictedlifeexpectancy=11.7+15.04log(41,256.1)=81.117years
TheactualU.S.lifeexpectancyin2009was79.43years.
Takingthelogarithmoftheincomeperpersonvaluesstraightenedoutthe
curvedpatternintheoriginalscatterplot.Thelogarithmtransformationcan
alsohelpachievelinearitywhentherelationshipbetweentwovariablesis
describedbyapowermodeloranexponentialmodel.
ExponentialModelsAlinearmodelhastheformy=a+bx.Thevalue
ofyincreases(ordecreases)ataconstantrateasxincreases.The
slopebdescribestheconstantrateofchangeofalinearmodel.Thatis,foreach
1unitincreaseinx,themodelpredictsanincreaseofbunitsiny.Youcanthink
ofalinearmodelasdescribingtherepeatedadditionofaconstant
amount.Sometimestherelationshipbetweenyandxisbasedon
repeatedmultiplicationbyaconstantfactor.Thatis,eachtimexincreasesby1
unit,thevalueofyismultipliedbyb.ExponentialmodelAnexponential
x
modeloftheformy=ab describessuchmultiplicativegrowth.
Populationsoflivingthingstendtogrowexponentiallyifnotrestrainedbyoutsidelimitssuchaslackoffoodorspace.More
pleasantly(unlesswe’retalkingaboutcreditcarddebt!),moneyalsodisplaysexponentialgrowthwheninterestiscompoundedeach
timeperiod.Compoundingmeansthatlastperiod’sincomeearnsincomeinthenextperiod.
Example–Money,Money,Money
Understandingexponentialgrowth
Supposethatyouinvest$100inasavingsaccountthatpays6%interestcompoundedannually.Afterayear,youwillhaveearned
$100(0.06)=$6.00ininterest.Yournewaccountbalanceistheinitialdepositplustheinterestearned:$100+($100)(0.06),or
$106.Wecanrewritethisas$100(1+0.06),ormoresimplyas$100(1.06).Thatis,6%annualinterestmeansthatanyamounton
depositfortheentireyearismultipliedby1.06.Ifyouleavethemoneyinvestedforasecondyear,yournewbalancewillbe
2
[$100(1.06)](1.06)=$100(1.06) =$112.36.Noticethatyouearn$6.36ininterestduringthesecondyear.That’sanother$6in
interestfromyourinitial$100depositplustheinterestonyour$6interestearnedforYear1.Afterxyears,youraccountbalanceyis
x
givenbytheexponentialmodely=100(1.06) .
Theaccompanyingtableshowsthebalanceinyoursavingsaccountattheendofeachofthefirstsixyears.Thefigureshowsthe
growthinyourinvestmentover100years.Itischaracteristicofexponentialgrowththattheincreaseappearsslowforalongperiod
andthenseemstoexplode.
x
Ifanexponentialmodeloftheformy=ab describestherelationshipbetweenxandy,wecanuselogarithmstotransformthedata
toproducealinearrelationship.Startbytakingthelogarithm(we’llusebase10,butthenaturallogarithmlnusingbaseewould
workjustaswell).Thenusealgebraicpropertiesoflogarithmstosimplifytheresultingexpressions.Herearethedetails:
Wecanthenrearrangethefinalequationaslogy=loga+(logb)x.Noticethatlogaandlogbareconstantsbecauseaandbare
constants.Sotheequationgivesalinearmodelrelatingtheexplanatoryvariablextothetransformedvariablelogy.Thus,ifthe
relationshipbetweentwovariablesfollowsanexponentialmodel,andweplotthelogarithm(base10orbasee)ofyagainstx,we
shouldobserveastraight-linepatterninthetransformeddata.
Ifwefitaleast-squaresregressionlinetothetransformeddata,wecanfindthepredictedvalueofthelogarithmofyforanyvalueof
theexplanatoryvariablexbysubstitutingourx-valueintotheequationoftheline.Toobtainthecorrespondingpredictionforthe
responsevariabley,wehaveto“undo”thelogarithmtransformationtoreturntotheoriginalunitsofmeasurement.Onewayof
doingthisistousethedefinitionofalogarithmasanexponent:
Forinstance,ifwehavelogy=2,then
Ifinsteadwehavelny=2,then
Example–Moore’sLawandComputerChips
Logarithmtransformationsandexponentialmodels
GordonMoore,oneofthefoundersofIntelCorporation,predictedin1965thatthenumberoftransistorsonanintegratedcircuit
chipwoulddoubleevery18months.ThisisMoore’slaw,onewaytomeasuretherevolutionincomputing.Herearedataonthe
datesandnumberoftransistorsforIntelmicroprocessors:
(a)Ascatterplotofthenaturallogarithm(logbaseeorln)ofthenumberoftransistorsonacomputerchipversusyearssince
1970isshown.Basedonthisgraph,explainwhyitwouldbereasonabletouseanexponentialmodeltodescribetherelationship
betweennumberoftransistorsandyearssince1970.
Ifanexponentialmodeldescribestherelationshipbetweentwovariablesx
andy,thenweexpectascatterplotof(x,lny)toberoughlylinear.The
scatterplotofln(transistors)versusyearssince1970hasafairlylinear
pattern,especiallythroughtheyear2000.Soanexponentialmodelseems
reasonablehere.
(b)Minitaboutputfromalinearregressionanalysisonthetransformeddataisshownbelow.Givetheequationoftheleastsquaresregressionline.Besuretodefineanyvariablesyouuse.
(c)Useyourmodelfrompart(b)topredictthenumberoftransistorsonanIntelcomputerchipin2020.Showyourwork.
Because2020is50yearssince1970,wehave
(d)Aresidualplotforthelinearregressioninpart(b)isshownbelow.Discusswhatthisgraphtellsyouabouttheappropriateness
ofthemodel.
Theresidualplotshowsadistinctpattern,withtheresidualsgoingfrom
positivetonegativetopositiveaswemovefromlefttoright.Buttheresiduals
aresmallinsizerelativetothetransformedy-values.Also,thescatterplotof
thetransformeddataismuchmorelinearthantheoriginalscatterplot.We
feelreasonablycomfortableusingthismodeltomakepredictionsaboutthe
numberoftransistorsonacomputerchip.
Make sure that you understand the big idea here. The necessary transformation is
carried out by taking the logarithm of the response variable. Your calculator and most
statistical software will calculate the logarithms of all the values of a variable with a
single command. The crucial property of the logarithm for our purposes is that if a
variable grows exponentially, its logarithm grows linearly.
CHECKYOURUNDERSTANDING
The following table gives the resident population of the United States from 1790 to 1880, in millions of people:
The graph at the left below is a scatterplot of these data using years since 1700 as the explanatory variable. A plot of
the natural logarithms of the population values against years since 1700 is shown at the right below.
1. Explain why it would be reasonable to use an exponential model to describe the relationship between the U.S.
population in the years 1790 to 1880 and the number of years since 1700.
Here is some Minitab output from a linear regression analysis on the transformed data.
2. Give the equation of the least-squares regression line. Define any variables you use.
3. Use this linear model to predict the U.S. population in 1890. Show your work.
4. Based on the residual plot, do you expect your prediction to be too high or too low? Justify your answer.
PowerModelsAgainBiologistshavefoundthatmanycharacteristicsoflivingthingsaredescribedquitecloselybypower
models.Therearemoremicethanelephants,andmorefliesthanmice—theabundanceofspeciesfollowsapowermodelwithbody
weightastheexplanatoryvariable.Sodopulserate,lengthoflife,thenumberofeggsabirdlays,andsoon.
Sometimesthepowerscanbepredictedfromgeometry,butsometimestheyaremysterious.Why,forexample,doestherateat
whichanimalsuseenergygoupasthe3/4poweroftheirbodyweight?BiologistscallthisrelationshipKleiber’slaw.Ithasbeen
foundtoworkallthewayfrombacteriatowhales.Thesearchgoesonforsomephysicalorgeometricalexplanationforwhylife
followspowerlaws.
Whenweapplythelogarithmtransformationtotheresponsevariableyinanexponentialmodel,weproducealinear
relationship.Toachievelinearityfromapowermodel,weapplythelogarithmtransformationtobothvariables.Herearethedetails:
p
1.Apowermodelhastheformy=ax ,whereaandpareconstants.
2.Takethelogarithmofbothsidesofthisequation.Usingpropertiesoflogarithms,
p
p
logy=log(ax )=loga+log(x )=loga+plogx
Theequationlogy=loga+plogxshowsthattakingthelogarithmofbothvariablesresultsinalinearrelationshipbetween
logxandlogy.
3.Lookcarefully:thepowerpinthepowermodelbecomestheslopeofthestraightlinethatlinkslogytologx.
Ifapowermodeldescribestherelationshipbetweentwovariables,ascatterplotofthelogarithmsofbothvariablesshouldproduce
alinearpattern.Thenwecanfitaleast-squaresregressionlinetothetransformeddataandusethelinearmodeltomake
predictions.
Example–What’saPlanetAnyway?
Powermodelsandlogarithmtransformations
OnJuly31,2005,ateamofastronomersannouncedthattheyhaddiscoveredwhatappearedtobeanewplanetinoursolar
system.TheyhadfirstobservedthisobjectalmosttwoyearsearlierusingatelescopeatCaltech’sPalomarObservatoryinCalifornia.
OriginallynamedUB313,thepotentialplanetisbiggerthanPlutoandhasanaveragedistanceofabout9.5billionmilesfromthe
sun.(Forreference,Earthisabout93millionmilesfromthesun.)Couldthisnewastronomicalbody,nowcalledEris,beanew
planet?
Atthetimeofthediscovery,therewerenineknownplanetsinoursolarsystem.Herearedataonthedistancefromthesunand
periodofrevolutionofthoseplanets.Notethatdistanceismeasuredinastronomicalunits(AU),thenumberofearthdistancesthe
objectisfromthesun.
Thegraphsbelowshowtheresultsoftwodifferenttransformationsofthedata.Figure(a)plotsthenaturallogarithmofperiod
againstdistancefromthesunforall9planets.Figure(b)plotsthenaturallogarithmofperiodagainstthenaturallogarithmof
distancefromthesunforthe9planets.
(a)Explainwhyapowermodelwouldprovideamoreappropriatedescriptionoftherelationshipbetweenperiodofrevolution
anddistancefromthesunthananexponentialmodel.
Thescatterplotofln(period)versusdistanceisclearlycurved,soanexponentialmodelwouldnotbeappropriate.However,the
graphofln(period)versusln(distance)hasastronglinearpattern,indicatingthatapowermodelwouldbemoreappropriate.
(b)Minitaboutputfromalinearregressionanalysisonthetransformeddatainfigureb(above)isshownbelow.Givethe
equationoftheleast-squaresregressionline.Besuretodefineanyvariablesyouuse.
(c)Useyourmodelfrompart(b)topredicttheperiodofrevolutionforEris,whichis9,500,000,000/93,000,000=102.15AUfrom
thesun.Showyourwork.
Eris’saveragedistancefromthesunis102.15AU.Usingthisvaluefordistanceinourmodelfrompart(b)gives
Topredicttheperiod,wehavetoundothelogarithmtransformation:
Wewouldn’twanttowaitforEristomakeafullrevolutiontoseeifourpredictionisaccurate!
(d)Aresidualplotforthelinearregressioninpart(b)isshownbelow.Doyouexpectyourpredictioninpart(c)tobetoohigh,too
low,orjustright?Justifyyouranswer.
(d)Eris’svalueforln(distance)is6.939,whichwouldfallatthefarrightofthe
residualplot,wherealltheresidualsarepositive.Becauseresidual=actualy−
predictedyseemslikelytobepositive,wewouldexpectourpredictiontobetoo
low.