About This eBook
ePUB is an open, industry-standard format for eBooks. However, support of ePUB and
its many features varies across reading devices and applications. Use your device or app
settings to customize the presentation to your liking. Settings that you can customize often
include font, font size, single or double column, landscape or portrait mode, and figures
that you can click or tap to enlarge. For additional information about the settings and
features on your reading device or app, visit the device manufacturer’s Web site.
Many titles include programming code or configuration examples. To optimize the
presentation of these elements, view the eBook in single-column, landscape mode and
adjust the font size to the smallest setting. In addition to presenting code and
configurations in the reflowable text format, we have included images of the code that
mimic the presentation found in the print book; therefore, where the reflowable format
may compromise the presentation of the code listing, you will see a “Click here to view
code image” link. Click the link to view the print-fidelity code image. To return to the
previous page viewed, click the Back button on your device or app.
Effective Python
59 SPECIFIC WAYS TO WRITE BETTER PYTHON
Brett Slatkin
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and the
publisher was aware of a trademark claim, the designations have been printed with initial
capital letters or in all capitals.
The author and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or
omissions. No liability is assumed for incidental or consequential damages in connection
with or arising out of the use of the information or programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities
(which may include electronic versions; custom cover designs; and content particular to
your business, training goals, marketing focus, or branding interests), please contact our
[email protected] (800) 382-3419.
For government sales inquiries, [email protected].
For questions about sales outside the United States, please contact
[email protected].
Visit us on the Web: informit.com/aw
Library of Congress Cataloging-in-Publication Data
Slatkin, Brett, author.
Effective Python: 59 specific ways to write better Python / Brett Slatkin.
pages cm
Includes index.
ISBN 978-0-13-403428-7 (pbk.: alk. paper) -- ISBN 0-13-403428-7 (pbk.: alk. paper)
1. Python (Computer program language) 2. Computer programming. I. Title.
QA76.73.P98S57 2015
005.13’3 -- dc23
2014048305
Copyright © 2015 Pearson Education, Inc.
All rights reserved. Printed in the United States of America. This publication is protected
by copyright, and permission must be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means,
electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use
material from this work, please submit a written request to Pearson Education, Inc.,
Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you
may fax your request to (201) 236-3290.
ISBN-13: 978-0-13-403428-7
ISBN-10: 0-13-403428-7
Text printed in the United States on recycled paper at RR Donnelley in
Crawfordsville, Indiana.
First printing, March 2015
Editor-in-Chief
Mark L. Taub
Senior Acquisitions Editor
Trina MacDonald
Managing Editor
John Fuller
Full-Service Production Manager
Julie B. Nahil
Copy Editor
Stephanie Geels
Indexer
Jack Lewis
Proofreader
Melissa Panagos
Technical Reviewers
Brett Cannon
Tavis Rudd
Mike Taylor
Editorial Assistant
Olivia Basegio
Cover Designer
Chuti Prasertsith
Compositor
LaurelTech
Praise for Effective Python
“Each item in Slatkin’s Effective Python teaches a self-contained lesson with its
own source code. This makes the book random-access: Items are easy to browse
and study in whatever order the reader needs. I will be recommending Effective
Python to students as an admirably compact source of mainstream advice on a
very broad range of topics for the intermediate Python programmer.”
- Brandon Rhodes, software engineer at Dropbox and chair of PyCon 2016-2017
“I’ve been programming in Python for years and thought I knew it pretty well.
Thanks to this treasure trove of tips and techniques, I realize there’s so much more
I could be doing with my Python code to make it faster (e.g., using built-in data
structures), easier to read (e.g., enforcing keyword-only arguments), and much
more Pythonic (e.g., using zip to iterate over lists in parallel).”
- Pamela Fox, educationeer, Khan Academy
“If I had this book when I first switched from Java to Python, it would have saved
me many months of repeated code rewrites, which happened each time I realized I
was doing particular things ‘non-Pythonically.’ This book collects the vast
majority of basic Python ‘must-knows’ into one place, eliminating the need to
stumble upon them one-by-one over the course of months or years. The scope of
the book is impressive, starting with the importance of PEP 8 as well as that of
major Python idioms, then reaching through function, method and class design,
effective standard library use, quality API design, testing, and performance
measurement - this book really has it all. A fantastic introduction to what it really
means to be a Python programmer for both the novice and the experienced
developer.”
- Mike Bayer, creator of SQLAlchemy
“Effective Python will take your Python skills to the next level with clear
guidelines for improving Python code style and function.”
- Leah Culver, developer advocate, Dropbox
“This book is an exceptionally great resource for seasoned developers in other
languages who are looking to quickly pick up Python and move beyond the basic
language constructs into more Pythonic code. The organization of the book is
clear, concise, and easy to digest, and each item and chapter can stand on its own
as a meditation on a particular topic. The book covers the breadth of language
constructs in pure Python without confusing the reader with the complexities of
the broader Python ecosystem. For more seasoned developers the book provides
in-depth examples of language constructs they may not have previously
encountered, and provides examples of less commonly used language features. It
is clear that the author is exceptionally facile with Python, and he uses his
professional experience to alert the reader to common subtle bugs and common
failure modes. Furthermore, the book does an excellent job of pointing out
subtleties between Python 2.X and Python 3.X and could serve as a refresher
course as one transitions between variants of Python.”
- Katherine Scott, software lead, Tempo Automation
“This is a great book for both novice and experienced programmers. The code
examples and explanations are well thought out and explained concisely and
thoroughly.”
- C. Titus Brown, associate professor, UC Davis
“This is an immensely useful resource for advanced Python usage and building
cleaner, more maintainable software. Anyone looking to take their Python skills to
the next level would benefit from putting the book’s advice into practice.”
- Wes McKinney, creator of pandas; author of Python for Data Analysis; and
software engineer at Cloudera
To our family, loved and lost
Contents
Preface
Acknowledgments
About the Author
Chapter 1: Pythonic Thinking
Item 1: Know Which Version of Python You’re Using
Item 2: Follow the PEP 8 Style Guide
Item 3: Know the Differences Between bytes, str, and unicode
Item 4: Write Helper Functions Instead of Complex Expressions
Item 5: Know How to Slice Sequences
Item 6: Avoid Using start, end, and stride in a Single Slice
Item 7: Use List Comprehensions Instead of map and filter
Item 8: Avoid More Than Two Expressions in List Comprehensions
Item 9: Consider Generator Expressions for Large Comprehensions
Item 10: Prefer enumerate Over range
Item 11: Use zip to Process Iterators in Parallel
Item 12: Avoid else Blocks After for and while Loops
Item 13: Take Advantage of Each Block in try/except/else/finally
Chapter 2: Functions
Item 14: Prefer Exceptions to Returning None
Item 15: Know How Closures Interact with Variable Scope
Item 16: Consider Generators Instead of Returning Lists
Item 17: Be Defensive When Iterating Over Arguments
Item 18: Reduce Visual Noise with Variable Positional Arguments
Item 19: Provide Optional Behavior with Keyword Arguments
Item 20: Use None and Docstrings to Specify Dynamic Default Arguments
Item 21: Enforce Clarity with Keyword-Only Arguments
Chapter 3: Classes and Inheritance
Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples
Item 23: Accept Functions for Simple Interfaces Instead of Classes
Item 24: Use @classmethod Polymorphism to Construct Objects Generically
Item 25: Initialize Parent Classes with super
Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes
Item 27: Prefer Public Attributes Over Private Ones
Item 28: Inherit from collections.abc for Custom Container Types
Chapter 4: Metaclasses and Attributes
Item 29: Use Plain Attributes Instead of Get and Set Methods
Item 30: Consider @property Instead of Refactoring Attributes
Item 31: Use Descriptors for Reusable @property Methods
Item 32: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes
Item 33: Validate Subclasses with Metaclasses
Item 34: Register Class Existence with Metaclasses
Item 35: Annotate Class Attributes with Metaclasses
Chapter 5: Concurrency and Parallelism
Item 36: Use subprocess to Manage Child Processes
Item 37: Use Threads for Blocking I/O, Avoid for Parallelism
Item 38: Use Lock to Prevent Data Races in Threads
Item 39: Use Queue to Coordinate Work Between Threads
Item 40: Consider Coroutines to Run Many Functions Concurrently
Item 41: Consider concurrent.futures for True Parallelism
Chapter 6: Built-in Modules
Item 42: Define Function Decorators with functools.wraps
Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior
Item 44: Make pickle Reliable with copyreg
Item 45: Use datetime Instead of time for Local Clocks
Item 46: Use Built-in Algorithms and Data Structures
Item 47: Use decimal When Precision Is Paramount
Item 48: Know Where to Find Community-Built Modules
Chapter 7: Collaboration
Item 49: Write Docstrings for Every Function, Class, and Module
Item 50: Use Packages to Organize Modules and Provide Stable APIs
Item 51: Define a Root Exception to Insulate Callers from APIs
Item 52: Know How to Break Circular Dependencies
Item 53: Use Virtual Environments for Isolated and Reproducible Dependencies
Chapter 8: Production
Item 54: Consider Module-Scoped Code to Configure Deployment Environments
Item 55: Use repr Strings for Debugging Output
Item 56: Test Everything with unittest
Item 57: Consider Interactive Debugging with pdb
Item 58: Profile Before Optimizing
Item 59: Use tracemalloc to Understand Memory Usage and Leaks
Index
Preface
The Python programming language has unique strengths and charms that can be hard to
grasp. Many programmers familiar with other languages often approach Python from a
limited mindset instead of embracing its full expressivity. Some programmers go too far in
the other direction, overusing Python features that can cause big problems later.
This book provides insight into the Pythonic way of writing programs: the best way to use
Python. It builds on a fundamental understanding of the language that I assume you
already have. Novice programmers will learn the best practices of Python’s capabilities.
Experienced programmers will learn how to embrace the strangeness of a new tool with
confidence.
My goal is to prepare you to make a big impact with Python.
What This Book Covers
Each chapter in this book contains a broad but related set of items. Feel free to jump
between items and follow your interest. Each item contains concise and specific guidance
explaining how you can write Python programs more effectively. Items include advice on
what to do, what to avoid, how to strike the right balance, and why this is the best choice.
The items in this book are for Python 3 and Python 2 programmers alike (see Item 1:
“Know Which Version of Python You’re Using”). Programmers using alternative runtimes
like Jython, IronPython, or PyPy should also find the majority of items to be applicable.
Chapter 1: Pythonic Thinking
The Python community has come to use the adjective Pythonic to describe code that
follows a particular style. The idioms of Python have emerged over time through
experience using the language and working with others. This chapter covers the best way
to do the most common things in Python.
Chapter 2: Functions
Functions in Python have a variety of extra features that make a programmer’s life easier.
Some are similar to capabilities in other programming languages, but many are unique to
Python. This chapter covers how to use functions to clarify intention, promote reuse, and
reduce bugs.
Chapter 3: Classes and Inheritance
Python is an object-oriented language. Getting things done in Python often requires
writing new classes and defining how they interact through their interfaces and
hierarchies. This chapter covers how to use classes and inheritance to express your
intended behaviors with objects.
Chapter 4: Metaclasses and Attributes
Metaclasses and dynamic attributes are powerful Python features. However, they also
enable you to implement extremely bizarre and unexpected behaviors. This chapter covers
the common idioms for using these mechanisms to ensure that you follow the rule of least
surprise.
Chapter 5: Concurrency and Parallelism
Python makes it easy to write concurrent programs that do many different things
seemingly at the same time. Python can also be used to do parallel work through system
calls, subprocesses, and C-extensions. This chapter covers how to best utilize Python in
these subtly different situations.
Chapter 6: Built-in Modules
Python is installed with many of the important modules that you’ll need to write programs.
These standard packages are so closely intertwined with idiomatic Python that they may as
well be part of the language specification. This chapter covers the essential built-in
modules.
Chapter 7: Collaboration
Collaborating on Python programs requires you to be deliberate about how you write your
code. Even if you’re working alone, you’ll want to understand how to use modules written
by others. This chapter covers the standard tools and best practices that enable people to
work together on Python programs.
Chapter 8: Production
Python has facilities for adapting to multiple deployment environments. It also has built-in
modules that aid in hardening your programs and making them bulletproof. This chapter
covers how to use Python to debug, optimize, and test your programs to maximize quality
and performance at runtime.
Conventions Used in This Book
Python code snippets in this book are in monospace font and have syntax
highlighting. I take some artistic license with the Python style guide to make the code
examples better fit the format of a book or to highlight the most important parts. When
lines are long, I use ➥ characters to indicate that they wrap. I truncate snippets with
ellipses comments (# ...) to indicate regions where code exists that isn’t essential for
expressing the point. I’ve also left out embedded documentation to reduce the size of code
examples. I strongly suggest that you don’t do this in your projects; instead, you should
follow the style guide (see Item 2: “Follow the PEP 8 Style Guide”) and write
documentation (see Item 49: “Write Docstrings for Every Function, Class, and Module”).
Most code snippets in this book are accompanied by the corresponding output from
running the code. When I say “output,” I mean console or terminal output: what you see
when running the Python program in an interactive interpreter. Output sections are in
monospace font and are preceded by a >>> line (the Python interactive prompt). The idea
is that you could type the code snippets into a Python shell and reproduce the expected
output.
Finally, there are some other sections in monospace font that are not preceded by a >>>
line. These represent the output of running programs besides the Python interpreter. These
examples often begin with $ characters to indicate that I’m running programs from a
command-line shell like Bash.
Where to Get the Code and Errata
It’s useful to view some of the examples in this book as whole programs without
interleaved prose. This also gives you a chance to tinker with the code yourself and
understand why the program works as described. You can find the source code for all code
snippets in this book on the book’s website (http://www.effectivepython.com). Any errors
found in the book will have corrections posted on the website.
Acknowledgments
This book would not have been possible without the guidance, support, and
encouragement from many people in my life.
Thanks to Scott Meyers for the Effective Software Development series. I first read
Effective C++ when I was 15 years old and fell in love with the language. There’s no
doubt that Scott’s books led to my academic experience and first job at Google. I’m
thrilled to have had the opportunity to write this book.
Thanks to my core technical reviewers for the depth and thoroughness of their feedback:
Brett Cannon, Tavis Rudd, and Mike Taylor. Thanks to Leah Culver and Adrian Holovaty
for thinking this book would be a good idea. Thanks to my friends who patiently read
earlier versions of this book: Michael Levine, Marzia Niccolai, Ade Oshineye, and Katrina
Sostek. Thanks to my colleagues at Google for their review. Without all of your help, this
book would have been inscrutable.
Thanks to everyone involved in making this book a reality. Thanks to my editor Trina
MacDonald for kicking off the process and being supportive throughout. Thanks to the
team who were instrumental: development editors Tom Cirtin and Chris Zahn, editorial
assistant Olivia Basegio, marketing manager Stephane Nakib, copy editor Stephanie
Geels, and production editor Julie Nahil.
Thanks to the wonderful Python programmers I’ve known and worked with: Anthony
Baxter, Brett Cannon, Wesley Chun, Jeremy Hylton, Alex Martelli, Neal Norwitz, Guido
van Rossum, Andy Smith, Greg Stein, and Ka-Ping Yee. I appreciate your tutelage and
leadership. Python has an excellent community and I feel lucky to be a part of it.
Thanks to my teammates over the years for letting me be the worst player in the band.
Thanks to Kevin Gibbs for helping me take risks. Thanks to Ken Ashcraft, Ryan Barrett,
and Jon McAlister for showing me how it’s done. Thanks to Brad Fitzpatrick for taking it
to the next level. Thanks to Paul McDonald for co-founding our crazy project. Thanks to
Jeremy Ginsberg and Jack Hebert for making it a reality.
Thanks to the inspiring programming teachers I’ve had: Ben Chelf, Vince Hugo, Russ
Lewin, Jon Stemmle, Derek Thomson, and Daniel Wang. Without your instruction, I
would never have pursued our craft or gained the perspective required to teach others.
Thanks to my mother for giving me a sense of purpose and encouraging me to become a
programmer. Thanks to my brother, my grandparents, and the rest of my family and
childhood friends for being role models as I grew up and found my passion.
Finally, thanks to my wife, Colleen, for her love, support, and laughter through the journey
of life.
About the Author
Brett Slatkin is a senior staff software engineer at Google. He is the engineering lead and
co-founder of Google Consumer Surveys. He formerly worked on Google App Engine’s
Python infrastructure. He is the co-creator of the PubSubHubbub protocol. Nine years ago
he cut his teeth using Python to manage Google’s enormous fleet of servers.
Outside of his day job, he works on open source tools and writes about software, bicycles,
and other topics on his personal website (http://onebigfluke.com). He earned his B.S. in
computer engineering from Columbia University in the City of New York. He lives in San
Francisco.
1. Pythonic Thinking
The idioms of a programming language are defined by its users. Over the years, the
Python community has come to use the adjective Pythonic to describe code that follows a
particular style. The Pythonic style isn’t regimented or enforced by the compiler. It has
emerged over time through experience using the language and working with others.
Python programmers prefer to be explicit, to choose simple over complex, and to
maximize readability (type import this).
Programmers familiar with other languages may try to write Python as if it’s C++, Java, or
whatever they know best. New programmers may still be getting comfortable with the vast
range of concepts expressible in Python. It’s important for everyone to know the best (the
Pythonic) way to do the most common things in Python. These patterns will affect every
program you write.
Item 1: Know Which Version of Python You’re Using
Throughout this book, the majority of example code is in the syntax of Python 3.4
(released March 17, 2014). This book also provides some examples in the syntax of
Python 2.7 (released July 3, 2010) to highlight important differences. Most of my advice
applies to all of the popular Python runtimes: CPython, Jython, IronPython, PyPy, etc.
Many computers come with multiple versions of the standard CPython runtime
preinstalled. However, the default meaning of python on the command-line may not be
clear. python is usually an alias for python2.7, but it can sometimes be an alias for
older versions like python2.6 or python2.5. To find out exactly which version of
Python you’re using, you can use the --version flag.
$ python --version
Python 2.7.8
Python 3 is usually available under the name python3.
$ python3 --version
Python 3.4.2
You can also figure out the version of Python you’re using at runtime by inspecting values
in the sys built-in module.
import sys
print(sys.version_info)
print(sys.version)
>>>
sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
3.4.2 (default, Oct 19 2014, 17:52:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)]
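
Because sys.version_info compares like a tuple, a program can also check its interpreter version at startup. The following is only a sketch: the MINIMUM_VERSION constant and the is_supported helper are hypothetical names chosen for illustration, and the (3, 4) threshold is an assumption based on the version used in this book's examples.

```python
import sys

# Hypothetical minimum version for this sketch; adjust for your project.
MINIMUM_VERSION = (3, 4)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    # version_info behaves like a tuple, so compare (major, minor) only.
    return tuple(version_info[:2]) >= tuple(minimum)


if not is_supported():
    raise SystemExit('Python %d.%d or newer is required' % MINIMUM_VERSION)
```

A check like this fails early with a clear message instead of surfacing later as a confusing syntax or import error.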
Python 2 and Python 3 are both actively maintained by the Python community.
Development on Python 2 is frozen beyond bug fixes, security improvements, and
backports to ease the transition from Python 2 to Python 3. Helpful tools like 2to3
and six exist to make it easier to adopt Python 3 going forward.
Python 3 is constantly getting new features and improvements that will never be added to
Python 2. As of the writing of this book, the majority of Python’s most common open
source libraries are compatible with Python 3. I strongly encourage you to use Python 3
for your next Python project.
Things to Remember
• There are two major versions of Python still in active use: Python 2 and Python 3.
• There are multiple popular runtimes for Python: CPython, Jython, IronPython, PyPy,
etc.
• Be sure that the command-line executable for running Python on your system is the
version you expect it to be.
• Prefer Python 3 for your next project because that is the primary focus of the Python
community.
Item 2: Follow the PEP 8 Style Guide
Python Enhancement Proposal #8, otherwise known as PEP 8, is the style guide for how to
format Python code. You are welcome to write Python code however you want, as long as
it has valid syntax. However, using a consistent style makes your code more approachable
and easier to read. Sharing a common style with other Python programmers in the larger
community facilitates collaboration on projects. But even if you are the only one who will
ever read your code, following the style guide will make it easier to change things later.
PEP 8 has a wealth of details about how to write clear Python code. It continues to be
updated as the Python language evolves. It’s worth reading the whole guide online
(http://www.python.org/dev/peps/pep-0008/). Here are a few rules you should be sure to
follow:
Whitespace: In Python, whitespace is syntactically significant. Python programmers
are especially sensitive to the effects of whitespace on code clarity.
• Use spaces instead of tabs for indentation.
• Use four spaces for each level of syntactically significant indenting.
• Lines should be 79 characters in length or less.
• Continuations of long expressions onto additional lines should be indented by four
extra spaces from their normal indentation level.
• In a file, functions and classes should be separated by two blank lines.
• In a class, methods should be separated by one blank line.
• Don’t put spaces around list indexes, function calls, or keyword argument
assignments.
• Put one, and only one, space before and after variable assignments.
Naming: PEP 8 suggests unique styles of naming for different parts in the language.
This makes it easy to distinguish which type corresponds to each name when reading
code.
• Functions, variables, and attributes should be in lowercase_underscore
format.
• Protected instance attributes should be in _leading_underscore format.
• Private instance attributes should be in __double_leading_underscore
format.
• Classes and exceptions should be in CapitalizedWord format.
• Module-level constants should be in ALL_CAPS format.
• Instance methods in classes should use self as the name of the first parameter
(which refers to the object).
• Class methods should use cls as the name of the first parameter (which refers to
the class).
Expressions and Statements: The Zen of Python states: “There should be one-- and
preferably only one --obvious way to do it.” PEP 8 attempts to codify this style in its
guidance for expressions and statements.
• Use inline negation (if a is not b) instead of negation of positive expressions
(if not a is b).
• Don’t check for empty values (like [] or '') by checking the length (if
len(somelist) == 0). Use if not somelist and assume empty values
implicitly evaluate to False.
• The same thing goes for non-empty values (like [1] or 'hi'). The statement if
somelist is implicitly True for non-empty values.
• Avoid single-line if statements, for and while loops, and except compound
statements. Spread these over multiple lines for clarity.
• Always put import statements at the top of a file.
• Always use absolute names for modules when importing them, not names relative to
the current module’s own path. For example, to import the foo module from the
bar package, you should do from bar import foo, not just import foo.
• If you must do relative imports, use the explicit syntax from . import foo.
• Imports should be in sections in the following order: standard library modules,
third-party modules, your own modules. Each subsection should have imports in
alphabetical order.
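
To make several of the rules above concrete, here is a small sketch that follows the naming, whitespace, and emptiness-check conventions. Every name in it (MAX_RETRIES, ConfigError, ConfigLoader, CONFIG_PATH) is hypothetical and exists only for illustration; it is not taken from PEP 8 itself.

```python
import os

MAX_RETRIES = 3  # Module-level constant: ALL_CAPS


class ConfigError(Exception):  # Exceptions: CapitalizedWord
    pass


class ConfigLoader(object):  # Classes: CapitalizedWord
    def __init__(self, search_paths):
        self.search_paths = search_paths  # Public: lowercase_underscore
        self._cache = {}  # Protected: _leading_underscore

    @classmethod
    def from_environment(cls, variable_name='CONFIG_PATH'):
        # Class methods take cls; no spaces around keyword defaults.
        raw_value = os.environ.get(variable_name, '')
        paths = [p for p in raw_value.split(os.pathsep) if p]
        return cls(paths)

    def first_path(self):
        # Rely on implicit emptiness instead of len(...) == 0.
        if not self.search_paths:
            raise ConfigError('No search paths configured')
        return self.search_paths[0]
```

Methods are separated by one blank line and top-level definitions by two, as the whitespace rules describe.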
Note
The Pylint tool (http://www.pylint.org/) is a popular static analyzer for Python
source code. Pylint provides automated enforcement of the PEP 8 style guide and
detects many other types of common errors in Python programs.
Things to Remember
• Always follow the PEP 8 style guide when writing Python code.
• Sharing a common style with the larger Python community facilitates collaboration
with others.
• Using a consistent style makes it easier to modify your own code later.
Item 3: Know the Differences Between bytes, str, and unicode
In Python 3, there are two types that represent sequences of characters: bytes and str.
Instances of bytes contain raw 8-bit values. Instances of str contain Unicode
characters.
In Python 2, there are two types that represent sequences of characters: str and
unicode. In contrast to Python 3, instances of str contain raw 8-bit values. Instances of
unicode contain Unicode characters.
There are many ways to represent Unicode characters as binary data (raw 8-bit values).
The most common encoding is UTF-8. Importantly, str instances in Python 3 and
unicode instances in Python 2 do not have an associated binary encoding. To convert
Unicode characters to binary data, you must use the encode method. To convert binary
data to Unicode characters, you must use the decode method.
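
As a brief illustration of encode and decode in Python 3, the sketch below converts one str into two different binary representations. The sample text and variable names are arbitrary choices made for this example.

```python
text = 'caf\u00e9'  # A str holding four Unicode characters

utf8_data = text.encode('utf-8')      # Two bytes for the accented letter
latin1_data = text.encode('latin-1')  # One byte for the accented letter

print(utf8_data)    # b'caf\xc3\xa9'
print(latin1_data)  # b'caf\xe9'
print(len(text), len(utf8_data), len(latin1_data))  # 4 5 4

# Decoding with the matching encoding recovers the original characters.
assert utf8_data.decode('utf-8') == text
```

The same characters produce different bytes under each encoding, which is why a str carries no encoding of its own.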
When you’re writing Python programs, it’s important to do encoding and decoding of
Unicode at the furthest boundary of your interfaces. The core of your program should use
Unicode character types (str in Python 3, unicode in Python 2) and should not assume
anything about character encodings. This approach allows you to be very accepting of
alternative text encodings (such as Latin-1, Shift JIS, and Big5) while being strict about
your output text encoding (ideally, UTF-8).
The split between character types leads to two common situations in Python code:
• You want to operate on raw 8-bit values that are UTF-8-encoded characters (or some
other encoding).
• You want to operate on Unicode characters that have no specific encoding.
You’ll often need two helper functions to convert between these two cases and to ensure
that the type of input values matches your code’s expectations.
In Python 3, you’ll need one method that takes a str or bytes and always returns a
str.
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str
You’ll need another method that takes a str or bytes and always returns a bytes.
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes
In Python 2, you’ll need one method that takes a str or unicode and always returns a
unicode.
# Python 2
def to_unicode(unicode_or_str):
    if isinstance(unicode_or_str, str):
        value = unicode_or_str.decode('utf-8')
    else:
        value = unicode_or_str
    return value  # Instance of unicode
You’ll need another method that takes str or unicode and always returns a str.
# Python 2
def to_str(unicode_or_str):
    if isinstance(unicode_or_str, unicode):
        value = unicode_or_str.encode('utf-8')
    else:
        value = unicode_or_str
    return value  # Instance of str
There are two big gotchas when dealing with raw 8-bit values and Unicode characters in
Python.
The first issue is that in Python 2, unicode and str instances seem to be the same type
when a str only contains 7-bit ASCII characters.
• You can combine such a str and unicode together using the + operator.
• You can compare such str and unicode instances using equality and inequality
operators.
• You can use unicode instances for format strings like '%s'.
All of this behavior means that you can often pass a str or unicode instance to a
function expecting one or the other and things will just work (as long as you’re only
dealing with 7-bit ASCII). In Python 3, bytes and str instances are never equivalent,
not even the empty string, so you must be more deliberate about the types of character
sequences that you’re passing around.
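
The Python 3 behavior described above can be seen in a few lines. This is a small sketch; the variable mixing_error is a hypothetical name used only to capture the exception for inspection.

```python
# In Python 3, bytes and str never compare equal, even when both are empty.
print(b'' == '')        # False
print(b'abc' == 'abc')  # False

# Combining the two types with + raises TypeError instead of guessing
# at an encoding.
try:
    b'one' + 'two'
except TypeError as e:
    mixing_error = e
else:
    mixing_error = None

print(type(mixing_error).__name__)  # TypeError
```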
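The second issue is that in Python 3, operations involving file handles (returned by the
open built-in function) default to text mode, which reads and writes str instances rather
than raw bytes. In Python 2, file operations default to binary encoding. This causes
surprising failures, especially for programmers accustomed to Python 2.
For example, say you want to write some random binary data to a file. In Python 2, this
works. In Python 3, this breaks.
import os

with open('/tmp/random.bin', 'w') as f:
    f.write(os.urandom(10))
>>>
TypeError: must be str, not bytes
The cause of this exception is that open in Python 3 returns a text-mode file handle
unless told otherwise, and it accepts a new encoding argument that controls how
characters are converted to and from bytes. When that argument is omitted, the encoding
comes from the platform’s locale settings (commonly UTF-8) rather than from a fixed
default. That makes read and write operations on file handles expect str instances
containing Unicode characters instead of bytes instances containing binary data.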
To make this work properly, you must indicate that the data is being opened in write
binary mode ('wb') instead of write character mode ('w'). Here, I use open in a way
that works correctly in Python 2 and Python 3:
with open('/tmp/random.bin', 'wb') as f:
    f.write(os.urandom(10))
This problem also exists for reading data from files. The solution is the same: Indicate
binary mode by using 'rb' instead of 'r' when opening a file.
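
As a sketch of the full round trip in Python 3, the following writes random bytes in 'wb' mode and reads them back in 'rb' mode. It uses a temporary directory rather than /tmp so it does not leave files behind; the file name random.bin and the variable names are arbitrary choices for this example.

```python
import os
import tempfile

payload = os.urandom(10)

# A throwaway directory keeps this sketch from touching real files.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'random.bin')

    with open(path, 'wb') as f:  # Binary write: accepts bytes
        f.write(payload)

    with open(path, 'rb') as f:  # Binary read: returns bytes
        data = f.read()

print(type(data).__name__, len(data))  # bytes 10
print(data == payload)                 # True
```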
Things to Remember
• In Python 3, bytes contains sequences of 8-bit values, str contains sequences of
Unicode characters. bytes and str instances can’t be used together with operators
(like > or +).
• In Python 2, str contains sequences of 8-bit values, unicode contains sequences
of Unicode characters. str and unicode can be used together with operators if
the str only contains 7-bit ASCII characters.
• Use helper functions to ensure that the inputs you operate on are the type of
character sequence you expect (8-bit values, UTF-8 encoded characters, Unicode
characters, etc.).
• If you want to read or write binary data to/from a file, always open the file using a
binary mode (like 'rb' or 'wb').
Item 4: Write Helper Functions Instead of Complex Expressions
Python’s pithy syntax makes it easy to write single-line expressions that implement a lot
of logic. For example, say you want to decode the query string from a URL. Here, each
query string parameter represents an integer value:
from urllib.parse import parse_qs
my_values = parse_qs('red=5&blue=0&green=',
                     keep_blank_values=True)
print(repr(my_values))
>>>
{'red': ['5'], 'green': [''], 'blue': ['0']}
Some query string parameters may have multiple values, some may have single values,
some may be present but have blank values, and some may be missing entirely. Using the
get method on the result dictionary will return different values in each circumstance.
print('Red:', my_values.get('red'))
print('Green:', my_values.get('green'))
print('Opacity:', my_values.get('opacity'))
>>>
Red: ['5']
Green: ['']
Opacity: None
It’d be nice if a default value of 0 was assigned when a parameter isn’t supplied or is
blank. You might choose to do this with Boolean expressions because it feels like this
logic doesn’t merit a whole if statement or helper function quite yet.
Python’s syntax makes this choice all too easy. The trick here is that the empty string, the
empty list, and zero all evaluate to False implicitly. Thus, the expressions below will
evaluate to the subexpression after the or operator when the first subexpression is
False.
# For query string 'red=5&blue=0&green='
red = my_values.get('red', [''])[0] or 0
green = my_values.get('green', [''])[0] or 0
opacity = my_values.get('opacity', [''])[0] or 0
print('Red: %r' % red)
print('Green: %r' % green)
print('Opacity: %r' % opacity)
>>>
Red: '5'
Green: 0
Opacity: 0
The red case works because the key is present in the my_values dictionary. The value
is a list with one member: the string '5'. This string implicitly evaluates to True, so
red is assigned to the first part of the or expression.
The green case works because the value in the my_values dictionary is a list with one
member: an empty string. The empty string implicitly evaluates to False, causing the or
expression to evaluate to 0.
The opacity case works because the value in the my_values dictionary is missing
altogether. The behavior of the get method is to return its second argument if the key
doesn’t exist in the dictionary. The default value in this case is a list with one member, an
empty string. When opacity isn’t found in the dictionary, this code does exactly the
same thing as the green case.
However, this expression is difficult to read and it still doesn’t do everything you need.
You’d also want to ensure that all the parameter values are integers so you can use them in
mathematical expressions. To do that, you’d wrap each expression with the int built-in
function to parse the string as an integer.
red = int(my_values.get('red', [''])[0] or 0)
This is now extremely hard to read. There’s so much visual noise. The code isn’t
approachable. A new reader of the code would have to spend too much time picking apart
the expression to figure out what it actually does. Even though it’s nice to keep things
short, it’s not worth trying to fit this all on one line.
Python2.5addedif/elseconditional—orternary—expressionstomakecaseslikethis
clearerwhilekeepingthecodeshort.
Clickheretoviewcodeimage
red=my_values.get(‘red’,[”])
red=int(red[0])ifred[0]else0
This is better. For less complicated situations, if/else conditional expressions can make
things very clear. But the example above is still not as clear as the alternative of a full
if/else statement over multiple lines. Seeing all of the logic spread out like this makes
the dense version seem even more complex.
green = my_values.get('green', [''])
if green[0]:
    green = int(green[0])
else:
    green = 0
Writing a helper function is the way to go, especially if you need to use this logic
repeatedly.
def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found
The calling code is much clearer than the complex expression using or and the two-line
version using the if/else expression.
green = get_first_int(my_values, 'green')
As soon as your expressions get complicated, it’s time to consider splitting them into
smaller pieces and moving logic into helper functions. What you gain in readability
always outweighs what brevity may have afforded you. Don’t let Python’s pithy syntax for
complex expressions get you into a mess like this.
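To see the helper working end to end, here is a minimal sketch that builds my_values from the query string used earlier. It assumes the dictionary comes from urllib.parse.parse_qs with keep_blank_values=True, which the excerpt above does not show; the default of 100 for opacity is also just an illustration.

```python
from urllib.parse import parse_qs


def get_first_int(values, key, default=0):
    """Return the first value stored under key as an int, or default."""
    found = values.get(key, [''])
    if found[0]:
        return int(found[0])
    return default


# Assumption: keep_blank_values=True keeps 'green' present with an empty string.
my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)

red = get_first_int(my_values, 'red')
green = get_first_int(my_values, 'green')
opacity = get_first_int(my_values, 'opacity', default=100)
print(red, green, opacity)
```

Each call site now states only the key and the fallback, and the parsing rules live in one place.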
Things to Remember
Python’s syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.
The if/else expression provides a more readable alternative to using Boolean operators like or and and in expressions.
Item 5: Know How to Slice Sequences
Python includes syntax for slicing sequences into pieces. Slicing lets you access a subset
of a sequence’s items with minimal effort. The simplest uses for slicing are the built-in
types list, str, and bytes. Slicing can be extended to any Python class that
implements the __getitem__ and __setitem__ special methods (see Item 28:
“Inherit from collections.abc for Custom Container Types”).
The basic form of the slicing syntax is somelist[start:end], where start is
inclusive and end is exclusive.
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('First four:', a[:4])
print('Last four:', a[-4:])
print('Middle two:', a[3:-3])
>>>
First four: ['a', 'b', 'c', 'd']
Last four: ['e', 'f', 'g', 'h']
Middle two: ['d', 'e']
When slicing from the start of a list, you should leave out the zero index to reduce visual
noise.
assert a[:5] == a[0:5]
When slicing to the end of a list, you should leave out the final index because it’s
redundant.
assert a[5:] == a[5:len(a)]
Using negative numbers for slicing is helpful for doing offsets relative to the end of a list.
All of these forms of slicing would be clear to a new reader of your code. There are no
surprises, and I encourage you to use these variations.
a[:]      # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
a[:5]     # ['a', 'b', 'c', 'd', 'e']
a[:-1]    # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
a[4:]     # ['e', 'f', 'g', 'h']
a[-3:]    # ['f', 'g', 'h']
a[2:5]    # ['c', 'd', 'e']
a[2:-1]   # ['c', 'd', 'e', 'f', 'g']
a[-3:-1]  # ['f', 'g']
Slicing deals properly with start and end indexes that are beyond the boundaries of the
list. That makes it easy for your code to establish a maximum length to consider for an
input sequence.
first_twenty_items = a[:20]
last_twenty_items = a[-20:]
In contrast, accessing the same index directly causes an exception.
a[20]
>>>
IndexError: list index out of range
Note
Beware that slicing a list with a negated variable is one of the few situations in
which you can get surprising results from slicing. For example, the expression
somelist[-n:] will work fine when n is at least one (e.g.,
somelist[-3:]). However, when n is zero, the expression somelist[-0:]
will result in a copy of the original list.
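One way to guard against the zero case is a small wrapper that treats a non-positive count explicitly. This is only a sketch; last_n is a hypothetical helper name, not something from the text above.

```python
def last_n(items, n):
    """Return the last n items of a list, or an empty list when n <= 0."""
    if n <= 0:
        return []
    return items[-n:]


somelist = ['a', 'b', 'c', 'd']

# -0 is just 0, so this slice starts at the front and copies everything.
whole_copy = somelist[-0:]

tail = last_n(somelist, 2)
empty = last_n(somelist, 0)
print(whole_copy, tail, empty)
```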
The result of slicing a list is a whole new list. References to the objects from the original
list are maintained. Modifying the result of slicing won’t affect the original list.
b = a[4:]
print('Before:', b)
b[1] = 99
print('After:', b)
print('No change:', a)
>>>
Before: ['e', 'f', 'g', 'h']
After: ['e', 99, 'g', 'h']
No change: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
When used in assignments, slices will replace the specified range in the original list.
Unlike tuple assignments (like a, b = c[:2]), the two sides of a slice assignment don’t
need to be the same length. The values before and after the assigned slice will be preserved. The
list will grow or shrink to accommodate the new values.
print('Before', a)
a[2:7] = [99, 22, 14]
print('After', a)
>>>
Before ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After ['a', 'b', 99, 22, 14, 'h']
If you leave out both the start and the end indexes when slicing, you’ll end up with a copy
of the original list.
b = a[:]
assert b == a and b is not a
If you assign a slice with no start or end indexes, you’ll replace its entire contents with a
copy of what’s referenced (instead of allocating a new list).
b = a
print('Before', a)
a[:] = [101, 102, 103]
assert a is b  # Still the same list object
print('After', a)  # Now has different contents
>>>
Before ['a', 'b', 99, 22, 14, 'h']
After [101, 102, 103]
Things to Remember
Avoid being verbose: Don’t supply 0 for the start index or the length of the sequence for the end index.
Slicing is forgiving of start or end indexes that are out of bounds, making it easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).
Assigning to a list slice will replace that range in the original sequence with what’s referenced even if their lengths are different.
Item 6: Avoid Using start, end, and stride in a Single Slice
In addition to basic slicing (see Item 5: “Know How to Slice Sequences”), Python has
special syntax for the stride of a slice in the form somelist[start:end:stride].
This lets you take every nth item when slicing a sequence. For example, the stride makes
it easy to group by even and odd indexes in a list.
a = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
odds = a[::2]
evens = a[1::2]
print(odds)
print(evens)
>>>
['red', 'yellow', 'blue']
['orange', 'green', 'purple']
The problem is that the stride syntax often causes unexpected behavior that can
introduce bugs. For example, a common Python trick for reversing a byte string is to slice
the string with a stride of -1.
x = b'mongoose'
y = x[::-1]
print(y)
>>>
b'esoognom'
That works well for byte strings and ASCII characters, but it will break for Unicode
characters encoded as UTF-8 byte strings.
w=‘
’
x = w.encode('utf-8')
y = x[::-1]
z = y.decode('utf-8')
>>>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in
position 0: invalid start byte
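A hedged sketch of the same failure and one way around it: reverse the decoded text and encode afterwards, so multi-byte sequences are never split. The sample string 'café' is my own stand-in, not the text used above. Reversing code points can still scramble combining characters, so treat this as an illustration rather than a general text-reversal routine.

```python
w = 'café'  # hypothetical sample; 'é' takes two bytes in UTF-8
x = w.encode('utf-8')

# Reversing the raw bytes puts a continuation byte first, which is invalid.
try:
    x[::-1].decode('utf-8')
except UnicodeDecodeError:
    byte_reverse_ok = False
else:
    byte_reverse_ok = True

# Reversing the str keeps each character's bytes together.
y = x.decode('utf-8')[::-1].encode('utf-8')
print(byte_reverse_ok, y.decode('utf-8'))
```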
Are negative strides besides -1 useful? Consider the following examples.
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
a[::2]   # ['a', 'c', 'e', 'g']
a[::-2]  # ['h', 'f', 'd', 'b']
Here, ::2 means select every second item starting at the beginning. Trickier, ::-2
means select every second item starting at the end and moving backwards.
What do you think 2::2 means? What about -2::-2 vs. -2:2:-2 vs. 2:2:-2?
a[2::2]     # ['c', 'e', 'g']
a[-2::-2]   # ['g', 'e', 'c', 'a']
a[-2:2:-2]  # ['g', 'e']
a[2:2:-2]   # []
The point is that the stride part of the slicing syntax can be extremely confusing.
Having three numbers within the brackets is hard enough to read because of its density.
Then it’s not obvious when the start and end indexes come into effect relative to the
stride value, especially when stride is negative.
To prevent problems, avoid using stride along with start and end indexes. If you
must use a stride, prefer making it a positive value and omit start and end indexes.
If you must use stride with start or end indexes, consider using one assignment to
stride and another to slice.
b = a[::2]   # ['a', 'c', 'e', 'g']
c = b[1:-1]  # ['c', 'e']
Slicing and then striding will create an extra shallow copy of the data. The first operation
should try to reduce the size of the resulting slice by as much as possible. If your program
can’t afford the time or memory required for two steps, consider using the itertools
built-in module’s islice function (see Item 46: “Use Built-in Algorithms and Data
Structures”), which doesn’t permit negative values for start, end, or stride.
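As a rough comparison, the sketch below produces the same two items with the two-step approach and with itertools.islice. The stop index of 6 is my own choice to mirror the [1:-1] trim on this particular eight-item list.

```python
from itertools import islice

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Two steps: stride first, then trim the ends (creates an intermediate list).
two_step = a[::2][1:-1]

# One lazy pass: start at index 2, stop before index 6, take every 2nd item.
lazy = list(islice(a, 2, 6, 2))

print(two_step, lazy)
```

islice works on any iterable and never builds the intermediate list, but it cannot count from the end, so the bounds have to be known as non-negative positions.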
Things to Remember
Specifying start, end, and stride in a slice can be extremely confusing.
Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.
Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to slice, another to stride) or using islice from the itertools built-in module.
Item 7: Use List Comprehensions Instead of map and filter
Python provides compact syntax for deriving one list from another. These expressions are
called list comprehensions. For example, say you want to compute the square of each
number in a list. You can do this by providing the expression for your computation and the
input sequence to loop over.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = [x**2 for x in a]
print(squares)
>>>
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Unless you’re applying a single-argument function, list comprehensions are clearer than
the map built-in function for simple cases. map requires creating a lambda function for
the computation, which is visually noisy.
squares = map(lambda x: x ** 2, a)
Unlike map, list comprehensions let you easily filter items from the input list, removing
corresponding outputs from the result. For example, say you only want to compute the
squares of the numbers that are divisible by 2. Here, I do this by adding a conditional
expression to the list comprehension after the loop:
even_squares = [x**2 for x in a if x % 2 == 0]
print(even_squares)
>>>
[4, 16, 36, 64, 100]
The filter built-in function can be used along with map to achieve the same outcome,
but it is much harder to read.
alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))
assert even_squares == list(alt)
Dictionaries and sets have their own equivalents of list comprehensions. These make it
easy to create derivative data structures when writing algorithms.
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}
rank_dict = {rank: name for name, rank in chile_ranks.items()}
chile_len_set = {len(name) for name in rank_dict.values()}
print(rank_dict)
print(chile_len_set)
>>>
{1: 'ghost', 2: 'habanero', 3: 'cayenne'}
{8, 5, 7}
Things to Remember
List comprehensions are clearer than the map and filter built-in functions because they don’t require extra lambda expressions.
List comprehensions allow you to easily skip items from the input list, a behavior map doesn’t support without help from filter.
Dictionaries and sets also support comprehension expressions.
Item 8: Avoid More Than Two Expressions in List Comprehensions
Beyond basic usage (see Item 7: “Use List Comprehensions Instead of map and
filter”), list comprehensions also support multiple levels of looping. For example, say
you want to simplify a matrix (a list containing other lists) into one flat list of all cells.
Here, I do this with a list comprehension by including two for expressions. These
expressions run in the order provided from left to right.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)
>>>
[1, 2, 3, 4, 5, 6, 7, 8, 9]
The example above is simple, readable, and a reasonable usage of multiple loops. Another
reasonable usage of multiple loops is replicating the two-level deep layout of the input list.
For example, say you want to square the value in each cell of a two-dimensional matrix.
This expression is noisier because of the extra [] characters, but it’s still easy to read.
squared = [[x**2 for x in row] for row in matrix]
print(squared)
>>>
[[1, 4, 9], [16, 25, 36], [49, 64, 81]]
If this expression included another loop, the list comprehension would get so long that
you’d have to split it over multiple lines.
my_lists = [
    [[1, 2, 3], [4, 5, 6]],
    # …
]
flat = [x for sublist1 in my_lists
        for sublist2 in sublist1
        for x in sublist2]
At this point, the multiline comprehension isn’t much shorter than the alternative. Here, I
produce the same result using normal loop statements. The indentation of this version
makes the looping clearer than the list comprehension.
flat = []
for sublist1 in my_lists:
    for sublist2 in sublist1:
        flat.extend(sublist2)
List comprehensions also support multiple if conditions. Multiple conditions at the same
loop level are an implicit and expression. For example, say you want to filter a list of
numbers to only even values greater than four. These two list comprehensions are
equivalent.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]
Conditions can be specified at each level of looping after the for expression. For
example, say you want to filter a matrix so the only cells remaining are those divisible by
3 in rows that sum to 10 or higher. Expressing this with list comprehensions is short, but
extremely difficult to read.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filtered = [[x for x in row if x % 3 == 0]
            for row in matrix if sum(row) >= 10]
print(filtered)
>>>
[[6], [9]]
Though this example is a bit convoluted, in practice you’ll see situations arise where such
expressions seem like a good fit. I strongly encourage you to avoid using list
comprehensions that look like this. The resulting code is very difficult for others to
comprehend. What you save in the number of lines doesn’t outweigh the difficulties it
could cause later.
The rule of thumb is to avoid using more than two expressions in a list comprehension.
This could be two conditions, two loops, or one condition and one loop. As soon as it gets
more complicated than that, you should use normal if and for statements and write a
helper function (see Item 16: “Consider Generators Instead of Returning Lists”).
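For instance, the matrix filter above could be unpacked into a helper along these lines. This is a sketch of one possible shape; the name filter_matrix and its keyword parameters are hypothetical.

```python
def filter_matrix(matrix, min_row_sum=10, divisor=3):
    """Keep cells divisible by divisor, from rows summing to min_row_sum or more."""
    filtered = []
    for row in matrix:
        if sum(row) < min_row_sum:
            continue  # Skip rows that are too small
        kept = []
        for x in row:
            if x % divisor == 0:
                kept.append(x)
        filtered.append(kept)
    return filtered


matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = filter_matrix(matrix)
print(result)
```

The loop version is longer, but each condition sits on its own line at the nesting level it applies to.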
Things to Remember
List comprehensions support multiple levels of loops and multiple conditions per loop level.
List comprehensions with more than two expressions are very difficult to read and should be avoided.
Item 9: Consider Generator Expressions for Large Comprehensions
The problem with list comprehensions (see Item 7: “Use List Comprehensions Instead of
map and filter”) is that they may create a whole new list containing one item for each
value in the input sequence. This is fine for small inputs, but for large inputs this could
consume significant amounts of memory and cause your program to crash.
For example, say you want to read a file and return the number of characters on each line.
Doing this with a list comprehension would require holding the length of every line of the
file in memory. If the file is absolutely enormous or perhaps a never-ending network
socket, list comprehensions are problematic. Here, I use a list comprehension in a way that
can only handle small input values.
value = [len(x) for x in open('/tmp/my_file.txt')]
print(value)
>>>
[100, 57, 15, 1, 12, 75, 5, 86, 89, 11]
To solve this, Python provides generator expressions, a generalization of list
comprehensions and generators. Generator expressions don’t materialize the whole output
sequence when they’re run. Instead, generator expressions evaluate to an iterator that
yields one item at a time from the expression.
A generator expression is created by putting list-comprehension-like syntax between ()
characters. Here, I use a generator expression that is equivalent to the code above.
However, the generator expression immediately evaluates to an iterator and doesn’t make
any forward progress.
it = (len(x) for x in open('/tmp/my_file.txt'))
print(it)
>>>
<generator object <genexpr> at 0x101b81480>
The returned iterator can be advanced one step at a time to produce the next output from
the generator expression as needed (using the next built-in function). Your code can
consume as much of the generator expression as you want without risking a blowup in
memory usage.
print(next(it))
print(next(it))
>>>
100
57
Another powerful outcome of generator expressions is that they can be composed together.
Here, I take the iterator returned by the generator expression above and use it as the input
for another generator expression.
roots = ((x, x**0.5) for x in it)
Each time I advance this iterator, it will also advance the interior iterator, creating a
domino effect of looping, evaluating conditional expressions, and passing around inputs
and outputs.
print(next(roots))
>>>
(15, 3.872983346207417)
Chaining generators like this executes very quickly in Python. When you’re looking for a
way to compose functionality that’s operating on a large stream of input, generator
expressions are the best tool for the job. The only gotcha is that the iterators returned by
generator expressions are stateful, so you must be careful not to use them more than once
(see Item 17: “Be Defensive When Iterating Over Arguments”).
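The statefulness is easy to see with a small in-memory input. In this sketch the lines list is a hypothetical stand-in for the file used above.

```python
lines = ['alpha\n', 'be\n', 'gamma\n']  # hypothetical stand-in for a file

it = (len(x) for x in lines)
first_pass = list(it)   # Consumes every item the generator can produce
second_pass = list(it)  # The generator is already exhausted

print(first_pass)
print(second_pass)
```

The second pass raises no error; it simply yields nothing, which is why reusing an exhausted iterator tends to show up as silently missing data.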
Things to Remember
List comprehensions can cause problems for large inputs by using too much memory.
Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.
Generator expressions execute very quickly when chained together.
Item 10: Prefer enumerate Over range
The range built-in function is useful for loops that iterate over a set of integers.
from random import randint
random_bits = 0
for i in range(64):
    if randint(0, 1):
        random_bits |= 1 << i
When you have a data structure to iterate over, like a list of strings, you can loop directly
over the sequence.
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for flavor in flavor_list:
    print('%s is delicious' % flavor)
Often, you’ll want to iterate over a list and also know the index of the current item in the
list. For example, say you want to print the ranking of your favorite ice cream flavors. One
way to do it is using range.
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i + 1, flavor))
This looks clumsy compared with the other examples of iterating over flavor_list or
range. You have to get the length of the list. You have to index into the array. It’s harder
to read.
Python provides the enumerate built-in function for addressing this situation.
enumerate wraps any iterator with a lazy generator. This generator yields pairs of the
loop index and the next value from the iterator. The resulting code is much clearer.
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i + 1, flavor))
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry
You can make this even shorter by specifying the number from which enumerate
should begin counting (1 in this case).
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))
Things to Remember
enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
Prefer enumerate instead of looping over a range and indexing into a sequence.
You can supply a second parameter to enumerate to specify the number from which to begin counting (zero is the default).
Item 11: Use zip to Process Iterators in Parallel
Often in Python you find yourself with many lists of related objects. List comprehensions
make it easy to take a source list and get a derived list by applying an expression (see Item
7: “Use List Comprehensions Instead of map and filter”).
names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]
The items in the derived list are related to the items in the source list by their indexes. To
iterate over both lists in parallel, you can iterate over the length of the names source list.
longest_name = None
max_letters = 0
for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        longest_name = names[i]
        max_letters = count
print(longest_name)
>>>
Cecilia
The problem is that this whole loop statement is visually noisy. The indexes into names
and letters make the code hard to read. Indexing into the arrays by the loop index i
happens twice. Using enumerate (see Item 10: “Prefer enumerate Over range”)
improves this slightly, but it’s still not ideal.
for i, name in enumerate(names):
    count = letters[i]
    if count > max_letters:
        longest_name = name
        max_letters = count
To make this code clearer, Python provides the zip built-in function. In Python 3, zip
wraps two or more iterators with a lazy generator. The zip generator yields tuples
containing the next value from each iterator. The resulting code is much cleaner than
indexing into multiple lists.
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count
There are two problems with the zip built-in.
The first issue is that in Python 2 zip is not a generator; it will fully exhaust the supplied
iterators and return a list of all the tuples it creates. This could potentially use a lot of
memory and cause your program to crash. If you want to zip very large iterators in
Python 2, you should use izip from the itertools built-in module (see Item 46: “Use
Built-in Algorithms and Data Structures”).
The second issue is that zip behaves strangely if the input iterators are of different
lengths. For example, say you add another name to the list above but forget to update the
letter counts. Running zip on the two input lists will have an unexpected result.
names.append('Rosalind')
for name, count in zip(names, letters):
    print(name)
>>>
Cecilia
Lise
Marie
The new item for 'Rosalind' isn’t there. This is just how zip works. It keeps yielding
tuples until a wrapped iterator is exhausted. This approach works fine when you know that
the iterators are of the same length, which is often the case for derived lists created by list
comprehensions. In many other cases, the truncating behavior of zip is surprising and
bad. If you aren’t confident that the lengths of the lists you want to zip are equal,
consider using the zip_longest function from the itertools built-in module
instead (also called izip_longest in Python 2).
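A short Python 3 sketch of zip_longest on the mismatched lists from this item. The missing count shows up as None by default; the fillvalue of 0 in the second call is just one possible placeholder.

```python
from itertools import zip_longest

names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
letters = [7, 4, 5]  # Stale: never updated for 'Rosalind'

pairs = list(zip_longest(names, letters))
padded = list(zip_longest(names, letters, fillvalue=0))

print(pairs[-1])
print(padded[-1])
```

Seeing None (or the fill value) in the output at least makes the length mismatch visible instead of dropping the extra name.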
Things to Remember
The zip built-in function can be used to iterate over multiple iterators in parallel.
In Python 3, zip is a lazy generator that produces tuples. In Python 2, zip returns the full result as a list of tuples.
zip truncates its output silently if you supply it with iterators of different lengths.
The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths (see Item 46: “Use Built-in Algorithms and Data Structures”).
Item 12: Avoid else Blocks After for and while Loops
Python loops have an extra feature that is not available in most other programming
languages: you can put an else block immediately after a loop’s repeated interior block.
for i in range(3):
    print('Loop %d' % i)
else:
    print('Else block!')
>>>
Loop 0
Loop 1
Loop 2
Else block!
Surprisingly, the else block runs immediately after the loop finishes. Why is the clause
called “else”? Why not “and”? In an if/else statement, else means, “Do this if the
block before this doesn’t happen.” In a try/except statement, except has the same
definition: “Do this if trying the block before this failed.”
Similarly, else from try/except/else follows this pattern (see Item 13: “Take
Advantage of Each Block in try/except/else/finally”) because it means, “Do this
if the block before did not fail.” try/finally is also intuitive because it means,
“Always do what is final after trying the block before.”
Given all of the uses of else, except, and finally in Python, a new programmer
might assume that the else part of for/else means, “Do this if the loop wasn’t
completed.” In reality, it does exactly the opposite. Using a break statement in a loop
will actually skip the else block.
for i in range(3):
    print('Loop %d' % i)
    if i == 1:
        break
else:
    print('Else block!')
>>>
Loop 0
Loop 1
Another surprise is that the else block will run immediately if you loop over an empty
sequence.
for x in []:
    print('Never runs')
else:
    print('For Else block!')
>>>
For Else block!
The else block also runs when while loops are initially false.
while False:
    print('Never runs')
else:
    print('While Else block!')
>>>
While Else block!
The rationale for these behaviors is that else blocks after loops are useful when you’re
using loops to search for something. For example, say you want to determine whether two
numbers are coprime (their only common divisor is 1). Here, I iterate through every
possible common divisor and test the numbers. After every option has been tried, the loop
ends. The else block runs when the numbers are coprime because the loop doesn’t
encounter a break.
a = 4
b = 9
for i in range(2, min(a, b) + 1):
    print('Testing', i)
    if a % i == 0 and b % i == 0:
        print('Not coprime')
        break
else:
    print('Coprime')
>>>
Testing 2
Testing 3
Testing 4
Coprime
In practice, you wouldn’t write the code this way. Instead, you’d write a helper function to
do the calculation. Such a helper function is written in two common styles.
The first approach is to return early when you find the condition you’re looking for. You
return the default outcome if you fall through the loop.
def coprime(a, b):
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False
    return True
The second way is to have a result variable that indicates whether you’ve found what
you’re looking for in the loop. You break out of the loop as soon as you find something.
def coprime2(a, b):
    is_coprime = True
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            is_coprime = False
            break
    return is_coprime
Both of these approaches are so much clearer to readers of unfamiliar code. The
expressivity you gain from the else block doesn’t outweigh the burden you put on
people (including yourself) who want to understand your code in the future. Simple
constructs like loops should be self-evident in Python. You should avoid using else
blocks after loops entirely.
Things to Remember
Python has special syntax that allows else blocks to immediately follow for and while loop interior blocks.
The else block after a loop only runs if the loop body did not encounter a break statement.
Avoid using else blocks after loops because their behavior isn’t intuitive and can be confusing.
Item 13: Take Advantage of Each Block in try/except/else/finally
There are four distinct times that you may want to take action during exception handling
in Python. These are captured in the functionality of try, except, else, and finally
blocks. Each block serves a unique purpose in the compound statement, and their various
combinations are useful (see Item 51: “Define a Root Exception to Insulate Callers
from APIs” for another example).
Finally Blocks
Use try/finally when you want exceptions to propagate up, but you also want to run
cleanup code even when exceptions occur. One common usage of try/finally is for
reliably closing file handles (see Item 43: “Consider contextlib and with Statements
for Reusable try/finally Behavior” for another approach).
handle = open('/tmp/random_data.txt')  # May raise IOError
try:
    data = handle.read()  # May raise UnicodeDecodeError
finally:
    handle.close()  # Always runs after try:
Any exception raised by the read method will always propagate up to the calling code,
yet the close method of handle is also guaranteed to run in the finally block. You
must call open before the try block because exceptions that occur when opening the file
(like IOError if the file does not exist) should skip the finally block.
Else Blocks
Use try/except/else to make it clear which exceptions will be handled by your code
and which exceptions will propagate up. When the try block doesn’t raise an exception,
the else block will run. The else block helps you minimize the amount of code in the
try block and improves readability. For example, say you want to load JSON dictionary
data from a string and return the value of a key it contains.
import json

def load_json_key(data, key):
    try:
        result_dict = json.loads(data)  # May raise ValueError
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]  # May raise KeyError
If the data isn’t valid JSON, then decoding with json.loads will raise a
ValueError. The exception is caught by the except block and handled. If decoding is
successful, then the key lookup will occur in the else block. If the key lookup raises any
exceptions, they will propagate up to the caller because they are outside the try block.
The else clause ensures that what follows the try/except is visually distinguished
from the except block. This makes the exception propagation behavior clear.
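The two failure paths can be told apart by looking at the exception’s __cause__. The sketch below repeats load_json_key so that it runs on its own; the describe wrapper and the sample payloads are hypothetical additions for illustration.

```python
import json


def load_json_key(data, key):
    try:
        result_dict = json.loads(data)  # May raise ValueError
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]  # May raise KeyError


def describe(data, key):
    """Hypothetical wrapper that reports which path a call took."""
    try:
        return ('ok', load_json_key(data, key))
    except KeyError as e:
        if isinstance(e.__cause__, ValueError):
            return ('bad json', None)  # Raised by the except block
        return ('missing key', None)   # Propagated from the else block


print(describe('{"foo": "bar"}', 'foo'))
print(describe('{"foo": bad payload', 'foo'))
print(describe('{"foo": "bar"}', 'other'))
```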
Everything Together
Use try/except/else/finally when you want to do it all in one compound
statement. For example, say you want to read a description of work to do from a file,
process it, and then update the file in place. Here, the try block is used to read the file
and process it. The except block is used to handle exceptions from the try block that
are expected. The else block is used to update the file in place and to allow related
exceptions to propagate up. The finally block cleans up the file handle.
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')  # May raise IOError
    try:
        data = handle.read()  # May raise UnicodeDecodeError
        op = json.loads(data)  # May raise ValueError
        value = (
            op['numerator'] /
            op['denominator'])  # May raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)  # May raise IOError
        return value
    finally:
        handle.close()  # Always runs
This layout is especially useful because all of the blocks work together in intuitive ways.
For example, if an exception gets raised in the else block while rewriting the result data,
the finally block will still run and close the file handle.
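To try the function without touching real data, a scratch file in a temporary directory is enough. In this sketch the file name op.json and the sample numbers are my own choices; divide_json is repeated so the block runs by itself.

```python
import json
import os
import tempfile

UNDEFINED = object()


def divide_json(path):
    handle = open(path, 'r+')
    try:
        data = handle.read()
        op = json.loads(data)
        value = op['numerator'] / op['denominator']
    except ZeroDivisionError:
        return UNDEFINED
    else:
        op['result'] = value
        handle.seek(0)
        handle.write(json.dumps(op))
        return value
    finally:
        handle.close()  # Runs on every path above


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'op.json')  # hypothetical scratch file

    # Success path: the else block rewrites the file with the result.
    with open(path, 'w') as f:
        json.dump({'numerator': 1, 'denominator': 10}, f)
    ok = divide_json(path)
    with open(path) as f:
        saved = json.load(f)

    # Expected failure: the except block returns the sentinel.
    with open(path, 'w') as f:
        json.dump({'numerator': 1, 'denominator': 0}, f)
    undefined = divide_json(path)

print(ok, saved, undefined is UNDEFINED)
```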
Things to Remember
The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block.
2. Functions
The first organizational tool programmers use in Python is the function. As in other
programming languages, functions enable you to break large programs into smaller,
simpler pieces. They improve readability and make code more approachable. They allow
for reuse and refactoring.
Functions in Python have a variety of extra features that make the programmer’s life
easier. Some are similar to capabilities in other programming languages, but many are
unique to Python. These extras can make a function’s purpose more obvious. They can
eliminate noise and clarify the intention of callers. They can significantly reduce subtle
bugs that are difficult to find.
Item 14: Prefer Exceptions to Returning None
When writing utility functions, there’s a draw for Python programmers to give special
meaning to the return value of None. It seems to make sense in some cases. For example,
say you want a helper function that divides one number by another. In the case of dividing
by zero, returning None seems natural because the result is undefined.
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
Code using this function can interpret the return value accordingly.
result = divide(x, y)
if result is None:
    print('Invalid inputs')
What happens when the numerator is zero? That will cause the return value to also be zero
(if the denominator is non-zero). This can cause problems when you evaluate the result in
a condition like an if statement. You may accidentally look for any False equivalent
value to indicate errors instead of only looking for None (see Item 4: “Write Helper
Functions Instead of Complex Expressions” for a similar situation).
x, y = 0, 5
result = divide(x, y)
if not result:
    print('Invalid inputs')  # This is wrong!
This is a common mistake in Python code when None has special meaning. This is why
returning None from a function is error prone. There are two ways to reduce the chance of
such errors.
The first way is to split the return value into a two-tuple. The first part of the tuple
indicates whether the operation was a success or failure. The second part is the actual result
that was computed.
def divide(a, b):
    try:
        return True, a / b
    except ZeroDivisionError:
        return False, None
Callers of this function have to unpack the tuple. That forces them to consider the status
part of the tuple instead of just looking at the result of division.
success, result = divide(x, y)
if not success:
    print('Invalid inputs')
The problem is that callers can easily ignore the first part of the tuple (using the
underscore variable name, a Python convention for unused variables). The resulting code
doesn’t look wrong at first glance. This is as bad as just returning None.
_, result = divide(x, y)
if not result:
    print('Invalid inputs')
The second, better way to reduce these errors is to never return None at all. Instead, raise
an exception up to the caller and make them deal with it. Here, I turn a
ZeroDivisionError into a ValueError to indicate to the caller that the input
values are bad:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e
Now the caller should handle the exception for the invalid input case (this behavior should
be documented; see Item 49: “Write Docstrings for Every Function, Class, and Module”).
The caller no longer requires a condition on the return value of the function. If the
function didn’t raise an exception, then the return value must be good. The outcome of
exception handling is clear.
x, y = 5, 2
try:
    result = divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)
>>>
Result is 2.5
Things to Remember
Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.
Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they’re documented.
Item 15: Know How Closures Interact with Variable Scope
Say you want to sort a list of numbers but prioritize one group of numbers to come first.
This pattern is useful when you’re rendering a user interface and want important messages
or exceptional events to be displayed before everything else.
A common way to do this is to pass a helper function as the key argument to a list’s
sort method. The helper’s return value will be used as the value for sorting each item in
the list. The helper can check whether the given item is in the important group and can
vary the sort key accordingly.
def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)
This function works for simple inputs.
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)
>>>
[2, 3, 5, 7, 1, 4, 6, 8]
There are three reasons why this function operates as expected:
Python supports closures: functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument to sort_priority.
Functions are first-class objects in Python, meaning you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, etc. This is how the sort method can accept a closure function as the key argument.
Python has specific rules for comparing tuples. It first compares items in index zero, then index one, then index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.
It’d be nice if this function returned whether higher-priority items were seen at all so the
user interface code can act accordingly. Adding such behavior seems straightforward.
There’s already a closure function for deciding which group each number is in. Why not
also use the closure to flip a flag when high-priority items are seen? Then the function can
return the flag value after it’s been modified by the closure.
Here, I try to do that in a seemingly obvious way:
Clickheretoviewcodeimage
defsort_priority2(numbers,group):
found=False
defhelper(x):
ifxingroup:
found=True#Seemssimple
return(0,x)
return(1,x)
numbers.sort(key=helper)
returnfound
I can run the function on the same inputs as before.
found = sort_priority2(numbers, group)
print('Found:', found)
print(numbers)
>>>
Found: False
[2, 3, 5, 7, 1, 4, 6, 8]
The sorted results are correct, but the found result is wrong. Items from group were
definitely found in numbers, but the function returned False. How could this happen?
When you reference a variable in an expression, the Python interpreter will traverse the
scope to resolve the reference in this order:
1. The current function’s scope
2. Any enclosing scopes (like other containing functions)
3. The scope of the module that contains the code (also called the global scope)
4. The built-in scope (that contains functions like len and str)
If none of these places have a defined variable with the referenced name, then a
NameError exception is raised.
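The lookup order can be observed with a small experiment. This is only a sketch: the names label, outer, inner, missing, and undefined_name are hypothetical and exist only for this illustration.

```python
label = 'module'                 # module (global) scope

def outer():
    label = 'enclosing'          # enclosing scope for inner()
    def inner():
        # No local 'label' exists here, so the enclosing scope is used
        # before the module scope. 'len' resolves in the built-in scope.
        return label, len('abc')
    return inner()

def missing():
    try:
        return undefined_name    # hypothetical name, defined in no scope
    except NameError as e:
        return type(e).__name__

result = outer()
error_name = missing()
print(result, error_name)
```

The enclosing value wins over the module-level one because the enclosing scope is searched first.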
Assigning a value to a variable works differently. If the variable is already defined in the
current scope, then it will just take on the new value. If the variable doesn’t exist in the
current scope, then Python treats the assignment as a variable definition. The scope of the
newly defined variable is the function that contains the assignment.
This assignment behavior explains the wrong return value of the sort_priority2
function. The found variable is assigned to True in the helper closure. The closure’s
assignment is treated as a new variable definition within helper, not as an assignment
within sort_priority2.
def sort_priority2(numbers, group):
    found = False         # Scope: 'sort_priority2'
    def helper(x):
        if x in group:
            found = True  # Scope: 'helper' -- Bad!
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found
Encountering this problem is sometimes called the scoping bug because it can be so
surprising to newbies. But this is the intended result. This behavior prevents local
variables in a function from polluting the containing module. Otherwise, every assignment
within a function would put garbage into the global module scope. Not only would that be
noise, but the interplay of the resulting global variables could cause obscure bugs.
Getting Data Out
In Python 3, there is special syntax for getting data out of a closure. The nonlocal
statement is used to indicate that scope traversal should happen upon assignment for a
specific variable name. The only limit is that nonlocal won’t traverse up to the module-level scope (to avoid polluting globals).
Here, I define the same function again using nonlocal:
def sort_priority3(numbers, group):
    found = False
    def helper(x):
        nonlocal found
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found
The nonlocal statement makes it clear when data is being assigned out of a closure into
another scope. It’s complementary to the global statement, which indicates that a
variable’s assignment should go directly into the module scope.
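As a quick check, the nonlocal version can be run on the same inputs used earlier. The sketch below repeats the function so that it is self-contained; the second call, with a list containing no group members, is an extra case added here for illustration.

```python
def sort_priority3(numbers, group):
    found = False
    def helper(x):
        nonlocal found       # assignments below rebind the enclosing 'found'
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
found = sort_priority3(numbers, group)
no_match = sort_priority3([1, 4], group)   # no item belongs to the group
print('Found:', found)
print(numbers)
```

This time the flag reflects what the closure saw, and it stays False when nothing in the list belongs to the group.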
However, much like the anti-pattern of global variables, I’d caution against using
nonlocal for anything beyond simple functions. The side effects of nonlocal can be
hard to follow. It’s especially hard to understand in long functions where the nonlocal
statements and assignments to associated variables are far apart.
When your usage of nonlocal starts getting complicated, it’s better to wrap your state
in a helper class. Here, I define a class that achieves the same result as the nonlocal
approach. It’s a little longer, but is much easier to read (see Item 23: “Accept Functions for
Simple Interfaces Instead of Classes” for details on the __call__ special method).
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False
    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0, x)
        return (1, x)
sorter = Sorter(group)
numbers.sort(key=sorter)
assert sorter.found is True
Scope in Python 2
Unfortunately, Python 2 doesn’t support the nonlocal keyword. In order to get similar
behavior, you need to use a work-around that takes advantage of Python’s scoping rules.
This approach isn’t pretty, but it’s the common Python idiom.
# Python 2
def sort_priority(numbers, group):
    found = [False]
    def helper(x):
        if x in group:
            found[0] = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found[0]
As explained above, Python will traverse up the scope where the found variable is
referenced to resolve its current value. The trick is that the value for found is a list,
which is mutable. This means that once retrieved, the closure can modify the state of
found to send data out of the inner scope (with found[0] = True).
This approach also works when the variable used to traverse the scope is a dictionary, a
set, or an instance of a class you’ve defined.
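The same trick with a dictionary might look like the following sketch. The function name sort_priority_stats and the 'hits' counter are hypothetical additions for illustration; the code runs unchanged on Python 3.

```python
def sort_priority_stats(numbers, group):
    stats = {'found': False, 'hits': 0}   # mutable state visible to the closure
    def helper(x):
        if x in group:
            # These lines mutate the dictionary; they never rebind the
            # name 'stats', so no new local variable is created.
            stats['found'] = True
            stats['hits'] += 1
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return stats

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
stats = sort_priority_stats(numbers, {2, 3, 5, 7})
print(stats)
```

Because the sort method calls the key function once per item, the counter ends up equal to the number of items that belong to the group.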
Things to Remember
Closure functions can refer to variables from any of the scopes in which they were defined.
By default, closures can’t affect enclosing scopes by assigning variables.
In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
Avoid using nonlocal statements for anything beyond simple functions.
Item 16: Consider Generators Instead of Returning Lists
The simplest choice for functions that produce a sequence of results is to return a list of
items. For example, say you want to find the index of every word in a string. Here, I
accumulate results in a list using the append method and return it at the end of the
function:
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result
This works as expected for some sample input.
address = 'Four score and seven years ago…'
result = index_words(address)
print(result[:3])
>>>
[0, 5, 11]
There are two problems with the index_words function.
The first problem is that the code is a bit dense and noisy. Each time a new result is found,
I call the append method. The method call’s bulk (result.append) deemphasizes the
value being added to the list (index + 1). There is one line for creating the result list
and another for returning it. While the function body contains ~130 characters (without
whitespace), only ~75 characters are important.
A better way to write this function is using a generator. Generators are functions that use
yield expressions. When called, generator functions do not actually run but instead
immediately return an iterator. With each call to the next built-in function, the iterator
will advance the generator to its next yield expression. Each value passed to yield by
the generator will be returned by the iterator to the caller.
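The deferred execution described above can be made visible with a tiny generator. This is a minimal sketch; countdown and the events list are hypothetical names used only to record when the function body runs.

```python
events = []

def countdown(n):
    events.append('started')     # runs only once the first value is requested
    while n > 0:
        yield n
        n -= 1

it = countdown(2)                # returns an iterator; the body has not run
ran_on_call = bool(events)       # still empty at this point

first = next(it)                 # runs the body up to the first yield
second = next(it)                # resumes after the first yield
done = next(it, 'exhausted')     # default is returned instead of StopIteration
print(ran_on_call, first, second, done)
```

Passing a default as the second argument to next is one way to avoid handling StopIteration by hand when stepping through an iterator manually.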
Here, I define a generator function that produces the same results as before:
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
It’s significantly easier to read because all interactions with the result list have been
eliminated. Results are passed to yield expressions instead. The iterator returned by the
generator call can easily be converted to a list by passing it to the list built-in function
(see Item 9: “Consider Generator Expressions for Large Comprehensions” for how this
works).
result = list(index_words_iter(address))
The second problem with index_words is that it requires all results to be stored in the
list before being returned. For huge inputs, this can cause your program to run out of
memory and crash. In contrast, a generator version of this function can easily be adapted
to take inputs of arbitrary length.
Here, I define a generator that streams input from a file one line at a time and yields
outputs one word at a time. The working memory for this function is bounded to the
maximum length of one line of input.
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset
Running the generator produces the same results.
from itertools import islice
with open('/tmp/address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 3)
    print(list(results))
>>>
[0, 5, 11]
The only gotcha of defining generators like this is that the callers must be aware that the
iterators returned are stateful and can’t be reused (see Item 17: “Be Defensive When
Iterating Over Arguments”).
Things to Remember
Using generators can be clearer than the alternative of returning lists of accumulated results.
The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.
Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs.
Item 17: Be Defensive When Iterating Over Arguments
When a function takes a list of objects as a parameter, it’s often important to iterate over
that list multiple times. For example, say you want to analyze tourism numbers for the
U.S. state of Texas. Imagine the data set is the number of visitors to each city (in millions
per year). You’d like to figure out what percentage of overall tourism each city receives.
To do this you need a normalization function. It sums the inputs to determine the total
number of tourists per year. Then it divides each city’s individual visitor count by the total
to find that city’s contribution to the whole.
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
This function works when given a list of visits.
visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
To scale this up, I need to read the data from a file that contains every city in all of Texas.
I define a generator to do this because then I can reuse the same function later when I want
to compute tourism numbers for the whole world, a much larger data set (see Item 16:
“Consider Generators Instead of Returning Lists”).
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
Surprisingly, calling normalize on the generator’s return value produces no results.
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize(it)
print(percentages)
>>>
[]
The cause of this behavior is that an iterator only produces its results a single time. If you
iterate over an iterator or generator that has already raised a StopIteration exception,
you won’t get any results the second time around.
it = read_visits('/tmp/my_numbers.txt')
print(list(it))
print(list(it))  # Already exhausted
>>>
[15, 35, 80]
[]
What’s confusing is that you also won’t get any errors when you iterate over an already
exhausted iterator. for loops, the list constructor, and many other functions throughout
the Python standard library expect the StopIteration exception to be raised during
normal operation. These functions can’t tell the difference between an iterator that has no
output and an iterator that had output and is now exhausted.
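The silence of an exhausted iterator is easy to reproduce without any files. In this sketch an in-memory list stands in for the visit data; that substitution is an assumption made to keep the example self-contained.

```python
it = iter([15, 35, 80])
total = sum(it)              # consumes every value from the iterator
leftover = list(it)          # no error is raised: the result is just empty
second_total = sum(it)       # no error here either

never_had_data = iter([])
# An iterator that never had data is indistinguishable from a spent one.
looks_the_same = list(never_had_data) == leftover
print(total, leftover, second_total, looks_the_same)
```

This is why the first pass in normalize succeeds while the second pass quietly sees nothing.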
To solve this problem, you can explicitly exhaust an input iterator and keep a copy of its
entire contents in a list. You can then iterate over the list version of the data as many times
as you need to. Here’s the same function as before, but it defensively copies the input
iterator:
def normalize_copy(numbers):
    numbers = list(numbers)  # Copy the iterator
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
Now the function works correctly on a generator’s return value.
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize_copy(it)
print(percentages)
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
The problem with this approach is the copy of the input iterator’s contents could be large.
Copying the iterator could cause your program to run out of memory and crash. One way
around this is to accept a function that returns a new iterator each time it’s called.
def normalize_func(get_iter):
    total = sum(get_iter())   # New iterator
    result = []
    for value in get_iter():  # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result
To use normalize_func, you can pass in a lambda expression that calls the generator
and produces a new iterator each time.
percentages = normalize_func(lambda: read_visits(path))
Though it works, having to pass a lambda function like this is clumsy. The better way to
achieve the same result is to provide a new container class that implements the iterator
protocol.
The iterator protocol is how Python for loops and related expressions traverse the
contents of a container type. When Python sees a statement like for x in foo it will
actually call iter(foo). The iter built-in function calls the foo.__iter__ special
method in turn. The __iter__ method must return an iterator object (which itself
implements the __next__ special method). Then the for loop repeatedly calls the
next built-in function on the iterator object until it’s exhausted (and raises a
StopIteration exception).
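Written out by hand, the steps of a for loop look roughly like the sketch below. The function manual_for is a hypothetical name; it approximates what the interpreter does and is not how the loop is actually implemented.

```python
def manual_for(iterable):
    """Collect items the way a for loop would visit them."""
    results = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:
            break                    # the normal end of iteration
        results.append(item)         # stands in for the loop body
    return results

print(manual_for([15, 35, 80]))
print(manual_for('ab'))
```

The StopIteration exception never reaches the caller, which matches the earlier observation that exhaustion produces no visible error.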
It sounds complicated, but practically speaking you can achieve all of this behavior for
your classes by implementing the __iter__ method as a generator. Here, I define an
iterable container class that reads the files containing tourism data:
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
This new container type works correctly when passed to the original function without any
modifications.
visits = ReadVisits(path)
percentages = normalize(visits)
print(percentages)
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
This works because the sum built-in function in normalize will call
ReadVisits.__iter__ to allocate a new iterator object. The for loop to normalize
the numbers will also call __iter__ to allocate a second iterator object. Each of those
iterators will be advanced and exhausted independently, ensuring that each unique
iteration sees all of the input data values. The only downside of this approach is that it
reads the input data multiple times.
Now that you know how containers like ReadVisits work, you can write your
functions to ensure that parameters aren’t just iterators. The protocol states that when an
iterator is passed to the iter built-in function, iter will return the iterator itself. In
contrast, when a container type is passed to iter, a new iterator object will be returned
each time. Thus, you can test an input value for this behavior and raise a TypeError to
reject iterators.
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # An iterator -- bad!
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
This is ideal if you don’t want to copy the full input iterator like normalize_copy
above, but you also need to iterate over the input data multiple times. This function works
as expected for list and ReadVisits inputs because they are containers. It will work
for any type of container that follows the iterator protocol.
visits = [15, 35, 80]
normalize_defensive(visits)  # No error
visits = ReadVisits(path)
normalize_defensive(visits)  # No error
The function will raise an exception if the input is iterable but not a container.
it = iter(visits)
normalize_defensive(it)
>>>
TypeError: Must supply a container
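A different way to express the same check in Python 3 is to test against the Iterator abstract base class from collections.abc. This is a swapped-in alternative, not the approach shown above, and normalize_checked is a hypothetical name.

```python
from collections.abc import Iterator

def normalize_checked(numbers):
    # Generators and other iterators implement both __iter__ and __next__,
    # so they register as Iterator instances. Lists and other containers
    # implement only __iter__ and pass this check.
    if isinstance(numbers, Iterator):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * value / total for value in numbers]

percentages = normalize_checked([15, 35, 80])
try:
    normalize_checked(iter([15, 35, 80]))
except TypeError:
    rejected = True
else:
    rejected = False
print(percentages, rejected)
```

Both checks reject plain iterators; which one reads more clearly is largely a matter of taste.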
Things to Remember
Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
You can easily define your own iterable container type by implementing the __iter__ method as a generator.
You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.
Item 18: Reduce Visual Noise with Variable Positional Arguments
Accepting optional positional arguments (often called star args in reference to the
conventional name for the parameter, *args) can make a function call more clear and
remove visual noise.
For example, say you want to log some debug information. With a fixed number of
arguments, you would need a function that takes a message and a list of values.
def log(message, values):
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))
log('My numbers are', [1, 2])
log('Hi there', [])
>>>
My numbers are: 1, 2
Hi there
Having to pass an empty list when you have no values to log is cumbersome and noisy.
It’d be better to leave out the second argument entirely. You can do this in Python by
prefixing the last positional parameter name with *. The first parameter for the log
message is required, whereas any number of subsequent positional arguments are optional.
The function body doesn’t need to change, only the callers do.
def log(message, *values):  # The only difference
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))
log('My numbers are', 1, 2)
log('Hi there')  # Much better
>>>
My numbers are: 1, 2
Hi there
If you already have a list and want to call a variable argument function like log, you can
do this by using the * operator. This instructs Python to pass items from the sequence as
positional arguments.
favorites = [7, 33, 99]
log('Favorite colors', *favorites)
>>>
Favorite colors: 7, 33, 99
There are two problems with accepting a variable number of positional arguments.
The first issue is that the variable arguments are always turned into a tuple before they are
passed to your function. This means that if the caller of your function uses the * operator
on a generator, it will be iterated until it’s exhausted. The resulting tuple will include every
value from the generator, which could consume a lot of memory and cause your program
to crash.
def my_generator():
    for i in range(10):
        yield i
def my_func(*args):
    print(args)
it = my_generator()
my_func(*it)
>>>
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Functions that accept *args are best for situations where you know the number of inputs
in the argument list will be reasonably small. It’s ideal for function calls that pass many
literals or variable names together. It’s primarily for the convenience of the programmer
and the readability of the code.
The second issue with *args is that you can’t add new positional arguments to your
function in the future without migrating every caller. If you try to add a positional
argument in the front of the argument list, existing callers will subtly break if they aren’t
updated.
def log(sequence, message, *values):
    if not values:
        print('%s: %s' % (sequence, message))
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s: %s' % (sequence, message, values_str))
log(1, 'Favorites', 7, 33)      # New usage is OK
log('Favorite numbers', 7, 33)  # Old usage breaks
>>>
1: Favorites: 7, 33
Favorite numbers: 7: 33
The problem here is that the second call to log used 7 as the message parameter
because a sequence argument wasn’t given. Bugs like this are hard to track down
because the code still runs without raising any exceptions. To avoid this possibility
entirely, you should use keyword-only arguments when you want to extend functions that
accept *args (see Item 21: “Enforce Clarity with Keyword-Only Arguments”).
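One way the keyword-only extension might look in Python 3 is sketched below. The name format_log, the choice to return the string rather than print it, and the None default for sequence are all assumptions made for illustration.

```python
def format_log(message, *values, sequence=None):
    # 'sequence' comes after *values, so it can only be passed by keyword.
    # Existing positional callers therefore keep their meaning.
    prefix = '' if sequence is None else '%s: ' % sequence
    if not values:
        return prefix + message
    values_str = ', '.join(str(x) for x in values)
    return '%s%s: %s' % (prefix, message, values_str)

old_call = format_log('Favorite numbers', 7, 33)        # unchanged behavior
new_call = format_log('Favorites', 7, 33, sequence=1)   # opts in by keyword
print(old_call)
print(new_call)
```

With this shape, 7 can never be mistaken for the message, because nothing was inserted ahead of the existing positional parameters.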
Things to Remember
Functions can accept a variable number of positional arguments by using *args in the def statement.
You can use the items from a sequence as the positional arguments for a function with the * operator.
Using the * operator with a generator may cause your program to run out of memory and crash.
Adding new positional parameters to functions that accept *args can introduce hard-to-find bugs.
Item 19: Provide Optional Behavior with Keyword Arguments
Like most other programming languages, calling a function in Python allows for passing
arguments by position.
def remainder(number, divisor):
    return number % divisor
assert remainder(20, 7) == 6
All positional arguments to Python functions can also be passed by keyword, where the
name of the argument is used in an assignment within the parentheses of a function call.
The keyword arguments can be passed in any order as long as all of the required positional
arguments are specified. You can mix and match keyword and positional arguments. These
calls are equivalent:
remainder(20, 7)
remainder(20, divisor=7)
remainder(number=20, divisor=7)
remainder(divisor=7, number=20)
Positional arguments must be specified before keyword arguments.
remainder(number=20, 7)
>>>
SyntaxError: non-keyword arg after keyword arg
Each argument can only be specified once.
remainder(20, number=7)
>>>
TypeError: remainder() got multiple values for argument 'number'
The flexibility of keyword arguments provides three significant benefits.
The first advantage is that keyword arguments make the function call clearer to new
readers of the code. With the call remainder(20, 7), it’s not evident which argument
is the number and which is the divisor without looking at the implementation of the
remainder function. In the call with keyword arguments, number=20 and
divisor=7 make it immediately obvious which parameter is being used for each
purpose.
The second impact of keyword arguments is that they can have default values specified in
the function definition. This allows a function to provide additional capabilities when you
need them but lets you accept the default behavior most of the time. This can eliminate
repetitive code and reduce noise.
For example, say you want to compute the rate of fluid flowing into a vat. If the vat is also
on a scale, then you could use the difference between two weight measurements at two
different times to determine the flow rate.
def flow_rate(weight_diff, time_diff):
    return weight_diff / time_diff
weight_diff = 0.5
time_diff = 3
flow = flow_rate(weight_diff, time_diff)
print('%.3f kg per second' % flow)
>>>
0.167 kg per second
In the typical case, it’s useful to know the flow rate in kilograms per second. Other times,
it’d be helpful to use the last sensor measurements to approximate larger time scales, like
hours or days. You can provide this behavior in the same function by adding an argument
for the time period scaling factor.
def flow_rate(weight_diff, time_diff, period):
    return (weight_diff / time_diff) * period
The problem is that now you need to specify the period argument every time you call
the function, even in the common case of flow rate per second (where the period is 1).
flow_per_second = flow_rate(weight_diff, time_diff, 1)
To make this less noisy, I can give the period argument a default value.
def flow_rate(weight_diff, time_diff, period=1):
    return (weight_diff / time_diff) * period
The period argument is now optional.
flow_per_second = flow_rate(weight_diff, time_diff)
flow_per_hour = flow_rate(weight_diff, time_diff, period=3600)
This works well for simple default values (it gets tricky for complex default values; see
Item 20: “Use None and Docstrings to Specify Dynamic Default Arguments”).
The third reason to use keyword arguments is that they provide a powerful way to extend a
function’s parameters while remaining backwards compatible with existing callers. This
lets you provide additional functionality without having to migrate a lot of code, reducing
the chance of introducing bugs.
For example, say you want to extend the flow_rate function above to calculate flow
rates in weight units besides kilograms. You can do this by adding a new optional
parameter that provides a conversion rate to your preferred measurement units.
def flow_rate(weight_diff, time_diff,
              period=1, units_per_kg=1):
    # Multiply to convert: kilograms * (units per kilogram) = units
    return ((weight_diff * units_per_kg) / time_diff) * period
The default argument value for units_per_kg is 1, which makes the returned weight
units remain as kilograms. This means that all existing callers will see no change in
behavior. New callers to flow_rate can specify the new keyword argument to see the
new behavior.
pounds_per_hour = flow_rate(weight_diff, time_diff,
                            period=3600, units_per_kg=2.2)
The only problem with this approach is that optional keyword arguments like period
and units_per_kg may still be specified as positional arguments.
pounds_per_hour = flow_rate(weight_diff, time_diff, 3600, 2.2)
Supplying optional arguments positionally can be confusing because it isn’t clear what the
values 3600 and 2.2 correspond to. The best practice is to always specify optional
arguments using the keyword names and never pass them as positional arguments.
Note
Backwards compatibility using optional keyword arguments like this is crucial for
functions that accept *args (see Item 18: “Reduce Visual Noise with Variable
Positional Arguments”). But an even better practice is to use keyword-only
arguments (see Item 21: “Enforce Clarity with Keyword-Only Arguments”).
Things to Remember
Function arguments can be specified by position or by keyword.
Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments.
Keyword arguments with default values make it easy to add new behaviors to a function, especially when the function has existing callers.
Optional keyword arguments should always be passed by keyword instead of by position.
Item 20: Use None and Docstrings to Specify Dynamic Default Arguments
Sometimes you need to use a non-static type as a keyword argument’s default value. For
example, say you want to print logging messages that are marked with the time of the
logged event. In the default case, you want the message to include the time when the
function was called. You might try the following approach, assuming the default
arguments are reevaluated each time the function is called.
from datetime import datetime
from time import sleep
def log(message, when=datetime.now()):
    print('%s: %s' % (when, message))
log('Hi there!')
sleep(0.1)
log('Hi again!')
>>>
2014-11-15 21:10:10.371432: Hi there!
2014-11-15 21:10:10.371432: Hi again!
The timestamps are the same because datetime.now is only executed a single time:
when the function is defined. Default argument values are evaluated only once per module
load, which usually happens when a program starts up. After the module containing this
code is loaded, the datetime.now default argument will never be evaluated again.
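The single evaluation can be observed without relying on clock values. In this sketch, make_default, show, and the calls list are hypothetical names used to count how often the default expression runs.

```python
calls = []

def make_default():
    calls.append('evaluated')      # records each evaluation of the default
    return len(calls)

def show(value=make_default()):    # the default is evaluated here, once,
    return value                   # when the def statement executes

first = show()
second = show()
stored = show.__defaults__         # the saved default values live here
print(calls, first, second, stored)
```

Both calls return the value that was computed when the function was defined, and the expression is never run again.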
The convention for achieving the desired result in Python is to provide a default value of
None and to document the actual behavior in the docstring (see Item 49: “Write
Docstrings for Every Function, Class, and Module”). When your code sees an argument
value of None, you allocate the default value accordingly.
def log(message, when=None):
    """Log a message with a timestamp.
    Args:
        message: Message to print.
        when: datetime of when the message occurred.
            Defaults to the present time.
    """
    when = datetime.now() if when is None else when
    print('%s: %s' % (when, message))
Now the timestamps will be different.
log('Hi there!')
sleep(0.1)
log('Hi again!')
>>>
2014-11-15 21:10:10.472303: Hi there!
2014-11-15 21:10:10.573395: Hi again!
Using None for default argument values is especially important when the arguments are
mutable. For example, say you want to load a value encoded as JSON data. If decoding
the data fails, you want an empty dictionary to be returned by default. You might try this
approach.
import json
def decode(data, default={}):
    try:
        return json.loads(data)
    except ValueError:
        return default
The problem here is the same as the datetime.now example above. The dictionary
specified for default will be shared by all calls to decode because default argument
values are only evaluated once (at module load time). This can cause extremely surprising
behavior.
foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print('Foo:', foo)
print('Bar:', bar)
>>>
Foo: {'stuff': 5, 'meep': 1}
Bar: {'stuff': 5, 'meep': 1}
You’d expect two different dictionaries, each with a single key and value. But modifying
one seems to also modify the other. The culprit is that foo and bar are both equal to the
default parameter. They are the same dictionary object.
assert foo is bar
The fix is to set the keyword argument default value to None and then document the
behavior in the function’s docstring.
def decode(data, default=None):
    """Load JSON data from a string.
    Args:
        data: JSON data to decode.
        default: Value to return if decoding fails.
            Defaults to an empty dictionary.
    """
    if default is None:
        default = {}
    try:
        return json.loads(data)
    except ValueError:
        return default
Now, running the same test code as before produces the expected result.
foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print('Foo:', foo)
print('Bar:', bar)
>>>
Foo: {'stuff': 5}
Bar: {'meep': 1}
Things to Remember
Default arguments are only evaluated once: during function definition at module load time. This can cause odd behaviors for dynamic values (like {} or []).
Use None as the default value for keyword arguments that have a dynamic value. Document the actual default behavior in the function’s docstring.
Item 21: Enforce Clarity with Keyword-Only Arguments
Passing arguments by keyword is a powerful feature of Python functions (see Item 19:
“Provide Optional Behavior with Keyword Arguments”). The flexibility of keyword
arguments enables you to write code that will be clear for your use cases.
For example, say you want to divide one number by another but be very careful about
special cases. Sometimes you want to ignore ZeroDivisionError exceptions and
return infinity instead. Other times, you want to ignore OverflowError exceptions and
return zero instead.
def safe_division(number, divisor, ignore_overflow,
                  ignore_zero_division):
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        else:
            raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        else:
            raise
Using this function is straightforward. This call will ignore the float overflow from
division and will return zero.
result = safe_division(1, 10**500, True, False)
print(result)
>>>
0.0
This call will ignore the error from dividing by zero and will return infinity.
result = safe_division(1, 0, False, True)
print(result)
>>>
inf
The problem is that it’s easy to confuse the position of the two Boolean arguments that
control the exception-ignoring behavior. This can easily cause bugs that are hard to track
down. One way to improve the readability of this code is to use keyword arguments. By
default, the function can be overly cautious and can always re-raise exceptions.
def safe_division_b(number, divisor,
                    ignore_overflow=False,
                    ignore_zero_division=False):
    # …
Then callers can use keyword arguments to specify which of the ignore flags they want to
flip for specific operations, overriding the default behavior.
safe_division_b(1, 10**500, ignore_overflow=True)
safe_division_b(1, 0, ignore_zero_division=True)
The problem is, since these keyword arguments are optional behavior, there’s nothing
forcing callers of your functions to use keyword arguments for clarity. Even with the new
definition of safe_division_b, you can still call it the old way with positional
arguments.
safe_division_b(1, 10**500, True, False)
With complex functions like this, it’s better to require that callers are clear about their
intentions. In Python 3, you can demand clarity by defining your functions with keyword-only arguments. These arguments can only be supplied by keyword, never by position.
Here, I redefine the safe_division function to accept keyword-only arguments. The
* symbol in the argument list indicates the end of positional arguments and the beginning
of keyword-only arguments.
def safe_division_c(number, divisor, *,
                    ignore_overflow=False,
                    ignore_zero_division=False):
    # …
Now, calling the function with positional arguments for the keyword arguments won’t
work.
safe_division_c(1, 10**500, True, False)
>>>
TypeError: safe_division_c() takes 2 positional arguments but 4 were given
Keyword arguments and their default values work as expected.
safe_division_c(1, 0, ignore_zero_division=True)  # OK
try:
    safe_division_c(1, 0)
except ZeroDivisionError:
    pass  # Expected
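For reference, a complete version of the keyword-only function might look like the sketch below. The body is carried over from the original safe_division. The overflow call uses a float numerator (1.0), which is an assumption made here so that converting 10**500 to a float reliably raises OverflowError.

```python
def safe_division_c(number, divisor, *,
                    ignore_overflow=False,
                    ignore_zero_division=False):
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise

inf_result = safe_division_c(1, 0, ignore_zero_division=True)
zero_result = safe_division_c(1.0, 10**500, ignore_overflow=True)

try:
    safe_division_c(1, 0, False, True)   # flags passed by position
except TypeError:
    rejected = True
else:
    rejected = False
print(inf_result, zero_result, rejected)
```

The positional call fails before the division is attempted, so the flags can never be swapped silently.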
Keyword-Only Arguments in Python 2
Unfortunately, Python 2 doesn’t have explicit syntax for specifying keyword-only
arguments like Python 3. But you can achieve the same behavior of raising TypeErrors
for invalid function calls by using the ** operator in argument lists. The ** operator is
similar to the * operator (see Item 18: “Reduce Visual Noise with Variable Positional
Arguments”), except that instead of accepting a variable number of positional arguments,
it accepts any number of keyword arguments, even when they’re not defined.
# Python 2
def print_args(*args, **kwargs):
    print 'Positional:', args
    print 'Keyword:   ', kwargs
print_args(1, 2, foo='bar', stuff='meep')
>>>
Positional: (1, 2)
Keyword:    {'foo': 'bar', 'stuff': 'meep'}
To make safe_division take keyword-only arguments in Python 2, you have the
function accept **kwargs. Then you pop keyword arguments that you expect out of the
kwargs dictionary, using the pop method’s second argument to specify the default value
when the key is missing. Finally, you make sure there are no more keyword arguments left
in kwargs to prevent callers from supplying arguments that are invalid.
# Python 2
def safe_division_d(number, divisor, **kwargs):
    ignore_overflow = kwargs.pop('ignore_overflow', False)
    ignore_zero_div = kwargs.pop('ignore_zero_division', False)
    if kwargs:
        raise TypeError('Unexpected **kwargs: %r' % kwargs)
    # …
Now, you can call the function with or without keyword arguments.
safe_division_d(1, 10)
safe_division_d(1, 0, ignore_zero_division=True)
safe_division_d(1, 10**500, ignore_overflow=True)
Trying to pass keyword-only arguments by position won’t work, just like in Python 3.
safe_division_d(1, 0, False, True)
>>>
TypeError: safe_division_d() takes 2 positional arguments but 4 were given
Trying to pass unexpected keyword arguments also won’t work.
safe_division_d(0, 0, unexpected=True)
>>>
TypeError: Unexpected **kwargs: {'unexpected': True}
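The **kwargs emulation also runs on Python 3, which makes it easy to try out. The sketch below fills in the elided body with the same try/except logic as the original safe_division; that completion is an assumption, since the listing above leaves the body out.

```python
def safe_division_d(number, divisor, **kwargs):
    # Pop the options we understand, falling back to their defaults.
    ignore_overflow = kwargs.pop('ignore_overflow', False)
    ignore_zero_div = kwargs.pop('ignore_zero_division', False)
    if kwargs:
        # Anything left over was not an expected option.
        raise TypeError('Unexpected **kwargs: %r' % kwargs)
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_div:
            return float('inf')
        raise

plain = safe_division_d(1, 10)
inf_result = safe_division_d(1, 0, ignore_zero_division=True)
try:
    safe_division_d(0, 0, unexpected=True)
except TypeError as e:
    message = str(e)
print(plain, inf_result, message)
```

The leftover check runs before the division, so a misspelled option is reported even when the division itself would also have failed.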
Things to Remember
Keyword arguments make the intention of a function call more clear.
Use keyword-only arguments to force callers to supply keyword arguments for potentially confusing functions, especially those that accept multiple Boolean flags.
Python 3 supports explicit syntax for keyword-only arguments in functions.
Python 2 can emulate keyword-only arguments for functions by using **kwargs and manually raising TypeError exceptions.
3. Classes and Inheritance
As an object-oriented programming language, Python supports a full range of features,
such as inheritance, polymorphism, and encapsulation. Getting things done in Python
often requires writing new classes and defining how they interact through their interfaces
and hierarchies.
Python’s classes and inheritance make it easy to express your program’s intended
behaviors with objects. They allow you to improve and expand functionality over time.
They provide flexibility in an environment of changing requirements. Knowing how to use
them well enables you to write maintainable code.
Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples
Python’s built-in dictionary type is wonderful for maintaining dynamic internal state over
the lifetime of an object. By dynamic, I mean situations in which you need to do
bookkeeping for an unexpected set of identifiers. For example, say you want to record the
grades of a set of students whose names aren’t known in advance. You can define a class
to store the names in a dictionary instead of using a predefined attribute for each student.

class SimpleGradebook(object):
    def __init__(self):
        self._grades = {}

    def add_student(self, name):
        self._grades[name] = []

    def report_grade(self, name, score):
        self._grades[name].append(score)

    def average_grade(self, name):
        grades = self._grades[name]
        return sum(grades) / len(grades)

Using the class is simple.

book = SimpleGradebook()
book.add_student('Isaac Newton')
book.report_grade('Isaac Newton', 90)
# …
print(book.average_grade('Isaac Newton'))
>>>
90.0

Dictionaries are so easy to use that there’s a danger of overextending them to write brittle
code. For example, say you want to extend the SimpleGradebook class to keep a list
of grades by subject, not just overall. You can do this by changing the _grades
dictionary to map student names (the keys) to yet another dictionary (the values). The
innermost dictionary will map subjects (the keys) to grades (the values).

class BySubjectGradebook(object):
    def __init__(self):
        self._grades = {}

    def add_student(self, name):
        self._grades[name] = {}

This seems straightforward enough. The report_grade and average_grade
methods will gain quite a bit of complexity to deal with the multilevel dictionary, but it’s
manageable.

    def report_grade(self, name, subject, grade):
        by_subject = self._grades[name]
        grade_list = by_subject.setdefault(subject, [])
        grade_list.append(grade)

    def average_grade(self, name):
        by_subject = self._grades[name]
        total, count = 0, 0
        for grades in by_subject.values():
            total += sum(grades)
            count += len(grades)
        return total / count

Using the class remains simple.

book = BySubjectGradebook()
book.add_student('Albert Einstein')
book.report_grade('Albert Einstein', 'Math', 75)
book.report_grade('Albert Einstein', 'Math', 65)
book.report_grade('Albert Einstein', 'Gym', 90)
book.report_grade('Albert Einstein', 'Gym', 95)

Now, imagine your requirements change again. You also want to track the weight of each
score toward the overall grade in the class so midterms and finals are more important than
pop quizzes. One way to implement this feature is to change the innermost dictionary;
instead of mapping subjects (the keys) to grades (the values), I can use the tuple
(score, weight) as values.

class WeightedGradebook(object):
    # …
    def report_grade(self, name, subject, score, weight):
        by_subject = self._grades[name]
        grade_list = by_subject.setdefault(subject, [])
        grade_list.append((score, weight))

Although the changes to report_grade seem simple (just make the value a tuple), the
average_grade method now has a loop within a loop and is difficult to read.

    def average_grade(self, name):
        by_subject = self._grades[name]
        score_sum, score_count = 0, 0
        for subject, scores in by_subject.items():
            subject_avg, total_weight = 0, 0
            for score, weight in scores:
                # …
        return score_sum / score_count

Using the class has also gotten more difficult. It’s unclear what all of the numbers in the
positional arguments mean.

book.report_grade('Albert Einstein', 'Math', 80, 0.10)

When you see complexity like this happen, it’s time to make the leap from dictionaries
and tuples to a hierarchy of classes.
At first, you didn’t know you’d need to support weighted grades, so the complexity of
additional helper classes seemed unwarranted. Python’s built-in dictionary and tuple types
made it easy to keep going, adding layer after layer to the internal bookkeeping. But you
should avoid doing this for more than one level of nesting (i.e., avoid dictionaries that
contain dictionaries). It makes your code hard to read by other programmers and sets you
up for a maintenance nightmare.
As soon as you realize the bookkeeping is getting complicated, break it all out into classes.
This lets you provide well-defined interfaces that better encapsulate your data. This also
enables you to create a layer of abstraction between your interfaces and your concrete
implementations.

Refactoring to Classes
You can start moving to classes at the bottom of the dependency tree: a single grade. A
class seems too heavyweight for such simple information. A tuple, though, seems
appropriate because grades are immutable. Here, I use the tuple (score, weight) to
track grades in a list:

grades = []
grades.append((95, 0.45))
# …
total = sum(score * weight for score, weight in grades)
total_weight = sum(weight for _, weight in grades)
average_grade = total / total_weight

The problem is that plain tuples are positional. When you want to associate more
information with a grade, like a set of notes from the teacher, you’ll need to rewrite every
usage of the two-tuple to be aware that there are now three items present instead of two.
Here, I use _ (the underscore variable name, a Python convention for unused variables) to
capture the third entry in the tuple and just ignore it:

grades = []
grades.append((95, 0.45, 'Great job'))
# …
total = sum(score * weight for score, weight, _ in grades)
total_weight = sum(weight for _, weight, _ in grades)
average_grade = total / total_weight

This pattern of extending tuples longer and longer is similar to deepening layers of
dictionaries. As soon as you find yourself going longer than a two-tuple, it’s time to
consider another approach.

The namedtuple type in the collections module does exactly what you need. It
lets you easily define tiny, immutable data classes.

import collections
Grade = collections.namedtuple('Grade', ('score', 'weight'))

These classes can be constructed with positional or keyword arguments. The fields are
accessible with named attributes. Having named attributes makes it easy to move from a
namedtuple to your own class later if your requirements change again and you need to
add behaviors to the simple data containers.

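A short sketch of what this looks like in practice, using the same Grade definition. The variable names are illustrative, and _replace is the standard namedtuple method for producing a modified copy.

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))

# Construction works with positional or keyword arguments.
by_position = Grade(95, 0.45)
by_keyword = Grade(score=95, weight=0.45)
assert by_position == by_keyword

# Fields are read through named attributes.
print(by_position.score, by_position.weight)

# Instances are immutable: assigning to a field raises AttributeError.
try:
    by_position.score = 100
except AttributeError:
    print('Grade fields cannot be reassigned')

# _replace returns a modified copy and leaves the original untouched.
curved = by_position._replace(score=100)
```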
Limitations of namedtuple
Although useful in many circumstances, it’s important to understand when
namedtuple can cause more harm than good.
• You can’t specify default argument values for namedtuple classes (Python 3.7
  later added a defaults parameter that relaxes this). This makes them unwieldy
  when your data may have many optional properties. If you find yourself using more
  than a handful of attributes, defining your own class may be a better choice.
• The attribute values of namedtuple instances are still accessible using numerical
  indexes and iteration. Especially in externalized APIs, this can lead to unintentional
  usage that makes it harder to move to a real class later. If you’re not in control of all
  of the usage of your namedtuple instances, it’s better to define your own class.

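Both limitations can be seen in a few lines. The sketch below is illustrative: NotedGrade and WeightedGrade are hypothetical names, and the defaults argument assumes Python 3.7 or later, where namedtuple gained that parameter.

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))
grade = Grade(95, 0.45)

# Positional access and unpacking still work, so callers can come to
# depend on field order.
assert grade[0] == 95
score, weight = grade

# Adding a field later breaks every caller that unpacks two values.
NotedGrade = collections.namedtuple(
    'NotedGrade', ('score', 'weight', 'notes'))
noted = NotedGrade(95, 0.45, 'Great job')
unpack_error = None
try:
    score, weight = noted
except ValueError as e:
    unpack_error = e

# Python 3.7+ only: defaults apply to the rightmost fields.
WeightedGrade = collections.namedtuple(
    'WeightedGrade', ('score', 'weight'), defaults=(1.0,))
assert WeightedGrade(80).weight == 1.0
```

A regular class with attribute-only access would have let the notes field be added without touching existing callers.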
Next, you can write a class to represent a single subject that contains a set of grades.

class Subject(object):
    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total, total_weight = 0, 0
        for grade in self._grades:
            total += grade.score * grade.weight
            total_weight += grade.weight
        return total / total_weight

Then you would write a class to represent a set of subjects that are being studied by a
single student.

class Student(object):
    def __init__(self):
        self._subjects = {}

    def subject(self, name):
        if name not in self._subjects:
            self._subjects[name] = Subject()
        return self._subjects[name]

    def average_grade(self):
        total, count = 0, 0
        for subject in self._subjects.values():
            total += subject.average_grade()
            count += 1
        return total / count

Finally, you’d write a container for all of the students keyed dynamically by their names.

class Gradebook(object):
    def __init__(self):
        self._students = {}

    def student(self, name):
        if name not in self._students:
            self._students[name] = Student()
        return self._students[name]

The line count of these classes is almost double the previous implementation’s size. But
this code is much easier to read. The example driving the classes is also more clear and
extensible.

book = Gradebook()
albert = book.student('Albert Einstein')
math = albert.subject('Math')
math.report_grade(80, 0.10)
# …
print(albert.average_grade())
>>>
81.5

If necessary, you can write backwards-compatible methods to help migrate usage of the
old API style to the new hierarchy of objects.

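One way such backwards-compatible methods might look is sketched below. The two shim methods on Gradebook (report_grade and average_grade taking a student name) are hypothetical additions that mirror the old WeightedGradebook call style; the helper classes are restated compactly so the example runs on its own.

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))


class Subject(object):
    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total = sum(g.score * g.weight for g in self._grades)
        total_weight = sum(g.weight for g in self._grades)
        return total / total_weight


class Student(object):
    def __init__(self):
        self._subjects = {}

    def subject(self, name):
        if name not in self._subjects:
            self._subjects[name] = Subject()
        return self._subjects[name]

    def average_grade(self):
        averages = [s.average_grade() for s in self._subjects.values()]
        return sum(averages) / len(averages)


class Gradebook(object):
    def __init__(self):
        self._students = {}

    def student(self, name):
        if name not in self._students:
            self._students[name] = Student()
        return self._students[name]

    # Hypothetical compatibility shims in the old positional style.
    def report_grade(self, name, subject, score, weight):
        self.student(name).subject(subject).report_grade(score, weight)

    def average_grade(self, name):
        return self.student(name).average_grade()


book = Gradebook()
book.report_grade('Albert Einstein', 'Math', 80, 0.10)  # Old style
book.student('Albert Einstein').subject('Math').report_grade(90, 0.90)
print(book.average_grade('Albert Einstein'))
```

Old call sites keep working while new code moves to the object hierarchy, and the shims can be deleted once nothing calls them.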
Things to Remember
• Avoid making dictionaries with values that are other dictionaries or long tuples.
• Use namedtuple for lightweight, immutable data containers before you need the
  flexibility of a full class.
• Move your bookkeeping code to use multiple helper classes when your internal state
  dictionaries get complicated.

Item 23: Accept Functions for Simple Interfaces Instead of Classes
Many of Python’s built-in APIs allow you to customize behavior by passing in a function.
These hooks are used by APIs to call back your code while they execute. For example, the
list type’s sort method takes an optional key argument that’s used to determine each
index’s value for sorting. Here, I sort a list of names based on their lengths by providing a
lambda expression as the key hook:

names = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
names.sort(key=lambda x: len(x))
print(names)
>>>
['Plato', 'Socrates', 'Aristotle', 'Archimedes']

In other languages, you might expect hooks to be defined by an abstract class. In Python,
many hooks are just stateless functions with well-defined arguments and return values.
Functions are ideal for hooks because they are easier to describe and simpler to define
than classes. Functions work as hooks because Python has first-class functions: Functions
and methods can be passed around and referenced like any other value in the language.

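To make the first-class point concrete, here is a small sketch in which a plain function, a bound method, and a built-in are stored in a list and each passed as the key hook. The names by_length and CaseInsensitive are made up for the illustration.

```python
def by_length(name):
    return len(name)


class CaseInsensitive(object):
    def key(self, name):
        return name.lower()


names = ['Socrates', 'archimedes', 'Plato', 'Aristotle']

# A plain function, a bound method, and a built-in all satisfy the
# same one-argument key interface.
hooks = [by_length, CaseInsensitive().key, len]
for hook in hooks:
    print(sorted(names, key=hook))
```

The sorted built-in never needs to know which kind of callable it received.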
For example, say you want to customize the behavior of the defaultdict class (see
Item 46: “Use Built-in Algorithms and Data Structures” for details). This data structure
allows you to supply a function that will be called each time a missing key is accessed.
The function must return the default value the missing key should have in the dictionary.
Here, I define a hook that logs each time a key is missing and returns 0 for the default
value:

def log_missing():
    print('Key added')
    return 0

Given an initial dictionary and a set of desired increments, I can cause the
log_missing function to run and print twice (for 'red' and 'orange').

from collections import defaultdict

current = {'green': 12, 'blue': 3}
increments = [
    ('red', 5),
    ('blue', 17),
    ('orange', 9),
]
result = defaultdict(log_missing, current)
print('Before:', dict(result))
for key, amount in increments:
    result[key] += amount
print('After: ', dict(result))
>>>
Before: {'green': 12, 'blue': 3}
Key added
Key added
After:  {'orange': 9, 'green': 12, 'blue': 20, 'red': 5}

Supplying functions like log_missing makes APIs easy to build and test because it
separates side effects from deterministic behavior. For example, say you now want the
default value hook passed to defaultdict to count the total number of keys that were
missing. One way to achieve this is using a stateful closure (see Item 15: “Know How
Closures Interact with Variable Scope” for details). Here, I define a helper function that
uses such a closure as the default value hook:

def increment_with_report(current, increments):
    added_count = 0

    def missing():
        nonlocal added_count  # Stateful closure
        added_count += 1
        return 0

    result = defaultdict(missing, current)
    for key, amount in increments:
        result[key] += amount

    return result, added_count

Running this function produces the expected result (2), even though the defaultdict
has no idea that the missing hook maintains state. This is another benefit of accepting
simple functions for interfaces. It’s easy to add functionality later by hiding state in a
closure.

result, count = increment_with_report(current, increments)
assert count == 2

The problem with defining a closure for stateful hooks is that it’s harder to read than the
stateless function example. Another approach is to define a small class that encapsulates
the state you want to track.

class CountMissing(object):
    def __init__(self):
        self.added = 0

    def missing(self):
        self.added += 1
        return 0

In other languages, you might expect that now defaultdict would have to be
modified to accommodate the interface of CountMissing. But in Python, thanks to
first-class functions, you can reference the CountMissing.missing method directly
on an object and pass it to defaultdict as the default value hook. It’s trivial to have a
method satisfy a function interface.

counter = CountMissing()
result = defaultdict(counter.missing, current)  # Method ref
for key, amount in increments:
    result[key] += amount
assert counter.added == 2

Using a helper class like this to provide the behavior of a stateful closure is clearer than
the increment_with_report function above. However, in isolation it’s still not
immediately obvious what the purpose of the CountMissing class is. Who constructs a
CountMissing object? Who calls the missing method? Will the class need other
public methods to be added in the future? Until you see its usage with defaultdict,
the class is a mystery.
To clarify this situation, Python allows classes to define the __call__ special method.
__call__ allows an object to be called just like a function. It also causes the
callable built-in function to return True for such an instance.

class BetterCountMissing(object):
    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1
        return 0

counter = BetterCountMissing()
counter()
assert callable(counter)

Here, I use a BetterCountMissing instance as the default value hook for a
defaultdict to track the number of missing keys that were added:

counter = BetterCountMissing()
result = defaultdict(counter, current)  # Relies on __call__
for key, amount in increments:
    result[key] += amount
assert counter.added == 2

This is much clearer than the CountMissing.missing example. The __call__
method indicates that a class’s instances will be used somewhere a function argument
would also be suitable (like API hooks). It directs new readers of the code to the entry
point that’s responsible for the class’s primary behavior. It provides a strong hint that the
goal of the class is to act as a stateful closure.
Best of all, defaultdict still has no view into what’s going on when you use
__call__. All that defaultdict requires is a function for the default value hook.
Python provides many different ways to satisfy a simple function interface depending on
what you need to accomplish.

Things to Remember
• Instead of defining and instantiating classes, functions are often all you need for
  simple interfaces between components in Python.
• References to functions and methods in Python are first class, meaning they can be
  used in expressions like any other type.
• The __call__ special method enables instances of a class to be called like plain
  Python functions.
• When you need a function to maintain state, consider defining a class that provides
  the __call__ method instead of defining a stateful closure (see Item 15: “Know
  How Closures Interact with Variable Scope”).

Item 24: Use @classmethod Polymorphism to Construct Objects Generically
In Python, not only do the objects support polymorphism, but the classes do as well. What
does that mean, and what is it good for?
Polymorphism is a way for multiple classes in a hierarchy to implement their own unique
versions of a method. This allows many classes to fulfill the same interface or abstract
base class while providing different functionality (see Item 28: “Inherit from
collections.abc for Custom Container Types” for an example).
For example, say you’re writing a MapReduce implementation and you want a common
class to represent the input data. Here, I define such a class with a read method that must
be defined by subclasses:

class InputData(object):
    def read(self):
        raise NotImplementedError

Here, I have a concrete subclass of InputData that reads data from a file on disk:

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        return open(self.path).read()

You could have any number of InputData subclasses like PathInputData and each
of them could implement the standard interface for read to return the bytes of data to
process. Other InputData subclasses could read from the network, decompress data
transparently, etc.

You’d want a similar abstract interface for the MapReduce worker that consumes the input
data in a standard way.

class Worker(object):
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

Here, I define a concrete subclass of Worker to implement the specific MapReduce
function I want to apply, a simple newline counter:

class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

It may look like this implementation is going great, but I’ve reached the biggest hurdle in
all of this. What connects all of these pieces? I have a nice set of classes with reasonable
interfaces and abstractions, but that’s only useful once the objects are constructed.
What’s responsible for building the objects and orchestrating the MapReduce?
The simplest approach is to manually build and connect the objects with some helper
functions. Here, I list the contents of a directory and construct a PathInputData
instance for each file it contains:

import os

def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

Next, I create the LineCountWorker instances using the InputData instances
returned by generate_inputs.

def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

I execute these Worker instances by fanning out the map step to multiple threads (see
Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”). Then, I call reduce
repeatedly to combine the results into one final value.

from threading import Thread

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result

Finally, I connect all of the pieces together in a function to run each step.

def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_workers(inputs)
    return execute(workers)

Running this function on a set of test input files works great.

from tempfile import TemporaryDirectory

def write_test_files(tmpdir):
    # …

with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    result = mapreduce(tmpdir)

print('There are', result, 'lines')
>>>
There are 4360 lines

What’s the problem? The huge issue is that the mapreduce function is not generic at all.
If you want to write another InputData or Worker subclass, you would also have to
rewrite the generate_inputs, create_workers, and mapreduce functions to
match.
This problem boils down to needing a generic way to construct objects. In other
languages, you’d solve this problem with constructor polymorphism, requiring that each
InputData subclass provides a special constructor that can be used generically by the
helper methods that orchestrate the MapReduce. The trouble is that Python only allows for
the single constructor method __init__. It’s unreasonable to require every
InputData subclass to have a compatible constructor.
The best way to solve this problem is with @classmethod polymorphism. This is
exactly like the instance method polymorphism I used for InputData.read, except
that it applies to whole classes instead of their constructed objects.

Let me apply this idea to the MapReduce classes. Here, I extend the InputData class
with a generic class method that’s responsible for creating new InputData instances
using a common interface:

class GenericInputData(object):
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

I have generate_inputs take a dictionary with a set of configuration parameters that
are up to the InputData concrete subclass to interpret. Here, I use the config to find
the directory to list for input files:

class PathInputData(GenericInputData):
    # …
    def read(self):
        return open(self.path).read()

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))

Similarly, I can make the create_workers helper part of the GenericWorker class.
Here, I use the input_class parameter, which must be a subclass of
GenericInputData, to generate the necessary inputs. I construct instances of the
GenericWorker concrete subclass using cls() as a generic constructor.

class GenericWorker(object):
    # …
    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

Note that the call to input_class.generate_inputs above is the class
polymorphism I’m trying to show. You can also see how create_workers calling cls
provides an alternate way to construct GenericWorker objects besides using the
__init__ method directly.

The effect on my concrete GenericWorker subclass is nothing more than changing its
parent class.

class LineCountWorker(GenericWorker):
    # …

And finally, I can rewrite the mapreduce function to be completely generic.

def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

Running the new worker on a set of test files produces the same result as the old
implementation. The difference is that the mapreduce function requires more
parameters so that it can operate generically.

with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    config = {'data_dir': tmpdir}
    result = mapreduce(LineCountWorker, PathInputData, config)

Now you can write other GenericInputData and GenericWorker classes as you
wish and not have to rewrite any of the glue code.

Things to Remember
• Python only supports a single constructor per class, the __init__ method.
• Use @classmethod to define alternative constructors for your classes.
• Use class method polymorphism to provide generic ways to build and connect
  concrete subclasses.

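Outside the MapReduce setting, the alternative-constructor idea can be shown in a few lines. The Temperature and OvenTemperature classes below are hypothetical examples invented for this sketch.

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is whichever class the method was called on, so
        # subclasses get instances of their own type.
        return cls((fahrenheit - 32) * 5 / 9)


class OvenTemperature(Temperature):
    def is_hot(self):
        return self.celsius > 150


boiling = Temperature.from_fahrenheit(212)
oven = OvenTemperature.from_fahrenheit(392)
print(type(boiling).__name__, boiling.celsius)
print(type(oven).__name__, oven.celsius, oven.is_hot())
```

Because from_fahrenheit builds its result through cls rather than naming Temperature directly, the subclass inherits a working alternative constructor without overriding anything.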
Item 25: Initialize Parent Classes with super
The old way to initialize a parent class from a child class is to directly call the parent
class’s __init__ method with the child instance.

class MyBaseClass(object):
    def __init__(self, value):
        self.value = value

class MyChildClass(MyBaseClass):
    def __init__(self):
        MyBaseClass.__init__(self, 5)

This approach works fine for simple hierarchies but breaks down in many cases.
If your class is affected by multiple inheritance (something to avoid in general; see Item
26: “Use Multiple Inheritance Only for Mix-in Utility Classes”), calling the superclasses’
__init__ methods directly can lead to unpredictable behavior.

One problem is that the __init__ call order isn’t specified across all subclasses. For
example, here I define two parent classes that operate on the instance’s value field:

class TimesTwo(object):
    def __init__(self):
        self.value *= 2

class PlusFive(object):
    def __init__(self):
        self.value += 5

This class defines its parent classes in one ordering.

class OneWay(MyBaseClass, TimesTwo, PlusFive):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)

And constructing it produces a result that matches the parent class ordering.

foo = OneWay(5)
print('First ordering is (5 * 2) + 5 =', foo.value)
>>>
First ordering is (5 * 2) + 5 = 15

Here’s another class that defines the same parent classes but in a different ordering:

class AnotherWay(MyBaseClass, PlusFive, TimesTwo):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)

However, I left the calls to the parent class constructors PlusFive.__init__ and
TimesTwo.__init__ in the same order as before, causing this class’s behavior not to
match the order of the parent classes in its definition.

bar = AnotherWay(5)
print('Second ordering still is', bar.value)
>>>
Second ordering still is 15

Another problem occurs with diamond inheritance. Diamond inheritance happens when a
subclass inherits from two separate classes that have the same superclass somewhere in
the hierarchy. Diamond inheritance causes the common superclass’s __init__ method
to run multiple times, causing unexpected behavior. For example, here I define two child
classes that inherit from MyBaseClass.

class TimesFive(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value *= 5

class PlusTwo(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value += 2

Then, I define a child class that inherits from both of these classes, making
MyBaseClass the top of the diamond.

class ThisWay(TimesFive, PlusTwo):
    def __init__(self, value):
        TimesFive.__init__(self, value)
        PlusTwo.__init__(self, value)

foo = ThisWay(5)
print('Should be (5 * 5) + 2 = 27 but is', foo.value)
>>>
Should be (5 * 5) + 2 = 27 but is 7

The output should be 27 because (5 * 5) + 2 = 27. But the call to the second
parent class’s constructor, PlusTwo.__init__, causes self.value to be reset back
to 5 when MyBaseClass.__init__ gets called a second time.

To solve these problems, Python 2.2 added the super built-in function and defined the
method resolution order (MRO). The MRO standardizes which superclasses are initialized
before others (e.g., depth-first, left-to-right). It also ensures that common superclasses in
diamond hierarchies are only run once.
Here, I create a diamond-shaped class hierarchy again, but this time I use super (in the
Python 2 style) to initialize the parent class:

# Python 2
class TimesFiveCorrect(MyBaseClass):
    def __init__(self, value):
        super(TimesFiveCorrect, self).__init__(value)
        self.value *= 5

class PlusTwoCorrect(MyBaseClass):
    def __init__(self, value):
        super(PlusTwoCorrect, self).__init__(value)
        self.value += 2

Now the top part of the diamond, MyBaseClass.__init__, is only run a single time.
The other parent classes are run in the order specified in the class statement.

# Python 2
class GoodWay(TimesFiveCorrect, PlusTwoCorrect):
    def __init__(self, value):
        super(GoodWay, self).__init__(value)

foo = GoodWay(5)
print 'Should be 5 * (5 + 2) = 35 and is', foo.value
>>>
Should be 5 * (5 + 2) = 35 and is 35

This order may seem backwards at first. Shouldn’t TimesFiveCorrect.__init__
have run first? Shouldn’t the result be (5 * 5) + 2 = 27? The answer is no. This
ordering matches what the MRO defines for this class. The MRO ordering is available on
a class method called mro.

from pprint import pprint
pprint(GoodWay.mro())
>>>
[<class '__main__.GoodWay'>,
 <class '__main__.TimesFiveCorrect'>,
 <class '__main__.PlusTwoCorrect'>,
 <class '__main__.MyBaseClass'>,
 <class 'object'>]

When I call GoodWay(5), it in turn calls TimesFiveCorrect.__init__, which
calls PlusTwoCorrect.__init__, which calls MyBaseClass.__init__. Once
this reaches the top of the diamond, then all of the initialization methods actually do their
work in the opposite order from how their __init__ functions were called.
MyBaseClass.__init__ assigns the value to 5. PlusTwoCorrect.__init__
adds 2 to make value equal 7. TimesFiveCorrect.__init__ multiplies it by 5 to
make value equal 35.

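The enter-then-unwind behavior described here can be observed directly by recording each step. The sketch below is written for Python 3 and uses shortened, hypothetical class names (Base, TimesFive, PlusTwo, Diamond) plus a module-level calls list that exists only for the demonstration.

```python
calls = []


class Base(object):
    def __init__(self, value):
        calls.append('Base body')
        self.value = value


class TimesFive(Base):
    def __init__(self, value):
        calls.append('TimesFive enter')
        super().__init__(value)
        self.value *= 5
        calls.append('TimesFive body')


class PlusTwo(Base):
    def __init__(self, value):
        calls.append('PlusTwo enter')
        super().__init__(value)
        self.value += 2
        calls.append('PlusTwo body')


class Diamond(TimesFive, PlusTwo):
    def __init__(self, value):
        super().__init__(value)


foo = Diamond(5)
print(calls)
print(foo.value)
```

The __init__ methods are entered in MRO order, but each one does its arithmetic only after its super call returns, which is why the addition happens before the multiplication.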
The super built-in function works well, but it still has two noticeable problems in Python 2:
• Its syntax is a bit verbose. You have to specify the class you’re in, the self object,
  the method name (usually __init__), and all the arguments. This construction can
  be confusing to new Python programmers.
• You have to specify the current class by name in the call to super. If you ever
  change the class’s name (a very common activity when improving a class hierarchy),
  you also need to update every call to super.
Thankfully, Python 3 fixes these issues by making calls to super with no arguments
equivalent to calling super with __class__ and self specified. In Python 3, you
should always use super because it’s clear, concise, and always does the right thing.

class Explicit(MyBaseClass):
    def __init__(self, value):
        super(__class__, self).__init__(value * 2)

class Implicit(MyBaseClass):
    def __init__(self, value):
        super().__init__(value * 2)

assert Explicit(10).value == Implicit(10).value

This works because Python 3 lets you reliably reference the current class in methods using
the __class__ variable. This doesn’t work in Python 2 because __class__ isn’t
defined. You may guess that you could use self.__class__ as an argument to
super, but this breaks because of the way super is implemented in Python 2.

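Why self.__class__ is the wrong argument can be seen with a short experiment. The sketch below is written for Python 3 so it runs as-is (in Python 2 the same mistake would surface as a RuntimeError for exceeding the recursion depth); Base, Broken, and BrokenChild are hypothetical names.

```python
class Base(object):
    def __init__(self, value):
        self.value = value


class Broken(Base):
    def __init__(self, value):
        # self.__class__ is the runtime type of the instance, not the
        # class that defines this method.
        super(self.__class__, self).__init__(value)


class BrokenChild(Broken):
    pass


assert Broken(5).value == 5  # Works only while nobody subclasses it

try:
    BrokenChild(5)
except RecursionError:
    print('Broken.__init__ kept calling itself')
```

For a BrokenChild instance, self.__class__ is BrokenChild, so the super call looks up the class after BrokenChild in the MRO, which is Broken again, and the method re-enters itself until the interpreter gives up.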
Things to Remember
• Python’s standard method resolution order (MRO) solves the problems of superclass
  initialization order and diamond inheritance.
• Always use the super built-in function to initialize parent classes.

Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes
Python is an object-oriented language with built-in facilities for making multiple
inheritance tractable (see Item 25: “Initialize Parent Classes with super”). However, it’s
better to avoid multiple inheritance altogether.
If you find yourself desiring the convenience and encapsulation that comes with multiple
inheritance, consider writing a mix-in instead. A mix-in is a small class that only defines a
set of additional methods that a class should provide. Mix-in classes don’t define their
own instance attributes nor require their __init__ constructor to be called.
Writing mix-ins is easy because Python makes it trivial to inspect the current state of any
object regardless of its type. Dynamic inspection lets you write generic functionality a
single time, in a mix-in, that can be applied to many other classes. Mix-ins can be
composed and layered to minimize repetitive code and maximize reuse.
For example, say you want the ability to convert a Python object from its in-memory
representation to a dictionary that’s ready for serialization. Why not write this
functionality generically so you can use it with all of your classes?
Here, I define an example mix-in that accomplishes this with a new public method that’s
added to any class that inherits from it:

class ToDictMixin(object):
    def to_dict(self):
        return self._traverse_dict(self.__dict__)

The implementation details are straightforward and rely on dynamic attribute access using
hasattr, dynamic type inspection with isinstance, and accessing the instance
dictionary __dict__.

    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output

    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        elif isinstance(value, dict):
            return self._traverse_dict(value)
        elif isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        elif hasattr(value, '__dict__'):
            return self._traverse_dict(value.__dict__)
        else:
            return value

Here, I define an example class that uses the mix-in to make a dictionary representation of
a binary tree:

class BinaryTree(ToDictMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

Translating a large number of related Python objects into a dictionary becomes easy.

tree = BinaryTree(10,
    left=BinaryTree(7, right=BinaryTree(9)),
    right=BinaryTree(13, left=BinaryTree(11)))
print(tree.to_dict())
>>>
{'left': {'left': None,
          'right': {'left': None, 'right': None, 'value': 9},
          'value': 7},
 'right': {'left': {'left': None, 'right': None, 'value': 11},
           'right': None,
           'value': 13},
 'value': 10}

The best part about mix-ins is that you can make their generic functionality pluggable so
behaviors can be overridden when required. For example, here I define a subclass of
BinaryTree that holds a reference to its parent. This circular reference would cause the
default implementation of ToDictMixin.to_dict to loop forever.

class BinaryTreeWithParent(BinaryTree):
    def __init__(self, value, left=None,
                 right=None, parent=None):
        super().__init__(value, left=left, right=right)
        self.parent = parent

The solution is to override the ToDictMixin._traverse method in the
BinaryTreeWithParent class to only process values that matter, preventing cycles
encountered by the mix-in. Here, I override the _traverse method to not traverse the
parent and just insert its numerical value:

    def _traverse(self, key, value):
        if (isinstance(value, BinaryTreeWithParent) and
                key == 'parent'):
            return value.value  # Prevent cycles
        else:
            return super()._traverse(key, value)

Calling BinaryTreeWithParent.to_dict will work without issue because the
circular referencing properties aren’t followed.

root = BinaryTreeWithParent(10)
root.left = BinaryTreeWithParent(7, parent=root)
root.left.right = BinaryTreeWithParent(9, parent=root.left)
print(root.to_dict())
>>>
{'left': {'left': None,
          'parent': 10,
          'right': {'left': None,
                    'parent': 7,
                    'right': None,
                    'value': 9},
          'value': 7},
 'parent': None,
 'right': None,
 'value': 10}

By defining BinaryTreeWithParent._traverse, I’ve also enabled any class that
has an attribute of type BinaryTreeWithParent to automatically work with
ToDictMixin.

class NamedSubTree(ToDictMixin):
    def __init__(self, name, tree_with_parent):
        self.name = name
        self.tree_with_parent = tree_with_parent

my_tree = NamedSubTree('foobar', root.left.right)
print(my_tree.to_dict())  # No infinite loop
>>>
{'name': 'foobar',
 'tree_with_parent': {'left': None,
                      'parent': 7,
                      'right': None,
                      'value': 9}}

Mix-ins can also be composed together. For example, say you want a mix-in that provides
generic JSON serialization for any class. You can do this by assuming that a class provides
a to_dict method (which may or may not be provided by the ToDictMixin class).

import json

class JsonMixin(object):
    @classmethod
    def from_json(cls, data):
        # Rebuild an instance from the JSON object's keys as keyword arguments
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        # Relies on the host class supplying to_dict
        return json.dumps(self.to_dict())

Note how the JsonMixin class defines both instance methods and class methods. Mix-ins
let you add either kind of behavior. In this example, the only requirements of the
JsonMixin are that the class has a to_dict method and its __init__ method takes
keyword arguments (see Item 19: “Provide Optional Behavior with Keyword
Arguments”).
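To make those two requirements concrete, here is a minimal sketch in which a class supplies its own to_dict instead of inheriting one. The Point class and its fields are hypothetical and exist only for illustration; JsonMixin is repeated so the snippet runs on its own.

```python
import json


class JsonMixin(object):
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        return json.dumps(self.to_dict())


class Point(JsonMixin):
    """Hypothetical class: keyword-argument __init__ plus a hand-written to_dict."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def to_dict(self):
        return {'x': self.x, 'y': self.y}


original = Point(x=3, y=4)
restored = Point.from_json(original.to_json())
print(restored.to_dict())
```

The mix-in never checks where to_dict comes from; it only assumes the method exists and that the keys it returns are accepted by __init__.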
This mix-in makes it simple to create hierarchies of utility classes that can be serialized to
and from JSON with little boilerplate. For example, here I have a hierarchy of data classes
representing parts of a datacenter topology:

class DatacenterRack(ToDictMixin, JsonMixin):
    def __init__(self, switch=None, machines=None):
        self.switch = Switch(**switch)
        self.machines = [
            Machine(**kwargs) for kwargs in machines]

class Switch(ToDictMixin, JsonMixin):
    # …

class Machine(ToDictMixin, JsonMixin):
    # …

Serializing these classes to and from JSON is simple. Here, I verify that the data is able to
be sent round-trip through serializing and deserializing:

serialized = """{
    "switch": {"ports": 5, "speed": 1e9},
    "machines": [
        {"cores": 8, "ram": 32e9, "disk": 5e12},
        {"cores": 4, "ram": 16e9, "disk": 1e12},
        {"cores": 2, "ram": 4e9, "disk": 500e9}
    ]
}"""

deserialized = DatacenterRack.from_json(serialized)
roundtrip = deserialized.to_json()
assert json.loads(serialized) == json.loads(roundtrip)

When you use mix-ins like this, it’s also fine if the class already inherits from
JsonMixin higher up in the object hierarchy. The resulting class will behave the same
way.
Things to Remember
Avoid using multiple inheritance if mix-in classes can achieve the same outcome.
Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.
Compose mix-ins to create complex functionality from simple behaviors.
Item 27: Prefer Public Attributes Over Private Ones
In Python, there are only two types of attribute visibility for a class’s attributes: public and
private.

class MyObject(object):
    def __init__(self):
        self.public_field = 5
        self.__private_field = 10

    def get_private_field(self):
        return self.__private_field

Public attributes can be accessed by anyone using the dot operator on the object.

foo = MyObject()
assert foo.public_field == 5

Private fields are specified by prefixing an attribute’s name with a double underscore.
They can be accessed directly by methods of the containing class.

assert foo.get_private_field() == 10

Directly accessing private fields from outside the class raises an exception.

foo.__private_field

>>>
AttributeError: 'MyObject' object has no attribute '__private_field'
Class methods also have access to private attributes because they are declared within the
surrounding class block.

class MyOtherObject(object):
    def __init__(self):
        self.__private_field = 71

    @classmethod
    def get_private_field_of_instance(cls, instance):
        return instance.__private_field

bar = MyOtherObject()
assert MyOtherObject.get_private_field_of_instance(bar) == 71

As you’d expect with private fields, a subclass can’t access its parent class’s private fields.

class MyParentObject(object):
    def __init__(self):
        self.__private_field = 71

class MyChildObject(MyParentObject):
    def get_private_field(self):
        return self.__private_field

baz = MyChildObject()
baz.get_private_field()

>>>
AttributeError: 'MyChildObject' object has no attribute '_MyChildObject__private_field'
The private attribute behavior is implemented with a simple transformation of the attribute
name. When the Python compiler sees private attribute access in methods like
MyChildObject.get_private_field, it translates __private_field to
access _MyChildObject__private_field instead. In this example,
__private_field was only defined in MyParentObject.__init__, meaning
the private attribute’s real name is _MyParentObject__private_field.
Accessing the parent’s private attribute from the child class fails simply because the
transformed attribute name doesn’t match.
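One consequence of this name transformation is that a parent and a child can each assign the "same" private name without colliding. The following is a small sketch with hypothetical class names (Parent, Child) and a hypothetical attribute (__token), not code from the examples above:

```python
class Parent(object):
    def __init__(self):
        self.__token = 'parent'  # Stored as _Parent__token


class Child(Parent):
    def __init__(self):
        super().__init__()
        self.__token = 'child'   # Stored as _Child__token

    def tokens(self):
        # Inside Child, self.__token is rewritten to self._Child__token.
        # The parent's copy is reachable only through its transformed name.
        return self.__token, self._Parent__token


c = Child()
print(sorted(vars(c)))
print(c.tokens())
```

The instance dictionary ends up with two separate entries, one per defining class, which is why the lookup in the child fails when only the parent assigned the attribute.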
Knowing this scheme, you can easily access the private attributes of any class, from a
subclass or externally, without asking for permission.

assert baz._MyParentObject__private_field == 71

If you look in the object’s attribute dictionary, you’ll see that private attributes are actually
stored with the names as they appear after the transformation.

print(baz.__dict__)

>>>
{'_MyParentObject__private_field': 71}

Why doesn’t the syntax for private attributes actually enforce strict visibility? The
simplest answer is one often-quoted motto of Python: “We are all consenting adults here.”
Python programmers believe that the benefits of being open outweigh the downsides of
being closed.
Beyond that, having the ability to hook language features like attribute access (see Item
32: “Use __getattr__, __getattribute__, and __setattr__ for Lazy
Attributes”) enables you to mess around with the internals of objects whenever you wish.
If you can do that, what is the value of Python trying to prevent private attribute access
otherwise?
To minimize the damage of accessing internals unknowingly, Python programmers follow
a naming convention defined in the style guide (see Item 2: “Follow the PEP 8 Style
Guide”). Fields prefixed by a single underscore (like _protected_field) are
protected, meaning external users of the class should proceed with caution.
However, many programmers who are new to Python use private fields to indicate an
internal API that shouldn’t be accessed by subclasses or externally.

class MyClass(object):
    def __init__(self, value):
        self.__value = value

    def get_value(self):
        return str(self.__value)

foo = MyClass(5)
assert foo.get_value() == '5'

This is the wrong approach. Inevitably someone, including you, will want to subclass your
class to add new behavior or to work around deficiencies in existing methods (like above,
how MyClass.get_value always returns a string). By choosing private attributes,
you’re only making subclass overrides and extensions cumbersome and brittle. Your
potential subclassers will still access the private fields when they absolutely need to do so.

class MyIntegerSubclass(MyClass):
    def get_value(self):
        return int(self._MyClass__value)

foo = MyIntegerSubclass(5)
assert foo.get_value() == 5
But if the class hierarchy changes beneath you, these classes will break because the private
references are no longer valid. Here, the MyIntegerSubclass class’s immediate
parent, MyClass, has had another parent class added called MyBaseClass:

class MyBaseClass(object):
    def __init__(self, value):
        self.__value = value
    # …

class MyClass(MyBaseClass):
    # …

class MyIntegerSubclass(MyClass):
    def get_value(self):
        return int(self._MyClass__value)

The __value attribute is now assigned in the MyBaseClass parent class, not the
MyClass parent. That causes the private variable reference self._MyClass__value
to break in MyIntegerSubclass.

foo = MyIntegerSubclass(5)
foo.get_value()

>>>
AttributeError: 'MyIntegerSubclass' object has no attribute '_MyClass__value'
In general, it’s better to err on the side of allowing subclasses to do more by using
protected attributes. Document each protected field and explain which are internal APIs
available to subclasses and which should be left alone entirely. This is as much advice to
other programmers as it is guidance for your future self on how to extend your own code
safely.

class MyClass(object):
    def __init__(self, value):
        # This stores the user-supplied value for the object.
        # It should be coercible to a string. Once assigned for
        # the object it should be treated as immutable.
        self._value = value
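As a sketch of why the protected form holds up better, here is the earlier hierarchy change replayed with a single-underscore attribute. The get_value bodies are filled in by assumption to mirror the earlier examples; nothing here is new API.

```python
class MyBaseClass(object):
    def __init__(self, value):
        # Protected: subclasses may read this, but should treat it as immutable.
        self._value = value


class MyClass(MyBaseClass):
    def get_value(self):
        return str(self._value)


class MyIntegerSubclass(MyClass):
    def get_value(self):
        # No class name is baked into the attribute, so it doesn't matter
        # which ancestor actually assigned _value.
        return int(self._value)


foo = MyIntegerSubclass(5)
print(foo.get_value())
```

Because the attribute name carries no reference to the class that defined it, moving the assignment from MyClass up to MyBaseClass leaves the subclass working.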
The only time to seriously consider using private attributes is when you’re worried about
naming conflicts with subclasses. This problem occurs when a child class unwittingly
defines an attribute that was already defined by its parent class.

class ApiClass(object):
    def __init__(self):
        self._value = 5

    def get(self):
        return self._value

class Child(ApiClass):
    def __init__(self):
        super().__init__()
        self._value = 'hello'  # Conflicts

a = Child()
print(a.get(), 'and', a._value, 'should be different')

>>>
hello and hello should be different

This is primarily a concern with classes that are part of a public API; the subclasses are
out of your control, so you can’t refactor to fix the problem. Such a conflict is especially
possible with attribute names that are very common (like value). To reduce the risk of
this happening, you can use a private attribute in the parent class to ensure that there are
no attribute names that overlap with child classes.

class ApiClass(object):
    def __init__(self):
        self.__value = 5

    def get(self):
        return self.__value

class Child(ApiClass):
    def __init__(self):
        super().__init__()
        self._value = 'hello'  # OK!

a = Child()
print(a.get(), 'and', a._value, 'are different')

>>>
5 and hello are different
Things to Remember
Private attributes aren’t rigorously enforced by the Python compiler.
Plan from the beginning to allow subclasses to do more with your internal APIs and attributes instead of locking them out by default.
Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.
Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.
Item 28: Inherit from collections.abc for Custom Container Types
Much of programming in Python is defining classes that contain data and describing how
such objects relate to each other. Every Python class is a container of some kind,
encapsulating attributes and functionality together. Python also provides built-in container
types for managing data: lists, tuples, sets, and dictionaries.
When you’re designing classes for simple use cases like sequences, it’s natural that you’d
want to subclass Python’s built-in list type directly. For example, say you want to create
your own custom list type that has additional methods for counting the frequency of its
members.

class FrequencyList(list):
    def __init__(self, members):
        super().__init__(members)

    def frequency(self):
        counts = {}
        for item in self:
            counts.setdefault(item, 0)
            counts[item] += 1
        return counts

By subclassing list, you get all of list’s standard functionality and preserve the
semantics familiar to all Python programmers. Your additional methods can add any
custom behaviors you need.

foo = FrequencyList(['a', 'b', 'a', 'c', 'b', 'a', 'd'])
print('Length is', len(foo))
foo.pop()
print('After pop:', repr(foo))
print('Frequency:', foo.frequency())

>>>
Length is 7
After pop: ['a', 'b', 'a', 'c', 'b', 'a']
Frequency: {'a': 3, 'c': 1, 'b': 2}
Now, imagine you want to provide an object that feels like a list, allowing indexing, but
isn’t a list subclass. For example, say you want to provide sequence semantics (like
list or tuple) for a binary tree class.

class BinaryNode(object):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

How do you make this act like a sequence type? Python implements its container
behaviors with instance methods that have special names. When you access a sequence
item by index:

bar = [1, 2, 3]
bar[0]

it will be interpreted as:

bar.__getitem__(0)

To make the BinaryNode class act like a sequence, you can provide a custom
implementation of __getitem__ that traverses the object tree depth first.

class IndexableNode(BinaryNode):
    def _search(self, count, index):
        # …
        # Returns (found, count)

    def __getitem__(self, index):
        found, _ = self._search(0, index)
        if not found:
            raise IndexError('Index out of range')
        return found.value
You can construct your binary tree as usual.

tree = IndexableNode(
    10,
    left=IndexableNode(
        5,
        left=IndexableNode(2),
        right=IndexableNode(
            6, right=IndexableNode(7))),
    right=IndexableNode(
        15, left=IndexableNode(11)))

But you can also access it like a list in addition to tree traversal.

print('LRR =', tree.left.right.right.value)
print('Index 0 =', tree[0])
print('Index 1 =', tree[1])
print('11 in the tree?', 11 in tree)
print('17 in the tree?', 17 in tree)
print('Tree is', list(tree))

>>>
LRR = 7
Index 0 = 2
Index 1 = 5
11 in the tree? True
17 in the tree? False
Tree is [2, 5, 6, 7, 10, 11, 15]
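The body of _search is elided in the listing above. The following is one possible way to fill it in, offered as a sketch under the assumption that it performs an in-order, depth-first walk and returns a (found, count) pair; it is not necessarily the author’s implementation.

```python
class BinaryNode(object):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


class IndexableNode(BinaryNode):
    def _search(self, count, index):
        # Assumed behavior: visit left subtree, this node, then right subtree,
        # counting nodes until the count matches the requested index.
        found = None
        if self.left is not None:
            found, count = self.left._search(count, index)
        if found is None:
            if count == index:
                found = self
            else:
                count += 1
        if found is None and self.right is not None:
            found, count = self.right._search(count, index)
        return found, count

    def __getitem__(self, index):
        found, _ = self._search(0, index)
        if found is None:
            raise IndexError('Index out of range')
        return found.value


tree = IndexableNode(
    10,
    left=IndexableNode(
        5,
        left=IndexableNode(2),
        right=IndexableNode(6, right=IndexableNode(7))),
    right=IndexableNode(15, left=IndexableNode(11)))

print(list(tree))
```

Passing an index that never matches (such as None) makes the walk visit every node, so the returned count equals the number of nodes in the tree.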
The problem is that implementing __getitem__ isn’t enough to provide all of the
sequence semantics you’d expect.

len(tree)

>>>
TypeError: object of type 'IndexableNode' has no len()

The len built-in function requires another special method named __len__ that must
have an implementation for your custom sequence type.

class SequenceNode(IndexableNode):
    def __len__(self):
        _, count = self._search(0, None)
        return count

tree = SequenceNode(
    # …
)

print('Tree has %d nodes' % len(tree))

>>>
Tree has 7 nodes

Unfortunately, this still isn’t enough. Also missing are the count and index methods
that a Python programmer would expect to see on a sequence like list or tuple.
Defining your own container types is much harder than it looks.
To avoid this difficulty throughout the Python universe, the built-in collections.abc
module defines a set of abstract base classes that provide all of the typical methods for
each container type. When you subclass from these abstract base classes and forget to
implement required methods, the module will tell you something is wrong.

from collections.abc import Sequence

class BadType(Sequence):
    pass

foo = BadType()

>>>
TypeError: Can't instantiate abstract class BadType with abstract methods __getitem__, __len__

When you do implement all of the methods required by an abstract base class, as I did
above with SequenceNode, it will provide all of the additional methods like index
and count for free.

class BetterNode(SequenceNode, Sequence):
    pass

tree = BetterNode(
    # …
)

print('Index of 7 is', tree.index(7))
print('Count of 10 is', tree.count(10))

>>>
Index of 7 is 3
Count of 10 is 1

The benefit of using these abstract base classes is even greater for more complex types
like Set and MutableMapping, which have a large number of special methods that
need to be implemented to match Python conventions.
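As a self-contained illustration of how much the abstract base class supplies, here is a sketch of a tiny sequence that implements only the two required methods. The Playlist class and its contents are hypothetical.

```python
from collections.abc import Sequence


class Playlist(Sequence):
    """Hypothetical read-only sequence backed by a private list."""

    def __init__(self, *titles):
        self._titles = list(titles)

    # The only two methods Sequence requires:
    def __getitem__(self, index):
        return self._titles[index]

    def __len__(self):
        return len(self._titles)


songs = Playlist('intro', 'verse', 'chorus', 'verse')
print(songs.index('chorus'))   # Provided by Sequence
print(songs.count('verse'))    # Provided by Sequence
print('intro' in songs)        # __contains__ provided by Sequence
print(list(reversed(songs)))   # __reversed__ provided by Sequence
```

Everything beyond __getitem__ and __len__ here comes from the mix-in methods that Sequence defines in terms of those two.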
Things to Remember
Inherit directly from Python’s container types (like list or dict) for simple use cases.
Beware of the large number of methods required to implement custom container types correctly.
Have your custom container types inherit from the interfaces defined in collections.abc to ensure that your classes match required interfaces and behaviors.
4. Metaclasses and Attributes
Metaclasses are often mentioned in lists of Python’s features, but few understand what
they accomplish in practice. The name metaclass vaguely implies a concept above and
beyond a class. Simply put, metaclasses let you intercept Python’s class statement and
provide special behavior each time a class is defined.
Similarly mysterious and powerful are Python’s built-in features for dynamically
customizing attribute accesses. Along with Python’s object-oriented constructs, these
facilities provide wonderful tools to ease the transition from simple classes to complex
ones.
However, with these powers come many pitfalls. Dynamic attributes enable you to
override objects and cause unexpected side effects. Metaclasses can create extremely
bizarre behaviors that are unapproachable to newcomers. It’s important that you follow the
rule of least surprise and only use these mechanisms to implement well-understood
idioms.
Item 29: Use Plain Attributes Instead of Get and Set Methods
Programmers coming to Python from other languages may naturally try to implement
explicit getter and setter methods in their classes.

class OldResistor(object):
    def __init__(self, ohms):
        self._ohms = ohms

    def get_ohms(self):
        return self._ohms

    def set_ohms(self, ohms):
        self._ohms = ohms

Using these setters and getters is simple, but it’s not Pythonic.

r0 = OldResistor(50e3)
print('Before: %5r' % r0.get_ohms())
r0.set_ohms(10e3)
print('After: %5r' % r0.get_ohms())

>>>
Before: 50000.0
After: 10000.0

Such methods are especially clumsy for operations like incrementing in place.

r0.set_ohms(r0.get_ohms() + 5e3)

These utility methods do help define the interface for your class, making it easier to
encapsulate functionality, validate usage, and define boundaries. Those are important
goals when designing a class to ensure you don’t break callers as your class evolves over
time.
In Python, however, you almost never need to implement explicit setter or getter methods.
Instead, you should always start your implementations with simple public attributes.

class Resistor(object):
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0

r1 = Resistor(50e3)
r1.ohms = 10e3

These make operations like incrementing in place natural and clear.

r1.ohms += 5e3

Later, if you decide you need special behavior when an attribute is set, you can migrate to
the @property decorator and its corresponding setter attribute. Here, I define a new
subclass of Resistor that lets me vary the current by assigning the voltage
property. Note that in order to work properly the name of both the setter and getter
methods must match the intended property name.
class VoltageResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)
        self._voltage = 0

    @property
    def voltage(self):
        return self._voltage

    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms

Now, assigning the voltage property will run the voltage setter method, updating the
current property of the object to match.

r2 = VoltageResistance(1e3)
print('Before: %5r amps' % r2.current)
r2.voltage = 10
print('After: %5r amps' % r2.current)

>>>
Before:     0 amps
After:  0.01 amps
Specifying a setter on a property also lets you perform type checking and validation on
values passed to your class. Here, I define a class that ensures all resistance values are
above zero ohms:

class BoundedResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError('%f ohms must be > 0' % ohms)
        self._ohms = ohms

Assigning an invalid resistance to the attribute raises an exception.

r3 = BoundedResistance(1e3)
r3.ohms = 0

>>>
ValueError: 0.000000 ohms must be > 0

An exception will also be raised if you pass an invalid value to the constructor.

BoundedResistance(-5)

>>>
ValueError: -5.000000 ohms must be > 0

This happens because BoundedResistance.__init__ calls
Resistor.__init__, which assigns self.ohms = -5. That assignment causes the
@ohms.setter method from BoundedResistance to be called, immediately
running the validation code before object construction has completed.
You can even use @property to make attributes from parent classes immutable.

class FixedResistance(Resistor):
    # …
    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if hasattr(self, '_ohms'):
            raise AttributeError("Can't set attribute")
        self._ohms = ohms

Trying to assign to the property after construction raises an exception.

r4 = FixedResistance(1e3)
r4.ohms = 2e3

>>>
AttributeError: Can't set attribute

The biggest shortcoming of @property is that the methods for an attribute can only be
shared by subclasses. Unrelated classes can’t share the same implementation. However,
Python also supports descriptors (see Item 31: “Use Descriptors for Reusable
@property Methods”) that enable reusable property logic and many other use cases.
Finally, when you use @property methods to implement setters and getters, be sure that
the behavior you implement is not surprising. For example, don’t set other attributes in
getter property methods.

class MysteriousResistor(Resistor):
    @property
    def ohms(self):
        self.voltage = self._ohms * self.current
        return self._ohms
    # …

This leads to extremely bizarre behavior.

r7 = MysteriousResistor(10)
r7.current = 0.01
print('Before: %5r' % r7.voltage)
r7.ohms
print('After: %5r' % r7.voltage)

>>>
Before:     0
After:   0.1

The best policy is to only modify related object state in @property.setter methods.
Be sure to avoid any other side effects the caller may not expect beyond the object, such as
importing modules dynamically, running slow helper functions, or making expensive
database queries. Users of your class will expect its attributes to be like any other Python
object: quick and easy. Use normal methods to do anything more complex or slow.
Things to Remember
Define new class interfaces using simple public attributes, and avoid set and get methods.
Use @property to define special behavior when attributes are accessed on your objects, if necessary.
Follow the rule of least surprise and avoid weird side effects in your @property methods.
Ensure that @property methods are fast; do slow or complex work using normal methods.
Item 30: Consider @property Instead of Refactoring Attributes
The built-in @property decorator makes it easy for simple accesses of an instance’s
attributes to act smarter (see Item 29: “Use Plain Attributes Instead of Get and Set
Methods”). One advanced but common use of @property is transitioning what was
once a simple numerical attribute into an on-the-fly calculation. This is extremely helpful
because it lets you migrate all existing usage of a class to have new behaviors without
rewriting any of the call sites. It also provides an important stopgap for improving your
interfaces over time.
For example, say you want to implement a leaky bucket quota using plain Python objects.
Here, the Bucket class represents how much quota remains and the duration for which
the quota will be available:

from datetime import datetime, timedelta

class Bucket(object):
    def __init__(self, period):
        self.period_delta = timedelta(seconds=period)
        self.reset_time = datetime.now()
        self.quota = 0

    def __repr__(self):
        return 'Bucket(quota=%d)' % self.quota
The leaky bucket algorithm works by ensuring that, whenever the bucket is filled, the
amount of quota does not carry over from one period to the next.

def fill(bucket, amount):
    now = datetime.now()
    if now - bucket.reset_time > bucket.period_delta:
        bucket.quota = 0
        bucket.reset_time = now
    bucket.quota += amount

Each time a quota consumer wants to do something, it first must ensure that it can deduct
the amount of quota it needs to use.

def deduct(bucket, amount):
    now = datetime.now()
    if now - bucket.reset_time > bucket.period_delta:
        return False
    if bucket.quota - amount < 0:
        return False
    bucket.quota -= amount
    return True

To use this class, first I fill the bucket.

bucket = Bucket(60)
fill(bucket, 100)
print(bucket)

>>>
Bucket(quota=100)

Then, I deduct the quota that I need.

if deduct(bucket, 99):
    print('Had 99 quota')
else:
    print('Not enough for 99 quota')
print(bucket)

>>>
Had 99 quota
Bucket(quota=1)

Eventually, I’m prevented from making progress because I try to deduct more quota than
is available. In this case, the bucket’s quota level remains unchanged.

if deduct(bucket, 3):
    print('Had 3 quota')
else:
    print('Not enough for 3 quota')
print(bucket)

>>>
Not enough for 3 quota
Bucket(quota=1)
The problem with this implementation is that I never know what quota level the bucket
started with. The quota is deducted over the course of the period until it reaches zero. At
that point, deduct will always return False. When that happens, it would be useful to
know whether callers to deduct are being blocked because the Bucket ran out of quota
or because the Bucket never had quota in the first place.
To fix this, I can change the class to keep track of the max_quota issued in the period
and the quota_consumed in the period.

class Bucket(object):
    def __init__(self, period):
        self.period_delta = timedelta(seconds=period)
        self.reset_time = datetime.now()
        self.max_quota = 0
        self.quota_consumed = 0

    def __repr__(self):
        return ('Bucket(max_quota=%d, quota_consumed=%d)' %
                (self.max_quota, self.quota_consumed))

I use a @property method to compute the current level of quota on-the-fly using these
new attributes.

    @property
    def quota(self):
        return self.max_quota - self.quota_consumed

When the quota attribute is assigned, I take special action matching the current interface
of the class used by fill and deduct.

    @quota.setter
    def quota(self, amount):
        delta = self.max_quota - amount
        if amount == 0:
            # Quota being reset for a new period
            self.quota_consumed = 0
            self.max_quota = 0
        elif delta < 0:
            # Quota being filled for the new period
            assert self.quota_consumed == 0
            self.max_quota = amount
        else:
            # Quota being consumed during the period
            assert self.max_quota >= self.quota_consumed
            self.quota_consumed += delta
Rerunning the demo code from above produces the same results.

bucket = Bucket(60)
print('Initial', bucket)
fill(bucket, 100)
print('Filled', bucket)

if deduct(bucket, 99):
    print('Had 99 quota')
else:
    print('Not enough for 99 quota')
print('Now', bucket)

if deduct(bucket, 3):
    print('Had 3 quota')
else:
    print('Not enough for 3 quota')
print('Still', bucket)

>>>
Initial Bucket(max_quota=0, quota_consumed=0)
Filled Bucket(max_quota=100, quota_consumed=0)
Had 99 quota
Now Bucket(max_quota=100, quota_consumed=99)
Not enough for 3 quota
Still Bucket(max_quota=100, quota_consumed=99)

The best part is that the code using Bucket.quota doesn’t have to change or know that
the class has changed. New usage of Bucket can do the right thing and access
max_quota and quota_consumed directly.
I especially like @property because it lets you make incremental progress toward a
better data model over time. Reading the Bucket example above, you may have thought
to yourself, “fill and deduct should have been implemented as instance methods in
the first place.” Although you’re probably right (see Item 22: “Prefer Helper Classes Over
Bookkeeping with Dictionaries and Tuples”), in practice there are many situations in
which objects start with poorly defined interfaces or act as dumb data containers. This
happens when code grows over time, scope increases, multiple authors contribute without
anyone considering long-term hygiene, etc.
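For comparison, here is a sketch of what the "instance methods in the first place" design might look like. The QuotaBucket name and the injectable now parameter (added only so the clock can be replaced in tests) are assumptions for illustration, not part of the original example.

```python
from datetime import datetime, timedelta


class QuotaBucket(object):
    """Hypothetical refactoring: fill and deduct as instance methods."""

    def __init__(self, period, now=datetime.now):
        self.period_delta = timedelta(seconds=period)
        self._now = now  # Assumed hook for supplying a clock
        self.reset_time = now()
        self.max_quota = 0
        self.quota_consumed = 0

    @property
    def quota(self):
        # Read-only view computed from the two tracked values
        return self.max_quota - self.quota_consumed

    def _expired(self):
        return self._now() - self.reset_time > self.period_delta

    def fill(self, amount):
        if self._expired():
            # Quota does not carry over into a new period
            self.max_quota = 0
            self.quota_consumed = 0
            self.reset_time = self._now()
        self.max_quota += amount

    def deduct(self, amount):
        if self._expired() or self.quota - amount < 0:
            return False
        self.quota_consumed += amount
        return True


bucket = QuotaBucket(60)
bucket.fill(100)
first = bucket.deduct(99)
second = bucket.deduct(3)
print(first, second, bucket.quota)
```

With the state changes owned by the class, the quota setter and its branching on the assigned amount are no longer needed; the property only has to report the current level.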
@property is a tool to help you address problems you’ll come across in real-world
code. Don’t overuse it. When you find yourself repeatedly extending @property
methods, it’s probably time to refactor your class instead of further paving over your
code’s poor design.
Things to Remember
Use @property to give existing instance attributes new functionality.
Make incremental progress toward better data models by using @property.
Consider refactoring a class and all call sites when you find yourself using @property too heavily.
Item 31: Use Descriptors for Reusable @property Methods
The big problem with the @property built-in (see Item 29: “Use Plain Attributes
Instead of Get and Set Methods” and Item 30: “Consider @property Instead of
Refactoring Attributes”) is reuse. The methods it decorates can’t be reused for multiple
attributes of the same class. They also can’t be reused by unrelated classes.
For example, say you want a class to validate that the grade received by a student on a
homework assignment is a percentage.

class Homework(object):
    def __init__(self):
        self._grade = 0

    @property
    def grade(self):
        return self._grade

    @grade.setter
    def grade(self, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._grade = value

Using an @property makes this class easy to use.

galileo = Homework()
galileo.grade = 95

Say you also want to give the student a grade for an exam, where the exam has multiple
subjects, each with a separate grade.

class Exam(object):
    def __init__(self):
        self._writing_grade = 0
        self._math_grade = 0

    @staticmethod
    def _check_grade(value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')

This quickly gets tedious. Each section of the exam requires adding a new @property
and related validation.

    @property
    def writing_grade(self):
        return self._writing_grade

    @writing_grade.setter
    def writing_grade(self, value):
        self._check_grade(value)
        self._writing_grade = value

    @property
    def math_grade(self):
        return self._math_grade

    @math_grade.setter
    def math_grade(self, value):
        self._check_grade(value)
        self._math_grade = value

Also, this approach is not general. If you want to reuse this percentage validation beyond
homework and exams, you’d need to write the @property boilerplate and
_check_grade repeatedly.
The better way to do this in Python is to use a descriptor. The descriptor protocol defines
how attribute access is interpreted by the language. A descriptor class can provide
__get__ and __set__ methods that let you reuse the grade validation behavior without
any boilerplate. For this purpose, descriptors are also better than mix-ins (see Item 26:
“Use Multiple Inheritance Only for Mix-in Utility Classes”) because they let you reuse the
same logic for many different attributes in a single class.
Here, I define a new class called Exam with class attributes that are Grade instances. The
Grade class implements the descriptor protocol. Before I explain how the Grade class
works, it’s important to understand what Python will do when your code accesses such
descriptor attributes on an Exam instance.

class Grade(object):
    def __get__(*args, **kwargs):
        # …

    def __set__(*args, **kwargs):
        # …

class Exam(object):
    # Class attributes
    math_grade = Grade()
    writing_grade = Grade()
    science_grade = Grade()

When you assign a property:

exam = Exam()
exam.writing_grade = 40

it will be interpreted as:

Exam.__dict__['writing_grade'].__set__(exam, 40)

When you retrieve a property:

print(exam.writing_grade)

it will be interpreted as:

print(Exam.__dict__['writing_grade'].__get__(exam, Exam))

What drives this behavior is the __getattribute__ method of object (see Item 32:
“Use __getattr__, __getattribute__, and __setattr__ for Lazy
Attributes”). In short, when an Exam instance doesn’t have an attribute named
writing_grade, Python will fall back to the Exam class’s attribute instead. If this
class attribute is an object that has __get__ and __set__ methods, Python will assume
you want to follow the descriptor protocol.
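The translation described above can be observed directly with a descriptor that records how it is invoked. This is a minimal sketch; the Recorder and Holder classes are hypothetical and exist only to expose the arguments Python passes.

```python
class Recorder(object):
    """Hypothetical descriptor that logs each __get__ and __set__ call."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, instance_type):
        self.calls.append(('get', instance, instance_type))
        return 42

    def __set__(self, instance, value):
        self.calls.append(('set', instance, value))


class Holder(object):
    field = Recorder()  # Class attribute, shared by all Holder instances


h = Holder()
h.field = 7        # Becomes Holder.__dict__['field'].__set__(h, 7)
result = h.field   # Becomes Holder.__dict__['field'].__get__(h, Holder)

descriptor = Holder.__dict__['field']
print(result, len(descriptor.calls))
```

Reading the descriptor out of Holder.__dict__ bypasses the protocol, which is why it returns the Recorder object itself rather than triggering another __get__ call.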
Knowing this behavior and how I used @property for grade validation in the
Homework class, here’s a reasonable first attempt at implementing the Grade descriptor.

class Grade(object):
    def __init__(self):
        self._value = 0

    def __get__(self, instance, instance_type):
        return self._value

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._value = value

Unfortunately, this is wrong and will result in broken behavior. Accessing multiple
attributes on a single Exam instance works as expected.

first_exam = Exam()
first_exam.writing_grade = 82
first_exam.science_grade = 99
print('Writing', first_exam.writing_grade)
print('Science', first_exam.science_grade)

>>>
Writing 82
Science 99

But accessing these attributes on multiple Exam instances will have unexpected behavior.

second_exam = Exam()
second_exam.writing_grade = 75
print('Second', second_exam.writing_grade, 'is right')
print('First ', first_exam.writing_grade, 'is wrong')

>>>
Second 75 is right
First  75 is wrong

The problem is that a single Grade instance is shared across all Exam instances for the
class attribute writing_grade. The Grade instance for this attribute is constructed
once in the program lifetime when the Exam class is first defined, not each time an Exam
instance is created.
To solve this, I need the Grade class to keep track of its value for each unique Exam
instance. I can do this by saving the per-instance state in a dictionary.

class Grade(object):
    def __init__(self):
        self._values = {}

    def __get__(self, instance, instance_type):
        if instance is None: return self
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._values[instance] = value

This implementation is simple and works well, but there’s still one gotcha: It leaks
memory. The _values dictionary will hold a reference to every instance of Exam ever
passed to __set__ over the lifetime of the program. This causes instances to never have
their reference count go to zero, preventing cleanup by the garbage collector.
To fix this, I can use Python’s weakref built-in module. This module provides a special
class called WeakKeyDictionary that can take the place of the simple dictionary used
for _values. The unique behavior of WeakKeyDictionary is that it will remove
Exam instances from its set of keys when the runtime knows it’s holding the instance’s
last remaining reference in the program. Python will do the bookkeeping for you and
ensure that the _values dictionary will be empty when all Exam instances are no longer
in use.

from weakref import WeakKeyDictionary

class Grade(object):
    def __init__(self):
        self._values = WeakKeyDictionary()
    # …
Using this implementation of the Grade descriptor, everything works as expected.

class Exam(object):
    math_grade = Grade()
    writing_grade = Grade()
    science_grade = Grade()

first_exam = Exam()
first_exam.writing_grade = 82
second_exam = Exam()
second_exam.writing_grade = 75
print('First ', first_exam.writing_grade, 'is right')
print('Second', second_exam.writing_grade, 'is right')

>>>
First  82 is right
Second 75 is right
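The cleanup behavior of WeakKeyDictionary can be checked with a short experiment. This sketch repeats a complete Grade descriptor so it runs on its own, and it reads the descriptor’s private _values mapping purely for demonstration; the explicit gc.collect() call is there as a precaution for runtimes that don’t free objects immediately.

```python
import gc
from weakref import WeakKeyDictionary


class Grade(object):
    def __init__(self):
        self._values = WeakKeyDictionary()

    def __get__(self, instance, instance_type):
        if instance is None:
            return self  # Accessed on the class: hand back the descriptor
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._values[instance] = value


class Exam(object):
    writing_grade = Grade()


exam = Exam()
exam.writing_grade = 82
tracked_before = len(Exam.writing_grade._values)

del exam       # Drop the only strong reference to the instance
gc.collect()
tracked_after = len(Exam.writing_grade._values)

print(tracked_before, tracked_after)
```

With a plain dict in place of WeakKeyDictionary, the entry would keep the Exam instance alive and the second count would stay at one.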
Things to Remember
Reuse the behavior and validation of @property methods by defining your own descriptor classes.
Use WeakKeyDictionary to ensure that your descriptor classes don’t cause memory leaks.
Don’t get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.
Item 32: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes
Python’s language hooks make it easy to write generic code for gluing systems together.
For example, say you want to represent the rows of your database as Python objects. Your
database has its schema set. Your code that uses objects corresponding to those rows must
also know what your database looks like. However, in Python, the code that connects your
Python objects to the database doesn’t need to know the schema of your rows; it can be
generic.
How is that possible? Plain instance attributes, @property methods, and descriptors
can’t do this because they all need to be defined in advance. Python makes this dynamic
behavior possible with the __getattr__ special method. If your class defines
__getattr__, that method is called every time an attribute can’t be found in an object’s
instance dictionary.

class LazyDB(object):
    def __init__(self):
        self.exists = 5

    def __getattr__(self, name):
        value = 'Value for %s' % name
        setattr(self, name, value)
        return value

Here, I access the missing property foo. This causes Python to call the __getattr__
method above, which mutates the instance dictionary __dict__:

data = LazyDB()
print('Before:', data.__dict__)
print('foo:', data.foo)
print('After: ', data.__dict__)

>>>
Before: {'exists': 5}
foo: Value for foo
After:  {'exists': 5, 'foo': 'Value for foo'}
Here, I add logging to LazyDB to show when __getattr__ is actually called. Note
that I use super().__getattr__() to get the real property value in order to avoid
infinite recursion.
class LoggingLazyDB(LazyDB):
    def __getattr__(self, name):
        print('Called __getattr__(%s)' % name)
        return super().__getattr__(name)
data = LoggingLazyDB()
print('exists:', data.exists)
print('foo:', data.foo)
print('foo:', data.foo)
>>>
exists: 5
Called __getattr__(foo)
foo: Value for foo
foo: Value for foo
The exists attribute is present in the instance dictionary, so __getattr__ is never
called for it. The foo attribute is not in the instance dictionary initially, so
__getattr__ is called the first time. But the call to __getattr__ for foo also does
a setattr, which populates foo in the instance dictionary. This is why the second time
I access foo there isn’t a call to __getattr__.
This behavior is especially helpful for use cases like lazily accessing schemaless data.
__getattr__ runs once to do the hard work of loading a property; all subsequent
accesses retrieve the existing result.
Say you also want transactions in this database system. The next time the user accesses a
property, you want to know whether the corresponding row in the database is still valid
and whether the transaction is still open. The __getattr__ hook won’t let you do this
reliably because it will use the object’s instance dictionary as the fast path for existing
attributes.
To enable this use case, Python has another language hook called __getattribute__.
This special method is called every time an attribute is accessed on an object, even in
cases where it does exist in the attribute dictionary. This enables you to do things like
check global transaction state on every property access. Here, I define ValidatingDB
to log each time __getattribute__ is called:
class ValidatingDB(object):
    def __init__(self):
        self.exists = 5
    def __getattribute__(self, name):
        print('Called __getattribute__(%s)' % name)
        try:
            return super().__getattribute__(name)
        except AttributeError:
            value = 'Value for %s' % name
            setattr(self, name, value)
            return value
data = ValidatingDB()
print('exists:', data.exists)
print('foo:', data.foo)
print('foo:', data.foo)
>>>
Called __getattribute__(exists)
exists: 5
Called __getattribute__(foo)
foo: Value for foo
Called __getattribute__(foo)
foo: Value for foo
In the event that a dynamically accessed property shouldn’t exist, you can raise an
AttributeError to cause Python’s standard missing property behavior for both
__getattr__ and __getattribute__.
class MissingPropertyDB(object):
    def __getattr__(self, name):
        if name == 'bad_name':
            raise AttributeError('%s is missing' % name)
        # …
data = MissingPropertyDB()
data.bad_name
>>>
AttributeError: bad_name is missing
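The two behaviors can be combined. The following is a minimal sketch, not code from the examples above: it assumes a hypothetical LazyRow class whose source dictionary stands in for a database row. Known columns are loaded once and cached in the instance dictionary; unknown names raise AttributeError, so hasattr reports them as missing.

```python
class LazyRow(object):
    def __init__(self, source):
        # 'source' is a hypothetical stand-in for a real database row.
        self._source = source
        self.load_count = 0

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name not in self._source:
            raise AttributeError('%s is missing' % name)
        self.load_count += 1
        value = self._source[name]
        setattr(self, name, value)  # Cache so later reads skip __getattr__
        return value


row = LazyRow({'name': 'Ada', 'age': 36})
print(row.name, row.name)        # Second read comes from __dict__
print('Loads:', row.load_count)
print('bad_name exists:', hasattr(row, 'bad_name'))
```

Because the value is cached with setattr, the load counter only increases on the first access to each column.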
Python code implementing generic functionality often relies on the hasattr built-in
function to determine when properties exist, and the getattr built-in function to
retrieve property values. These functions also look in the instance dictionary for an
attribute name before calling __getattr__.
data = LoggingLazyDB()
print('Before:', data.__dict__)
print('foo exists: ', hasattr(data, 'foo'))
print('After:', data.__dict__)
print('foo exists: ', hasattr(data, 'foo'))
>>>
Before: {'exists': 5}
Called __getattr__(foo)
foo exists:  True
After: {'exists': 5, 'foo': 'Value for foo'}
foo exists:  True
In the example above, __getattr__ is only called once. In contrast, classes that
implement __getattribute__ will have that method called each time hasattr or
getattr is run on an object.
data = ValidatingDB()
print('foo exists: ', hasattr(data, 'foo'))
print('foo exists: ', hasattr(data, 'foo'))
>>>
Called __getattribute__(foo)
foo exists:  True
Called __getattribute__(foo)
foo exists:  True
Now, say you want to lazily push data back to the database when values are assigned to
your Python object. You can do this with __setattr__, a similar language hook that
lets you intercept arbitrary attribute assignments. Unlike retrieving an attribute with
__getattr__ and __getattribute__, there’s no need for two separate methods.
The __setattr__ method is always called every time an attribute is assigned on an
instance (either directly or through the setattr built-in function).
class SavingDB(object):
    def __setattr__(self, name, value):
        # Save some data to the DB log
        # …
        super().__setattr__(name, value)
Here, I define a logging subclass of SavingDB. Its __setattr__ method is always
called on each attribute assignment:
class LoggingSavingDB(SavingDB):
    def __setattr__(self, name, value):
        print('Called __setattr__(%s, %r)' % (name, value))
        super().__setattr__(name, value)
data = LoggingSavingDB()
print('Before: ', data.__dict__)
data.foo = 5
print('After:', data.__dict__)
data.foo = 7
print('Finally:', data.__dict__)
>>>
Before:  {}
Called __setattr__(foo, 5)
After: {'foo': 5}
Called __setattr__(foo, 7)
Finally: {'foo': 7}
The problem with __getattribute__ and __setattr__ is that they’re called on
every attribute access for an object, even when you may not want that to happen. For
example, say you want attribute accesses on your object to actually look up keys in an
associated dictionary.
class BrokenDictionaryDB(object):
    def __init__(self, data):
        self._data = data
    def __getattribute__(self, name):
        print('Called __getattribute__(%s)' % name)
        return self._data[name]
This requires accessing self._data from the __getattribute__ method.
However, if you actually try to do that, Python will recurse until it reaches its stack limit,
and then it’ll die.
data = BrokenDictionaryDB({'foo': 3})
data.foo
>>>
Called __getattribute__(foo)
Called __getattribute__(_data)
Called __getattribute__(_data)
…
Traceback …
RuntimeError: maximum recursion depth exceeded
The problem is that __getattribute__ accesses self._data, which causes
__getattribute__ to run again, which accesses self._data again, and so on. The
solution is to use the super().__getattribute__ method on your instance to fetch
values from the instance attribute dictionary. This avoids the recursion.
class DictionaryDB(object):
    def __init__(self, data):
        self._data = data
    def __getattribute__(self, name):
        data_dict = super().__getattribute__('_data')
        return data_dict[name]
Similarly, you’ll need __setattr__ methods that modify attributes on an object to use
super().__setattr__.
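As an illustration of the __setattr__ side, here is a minimal sketch under my own assumptions: the class name ReadWriteDictionaryDB is hypothetical, and I chose to translate a missing key into AttributeError. Both hooks reach the backing dictionary through super() so that neither one re-enters itself.

```python
class ReadWriteDictionaryDB(object):
    def __init__(self, data):
        # Bypass our own __setattr__ so '_data' lands in the instance dict.
        super().__setattr__('_data', data)

    def __getattribute__(self, name):
        data_dict = super().__getattribute__('_data')
        try:
            return data_dict[name]
        except KeyError:
            raise AttributeError('%s is missing' % name)

    def __setattr__(self, name, value):
        # Fetch '_data' through object's implementation to avoid recursion.
        data_dict = super().__getattribute__('_data')
        data_dict[name] = value


db = ReadWriteDictionaryDB({'foo': 3})
db.bar = 7
print(db.foo, db.bar)
```

Note that in this sketch every attribute read is redirected to the dictionary, including names such as __class__, so it is only suitable as a demonstration of the recursion-avoidance technique.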
Things to Remember
Use __getattr__ and __setattr__ to lazily load and save attributes for an object.
Understand that __getattr__ only gets called once when accessing a missing attribute, whereas __getattribute__ gets called every time an attribute is accessed.
Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() (i.e., the object class) to access instance attributes directly.
Item 33: Validate Subclasses with Metaclasses
One of the simplest applications of metaclasses is verifying that a class was defined
correctly. When you’re building a complex class hierarchy, you may want to enforce style,
require overriding methods, or have strict relationships between class attributes.
Metaclasses enable these use cases by providing a reliable way to run your validation code
each time a new subclass is defined.
Often a class’s validation code runs in the __init__ method, when an object of the
class’s type is constructed (see Item 28: “Inherit from collections.abc for Custom
Container Types” for an example). Using metaclasses for validation can raise errors much
earlier.
Before I get into how to define a metaclass for validating subclasses, it’s important to
understand the metaclass action for standard objects. A metaclass is defined by inheriting
from type. In the default case, a metaclass receives the contents of associated class
statements in its __new__ method. Here, you can modify the class information before the
type is actually constructed:
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        print((meta, name, bases, class_dict))
        return type.__new__(meta, name, bases, class_dict)
class MyClass(object, metaclass=Meta):
    stuff = 123
    def foo(self):
        pass
The metaclass has access to the name of the class, the parent classes it inherits from, and
all of the class attributes that were defined in the class’s body.
>>>
(<class '__main__.Meta'>,
 'MyClass',
 (<class 'object'>,),
 {'__module__': '__main__',
  '__qualname__': 'MyClass',
  'foo': <function MyClass.foo at 0x102c7dd08>,
  'stuff': 123})
Python 2 has slightly different syntax and specifies a metaclass using the
__metaclass__ class attribute. The Meta.__new__ interface is the same.
# Python 2
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        # …
class MyClassInPython2(object):
    __metaclass__ = Meta
    # …
You can add functionality to the Meta.__new__ method in order to validate all of the
parameters of a class before it’s defined. For example, say you want to represent any type
of multisided polygon. You can do this by defining a special validating metaclass and
using it in the base class of your polygon class hierarchy. Note that it’s important not to
apply the same validation to the base class.
class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # Don't validate the abstract Polygon class
        if bases != (object,):
            if class_dict['sides'] < 3:
                raise ValueError('Polygons need 3+ sides')
        return type.__new__(meta, name, bases, class_dict)
class Polygon(object, metaclass=ValidatePolygon):
    sides = None  # Specified by subclasses
    @classmethod
    def interior_angles(cls):
        return (cls.sides - 2) * 180
class Triangle(Polygon):
    sides = 3
If you try to define a polygon with fewer than three sides, the validation will cause the
class statement to fail immediately after the class statement body. This means your
program will not even be able to start running when you define such a class.
print('Before class')
class Line(Polygon):
    print('Before sides')
    sides = 1
    print('After sides')
print('After class')
>>>
Before class
Before sides
After sides
Traceback …
ValueError: Polygons need 3+ sides
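To observe the failure without stopping the program, the class statement itself can be wrapped in try/except. This is only a sketch for experimentation; the Square class and the error variable are names I introduce for illustration, and the metaclass is repeated so the block runs on its own.

```python
class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # Skip the abstract base class, which derives directly from object.
        if bases != (object,):
            if class_dict['sides'] < 3:
                raise ValueError('Polygons need 3+ sides')
        return type.__new__(meta, name, bases, class_dict)


class Polygon(object, metaclass=ValidatePolygon):
    sides = None  # Specified by subclasses


class Square(Polygon):  # Passes validation
    sides = 4


try:
    class Line(Polygon):  # Fails validation when the body finishes
        sides = 1
except ValueError as e:
    error = str(e)
else:
    error = None

print('Square sides:', Square.sides)
print('Line error:', error)
```

One limitation worth noting: a subclass that omits sides entirely would raise KeyError from the class_dict lookup rather than ValueError.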
Things to Remember
Use metaclasses to ensure that subclasses are well formed at the time they are defined, before objects of their type are constructed.
Metaclasses have slightly different syntax in Python 2 vs. Python 3.
The __new__ method of metaclasses is run after the class statement’s entire body has been processed.
Item 34: Register Class Existence with Metaclasses
Another common use of metaclasses is to automatically register types in your program.
Registration is useful for doing reverse lookups, where you need to map a simple identifier
back to a corresponding class.
For example, say you want to implement your own serialized representation of a Python
object using JSON. You need a way to take an object and turn it into a JSON string. Here,
I do this generically by defining a base class that records the constructor parameters and
turns them into a JSON dictionary:
import json
class Serializable(object):
    def __init__(self, *args):
        self.args = args
    def serialize(self):
        return json.dumps({'args': self.args})
This class makes it easy to serialize simple, immutable data structures like Point2D to a
string.
class Point2D(Serializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x = x
        self.y = y
    def __repr__(self):
        return 'Point2D(%d, %d)' % (self.x, self.y)
point = Point2D(5, 3)
print('Object:', point)
print('Serialized:', point.serialize())
>>>
Object: Point2D(5, 3)
Serialized: {"args": [5, 3]}
Now, I need to deserialize this JSON string and construct the Point2D object it
represents. Here, I define another class that can deserialize the data from its
Serializable parent class:
class Deserializable(Serializable):
    @classmethod
    def deserialize(cls, json_data):
        params = json.loads(json_data)
        return cls(*params['args'])
Using Deserializable makes it easy to serialize and deserialize simple, immutable
objects in a generic way.
class BetterPoint2D(Deserializable):
    # …
point = BetterPoint2D(5, 3)
print('Before:', point)
data = point.serialize()
print('Serialized:', data)
after = BetterPoint2D.deserialize(data)
print('After:', after)
>>>
Before: BetterPoint2D(5, 3)
Serialized: {"args": [5, 3]}
After: BetterPoint2D(5, 3)
The problem with this approach is that it only works if you know the intended type of the
serialized data ahead of time (e.g., Point2D, BetterPoint2D). Ideally, you’d have a
large number of classes serializing to JSON and one common function that could
deserialize any of them back to a corresponding Python object.
To do this, I can include the serialized object’s class name in the JSON data.
class BetterSerializable(object):
    def __init__(self, *args):
        self.args = args
    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args,
        })
    def __repr__(self):
        # …
Then, I can maintain a mapping of class names back to constructors for those objects. The
general deserialize function will work for any classes passed to
register_class.
registry = {}
def register_class(target_class):
    registry[target_class.__name__] = target_class
def deserialize(data):
    params = json.loads(data)
    name = params['class']
    target_class = registry[name]
    return target_class(*params['args'])
To ensure that deserialize always works properly, I must call register_class
for every class I may want to deserialize in the future.
class EvenBetterPoint2D(BetterSerializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x = x
        self.y = y
register_class(EvenBetterPoint2D)
Now, I can deserialize an arbitrary JSON string without having to know which class it
contains.
point = EvenBetterPoint2D(5, 3)
print('Before:', point)
data = point.serialize()
print('Serialized:', data)
after = deserialize(data)
print('After:', after)
>>>
Before: EvenBetterPoint2D(5, 3)
Serialized: {"class": "EvenBetterPoint2D", "args": [5, 3]}
After: EvenBetterPoint2D(5, 3)
The problem with this approach is that you can forget to call register_class.
class Point3D(BetterSerializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x = x
        self.y = y
        self.z = z
# Forgot to call register_class! Whoops!
This will cause your code to break at runtime, when you finally try to deserialize an object
of a class you forgot to register.
point = Point3D(5, 9, -4)
data = point.serialize()
deserialize(data)
>>>
KeyError: 'Point3D'
Even though you chose to subclass BetterSerializable, you won’t actually get all
of its features if you forget to call register_class after your class statement body.
This approach is error prone and especially challenging for beginners. The same omission
can happen with class decorators in Python 3.
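To make the class decorator case concrete, here is a hedged sketch. It assumes a variant of register_class that returns the class (which a decorator must do; the version above returns nothing), and the DecoratedPoint and ForgottenPoint names are hypothetical. The decorator removes the separate call, but nothing forces you to write it.

```python
import json

registry = {}


def register_class(target_class):
    registry[target_class.__name__] = target_class
    return target_class  # Returning the class lets this act as a decorator


class BetterSerializable(object):
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args,
        })


def deserialize(data):
    params = json.loads(data)
    return registry[params['class']](*params['args'])


@register_class
class DecoratedPoint(BetterSerializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x, self.y = x, y


class ForgottenPoint(BetterSerializable):  # Decorator omitted by mistake
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x, self.y = x, y


restored = deserialize(DecoratedPoint(5, 3).serialize())
print('Restored:', restored.x, restored.y)
print('ForgottenPoint registered:', 'ForgottenPoint' in registry)
```

Deserializing a ForgottenPoint would fail with the same KeyError as before, because the class never reached the registry.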
What if you could somehow act on the programmer’s intent to use
BetterSerializable and ensure that register_class is called in all cases?
Metaclasses enable this by intercepting the class statement when subclasses are defined
(see Item 33: “Validate Subclasses with Metaclasses”). This lets you register the new type
immediately after the class’s body.
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        cls = type.__new__(meta, name, bases, class_dict)
        register_class(cls)
        return cls
class RegisteredSerializable(BetterSerializable,
                             metaclass=Meta):
    pass
When I define a subclass of RegisteredSerializable, I can be confident that the
call to register_class happened and deserialize will always work as expected.
class Vector3D(RegisteredSerializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x, self.y, self.z = x, y, z
v3 = Vector3D(10, -7, 3)
print('Before:', v3)
data = v3.serialize()
print('Serialized:', data)
print('After:', deserialize(data))
>>>
Before: Vector3D(10, -7, 3)
Serialized: {"class": "Vector3D", "args": [10, -7, 3]}
After: Vector3D(10, -7, 3)
Using metaclasses for class registration ensures that you’ll never miss a class as long as
the inheritance tree is right. This works well for serialization, as I’ve shown, and also
applies to database object-relational mappings (ORMs), plug-in systems, and system
hooks.
Things to Remember
Class registration is a helpful pattern for building modular Python programs.
Metaclasses let you run registration code automatically each time your base class is subclassed in a program.
Using metaclasses for class registration avoids errors by ensuring that you never miss a registration call.
Item 35: Annotate Class Attributes with Metaclasses
One more useful feature enabled by metaclasses is the ability to modify or annotate
properties after a class is defined but before the class is actually used. This approach is
commonly used with descriptors (see Item 31: “Use Descriptors for Reusable
@property Methods”) to give them more introspection into how they’re being used
within their containing class.
For example, say you want to define a new class that represents a row in your customer
database. You’d like a corresponding property on the class for each column in the database
table. To do this, here I define a descriptor class to connect attributes to column names.
class Field(object):
    def __init__(self, name):
        self.name = name
        self.internal_name = '_' + self.name
    def __get__(self, instance, instance_type):
        if instance is None: return self
        return getattr(instance, self.internal_name, '')
    def __set__(self, instance, value):
        setattr(instance, self.internal_name, value)
With the column name stored in the Field descriptor, I can save all of the per-instance
state directly in the instance dictionary as protected fields using the setattr and
getattr built-in functions. At first, this seems to be much more convenient than
building descriptors with weakref to avoid memory leaks.
Defining the class representing a row requires supplying the column name for each class
attribute.
class Customer(object):
    # Class attributes
    first_name = Field('first_name')
    last_name = Field('last_name')
    prefix = Field('prefix')
    suffix = Field('suffix')
Using the class is simple. Here, you can see how the Field descriptors modify the
instance dictionary __dict__ as expected:
foo = Customer()
print('Before:', repr(foo.first_name), foo.__dict__)
foo.first_name = 'Euclid'
print('After: ', repr(foo.first_name), foo.__dict__)
>>>
Before: '' {}
After:  'Euclid' {'_first_name': 'Euclid'}
But it seems redundant. I already declared the name of the field when I assigned the
constructed Field object to Customer.first_name in the class statement body.
Why do I also have to pass the field name ('first_name' in this case) to the Field
constructor?
The problem is that the order of operations in the Customer class definition is the
opposite of how it reads from left to right. First, the Field constructor is called as
Field('first_name'). Then, the return value of that is assigned to
Customer.first_name. There’s no way for the Field to know up front which class
attribute it will be assigned to.
To eliminate the redundancy, I can use a metaclass. Metaclasses let you hook the class
statement directly and take action as soon as a class body is finished. In this case, I can
use the metaclass to assign Field.name and Field.internal_name on the
descriptor automatically instead of manually specifying the field name multiple times.
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        for key, value in class_dict.items():
            if isinstance(value, Field):
                value.name = key
                value.internal_name = '_' + key
        cls = type.__new__(meta, name, bases, class_dict)
        return cls
Here, I define a base class that uses the metaclass. All classes representing database rows
should inherit from this class to ensure that they use the metaclass:
class DatabaseRow(object, metaclass=Meta):
    pass
To work with the metaclass, the field descriptor is largely unchanged. The only difference
is that it no longer requires any arguments to be passed to its constructor. Instead, its
attributes are set by the Meta.__new__ method above.
class Field(object):
    def __init__(self):
        # These will be assigned by the metaclass.
        self.name = None
        self.internal_name = None
    # …
By using the metaclass, the new DatabaseRow base class, and the new Field
descriptor, the class definition for a database row no longer has the redundancy from
before.
class BetterCustomer(DatabaseRow):
    first_name = Field()
    last_name = Field()
    prefix = Field()
    suffix = Field()
The behavior of the new class is identical to the old one.
foo = BetterCustomer()
print('Before:', repr(foo.first_name), foo.__dict__)
foo.first_name = 'Euler'
print('After: ', repr(foo.first_name), foo.__dict__)
>>>
Before: '' {}
After:  'Euler' {'_first_name': 'Euler'}
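Since the pieces above are shown separately, with the descriptor body elided, the following self-contained sketch assembles them so the naming step can be inspected directly. It assumes the elided __get__ and __set__ methods are the same as in the first Field class; the Order class is a hypothetical row type used only for illustration.

```python
class Field(object):
    def __init__(self):
        # Filled in by the metaclass once the attribute name is known.
        self.name = None
        self.internal_name = None

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, '')

    def __set__(self, instance, value):
        setattr(instance, self.internal_name, value)


class Meta(type):
    def __new__(meta, name, bases, class_dict):
        # Annotate every Field with the attribute name it was assigned to.
        for key, value in class_dict.items():
            if isinstance(value, Field):
                value.name = key
                value.internal_name = '_' + key
        return type.__new__(meta, name, bases, class_dict)


class DatabaseRow(object, metaclass=Meta):
    pass


class Order(DatabaseRow):  # Hypothetical row class
    item = Field()
    quantity = Field()


order = Order()
order.item = 'widget'
print(Order.item.name, Order.item.internal_name)
print(order.__dict__)
```

Accessing the descriptor through the class (Order.item) returns the Field itself, which makes it easy to confirm that the metaclass assigned the names.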
Things to Remember
Metaclasses enable you to modify a class’s attributes before the class is fully defined.
Descriptors and metaclasses make a powerful combination for declarative behavior and runtime introspection.
You can avoid both memory leaks and the weakref module by using metaclasses along with descriptors.
5. Concurrency and Parallelism
Concurrency is when a computer does many different things seemingly at the same time.
For example, on a computer with one CPU core, the operating system will rapidly change
which program is running on the single processor. This interleaves execution of the
programs, providing the illusion that the programs are running simultaneously.
Parallelism is actually doing many different things at the same time. Computers with
multiple CPU cores can execute multiple programs simultaneously. Each CPU core runs
the instructions of a separate program, allowing each program to make forward progress
during the same instant.
Within a single program, concurrency is a tool that makes it easier for programmers to
solve certain types of problems. Concurrent programs enable many distinct paths of
execution to make forward progress in a way that seems to be both simultaneous and
independent.
The key difference between parallelism and concurrency is speedup. When two distinct
paths of execution in a program make forward progress in parallel, the time it takes to do
the total work is cut in half; the speed of execution is faster by a factor of two. In contrast,
concurrent programs may run thousands of separate paths of execution seemingly in
parallel but provide no speedup for the total work.
Python makes it easy to write concurrent programs. Python can also be used to do parallel
work through system calls, subprocesses, and C-extensions. But it can be very difficult to
make concurrent Python code truly run in parallel. It’s important to understand how to best
utilize Python in these subtly different situations.
Item 36: Use subprocess to Manage Child Processes
Python has battle-hardened libraries for running and managing child processes. This
makes Python a great language for gluing other tools together, such as command-line
utilities. When existing shell scripts get complicated, as they often do over time,
graduating them to a rewrite in Python is a natural choice for the sake of readability and
maintainability.
Child processes started by Python are able to run in parallel, enabling you to use Python to
consume all of the CPU cores of your machine and maximize the throughput of your
programs. Although Python itself may be CPU bound (see Item 37: “Use Threads for
Blocking I/O, Avoid for Parallelism”), it’s easy to use Python to drive and coordinate
CPU-intensive workloads.
Python has had many ways to run subprocesses over the years, including popen,
popen2, and os.exec*. With the Python of today, the best and simplest choice for
managing child processes is to use the subprocess built-in module.
Running a child process with subprocess is simple. Here, the Popen constructor starts
the process. The communicate method reads the child process’s output and waits for
termination.
import subprocess
proc = subprocess.Popen(
    ['echo', 'Hello from the child!'],
    stdout=subprocess.PIPE)
out, err = proc.communicate()
print(out.decode('utf-8'))
>>>
Hello from the child!
Child processes will run independently from their parent process, the Python interpreter.
Their status can be polled periodically while Python does other work.
proc = subprocess.Popen(['sleep', '0.3'])
while proc.poll() is None:
    print('Working...')
    # Some time-consuming work here
    # …
print('Exit status', proc.poll())
>>>
Working...
Working...
Exit status 0
Decoupling the child process from the parent means that the parent process is free to run
many child processes in parallel. You can do this by starting all the child processes
together up front.
from time import time
def run_sleep(period):
    proc = subprocess.Popen(['sleep', str(period)])
    return proc
start = time()
procs = []
for _ in range(10):
    proc = run_sleep(0.1)
    procs.append(proc)
Later, you can wait for them to finish their I/O and terminate with the communicate
method.
for proc in procs:
    proc.communicate()
end = time()
print('Finished in %.3f seconds' % (end - start))
>>>
Finished in 0.117 seconds
Note
If these processes ran in sequence, the total delay would be 1 second, not the ~0.1
second I measured.
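The sleep command is not available on every platform. As a portable variation, the sketch below assumes the running interpreter (sys.executable) can stand in for it, with each child executing a short time.sleep; the run_child helper is a name I made up for this illustration. Interpreter start-up adds overhead, so the absolute timing will differ from the figures above.

```python
import subprocess
import sys
from time import time


def run_child(period):
    # Assumption: a child Python interpreter substitutes for `sleep`.
    code = 'import time; time.sleep(%f)' % period
    return subprocess.Popen([sys.executable, '-c', code])


start = time()
procs = [run_child(0.2) for _ in range(5)]  # Start every child up front
for proc in procs:
    proc.communicate()                      # Then wait for each to finish
elapsed = time() - start

exit_codes = [proc.returncode for proc in procs]
print('Exit codes:', exit_codes)
print('Finished in %.3f seconds' % elapsed)
```

Because all five children are started before any of them is waited on, their sleeps overlap rather than adding up.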
You can also pipe data from your Python program into a subprocess and retrieve its
output. This allows you to utilize other programs to do work in parallel. For example, say
you want to use the openssl command-line tool to encrypt some data. Starting the child
process with command-line arguments and I/O pipes is easy.
import os
def run_openssl(data):
    env = os.environ.copy()
    env['password'] = b'\xe24U\n\xd0Ql3S\x11'
    proc = subprocess.Popen(
        ['openssl', 'enc', '-des3', '-pass', 'env:password'],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)
    proc.stdin.write(data)
    proc.stdin.flush()  # Ensure the child gets input
    return proc
Here, I pipe random bytes into the encryption function, but in practice this would be user
input, a file handle, a network socket, etc.:
procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    procs.append(proc)
The child processes will run in parallel and consume their input. Here, I wait for them to
finish and then retrieve their final output:
for proc in procs:
    out, err = proc.communicate()
    print(out[-10:])
>>>
b'o4,G\x91\x95\xfe\xa0\xaa\xb7'
b'\x0b\x01\\xb1\xb7\xfb\xb2C\xe1b'
b'ds\xc5\xf4;j\x1f\xd0c-'
You can also create chains of parallel processes just like UNIX pipes, connecting the
output of one child process into the input of another, and so on. Here’s a function that
starts a child process that will cause the md5 command-line tool to consume an input
stream:
def run_md5(input_stdin):
    proc = subprocess.Popen(
        ['md5'],
        stdin=input_stdin,
        stdout=subprocess.PIPE)
    return proc
Note
Python’s hashlib built-in module provides the md5 function, so running a
subprocess like this isn’t always necessary. The goal here is to demonstrate how
subprocesses can pipe inputs and outputs.
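For comparison, the in-process route mentioned in the note might look like the following sketch. The md5_hex helper is a name I introduce here; hashlib.md5 and hexdigest are the standard library calls.

```python
import hashlib


def md5_hex(data):
    # Hash the bytes in-process instead of piping them to a child process.
    return hashlib.md5(data).hexdigest()


print(md5_hex(b''))
print(md5_hex(b'Hello from the child!'))
```

The first line printed is the MD5 digest of empty input, d41d8cd98f00b204e9800998ecf8427e, which is a handy value to recognize when checking whether a hashing step actually received any data.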
Now, I can kick off a set of openssl processes to encrypt some data and another set of
processes to md5 hash the encrypted output.
input_procs = []
hash_procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    input_procs.append(proc)
    hash_proc = run_md5(proc.stdout)
    hash_procs.append(hash_proc)
The I/O between the child processes will happen automatically once you get them started.
All you need to do is wait for them to finish and print the final output.
for proc in input_procs:
    proc.communicate()
for proc in hash_procs:
    out, err = proc.communicate()
    print(out.strip())
>>>
b'7a1822875dcf9650a5a71e5e41e77bf3'
b'd41d8cd98f00b204e9800998ecf8427e'
b'1720f581cfdc448b6273048d42621100'
If you’re worried about the child processes never finishing or somehow blocking on input
or output pipes, then be sure to pass the timeout parameter to the communicate
method. This will cause an exception to be raised if the child process hasn’t responded
within a time period, giving you a chance to terminate the misbehaving child.
proc = run_sleep(10)
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    proc.terminate()
    proc.wait()
print('Exit status', proc.poll())
>>>
Exit status -15
Unfortunately, the timeout parameter is only available in Python 3.3 and later. In earlier
versions of Python, you’ll need to use the select built-in module on proc.stdin,
proc.stdout, and proc.stderr in order to enforce timeouts on I/O.
Things to Remember
Use the subprocess module to run child processes and manage their input and output streams.
Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
Item 37: Use Threads for Blocking I/O, Avoid for Parallelism
The standard implementation of Python is called CPython. CPython runs a Python
program in two steps. First, it parses and compiles the source text into bytecode. Then, it
runs the bytecode using a stack-based interpreter. The bytecode interpreter has state that
must be maintained and coherent while the Python program executes. Python enforces
coherence with a mechanism called the global interpreter lock (GIL).
Essentially, the GIL is a mutual-exclusion lock (mutex) that prevents CPython from being
affected by preemptive multithreading, where one thread takes control of a program by
interrupting another thread. Such an interruption could corrupt the interpreter state if it
comes at an unexpected time. The GIL prevents these interruptions and ensures that every
bytecode instruction works correctly with the CPython implementation and its C-extension modules.
The GIL has an important negative side effect. With programs written in languages like
C++ or Java, having multiple threads of execution means your program could utilize
multiple CPU cores at the same time. Although Python supports multiple threads of
execution, the GIL causes only one of them to make forward progress at a time. This
means that when you reach for threads to do parallel computation and speed up your
Python programs, you will be sorely disappointed.
For example, say you want to do something computationally intensive with Python. I’ll
use a naive number factorization algorithm as a proxy.
def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i
Factoring a set of numbers in serial takes quite a long time.
from time import time
numbers = [2139079, 1214759, 1516637, 1852285]
start = time()
for number in numbers:
    list(factorize(number))
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 1.040 seconds
Using multiple threads to do this computation would make sense in other languages
because you could take advantage of all of the CPU cores of your computer. Let me try
that in Python. Here, I define a Python thread for doing the same computation as before:
from threading import Thread
class FactorizeThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
    def run(self):
        self.factors = list(factorize(self.number))
Then, I start a thread for factorizing each number in parallel.
start = time()
threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)
Finally, I wait for all of the threads to finish.
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 1.061 seconds
What’s surprising is that this takes even longer than running factorize in serial. With
one thread per number, you may expect less than a 4× speedup in other languages due to
the overhead of creating threads and coordinating with them. You may expect only a 2×
speedup on the dual-core machine I used to run this code. But you would never expect the
performance of these threads to be worse when you have multiple CPUs to utilize. This
demonstrates the effect of the GIL on programs running in the standard CPython
interpreter.
There are ways to get CPython to utilize multiple cores, but it doesn’t work with the
standard Thread class (see Item 41: “Consider concurrent.futures for True
Parallelism”) and it can require substantial effort. Knowing these limitations you may
wonder, why does Python support threads at all? There are two good reasons.
First, multiple threads make it easy for your program to seem like it’s doing multiple
things at the same time. Managing the juggling act of simultaneous tasks is difficult to
implement yourself (see Item 40: “Consider Coroutines to Run Many Functions
Concurrently” for an example). With threads, you can leave it to Python to run your
functions seemingly in parallel. This works because CPython ensures a level of fairness
between Python threads of execution, even though only one of them makes forward
progress at a time due to the GIL.
The second reason Python supports threads is to deal with blocking I/O, which happens
when Python does certain types of system calls. System calls are how your Python
program asks your computer’s operating system to interact with the external environment
on your behalf. Blocking I/O includes things like reading and writing files, interacting
with networks, communicating with devices like displays, etc. Threads help you handle
blocking I/O by insulating your program from the time it takes for the operating system to
respond to your requests.
For example, say you want to send a signal to a remote-controlled helicopter through a
serial port. I’ll use a slow system call (select) as a proxy for this activity. This function
asks the operating system to block for 0.1 second and then return control to my program,
similar to what would happen when using a synchronous serial port.
import select
def slow_systemcall():
    select.select([], [], [], 0.1)
Running this system call in serial requires a linearly increasing amount of time.
start = time()
for _ in range(5):
    slow_systemcall()
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 0.503 seconds
The problem is that while the slow_systemcall function is running, my program
can’t make any other progress. My program’s main thread of execution is blocked on the
select system call. This situation is awful in practice. You need to be able to compute
your helicopter’s next move while you’re sending it a signal, otherwise it’ll crash. When
you find yourself needing to do blocking I/O and computation simultaneously, it’s time to
consider moving your system calls to threads.
Here, I run multiple invocations of the slow_systemcall function in separate threads.
This would allow you to communicate with multiple serial ports (and helicopters) at the
same time, while leaving the main thread to do whatever computation is required.
start = time()
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)
With the threads started, here I do some work to calculate the next helicopter move before
waiting for the system call threads to finish.
def compute_helicopter_location(index):
    # …
for i in range(5):
    compute_helicopter_location(i)
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 0.102 seconds
The parallel run takes about one-fifth of the serial time (0.102 versus 0.503 seconds).
This shows that the system calls will all run in parallel from multiple Python threads even
though they’re limited by the GIL. The GIL prevents my Python code from running in
parallel, but it has no negative effect on system calls. This works because Python threads
release the GIL just before they make system calls and reacquire the GIL as soon as the
system calls are done.
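The same effect can be sketched without a serial port by using time.sleep, which also releases the GIL while it blocks. The helper names (blocking_call, run_serial, run_threaded) and the 0.05-second delay are assumptions chosen for illustration; they are not part of the example above.

```python
from threading import Thread
from time import perf_counter, sleep

DELAY = 0.05  # Hypothetical blocking time per call, in seconds


def blocking_call():
    # sleep() blocks in the operating system and releases the GIL,
    # standing in for a blocking I/O system call.
    sleep(DELAY)


def run_serial(count):
    # Each call must finish before the next one starts.
    start = perf_counter()
    for _ in range(count):
        blocking_call()
    return perf_counter() - start


def run_threaded(count):
    # All calls block at the same time, one per thread.
    start = perf_counter()
    threads = [Thread(target=blocking_call) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return perf_counter() - start


serial_time = run_serial(5)
threaded_time = run_threaded(5)
print('Serial:   %.3f seconds' % serial_time)
print('Threaded: %.3f seconds' % threaded_time)
```

The exact timings depend on the machine, but the threaded run should take roughly as long as one blocking call rather than five.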
There are many other ways to deal with blocking I/O besides threads, such as the
asyncio built-in module, and these alternatives have important benefits. But these
options also require extra work in refactoring your code to fit a different model of
execution (see Item 40: “Consider Coroutines to Run Many Functions Concurrently”).
Using threads is the simplest way to do blocking I/O in parallel with minimal changes to
your program.
Things to Remember
• Python threads can’t run bytecode in parallel on multiple CPU cores because of the
  global interpreter lock (GIL).
• Python threads are still useful despite the GIL because they provide an easy way to
  do multiple things at seemingly the same time.
• Use Python threads to make multiple system calls in parallel. This allows you to do
  blocking I/O at the same time as computation.
Item 38: Use Lock to Prevent Data Races in Threads
After learning about the global interpreter lock (GIL) (see Item 37: “Use Threads for
Blocking I/O, Avoid for Parallelism”), many new Python programmers assume they can
forgo using mutual-exclusion locks (mutexes) in their code altogether. If the GIL is
already preventing Python threads from running on multiple CPU cores in parallel, it must
also act as a lock for a program’s data structures, right? Some testing on types like lists
and dictionaries may even show that this assumption appears to hold.
But beware, this is truly not the case. The GIL will not protect you. Although only one
Python thread runs at a time, a thread’s operations on data structures can be interrupted
between any two bytecode instructions in the Python interpreter. This is dangerous if you
access the same objects from multiple threads simultaneously. The invariants of your data
structures could be violated at practically any time because of these interruptions, leaving
your program in a corrupted state.
For example, say you want to write a program that counts many things in parallel, like
sampling light levels from a whole network of sensors. If you want to determine the total
number of light samples over time, you can aggregate them with a new class.

class Counter(object):
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset
Imagine that each sensor has its own worker thread because reading from the sensor
requires blocking I/O. After each sensor measurement, the worker thread increments the
counter up to a maximum number of desired readings.

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        # Read from the sensor
        # …
        counter.increment(1)
Here, I define a function that starts a worker thread for each of five sensors and waits
for them all to finish their readings:

from threading import Thread

def run_threads(func, how_many, counter):
    threads = []
    for i in range(5):
        args = (i, how_many, counter)
        thread = Thread(target=func, args=args)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
Running five threads in parallel seems simple, and the outcome should be obvious.

how_many = 10**5
counter = Counter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))
>>>
Counter should be 500000, found 278328
But this result is way off! What happened here? How could something so simple go so
wrong, especially since only one Python interpreter thread can run at a time?
The Python interpreter enforces fairness between all of the threads that are executing to
ensure they get a roughly equal amount of processing time. To do this, Python will
suspend a thread as it’s running and will resume another thread in turn. The problem is
that you don’t know exactly when Python will suspend your threads. A thread can even be
paused seemingly halfway through what looks like an atomic operation. That’s what
happened in this case.
The Counter object’s increment method looks simple.

counter.count += offset

But the += operator used on an object attribute actually instructs Python to do three
separate operations behind the scenes. The statement above is equivalent to this:

value = getattr(counter, 'count')
result = value + offset
setattr(counter, 'count', result)
Python threads incrementing the counter can be suspended between any two of these
operations. This is problematic if the way the operations interleave causes old versions of
value to be assigned to the counter. Here’s an example of bad interaction between two
threads, A and B:

# Running in Thread A
value_a = getattr(counter, 'count')
# Context switch to Thread B
value_b = getattr(counter, 'count')
result_b = value_b + 1
setattr(counter, 'count', result_b)
# Context switch back to Thread A
result_a = value_a + 1
setattr(counter, 'count', result_a)

Thread A stomped on thread B, erasing all of its progress incrementing the counter. This is
exactly what happened in the light sensor example above.
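The interleaving can be replayed deterministically in a single thread to confirm the lost update. This is only a sketch: the "threads" here are ordinary statements executed in the problematic order, and the stripped-down Counter class is redefined so the block runs on its own.

```python
class Counter(object):
    def __init__(self):
        self.count = 0


counter = Counter()

# Thread A reads the current value, then is suspended.
value_a = getattr(counter, 'count')

# Thread B runs its whole read-modify-write sequence.
value_b = getattr(counter, 'count')
setattr(counter, 'count', value_b + 1)

# Thread A resumes with its stale value and overwrites B's update.
setattr(counter, 'count', value_a + 1)

print('Two increments ran, count is', counter.count)
```

Two increments were applied, but the counter ends at 1 because thread A wrote back a value computed from a stale read.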
To prevent data races like these and other forms of data structure corruption, Python
includes a robust set of tools in the threading built-in module. The simplest and most
useful of them is the Lock class, a mutual-exclusion lock (mutex).
By using a lock, I can have the Counter class protect its current value against
simultaneous access from multiple threads. Only one thread will be able to acquire the
lock at a time. Here, I use a with statement to acquire and release the lock; this makes it
easier to see which code is executing while the lock is held (see Item 43: “Consider
contextlib and with Statements for Reusable try/finally Behavior” for details):
from threading import Lock

class LockingCounter(object):
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        with self.lock:
            self.count += offset
Now I run the worker threads as before, but use a LockingCounter instead.

counter = LockingCounter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))
>>>
Counter should be 500000, found 500000

The result is exactly what I expect. The Lock solved the problem.
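For comparison, here is a sketch of what the with statement does on the lock's behalf: acquire before the protected code, and release in a finally block so the lock is freed even if the body raises. The class name ExplicitLockingCounter and the simplified worker are hypothetical, introduced only for this illustration.

```python
from threading import Lock, Thread


class ExplicitLockingCounter(object):
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        # Equivalent in effect to "with self.lock:"
        self.lock.acquire()
        try:
            self.count += offset
        finally:
            self.lock.release()


def worker(how_many, counter):
    for _ in range(how_many):
        counter.increment(1)


counter = ExplicitLockingCounter()
threads = [Thread(target=worker, args=(10000, counter)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print('Counter should be %d, found %d' % (5 * 10000, counter.count))
```

The with form is usually preferable because the release cannot be forgotten.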
Things to Remember
• Even though Python has a global interpreter lock, you’re still responsible for
  protecting against data races between the threads in your programs.
• Your programs will corrupt their data structures if you allow multiple threads to
  modify the same objects without locks.
• The Lock class in the threading built-in module is Python’s standard mutual
  exclusion lock implementation.
Item 39: Use Queue to Coordinate Work Between Threads
Python programs that do many things concurrently often need to coordinate their work.
One of the most useful arrangements for concurrent work is a pipeline of functions.
A pipeline works like an assembly line used in manufacturing. Pipelines have many
phases in serial with a specific function for each phase. New pieces of work are constantly
added to the beginning of the pipeline. Each function can operate concurrently on the
piece of work in its phase. The work moves forward as each function completes until there
are no phases remaining. This approach is especially good for work that includes blocking
I/O or subprocesses, activities that can easily be parallelized using Python (see Item 37:
“Use Threads for Blocking I/O, Avoid for Parallelism”).
For example, say you want to build a system that will take a constant stream of images
from your digital camera, resize them, and then add them to a photo gallery online. Such a
program could be split into three phases of a pipeline. New images are retrieved in the first
phase. The downloaded images are passed through the resize function in the second phase.
The resized images are consumed by the upload function in the final phase.
Imagine you had already written Python functions that execute the phases: download,
resize, upload. How do you assemble a pipeline to do the work concurrently?
The first thing you need is a way to hand off work between the pipeline phases. This can
be modeled as a thread-safe producer-consumer queue (see Item 38: “Use Lock to
Prevent Data Races in Threads” to understand the importance of thread safety in Python;
see Item 46: “Use Built-in Algorithms and Data Structures” for the deque class).

from collections import deque
from threading import Lock

class MyQueue(object):
    def __init__(self):
        self.items = deque()
        self.lock = Lock()
The producer, your digital camera, adds new images to the end of the list of pending
items.

    def put(self, item):
        with self.lock:
            self.items.append(item)

The consumer, the first phase of your processing pipeline, removes images from the front
of the list of pending items.

    def get(self):
        with self.lock:
            return self.items.popleft()
Here, I represent each phase of the pipeline as a Python thread that takes work from one
queue like this, runs a function on it, and puts the result on another queue. I also track how
many times the worker has checked for new input and how much work it’s completed.

class Worker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.polled_count = 0
        self.work_done = 0
The trickiest part is that the worker thread must properly handle the case where the input
queue is empty because the previous phase hasn’t completed its work yet. This happens
where I catch the IndexError exception below. You can think of this as a holdup in the
assembly line.

    def run(self):
        while True:
            self.polled_count += 1
            try:
                item = self.in_queue.get()
            except IndexError:
                sleep(0.01)  # No work to do
            else:
                result = self.func(item)
                self.out_queue.put(result)
                self.work_done += 1
Now I can connect the three phases together by creating the queues for their coordination
points and the corresponding worker threads.

download_queue = MyQueue()
resize_queue = MyQueue()
upload_queue = MyQueue()
done_queue = MyQueue()
threads = [
    Worker(download, download_queue, resize_queue),
    Worker(resize, resize_queue, upload_queue),
    Worker(upload, upload_queue, done_queue),
]
I can start the threads and then inject a bunch of work into the first phase of the pipeline.
Here, I use a plain object instance as a proxy for the real data required by the
download function:

for thread in threads:
    thread.start()
for _ in range(1000):
    download_queue.put(object())

Now I wait for all of the items to be processed by the pipeline and end up in the
done_queue.

while len(done_queue.items) < 1000:
    # Do something useful while waiting
    # …
This runs properly, but there’s an interesting side effect caused by the threads polling their
input queues for new work. The tricky part, where I catch IndexError exceptions in the
run method, executes a large number of times.

processed = len(done_queue.items)
polled = sum(t.polled_count for t in threads)
print('Processed', processed, 'items after polling',
      polled, 'times')
>>>
Processed 1000 items after polling 3030 times
When the worker functions vary in speeds, an earlier phase can prevent progress in later
phases, backing up the pipeline. This causes later phases to starve and constantly check
their input queues for new work in a tight loop. The outcome is that worker threads waste
CPU time doing nothing useful (they’re constantly raising and catching IndexError
exceptions).
But that’s just the beginning of what’s wrong with this implementation. There are three
more problems that you should also avoid. First, determining that all of the input work is
complete requires yet another busy wait on the done_queue. Second, in Worker the
run method will execute forever in its busy loop. There’s no way to signal to a worker
thread that it’s time to exit.
Third, and worst of all, a backup in the pipeline can cause the program to crash arbitrarily.
If the first phase makes rapid progress but the second phase makes slow progress, then the
queue connecting the first phase to the second phase will constantly increase in size. The
second phase won’t be able to keep up. Given enough time and input data, the program
will eventually run out of memory and die.
The lesson here isn’t that pipelines are bad; it’s that it’s hard to build a good
producer-consumer queue yourself.
Queue to the Rescue
The Queue class from the queue built-in module provides all of the functionality you
need to solve these problems.
Queue eliminates the busy waiting in the worker by making the get method block until
new data is available. For example, here I start a thread that waits for some input data on a
queue:

from queue import Queue
from threading import Thread

queue = Queue()

def consumer():
    print('Consumer waiting')
    queue.get()                # Runs after put() below
    print('Consumer done')

thread = Thread(target=consumer)
thread.start()
Even though the thread is running first, it won’t finish until an item is put on the Queue
instance and the get method has something to return.

print('Producer putting')
queue.put(object())            # Runs before get() above
thread.join()
print('Producer done')
>>>
Consumer waiting
Producer putting
Consumer done
Producer done
To solve the pipeline backup issue, the Queue class lets you specify the maximum
amount of pending work you’ll allow between two phases. This buffer size causes calls to
put to block when the queue is already full. For example, here I define a thread that waits
for a while before consuming a queue:

import time

queue = Queue(1)               # Buffer size of 1

def consumer():
    time.sleep(0.1)            # Wait
    queue.get()                # Runs second
    print('Consumer got 1')
    queue.get()                # Runs fourth
    print('Consumer got 2')

thread = Thread(target=consumer)
thread.start()
The wait should allow the producer thread to put both objects on the queue before the
consumer thread ever calls get. But the Queue size is one. That means the producer
adding items to the queue will have to wait for the consumer thread to call get at least
once before the second call to put will stop blocking and add the second item to the
queue.

queue.put(object())            # Runs first
print('Producer put 1')
queue.put(object())            # Runs third
print('Producer put 2')
thread.join()
print('Producer done')
>>>
Producer put 1
Consumer got 1
Producer put 2
Consumer got 2
Producer done
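The buffer limit can also be observed without a second thread by using the non-blocking variants of put and get, which raise exceptions instead of waiting. This is a minimal sketch; the variable names (bounded, second_accepted, drained) are chosen here for illustration only.

```python
from queue import Empty, Full, Queue

bounded = Queue(maxsize=1)     # Buffer size of 1
bounded.put_nowait('first')    # Fits in the buffer

try:
    bounded.put_nowait('second')   # Would block with put(), so it raises
    second_accepted = True
except Full:
    second_accepted = False
print('Second item accepted:', second_accepted)

print('Got:', bounded.get_nowait())

try:
    bounded.get(timeout=0.01)      # Nothing left, so this times out
    drained = False
except Empty:
    drained = True
print('Queue drained:', drained)
```

A blocking put on a full queue waits at exactly the point where put_nowait raises Full, which is what throttles a fast producer.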
The Queue class can also track the progress of work using the task_done method. This
lets you wait for a phase’s input queue to drain and eliminates the need for polling the
done_queue at the end of your pipeline. For example, here I define a consumer thread
that calls task_done when it finishes working on an item.

in_queue = Queue()

def consumer():
    print('Consumer waiting')
    work = in_queue.get()      # Done second
    print('Consumer working')
    # Doing work
    # …
    print('Consumer done')
    in_queue.task_done()       # Done third

Thread(target=consumer).start()
Now, the producer code doesn’t have to join the consumer thread or poll. The producer
can just wait for the in_queue to finish by calling join on the Queue instance. Even
once it’s empty, the in_queue won’t be joinable until after task_done is called for
every item that was ever enqueued.

in_queue.put(object())         # Done first
print('Producer waiting')
in_queue.join()                # Done fourth
print('Producer done')
>>>
Consumer waiting
Producer waiting
Consumer working
Consumer done
Producer done
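The same mechanism extends to many items: join returns only once task_done has been called as many times as put. The sketch below assumes a single long-running consumer marked as a daemon thread so it does not keep the program alive; doubling each item is a stand-in for real work.

```python
from queue import Queue
from threading import Thread

in_queue = Queue()
processed = []


def consumer():
    while True:
        item = in_queue.get()
        processed.append(item * 2)   # Stand-in for real work
        in_queue.task_done()         # One call per item retrieved


# A daemon thread does not block interpreter exit while it waits on get().
Thread(target=consumer, daemon=True).start()

for i in range(5):
    in_queue.put(i)
in_queue.join()                      # Returns after five task_done() calls
print(processed)
```

Because join has returned, the main thread can read processed knowing every item has been handled.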
I can put all of these behaviors together into a Queue subclass that also tells the worker
thread when it should stop processing. Here, I define a close method that adds a special
item to the queue that indicates there will be no more input items after it:

class ClosableQueue(Queue):
    SENTINEL = object()

    def close(self):
        self.put(self.SENTINEL)
Then, I define an iterator for the queue that looks for this special object and stops iteration
when it’s found. This __iter__ method also calls task_done at appropriate times,
letting me track the progress of work on the queue.

    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # Cause the thread to exit
                yield item
            finally:
                self.task_done()
Now, I can redefine my worker thread to rely on the behavior of the ClosableQueue
class. The thread will exit once the for loop is exhausted.

class StoppableWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        # …

    def run(self):
        for item in self.in_queue:
            result = self.func(item)
            self.out_queue.put(result)
Here, I re-create the set of worker threads using the new worker class:

download_queue = ClosableQueue()
# …
threads = [
    StoppableWorker(download, download_queue, resize_queue),
    # …
]

After running the worker threads like before, I also send the stop signal once all the input
work has been injected by closing the input queue of the first phase.

for thread in threads:
    thread.start()
for _ in range(1000):
    download_queue.put(object())
download_queue.close()
Finally, I wait for the work to finish by joining each queue that connects the phases. Each
time one phase is done, I signal the next phase to stop by closing its input queue. At the
end, the done_queue contains all of the output objects as expected.

download_queue.join()
resize_queue.close()
resize_queue.join()
upload_queue.close()
upload_queue.join()
print(done_queue.qsize(), 'items finished')
>>>
1000 items finished
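Assembled into one runnable piece, the pipeline might look like the sketch below. The download, resize, and upload functions are hypothetical stand-ins that do simple arithmetic, the StoppableWorker constructor body is an assumption (the text elides it), and the pipeline processes 100 items to keep the run short.

```python
from queue import Queue
from threading import Thread


class ClosableQueue(Queue):
    SENTINEL = object()

    def close(self):
        self.put(self.SENTINEL)

    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # Cause the thread to exit
                yield item
            finally:
                self.task_done()


class StoppableWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        # Assumed constructor body: store the function and both queues.
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        for item in self.in_queue:
            self.out_queue.put(self.func(item))


# Hypothetical stand-ins for the real phases
def download(item):
    return item + 1


def resize(item):
    return item * 2


def upload(item):
    return str(item)


download_queue = ClosableQueue()
resize_queue = ClosableQueue()
upload_queue = ClosableQueue()
done_queue = ClosableQueue()
threads = [
    StoppableWorker(download, download_queue, resize_queue),
    StoppableWorker(resize, resize_queue, upload_queue),
    StoppableWorker(upload, upload_queue, done_queue),
]
for thread in threads:
    thread.start()
for i in range(100):
    download_queue.put(i)

# Close and drain each phase in order, then wait for the threads to exit.
download_queue.close()
download_queue.join()
resize_queue.close()
resize_queue.join()
upload_queue.close()
upload_queue.join()
for thread in threads:
    thread.join()

results = [done_queue.get() for _ in range(done_queue.qsize())]
print(len(results), 'items finished')
```

With one worker per phase, items leave the pipeline in the order they entered it; adding more workers per phase would give up that ordering.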
Things to Remember
• Pipelines are a great way to organize sequences of work that run concurrently using
  multiple Python threads.
• Be aware of the many problems in building concurrent pipelines: busy waiting,
  stopping workers, and memory explosion.
• The Queue class has all of the facilities you need to build robust pipelines: blocking
  operations, buffer sizes, and joining.
Item 40: Consider Coroutines to Run Many Functions Concurrently
Threads give Python programmers a way to run multiple functions seemingly at the same
time (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”). But there are
three big problems with threads:
• They require special tools to coordinate with each other safely (see Item 38: “Use
  Lock to Prevent Data Races in Threads” and Item 39: “Use Queue to Coordinate
  Work Between Threads”). This makes code that uses threads harder to reason about
  than procedural, single-threaded code. This complexity makes threaded code more
  difficult to extend and maintain over time.
• Threads require a lot of memory, about 8 MB per executing thread. On many
  computers, that amount of memory doesn’t matter for a dozen threads or so. But
  what if you want your program to run tens of thousands of functions
  “simultaneously”? These functions may correspond to user requests to a server,
  pixels on a screen, particles in a simulation, etc. Running a thread per unique activity
  just won’t work.
• Threads are costly to start. If you want to constantly be creating new concurrent
  functions and finishing them, the overhead of using threads becomes large and slows
  everything down.
Python can work around all these issues with coroutines. Coroutines let you have many
seemingly simultaneous functions in your Python programs. They’re implemented as an
extension to generators (see Item 16: “Consider Generators Instead of Returning Lists”).
The cost of starting a generator coroutine is a function call. Once active, they each use less
than 1 KB of memory until they’re exhausted.
Coroutines work by enabling the code consuming a generator to send a value back into
the generator function after each yield expression. The generator function receives the
value passed to the send function as the result of the corresponding yield expression.
def my_coroutine():
    while True:
        received = yield
        print('Received:', received)

it = my_coroutine()
next(it)             # Prime the coroutine
it.send('First')
it.send('Second')
>>>
Received: First
Received: Second

The initial call to next is required to prepare the generator for receiving the first send
by advancing it to the first yield expression. Together, yield and send provide
generators with a standard way to vary their next yielded value in response to external
input.
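A short sketch shows what happens when the priming step is skipped: sending a non-None value into a generator that has not yet reached its first yield raises a TypeError, and the generator remains unstarted so it can still be primed afterward. The variable name error is introduced here only to capture the exception.

```python
def my_coroutine():
    while True:
        received = yield
        print('Received:', received)


it = my_coroutine()
try:
    it.send('First')     # No next(it) yet, so there is no yield to receive it
    error = None
except TypeError as e:
    error = e
print('Unprimed send failed:', error)

next(it)                 # Advance to the first yield expression
it.send('First')         # Now the value arrives as the result of yield
```

Calling it.send(None) on a fresh generator has the same effect as next(it), which is why some code primes coroutines that way.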
For example, say you want to implement a generator coroutine that yields the minimum
value it’s been sent so far. Here, the bare yield prepares the coroutine with the initial
minimum value sent in from the outside. Then the generator repeatedly yields the new
minimum in exchange for the next value to consider.

def minimize():
    current = yield
    while True:
        value = yield current
        current = min(value, current)
The code consuming the generator can run one step at a time and will output the minimum
value seen after each input.

it = minimize()
next(it)            # Prime the generator
print(it.send(10))
print(it.send(4))
print(it.send(22))
print(it.send(-1))
>>>
10
4
4
-1
The generator function will seemingly run forever, making forward progress with each
new call to send. Like threads, coroutines are independent functions that can consume
inputs from their environment and produce resulting outputs. The difference is that
coroutines pause at each yield expression in the generator function and resume after
each call to send from the outside. This is the magical mechanism of coroutines.
This behavior allows the code consuming the generator to take action after each yield
expression in the coroutine. The consuming code can use the generator’s output values to
call other functions and update data structures. Most importantly, it can advance other
generator functions until their next yield expressions. By advancing many separate
generators in lockstep, they will all seem to be running simultaneously, mimicking the
concurrent behavior of Python threads.
The Game of Life
Let me demonstrate the simultaneous behavior of coroutines with an example. Say you
want to use coroutines to implement Conway’s Game of Life. The rules of the game are
simple. You have a two-dimensional grid of an arbitrary size. Each cell in the grid can
either be alive or empty.

ALIVE = '*'
EMPTY = '-'

The game progresses one tick of the clock at a time. At each tick, each cell counts how
many of its neighboring eight cells are still alive. Based on its neighbor count, each cell
decides if it will keep living, die, or regenerate. Here’s an example of a 5×5 Game of Life
grid after four generations with time going to the right. I’ll explain the specific rules
further below.

  0   |   1   |   2   |   3   |   4
----- | ----- | ----- | ----- | -----
-*--- | --*-- | --**- | --*-- | -----
--**- | --**- | -*--- | -*--- | -**--
---*- | --**- | --**- | --*-- | -----
----- | ----- | ----- | ----- | -----
I can model this game by representing each cell as a generator coroutine running in
lockstep with all the others.
To implement this, first I need a way to retrieve the status of neighboring cells. I can do
this with a coroutine named count_neighbors that works by yielding Query objects.
The Query class I define myself. Its purpose is to provide the generator coroutine with a
way to ask its surrounding environment for information.

from collections import namedtuple

Query = namedtuple('Query', ('y', 'x'))
The coroutine yields a Query for each neighbor. The result of each yield expression
will be the value ALIVE or EMPTY. That’s the interface contract I’ve defined between the
coroutine and its consuming code. The count_neighbors generator sees the
neighbors’ states and returns the count of living neighbors.

def count_neighbors(y, x):
    n_ = yield Query(y + 1, x + 0)  # North
    ne = yield Query(y + 1, x + 1)  # Northeast
    # Define e_, se, s_, sw, w_, nw …
    # …
    neighbor_states = [n_, ne, e_, se, s_, sw, w_, nw]
    count = 0
    for state in neighbor_states:
        if state == ALIVE:
            count += 1
    return count
I can drive the count_neighbors coroutine with fake data to test it. Here, I show how
Query objects will be yielded for each neighbor. count_neighbors expects to
receive cell states corresponding to each Query through the coroutine’s send method.
The final count is returned in the StopIteration exception that is raised when the
generator is exhausted by the return statement.

it = count_neighbors(10, 5)
q1 = next(it)                  # Get the first query
print('First yield: ', q1)
q2 = it.send(ALIVE)            # Send q1 state, get q2
print('Second yield:', q2)
q3 = it.send(ALIVE)            # Send q2 state, get q3
# …
try:
    count = it.send(EMPTY)     # Send q8 state, retrieve count
except StopIteration as e:
    print('Count: ', e.value)  # Value from return statement
>>>
First yield:  Query(y=11, x=5)
Second yield: Query(y=11, x=6)
…
Count:  2
Now I need the ability to indicate that a cell will transition to a new state in response to the
neighbor count that it found from count_neighbors. To do this, I define another
coroutine called step_cell. This generator will indicate transitions in a cell’s state by
yielding Transition objects. This is another class that I define, just like the Query
class.

Transition = namedtuple('Transition', ('y', 'x', 'state'))
The step_cell coroutine receives its coordinates in the grid as arguments. It yields a
Query to get the initial state of those coordinates. It runs count_neighbors to
inspect the cells around it. It runs the game logic to determine what state the cell should
have for the next clock tick. Finally, it yields a Transition object to tell the
environment the cell’s next state.

def game_logic(state, neighbors):
    # …

def step_cell(y, x):
    state = yield Query(y, x)
    neighbors = yield from count_neighbors(y, x)
    next_state = game_logic(state, neighbors)
    yield Transition(y, x, next_state)

Importantly, the call to count_neighbors uses the yield from expression. This
expression allows Python to compose generator coroutines together, making it easy to
reuse smaller pieces of functionality and build complex coroutines from simpler ones.
When count_neighbors is exhausted, the final value it returns (with the return
statement) will be passed to step_cell as the result of the yield from expression.
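The way a return value travels through yield from can be seen in isolation with a much smaller pair of generators. This is a sketch with hypothetical names (average_of_three, report) unrelated to the Game of Life code: the inner generator collects three sent values and returns their average, and the outer generator receives that result from the yield from expression.

```python
def average_of_three():
    total = 0
    for _ in range(3):
        total += yield           # Receive one value per send()
    return total / 3             # Becomes the result of "yield from"


def report():
    average = yield from average_of_three()
    yield 'Average: %.1f' % average


it = report()
next(it)                         # Prime: runs to the first inner yield
it.send(3)
it.send(6)
message = it.send(9)             # Inner generator returns; outer one yields
print(message)
```

The caller only ever talks to report; values passed to send are routed to whichever generator is currently suspended.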
Now, I can finally define the simple game logic for Conway’s Game of Life. There are
only three rules.

def game_logic(state, neighbors):
    if state == ALIVE:
        if neighbors < 2:
            return EMPTY     # Die: Too few
        elif neighbors > 3:
            return EMPTY     # Die: Too many
    else:
        if neighbors == 3:
            return ALIVE     # Regenerate
    return state
I can drive the step_cell coroutine with fake data to test it.

it = step_cell(10, 5)
q0 = next(it)           # Initial location query
print('Me:      ', q0)
q1 = it.send(ALIVE)     # Send my status, get neighbor query
print('Q1:      ', q1)
# …
t1 = it.send(EMPTY)     # Send for q8, get game decision
print('Outcome: ', t1)
>>>
Me:       Query(y=10, x=5)
Q1:       Query(y=11, x=5)
…
Outcome:  Transition(y=10, x=5, state='-')
The goal of the game is to run this logic for a whole grid of cells in lockstep. To do this, I
can further compose the step_cell coroutine into a simulate coroutine. This
coroutine progresses the grid of cells forward by yielding from step_cell many times.
After progressing every coordinate, it yields a TICK object to indicate that the current
generation of cells have all transitioned.

TICK = object()

def simulate(height, width):
    while True:
        for y in range(height):
            for x in range(width):
                yield from step_cell(y, x)
        yield TICK
What’s impressive about simulate is that it’s completely disconnected from the
surrounding environment. I still haven’t defined how the grid is represented in Python
objects, how Query, Transition, and TICK values are handled on the outside, nor
how the game gets its initial state. But the logic is clear. Each cell will transition by
running step_cell. Then the game clock will tick. This will continue forever, as long
as the simulate coroutine is advanced.
This is the beauty of coroutines. They help you focus on the logic of what you’re trying to
accomplish. They decouple your code’s instructions for the environment from the
implementation that carries out your wishes. This enables you to run coroutines seemingly
in parallel. This also allows you to improve the implementation of following those
instructions over time without changing the coroutines.
Now, I want to run simulate in a real environment. To do that, I need to represent the
state of each cell in the grid. Here, I define a class to contain the grid:

class Grid(object):
    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.rows = []
        for _ in range(self.height):
            self.rows.append([EMPTY] * self.width)

    def __str__(self):
        # …

The grid allows you to get and set the value of any coordinate. Coordinates that are out of
bounds will wrap around, making the grid act like infinite looping space.

    def query(self, y, x):
        return self.rows[y % self.height][x % self.width]

    def assign(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state
At last, I can define the function that interprets the values yielded from simulate and all
of its interior coroutines. This function turns the instructions from the coroutines into
interactions with the surrounding environment. It progresses the whole grid of cells
forward a single step and then returns a new grid containing the next state.

def live_a_generation(grid, sim):
    progeny = Grid(grid.height, grid.width)
    item = next(sim)
    while item is not TICK:
        if isinstance(item, Query):
            state = grid.query(item.y, item.x)
            item = sim.send(state)
        else:  # Must be a Transition
            progeny.assign(item.y, item.x, item.state)
            item = next(sim)
    return progeny
To see this function in action, I need to create a grid and set its initial state. Here, I make a
classic shape called a glider.

grid = Grid(5, 9)
grid.assign(0, 3, ALIVE)
# …
print(grid)
>>>
---*-----
----*----
--***----
---------
---------
Now I can progress this grid forward one generation at a time. You can see how the glider
moves down and to the right on the grid based on the simple rules from the game_logic
function.

class ColumnPrinter(object):
    # …

columns = ColumnPrinter()
sim = simulate(grid.height, grid.width)
for i in range(5):
    columns.append(str(grid))
    grid = live_a_generation(grid, sim)
print(columns)
>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------
The best part about this approach is that I can change the game_logic function without
having to update the code that surrounds it. I can change the rules or add larger spheres of
influence with the existing mechanics of Query, Transition, and TICK. This
demonstrates how coroutines enable the separation of concerns, which is an important
design principle.
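For readers who want to run the whole thing, the pieces can be combined into one sketch. Several details here are assumptions rather than the text's own code: the OFFSETS table supplies all eight neighbor directions (the text shows only two), count_neighbors loops over that table instead of naming each neighbor, game_logic is condensed, Grid.__str__ joins rows with newlines, and the glider's remaining four cells are filled in with the standard glider shape.

```python
from collections import namedtuple

ALIVE = '*'
EMPTY = '-'
TICK = object()
Query = namedtuple('Query', ('y', 'x'))
Transition = namedtuple('Transition', ('y', 'x', 'state'))

# Assumed (dy, dx) offsets for the eight surrounding cells
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1),
           (-1, 0), (-1, -1), (0, -1), (1, -1)]


def count_neighbors(y, x):
    count = 0
    for dy, dx in OFFSETS:
        state = yield Query(y + dy, x + dx)
        if state == ALIVE:
            count += 1
    return count


def game_logic(state, neighbors):
    if state == ALIVE:
        if neighbors < 2 or neighbors > 3:
            return EMPTY     # Die: too few or too many
    elif neighbors == 3:
        return ALIVE         # Regenerate
    return state


def step_cell(y, x):
    state = yield Query(y, x)
    neighbors = yield from count_neighbors(y, x)
    yield Transition(y, x, game_logic(state, neighbors))


def simulate(height, width):
    while True:
        for y in range(height):
            for x in range(width):
                yield from step_cell(y, x)
        yield TICK


class Grid(object):
    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.rows = [[EMPTY] * width for _ in range(height)]

    def __str__(self):
        # Assumed rendering: one text line per row
        return '\n'.join(''.join(row) for row in self.rows)

    def query(self, y, x):
        return self.rows[y % self.height][x % self.width]

    def assign(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state


def live_a_generation(grid, sim):
    progeny = Grid(grid.height, grid.width)
    item = next(sim)
    while item is not TICK:
        if isinstance(item, Query):
            item = sim.send(grid.query(item.y, item.x))
        else:  # Must be a Transition
            progeny.assign(item.y, item.x, item.state)
            item = next(sim)
    return progeny


grid = Grid(5, 9)
for y, x in [(0, 3), (1, 4), (2, 2), (2, 3), (2, 4)]:  # Glider
    grid.assign(y, x, ALIVE)
sim = simulate(grid.height, grid.width)
grid = live_a_generation(grid, sim)
print(grid)
```

Because the neighbor count is symmetric, the order and naming of the offsets does not affect the result; only the set of eight directions matters.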
Coroutines in Python 2
Unfortunately, Python 2 is missing some of the syntactical sugar that makes coroutines so
elegant in Python 3. There are two limitations. First, there is no yield from expression.
That means that when you want to compose generator coroutines in Python 2, you need to
include an additional loop at the delegation point.

# Python 2
def delegated():
    yield 1
    yield 2

def composed():
    yield 'A'
    for value in delegated():  # yield from in Python 3
        yield value
    yield 'B'

print list(composed())
>>>
['A', 1, 2, 'B']
The second limitation is that there is no support for the return statement in Python 2
generators. To get the same behavior that interacts correctly with try/except/finally
blocks, you need to define your own exception type and raise it when you want to return a
value.

# Python 2
class MyReturn(Exception):
    def __init__(self, value):
        self.value = value

def delegated():
    yield 1
    raise MyReturn(2)  # return 2 in Python 3
    yield 'Not reached'

def composed():
    try:
        for value in delegated():
            yield value
    except MyReturn as e:
        output = e.value
    yield output * 4

print list(composed())
>>>
[1, 8]
Things to Remember
• Coroutines provide an efficient way to run tens of thousands of functions seemingly
  at the same time.
• Within a generator, the value of the yield expression will be whatever value was
  passed to the generator’s send method from the exterior code.
• Coroutines give you a powerful tool for separating the core logic of your program
  from its interaction with the surrounding environment.
• Python 2 doesn’t support yield from or returning values from generators.
Item 41: Consider concurrent.futures for True Parallelism
At some point in writing Python programs, you may hit the performance wall. Even after
optimizing your code (see Item 58: “Profile Before Optimizing”), your program’s
execution may still be too slow for your needs. On modern computers that have an
increasing number of CPU cores, it’s reasonable to assume that one solution would be
parallelism. What if you could split your code’s computation into independent pieces of
work that run simultaneously across multiple CPU cores?
Unfortunately, Python’s global interpreter lock (GIL) prevents true parallelism in threads
(see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”), so that option is out.
Another common suggestion is to rewrite your most performance-critical code as an
extension module using the C language. C gets you closer to the bare metal and can run
faster than Python, eliminating the need for parallelism. C-extensions can also start native
threads that run in parallel and utilize multiple CPU cores. Python’s API for C-extensions
is well documented and a good choice for an escape hatch.
But rewriting your code in C has a high cost. Code that is short and understandable in
Python can become verbose and complicated in C. Such a port requires extensive testing
to ensure that the functionality is equivalent to the original Python code and that no bugs
have been introduced. Sometimes it’s worth it, which explains the large ecosystem of
C-extension modules in the Python community that speed up things like text parsing, image
compositing, and matrix math. There are even open source tools such as Cython
(http://cython.org/) and Numba (http://numba.pydata.org/) that can ease the transition to C.
The problem is that moving one piece of your program to C isn’t sufficient most of the
time. Optimized Python programs usually don’t have one major source of slowness, but
rather, there are often many significant contributors. To get the benefits of C’s bare metal
and threads, you’d need to port large parts of your program, drastically increasing testing
needs and risk. There must be a better way to preserve your investment in Python to solve
difficult computational problems.
The multiprocessing built-in module, easily accessed via the
concurrent.futures built-in module, may be exactly what you need. It enables
Python to utilize multiple CPU cores in parallel by running additional interpreters as child
processes. These child processes are separate from the main interpreter, so their global
interpreter locks are also separate. Each child can fully utilize one CPU core. Each child
has a link to the main process where it receives instructions to do computation and returns
results.
For example, say you want to do something computationally intensive with Python and
utilize multiple CPU cores. I’ll use an implementation of finding the greatest common
divisor of two numbers as a proxy for a more computationally intense algorithm, like
simulating fluid dynamics with the Navier-Stokes equation.

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i
Running this function in serial takes a linearly increasing amount of time because there is
no parallelism.

numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802)]
start = time()
results = list(map(gcd, numbers))
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 1.170 seconds
Running this code on multiple Python threads will yield no speed improvement because
the GIL prevents Python from using multiple CPU cores in parallel. Here, I do the same
computation as above using the concurrent.futures module with its
ThreadPoolExecutor class and two worker threads (to match the number of CPU
cores on my computer):

from concurrent.futures import ThreadPoolExecutor

start = time()
pool = ThreadPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 1.199 seconds

It’s even slower this time because of the overhead of starting and communicating with the
pool of threads.
Now for the surprising part: By changing a single line of code, something magical
happens. If I replace the ThreadPoolExecutor with the ProcessPoolExecutor
from the concurrent.futures module, everything speeds up.

from concurrent.futures import ProcessPoolExecutor

start = time()
pool = ProcessPoolExecutor(max_workers=2)  # The one change
results = list(pool.map(gcd, numbers))
end = time()
print('Took %.3f seconds' % (end - start))
>>>
Took 0.663 seconds
Running on my dual-core machine, it’s significantly faster! How is this possible? Here’s
what the ProcessPoolExecutor class actually does (via the low-level constructs
provided by the multiprocessing module):
1. It takes each item from the numbers input data to map.
2. It serializes it into binary data using the pickle module (see Item 44: “Make
pickle Reliable with copyreg”).
3. It copies the serialized data from the main interpreter process to a child interpreter
process over a local socket.
4. Next, it deserializes the data back into Python objects using pickle in the child
process.
5. It then imports the Python module containing the gcd function.
6. It runs the function on the input data in parallel with other child processes.
7. It serializes the result back into bytes.
8. It copies those bytes back through the socket.
9. It deserializes the bytes back into Python objects in the parent process.
10. Finally, it merges the results from multiple children into a single list to return.
Although it looks simple to the programmer, the multiprocessing module and
ProcessPoolExecutor class do a huge amount of work to make parallelism possible.
In most other languages, the only touch point you need to coordinate two threads is a
single lock or atomic operation. The overhead of using multiprocessing is high
because of all of the serialization and deserialization that must happen between the parent
and child processes.
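The serialization steps in the list above can be imitated inside a single process. The following is only a rough sketch under stated assumptions: it keeps the pickle round trip and leaves out the child process and the socket entirely, so it is not what ProcessPoolExecutor literally runs. The helper name simulate_round_trip is hypothetical, and the small input pairs are chosen for illustration.

```python
import pickle


def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i


def simulate_round_trip(func, item):
    # Hypothetical helper: mimics steps 2, 4, 6, 7, and 9 in one process.
    payload = pickle.dumps(item)        # Parent: serialize the input
    received = pickle.loads(payload)    # Child: deserialize the input
    result = func(received)             # Child: run the function
    reply = pickle.dumps(result)        # Child: serialize the result
    return pickle.loads(reply)          # Parent: deserialize the result


pairs = [(12, 18), (35, 49)]
results = [simulate_round_trip(gcd, pair) for pair in pairs]
print(results)  # [6, 7]
```

Every input and every result passes through pickle twice, which is where the overhead described above comes from.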
This scheme is well suited to certain types of isolated, high-leverage tasks. By isolated, I
mean functions that don’t need to share state with other parts of the program. By
high-leverage, I mean situations in which only a small amount of data must be transferred
between the parent and child processes to enable a large amount of computation. The
greatest common divisor algorithm is one example of this, but many other
mathematical algorithms work similarly.
If your computation doesn’t have these characteristics, then the overhead of
multiprocessing may prevent it from speeding up your program through
parallelization. When that happens, multiprocessing provides more advanced
facilities for shared memory, cross-process locks, queues, and proxies. But all of these
features are very complex. It’s hard enough to reason about such tools in the memory
space of a single process shared between Python threads. Extending that complexity to
other processes and involving sockets makes this much more difficult to understand.
I suggest avoiding all parts of multiprocessing and using these features via the
simpler concurrent.futures module. You can start by using the
ThreadPoolExecutor class to run isolated, high-leverage functions in threads. Later,
you can move to the ProcessPoolExecutor to get a speedup. Finally, once you’ve
completely exhausted the other options, you can consider using the multiprocessing
module directly.
Things to Remember
• Moving CPU bottlenecks to C-extension modules can be an effective way to
improve performance while maximizing your investment in Python code. However,
the cost of doing so is high and may introduce bugs.
• The multiprocessing module provides powerful tools that can parallelize
certain types of Python computation with minimal effort.
• The power of multiprocessing is best accessed through the
concurrent.futures built-in module and its simple
ProcessPoolExecutor class.
• The advanced parts of the multiprocessing module should be avoided because
they are so complex.
6. Built-in Modules
Python takes a “batteries included” approach to the standard library. Many other languages
ship with a small number of common packages and require you to look elsewhere for
important functionality. Although Python also has an impressive repository of
community-built modules, it strives to provide, in its default installation, the most important modules
for common uses of the language.
The full set of standard modules is too large to cover in this book. But some of these
built-in packages are so closely intertwined with idiomatic Python that they may as well be part
of the language specification. These essential built-in modules are especially important
when writing the intricate, error-prone parts of programs.
Item 42: Define Function Decorators with functools.wraps
Python has special syntax for decorators that can be applied to functions. Decorators have
the ability to run additional code before and after any calls to the functions they wrap. This
allows them to access and modify input arguments and return values. This functionality
can be useful for enforcing semantics, debugging, registering functions, and more.
For example, say you want to print the arguments and return value of a function call. This
is especially helpful when debugging a stack of function calls from a recursive function.
Here, I define such a decorator:
def trace(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
              (func.__name__, args, kwargs, result))
        return result
    return wrapper
I can apply this to a function using the @ symbol.
@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))
The @ symbol is equivalent to calling the decorator on the function it wraps and assigning
the return value to the original name in the same scope.
fibonacci = trace(fibonacci)
Calling this decorated function will run the wrapper code before and after fibonacci
runs, printing the arguments and return value at each level in the recursive stack.
fibonacci(3)
>>>
fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2
This works well, but it has an unintended side effect. The value returned by the decorator
(the function that’s called above) doesn’t think it’s named fibonacci.
print(fibonacci)
>>>
<function trace.<locals>.wrapper at 0x107f7ed08>
The cause of this isn’t hard to see. The trace function returns the wrapper it defines.
The wrapper function is what’s assigned to the fibonacci name in the containing
module because of the decorator. This behavior is problematic because it undermines tools
that do introspection, such as debuggers (see Item 57: “Consider Interactive Debugging
with pdb”) and object serializers (see Item 44: “Make pickle Reliable with
copyreg”).
For example, the help built-in function is useless on the decorated fibonacci
function.
help(fibonacci)
>>>
Help on function wrapper in module __main__:
wrapper(*args, **kwargs)
The solution is to use the wraps helper function from the functools built-in module.
This is a decorator that helps you write decorators. Applying it to the wrapper function
will copy all of the important metadata about the inner function to the outer function.
from functools import wraps
def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # …
    return wrapper
@trace
def fibonacci(n):
    # …
Now, running the help function produces the expected result, even though the function is
decorated.
help(fibonacci)
>>>
Help on function fibonacci in module __main__:
fibonacci(n)
    Return the n-th Fibonacci number
Calling help is just one example of how decorators can subtly cause problems. Python
functions have many other standard attributes (e.g., __name__, __module__) that must
be preserved to maintain the interface of functions in the language. Using wraps ensures
that you’ll always get the correct behavior.
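To see the preserved attributes directly, here is a complete, runnable version of the decorator with the elided bodies filled in from the earlier listings. It is a minimal sketch; the three attribute checks at the end are additions for illustration and are not part of the original example.

```python
from functools import wraps


def trace(func):
    @wraps(func)  # Copies __name__, __doc__, __module__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
              (func.__name__, args, kwargs, result))
        return result
    return wrapper


@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)


print(fibonacci.__name__)              # fibonacci
print(fibonacci.__doc__)               # Return the n-th Fibonacci number
# wraps also stores the undecorated function on __wrapped__
print(fibonacci.__wrapped__.__name__)  # fibonacci
```

Without the @wraps(func) line, the first print would show wrapper and the second would show None.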
Things to Remember
• Decorators are Python syntax for allowing one function to modify another function
at runtime.
• Using decorators can cause strange behaviors in tools that do introspection, such as
debuggers.
• Use the wraps decorator from the functools built-in module when you define
your own decorators to avoid any issues.
Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior
The with statement in Python is used to indicate when code is running in a special
context. For example, mutual exclusion locks (see Item 38: “Use Lock to Prevent Data
Races in Threads”) can be used in with statements to indicate that the indented code only
runs while the lock is held.
from threading import Lock
lock = Lock()
with lock:
    print('Lock is held')
The example above is equivalent to this try/finally construction because the Lock
class properly enables the with statement.
lock.acquire()
try:
    print('Lock is held')
finally:
    lock.release()
The with statement version of this is better because it eliminates the need to write the
repetitive code of the try/finally construction. It’s easy to make your objects and
functions capable of use in with statements by using the contextlib built-in module.
This module contains the contextmanager decorator, which lets a simple function be
used in with statements. This is much easier than defining a new class with the special
methods __enter__ and __exit__ (the standard way).
For example, say you want a region of your code to have more debug logging sometimes.
Here, I define a function that does logging at two severity levels:
import logging
def my_function():
    logging.debug('Some debug data')
    logging.error('Error log here')
    logging.debug('More debug data')
The default log level for my program is WARNING, so only the error message will print to
screen when I run the function.
my_function()
>>>
Error log here
I can elevate the log level of this function temporarily by defining a context manager. This
helper function boosts the logging severity level before running the code in the with
block and reduces the logging severity level afterward.
from contextlib import contextmanager
@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(old_level)
The yield expression is the point at which the with block’s contents will execute. Any
exceptions that happen in the with block will be re-raised by the yield expression for
you to catch in the helper function (see Item 40: “Consider Coroutines to Run Many
Functions Concurrently” for an explanation of how that works).
Now, I can call the same logging function again, but in the debug_logging context.
This time, all of the debug messages are printed to the screen during the with block. The
same function running outside the with block won’t print debug messages.
with debug_logging(logging.DEBUG):
    print('Inside:')
    my_function()
print('After:')
my_function()
>>>
Inside:
Some debug data
Error log here
More debug data
After:
Error log here
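For comparison, the same behavior can be written in the class-based style with the __enter__ and __exit__ special methods. This is a hedged sketch, not code from the original: the class name DebugLogging and the 'demo' logger name are hypothetical, and it saves the logger's own level attribute rather than its effective level so that a named logger goes back to inheriting from its parent.

```python
import logging


class DebugLogging(object):
    """Hypothetical class-based equivalent of a level-changing context."""

    def __init__(self, level, name=None):
        self.level = level
        self.logger = logging.getLogger(name)
        self.old_level = None

    def __enter__(self):
        # Runs when the with block starts; plays the role of the code
        # before the yield in the generator version.
        self.old_level = self.logger.level
        self.logger.setLevel(self.level)
        return self.logger

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even after an exception;
        # plays the role of the finally clause.
        self.logger.setLevel(self.old_level)
        return False  # Don't suppress exceptions from the block


demo_logger = logging.getLogger('demo')
with DebugLogging(logging.DEBUG, 'demo'):
    inside = demo_logger.isEnabledFor(logging.DEBUG)
after = demo_logger.isEnabledFor(logging.DEBUG)
print(inside, after)
```

With the default logging configuration, debug messages are enabled only inside the block. The class needs roughly twice the code of the contextmanager version to do the same job.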
Using with Targets
The context manager passed to a with statement may also return an object. This object is
assigned to a local variable in the as part of the compound statement. This gives the code
running in the with block the ability to directly interact with its context.
For example, say you want to write a file and ensure that it’s always closed correctly. You
can do this by passing open to the with statement. open returns a file handle for the as
target of with and will close the handle when the with block exits.
with open('/tmp/my_output.txt', 'w') as handle:
    handle.write('This is some data!')
This approach is preferable to manually opening and closing the file handle every time. It
gives you confidence that the file is eventually closed when execution leaves the with
statement. It also encourages you to reduce the amount of code that executes while the file
handle is open, which is good practice in general.
To enable your own functions to supply values for as targets, all you need to do is yield
a value from your context manager. For example, here I define a context manager to fetch
a Logger instance, set its level, and then yield it for the as target.
@contextmanager
def log_level(level, name):
    logger = logging.getLogger(name)
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(old_level)
Calling logging methods like debug on the as target will produce output because the
logging severity level is set low enough in the with block. Using the logging module
directly won’t print anything because the default logging severity level for the default
program logger is WARNING.
with log_level(logging.DEBUG, 'my-log') as logger:
    logger.debug('This is my message!')
    logging.debug('This will not print')
>>>
This is my message!
After the with statement exits, calling debug logging methods on the Logger named
'my-log' will not print anything because the default logging severity level has been
restored. Error log messages will always print.
logger = logging.getLogger('my-log')
logger.debug('Debug will not print')
logger.error('Error will print')
>>>
Error will print
Things to Remember
• The with statement allows you to reuse logic from try/finally blocks and
reduce visual noise.
• The contextlib built-in module provides a contextmanager decorator that
makes it easy to use your own functions in with statements.
• The value yielded by context managers is supplied to the as part of the with
statement. It’s useful for letting your code directly access the cause of the special
context.
Item 44: Make pickle Reliable with copyreg
The pickle built-in module can serialize Python objects into a stream of bytes and
deserialize bytes back into objects. Pickled byte streams shouldn’t be used to
communicate between untrusted parties. The purpose of pickle is to let you pass Python
objects between programs that you control over binary channels.
Note
The pickle module’s serialization format is unsafe by design. The serialized data
contains what is essentially a program that describes how to reconstruct the original
Python object. This means a malicious pickle payload could be used to
compromise any part of the Python program that attempts to deserialize it.
In contrast, the json module is safe by design. Serialized JSON data contains a
simple description of an object hierarchy. Deserializing JSON data does not expose
a Python program to any additional risk. Formats like JSON should be used for
communication between programs or people that don’t trust each other.
For example, say you want to use a Python object to represent the state of a player’s
progress in a game. The game state includes the level the player is on and the number of
lives he or she has remaining.
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
The program modifies this object as the game runs.
state = GameState()
state.level += 1  # Player beat a level
state.lives -= 1  # Player had to try again
When the user quits playing, the program can save the state of the game to a file so it can
be resumed at a later time. The pickle module makes it easy to do this. Here, I dump
the GameState object directly to a file:
import pickle
state_path = '/tmp/game_state.bin'
with open(state_path, 'wb') as f:
    pickle.dump(state, f)
Later, I can load the file and get back the GameState object as if it had never been
serialized.
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
print(state_after.__dict__)
>>>
{'lives': 3, 'level': 1}
The problem with this approach is what happens as the game’s features expand over time.
Imagine you want the player to earn points towards a high score. To track the player’s
points, you’d add a new field to the GameState class.
class GameState(object):
    def __init__(self):
        # …
        self.points = 0
Serializing the new version of the GameState class using pickle will work exactly as
before. Here, I simulate the round-trip through a file by serializing to a string with dumps
and back to an object with loads:
state = GameState()
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'lives': 4, 'level': 0, 'points': 0}
But what happens to older saved GameState objects that the user may want to resume?
Here, I unpickle an old game file using a program with the new definition of the
GameState class:
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
print(state_after.__dict__)
>>>
{'lives': 3, 'level': 1}
The points attribute is missing! This is especially confusing because the returned object
is an instance of the new GameState class.
assert isinstance(state_after, GameState)
This behavior is a byproduct of the way the pickle module works. Its primary use case
is making it easy to serialize objects. As soon as your use of pickle expands beyond
trivial usage, the module’s functionality starts to break down in surprising ways.
Fixing these problems is straightforward using the copyreg built-in module. The
copyreg module lets you register the functions responsible for serializing Python
objects, allowing you to control the behavior of pickle and make it more reliable.
Default Attribute Values
In the simplest case, you can use a constructor with default arguments (see Item 19:
“Provide Optional Behavior with Keyword Arguments”) to ensure that GameState
objects will always have all attributes after unpickling. Here, I redefine the constructor this
way:
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points
To use this constructor for pickling, I define a helper function that takes a GameState
object and turns it into a tuple of parameters for the copyreg module. The returned tuple
contains the function to use for unpickling and the parameters to pass to the unpickling
function.
def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)
Now, I need to define the unpickle_game_state helper. This function takes
serialized data and parameters from pickle_game_state and returns the
corresponding GameState object. It’s a tiny wrapper around the constructor.
def unpickle_game_state(kwargs):
    return GameState(**kwargs)
Now, I register these with the copyreg built-in module.
import copyreg
copyreg.pickle(GameState, pickle_game_state)
Serializing and deserializing works as before.
state = GameState()
state.points += 1000
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'lives': 4, 'level': 0, 'points': 1000}
With this registration done, now I can change the definition of GameState to give the
player a count of magic spells to use. This change is similar to when I added the points
field to GameState.
class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        # …
But unlike before, deserializing an old GameState object will result in valid game data
instead of missing attributes. This works because unpickle_game_state calls the
GameState constructor directly. The constructor’s keyword arguments have default
values when parameters are missing. This causes old game state files to receive the default
value for the new magic field when they are deserialized.
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'level': 0, 'points': 1000, 'magic': 5, 'lives': 4}
Versioning Classes
Sometimes you’ll need to make backwards-incompatible changes to your Python objects
by removing fields. This prevents the default argument approach to serialization from
working.
For example, say you realize that a limited number of lives is a bad idea, and you want to
remove the concept of lives from the game. Here, I redefine the GameState to no longer
have a lives field:
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        # …
The problem is that this breaks deserializing old game data. All fields from the old data,
even ones removed from the class, will be passed to the GameState constructor by the
unpickle_game_state function.
pickle.loads(serialized)
>>>
TypeError: __init__() got an unexpected keyword argument 'lives'
The solution is to add a version parameter to the functions supplied to copyreg. New
serialized data will have a version of 2 specified when pickling a new GameState
object.
def pickle_game_state(game_state):
    # Copy the attributes so the live object doesn't gain a 'version' field
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)
Old versions of the data will not have a version argument present, allowing you to
manipulate the arguments passed to the GameState constructor accordingly.
def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)
    if version == 1:
        kwargs.pop('lives')
    return GameState(**kwargs)
Now, deserializing an old object works properly.
copyreg.pickle(GameState, pickle_game_state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'magic': 5, 'level': 0, 'points': 1000}
You can continue this approach to handle changes between future versions of the same
class. Any logic you need to adapt an old version of the class to a new version of the class
can go in the unpickle_game_state function.
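The whole versioning flow can be exercised in one script. The sketch below makes a simplifying assumption: it defines the old and the new GameState one after the other in the same file to stand in for two releases of a program. It also copies __dict__ before adding the version key, and the sample field values are invented for illustration.

```python
import copyreg
import pickle


# Version 1 of the class, as an older release would have defined it.
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points


def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)
    return unpickle_game_state, (kwargs,)


def unpickle_game_state(kwargs):
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)
old_data = pickle.dumps(GameState(level=3, points=1000))  # Version 1 data


# Version 2 of the class: the lives field is gone, magic is new.
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)


def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)  # Old data has no version key
    if version == 1:
        kwargs.pop('lives')             # Drop the removed field
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)

migrated = pickle.loads(old_data)
fresh = pickle.loads(pickle.dumps(GameState(level=7)))
print(sorted(migrated.__dict__.items()))
print(sorted(fresh.__dict__.items()))
```

The old data loads into the new class with the default magic value and without lives, while freshly pickled data takes the version 2 path untouched.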
Stable Import Paths
One other issue you may encounter with pickle is breakage from renaming a class.
Often over the life cycle of a program, you’ll refactor your code by renaming classes and
moving them to other modules. Unfortunately, this will break the pickle module unless
you’re careful.
Here, I rename the GameState class to BetterGameState, removing the old class
from the program entirely:
class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        # …
Attempting to deserialize an old GameState object will now fail because the class can’t
be found.
pickle.loads(serialized)
>>>
AttributeError: Can't get attribute 'GameState' on <module '__main__' from 'my_code.py'>
The cause of this exception is that the import path of the serialized object’s class is
encoded in the pickled data.
print(serialized[:25])
>>>
b'\x80\x03c__main__\nGameState\nq\x00)'
The solution is to use copyreg again. You can specify a stable identifier for the function
to use for unpickling an object. This allows you to transition pickled data to different
classes with different names when it’s deserialized. It gives you a level of indirection.
copyreg.pickle(BetterGameState, pickle_game_state)
After using copyreg, you can see that the import path to unpickle_game_state is
encoded in the serialized data instead of BetterGameState.
state = BetterGameState()
serialized = pickle.dumps(state)
print(serialized[:35])
>>>
b'\x80\x03c__main__\nunpickle_game_state\nq\x00}'
The only gotcha is that you can’t change the path of the module in which the
unpickle_game_state function is present. Once you serialize data with a function, it
must remain available on that import path for deserializing in the future.
Things to Remember
• The pickle built-in module is only useful for serializing and deserializing objects
between trusted programs.
• The pickle module may break down when used for more than trivial use cases.
• Use the copyreg built-in module with pickle to add missing attribute values,
allow versioning of classes, and provide stable import paths.
Item 45: Use datetime Instead of time for Local Clocks
Coordinated Universal Time (UTC) is the standard, time-zone-independent representation
of time. UTC works great for computers that represent time as seconds since the UNIX
epoch. But UTC isn’t ideal for humans. Humans reference time relative to where they’re
currently located. People say “noon” or “8 am” instead of “UTC 15:00 minus 7 hours.” If
your program handles time, you’ll probably find yourself converting time between UTC
and local clocks to make it easier for humans to understand.
Python provides two ways of accomplishing time zone conversions. The old way, using
the time built-in module, is disastrously error prone. The new way, using the datetime
built-in module, works great with some help from the community-built package named
pytz.
You should be acquainted with both time and datetime to thoroughly understand why
datetime is the best choice and time should be avoided.
The time Module
The localtime function from the time built-in module lets you convert a UNIX
timestamp (seconds since the UNIX epoch in UTC) to a local time that matches the host
computer’s time zone (Pacific Daylight Time, in my case).
from time import localtime, strftime
now = 1407694710
local_tuple = localtime(now)
time_format = '%Y-%m-%d %H:%M:%S'
time_str = strftime(time_format, local_tuple)
print(time_str)
>>>
2014-08-10 11:18:30
You’ll often need to go the other way as well, starting with user input in local time and
converting it to UTC time. You can do this by using the strptime function to parse the
time string, then calling mktime to convert local time to a UNIX timestamp.
from time import mktime, strptime
time_tuple = strptime(time_str, time_format)
utc_now = mktime(time_tuple)
print(utc_now)
>>>
1407694710.0
How do you convert local time in one time zone to local time in another? For example,
say you are taking a flight between San Francisco and New York, and want to know what
time it will be in San Francisco once you’ve arrived in New York.
Directly manipulating the return values from the time, localtime, and strptime
functions to do time zone conversions is a bad idea. Time zones change all the time due to
local laws. It’s too complicated to manage yourself, especially if you want to handle every
global city for flight departure and arrival.
Many operating systems have configuration files that keep up with the time zone changes
automatically. Python lets you use these time zones through the time module. For
example, here I parse the departure time from the San Francisco time zone of Pacific
Daylight Time:
parse_format = '%Y-%m-%d %H:%M:%S %Z'
depart_sfo = '2014-05-01 15:45:16 PDT'
time_tuple = strptime(depart_sfo, parse_format)
time_str = strftime(time_format, time_tuple)
print(time_str)
>>>
2014-05-01 15:45:16
After seeing that PDT works with the strptime function, you might also assume that
other time zones known to my computer will also work. Unfortunately, this isn’t the case.
Instead, strptime raises an exception when it sees Eastern Daylight Time (the time
zone for New York).
arrival_nyc = '2014-05-01 23:33:24 EDT'
time_tuple = strptime(arrival_nyc, parse_format)
>>>
ValueError: time data '2014-05-01 23:33:24 EDT' does not match format '%Y-%m-%d %H:%M:%S %Z'
The problem here is the platform-dependent nature of the time module. Its actual
behavior is determined by how the underlying C functions work with the host operating
system. This makes the functionality of the time module unreliable in Python. The time
module fails to consistently work properly for multiple local times. Thus, you should
avoid the time module for this purpose. If you must use time, only use it to convert
between UTC and the host computer’s local time. For all other types of conversions, use
the datetime module.
The datetime Module
The second option for representing times in Python is the datetime class from the
datetime built-in module. Like the time module, datetime can be used to convert
from the current time in UTC to local time.
Here, I take the present time in UTC and convert it to my computer’s local time (Pacific
Daylight Time):
from datetime import datetime, timezone
now = datetime(2014, 8, 10, 18, 18, 30)
now_utc = now.replace(tzinfo=timezone.utc)
now_local = now_utc.astimezone()
print(now_local)
>>>
2014-08-10 11:18:30-07:00
The datetime module can also easily convert a local time back to a UNIX timestamp in
UTC.
time_str = '2014-08-10 11:18:30'
now = datetime.strptime(time_str, time_format)
time_tuple = now.timetuple()
utc_now = mktime(time_tuple)
print(utc_now)
>>>
1407694710.0
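The listing above goes through mktime, so its result depends on the host computer's time zone. As a small hedged sketch of an alternative, a datetime that already carries UTC as its tzinfo can produce the same timestamp on any machine through its timestamp method (available since Python 3.3). The variable names here are illustrative only.

```python
from datetime import datetime, timezone

# An aware datetime in UTC; the host time zone is not consulted.
now_utc = datetime(2014, 8, 10, 18, 18, 30, tzinfo=timezone.utc)
timestamp = now_utc.timestamp()
print(timestamp)    # 1407694710.0

# Going back from the timestamp to an aware UTC datetime.
round_trip = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(round_trip)   # 2014-08-10 18:18:30+00:00
```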
Unlike the time module, the datetime module has facilities for reliably converting
from one local time to another local time. However, datetime only provides the
machinery for time zone operations with its tzinfo class and related methods. What’s
missing are the time zone definitions besides UTC.
Luckily, the Python community has addressed this gap with the pytz module that’s
available for download from the Python Package Index
(https://pypi.python.org/pypi/pytz/). pytz contains a full database of every time zone
definition you might need.
To use pytz effectively, you should always convert local times to UTC first. Perform any
datetime operations you need on the UTC values (such as offsetting). Then, convert to
local times as a final step.
For example, here I convert an NYC flight arrival time to a UTC datetime. Although
some of these calls seem redundant, all of them are necessary when using pytz.
import pytz
arrival_nyc = '2014-05-01 23:33:24'
nyc_dt_naive = datetime.strptime(arrival_nyc, time_format)
eastern = pytz.timezone('US/Eastern')
nyc_dt = eastern.localize(nyc_dt_naive)
utc_dt = pytz.utc.normalize(nyc_dt.astimezone(pytz.utc))
print(utc_dt)
>>>
2014-05-02 03:33:24+00:00
Once I have a UTC datetime, I can convert it to San Francisco local time.
pacific = pytz.timezone('US/Pacific')
sf_dt = pacific.normalize(utc_dt.astimezone(pacific))
print(sf_dt)
>>>
2014-05-01 20:33:24-07:00
Just as easily, I can convert it to the local time in Nepal.
nepal = pytz.timezone('Asia/Katmandu')
nepal_dt = nepal.normalize(utc_dt.astimezone(nepal))
print(nepal_dt)
>>>
2014-05-02 09:18:24+05:45
With datetime and pytz, these conversions are consistent across all environments
regardless of what operating system the host computer is running.
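The "convert to UTC first, convert to local time last" pattern can be tried without installing anything. This is a sketch under a loud assumption: it hard-codes the UTC offsets that applied in May 2014 (EDT at UTC-4, PDT at UTC-7) using the standard library's fixed-offset timezone class. Fixed offsets know nothing about daylight saving transitions or changes in local law, so this is not a replacement for the pytz database.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for May 2014; real zone rules need pytz.
EDT = timezone(timedelta(hours=-4), 'EDT')
PDT = timezone(timedelta(hours=-7), 'PDT')

time_format = '%Y-%m-%d %H:%M:%S'
arrival_nyc = '2014-05-01 23:33:24'

nyc_dt = datetime.strptime(arrival_nyc, time_format).replace(tzinfo=EDT)
utc_dt = nyc_dt.astimezone(timezone.utc)   # Convert to UTC first
sf_dt = utc_dt.astimezone(PDT)             # Local time as the final step

print(utc_dt)   # 2014-05-02 03:33:24+00:00
print(sf_dt)    # 2014-05-01 20:33:24-07:00
```

The results match the pytz output above for this particular date.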
Things to Remember
• Avoid using the time module for translating between different time zones.
• Use the datetime built-in module along with the pytz module to reliably convert
between times in different time zones.
• Always represent time in UTC and do conversions to local time as the final step
before presentation.
Item 46: Use Built-in Algorithms and Data Structures
When you’re implementing Python programs that handle a non-trivial amount of data,
you’ll eventually see slowdowns caused by the algorithmic complexity of your code. This
usually isn’t the result of Python’s speed as a language (see Item 41: “Consider
concurrent.futures for True Parallelism” if it is). The issue, more likely, is that
you aren’t using the best algorithms and data structures for your problem.
Luckily, the Python standard library has many of the algorithms and data structures you’ll
need to use built in. Besides speed, using these common algorithms and data structures
can make your life easier. Some of the most valuable tools you may want to use are tricky
to implement correctly. Avoiding reimplementation of common functionality will save you
time and headaches.
Double-ended Queue
The deque class from the collections module is a double-ended queue. It provides
constant time operations for inserting or removing items from its beginning or end. This
makes it ideal for first-in-first-out (FIFO) queues.
from collections import deque
fifo = deque()
fifo.append(1)  # Producer
x = fifo.popleft()  # Consumer
The list built-in type also contains an ordered sequence of items like a queue. You can
insert or remove items from the end of a list in constant time. But inserting or removing
items from the head of a list takes linear time, which is much slower than the constant
time of a deque.
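A slightly longer sketch of the producer and consumer pattern follows. The item values are made up for illustration, and the maxlen argument shown at the end is an extra deque feature not covered in the text above.

```python
from collections import deque

fifo = deque()
for item in ['first', 'second', 'third']:
    fifo.append(item)                  # Producer adds to the right end

consumed = []
while fifo:
    consumed.append(fifo.popleft())    # Consumer takes from the left end

print(consumed)       # ['first', 'second', 'third']

# A bounded deque discards the oldest items once maxlen is reached.
recent = deque(maxlen=2)
for item in [1, 2, 3]:
    recent.append(item)
print(list(recent))   # [2, 3]
```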
Ordered Dictionary
In Python versions before 3.7, standard dictionaries are unordered. That means a dict
with the same keys and values can result in different orders of iteration. This behavior is a
surprising byproduct of the way the dictionary’s fast hash table is implemented.
from random import randint
a = {}
a['foo'] = 1
a['bar'] = 2
# Randomly populate 'b' to cause hash conflicts
# (On Python 3.7+, dicts keep insertion order, so this loop never exits.)
while True:
    z = randint(99, 1013)
    b = {}
    for i in range(z):
        b[i] = i
    b['foo'] = 1
    b['bar'] = 2
    for i in range(z):
        del b[i]
    if str(b) != str(a):
        break
print(a)
print(b)
print('Equal?', a == b)
>>>
{'foo': 1, 'bar': 2}
{'bar': 2, 'foo': 1}
Equal? True
The OrderedDict class from the collections module is a special type of
dictionary that keeps track of the order in which its keys were inserted. Iterating the keys
of an OrderedDict has predictable behavior. This can vastly simplify testing and
debugging by making all code deterministic.
a = OrderedDict()
a['foo'] = 1
a['bar'] = 2
b = OrderedDict()
b['foo'] = 'red'
b['bar'] = 'blue'
for value1, value2 in zip(a.values(), b.values()):
    print(value1, value2)
>>>
1 red
2 blue
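Because an OrderedDict tracks insertion order, it also offers a couple of order-aware methods. The sketch below uses move_to_end and popitem(last=False), which are not discussed in the text above; the keys and counts are invented for illustration.

```python
from collections import OrderedDict

votes = OrderedDict()
votes['otter'] = 1281
votes['polar bear'] = 587
votes['fox'] = 863

initial_order = list(votes.keys())
print(initial_order)    # ['otter', 'polar bear', 'fox']

votes.move_to_end('otter')     # Move an existing key to the end
reordered = list(votes.keys())
print(reordered)        # ['polar bear', 'fox', 'otter']

# Remove and return the item at the front (the oldest remaining key).
oldest_key, oldest_value = votes.popitem(last=False)
print(oldest_key, oldest_value)   # polar bear 587
```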
Default Dictionary
Dictionaries are useful for bookkeeping and tracking statistics. One problem with
dictionaries is that you can’t assume any keys are already present. That makes it clumsy to
do simple things like increment a counter stored in a dictionary.
stats = {}
key = 'my_counter'
if key not in stats:
    stats[key] = 0
stats[key] += 1
The defaultdict class from the collections module simplifies this by
automatically storing a default value when a key doesn’t exist. All you have to do is
provide a function that will return the default value each time a key is missing. In this
example, the int built-in function returns 0 (see Item 23: “Accept Functions for Simple
Interfaces Instead of Classes” for another example). Now, incrementing a counter is
simple.
from collections import defaultdict
stats = defaultdict(int)
stats['my_counter'] += 1
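Any zero-argument callable can serve as the default factory. As an illustrative sketch with made-up data, the following uses int for counting and list for grouping in the same loop.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

counts = defaultdict(int)       # Missing keys start at int() == 0
by_letter = defaultdict(list)   # Missing keys start at list() == []

for word in words:
    counts[word[0]] += 1
    by_letter[word[0]].append(word)

print(dict(counts))
print(dict(by_letter))
```

Neither loop body needs a check for whether the key already exists.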
Heap Queue
Heaps are useful data structures for maintaining a priority queue. The heapq module
provides functions for creating heaps in standard list types with functions like
heappush, heappop, and nsmallest.
Items of any priority can be inserted into the heap in any order.
from heapq import heappop, heappush, nsmallest
a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)
Items are always removed by highest priority (lowest number) first.
print(heappop(a), heappop(a), heappop(a), heappop(a))
>>>
3 4 5 7
The resulting list is easy to use outside of heapq. Accessing the 0 index of the heap
will always return the smallest item.
a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)
assert a[0] == nsmallest(1, a)[0] == 3
Calling the sort method on the list maintains the heap invariant.
print('Before:', a)
a.sort()
print('After: ', a)
>>>
Before: [3, 4, 7, 5]
After:  [3, 4, 5, 7]
Each of these heapq operations takes logarithmic time in proportion to the length of the
list. Doing the same work with a standard Python list would scale linearly.
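A common way to use these functions as a priority queue is to push tuples, so the first element decides the ordering. This is a minimal sketch with invented task names; it assumes priorities are unique, since tuples with equal priorities would fall back to comparing the second element.

```python
from heapq import heappop, heappush

tasks = []
heappush(tasks, (2, 'write tests'))
heappush(tasks, (1, 'fix crash'))
heappush(tasks, (3, 'update docs'))

order = []
while tasks:
    priority, name = heappop(tasks)   # Lowest priority number comes out first
    order.append(name)

print(order)   # ['fix crash', 'write tests', 'update docs']
```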
Bisection
Searching for an item in a list takes linear time proportional to its length when you call
the index method.
x = list(range(10**6))
i = x.index(991234)
The bisect module’s functions, such as bisect_left, provide an efficient binary
search through a sequence of sorted items. The index it returns is the insertion point of the
value into the sequence.
from bisect import bisect_left
i = bisect_left(x, 991234)
The complexity of a binary search is logarithmic. That means using bisect to search a
list of 1 million items takes roughly the same amount of time as using index to linearly
search a list of 14 items. It’s way faster!
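Because bisect_left returns an insertion point rather than a found/not-found answer, a membership test needs one extra comparison. The helper below, index_of, is a hypothetical name introduced for this sketch, and it assumes the input list is already sorted.

```python
from bisect import bisect_left


def index_of(sorted_items, value):
    # Hypothetical helper: returns the index of value, or -1 if absent.
    i = bisect_left(sorted_items, value)
    if i < len(sorted_items) and sorted_items[i] == value:
        return i
    return -1


x = list(range(0, 100, 2))    # Sorted even numbers 0..98
print(index_of(x, 42))        # 21
print(index_of(x, 43))        # -1
print(bisect_left(x, 43))     # 22, where 43 would be inserted
```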
Iterator Tools
The itertools built-in module contains a large number of functions that are useful for
organizing and interacting with iterators (see Item 16: “Consider Generators Instead of
Returning Lists” and Item 17: “Be Defensive When Iterating Over Arguments” for
background). Not all of these are available in Python 2, but they can easily be built using
simple recipes documented in the module. See help(itertools) in an interactive
Python session for more details.
The itertools functions fall into three main categories:
Linking iterators together
• chain: Combines multiple iterators into a single sequential iterator.
• cycle: Repeats an iterator’s items forever.
• tee: Splits a single iterator into multiple parallel iterators.
• zip_longest: A variant of the zip built-in function that works well with
iterators of different lengths.
Filtering items from an iterator
• islice: Slices an iterator by numerical indexes without copying.
• takewhile: Returns items from an iterator while a predicate function returns
True.
• dropwhile: Returns items from an iterator once the predicate function returns
False for the first time.
• filterfalse: Returns all items from an iterator where a predicate function
returns False. The opposite of the filter built-in function.
Combinations of items from iterators
• product: Returns the Cartesian product of items from an iterator, which is a
nice alternative to deeply nested list comprehensions.
• permutations: Returns ordered permutations of length N with items from an
iterator.
• combinations: Returns the unordered combinations of length N with
unrepeated items from an iterator.
There are even more functions and recipes available in the itertools module that I
don’t mention here. Whenever you find yourself dealing with some tricky iteration code,
it’s worth looking at the itertools documentation again to see whether there’s
anything there for you to use.
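To make the three categories concrete, here is a short sketch that uses a few of the
functions listed above. The input values are arbitrary illustrations.

```python
from itertools import chain, combinations, islice, product, takewhile

# Linking: chain walks the first iterable, then the second.
linked = list(chain([1, 2, 3], [4, 5, 6]))

# Filtering: islice takes a slice without copying the source;
# takewhile stops at the first item that fails the predicate.
first_two = list(islice(range(10), 2))
small = list(takewhile(lambda n: n < 3, [1, 2, 3, 1]))

# Combining: product pairs every item with every other item;
# combinations yields unordered pairs without repeats.
pairs = list(product([1, 2], ['a', 'b']))
combos = list(combinations([1, 2, 3], 2))

print(linked)     # [1, 2, 3, 4, 5, 6]
print(first_two)  # [0, 1]
print(small)      # [1, 2]
print(pairs)      # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(combos)     # [(1, 2), (1, 3), (2, 3)]
```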
Things to Remember
• Use Python’s built-in modules for algorithms and data structures.
• Don’t reimplement this functionality yourself. It’s hard to get right.
Item 47: Use decimal When Precision Is Paramount
Python is an excellent language for writing code that interacts with numerical data.
Python’s integer type can represent values of any practical size. Its double-precision
floating point type complies with the IEEE 754 standard. The language also provides a
standard complex number type for imaginary values. However, these aren’t enough for
every situation.
For example, say you want to compute the amount to charge a customer for an
international phone call. You know the time in minutes and seconds that the customer was
on the phone (say, 3 minutes 42 seconds). You also have a set rate for the cost of calling
Antarctica from the United States ($1.45/minute). What should the charge be?
With floating point math, the computed charge seems reasonable.
rate = 1.45
seconds = 3*60 + 42
cost = rate * seconds / 60
print(cost)
>>>
5.364999999999999
But rounding it to the nearest whole cent rounds down when you want it to round up to
properly cover all costs incurred by the customer.
print(round(cost, 2))
>>>
5.36
Say you also want to support very short phone calls between places that are much cheaper
to connect. Here, I compute the charge for a phone call that was 5 seconds long with a rate
of $0.05/minute:
rate = 0.05
seconds = 5
cost = rate * seconds / 60
print(cost)
>>>
0.004166666666666667
The resulting float is so low that it rounds down to zero. This won’t do!
print(round(cost, 2))
>>>
0.0
The solution is to use the Decimal class from the decimal built-in module. The
Decimal class provides decimal arithmetic with 28 significant digits of precision by
default. It can go even higher if required. This works around the precision issues in
IEEE 754 floating point numbers. The class also gives you more control over rounding
behaviors.
For example, redoing the Antarctica calculation with Decimal results in an exact charge
instead of an approximation.
from decimal import Decimal

rate = Decimal('1.45')
seconds = Decimal('222')  # 3*60 + 42
cost = rate * seconds / Decimal('60')
print(cost)
>>>
5.365
The Decimal class has a built-in function for rounding to exactly the decimal place you
need with the rounding behavior you want.
from decimal import ROUND_UP

rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)
>>>
5.37
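The examples here build each Decimal from a string. A short sketch of why that matters:
constructing a Decimal from a float carries the float’s binary approximation along with it,
so the exactness is lost before any arithmetic happens. The variable names are illustrative
only.

```python
from decimal import Decimal

from_string = Decimal('1.45')  # exactly 1.45
from_float = Decimal(1.45)     # the nearest binary double to 1.45, digit for digit

print(from_string)                # 1.45
print(from_float == from_string)  # False
print(Decimal(str(1.45)) == from_string)  # True: str() gives the short form '1.45'
```

When the input already arrives as a float, converting through str first is one common
workaround, though accepting the value as text from the start avoids the question entirely.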
Using the quantize method this way also properly handles the small usage case for
short, cheap phone calls. Here, you can see the Decimal cost is still less than 1 cent for
the call:
rate = Decimal('0.05')
seconds = Decimal('5')
cost = rate * seconds / Decimal('60')
print(cost)
>>>
0.004166666666666666666666666667
But the quantize behavior ensures that this is rounded up to one whole cent.
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)
>>>
0.01
While Decimal works great for fixed point numbers, it still has limitations in its
precision (e.g., 1/3 will be an approximation). For representing rational numbers with no
limit to precision, consider using the Fraction class from the fractions built-in
module.
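A brief sketch of that limitation, using the default Decimal context: one third cannot be
written with a finite number of decimal digits, so multiplying it back by three does not
quite return 1, whereas Fraction keeps the exact ratio.

```python
from decimal import Decimal
from fractions import Fraction

third_decimal = Decimal(1) / Decimal(3)   # rounded to the context precision
third_fraction = Fraction(1, 3)           # stored as the exact ratio 1/3

print(third_decimal * 3 == 1)    # False: the rounding error survives
print(third_fraction * 3 == 1)   # True
```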
Things to Remember
• Python has built-in types and classes in modules that can represent practically every
type of numerical value.
• The Decimal class is ideal for situations that require high precision and exact
rounding behavior, such as computations of monetary values.
Item 48: Know Where to Find Community-Built Modules
Python has a central repository of modules (https://pypi.python.org) for you to install and
use in your programs. These modules are built and maintained by people like you: the
Python community. When you find yourself facing an unfamiliar challenge, the Python
Package Index (PyPI) is a great place to look for code that will get you closer to your goal.
To use the Package Index, you’ll need to use a command-line tool named pip. pip is
installed by default in Python 3.4 and above (it’s also accessible with python -m pip).
For earlier versions, you can find instructions for installing pip on the Python Packaging
website (https://packaging.python.org).
Once installed, using pip to install a new module is simple. For example, here I install
the pytz module that I used in another item in this chapter (see Item 45: “Use
datetime Instead of time for Local Clocks”):
$ pip3 install pytz
Downloading/unpacking pytz
  Downloading pytz-2014.4.tar.bz2 (159kB): 159kB downloaded
  Running setup.py (…) egg_info for package pytz
Installing collected packages: pytz
  Running setup.py install for pytz
Successfully installed pytz
Cleaning up…
In the example above, I used the pip3 command-line to install the Python 3 version of
the package. The pip command-line (without the 3) is also available for installing
packages for Python 2. The majority of popular packages are now available for either
version of Python (see Item 1: “Know Which Version of Python You’re Using”). pip can
also be used with pyvenv to track sets of packages to install for your projects (see Item
53: “Use Virtual Environments for Isolated and Reproducible Dependencies”).
Each module in the PyPI has its own software license. Most of the packages, especially
the popular ones, have free or open source licenses (see http://opensource.org for details).
In most cases, these licenses allow you to include a copy of the module with your program
(when in doubt, talk to a lawyer).
Things to Remember
• The Python Package Index (PyPI) contains a wealth of common packages that are
built and maintained by the Python community.
• pip is the command-line tool to use for installing packages from PyPI.
• pip is installed by default in Python 3.4 and above; you must install it yourself for
older versions.
• The majority of PyPI modules are free and open source software.
7. Collaboration
There are language features in Python to help you construct well-defined APIs with clear
interface boundaries. The Python community has established best practices that maximize
the maintainability of code over time. There are also standard tools that ship with Python
that enable large teams to work together across disparate environments.
Collaborating with others on Python programs requires being deliberate about how you
write your code. Even if you’re working on your own, chances are you’ll be using code
written by someone else via the standard library or open source packages. It’s important to
understand the mechanisms that make it easy to collaborate with other Python
programmers.
Item 49: Write Docstrings for Every Function, Class, and
Module
Documentation in Python is extremely important because of the dynamic nature of the
language. Python provides built-in support for attaching documentation to blocks of code.
Unlike many other languages, the documentation from a program’s source code is directly
accessible as the program runs.
For example, you can add documentation by providing a docstring immediately after the
def statement of a function.
def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]
You can retrieve the docstring from within the Python program itself by accessing the
function’s __doc__ special attribute.
print(repr(palindrome.__doc__))
>>>
'Return True if the given word is a palindrome.'
Docstrings can be attached to functions, classes, and modules. This connection is part of
the process of compiling and running a Python program. Support for docstrings and the
__doc__ attribute has three consequences:
• The accessibility of documentation makes interactive development easier. You can
inspect functions, classes, and modules to see their documentation by using the
help built-in function. This makes the Python interactive interpreter (the Python
“shell”) and tools like IPython Notebook (http://ipython.org) a joy to use while
you’re developing algorithms, testing APIs, and writing code snippets.
• A standard way of defining documentation makes it easy to build tools that convert
the text into more appealing formats (like HTML). This has led to excellent
documentation-generation tools for the Python community, such as Sphinx
(http://sphinx-doc.org). It’s also enabled community-funded sites like Read the Docs
(https://readthedocs.org) that provide free hosting of beautiful-looking
documentation for open source Python projects.
• Python’s first-class, accessible, and good-looking documentation encourages people
to write more documentation. The members of the Python community have a strong
belief in the importance of documentation. There’s an assumption that “good code”
also means well-documented code. This means that you can expect most open
source Python libraries to have decent documentation.
To participate in this excellent culture of documentation, you need to follow a few
guidelines when you write docstrings. The full details are discussed online in PEP 257
(http://www.python.org/dev/peps/pep-0257/). There are a few best practices you should be
sure to follow.
Documenting Modules
Each module should have a top-level docstring. This is a string literal that is the first
statement in a source file. It should use three double quotes ("""). The goal of this
docstring is to introduce the module and its contents.
The first line of the docstring should be a single sentence describing the module’s purpose.
The paragraphs that follow should contain the details that all users of the module should
know about its operation. The module docstring is also a jumping-off point where you can
highlight important classes and functions found in the module.
Here’s an example of a module docstring:
# words.py
#!/usr/bin/env python3
"""Library for testing words for various linguistic patterns.

Testing how words relate to each other can be tricky sometimes!
This module provides easy ways to determine when words you've
found have special properties.

Available functions:
- palindrome: Determine if a word is a palindrome.
- check_anagram: Determine if two words are anagrams.
…
"""

# …
If the module is a command-line utility, the module docstring is also a great place to put
usage information for running the tool from the command-line.
Documenting Classes
Each class should have a class-level docstring. This largely follows the same pattern as the
module-level docstring. The first line is the single-sentence purpose of the class.
Paragraphs that follow discuss important details of the class’s operation.
Important public attributes and methods of the class should be highlighted in the
class-level docstring. It should also provide guidance to subclasses on how to properly
interact with protected attributes (see Item 27: “Prefer Public Attributes Over Private
Ones”) and the superclass’s methods.
Here’s an example of a class docstring:
class Player(object):
    """Represents a player of the game.

    Subclasses may override the 'tick' method to provide
    custom animations for the player's movement depending
    on their power level, etc.

    Public attributes:
    - power: Unused power-ups (float between 0 and 1).
    - coins: Coins found during the level (integer).
    """

    # …
Documenting Functions
Each public function and method should have a docstring. This follows the same pattern
as modules and classes. The first line is the single-sentence description of what the
function does. The paragraphs that follow should describe any specific behaviors and the
arguments for the function. Any return values should be mentioned. Any exceptions that
callers must handle as part of the function’s interface should be explained.
Here’s an example of a function docstring:
def find_anagrams(word, dictionary):
    """Find all anagrams for a word.

    This function only runs as fast as the test for
    membership in the 'dictionary' container. It will
    be slow if the dictionary is a list and fast if
    it's a set.

    Args:
        word: String of the target word.
        dictionary: Container with all strings that
            are known to be actual words.

    Returns:
        List of anagrams that were found. Empty if
        none were found.
    """
    # …
There are also some special cases in writing docstrings for functions that are important to
know.
• If your function has no arguments and a simple return value, a single sentence
description is probably good enough.
• If your function doesn’t return anything, it’s better to leave out any mention of the
return value instead of saying “returns None.”
• If you don’t expect your function to raise an exception during normal operation,
don’t mention that fact.
• If your function accepts a variable number of arguments (see Item 18: “Reduce
Visual Noise with Variable Positional Arguments”) or keyword arguments (see Item
19: “Provide Optional Behavior with Keyword Arguments”), use *args and
**kwargs in the documented list of arguments to describe their purpose.
• If your function has arguments with default values, those defaults should be
mentioned (see Item 20: “Use None and Docstrings to Specify Dynamic Default
Arguments”).
• If your function is a generator (see Item 16: “Consider Generators Instead of
Returning Lists”), then your docstring should describe what the generator yields
when it’s iterated.
• If your function is a coroutine (see Item 40: “Consider Coroutines to Run Many
Functions Concurrently”), then your docstring should contain what the coroutine
yields, what it expects to receive from yield expressions, and when it will stop
iteration.
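As an illustration of the generator case, here is one way such a docstring might look. The
“Yields:” section name mirrors the “Args:” and “Returns:” layout used earlier and is a
convention assumed for this sketch, not something PEP 257 requires.

```python
def index_words(text):
    """Yield the index of the first letter of each word in text.

    Args:
        text: String to scan. Words are assumed to be separated
            by single space characters.

    Yields:
        Integer offsets into text, in ascending order. Nothing is
        yielded for an empty string.
    """
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

print(list(index_words('Four score and')))  # [0, 5, 11]
```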
Note
Once you’ve written docstrings for your modules, it’s important to keep the
documentation up to date. The doctest built-in module makes it easy to exercise
usage examples embedded in docstrings to ensure that your source code and its
documentation don’t diverge over time.
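A minimal sketch of how doctest exercises a docstring, reusing the palindrome function from
the start of this item. In a real module you would more typically call doctest.testmod() or
run python -m doctest on the file; the parser and runner are driven by hand here only so
the snippet is self-contained. The example words are arbitrary.

```python
import doctest

def palindrome(word):
    """Return True if the given word is a palindrome.

    >>> palindrome('tacocat')
    True
    >>> palindrome('banana')
    False
    """
    return word == word[::-1]

# Extract the interactive examples from the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    palindrome.__doc__, {'palindrome': palindrome}, 'palindrome', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted, 'examples run,', results.failed, 'failed')
```

If the implementation and the documented examples ever disagree, the runner prints the
mismatch and the failed count becomes nonzero.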
Things to Remember
• Write documentation for every module, class, and function using docstrings. Keep
them up to date as your code changes.
• For modules: Introduce the contents of the module and any important classes or
functions all users should know about.
• For classes: Document behavior, important attributes, and subclass behavior in the
docstring following the class statement.
• For functions and methods: Document every argument, returned value, raised
exception, and other behaviors in the docstring following the def statement.
Item 50: Use Packages to Organize Modules and Provide
Stable APIs
As the size of a program’s codebase grows, it’s natural for you to reorganize its structure.
You split larger functions into smaller functions. You refactor data structures into helper
classes (see Item 22: “Prefer Helper Classes Over Bookkeeping with Dictionaries and
Tuples”). You separate functionality into various modules that depend on each other.
At some point, you’ll find yourself with so many modules that you need another layer in
your program to make it understandable. For this purpose, Python provides packages.
Packages are modules that contain other modules.
In most cases, packages are defined by putting an empty file named __init__.py into
a directory. Once __init__.py is present, any other Python files in that directory will
be available for import using a path relative to the directory. For example, imagine that
you have the following directory structure in your program.
main.py
mypackage/__init__.py
mypackage/models.py
mypackage/utils.py
To import the utils module, you use the absolute module name that includes the
package directory’s name.
# main.py
from mypackage import utils
This pattern continues when you have package directories present within other packages
(like mypackage.foo.bar).
Note
Python 3.3 introduces namespace packages, a more flexible way to define
packages. Namespace packages can be composed of modules from completely
separate directories, zip archives, or even remote systems. For details on how to use
the advanced features of namespace packages, see PEP 420
(http://www.python.org/dev/peps/pep-0420/).
The functionality provided by packages has two primary purposes in Python programs.
Namespaces
The first use of packages is to help divide your modules into separate namespaces. This
allows you to have many modules with the same filename but different absolute paths that
are unique. For example, here’s a program that imports attributes from two modules with
the same name, utils.py. This works because the modules can be addressed by their
absolute paths.
# main.py
from analysis.utils import log_base2_bucket
from frontend.utils import stringify

bucket = stringify(log_base2_bucket(33))
This approach breaks down when the functions, classes, or submodules defined in
packages have the same names. For example, say you want to use the inspect function
from both the analysis.utils and frontend.utils modules. Importing the
attributes directly won’t work because the second import statement will overwrite the
value of inspect in the current scope.
# main2.py
from analysis.utils import inspect
from frontend.utils import inspect  # Overwrites!
The solution is to use the as clause of the import statement to rename whatever you’ve
imported for the current scope.
# main3.py
from analysis.utils import inspect as analysis_inspect
from frontend.utils import inspect as frontend_inspect

value = 33
if analysis_inspect(value) == frontend_inspect(value):
    print('Inspection equal!')
The as clause can be used to rename anything you retrieve with the import statement,
including entire modules. This makes it easy to access namespaced code and make its
identity clear when you use it.
Note
Another approach for avoiding imported name conflicts is to always access names
by their highest unique module name.
For the example above, you’d first import analysis.utils and import
frontend.utils. Then, you’d access the inspect functions with the full
paths of analysis.utils.inspect and frontend.utils.inspect.
This approach allows you to avoid the as clause altogether. It also makes it
abundantly clear to new readers of the code where each function is defined.
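The analysis and frontend packages above are hypothetical, so here is a runnable sketch of
both approaches using two standard library modules that happen to export the same name:
os.path.join and shlex.join (the latter requires Python 3.8 or later). The file names are
arbitrary.

```python
import os.path
import shlex
from os.path import join as path_join
from shlex import join as shell_join

# Renamed imports: both functions are called 'join' in their own modules.
path = path_join('reports', 'summary.txt')
command = shell_join(['cat', 'my file.txt'])

# Fully qualified access: no renaming needed, and the origin is explicit.
same_path = os.path.join('reports', 'summary.txt')
same_command = shlex.join(['cat', 'my file.txt'])

print(command)  # cat 'my file.txt'
print(path == same_path, command == same_command)  # True True
```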
Stable APIs
The second use of packages in Python is to provide strict, stable APIs for external
consumers.
When you’re writing an API for wider consumption, like an open source package (see
Item 48: “Know Where to Find Community-Built Modules”), you’ll want to provide
stable functionality that doesn’t change between releases. To ensure that happens, it’s
important to hide your internal code organization from external users. This enables you to
refactor and improve your package’s internal modules without breaking existing users.
Python can limit the surface area exposed to API consumers by using the __all__
special attribute of a module or package. The value of __all__ is a list of every name to
export from the module as part of its public API. When consuming code does from foo
import *, only the attributes in foo.__all__ will be imported from foo. If
__all__ isn’t present in foo, then only public attributes, those without a leading
underscore, are imported (see Item 27: “Prefer Public Attributes Over Private Ones”).
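Both rules can be observed directly. The sketch below builds a throwaway module in memory
so that it runs without any files on disk; the module name foo_demo and its three functions
are hypothetical stand-ins for foo.

```python
import sys
import types

# Build a small module in memory and register it so it can be imported.
foo = types.ModuleType('foo_demo')
exec(
    "__all__ = ['public_api']\n"
    "def public_api(): return 'ok'\n"
    "def helper(): return 'not listed in __all__'\n"
    "def _private(): return 'leading underscore'\n",
    foo.__dict__,
)
sys.modules['foo_demo'] = foo

def star_import_names():
    """Return the names that 'from foo_demo import *' brings into scope."""
    namespace = {}
    exec('from foo_demo import *', namespace)
    return sorted(n for n in namespace if not n.startswith('__'))

with_all = star_import_names()     # only what __all__ lists
del foo.__all__
without_all = star_import_names()  # every name without a leading underscore

del sys.modules['foo_demo']        # tidy up the demo module

print(with_all)     # ['public_api']
print(without_all)  # ['helper', 'public_api']
```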
For example, say you want to provide a package for calculating collisions between
moving projectiles. Here, I define the models module of mypackage to contain the
representation of projectiles:
# models.py
__all__ = ['Projectile']

class Projectile(object):
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity
I also define a utils module in mypackage to perform operations on the
Projectile instances, such as simulating collisions between them.
# utils.py
from .models import Projectile

__all__ = ['simulate_collision']

def _dot_product(a, b):
    # …

def simulate_collision(a, b):
    # …
Now, I’d like to provide all of the public parts of this API as a set of attributes that are
available on the mypackage module. This will allow downstream consumers to always
import directly from mypackage instead of importing from mypackage.models or
mypackage.utils. This ensures that the API consumer’s code will continue to work
even if the internal organization of mypackage changes (e.g., models.py is deleted).
To do this with Python packages, you need to modify the __init__.py file in the
mypackage directory. This file actually becomes the contents of the mypackage
module when it’s imported. Thus, you can specify an explicit API for mypackage by
limiting what you import into __init__.py. Since all of my internal modules already
specify __all__, I can expose the public interface of mypackage by simply importing
everything from the internal modules and updating __all__ accordingly.
# __init__.py
__all__ = []
# Importing from a submodule also binds that submodule (models, utils)
# as a name in this package, which is why models.__all__ resolves below.
from .models import *
__all__ += models.__all__
from .utils import *
__all__ += utils.__all__
Here’s a consumer of the API that directly imports from mypackage instead of accessing
the inner modules:
# api_consumer.py
from mypackage import *

a = Projectile(1.5, 3)
b = Projectile(4, 1.7)
after_a, after_b = simulate_collision(a, b)
Notably, internal-only functions like mypackage.utils._dot_product will not be
available to the API consumer on mypackage because they weren’t present in
__all__. Being omitted from __all__ means they weren’t imported by the from
mypackage import * statement. The internal-only names are effectively hidden.
This whole approach works great when it’s important to provide an explicit, stable API.
However, if you’re building an API for use between your own modules, the functionality
of __all__ is probably unnecessary and should be avoided. The namespacing provided
by packages is usually enough for a team of programmers to collaborate on large amounts
of code they control while maintaining reasonable interface boundaries.
Beware of import *
Import statements like from x import y are clear because the source of y is
explicitly the x package or module. Wildcard imports like from foo import *
can also be useful, especially in interactive Python sessions. However, wildcards
make code more difficult to understand.
• from foo import * hides the source of names from new readers of the code. If
a module has multiple import * statements, you’ll need to check all of the
referenced modules to figure out where a name was defined.
• Names from import * statements will overwrite any conflicting names within the
containing module. This can lead to strange bugs caused by accidental interactions
between your code and overlapping names from multiple import * statements.
The safest approach is to avoid import * in your code and explicitly import
names with the from x import y style.
Things to Remember
• Packages in Python are modules that contain other modules. Packages allow you to
organize your code into separate, non-conflicting namespaces with unique absolute
module names.
• Simple packages are defined by adding an __init__.py file to a directory that
contains other source files. These files become the child modules of the directory’s
package. Package directories may also contain other packages.
• You can provide an explicit API for a module by listing its publicly visible names in
its __all__ special attribute.
• You can hide a package’s internal implementation by only importing public names
in the package’s __init__.py file or by naming internal-only members with a
leading underscore.
• When collaborating within a single team or on a single codebase, using __all__
for explicit APIs is probably unnecessary.
Item 51: Define a Root Exception to Insulate Callers from
APIs
When you’re defining a module’s API, the exceptions you throw are just as much a part of
your interface as the functions and classes you define (see Item 14: “Prefer Exceptions to
Returning None”).
Python has a built-in hierarchy of exceptions for the language and standard library.
There’s a draw to using the built-in exception types for reporting errors instead of defining
your own new types. For example, you could raise a ValueError exception whenever
an invalid parameter is passed to your function.
def determine_weight(volume, density):
    if density <= 0:
        raise ValueError('Density must be positive')
    # …
In some cases, using ValueError makes sense, but for APIs it’s much more powerful to
define your own hierarchy of exceptions. You can do this by providing a root
Exception in your module. Then, have all other exceptions raised by that module
inherit from the root exception.
# my_module.py
class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class InvalidDensityError(Error):
    """There was a problem with a provided density value."""
Having a root exception in a module makes it easy for consumers of your API to catch all
of the exceptions that you raise on purpose. For example, here a consumer of your API
makes a function call with a try/except statement that catches your root exception:
try:
    weight = my_module.determine_weight(1, -1)
except my_module.Error as e:
    logging.error('Unexpected error: %s', e)
This try/except prevents your API’s exceptions from propagating too far upward and
breaking the calling program. It insulates the calling code from your API. This insulation
has three helpful effects.
First, root exceptions let callers understand when there’s a problem with their usage of
your API. If callers are using your API properly, they should catch the various exceptions
that you deliberately raise. If they don’t handle such an exception, it will propagate all the
way up to the insulating except block that catches your module’s root exception. That
block can bring the exception to the attention of the API consumer, giving them a chance
to add proper handling of the exception type.
try:
    weight = my_module.determine_weight(1, -1)
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error as e:
    logging.error('Bug in the calling code: %s', e)
The second advantage of using root exceptions is that they can help find bugs in your API
module’s code. If your code only deliberately raises exceptions that you define within
your module’s hierarchy, then all other types of exceptions raised by your module must be
the ones that you didn’t intend to raise. These are bugs in your API’s code.
Using the try/except statement above will not insulate API consumers from bugs in
your API module’s code. To do that, the caller needs to add another except block that
catches Python’s base Exception class. This allows the API consumer to detect when
there’s a bug in the API module’s implementation that needs to be fixed.
try:
    weight = my_module.determine_weight(1, -1)
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error as e:
    logging.error('Bug in the calling code: %s', e)
except Exception as e:
    logging.error('Bug in the API code: %s', e)
    raise
The third impact of using root exceptions is future-proofing your API. Over time, you may
want to expand your API to provide more specific exceptions in certain situations. For
example, you could add an Exception subclass that indicates the error condition of
supplying negative densities.
# my_module.py
class NegativeDensityError(InvalidDensityError):
    """A provided density value was negative."""

def determine_weight(volume, density):
    if density < 0:
        raise NegativeDensityError
The calling code will continue to work exactly as before because it already catches
InvalidDensityError exceptions (the parent class of
NegativeDensityError). In the future, the caller could decide to special-case the
new type of exception and change its behavior accordingly.
try:
    weight = my_module.determine_weight(1, -1)
except my_module.NegativeDensityError as e:
    raise ValueError('Must supply non-negative density') from e
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error as e:
    logging.error('Bug in the calling code: %s', e)
except Exception as e:
    logging.error('Bug in the API code: %s', e)
    raise
You can take API future-proofing further by providing a broader set of exceptions directly
below the root exception. For example, imagine you had one set of errors related to
calculating weights, another related to calculating volume, and a third related to
calculating density.
# my_module.py
class WeightError(Error):
    """Base-class for weight calculation errors."""

class VolumeError(Error):
    """Base-class for volume calculation errors."""

class DensityError(Error):
    """Base-class for density calculation errors."""
Specific exceptions would inherit from these general exceptions. Each intermediate
exception acts as its own kind of root exception. This makes it easier to insulate layers of
calling code from API code based on broad functionality. This is much better than having
all callers catch a long list of very specific Exception subclasses.
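Here is a small self-contained sketch of an intermediate exception in use. It assumes, for
illustration only, that NegativeDensityError is re-parented directly under DensityError
(earlier in this item it inherits from InvalidDensityError), and the safe_weight wrapper is
a hypothetical caller rather than part of the module’s API.

```python
class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class DensityError(Error):
    """Base-class for density calculation errors."""

class NegativeDensityError(DensityError):
    """A provided density value was negative."""

def determine_weight(volume, density):
    if density < 0:
        raise NegativeDensityError('Density must not be negative')
    return volume * density

def safe_weight(volume, density):
    """Hypothetical caller that treats any density problem as zero weight."""
    try:
        return determine_weight(volume, density)
    except DensityError:
        # Catches NegativeDensityError and any density error added later.
        return 0

print(safe_weight(2, 3))   # 6
print(safe_weight(1, -1))  # 0
```

The caller names only the intermediate class, so new DensityError subclasses can be
introduced without touching this except clause.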
Things to Remember
• Defining root exceptions for your modules allows API consumers to insulate
themselves from your API.
• Catching root exceptions can help you find bugs in code that consumes an API.
• Catching the Python Exception base class can help you find bugs in API
implementations.
• Intermediate root exceptions let you add more specific types of exceptions in the
future without breaking your API consumers.
Item 52: Know How to Break Circular Dependencies
Inevitably, while you’re collaborating with others, you’ll find a mutual interdependency
between modules. It can even happen while you work by yourself on the various parts of a
single program.
For example, say you want your GUI application to show a dialog box for choosing where
to save a document. The data displayed by the dialog could be specified through
arguments to your event handlers. But the dialog also needs to read global state, like user
preferences, to know how to render properly.
Here, I define a dialog that retrieves the default document save location from global
preferences:
# dialog.py
import app

class Dialog(object):
    def __init__(self, save_dir):
        self.save_dir = save_dir
    # …

save_dialog = Dialog(app.prefs.get('save_dir'))

def show():
    # …
The problem is that the app module that contains the prefs object also imports the
dialog class in order to show the dialog on program start.
# app.py
import dialog

class Prefs(object):
    # …
    def get(self, name):
        # …

prefs = Prefs()
dialog.show()
It’s a circular dependency. If you try to use the app module from your main program,
you’ll get an exception when you import it.
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    import app
  File "app.py", line 4, in <module>
    import dialog
  File "dialog.py", line 16, in <module>
    save_dialog = Dialog(app.prefs.get('save_dir'))
AttributeError: 'module' object has no attribute 'prefs'
To understand what’s happening here, you need to know the details of Python’s import
machinery. When a module is imported, here’s what Python actually does in depth-first
order:
1. Searches for your module in locations from sys.path
2. Loads the code from the module and ensures that it compiles
3. Creates a corresponding empty module object
4. Inserts the module into sys.modules
5. Runs the code in the module object to define its contents
The problem with a circular dependency is that the attributes of a module aren’t defined
until the code for those attributes has executed (after step #5). But the module can be
loaded with the import statement immediately after it’s inserted into sys.modules
(after step #4).
In the example above, the app module imports dialog before defining anything. Then,
the dialog module imports app. Since app still hasn’t finished running (it’s currently
importing dialog), the app module is just an empty shell (from step #4). The
AttributeError is raised (during step #5 for dialog) because the code that defines
prefs hasn’t run yet (step #5 for app isn’t complete).
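The failure can be reproduced in a few lines. This sketch writes two tiny modules to a
temporary directory and imports one of them; the names demo_app and demo_dialog are
hypothetical stand-ins for app and dialog, and a plain dictionary stands in for the Prefs
object. The exact wording of the error message varies between Python versions, so only the
exception type is relied on.

```python
import os
import sys
import tempfile
import textwrap

APP_SOURCE = textwrap.dedent('''
    import demo_dialog                 # runs demo_dialog before prefs exists
    prefs = {'save_dir': '/tmp'}
''')
DIALOG_SOURCE = textwrap.dedent('''
    import demo_app                    # finds the empty shell in sys.modules
    save_dir = demo_app.prefs['save_dir']
''')

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'demo_app.py'), 'w') as f:
        f.write(APP_SOURCE)
    with open(os.path.join(tmp, 'demo_dialog.py'), 'w') as f:
        f.write(DIALOG_SOURCE)
    sys.path.insert(0, tmp)
    try:
        import demo_app
    except AttributeError as e:
        error = e
        print('Import failed:', e)
    else:
        error = None
    finally:
        # Undo the changes so the demo leaves no trace behind.
        sys.path.remove(tmp)
        sys.modules.pop('demo_app', None)
        sys.modules.pop('demo_dialog', None)
```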
The best solution to this problem is to refactor your code so that the prefs data structure
is at the bottom of the dependency tree. Then, both app and dialog can import the same
utility module and avoid any circular dependencies. But such a clear division isn’t always
possible or could require too much refactoring to be worth the effort.
There are three other ways to break circular dependencies.
Reordering Imports
The first approach is to change the order of imports. For example, if you import the
dialog module toward the bottom of the app module, after its contents have run, the
AttributeError goes away.
# app.py
class Prefs(object):
    # …

prefs = Prefs()

import dialog  # Moved
dialog.show()
This works because, when the dialog module is loaded late, its recursive import of app
will find that app.prefs has already been defined (step #5 is mostly done for app).
Although this avoids the AttributeError, it goes against the PEP 8 style guide (see
Item 2: “Follow the PEP 8 Style Guide”). The style guide suggests that you always put
imports at the top of your Python files. This makes your module’s dependencies clear to
new readers of the code. It also ensures that any module you depend on is in scope and
available to all the code in your module.
Having imports later in a file can be brittle and can cause small changes in the ordering of
your code to break the module entirely. Thus, you should avoid import reordering to solve
your circular dependency issues.
Import, Configure, Run
A second solution to the circular imports problem is to have your modules minimize side
effects at import time. You have your modules only define functions, classes, and
constants. You avoid actually running any functions at import time. Then, you have each
module provide a configure function that you call once all other modules have
finished importing. The purpose of configure is to prepare each module’s state by
accessing the attributes of other modules. You run configure after all modules have
been imported (step #5 is complete), so all attributes must be defined.
Here, I redefine the dialog module to only access the prefs object when configure
is called:
# dialog.py
import app

class Dialog(object):
    # …

save_dialog = Dialog()

def show():
    # …

def configure():
    save_dialog.save_dir = app.prefs.get('save_dir')
I also redefine the app module to not run any activities on import.
# app.py
import dialog

class Prefs(object):
    # …

prefs = Prefs()

def configure():
    # …
Finally, the main module has three distinct phases of execution: import everything,
configure everything, and run the first activity.
# main.py
import app
import dialog

app.configure()
dialog.configure()

dialog.show()
This works well in many situations and enables patterns like dependency injection. But
sometimes it can be difficult to structure your code so that an explicit configure step is
possible. Having two distinct phases within a module can also make your code harder to
read because it separates the definition of objects from their configuration.
DynamicImport
Thethird—andoftensimplest—solutiontothecircularimportsproblemistousean
importstatementwithinafunctionormethod.Thisiscalledadynamicimportbecause
themoduleimporthappenswhiletheprogramisrunning,notwhiletheprogramisfirst
startingupandinitializingitsmodules.
Here,Iredefinethedialogmoduletouseadynamicimport.Thedialog.show
functionimportstheappmoduleatruntimeinsteadofthedialogmoduleimporting
appatinitializationtime.
Clickheretoviewcodeimage
#dialog.py
classDialog(object):
#…
save_dialog=Dialog()
defshow():
importapp#Dynamicimport
save_dialog.save_dir=app.prefs.get(‘save_dir’)
#…
Theappmodulecannowbethesameasitwasintheoriginalexample.Itimports
dialogatthetopandcallsdialog.showatthebottom.
#app.py
importdialog
classPrefs(object):
#…
prefs=Prefs()
dialog.show()
Thisapproachhasasimilareffecttotheimport,configure,andrunstepsfrombefore.The
differenceisthatthisrequiresnostructuralchangestothewaythemodulesaredefined
andimported.You’resimplydelayingthecircularimportuntilthemomentyoumust
accesstheothermodule.Atthatpoint,youcanbeprettysurethatallothermoduleshave
alreadybeeninitialized(step#5iscompleteforeverything).
Ingeneral,it’sgoodtoavoiddynamicimportslikethis.Thecostoftheimportstatement
isnotnegligibleandcanbeespeciallybadintightloops.Bydelayingexecution,dynamic
importsalsosetyouupforsurprisingfailuresatruntime,suchasSyntaxError
exceptionslongafteryourprogramhasstartedrunning(seeItem56:“TestEverything
withunittest”forhowtoavoidthat).However,thesedownsidesareoftenbetterthan
thealternativeofrestructuringyourentireprogram.
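To watch a dynamic import break a cycle end to end, the sketch below writes two small modules into a temporary directory and imports them. The module names demo_app and demo_dialog, and the use of a plain dict for prefs, are assumptions made for this illustration only; they stand in for the app and dialog modules above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module sources, used only for this demonstration.
DIALOG_SOURCE = """
save_dir = None

def show():
    global save_dir
    import demo_app  # Dynamic import: runs when show() is called
    save_dir = demo_app.prefs['save_dir']
"""

APP_SOURCE = """
import demo_dialog

prefs = {'save_dir': '/tmp/saves'}
demo_dialog.show()  # prefs is already defined at this point
"""


def run_demo():
    """Write both modules to a temporary directory and import demo_app."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, 'demo_dialog.py').write_text(DIALOG_SOURCE)
        Path(tmp, 'demo_app.py').write_text(APP_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module('demo_app')
            return sys.modules['demo_dialog'].save_dir
        finally:
            # Undo the changes to the interpreter's import state.
            sys.path.remove(tmp)
            sys.modules.pop('demo_app', None)
            sys.modules.pop('demo_dialog', None)


result = run_demo()
print(result)
```

When show runs, demo_app is only partially initialized, but prefs has already been assigned, so the attribute lookup succeeds. Moving the import back to the top of demo_dialog would not fail in this particular ordering either; the cycle only bites when a module touches the other's attributes at import time, before they exist.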
ThingstoRemember
Circulardependencieshappenwhentwomodulesmustcallintoeachotheratimport
time.Theycancauseyourprogramtocrashatstartup.
Thebestwaytobreakacirculardependencyisrefactoringmutualdependenciesinto
aseparatemoduleatthebottomofthedependencytree.
Dynamicimportsarethesimplestsolutionforbreakingacirculardependency
betweenmoduleswhileminimizingrefactoringandcomplexity.
Item53:UseVirtualEnvironmentsforIsolatedand
ReproducibleDependencies
Buildinglargerandmorecomplexprogramsoftenleadsyoutorelyonvariouspackages
fromthePythoncommunity(seeItem48:“KnowWheretoFindCommunity-Built
Modules”).You’llfindyourselfrunningpiptoinstallpackageslikepytz,numpy,and
manyothers.
Theproblemisthat,bydefault,pipinstallsnewpackagesinagloballocation.That
causesallPythonprogramsonyoursystemtobeaffectedbytheseinstalledmodules.In
theory,thisshouldn’tbeanissue.Ifyouinstallapackageandneverimportit,how
coulditaffectyourprograms?
Thetroublecomesfromtransitivedependencies:thepackagesthatthepackagesyou
installdependon.Forexample,youcanseewhattheSphinxpackagedependsonafter
installingitbyaskingpip.
$ pip3 show Sphinx
---
Name: Sphinx
Version: 1.2.2
Location: /usr/local/lib/python3.4/site-packages
Requires: docutils, Jinja2, Pygments
Ifyouinstallanotherpackagelikeflask,youcanseethatit,too,dependsonthe
Jinja2package.
$ pip3 show flask
---
Name: Flask
Version: 0.10.1
Location: /usr/local/lib/python3.4/site-packages
Requires: Werkzeug, Jinja2, itsdangerous
The conflict arises as Sphinx and flask diverge over time. Perhaps right now they both
require the same version of Jinja2 and everything is fine. But six months or a year from
now, Jinja2 may release a new version that makes breaking changes to users of the
library. If you update your global version of Jinja2 with pip install --upgrade, you
may find that Sphinx breaks while flask keeps working.
ThecauseofthisbreakageisthatPythoncanonlyhaveasingleglobalversionofa
moduleinstalledatatime.Ifoneofyourinstalledpackagesmustusethenewversionand
anotherpackagemustusetheoldversion,yoursystemisn’tgoingtoworkproperly.
Such breakage can even happen when package maintainers try their best to preserve API
compatibility between releases (see Item 50: “Use Packages to Organize Modules and
Provide Stable APIs”). New versions of a library can subtly change behaviors that
API-consuming code relies on. Users on a system may upgrade one package to a new version
but not others, which could break dependencies. There’s a constant risk of the ground
moving beneath your feet.
Thesedifficultiesaremagnifiedwhenyoucollaboratewithotherdeveloperswhodotheir
workonseparatecomputers.It’sreasonabletoassumethattheversionsofPythonand
globalpackagestheyhaveinstalledontheirmachineswillbeslightlydifferentthanyour
own.Thiscancausefrustratingsituationswhereacodebaseworksperfectlyonone
programmer’smachineandiscompletelybrokenonanother’s.
The solution to all of these problems is a tool called pyvenv, which provides virtual
environments. Since Python 3.4, the pyvenv command-line tool is available by default
along with the Python installation (it’s also accessible with python -m venv). Prior
versions of Python require installing a separate package (with pip install
virtualenv) and using a command-line tool called virtualenv.
pyvenvallowsyoutocreateisolatedversionsofthePythonenvironment.Using
pyvenv,youcanhavemanydifferentversionsofthesamepackageinstalledonthesame
systematthesametimewithoutconflicts.Thisletsyouworkonmanydifferentprojects
andusemanydifferenttoolsonthesamecomputer.
pyvenvdoesthisbyinstallingexplicitversionsofpackagesandtheirdependenciesinto
completelyseparatedirectorystructures.ThismakesitpossibletoreproduceaPython
environmentthatyouknowwillworkwithyourcode.It’sareliablewaytoavoid
surprisingbreakages.
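The same machinery is exposed to Python code through the venv built-in module, which is what python -m venv runs. That form is worth knowing because the standalone pyvenv script was deprecated in Python 3.6 and removed in Python 3.8. The following is a minimal sketch, assuming a throwaway temporary directory and the environment name myproject; the create_environment helper is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, name='myproject'):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps this sketch fast; a real project would usually
    # pass with_pip=True so that pip is available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(tmp)
    has_config = (env_dir / 'pyvenv.cfg').is_file()
    entries = sorted(path.name for path in env_dir.iterdir())

print(has_config)
print(entries)
```

The exact directory names listed depend on the platform, but the pyvenv.cfg file is what marks the directory as a virtual environment.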
ThepyvenvCommand
Here’saquicktutorialonhowtousepyvenveffectively.Beforeusingthetool,it’s
importanttonotethemeaningofthepython3command-lineonyoursystem.Onmy
computer,python3islocatedinthe/usr/local/bindirectoryandevaluatesto
version3.4.2(seeItem1:“KnowWhichVersionofPythonYou’reUsing”).
$ which python3
/usr/local/bin/python3
$ python3 --version
Python 3.4.2
Todemonstratethesetupofmyenvironment,Icantestthatrunningacommandtoimport
thepytzmoduledoesn’tcauseanerror.ThisworksbecauseIalreadyhavethepytz
packageinstalledasaglobalmodule.
$ python3 -c 'import pytz'
$
Now,Iusepyvenvtocreateanewvirtualenvironmentcalledmyproject.Eachvirtual
environmentmustliveinitsownuniquedirectory.Theresultofthecommandisatreeof
directoriesandfiles.
$ pyvenv /tmp/myproject
$ cd /tmp/myproject
$ ls
bin include lib pyvenv.cfg
Tostartusingthevirtualenvironment,Iusethesourcecommandfrommyshellonthe
bin/activatescript.activatemodifiesallofmyenvironmentvariablestomatch
thevirtualenvironment.Italsoupdatesmycommand-lineprompttoincludethevirtual
environmentname('myproject')tomakeitextremelyclearwhatI’mworkingon.
$ source bin/activate
(myproject)$
Afteractivation,youcanseethatthepathtothepython3command-linetoolhasmoved
towithinthevirtualenvironmentdirectory.
(myproject)$ which python3
/tmp/myproject/bin/python3
(myproject)$ ls -l /tmp/myproject/bin/python3
… -> /tmp/myproject/bin/python3.4
(myproject)$ ls -l /tmp/myproject/bin/python3.4
… -> /usr/local/bin/python3.4
Thisensuresthatchangestotheoutsidesystemwillnotaffectthevirtualenvironment.
Eveniftheoutersystemupgradesitsdefaultpython3toversion3.5,myvirtual
environmentwillstillexplicitlypointtoversion3.4.
ThevirtualenvironmentIcreatedwithpyvenvstartswithnopackagesinstalledexcept
forpipandsetuptools.Tryingtousethepytzpackagethatwasinstalledasaglobal
moduleintheoutsidesystemwillfailbecauseit’sunknowntothevirtualenvironment.
(myproject)$ python3 -c 'import pytz'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'pytz'
Icanusepiptoinstallthepytzmoduleintomyvirtualenvironment.
(myproject)$ pip3 install pytz
Onceit’sinstalled,Icanverifythatit’sworkingwiththesametestimportcommand.
(myproject)$ python3 -c 'import pytz'
(myproject)$
Whenyou’redonewithavirtualenvironmentandwanttogobacktoyourdefaultsystem,
youusethedeactivatecommand.Thisrestoresyourenvironmenttothesystem
defaults,includingthelocationofthepython3command-linetool.
(myproject)$ deactivate
$ which python3
/usr/local/bin/python3
Ifyoueverwanttoworkinthemyprojectenvironmentagain,youcanjustrun
sourcebin/activateinthedirectorylikebefore.
ReproducingDependencies
Onceyouhaveavirtualenvironment,youcancontinueinstallingpackageswithpipas
youneedthem.Eventually,youmaywanttocopyyourenvironmentsomewhereelse.For
example,sayyouwanttoreproduceyourdevelopmentenvironmentonaproduction
server.Ormaybeyouwanttoclonesomeoneelse’senvironmentonyourownmachineso
youcanruntheircode.
pyvenvmakesthesesituationseasy.Youcanusethepipfreezecommandtosaveall
ofyourexplicitpackagedependenciesintoafile.Byconvention,thisfileisnamed
requirements.txt.
(myproject)$ pip3 freeze > requirements.txt
(myproject)$ cat requirements.txt
numpy==1.8.2
pytz==2014.4
requests==2.3.0
Now,imaginethatyou’dliketohaveanothervirtualenvironmentthatmatchesthe
myprojectenvironment.Youcancreateanewdirectorylikebeforeusingpyvenvand
activateit.
$ pyvenv /tmp/otherproject
$ cd /tmp/otherproject
$ source bin/activate
(otherproject)$
Thenewenvironmentwillhavenoextrapackagesinstalled.
(otherproject)$ pip3 list
pip (1.5.6)
setuptools (2.1)
Youcaninstallallofthepackagesfromthefirstenvironmentbyrunningpipinstall
ontherequirements.txtthatyougeneratedwiththepipfreezecommand.
(otherproject)$ pip3 install -r /tmp/myproject/requirements.txt
Thiscommandwillcrankalongforalittlewhileasitretrievesandinstallsallofthe
packagesrequiredtoreproducethefirstenvironment.Onceit’sdone,listingthesetof
installedpackagesinthesecondvirtualenvironmentwillproducethesamelistof
dependenciesfoundinthefirstvirtualenvironment.
(otherproject)$ pip3 list
numpy (1.8.2)
pip (1.5.6)
pytz (2014.4)
requests (2.3.0)
setuptools (2.1)
Usingarequirements.txtfileisidealforcollaboratingwithothersthrougha
revisioncontrolsystem.Youcancommitchangestoyourcodeatthesametimeyou
updateyourlistofpackagedependencies,ensuringthattheymoveinlockstep.
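Because a requirements.txt file is plain text, it is also easy to check whether two environments have drifted apart. The sketch below compares two frozen lists and reports the pins that differ. It assumes every line has the simple name==version form shown above (pip freeze can emit other forms, which this sketch rejects), and the helper names parse_pins and changed_pins are hypothetical.

```python
def parse_pins(text):
    """Map lowercased package names to pinned versions."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split('#', 1)[0].strip()  # Drop comments
        if not line:
            continue
        name, separator, version = line.partition('==')
        if not separator:
            raise ValueError('Not a pinned requirement: %r' % raw_line)
        pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for pins that differ."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


MINE = """numpy==1.8.2
pytz==2014.4
requests==2.3.0
"""
THEIRS = """numpy==1.8.2
pytz==2014.7
"""

differences = changed_pins(MINE, THEIRS)
print(differences)
```

A version of None on either side means the package is missing from that environment.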
Thegotchawithvirtualenvironmentsisthatmovingthembreakseverythingbecauseall
ofthepaths,likepython3,arehard-codedtotheenvironment’sinstalldirectory.Butthat
doesn’tmatter.Thewholepurposeofvirtualenvironmentsistomakeiteasytoreproduce
thesamesetup.Insteadofmovingavirtualenvironmentdirectory,justfreezetheold
one,createanewonesomewhereelse,andreinstalleverythingfromthe
requirements.txtfile.
ThingstoRemember
Virtualenvironmentsallowyoutousepiptoinstallmanydifferentversionsofthe
samepackageonthesamemachinewithoutconflicts.
Virtualenvironmentsarecreatedwithpyvenv,enabledwithsource
bin/activate,anddisabledwithdeactivate.
Youcandumpalloftherequirementsofanenvironmentwithpipfreeze.You
canreproducetheenvironmentbysupplyingtherequirements.txtfiletopip
install-r.
InversionsofPythonbefore3.4,thepyvenvtoolmustbedownloadedand
installedseparately.Thecommand-linetooliscalledvirtualenvinsteadof
pyvenv.
8.Production
PuttingaPythonprogramtouserequiresmovingitfromadevelopmentenvironmenttoa
productionenvironment.Supportingdisparateconfigurationslikethiscanbeachallenge.
Makingprogramsthataredependableinmultiplesituationsisjustasimportantasmaking
programswithcorrectfunctionality.
ThegoalistoproductionizeyourPythonprogramsandmakethembulletproofwhile
they’reinuse.Pythonhasbuilt-inmodulesthataidinhardeningyourprograms.It
providesfacilitiesfordebugging,optimizing,andtestingtomaximizethequalityand
performanceofyourprogramsatruntime.
Item54:ConsiderModule-ScopedCodetoConfigure
DeploymentEnvironments
Adeploymentenvironmentisaconfigurationinwhichyourprogramruns.Everyprogram
hasatleastonedeploymentenvironment,theproductionenvironment.Thegoalofwriting
aprograminthefirstplaceistoputittoworkintheproductionenvironmentandachieve
somekindofoutcome.
Writingormodifyingaprogramrequiresbeingabletorunitonthecomputeryouusefor
developing.Theconfigurationofyourdevelopmentenvironmentmaybemuchdifferent
fromyourproductionenvironment.Forexample,youmaybewritingaprogramfor
supercomputersusingaLinuxworkstation.
Toolslikepyvenv(seeItem53:“UseVirtualEnvironmentsforIsolatedand
ReproducibleDependencies”)makeiteasytoensurethatallenvironmentshavethesame
Pythonpackagesinstalled.Thetroubleisthatproductionenvironmentsoftenrequiremany
externalassumptionsthatarehardtoreproduceindevelopmentenvironments.
Forexample,sayyouwanttorunyourprograminawebservercontainerandgiveit
accesstoadatabase.Thismeansthateverytimeyouwanttomodifyyourprogram’scode,
youneedtorunaservercontainer,thedatabasemustbesetupproperly,andyourprogram
needsthepasswordforaccess.That’saveryhighcostifallyou’retryingtodoisverify
thataone-linechangetoyourprogramworkscorrectly.
Thebestwaytoworkaroundtheseissuesistooverridepartsofyourprogramatstartup
timetoprovidedifferentfunctionalitydependingonthedeploymentenvironment.For
example,youcouldhavetwodifferent__main__files,oneforproductionandonefor
development.
# dev_main.py
TESTING = True
import db_connection
db = db_connection.Database()
# prod_main.py
TESTING = False
import db_connection
db = db_connection.Database()
TheonlydifferencebetweenthetwofilesisthevalueoftheTESTINGconstant.Other
modulesinyourprogramcanthenimportthe__main__moduleandusethevalueof
TESTINGtodecidehowtheydefinetheirownattributes.
# db_connection.py
import __main__
class TestingDatabase(object):
    # …
class RealDatabase(object):
    # …
if __main__.TESTING:
    Database = TestingDatabase
else:
    Database = RealDatabase
Thekeybehaviortonoticehereisthatcoderunninginmodulescope—notinsideany
functionormethod—isjustnormalPythoncode.Youcanuseanifstatementatthe
moduleleveltodecidehowthemodulewilldefinenames.Thismakesiteasytotailor
modulestoyourvariousdeploymentenvironments.Youcanavoidhavingtoreproduce
costlyassumptionslikedatabaseconfigurationswhentheyaren’tneeded.Youcaninject
fakeormockimplementationsthateaseinteractivedevelopmentandtesting(seeItem56:
“TestEverythingwithunittest”).
Note
Onceyourdeploymentenvironmentsgetcomplicated,youshouldconsidermoving
themoutofPythonconstants(likeTESTING)andintodedicatedconfiguration
files.Toolsliketheconfigparserbuilt-inmoduleletyoumaintainproduction
configurationsseparatefromcode,adistinctionthat’scrucialforcollaboratingwith
anoperationsteam.
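As a rough sketch of what that move might look like, the example below reads a TESTING flag from INI-style text with configparser. The section name deployment, the keys testing and db_host, and the host db.example.com are all invented for this illustration; in practice the text would live in a file and be loaded with the parser's read method.

```python
from configparser import ConfigParser

# Hypothetical configuration text, inlined so the sketch is self-contained.
PRODUCTION_CONFIG = """
[deployment]
testing = no
db_host = db.example.com
"""


def load_settings(text):
    """Parse INI-style text into a plain dictionary of settings."""
    parser = ConfigParser()
    parser.read_string(text)
    section = parser['deployment']
    return {
        'testing': section.getboolean('testing'),
        'db_host': section.get('db_host'),
    }


settings = load_settings(PRODUCTION_CONFIG)
TESTING = settings['testing']
print(settings)
```

The getboolean method accepts the usual spellings (yes/no, true/false, on/off, 1/0), which keeps the file readable for people who don't write Python.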
Thisapproachcanbeusedformorethanworkingaroundexternalassumptions.For
example,ifyouknowthatyourprogrammustworkdifferentlybasedonitshostplatform,
youcaninspectthesysmodulebeforedefiningtop-levelconstructsinamodule.
# db_connection.py
import sys
class Win32Database(object):
    # …
class PosixDatabase(object):
    # …
if sys.platform.startswith('win32'):
    Database = Win32Database
else:
    Database = PosixDatabase
Similarly,youcanuseenvironmentvariablesfromos.environtoguideyourmodule
definitions.
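Here is one way the environment-variable variant might look. The variable name APP_ENV and its value testing are assumptions for this sketch, not a convention that Python defines, and the two database classes are empty placeholders.

```python
import os


class TestingDatabase(object):
    """Placeholder for an in-memory or fake database."""


class RealDatabase(object):
    """Placeholder for the production database client."""


def choose_database(environ=os.environ):
    """Pick the Database class from a mapping of environment variables."""
    # APP_ENV is a hypothetical variable name chosen for this sketch.
    if environ.get('APP_ENV', 'production') == 'testing':
        return TestingDatabase
    return RealDatabase


# Module-scoped code: this runs once, when the module is first imported.
Database = choose_database()
print(Database.__name__)
```

Taking the mapping as a parameter means the selection logic can be exercised with a plain dictionary, without modifying the real process environment.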
ThingstoRemember
Programsoftenneedtoruninmultipledeploymentenvironmentsthateachhave
uniqueassumptionsandconfigurations.
Youcantailoramodule’scontentstodifferentdeploymentenvironmentsbyusing
normalPythonstatementsinmodulescope.
Modulecontentscanbetheproductofanyexternalcondition,includinghost
introspectionthroughthesysandosmodules.
Item55:UsereprStringsforDebuggingOutput
WhendebuggingaPythonprogram,theprintfunction(oroutputviathelogging
built-inmodule)willgetyousurprisinglyfar.Pythoninternalsareofteneasytoaccessvia
plainattributes(seeItem27:“PreferPublicAttributesOverPrivateOnes”).Allyouneed
todoisprinthowthestateofyourprogramchangeswhileitrunsandseewhereitgoes
wrong.
Theprintfunctionoutputsahuman-readablestringversionofwhateveryousupplyit.
Forexample,printingabasicstringwillprintthecontentsofthestringwithoutthe
surroundingquotecharacters.
print('foobar')
>>>
foobar
Thisisequivalenttousingthe'%s'formatstringandthe%operator.
print('%s' % 'foobar')
>>>
foobar
Theproblemisthatthehuman-readablestringforavaluedoesn’tmakeitclearwhatthe
actualtypeofthevalueis.Forexample,noticehowinthedefaultoutputofprintyou
can’tdistinguishbetweenthetypesofthenumber5andthestring'5'.
print(5)
print('5')
>>>
5
5
Ifyou’redebuggingaprogramwithprint,thesetypedifferencesmatter.Whatyou
almostalwayswantwhiledebuggingistoseethereprversionofanobject.Therepr
built-infunctionreturnstheprintablerepresentationofanobject,whichshouldbeitsmost
clearlyunderstandablestringrepresentation.Forbuilt-intypes,thestringreturnedby
reprisavalidPythonexpression.
a = '\x07'
print(repr(a))
>>>
'\x07'
Passingthevaluefromreprtotheevalbuilt-infunctionshouldresultinthesame
Pythonobjectyoustartedwith(ofcourse,inpractice,youshouldonlyuseevalwith
extremecaution).
b = eval(repr(a))
assert a == b
Whenyou’redebuggingwithprint,youshouldreprthevaluebeforeprintingto
ensurethatanydifferenceintypesisclear.
print(repr(5))
print(repr('5'))
>>>
5
'5'
Thisisequivalenttousingthe'%r'formatstringandthe%operator.
print('%r' % 5)
print('%r' % '5')
>>>
5
'5'
FordynamicPythonobjects,thedefaulthuman-readablestringvalueisthesameasthe
reprvalue.Thismeansthatpassingadynamicobjecttoprintwilldotherightthing,
andyoudon’tneedtoexplicitlycallrepronit.Unfortunately,thedefaultvalueofrepr
forobjectinstancesisn’tespeciallyhelpful.Forexample,hereIdefineasimpleclass
andthenprintitsvalue:
class OpaqueClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
obj = OpaqueClass(1, 2)
print(obj)
>>>
<__main__.OpaqueClass object at 0x107880ba8>
Thisoutputcan’tbepassedtotheevalfunction,anditsaysnothingabouttheinstance
fieldsoftheobject.
Therearetwosolutionstothisproblem.Ifyouhavecontroloftheclass,youcandefine
yourown__repr__specialmethodthatreturnsastringcontainingthePython
expressionthatrecreatestheobject.Here,Idefinethatfunctionfortheclassabove:
class BetterClass(object):
    def __init__(self, x, y):
        # …
    def __repr__(self):
        return 'BetterClass(%d, %d)' % (self.x, self.y)
Now,thereprvalueismuchmoreuseful.
obj = BetterClass(1, 2)
print(obj)
>>>
BetterClass(1, 2)
Whenyoudon’thavecontrolovertheclassdefinition,youcanreachintotheobject’s
instancedictionary,whichisstoredinthe__dict__attribute.Here,Iprintoutthe
contentsofanOpaqueClassinstance:
obj = OpaqueClass(4, 5)
print(obj.__dict__)
>>>
{'y': 5, 'x': 4}
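Both techniques can be combined into a slightly more general sketch. The Point class below formats its fields with %r rather than %d, so that string fields keep their quotes and the result can be evaluated back into an equal object. The describe helper is a hypothetical function that builds a debugging string for a class you can't modify. Neither name comes from the examples above.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # %r applies repr to each field, so strings keep their quotes.
        return '%s(%r, %r)' % (type(self).__name__, self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Point) and vars(self) == vars(other)


class Opaque(object):
    """Stands in for a class whose definition you cannot change."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def describe(obj):
    """Build a debugging string from the instance dictionary."""
    fields = ', '.join(
        '%s=%r' % item for item in sorted(vars(obj).items()))
    return '%s(%s)' % (type(obj).__name__, fields)


point = Point(5, '5')
point_repr = repr(point)
round_trip = eval(point_repr, {'Point': Point})  # Only for trusted input
opaque_repr = describe(Opaque(4, 5))
print(point_repr)
print(round_trip == point)
print(opaque_repr)
```

The vars built-in function returns the same dictionary as the __dict__ attribute; sorting the items keeps the output stable from run to run.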
ThingstoRemember
Callingprintonbuilt-inPythontypeswillproducethehuman-readablestring
versionofavalue,whichhidestypeinformation.
Callingrepronbuilt-inPythontypeswillproducetheprintablestringversionofa
value.Thesereprstringscouldbepassedtotheevalbuilt-infunctiontogetback
theoriginalvalue.
%sinformatstringswillproducehuman-readablestringslikestr.%rwillproduce
printablestringslikerepr.
Youcandefinethe__repr__methodtocustomizetheprintablerepresentationof
aclassandprovidemoredetaileddebugginginformation.
Youcanreachintoanyobject’s__dict__attributetoviewitsinternals.
Item56:TestEverythingwithunittest
Pythondoesn’thavestatictypechecking.There’snothinginthecompilerthatwillensure
thatyourprogramwillworkwhenyourunit.WithPythonyoudon’tknowwhetherthe
functionsyourprogramcallswillbedefinedatruntime,evenwhentheirexistenceis
evidentinthesourcecode.Thisdynamicbehaviorisablessingandacurse.
ThelargenumbersofPythonprogrammersouttheresayit’sworthitbecauseofthe
productivitygainedfromtheresultingbrevityandsimplicity.Butmostpeoplehaveheard
atleastonehorrorstoryaboutPythoninwhichaprogramencounteredaboneheadederror
atruntime.
OneoftheworstexamplesI’veheardiswhenaSyntaxErrorwasraisedinproduction
asasideeffectofadynamicimport(seeItem52:“KnowHowtoBreakCircular
Dependencies”).TheprogrammerIknowwhowashitbythissurprisingoccurrencehas
sinceruledoutusingPythoneveragain.
ButIhavetowonder,whywasn’tthecodetestedbeforetheprogramwasdeployedto
production?Typesafetyisn’teverything.Youshouldalwaystestyourcode,regardlessof
whatlanguageit’swrittenin.However,I’lladmitthatthebigdifferencebetweenPython
andmanyotherlanguagesisthattheonlywaytohaveanyconfidenceinaPython
programisbywritingtests.Thereisnoveilofstatictypecheckingtomakeyoufeelsafe.
Luckily,thesamedynamicfeaturesthatpreventstatictypecheckinginPythonalsomake
itextremelyeasytowritetestsforyourcode.YoucanusePython’sdynamicnatureand
easilyoverridablebehaviorstoimplementtestsandensurethatyourprogramsworkas
expected.
Youshouldthinkoftestsasaninsurancepolicyonyourcode.Goodtestsgiveyou
confidencethatyourcodeiscorrect.Ifyourefactororexpandyourcode,testsmakeit
easytoidentifyhowbehaviorshavechanged.Itsoundscounter-intuitive,buthavinggood
testsactuallymakesiteasiertomodifyPythoncode,notharder.
Thesimplestwaytowritetestsistousetheunittestbuilt-inmodule.Forexample,say
youhavethefollowingutilityfunctiondefinedinutils.py:
# utils.py
def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode('utf-8')
    else:
        raise TypeError('Must supply str or bytes, '
                        'found: %r' % data)
Todefinetests,Icreateasecondfilenamedtest_utils.pyorutils_test.py
thatcontainstestsforeachbehaviorIexpect.
# utils_test.py
from unittest import TestCase, main
from utils import to_str
class UtilsTestCase(TestCase):
    def test_to_str_bytes(self):
        self.assertEqual('hello', to_str(b'hello'))
    def test_to_str_str(self):
        self.assertEqual('hello', to_str('hello'))
    def test_to_str_bad(self):
        self.assertRaises(TypeError, to_str, object())
if __name__ == '__main__':
    main()
TestsareorganizedintoTestCaseclasses.Eachtestisamethodbeginningwiththe
wordtest.IfatestmethodrunswithoutraisinganykindofException(including
AssertionErrorfromassertstatements),thenthetestisconsideredtohavepassed
successfully.
TheTestCaseclassprovideshelpermethodsformakingassertionsinyourtests,suchas
assertEqualforverifyingequality,assertTrueforverifyingBooleanexpressions,
andassertRaisesforverifyingthatexceptionsareraisedwhenappropriate(see
help(TestCase)formore).YoucandefineyourownhelpermethodsinTestCase
subclassestomakeyourtestsmorereadable;justensurethatyourmethodnamesdon’t
beginwiththewordtest.
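The following sketch shows a helper method in a TestCase subclass and runs the tests programmatically instead of through main. To stay self-contained it repeats a version of to_str inline; the class name ToStrTestCase and the helper check_hello are hypothetical.

```python
import unittest


def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode('utf-8')
    else:
        raise TypeError('Must supply str or bytes, found: %r' % (data,))


class ToStrTestCase(unittest.TestCase):
    def check_hello(self, value):
        # Helper: its name does not begin with 'test', so the loader
        # does not collect it as a test of its own.
        self.assertEqual('hello', to_str(value))

    def test_accepted_types(self):
        self.check_hello('hello')
        self.check_hello(b'hello')

    def test_rejected_type(self):
        # assertRaises also works as a context manager.
        with self.assertRaises(TypeError):
            to_str(object())


loader = unittest.defaultTestLoader
suite = loader.loadTestsFromTestCase(ToStrTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Only the two methods whose names begin with test are counted, which is why the helper can be called freely from both.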
Note
Anothercommonpracticewhenwritingtestsistousemockfunctionsandclasses
tostuboutcertainbehaviors.Forthispurpose,Python3providesthe
unittest.mockbuilt-inmodule,whichisalsoavailableforPython2asanopen
sourcepackage.
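As a small illustration of stubbing with unittest.mock, the sketch below replaces a database object with a Mock. The fetch_greeting function and the get_name method are invented for this example.

```python
from unittest import mock


def fetch_greeting(db):
    """Code under test: depends on a database-like object."""
    return 'Hello, %s' % db.get_name()


# The Mock stands in for a real database connection.
fake_db = mock.Mock()
fake_db.get_name.return_value = 'Ada'

greeting = fetch_greeting(fake_db)

# Verify how the stub was used by the code under test.
fake_db.get_name.assert_called_once_with()
print(greeting)
```

Because a Mock accepts any attribute access, a misspelled method name in the code under test would go unnoticed here; passing a spec argument to Mock is one way to guard against that.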
Sometimes,yourTestCaseclassesneedtosetupthetestenvironmentbeforerunning
testmethods.Todothis,youcanoverridethesetUpandtearDownmethods.These
methodsarecalledbeforeandaftereachtestmethod,respectively,andtheyletyouensure
thateachtestrunsinisolation(animportantbestpracticeofpropertesting).Forexample,
hereIdefineaTestCasethatcreatesatemporarydirectorybeforeeachtestanddeletes
itscontentsaftereachtestfinishes:
from tempfile import TemporaryDirectory
from unittest import TestCase
class MyTest(TestCase):
    def setUp(self):
        self.test_dir = TemporaryDirectory()
    def tearDown(self):
        self.test_dir.cleanup()
    # Test methods follow
    # …
IusuallydefineoneTestCaseforeachsetofrelatedtests.SometimesIhaveone
TestCaseforeachfunctionthathasmanyedgecases.Othertimes,aTestCasespans
allfunctionsinasinglemodule.I’llalsocreateoneTestCasefortestingasingleclass
andallofitsmethods.
Whenprogramsgetcomplicated,you’llwantadditionaltestsforverifyingtheinteractions
betweenyourmodules,insteadofonlytestingcodeinisolation.Thisisthedifference
betweenunittestsandintegrationtests.InPython,it’simportanttowritebothtypesof
testsforexactlythesamereason:Youhavenoguaranteethatyourmoduleswillactually
worktogetherunlessyouproveit.
Note
Dependingonyourproject,itcanalsobeusefultodefinedata-driventestsor
organizetestsintodifferentsuitesofrelatedfunctionality.Forthesepurposes,code
coveragereports,andotheradvancedusecases,thenose
(http://nose.readthedocs.org/)andpytest(http://pytest.org/)opensource
packagescanbeespeciallyhelpful.
ThingstoRemember
TheonlywaytohaveconfidenceinaPythonprogramistowritetests.
Theunittestbuilt-inmoduleprovidesmostofthefacilitiesyou’llneedtowrite
goodtests.
YoucandefinetestsbysubclassingTestCaseanddefiningonemethodper
behavioryou’dliketotest.TestmethodsonTestCaseclassesmuststartwiththe
wordtest.
It’simportanttowritebothunittests(forisolatedfunctionality)andintegrationtests
(formodulesthatinteract).
Item57:ConsiderInteractiveDebuggingwithpdb
Everyoneencountersbugsintheircodewhiledevelopingprograms.Usingtheprint
functioncanhelpyoutrackdownthesourceofmanyissues(seeItem55:“Userepr
StringsforDebuggingOutput”).Writingtestsforspecificcasesthatcausetroubleis
anothergreatwaytoisolateproblems(seeItem56:“TestEverythingwithunittest”).
Butthesetoolsaren’tenoughtofindeveryrootcause.Whenyouneedsomethingmore
powerful,it’stimetotryPython’sbuilt-ininteractivedebugger.Thedebuggerletsyou
inspectprogramstate,printlocalvariables,andstepthroughaPythonprogramone
statementatatime.
Inmostotherprogramminglanguages,youuseadebuggerbyspecifyingwhatlineofa
sourcefileyou’dliketostopon,thenexecutetheprogram.Incontrast,withPythonthe
easiestwaytousethedebuggerisbymodifyingyourprogramtodirectlyinitiatethe
debuggerjustbeforeyouthinkyou’llhaveanissueworthinvestigating.Thereisno
differencebetweenrunningaPythonprogramunderadebuggerandrunningitnormally.
Toinitiatethedebugger,allyouhavetodoisimportthepdbbuilt-inmoduleandrunits
set_tracefunction.You’lloftenseethisdoneinasinglelinesoprogrammerscan
commentitoutwithasingle#character.
def complex_func(a, b, c):
    # …
    import pdb; pdb.set_trace()
Assoonasthisstatementruns,theprogramwillpauseitsexecution.Theterminalthat
startedyourprogramwillturnintoaninteractivePythonshell.
-> import pdb; pdb.set_trace()
(Pdb)
Atthe(Pdb)prompt,youcantypeinthenameoflocalvariablestoseetheirvalues
printedout.Youcanseealistofalllocalvariablesbycallingthelocalsbuilt-in
function.Youcanimportmodules,inspectglobalstate,constructnewobjects,runthe
helpbuilt-infunction,andevenmodifypartsoftheprogram—whateveryouneedtodo
toaidinyourdebugging.Inaddition,thedebuggerhasthreecommandsthatmake
inspectingtherunningprogrameasier.
bt:Printthetracebackofthecurrentexecutioncallstack.Thisletsyoufigureout
whereyouareinyourprogramandhowyouarrivedatthepdb.set_trace
triggerpoint.
up:Moveyourscopeupthefunctioncallstacktothecallerofthecurrentfunction.
Thisallowsyoutoinspectthelocalvariablesinhigherlevelsofthecallstack.
down:Moveyourscopebackdownthefunctioncallstackonelevel.
Onceyou’redoneinspectingthecurrentstate,youcanusedebuggercommandstoresume
theprogram’sexecutionunderprecisecontrol.
step:Runtheprogramuntilthenextlineofexecutionintheprogram,thenreturn
controlbacktothedebugger.Ifthenextlineofexecutionincludescallinga
function,thedebuggerwillstopinthefunctionthatwascalled.
next:Runtheprogramuntilthenextlineofexecutioninthecurrentfunction,then
returncontrolbacktothedebugger.Ifthenextlineofexecutionincludescallinga
function,thedebuggerwillnotstopuntilthecalledfunctionhasreturned.
return:Runtheprogramuntilthecurrentfunctionreturns,thenreturncontrol
backtothedebugger.
continue:Continuerunningtheprogramuntilthenextbreakpoint(or
set_traceiscalledagain).
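The debugger is interactive by nature, but the commands above can also be scripted, which makes it possible to show a short session as runnable code. The sketch below is only an illustration under stated assumptions: it feeds two commands to a Pdb instance through its stdin parameter and captures what the debugger prints. The optional debugger parameter on complex_func is a hypothetical addition for this demonstration; in everyday use you would call pdb.set_trace() and type the commands yourself.

```python
import io
import pdb


def complex_func(a, b, debugger=None):
    total = a + b
    if debugger is not None:
        debugger.set_trace()  # Pause here, as pdb.set_trace() would
    return total * 2


# Commands that would normally be typed at the (Pdb) prompt.
commands = io.StringIO("p ('total', total)\ncontinue\n")
output = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=output,
                   nosigint=True, readrc=False)

result = complex_func(40, 2, debugger)
transcript = output.getvalue()
print(transcript)
print(result)
```

The p command evaluates an expression in the paused frame and prints it, and continue resumes the function, which then returns normally.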
ThingstoRemember
YoucaninitiatethePythoninteractivedebuggeratapointofinterestdirectlyinyour
programwiththeimportpdb;pdb.set_trace()statements.
ThePythondebuggerpromptisafullPythonshellthatletsyouinspectandmodify
thestateofarunningprogram.
pdbshellcommandsletyoupreciselycontrolprogramexecution,allowingyouto
alternatebetweeninspectingprogramstateandprogressingprogramexecution.
Item58:ProfileBeforeOptimizing
ThedynamicnatureofPythoncausessurprisingbehaviorsinitsruntimeperformance.
Operationsyoumightassumeareslowareactuallyveryfast(stringmanipulation,
generators).Languagefeaturesyoumightassumearefastareactuallyveryslow(attribute
access,functioncalls).ThetruesourceofslowdownsinaPythonprogramcanbeobscure.
Thebestapproachistoignoreyourintuitionanddirectlymeasuretheperformanceofa
programbeforeyoutrytooptimizeit.Pythonprovidesabuilt-inprofilerfordetermining
whichpartsofaprogramareresponsibleforitsexecutiontime.Thisletsyoufocusyour
optimizationeffortsonthebiggestsourcesoftroubleandignorepartsoftheprogramthat
don’timpactspeed.
Forexample,sayyouwanttodeterminewhyanalgorithminyourprogramisslow.Here,
Idefineafunctionthatsortsalistofdatausinganinsertionsort:
def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result
Thecoremechanismoftheinsertionsortisthefunctionthatfindstheinsertionpointfor
eachpieceofdata.Here,Idefineanextremelyinefficientversionoftheinsert_value
functionthatdoesalinearscanovertheinputarray:
def insert_value(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)
Toprofileinsertion_sortandinsert_value,Icreateadatasetofrandom
numbersanddefineatestfunctiontopasstotheprofiler.
from random import randint
max_size = 10**4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)
Pythonprovidestwobuilt-inprofilers,onethatispurePython(profile)andanother
thatisaC-extensionmodule(cProfile).ThecProfilebuilt-inmoduleisbetter
becauseofitsminimalimpactontheperformanceofyourprogramwhileit’sbeing
profiled.Thepure-Pythonalternativeimposesahighoverheadthatwillskewtheresults.
Note
WhenprofilingaPythonprogram,besurethatwhatyou’remeasuringisthecode
itselfandnotanyexternalsystems.Bewareoffunctionsthataccessthenetworkor
resourcesondisk.Thesemayappeartohavealargeimpactonyourprogram’s
executiontimebecauseoftheslownessoftheunderlyingsystems.Ifyourprogram
usesacachetomaskthelatencyofslowresourceslikethese,youshouldalso
ensurethatit’sproperlywarmedupbeforeyoustartprofiling.
Here,IinstantiateaProfileobjectfromthecProfilemoduleandrunthetest
functionthroughitusingtheruncallmethod:
from cProfile import Profile
profiler = Profile()
profiler.runcall(test)
Oncethetestfunctionhasfinishedrunning,Icanextractstatisticsaboutitsperformance
usingthepstatsbuilt-inmoduleanditsStatsclass.VariousmethodsonaStats
objectadjusthowtoselectandsorttheprofilinginformationtoshowonlythethingsyou
careabout.
from pstats import Stats
stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()
Theoutputisatableofinformationorganizedbyfunction.Thedatasampleistakenonly
fromthetimetheprofilerwasactive,duringtheruncallmethodabove.
>>>
         20003 function calls in 1.812 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.812    1.812 main.py:34(<lambda>)
        1    0.003    0.003    1.812    1.812 main.py:10(insertion_sort)
    10000    1.797    0.000    1.810    0.000 main.py:20(insert_value)
     9992    0.013    0.000    0.013    0.000 {method 'insert' of 'list' objects}
        8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Here’saquickguidetowhattheprofilerstatisticscolumnsmean:
ncalls:Thenumberofcallstothefunctionduringtheprofilingperiod.
tottime:Thenumberofsecondsspentexecutingthefunction,excludingtime
spentexecutingotherfunctionsitcalls.
tottimepercall:Theaveragenumberofsecondsspentinthefunctioneach
timeitwascalled,excludingtimespentexecutingotherfunctionsitcalls.Thisis
tottimedividedbyncalls.
cumtime:Thecumulativenumberofsecondsspentexecutingthefunction,
includingtimespentinallotherfunctionsitcalls.
cumtimepercall:Theaveragenumberofsecondsspentinthefunctioneach
timeitwascalled,includingtimespentinallotherfunctionsitcalls.Thisis
cumtimedividedbyncalls.
Lookingattheprofilerstatisticstableabove,IcanseethatthebiggestuseofCPUinmy
testisthecumulativetimespentintheinsert_valuefunction.Here,Iredefinethat
functiontousethebisectbuilt-inmodule(seeItem46:“UseBuilt-inAlgorithmsand
DataStructures”):
from bisect import bisect_left
def insert_value(array, value):
    i = bisect_left(array, value)
    array.insert(i, value)
Icanruntheprofileragainandgenerateanewtableofprofilerstatistics.Thenew
functionismuchfaster,withacumulativetimespentthatisnearly100×smallerthanthe
previousinsert_valuefunction.
>>>
         30003 function calls in 0.028 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.028    0.028 main.py:34(<lambda>)
        1    0.002    0.002    0.028    0.028 main.py:10(insertion_sort)
    10000    0.005    0.000    0.026    0.000 main.py:112(insert_value)
    10000    0.014    0.000    0.014    0.000 {method 'insert' of 'list' objects}
    10000    0.007    0.000    0.007    0.000 {built-in method bisect_left}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
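To reproduce this comparison in a single script, the sketch below profiles both insertion strategies on the same input and captures each report in a string instead of printing it directly. The function names insert_linear, insert_bisect, and profile_sort are hypothetical, the data set is smaller than the one above so it finishes quickly, and the timings you see will differ from the tables shown.

```python
import io
from bisect import bisect_left
from cProfile import Profile
from pstats import Stats
from random import Random


def insert_linear(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


def insert_bisect(array, value):
    array.insert(bisect_left(array, value), value)


def insertion_sort(data, insert_value):
    result = []
    for value in data:
        insert_value(result, value)
    return result


def profile_sort(data, insert_value):
    """Profile one sort and return (sorted list, printed report)."""
    profiler = Profile()
    result = profiler.runcall(insertion_sort, data, insert_value)
    report = io.StringIO()
    stats = Stats(profiler, stream=report)
    stats.strip_dirs().sort_stats('cumulative').print_stats(5)
    return result, report.getvalue()


rng = Random(1234)  # Fixed seed so both runs see the same input
data = [rng.randint(0, 2000) for _ in range(2000)]
linear_result, linear_report = profile_sort(data, insert_linear)
bisect_result, bisect_report = profile_sort(data, insert_bisect)
print(linear_report)
print(bisect_report)
```

Checking that both versions return the same sorted list before comparing their reports guards against an optimization that is faster only because it is wrong.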
Sometimes,whenyou’reprofilinganentireprogram,you’llfindthatacommonutility
functionisresponsibleforthemajorityofexecutiontime.Thedefaultoutputfromthe
profilermakesthissituationdifficulttounderstandbecauseitdoesn’tshowhowtheutility
functioniscalledbymanydifferentpartsofyourprogram.
Forexample,herethemy_utilityfunctioniscalledrepeatedlybytwodifferent
functionsintheprogram:
def my_utility(a, b):
    # …
def first_func():
    for _ in range(1000):
        my_utility(4, 5)
def second_func():
    for _ in range(10):
        my_utility(1, 3)
def my_program():
    for _ in range(20):
        first_func()
        second_func()
Profilingthiscodeandusingthedefaultprint_statsoutputwillgenerateoutput
statisticsthatareconfusing.
>>>
         20242 function calls in 0.208 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.208    0.208 main.py:176(my_program)
       20    0.005    0.000    0.206    0.010 main.py:168(first_func)
    20200    0.203    0.000    0.203    0.000 main.py:161(my_utility)
       20    0.000    0.000    0.002    0.000 main.py:172(second_func)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Themy_utilityfunctionisclearlythesourceofmostexecutiontime,butit’snot
immediatelyobviouswhythatfunctioniscalledsomuch.Ifyousearchthroughthe
program’scode,you’llfindmultiplecallsitesformy_utilityandstillbeconfused.
Todealwiththis,thePythonprofilerprovidesawayofseeingwhichcallerscontributedto
theprofilinginformationofeachfunction.
stats.print_callers()
Thisprofilerstatisticstableshowsfunctionscalledontheleftandwhowasresponsiblefor
makingthecallontheright.Here,it’sclearthatmy_utilityismostusedby
first_func:
Clickheretoviewcodeimage
>>>
   Ordered by: cumulative time
Function                      was called by…
                                  ncalls  tottime  cumtime
main.py:176(my_program)       <-
main.py:168(first_func)       <-      20    0.005    0.206  main.py:176(my
main.py:161(my_utility)       <-   20000    0.202    0.202  main.py:168(fi
                                     200    0.002    0.002  main.py:172(se
main.py:172(second_func)      <-      20    0.000    0.002  main.py:176(my
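The caller breakdown described above can be reproduced end to end with a small script. This is a minimal sketch, not the book's listing: the body of my_utility is elided in the original, so the loop below is a hypothetical stand-in workload, and the report is captured in an io.StringIO buffer (an assumption made only so the text can be inspected afterward).

```python
import io
from cProfile import Profile
from pstats import Stats

def my_utility(a, b):
    # Hypothetical stand-in body; the original elides what this function does.
    total = 0
    for _ in range(50):
        total += a * b
    return total

def first_func():
    for _ in range(1000):
        my_utility(4, 5)

def second_func():
    for _ in range(10):
        my_utility(1, 3)

def my_program():
    for _ in range(20):
        first_func()
        second_func()

# Profile only the call tree rooted at my_program.
profiler = Profile()
profiler.runcall(my_program)

# Send the report to a buffer instead of stdout so it can be examined.
buffer = io.StringIO()
stats = Stats(profiler, stream=buffer)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_callers()

report = buffer.getvalue()
print(report)
```

Timings will differ from machine to machine, but the call counts attributed to first_func and second_func should show the same 100-to-1 ratio as in the table above.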
Things to Remember
It’s important to profile Python programs before optimizing because the source of slowdowns is often obscure.
Use the cProfile module instead of the profile module because it provides more accurate profiling information.
The Profile object’s runcall method provides everything you need to profile a tree of function calls in isolation.
The Stats object lets you select and print the subset of profiling information you need to see to understand your program’s performance.
Item 59: Use tracemalloc to Understand Memory Usage and Leaks
Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.
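The two mechanisms can be observed directly in CPython. The sketch below is illustrative only: the Node class is a hypothetical helper, weak references are used as probes to see whether an object is still alive, and automatic collection is switched off temporarily so the cycle detector runs only when asked.

```python
import gc
import weakref

class Node:
    """Hypothetical helper that can point at another Node."""
    def __init__(self):
        self.other = None

gc.disable()  # Keep the cycle detector from running on its own schedule

# Reference counting: the object is freed as soon as its last reference goes.
solo = Node()
solo_probe = weakref.ref(solo)
del solo
solo_freed_immediately = solo_probe() is None

# A reference cycle keeps both reference counts above zero after del.
a = Node()
b = Node()
a.other = b
b.other = a
cycle_probe = weakref.ref(a)
del a, b
cycle_alive_before_collect = cycle_probe() is not None

# The cycle detector finds and frees the unreachable pair.
gc.collect()
cycle_alive_after_collect = cycle_probe() is not None

gc.enable()
print(solo_freed_immediately, cycle_alive_before_collect, cycle_alive_after_collect)
```

This behavior is specific to CPython; other runtimes may reclaim objects at different times.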
In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs. It’s taken care of automatically by the language and the CPython runtime. However, in practice, programs eventually do run out of memory due to held references. Figuring out where your Python programs are using or leaking memory proves to be a challenge.
The first way to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach does let you quickly get a sense of where your program’s memory is being used.
Here, I run a program that wastes memory by keeping references. It prints out how many objects were created during execution and a small sample of allocated objects.
# using_gc.py
import gc
found_objects = gc.get_objects()
print('%d objects before' % len(found_objects))

import waste_memory
x = waste_memory.run()
found_objects = gc.get_objects()
print('%d objects after' % len(found_objects))
for obj in found_objects[:3]:
    print(repr(obj)[:100])
>>>
4756 objects before
14873 objects after
<waste_memory.MyObject object at 0x1063f6940>
<waste_memory.MyObject object at 0x1063f6978>
<waste_memory.MyObject object at 0x1063f69b0>
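The same idea can be tried without the waste_memory module, which is not shown here. In this sketch the MyObject class and the run function are hypothetical stand-ins, and grouping the tracked objects by type name with collections.Counter is an addition for readability rather than part of the listing above.

```python
import gc
from collections import Counter

class MyObject:
    """Hypothetical stand-in for the objects the wasteful program creates."""
    def __init__(self):
        self.payload = bytes(100)

def run():
    # Returning the list keeps every instance alive in the caller.
    return [MyObject() for _ in range(1000)]

# Count objects known to the garbage collector, grouped by type name.
before = Counter(type(obj).__name__ for obj in gc.get_objects())
held = run()
after = Counter(type(obj).__name__ for obj in gc.get_objects())

# Counter subtraction keeps only the types whose counts grew.
growth = after - before
for name, count in growth.most_common(3):
    print('%s: +%d' % (name, count))
```

Note that gc.get_objects only reports objects tracked by the collector, so simple values such as the bytes payloads do not appear in these counts.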
The problem with gc.get_objects is that it doesn’t tell you anything about how the objects were allocated. In complicated programs, a specific class of object could be allocated many different ways. The overall number of objects isn’t nearly as important as identifying the code responsible for allocating the objects that are leaking memory.
Python 3.4 introduces a new tracemalloc built-in module for solving this problem. tracemalloc makes it possible to connect an object back to where it was allocated. Here, I print out the top three memory usage offenders in a program using tracemalloc:
# top_n.py
import tracemalloc
tracemalloc.start(10)  # Save up to 10 stack frames

time1 = tracemalloc.take_snapshot()
import waste_memory
x = waste_memory.run()
time2 = tracemalloc.take_snapshot()

stats = time2.compare_to(time1, 'lineno')
for stat in stats[:3]:
    print(stat)
>>>
waste_memory.py:6: size=2235 KiB (+2235 KiB), count=29981 (+29981), average=76 B
waste_memory.py:7: size=869 KiB (+869 KiB), count=10000 (+10000), average=89 B
waste_memory.py:12: size=547 KiB (+547 KiB), count=10000 (+10000), average=56 B
It’s immediately clear which objects are dominating my program’s memory usage and where in the source code they were allocated.
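For experimenting without the waste_memory module, a self-contained variant might look like the following. This is only a sketch: hold_references is a hypothetical function standing in for the wasteful program, and the sizes reported will vary by Python version and platform.

```python
import os
import tracemalloc

def hold_references(n):
    # Hypothetical workload: every 100-byte blob stays referenced by the list.
    return [os.urandom(100) for _ in range(n)]

tracemalloc.start(10)  # Save up to 10 stack frames per allocation
before = tracemalloc.take_snapshot()
held = hold_references(10000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the allocation differences by source line, largest change first.
stats = after.compare_to(before, 'lineno')
top = stats[0]
for stat in stats[:3]:
    print(stat)
```

Each entry carries size_diff and count_diff attributes, so the largest offenders can also be filtered or logged programmatically instead of printed.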
The tracemalloc module can also print out the full stack trace of each allocation (up to the number of frames passed to the start function). Here, I print out the stack trace of the biggest source of memory usage in the program:
# with_trace.py
# …
stats = time2.compare_to(time1, 'traceback')
top = stats[0]
print('\n'.join(top.traceback.format()))
>>>
  File "waste_memory.py", line 6
    self.x = os.urandom(100)
  File "waste_memory.py", line 12
    obj = MyObject()
  File "waste_memory.py", line 19
    deep_values.append(get_data())
  File "with_trace.py", line 10
    x = waste_memory.run()
A stack trace like this is most valuable for figuring out which particular usage of a common function is responsible for memory consumption in a program.
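A runnable version of the traceback approach, again without waste_memory, could be sketched as follows. The get_data and run functions are hypothetical stand-ins for a common helper and its caller; the number of frames printed depends on how the script is invoked and on the Python version.

```python
import os
import tracemalloc

def get_data():
    # Hypothetical common helper that allocates a 100-byte blob.
    return os.urandom(100)

def run():
    # Hypothetical caller that holds on to every blob it requests.
    return [get_data() for _ in range(5000)]

tracemalloc.start(10)  # Save up to 10 stack frames per allocation
before = tracemalloc.take_snapshot()
held = run()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by their full stack trace instead of a single line.
top = after.compare_to(before, 'traceback')[0]
trace_lines = top.traceback.format()
print('\n'.join(trace_lines))
```

Because the grouping key is the whole traceback, two call paths into get_data would appear as separate entries, which is what makes it possible to tell the callers apart.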
Unfortunately, Python 2 doesn’t provide the tracemalloc built-in module. There are open source packages for tracking memory usage in Python 2 (such as heapy), though they do not fully replicate the functionality of tracemalloc.
Things to Remember
It can be difficult to understand how Python programs use and leak memory.
The gc module can help you understand which objects exist, but it has no information about how they were allocated.
The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
tracemalloc is only available in Python 3.4 and above.
Code Snippets