Module 4 – Python and Regular Expressions
•  Module 4 contains only an individual assignment
•  Due on Monday, February 27th
•  Do not wait until the last minute to start on this module
•  Read the wiki before starting, along with a few Python tutorials
•  Portions of today’s slides came from
    –  Marc Conrad
        •  University of Luton
    –  Paul Prescod
        •  Vancouver Python Users’ Group
    –  James Casey
        •  Opscode
    –  Tim Finin
        •  University of Maryland
Extensible Networking Platform
CSE 330 – Creative Programming and Rapid Prototyping
What is Python?
•  Python is an easy-to-learn, powerful programming language
    –  Efficient high-level data structures
    –  Simple approach to object-oriented programming
    –  Elegant syntax and dynamic typing
    –  Up-and-coming language in the open source world
•  We are using Python version 2.x (not 3.x)
Usability Features
•  Very clear syntax
•  Obvious way to do most things
•  Huge amount of free code and libraries
•  Interactive
•  Only innovative where innovation is really necessary
    –  Better to steal a good idea than invent a bad one!
Python “Hello world”
print "Hello, World"
Python Interpreter
•  Just type:
Todds-MacBook-Air:~ todd$ python
Python 2.7.10 (default, Sep 23 2015, 04:34:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Features of the Interpreter
•  Lines start with “>>>”. You can recognize Python interpreter transcripts anywhere you see them.
•  Expressions that return a value display the value.
>>> 5+3*4
17
•  This saves you from excessive “print”ing
Interactive Interpreters
•  Windows command line
•  OS X
•  Linux/Unix
•  Graphical command lines: “IDLE”, “PythonWin”, “MacPython”, …
•  Jython
•  And many more…
Python scripts
•  Sometimes you want to run the same program more than once!
•  Make a file with Python statements in it:
foo.py:
print "hello world"
todd$ python foo.py
hello world
todd$ python foo.py
hello world
Python is dynamically typed
width = 20
print width
height = 5 * 9
print height
print width * height
width = "really wide"
print width
Experiment in the Interpreter
•  Any Python variable can hold any value.
>>> width = 20
>>> height = 5 * 9
>>> print width * height
900
>>> width = "really wide"
>>> print width
really wide
Dynamic Type Checking
test_sqrt.py:
import math

def square_root(num):
    return math.sqrt(num)

def goodfunc():
    print square_root(10)

def badfunc():
    # Fails only when this line runs: math.sqrt() needs a number, not a string
    print square_root("10")

goodfunc()
badfunc()
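Running test_sqrt.py shows what dynamic type checking means in practice: goodfunc() prints a result, and the mistake in badfunc() surfaces only when that call executes. A minimal sketch of the same idea follows. It is written with print() so that it also runs under Python 3, and the helper name try_square_root is hypothetical, not part of the slides.

```python
import math

def square_root(num):
    return math.sqrt(num)

def try_square_root(value):
    """Return the square root, or the name of the exception that was raised."""
    try:
        return square_root(value)
    except TypeError as err:
        # math.sqrt rejects a str; nothing checked the argument before the call ran.
        return type(err).__name__

print(try_square_root(16))    # a number works
print(try_square_root("10"))  # a string fails, but only at call time
```

Because the error appears only on the code path that runs, tests need to exercise every path.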
Multiple statements on a line
•  You can combine multiple simple statements on a line:
>>> a = 5; print a; a = 6; print a
5
6
Indentation
•  Python uses indentation for scoping:
if this_function(that_variable):
    do_something()
else:
    do_something_else()
Indentation
•  Tabs and spaces look the same in most editors.
•  If your editor uses a different conversion rate between tabs and spaces than “standard”, your Python code may not parse properly.
•  Three easy solutions:
    1.  Only use tabs or spaces in a file: don’t mix them.
    2.  Use an editor that knows about Python.
    3.  Configure your editor to use the same tab/space rule as Python, vi, emacs, notepad, edit, etc.: 8 spaces per tab
Compared to PHP/JavaScript
•  Excellent for Web apps (PHP on server, JavaScript on client) but not much else.
•  Python can be used for your Web apps, your complicated algorithms, your GUIs, your COM components, an extension language for Java programs
•  Even in Web apps, Python handles complexity better.
Compared to Java
•  Java is more difficult for amateur programmers.
•  Static type checking can be inconvenient and inflexible.
•  Bottom line: Java can make projects harder than they need to be.
Python Limitations
•  Not the fastest executing programming language:
    –  C/C++ is naturally fast
    –  Perl’s regular expressions and IO are a little faster
    –  Some Java implementations have good JITs
    –  But Python also has some speed advantages:
        •  Fast implementations of built-in data structures
        •  Pyrex compiles Python code to C
•  Dynamic type checking requires more care in testing.
•  Language changes (relatively) quickly: this is a strength and a weakness.
Objects All the Way Down
•  Everything in Python is an object
•  Integers are objects.
•  Characters are objects.
•  Complex numbers are objects.
•  Booleans are objects.
•  Functions are objects.
•  Methods are objects.
•  Modules are objects
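One way to see this is to treat each kind of value the same way: store it in a list, bind it to a new name, or call a method on it. The sketch below is illustrative only. The function shout and the variable names are made up for the example, and print() is used so the code also runs under Python 3.

```python
import math

def shout(text):
    return text.upper()

# A function is an object: it can be bound to another name and stored in a list.
alias = shout
things = [1, "a", 2 + 3j, True, shout, "a".upper, math]

# Every one of these values is an instance of the base class `object`.
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)

print(alias("hello"))        # calling the function through its second name
print((255).bit_length())    # even an integer literal has methods
```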
Object Type and Identity
•  You can find out the type of any object:
>>> print type(1)
<type 'int'>
>>> print type(1.0)
<type 'float'>
•  Every object also has a unique identifier (usually only for debugging purposes)
>>> print id(1)
7629640
>>> print id("1")
7910560
None
•  “None” represents the lack of a value.
•  Like “NULL” in some languages or in databases.
•  For instance:
>>> if y != 0:
...     fraction = x/y
... else:
...     fraction = None
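The same pattern is often wrapped in a function that returns None when there is no sensible answer, leaving the caller to check for it. A small sketch, assuming a hypothetical helper named safe_fraction (print() is used so it also runs under Python 3):

```python
def safe_fraction(x, y):
    """Return x / y, or None when the division is undefined."""
    if y != 0:
        return x / y
    return None

result = safe_fraction(1, 0)
# Test with `is None`: a plain truth test would also treat 0 and "" as missing.
if result is None:
    print("no value")
else:
    print(result)
```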
File Objects
•  Represent opened files:
>>> infile = file("catalog.txt", "r")
>>> data = infile.read()
>>> infile.close()
>>> outfile = file("catalog2.txt", "w")
>>> data = data + "more data"
>>> outfile.write(data)
>>> outfile.close()
•  You may sometimes see the name “open” used instead of “file” to create file objects.
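The file() built-in exists only in Python 2, while open() works in both Python 2 and 3. The sketch below repeats the read-then-write sequence using open() and a with block, which closes the file automatically. The temporary directory and the sample contents are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Work in a temporary directory so the example leaves the current folder untouched.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "catalog.txt")
dst = os.path.join(workdir, "catalog2.txt")

# Create a small input file to read back.
with open(src, "w") as outfile:
    outfile.write("item one\n")

# `with` closes the file when the block ends, even if an error occurs inside it.
with open(src, "r") as infile:
    data = infile.read()

with open(dst, "w") as outfile:
    outfile.write(data + "more data")

with open(dst, "r") as infile:
    copied = infile.read()
print(copied)
```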
Basic Flow Control
•  if/elif/else (test condition)
•  while (loop until condition changes)
•  for (iterate over an iterable object)
if Statement
if j == "Hello":
    doSomething()
elif j == "World":
    doSomethingElse()
else:
    doTheRightThing()
while Statement
str = ""
while str != "quit":
    str = raw_input()
    print str
print "Done"
for Statement
myList = ["a", "b", "c", "d", "e"]
for i in myList:
    print i
for i in range(10):
    print i
for i in range(len(myList)):
    if myList[i] == "c":
        myList[i] = None
•  Can “break” out of for-loops.
•  Can “continue” to next iteration.
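The two bullets about break and continue can be sketched with the same list. The variable names here are illustrative, and print() is used so the code also runs under Python 3.

```python
my_list = ["a", "b", "c", "d", "e"]
seen = []
for item in my_list:
    if item == "b":
        continue      # skip "b" and go straight to the next item
    if item == "d":
        break         # leave the loop; "d" and "e" are never processed
    seen.append(item)
print(seen)
```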
Python Modules
What is a Module?
-  A file containing some Python code
OR
-  A .dll (.so on Unix) containing compiled code which follows some guidelines
-  A namespace
A Python Module
def hello_world():
    print "Hello world"
•  Save this as “myModule.py”. Now we can use it:
>>> import myModule
>>> myModule.hello_world()
•  Or:
>>> from myModule import hello_world
>>> hello_world()
Web Client Access - Example
>>> import urllib2
>>> url = 'http://research.engineering.wustl.edu/~todd/date.php'
>>> data = urllib2.urlopen(url)
>>> for line in data:
...     if 'Today' in line:
...         print line
...
<BR>Today is 02-24-2016
Other Built-in Protocols
•  FTP
•  XML-RPC
•  Telnet
•  POP
•  IMAP
•  MIME
•  NNTP
•  HTTP
•  SSL
•  Sockets
•  CGI
•  Gopher
•  URL Parsing
•  Plus downloadable modules for every other protocol in the universe!
Regular Expressions
Regular Expressions
•  Regular expressions are a powerful string manipulation tool
•  All modern languages have similar library packages for regular expressions
•  Use regular expressions to:
    –  Search a string (search and match)
    –  Replace parts of a string (sub)
    –  Break strings into smaller pieces (split)
Regular Expression Syntax
•  Most characters match themselves
    The regular expression “test” matches the string ‘test’, and only that string
•  [x] matches any one of a list of characters
    “[abc]” matches ‘a’, ‘b’, or ‘c’
•  [^x] matches any one character that is not included in x
    “[^abc]” matches any single character except ‘a’, ‘b’, or ‘c’
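These three rules can be tried out with Python’s re module. The sketch below is illustrative only: the sample strings are made up, and re.search and re.findall are used simply because they make the matched characters easy to see.

```python
import re

# A literal pattern matches exactly those characters, wherever they occur.
has_test = re.search("test", "a test string") is not None
print(has_test)

# [abc] matches one character taken from the set.
in_set = re.findall("[abc]", "cabbage")
print(in_set)

# [^abc] matches one character that is outside the set.
not_in_set = re.findall("[^abc]", "cabbage")
print(not_in_set)
```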
Regular Expression Syntax
•  “.” matches any single character (except a newline, by default)
•  Parentheses can be used for grouping
    “(abc)+” matches ‘abc’, ‘abcabc’, ‘abcabcabc’, etc.
•  x|y matches x or y
    “this|that” matches ‘this’ and ‘that’, but not ‘thisthat’.
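A short sketch of the dot, grouping, and alternation rules, using made-up sample strings. The trailing “$” is added here so that each pattern has to cover the whole string, which is what the slide’s “matches ... but not” wording implies.

```python
import re

# "." stands for any one character, but not a newline unless re.DOTALL is given.
dot = re.findall("c.t", "cat cot c\nt cut")
print(dot)

# (abc)+ repeats the whole group, not just the final letter.
grouped = re.match("(abc)+$", "abcabcabc")
print(grouped.group())

# this|that accepts either alternative, but not the two run together.
words = ["this", "that", "thisthat"]
either = [w for w in words if re.match("(this|that)$", w)]
print(either)
```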
Regular Expression Syntax
•  x* matches zero or more x’s
    “a*” matches ‘’, ‘a’, ‘aa’, etc.
•  x+ matches one or more x’s
    “a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
•  x? matches zero or one x’s
    “a?” matches ‘’ or ‘a’
•  x{m,n} matches i x’s, where m ≤ i ≤ n
    “a{2,3}” matches ‘aa’ or ‘aaa’
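The repetition operators can be compared side by side by testing each one against strings of zero to four a’s. This is a sketch under stated assumptions: the helper matches_whole is hypothetical and wraps the pattern so that it must cover the entire string, since a bare re.match would accept a matching prefix.

```python
import re

def matches_whole(pattern, text):
    """True when pattern matches all of text, not just a prefix."""
    return re.match("(?:" + pattern + ")$", text) is not None

candidates = ["", "a", "aa", "aaa", "aaaa"]
results = {}
for pattern in ["a*", "a+", "a?", "a{2,3}"]:
    # Keep only the candidate strings that the pattern matches completely.
    results[pattern] = [s for s in candidates if matches_whole(pattern, s)]
    print("%-7s %s" % (pattern, results[pattern]))
```

Note that “a{2,3}” accepts both two and three repetitions: the bounds are inclusive.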
Regular Expression Syntax
•  “\d” matches any digit;
    “\D” any non-digit
•  “\s” matches any whitespace character;
    “\S” any non-whitespace character
•  “\w” matches any alphanumeric character (or underscore);
    “\W” any character that is not one of those
•  “^” matches the beginning of the string;
    “$” the end of the string
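A brief sketch of the character-class shorthands and anchors, applied to a made-up sample string. The patterns are written as raw strings (r"...") so that Python passes the backslashes through to the re module unchanged.

```python
import re

text = "Room 330, due Feb 27"

digits = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)     # runs of letters, digits, or underscores
pieces = re.split(r"\s+", text)      # split wherever whitespace occurs

starts = re.search(r"^Room", text) is not None   # anchored at the beginning
ends = re.search(r"\d\d$", text) is not None     # anchored at the end

print(digits)
print(words)
print(pieces)
print(starts)
print(ends)
```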
Debuggex Example
Search and Match in Python RegEx
•  The two basic functions are re.search and re.match
    –  Search looks for a pattern anywhere in a string
    –  Match looks for a match starting at the beginning
•  Both return None (logical false) if the pattern isn’t found and a “match object” instance if it is
>>> import re
>>> pat = "a*b"
>>> re.search(pat, "fooaaabcde")
<_sre.SRE_Match object at 0x809c0>
>>> re.match(pat, "fooaaabcde")
>>>
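Because re.search and re.match return None when nothing is found, their result can be used directly as the condition of an if statement. A minimal sketch, assuming a hypothetical helper named classify:

```python
import re

def classify(pattern, text):
    """Report how (or whether) pattern occurs in text."""
    if re.match(pattern, text):
        return "matches at the start"
    if re.search(pattern, text):
        return "found later in the string"
    return "not found"

print(classify("a*b", "aabxyz"))
print(classify("a*b", "fooaaabcde"))
print(classify("a*b", "xyz"))
```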
What’s a match object?
•  An instance of the match class with the details of the match result
>>> r1 = re.search("a*b", "fooaaabcde")
>>> r1.group()    # group returns string matched
'aaab'
>>> r1.start()    # index of the match start
3
>>> r1.end()      # index of the match end
7
>>> r1.span()     # tuple of (start, end)
(3, 7)
What got matched?
•  Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "todd@arl.wustl.edu")
>>> r1.group()
'todd@arl.wustl.edu'
•  We might want to extract the pattern parts, like the email name and host
What got matched?
•  We can put parentheses around groups we want to be able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "todd@arl.wustl.edu")
>>> r2.group(1)
'todd'
>>> r2.group(2)
'arl.wustl.edu'
>>> r2.groups()
('todd', 'arl.wustl.edu', 'wustl.', 'edu')
•  Note that the ‘groups’ are numbered in a preorder traversal (by the position of each opening parenthesis, left to right)
What got matched?
•  We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "todd@arl.wustl.edu")
>>> r3.group('name')
'todd'
>>> r3.group('host')
'arl.wustl.edu'
•  And reference the matching parts by the labels
More re functions
•  re.split() is like split but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
•  re.sub substitutes one string for a pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'
•  re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']
Compiling regular expressions
•  If you plan to use a re pattern more than once, compile it to a re object
•  Python produces a special data structure that speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("todd@arl.wustl.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'todd@arl.wustl.edu'
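A compiled pattern pays off when the same expression is applied to many strings, for example once per line of input. The sketch below assumes a small, made-up list of lines. The addresses are placeholders and the variable names are illustrative.

```python
import re

# Hypothetical sample data; these addresses are invented for the example.
lines = ["alice@example.com", "not an address", "bob@cs.example.edu"]

# Compile once, outside the loop, then reuse the pattern object.
email = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

hosts = []
for line in lines:
    m = email.match(line)    # pattern objects have their own match/search methods
    if m:
        hosts.append(m.group("host"))
print(hosts)
```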
Module 4 Assignment