CSC1018F:
Regular Expressions
(Tutorial)
Dive Into Python Ch. 7
Number Systems
Election of Class Representative
Duties:
Represent the interests and concerns of the class
At regular departmental meetings
On an ad-hoc basis
This does not mean that individuals cannot approach the tutors, TA, course coordinator or lecturer
Mini-Test Answers (1)
The regular expression r"^(\d{3})\D*(\d{4})\D+$" will successfully match:
a. "1234567"
b. "123-4567abc"
c. "1234567x"
d. "12345678"
e. "/1234567/"
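The candidates above can be checked directly with Python's re module. This is a minimal sketch; the names pattern, candidates and results are illustrative and do not come from the slides.

```python
import re

# Pattern from the mini-test: 3 digits, optional non-digits,
# 4 digits, then at least one trailing non-digit.
pattern = re.compile(r"^(\d{3})\D*(\d{4})\D+$")

candidates = ["1234567", "123-4567abc", "1234567x", "12345678", "/1234567/"]

# Record, for each candidate, whether the pattern matches it.
results = {s: pattern.search(s) is not None for s in candidates}

for s, ok in results.items():
    print(s, "->", "match" if ok else "no match")
```

Only "123-4567abc" and "1234567x" match. The trailing \D+ demands at least one non-digit after the four digits, and the ^ anchor rules out the leading "/" in the last candidate.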
A regular expression which accepts names of the type "Mr Warren Worthington", prefixed by a designation (Mr, Mrs, Ms, Dr, Prof) with a single first name and a single surname, and which groups the components of the name, will have the form:
a. r"\bMr|Mrs|Ms|Dr|Prof \D+ \D+$"
b. "\b(Mr|Mrs|Ms|Dr|Prof)\D([A-Z][a-z])\D([A-Z][a-z])$"
c. r"^(M\D+|Dr|Prof) (\D+) (\D+|\D+-\D+)$"
d. r"^(Mr|Mrs|Ms|Dr|Prof) ([A-Z][a-z]*) ([A-Z][a-z]*)$"
(Where [a-z] expresses a range of possible single characters)
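To see how the grouping works, option d can be tried out as follows. This is a sketch that assumes the trailing anchor in the options is the end-of-string anchor $; the sample names and the variable names are hypothetical.

```python
import re

# Option d, assuming "$" as the end-of-string anchor.
name_pattern = re.compile(r"^(Mr|Mrs|Ms|Dr|Prof) ([A-Z][a-z]*) ([A-Z][a-z]*)$")

def split_name(full_name):
    """Return (designation, first name, surname), or None if there is no match."""
    match = name_pattern.search(full_name)
    return match.groups() if match else None

print(split_name("Mr Warren Worthington"))
print(split_name("Mrs Jane Smith"))
print(split_name("Mr warren Worthington"))   # lower-case first name
print(split_name("Sir Warren Worthington"))  # designation not in the list
```

The first two calls return the three captured groups; the last two return None because the first name must start with a capital letter and the designation must be one of the five listed.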
Mini-Test Answers (2)
The binary number 101010 has a hexadecimal representation of:
a. 2A
b. 222
c. A2
d. CA
The hexadecimal number BED has a binary representation of:
a. 011101111111
b. 101111101101
c. 111101110111
d. 111111011110
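Both conversions can be verified with Python's built-in int and format functions. This is a small sketch; the variable names are illustrative.

```python
# Binary -> hexadecimal: parse the digits in base 2, then format in base 16.
binary_value = int("101010", 2)
hex_digits = format(binary_value, "X")

# Hexadecimal -> binary: parse in base 16, then format in base 2,
# zero-padded to four bits per hexadecimal digit (3 digits -> 12 bits).
hex_value = int("BED", 16)
binary_digits = format(hex_value, "012b")

print(hex_digits)      # 2A
print(binary_digits)   # 101111101101
```

By hand, the same result follows from grouping bits in fours from the right: 10 1010 is 2 and A, while B, E and D expand to 1011, 1110 and 1101.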
Mini-Test Answers (3)
Construct a regular expression that will parse landline telephone numbers.
The regional code is a 0 followed by two or three digits, written in parentheses (e.g., “(021)”). Landline codes may have an optional international prefix of “+27”, in which case the zero prefix and parentheses fall away (e.g., “+2721”). The local dialling code can have five or six numeric digits.
r"^((\+27\d{2,3})|(\(0\d{2,3}\))) \d{6,7}$"
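The pattern above can be exercised against a few inputs. This is a sketch only: the sample numbers are made up for illustration, and the names phone_pattern, samples and results are not from the slides.

```python
import re

# Pattern exactly as given on the slide.
phone_pattern = re.compile(r"^((\+27\d{2,3})|(\(0\d{2,3}\))) \d{6,7}$")

samples = [
    "(021) 6501234",   # regional code in parentheses
    "+2721 6501234",   # international prefix, no zero, no parentheses
    "021 6501234",     # regional code without parentheses
    "(021) 65012",     # only five local digits
]

# Record, for each sample, whether the pattern matches it.
results = {s: phone_pattern.search(s) is not None for s in samples}

for s, ok in results.items():
    print(s, "->", "match" if ok else "no match")
```

The first two samples match and the last two do not. As written, the pattern requires a single space before the local number and accepts six or seven local digits, so a five-digit local number is rejected.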
Revision Exercise
Create a function which will take a date string in any one of the following formats:
dd/mm/yyyy or dd/mm/yy
Other separators (e.g., ‘\’, ‘ ’, ‘-’) are also allowed
Single-figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05
dd month yy or yyyy, where month may be written in full (December) or abbreviated (Dec. or Dec)
And return it in the format:
dd month (in full) yyyy, e.g. 13 March 2006
Implement this using regular expressions and also implement range checking on dates
Revision Solution (1)
def parseDate(dateStr):
    import re
    monthRange = (31, 28, 31, 30, 31, 30, 31, 31, 30, …
    monthName = ("January", "February", …
    monthAbbr = ["Jan", "Feb", "Mar", "Apr", "May", …
    # Parsing (whitespace and comments in the pattern are ignored under re.VERBOSE)
    datePattern = r"""^
        (\d{1,2})              # day (1 or 2 digits)
        \W+                    # any non-alphanumeric separator
        (\d{1,2} |             # month (1 or 2 digits)
         January | February | March | April | May | June | July |
         August | September | October | November | December |    # month in full
         Jan\.? | Feb\.? | Mar\.? | Apr\.? | May\.? | Jun\.? | Jul\.? |
         Aug\.? | Sep\.? | Oct\.? | Nov\.? | Dec\.?)              # abbreviated month, optional full stop
        \W+                    # any non-alphanumeric separator
        (\d{1,2} | \d{4})      # year (1, 2 or 4 digits)
        $
        """
Revision Solution (2)
    resDate = re.search(datePattern, dateStr, re.VERBOSE)
    if resDate is None:
        return None
    else:
        # find month entry
        day, month, year = resDate.groups()
        try:
            monthInd = int(month) - 1
        except ValueError:
            # match on first 3 chars of month
            monthInd = monthAbbr.index(month[:3])
        # range-check the month before using it as an index
        if monthInd < 0 or monthInd > 11:
            return None
        if int(day) < 1 or int(day) > monthRange[monthInd]:
            return None
        else:
            if len(year) < 4:  # fix 1- or 2-digit year if needed
                year = year.zfill(2)
                year = ("19" if int(year) > 50 else "20") + year
            return str(int(day)) + " " + monthName[monthInd] + " " + year
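For comparison, here is a self-contained sketch of the same exercise that swaps the hand-written month tables for the standard calendar module, which also takes leap years into account. It is not the solution from the slides: the name parse_date_sketch is hypothetical, the month matching is deliberately looser (any alphabetic word, checked afterwards), and it assumes the default English month names in calendar.

```python
import calendar
import re

DATE_PATTERN = re.compile(r"""^
    (\d{1,2})                  # day (1 or 2 digits)
    \W+                        # any non-alphanumeric separator
    (\d{1,2} | [A-Za-z]+\.?)   # numeric month, or a month word with optional full stop
    \W+                        # any non-alphanumeric separator
    (\d{1,2} | \d{4})          # year (1, 2 or 4 digits)
    $""", re.VERBOSE)

def parse_date_sketch(date_str):
    """Return 'd Month yyyy', or None if date_str is not a valid date."""
    match = DATE_PATTERN.search(date_str)
    if match is None:
        return None
    day, month, year = match.groups()

    if month.isdigit():
        month_num = int(month)
    else:
        # Accept a full month name or its three-letter abbreviation.
        key = month.rstrip(".").capitalize()
        names = list(calendar.month_name)   # index 0 is an empty string
        abbrs = list(calendar.month_abbr)
        if key in names[1:]:
            month_num = names.index(key)
        elif key in abbrs[1:]:
            month_num = abbrs.index(key)
        else:
            return None
    if not 1 <= month_num <= 12:
        return None

    # Same century rule as the slides: above 50 means 19xx, otherwise 20xx.
    if len(year) < 4:
        year = year.zfill(2)
        year = ("19" if int(year) > 50 else "20") + year
    year_num = int(year)
    if year_num < 1:
        return None

    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(year_num, month_num)[1]
    if not 1 <= int(day) <= days_in_month:
        return None
    return f"{int(day)} {calendar.month_name[month_num]} {year_num}"

print(parse_date_sketch("13/03/2006"))
print(parse_date_sketch("3/4/5"))
print(parse_date_sketch("13 Dec. 99"))
print(parse_date_sketch("29-02-2005"))
```

The first three calls return "13 March 2006", "3 April 2005" and "13 December 1999"; the last returns None because 2005 is not a leap year.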