Python with COM
Get at your Office Data
10-Nov-98
Python with COM ~ Christian Tismer
1
Contents
1. Introduction
2. Using COM with Python
3. Accessing Excel Data
4. Representing data tables in Python
5. Reading data from Access databases
6. Reading Word tables
7. Processing of data in Python
8. Creating results
A. Supplemental
1 Introduction
Foreword
Prerequisites
Short overview on data management
whetting your appetite: some online examples
DM problems handled in this session
other DM problems one should know of
Specific goals of this session
collect data from different sources, convert them into a
suitable structure, modify them and put them back into
some other form
1.1 Foreword
Some words ... (1 of 2)
The following tutorial is on Data Management on the Windows platform. The
main target is interaction with Office Objects like Word tables, Excel sheets
and Access tables and queries. Therefore, the Win32 COM interface plays a
central role, which led to the inclusion of COM in the tutorial's title.
Nevertheless, my primary intent is to give you support, practice and a set of
tools to make your data processing tasks more effective. You will get working
examples on how to explore data, how to convert it, find a better shape, insert
it into your result set, optimize for speed, make your data fly.
You will not get complete applications to handle specific tasks. Instead, you
get working code pieces and modules from production code, together with
hints, tips, work-arounds, tricks, whatever can be squeezed out of a single
person in 3 1/2 hours.
The majority of materials has been prepared on transparencies. I will probably
publish them as a preview on the Internet, in order to let you prepare specific
questions in advance, which are not covered yet. Hints by email are welcome.
1.1 Foreword
... before we begin. (2 of 2)
The whole course material will be handed out to you on a
CD-ROM. This includes the current Python distribution,
all source code, also modules which are not in the public
domain and contain a copyright statement.
Attendees of this tutorial gain the right to use our modules
for any purpose at home or in their company in order to
increase their productivity.
Excluded is the right of publishing, selling or distributing original or
modified versions of our software, or bundling it as part of a product.
Companies which wish to do so need to contact Professional Net
Service about conditions.
1.2 Prerequisites
In order to make use of the materials which are
handed out to you in the tutorial, you need the
following equipment:
PC workstation running Windows 95, 98, NT 4.0 or up
At least 32 MB of main memory
CD-ROM drive
Python 1.5.1
Win32all-122.exe
Office Professional 97 or 98 (including Access)
1.3 On Data Management
Data management is quite a large field, and its complete
scope is not well defined. An attempt to give an idea:
A Data Management department is usually responsible for
data entry, data quality, data conversion, data verification, data access
security, data storage security, preparation of raw data listings and basic
statistics, and data and report archival.
This can be extended / simplified to “everything necessary, until a
statistician’s work can begin”.
Data Managers are more and more confronted with newly evolving data
formats, multiple data systems being used in parallel, and much increased
demands on the output quality. Providing simple text files as a report is in
most cases insufficient. Modern Office tools have set new standards for
what can be expected and force the Data Manager not only to produce
data, but also to present it in an appealing form.
You can save a lot of time for all of the above using
Python with COM
1.4 DM problems
DM tasks tackled in this session:
Data conversion
Data transport across applications
Finding a model for given data
Exploring an unknown set of data
Techniques for import and export
Common data formats
1.5 Other DM problems
DM tasks one should have heard about:
Verification of transport by closing the loop
Data entry and comparison
Generating raw data listings
Generating full reports with basic statistics
Design and management of databases
Data archival and retrieval
Coping with customer defined structures :-(
1.6 Specific goals of this session
Procedure | Realisation
collect data from different sources | Access tables, Excel tables, Word tables, text files (fixed, SDF)
convert them into a suitable structure | Python table, dataset, temporary Access tables
modify them and put them back into some other form | write data to Access tables, Word files, produce SDF files
2 Using COM with Python
How to access a COM object
How to create and read an interface
2.1 Accessing a COM object
Create an interface
identify which interface to use
generate the Python interface
figure out how to use it
try to create an object
>>> import win32com.client
>>> e=win32com.client.Dispatch("DAO.DBEngine")
>>> db=e.OpenDatabase("h:\\ab\\vergleich.mdb")
>>>
2.2 Reading an Interface
Learn about the interface
Buy books on the remote application
Use the application’s online help
Read the generated Python code
Try the interface from your Python shell
>>> w=win32com.client.Dispatch("Word.Application")
>>> w.Visible=1
>>> doc=w.Documents.Add()
>>> doc.Range()
<win32com.gen_py.Microsoft Word 8.0 Object Library.Range>
>>> doc.Range().Text="Hi SPAM7"
>>>
3 Accessing Excel Data
Getting an Excel sheet into a Python table
working with ranges and attributes
Where to get help on my objects?
How do I get the data the way it looks on screen?
The hard way using COM
the hard way using delimited (SDF) files
3.1 From Excel into Python
Getting an Excel sheet into a Python table
• make sure to get accustomed to ranges
• be careful with strings: they come as Unicode when reading multiple cells
>>> xl=win32com.client.Dispatch("Excel.Application")
>>> xl.Visible=-1
>>> wb=xl.Workbooks(1)
>>> sh=wb.Worksheets(1)
>>> for row in sh.UsedRange.Value: print row
(L'Name', L'Age', L'Language', L'Salary')
(L'Gates', 43.0, L'Visual Basic', L'dooh')
(L'Tismer', 42.0, L'Python', L':-(')
(L'Rossum', L'dunno, >42?', L'Python', L'SPAM')
>>>
3.2 Ranges and Attributes
Many properties are themselves ranges
>>> sh.UsedRange.Rows(1).Value
((L'Name', L'Age', L'Language', L'Salary'),)
>>> sh.UsedRange.Columns(1).Value
((L'Name',), (L'Gates',), (L'Tismer',), (L'Rossum',))
>>> sh.UsedRange.Columns(1)
<win32com.gen_py.Microsoft Excel 8.0 Object Library.Range>
Other properties are attributes
>>> r=sh.UsedRange.Columns(1)
>>> r.Font
<win32com.gen_py.Microsoft Excel 8.0 Object Library.Font>
>>> r.Font.Size
10.0
>>> r.Font.Size=20
>>>
3.3 Where to get help on my objects?
3.4.1 WYSIWYG data Part I
The hard way using COM
• Value property gives the true internal value
• Text property gives the current text representation
>>> r.Cells(2,2).NumberFormat="0.000"
>>> r.Cells(2,2).Text
'43.000'
>>> r.Cells(2,2).Value
43.0
>>>
You have to cycle through all the single cells to
get at the formatted text
Works, but is very slow
3.4.2 WYSIWYG data Part II
The hard way using SDF - very fast!
• Excel exports WYSIWYG - You parse the output
import string
import buffalo   # the author's helper module, supplies findlinedelimiter

def split_delimited(s) :
    """split a delimited text file string into a list of tuples.
    Quotes are obeyed, enclosed newlines are expanded to tab,
    double quotes go to quotes"""
    # the trick to make this handy is to use \0 as a placeholder
    eol = buffalo.findlinedelimiter(s[:10000]) # guessing function
    parts = string.split(string.replace(s, "\t", "\0"), '"')
    limits = (0, len(parts)-1)
    for i in range(len(parts)) :
        part = parts[i]
        # odd-numbered parts are the quoted ones
        if i%2 : part = string.replace(part, eol, "\t")
        else :
            if not part and i not in limits: part = '"'
        parts[i] = part
    # merge it back
    txt = string.join(parts, "")
    parts = string.split(txt, eol)
    # now break by \0
    for i in range(len(parts)) :
        fields = string.split(parts[i], "\0")
        parts[i] = tuple(fields)
    return parts
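For comparison, here is a minimal sketch of the same job in present-day Python 3, assuming the standard csv module is an acceptable stand-in for the hand-written splitter. The function name parse_tab_delimited and the sample text are made up for illustration. Unlike split_delimited, this version keeps newlines inside quoted fields instead of turning them into tabs.

```python
import csv
import io

def parse_tab_delimited(text):
    """Split tab-delimited text into a list of tuples.

    Quoted fields may contain tabs and newlines; doubled quotes
    inside a quoted field collapse to a single quote.
    """
    # newline="" hands line endings to the csv module untranslated
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter="\t", quotechar='"')
    return [tuple(row) for row in reader]

# made-up sample: a doubled quote and a line break inside quoted fields
sample = 'Name\tAge\r\n"Gates, ""Bill"""\t43\r\n"two\r\nlines"\t42\r\n'
rows = parse_tab_delimited(sample)
for row in rows:
    print(row)
```

All values come back as strings, just as with the original function, so any type guessing still has to happen afterwards.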
4 Representing data tables in Python
simple tables as with the Excel examples
table wrapper class with named columns
>>> tab
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
>>> import dataset
>>> ds = dataset.DataSet(["field1", "field2", "field3"], tab)
>>> ds
DataSet with 3 rows and 3 columns
>>> ds.getFieldNames()
['field1', 'field2', 'field3']
>>> ds[-1]
{'field1': 7, 'field3': 9, 'field2': 8}
>>>
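To make the idea of a table wrapper with named columns concrete, here is a minimal sketch in Python 3. It is not the dataset module itself: the class MiniDataSet and its method get_column are hypothetical names, and only the indexing behaviour seen in the sessions (a record as a dictionary, item() as a tuple) is imitated.

```python
class MiniDataSet:
    """Tiny table wrapper: column names plus a list of row tuples."""

    def __init__(self, field_names, rows):
        self.field_names = list(field_names)
        self.rows = [tuple(r) for r in rows]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # indexing returns one record as a dict keyed by column name
        return dict(zip(self.field_names, self.rows[index]))

    def item(self, index):
        # item() returns the raw tuple
        return self.rows[index]

    def get_column(self, name):
        pos = self.field_names.index(name)
        return [row[pos] for row in self.rows]

    def __repr__(self):
        return "MiniDataSet with %d rows and %d columns" % (
            len(self.rows), len(self.field_names))

tab = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
ds = MiniDataSet(["field1", "field2", "field3"], tab)
print(ds)
print(ds[-1])
print(ds.get_column("field2"))
```

Keeping the rows as plain tuples means the simple list-of-tuples tables from the Excel examples can be wrapped without copying each cell.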
4.1 Some DataSet methods
ambiguous
append
appendColumns
appendConstantColumn
crossTabulate
deTabulate
display
displayColumn
expand
filterByCategory
filterByColumn
filterByValueList
flatten
fold
folded
getColumn
getColumnNames, getFieldNames
getTuples
getUniqueColumnValues
guessColumnTypes
hasColumn
insert
item
join
notinlist
reduce
reduced
remove
renameColumn, renameColumns
selectColumns
sortOnColumn, sortOnColumns
splitByColumnValues
substituteInColumn
transformByColumn
union
unique
These methods are described in the dataset module.
4.2 A little DataSet browser
>>> import PyTabs, axsaxs
>>> db = axsaxs.Accessor("h:/verwaltung/AB/Adressen/Adreßbuch.mdb")
>>> ds = db.getDataSet("adressen")
opening adressen
reading records...
258
>>> x=PyTabs.viewDS(ds)
This handy little tool is itself a COM server which I wrote with Delphi in an afternoon
5 Reading data from Access databases
reading a table
inspecting table and field properties
creating queries dynamically
5.1 Reading an Access table
Using the native COM interface
>>> import win32com.client
>>> e=win32com.client.Dispatch("DAO.DBEngine.35")
>>> db=e.OpenDatabase("h:/ab/vergleich.mdb")
>>> rs = db.OpenRecordset("-finde-")
>>> f = rs.Fields
>>> names = map(lambda fld: fld.Name, f)
>>> names
['Typ_3', 'Nummer', 'Finde', 'Bemerkung']
>>> rs.MoveFirst()
>>> while not rs.EOF:
...     values = map(lambda fld:fld.Value, f)
...     print values
...     rs.MoveNext() # never forget this one!
...
['ABNORM', 0, 'Normal', None]
['ABNORM', 1, 'Abnormal', None]
['ACTTYP', 1, 'Mild', None]
>>>
Note: the engine name is usually written as "DAO.DBEngine". Some machines seem
to require ".35". I believe this happens when no Office 95 was installed before,
but I’m not sure.
5.1 Reading an Access table
Using the axsaxs / dataset interface
>>> import axsaxs, dataset
>>> db = axsaxs.Accessor("h:/ab/vergleich.mdb")
>>> ds = db.getDataSet("-finde-")
opening -finde-
reading records...
241
>>> ds.getFieldNames()
['Typ_3', 'Nummer', 'Finde', 'Bemerkung']
>>> ds[0]
{'Bemerkung': None, 'Finde': 'Normal', 'Nummer': 0, 'Typ_3': 'ABNORM'}
>>> ds.item(0)
('ABNORM', 0, 'Normal', None)
>>>
A dataset is a wrapper class around tabular data in
Python. Axsaxs is a wrapper around DAO databases.
5.2 Accessing properties
Access TableDefs
>>> import axsaxs
>>> db=axsaxs.Accessor(r"h:\ab\vergleich.mdb")
>>> d=db.daoDB
>>> for td in d.TableDefs: print td.Name
-Finde-
MedList
AE_TAB
(...)
Field properties
>>> rs = d.OpenRecordset("AE_TAB")
>>> f=rs.Fields[0]
>>> f.Properties.Count
25
>>> for p in f.Properties:
...     print p.Name
...
Value
Attributes
CollatingOrder
Type
Name
OrdinalPosition
Size
SourceField
SourceTable
ValidateOnSet
DataUpdatable
ForeignName
DefaultValue
ValidationRule
ValidationText
Required
AllowZeroLength
FieldSize
OriginalValue
VisibleValue
ColumnHidden
ColumnWidth
ColumnOrder
DecimalPlaces
DisplayControl
>>>
Some can be changed by assignment
6.1 Reading Word tables
using COM (online with Word)
>>> import win32com.client
>>> w=win32com.client.Dispatch("word.application")
>>> w.Visible=1
>>> doc=w.Documents.Add("d:\\tmp\\d.html")
>>> doc.Tables.Count
1
>>> tbl = []
>>> for row in range(1, 1+len(doc.Tables(1).Rows)):
...     line = []
...     for col in range(1, 1+len(doc.Tables(1).Columns)):
...         try:
...             line.append(doc.Tables(1).Cell(row, col).Range.Text)
...         except:pass # exception for joined cells
...     tbl.append(line)
...
>>> len(tbl)
11
>>>
This works with HTML, too!
6.2 Reading Word tables
using Rich Text files (offline, RTF parser)
# simple class to get the text from RTF.
# Especially to read tables in and get their values.
import string, sys
sys.path.insert(0,"c:\\ab\\python")
import rtfpars

class rtftext(rtfpars.rtfstream) :
    def __init__(self, fname) :
        rtfpars.rtfstream.__init__(self, fname)
        self.level = 0

    def gettok(self) :
        code, val = self.gettoken()
        if code < 2 :
            self.level = self.level + code
        return code, val

    def skipuntil(self, target) :
        while 1 :
            code, val = self.gettok()
            if code == 0 :
                if not val or val in target :
                    return val

    def skiphead(self) :
        self.skipuntil(["pard"])

    def readuntil(self, target) :
        res = []
        while 1 :
            tup = self.gettok()
            res.append(tup)
            code, val = tup
            if code == 0 :
                if not val or val in target: return res

    def getthing(self) :
        # a thing is a simple line or a table row.
        if self.level==0 : self.skiphead()
        line = self.readuntil(["par", "sect", "cell", "row"])
        if (0, "intbl") not in line: return line
        buf = line
        if (0, "row") not in buf:
            buf = buf + self.readuntil(["row"])
        cells = splitlist(buf, (0, "cell"))
        rest = cells[-1]
        del cells[-1]
        tok = rest[-1]
        del rest[-1]
        row = []
        for cell in cells:
            row.append(splitlist(cell, (0, "par")))
        row.append(rest)
        row.append(tok)
        return row
6.2 Reading Word tables (cont.)
Further examination is very data / problem specific
# helpers
# later, we will have an own paragraph structure
# uhhm, bad without one. hack...
def gettext(para) :
    ret = []
    for code, val in para:
        if code==2 :
            ret.append(val)
        elif val == "tab" :
            ret.append("\t")
    return string.join(ret, "")

def splitlist(lis, elem) :
    # splits list, but keeps the elem found at the end.
    res = []
    while 1 :
        try :
            pos = lis.index(elem)+1
            res.append(lis[:pos])
            lis[:pos] = []
        except ValueError:
            res.append(lis)
            return res

# the little app: get all data from tables
def main(fname = "c:\\ab\\brivudin\\513\\urin\\RE14-97E.rtf") :
    global tables
    tables = []
    rtf = rtftext(fname)
    intbl = 0
    while not rtf.eof:
        row = rtf.getthing()
        if (0, "row") not in row :
            intbl = 0
            pass # print gettext(row)
            continue
        if not intbl :
            tables.append([])
            intbl = 1
        textrow = []
        for cell in row[:-2] :
            celltext = map(gettext, cell)
            textrow.append(string.join(celltext, "\n"))
        tables[-1].append(textrow)
7 Processing of data in Python
reorganizing tables
exploring of data
what is the contents
what is the best datatype for this column?
Grouping operations
data normalization
7.1 Processing of data
reorganizing tables (1 of 3)
A common task: de-tabulate data from many columns into a long one.
Hard for Access or SQL, this is a cakewalk with a dataset.
This is the raw data, prepared a little as an Access query. The data
columns must be reorganized into one column.
Example taken from a huge pharmaceutical project:
Brivudin Oral had 26 large Access databases with
different structures. They were all harmonized and
merged into one big
Summary database.
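To illustrate what de-tabulating means, here is a minimal Python 3 sketch on plain lists of tuples. It assumes that the first n_keys columns identify a record and that every remaining column becomes its own row of (keys, column name, value). The function name detabulate, the output column names "Variable" and "Value", and the sample numbers are made up; the exact output layout of the dataset module's deTabulate may differ.

```python
def detabulate(field_names, rows, n_keys):
    """Turn wide rows into long rows: keys..., variable name, value."""
    key_names = list(field_names[:n_keys])
    value_names = list(field_names[n_keys:])
    long_names = key_names + ["Variable", "Value"]
    long_rows = []
    for row in rows:
        keys = tuple(row[:n_keys])
        # one output row per data column of the input row
        for name, value in zip(value_names, row[n_keys:]):
            long_rows.append(keys + (name, value))
    return long_names, long_rows

# made-up sample: two key columns, two measurement columns
names = ["Subject", "Visit", "PQ", "QRS"]
wide = [(1, 1, 160, 90), (1, 2, 158, 92)]
long_names, long_rows = detabulate(names, wide, 2)
print(long_names)
for row in long_rows:
    print(row)
```

Each wide row with k data columns yields k long rows, which is why this is awkward in plain SQL (one UNION branch per column) and short in Python.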
7.1 Processing of data
reorganizing tables (2 of 3)
Here is an excerpt from the transformation code.
def moveECG(limit=None):
    SRCT="ECG"
    # does all the blocks of data, just ECG for now
    #VarID,SubjectID,VisitNo,TimeStamp,Val,RefRangeID
    print 'loading ecg data, wait a minute...'
    ds1 = Halle.getDataSet('_exportECG',limit)
    ds1.display()
    ds2 = ds1.selectColumns(['Subject','Visit','P', 'PQ', 'QRS','QT','HR'])
    ds3 = ds2.deTabulate(2)
    ds4 = ds3.transformByColumn('Subject',globalID)
    ds5 = ds4.renameColumn('Subject','SubjectID')
    ds6 = ds5.renameColumn('Visit','VisitNo')
    ds6.display()
    Summary.addMeasurements(ds6, SRCT)
Here we go reshaping our data
7.1 Processing of data
reorganizing tables (3 of 3)
And a look at the resulting table...
7.2 Exploring of data
What is the contents?
>>> ds = db.getDataSet("ae_tab")
opening ae_tab 1
reading records...
217
>>> ds.getFieldNames()
['Subject', 'Page', 'Row', 'OCCNO', 'AE', 'HARTS', 'SEVERITY',
'STRTDT', 'STOPDT', 'PATTERN', 'RELSHIP', 'NOACT', 'SMDINCR',
'SMDRED', 'SMDINTR', 'SMDDISC', 'NDTHER', 'CONALT', 'CONADD',
'HOSPITL', 'OUTC', 'SAE', 'OK']
>>> ds.getUniqueColumnValues("SEVERITY")
[None, 1, 2, 3]
>>>
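A first look at a column often starts with its distinct values. The following Python 3 sketch shows one way to compute them on plain row tuples. The function name unique_column_values and the sample rows are hypothetical, and putting None first is an assumption made so that the result resembles the output above (Python 3 cannot sort None against numbers directly).

```python
def unique_column_values(rows, position):
    """Return the distinct values of one column, None first, then sorted."""
    seen = set(row[position] for row in rows)
    has_none = None in seen
    values = sorted(v for v in seen if v is not None)
    return ([None] if has_none else []) + values

# made-up sample: (subject, severity)
rows = [("s1", 2), ("s2", None), ("s3", 1), ("s4", 3), ("s5", 1)]
severities = unique_column_values(rows, 1)
print(severities)
```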
7.2 Exploring of data
What is the best datatype for this column?
def guessColumnType(data) :
    # should go into Dataset perhaps.
    # This can be done much more sophisticated.
    # for now, we do simple heuristics.
    data = filter(None, data)
    # an empty column must be caught first; otherwise it would
    # pass the date test below and be reported as "Date"
    if not data : return "VarChar(60)"
    if len(filter(isDateTime, data)) == len(data) :
        return "Date"
    data = map(str, data)
    needed = max(map(len, data))
    # try to reduce unnecessary floats
    data = map(lambda s:s[-2:]==".0" and s[:-2] or s, data)
    try :
        # now try to convert
        data = map(string.atoi, data)
        maxval = max(data)
        minval = min(data)
        maxval = max(maxval, abs(minval))
        if maxval <= 65535:
            return "Integer"
        return "Long"
    except ValueError: pass
    try :
        data = map(string.atof, data)
        return "Double"
    except ValueError: pass
    if needed > 255:
        return "Memo"
    elif needed > 60:
        return "VarChar(255)"
    return "VarChar(60)"
This function didn’t make it into dataset yet, since it is quite database dependent.
7.3 Processing of data
Grouping operations
dataset.reduce(columnlist) squeezes all repeated records into one
and turns the values in columnlist into lists. Dataset.expand does
the inverse. This provides easy processing of groups of records.
>>> ds = db.getDataSet("select subject, page, ae, severity from [ae_tab 2]")
opening select subject, page, ae, severity from [ae_tab 2]
reading records...
217
>>> ds
DataSet with 217 rows and 4 columns
>>> ds2=ds.reduce(["page", "ae", "severity"])
>>> ds2.display(colwidth=20, maxrows=5)
'subject'  ['page']              ['ae']                ['severity']
1          [103, 132, 132, 240,  ['Diastolic pressure  [1, 1, 1, 1,
7          [174, 174, 191, 191,  ['Impatiences', 'Dro  [1, 1, 1, 1,
8          [174, 191, 213, 305,  ['Palpitations', 'Pa  [1, 1, 1, 2,
9          [132, 152, 174, 174,  ['Hypersalivation',   [2, 2, 2, 2,
10         [132]                 ['Decrease of diasto  [1]
>>>
>>> ds2
DataSet with 45 rows and 4 columns
>>> ds2.expand()
DataSet with 217 rows and 4 columns
>>>
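The grouping idea can be sketched on plain row tuples in Python 3. This is an illustration under assumptions, not the dataset module's code: reduce_rows and expand_rows are hypothetical names, the key columns are placed first in the result, and the sample data is made up.

```python
def reduce_rows(field_names, rows, list_columns):
    """Group rows by the columns not in list_columns; collect the rest into lists."""
    list_pos = [field_names.index(c) for c in list_columns]
    key_pos = [i for i in range(len(field_names)) if i not in list_pos]
    groups = {}  # key tuple -> one list per reduced column (insertion ordered)
    for row in rows:
        key = tuple(row[i] for i in key_pos)
        lists = groups.setdefault(key, [[] for _ in list_pos])
        for collected, i in zip(lists, list_pos):
            collected.append(row[i])
    names = [field_names[i] for i in key_pos] + list(list_columns)
    reduced = [key + tuple(lists) for key, lists in groups.items()]
    return names, reduced

def expand_rows(rows, n_keys):
    """Inverse of reduce_rows: one output row per list element."""
    expanded = []
    for row in rows:
        key, lists = row[:n_keys], row[n_keys:]
        for values in zip(*lists):
            expanded.append(tuple(key) + values)
    return expanded

# made-up sample: (subject, page, severity)
names = ["subject", "page", "severity"]
data = [(1, 103, 1), (1, 132, 1), (7, 174, 2)]
red_names, reduced = reduce_rows(names, data, ["page", "severity"])
restored = expand_rows(reduced, 1)
print(reduced)
print(restored)
```

Because the lists of one group stay aligned by position, expanding them with zip restores the original records.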
7.4 Processing of data
data normalization (1 of 2)
with a few of the grouping operations,
redundancy in tables can be analyzed.
• Group the columns with the contents by reduce
• insert a unique key column
• select master and detail datasets
• expand the detail dataset
7.4 Processing of data
data normalization (2 of 2)
>>> ds=dataset.DataSet(["nr", "nr2", "data1", "data2"], [])
>>> import whrandom
>>> for nr in range(1,5):
...     nr2 = nr*2
...     for k in range(whrandom.randint(1, 8)):
...         data1 = whrandom.randint(1, 1000)
...         data2 = whrandom.randint(1, 2000)
...         ds.append((nr, nr2, data1, data2))
...
>>> ds2 = ds.reduce(["data1", "data2"])
>>> ds3 = ds2.appendColumns(ds2.recordRange("key"))
>>> dsmaster = ds3.selectColumns(ds3.notinlist(ds3.reduced()))
>>> dsdetail = ds3.selectColumns(["key"]+ds3.reduced()).expand()
>>> dsdetail.display()
'key'  'data1'  'data2'
0      883      732
0      224      1853
0      889      1170
0      871      1581
1      763      453
1      867      1881
1      870      566
1      646      1509
1      645      612
2      150      1042
>>> dsmaster.display()
'nr'  'nr2'  'key'
1     2      0
2     4      1
3     6      2
4     8      3
>>>
8 Creating results
creating result tables in Access
creating result tables in Word
logging events in an Access table
fast writing mode for Access
producing cross tables beyond Access’ capabilities
formatting output in Word
8.1 Creating results
creating result tables in Access (1 of 2)
def makeTableSQL(name, ds):
    fields = []
    for cn in ds.getFieldNames():
        fields.append( "[%s] %s" % (cn, guessColumnType(ds.getColumn(cn))) )
    return "create table [%s] (%s)" % (name, string.join(fields, ", "))
>>> makeTableSQL("master", dsmaster)
'create table [master] ([nr] Integer, [nr2] Integer, [key] Integer)'
>>> from brivtools import makeTableSQL
>>> makeTableSQL("master", dsmaster)
'create table [master] ([nr] Integer, [nr2] Integer, [key] Integer)'
>>> db.execSQL(makeTableSQL("master", dsmaster))
>>> db.execSQL(makeTableSQL("detail", dsdetail))
>>> db.insertDataSet("master", dsmaster)
inserting data
4
>>> db.insertDataSet("detail", dsdetail)
inserting data
21
>>>
8.1 Creating results
creating result tables in Access (2 of 2)
8.2 Creating results
creating result tables in Word (1 of 2)
c = win32com.client.constants

def appendtable(rows, columns) :
    myrange = doc.Range()
    myrange.Collapse(c.wdCollapseEnd)
    # check whether we are inside a table.
    # if so, append a paragraph
    if myrange.Tables.Count:
        myrange.Paragraphs.Add()
        myrange.Collapse(c.wdCollapseEnd)
    tbl = myrange.Tables.Add(myrange, rows, columns)
    return tbl

This is the straightforward way to create a Word table:
add a table with rows and columns, and fill them cell by cell.
Meanwhile you can brew coffee, or have a meal...

def dstotableslow(ds) :
    nrows = len(ds)+1
    ncols = len(ds.getColumnNames())
    tbl = appendtable(nrows, ncols)
    header = ds.getFieldNames()
    cell = tbl.Cell
    for col in range(ncols):
        cell(1, col+1).Range.Text = str(header[col])
    content = ds.getTuples()
    for row in range(len(content)):
        for col in range(ncols):
            cell(row+2, col+1).Range.Text = str(content[row][col])
    return tbl
8.2 Creating results
creating result tables in Word (2 of 2)
def dstostring(ds):
    nrows = len(ds)+1
    ncols = len(ds.getColumnNames())
    header = ds.getFieldNames()
    content = ds.getTuples()
    lis = [string.join(map(str, header), "\t")]
    for line in content:
        lis.append(string.join(map(str, line), "\t"))
    lis.append("")
    return string.join(lis, "\n")

But in most cases, your data will most probably not contain TAB characters.
This leads to a very fast solution which converts megabytes of table data
into Word in a few seconds.

def dstotable(ds):
    nrows = len(ds)+1
    ncols = len(ds.getColumnNames())
    blob = dstostring(ds)
    if string.count(blob, "\n") != nrows or \
       string.count(blob, "\t") != (ncols-1) * nrows:
        return dstotableslow(ds)
    # no specials, we can use the fast one.
    doc.Range().Paragraphs.Add()
    myrange = doc.Range().Paragraphs.Add().Range
    myrange.Text = blob
    c = win32com.client.constants
    myrange.ConvertToTable(Separator=c.wdSeparateByTabs, NumColumns=ncols,
                           NumRows=nrows, Format=c.wdTableFormatNone)
    return myrange.Tables(1)

For the unlikely cases, we fall back to the slower method.
8.3 Creating results
logging events in an Access table
A simple example where status records are written into existing Access records
def msg_to_subjectvisits(subject, visit, msg) :
    ddb = db.daoDB
    rs = ddb.OpenRecordset('''select * from SubjectVisits
        where Pat_No = %d and Visit = %d''' %(subject, visit))
    rs.Edit()
    rs.Fields("PythonResult").Value = msg
    rs.Update()
    rs.Close()
8.4 Creating results
fast writing mode for Access
Reading an Access table with axsaxs is very fast since it is done in
larger blocks.
Writing is much more expensive since it always must happen
record by record.
Early versions of Python’s COM interface were slow at attribute
access, and an accelerator module gave a speed-up of about a factor of 3.8.
The idea is to pick pre-bound functions which are applied later.
The speed-up is meanwhile down to a factor of 1.17, but still worthwhile.
>>> import COMutil, speedCOM
>>> speedCOM.Install(COMutil.findModule("DAO.DBEngine.35"))
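The general idea of pre-bound functions can be shown in plain Python 3. This sketch is not speedCOM's implementation, which is not shown here; the Record class is a hypothetical stand-in for an object whose attribute lookup is costly, and both function names are made up.

```python
class Record:
    """Stand-in for an object with costly attribute lookup (hypothetical)."""

    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

def write_all_slow(record, items):
    for item in items:
        record.add(item)      # looks up 'add' on every iteration

def write_all_prebound(record, items):
    add = record.add          # look up and bind the method once
    for item in items:
        add(item)             # apply the pre-bound function

slow, fast = Record(), Record()
write_all_slow(slow, range(5))
write_all_prebound(fast, range(5))
print(slow.values == fast.values)
```

Both loops produce the same result; the second one simply pays for the attribute lookup once instead of once per record, which matters most when each lookup crosses a slow boundary.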
8.5 Creating results
producing cross tables beyond Access’ capabilities
Access can do crosstabs only on single fields. The dataset module
can collapse multiple fields into a tuple, rotate that and unpack it
again - giving multiple crosstabs.
• Group name and value fields together with selectColumns
• Fold multiple name fields into one
  >>> ds = ds.fold(firstnamecol, num_of_cols)
• Fold the same number of value fields into one
  >>> ds = ds.fold(firstvaluecol, num_of_cols)
• Do the crosstabulation
  >>> dsx = ds.crossTabulate(namefield, valuefield)
• Flatten the dataset, that’s it.
  >>> dsx = dsx.flatten()
• You still have to work on the column names a little
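The crosstabulation step itself can be sketched in Python 3 on plain row tuples. This is a hedged illustration, not the dataset module's crossTabulate: the function cross_tabulate, its positional parameters and the sample data are assumptions. Folded fields would simply appear here as tuples in the name and value positions.

```python
def cross_tabulate(rows, key_pos, name_pos, value_pos):
    """Pivot long rows: one output row per key, one column per distinct name."""
    columns = []   # distinct names in order of first appearance
    table = {}     # key -> {name: value}
    for row in rows:
        name = row[name_pos]
        if name not in columns:
            columns.append(name)
        table.setdefault(row[key_pos], {})[name] = row[value_pos]
    header = ["key"] + columns
    # missing combinations are filled with None
    body = [(key,) + tuple(cells.get(c) for c in columns)
            for key, cells in table.items()]
    return header, body

# made-up sample: (subject, variable name, value)
long_rows = [(1, "PQ", 160), (1, "QRS", 90), (2, "PQ", 158)]
header, body = cross_tabulate(long_rows, 0, 1, 2)
print(header)
for row in body:
    print(row)
```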
8.6 Creating results
formatting output in Word (1 of 3)
class converter:
    def __init__(self, factor, divisor=1):
        if factor != 1 and divisor != 1:
            self.factor = float(factor)
            self.divisor = float(divisor)
            self.operation = self.scale
        elif divisor != 1 :
            self.divisor = float(divisor)
            self.operation = self.divide
        elif factor != 1 :
            self.factor = float(factor)
            self.operation = self.multiply
        else:
            self.operation = self.noop
    def multiply(self, arg): return arg * self.factor
    def divide(self, arg): return arg / self.divisor
    def scale(self, arg): return arg * self.factor / self.divisor
    def noop(self, arg): return arg+0.0
    def __call__(self, arg): return self.operation(arg)

PicasToPoints = converter(12)
PointsToPicas = converter(1, 12)
InchesToPoints = converter(72)
PointsToInches = converter(1, 72)
LinesToPoints = converter(12)
PointsToLines = converter(1, 12)
InchesToCentimeters = converter(254, 100)
CentimetersToInches = converter(100, 254)
CentimetersToPoints = converter(InchesToPoints(100), 254)
PointsToCentimeters = converter(254, InchesToPoints(100))

PythonCOM still has some problems with some global functions of Word.
Here is a little surrogate class which is not only useful for Word.
8.6 Creating results
formatting output in Word (2 of 3)
True = -1
False = 0

def simpleformat(tbl, w1=3, w2 = 3.5):
    """formats a 3-columned table arbitrary"""
    c = win32com.client.constants
    tbl.Columns(1).SetWidth(CentimetersToPoints(w1), c.wdAdjustProportional)
    if tbl.Columns.Count >= 2 :
        tbl.Columns(2).SetWidth(CentimetersToPoints(w2), c.wdAdjustProportional)
    tbl.Rows(1).HeadingFormat = True
    myrange = tbl.Rows(1).Range
    myrange.Font.Bold = True
    fmt = myrange.ParagraphFormat
    fmt.SpaceBefore = 3
    fmt.SpaceAfter = 3
    return tbl

Now for a simple formatter which makes just a few changes.
On the next page we look at the result...
8.6 Creating results
formatting output in Word (3 of 3)
A Supplemental
how to find the right Access version of a .mdb file
ODBC data sources and ADODB
Running your data through SAS
how to compress all your Access databases overnight
A.1 Supplemental
how to find the right Access version of a .mdb file
• This is a common problem: Access shows up and offers to convert your
database. But you want to open it with the right Access version
instead.
Solution: acc278.py does what its name says.
Usage:
After installation, you can double-click any .MDB file from the Explorer, and the matching Access
version will be loaded. Only this action is intercepted. Opening from the file menu works as usual.
Installation:
- Copy this file into your python directory.
- Edit the path settings for your Access executables.
- From Explorer, open the File Type settings for .MDB and change the "open" method as follows:
d:\python\pythonw.exe d:\python\acc278.py "%1"
(adjust paths accordingly)
DAO's "Version" method for Access 7/8 won't help, since the databases are identical in structure.
Only the objects which Access stores in the database have changed. An undocumented feature which
works for all my databases is a Property "AccessVersion" which is stored in a database property
collection. We use DAO to read the version number and fire up the right MSACCESS.EXE version.
A.2 Supplemental
ODBC data sources and ADODB
This chapter is in preparation. The “axsaxs”
module meanwhile has an “axsado” companion
which works with several databases, like MS
SQL server 6.5, and it will support ODBC
sources as well. The module will be ready in a
few weeks.
A.3 Supplemental
Running your data through SAS
This chapter is in preparation. The SAS module
is under development and will be ready in a few
weeks.
A.4 Supplemental
how to compress all your databases overnight (1 of 3)
# 19980113 recursion
# compact
# CT971215
# compresses MS Access databases
# and keeps the date info.
import dao3032, os, stat, string, sys

infofile = ".compactinfo"

def compact(fname) :
    e=dao3032.DBEngine()
    x=os.stat(fname)
    tim=(x[7],x[8]) # == stat.ST_ATIME, stat.ST_MTIME
    fneu = fname+".neu"
    e.CompactDatabase(fname, fneu)
    os.utime(fneu, tim)
    os.unlink(fname)
    os.rename(fneu, fname)
    x=os.stat(fname)
    return x[stat.ST_SIZE]
A.4 Supplemental
how to compress all your databases overnight (2 of 3)
def get_tree(path, retlist=None) :
    if retlist is None: retlist=[]
    dirs = [] ; known = {}
    try :
        for row in open(os.path.join(path, infofile)).readlines() :
            fields = eval(row)
            if len(fields) == 2 :
                known[fields[0]] = fields[1]
    except : pass
    for entry in os.listdir(path) :
        fullname = os.path.join(path, entry)
        x=os.stat(fullname)
        mode = x[0] # == stat.ST_MODE
        if stat.S_ISDIR(mode) :
            print "Directory: %s" %entry
            dirs.append(entry)
        elif stat.S_ISREG(mode) and string.lower(entry)[-4:]==".mdb":
            print "Database %s" % entry
            if not (mode & stat.S_IWRITE) :
                print 10*'*', 'WRITE-PROTECTED', 10*'*'
                continue
            if known.has_key(entry) and known[entry]==x[stat.ST_SIZE] :
                print 10*'+', 'size is unchanged', 10*'+'
                continue
A.4 Supplemental
how to compress all your databases overnight (3 of 3)
            # (continuation of get_tree) append one (size, name) tuple
            retlist.append((x[stat.ST_SIZE], fullname))
    for subdir in dirs:
        newpath = os.path.join(path, subdir)
        print "recursing into %s" % newpath
        get_tree(newpath, retlist)
        print "back from %s" % newpath
    return retlist

def compactall(path) :
    print "searching tree %s" % path
    worklist = get_tree(path)
    print "sorting by size, smallest first"
    worklist.sort()
    for (size, database) in worklist:
        print "compacting %s (%d)" % (database, size),
        sys.stdout.flush()
        newsize = compact(database)
        logpath, entry = os.path.split(database)
        logpath = os.path.join(logpath, infofile)
        open(logpath, "a").write(repr((entry, newsize))+"\n")
        print "- %d (ratio=%0.2f)" % (newsize, (0.0+newsize)/size)
    print "done compacting all of %s" % path