Using Open Source Software in Audits
Open Source Audit Software
IIA District Conference
Durham, NC
2/27/2009
Track 1 – Internal Audit
Mike Blakley, EZ-R Stats, LLC
1
Objectives
1. Open source audit software - advantages / disadvantages
2. Audit software functionality of four major software packages
3. SQLite - application in various audit areas
2
Objectives (cont’d)
4. RAT-STATS - random sampling
5. "R" system and its applications
6. Cephes - basic functionality
7. Excel and open source software
3
What is open source software?
• Source and binaries
• Languages
• Maintained by various persons
• Support / development - volunteer basis
• Licensing - GPL, Public Domain, etc.
4
Advantages
1. Transparency
2. Portability
3. Lower cost
5
Disadvantages
1. May require additional expertise
2. No slick front-end
3. Plain packaging
4. Support?
6
Objectives
1. Open source software - advantages/disadvantages
Next topic: Four Major Packages
7
Four major packages
1. SQLite - database system
2. RAT-STATS - random sampling system
3. R - library of statistical and plotting routines
4. Cephes - mathematical and statistical routines
8
How Excel fits in
• Audit tests on data in SQLite
• RAT-STATS - Excel workbooks
• R has an Excel interface
• Run R scripts from Excel
• Cephes routines can be called directly from Excel
9
Recap of objectives
1. Open source software - advantages/disadvantages
2. Four major software packages
Next topic is SQLite
10
SQLite overview
• Developed in North Carolina!
• Most widely deployed database engine
• Public domain
• Standards compliant - SQL-92
• Very fast, written in C
• Zero installation
11
Example audit uses
• Sample planning
• Population statistics
• Identification of duplicates
• Match/merge
• Benford's Law
• Same, same, different
• Data stratification
12
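As an illustration of the Benford's Law test listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `invoices`, the column name `amount`, and the function name are hypothetical; the slides do not show the presenter's own queries. The function compares observed leading-digit counts with the counts Benford's Law would predict.

```python
import math
import sqlite3


def benford_first_digits(conn, table="invoices", column="amount"):
    """Return {digit: (observed_count, expected_count)} for leading digits 1-9.

    Table and column names are illustrative and are interpolated into the
    SQL text, so pass only trusted identifiers. Values are assumed numeric.
    """
    rows = conn.execute(
        f'SELECT "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL AND "{column}" <> 0'
    )
    observed = {d: 0 for d in range(1, 10)}
    for (value,) in rows:
        # Scientific notation puts the first significant digit first.
        digit = int(f"{abs(value):.15e}"[0])
        observed[digit] += 1
    total = sum(observed.values())
    # Benford's Law: P(first digit = d) = log10(1 + 1/d)
    return {d: (observed[d], total * math.log10(1 + 1 / d)) for d in observed}
```

Digits whose observed count is far from the expected count are candidates for follow-up; a formal comparison (for example a chi-square test) would be a separate step.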
Advantages
• Cost effective - fast database
• No license cost
• Simple to install
• Portable
• Standards compliant
13
Disadvantages
• Doesn't have every "bell and whistle"
• Doesn't support every functionality
• Basic system is “command line”
14
SQLite Front Ends
• Excel
• SQLite browser
• Others
15
Specific audit applications
• White paper available that explains many of the topics
• Article in EDPACS, June 2008
16
How to load data
• Load using manual "scripts"
• Load with free software
• Import from Excel, Access, text files
17
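A "manual script" for loading a text file could look like the sketch below. It uses Python's standard csv and sqlite3 modules as a stand-in, since the slides do not show the actual load scripts; the function name `load_csv` and the sample table name are hypothetical.

```python
import csv
import sqlite3


def load_csv(conn, lines, table):
    """Load CSV text (first row = column names) into a new SQLite table.

    `lines` may be an open text file or any iterable of CSV lines.
    Returns the number of rows loaded.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Identifiers are double-quoted; column types are left undeclared, so
    # values are stored as text and can be CAST in later queries.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

With a file on disk, the call would be along the lines of `with open("payments.csv", newline="") as f: load_csv(conn, f, "payments")`, where the file name is again only an example.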
Target audience
• Auditors
• Audit Managers
• Business Analysts
• Researchers
• Anyone working with large data volumes
18
Screen Shots of SQLiteBrowser
1. Identification of duplicates
2. “Drill down” (using where clause)
3. Population subtotals and basic statistics
SQLite Database Browser - public domain
19
Identification of Duplicates
20
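The screen shot for this slide is not part of the transcript. As a rough sketch of what a duplicate test can look like in SQL, the queries below run against a hypothetical `payments` table with `vendor`, `invoice_no` and `amount` columns. The second query is one possible reading of the "same, same, different" test named earlier (same vendor, same amount, different invoice number); that interpretation is an assumption.

```python
import sqlite3

# Exact duplicates: the same vendor, invoice number and amount more than once.
DUPLICATES_SQL = """
SELECT vendor, invoice_no, amount, COUNT(*) AS occurrences
FROM payments
GROUP BY vendor, invoice_no, amount
HAVING COUNT(*) > 1
ORDER BY occurrences DESC, vendor, invoice_no
"""

# "Same, same, different": same vendor, same amount, different invoice number.
SAME_SAME_DIFFERENT_SQL = """
SELECT DISTINCT a.vendor, a.amount, a.invoice_no, b.invoice_no
FROM payments AS a
JOIN payments AS b
  ON a.vendor = b.vendor
 AND a.amount = b.amount
 AND a.invoice_no < b.invoice_no
ORDER BY a.vendor, a.amount
"""


def find_duplicates(conn):
    """Return (vendor, invoice_no, amount, occurrences) for repeated rows."""
    return conn.execute(DUPLICATES_SQL).fetchall()


def find_same_same_different(conn):
    """Return (vendor, amount, first_invoice, second_invoice) pairs."""
    return conn.execute(SAME_SAME_DIFFERENT_SQL).fetchall()
```

The same SQL text can be pasted into any SQLite front end; Python is used here only to make the example self-contained.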
“Drill down” with where clause
21
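A drill-down narrows a summary figure to the rows behind it by adding conditions to the WHERE clause. The sketch below assumes the same hypothetical `payments` table (`vendor`, `invoice_no`, `amount`) and uses bound parameters so the filter values are not pasted into the SQL text.

```python
import sqlite3


def drill_down(conn, vendor, min_amount=0):
    """Return the individual payments behind a vendor's total, largest first."""
    return conn.execute(
        "SELECT invoice_no, amount FROM payments "
        "WHERE vendor = ? AND amount >= ? "
        "ORDER BY amount DESC",
        (vendor, min_amount),
    ).fetchall()
```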
Population Statistics
22
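Population statistics and subtotals map onto SQL aggregate functions. The following sketch again assumes a hypothetical `payments` table with `vendor` and `amount` columns. SQLite has no built-in standard deviation function, so the population standard deviation is derived here from the mean of the squares; that derivation is this example's choice, not something shown in the slides.

```python
import math
import sqlite3


def population_statistics(conn):
    """Return basic statistics for payments.amount (assumes at least one row)."""
    count, total, mean, low, high, mean_of_squares = conn.execute(
        "SELECT COUNT(amount), SUM(amount), AVG(amount), "
        "MIN(amount), MAX(amount), AVG(amount * amount) FROM payments"
    ).fetchone()
    # Population variance = E[x^2] - (E[x])^2; clamp tiny negative rounding error.
    variance = max(mean_of_squares - mean * mean, 0.0)
    return {
        "count": count,
        "total": total,
        "mean": mean,
        "min": low,
        "max": high,
        "std_dev": math.sqrt(variance),
    }


def subtotals_by_vendor(conn):
    """Return (vendor, row_count, subtotal) for each vendor."""
    return conn.execute(
        "SELECT vendor, COUNT(*), SUM(amount) FROM payments "
        "GROUP BY vendor ORDER BY vendor"
    ).fetchall()
```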
More information
• SQLite site - http://sqlite.org
• EZ-R Stats - http://ezrstats.com
• SQLite browser - http://sqlitebrowser.sourceforge.net/
23
Wrap up Objective 3
• What is SQLite?
• In what audit areas can it be used?
• Data import
Next topic is Random Sampling
24
RAT-STATS
• Federal HHS in San Francisco, with assistance from several universities
• Comprehensive
• Widely used in the health care industry
• Has withstood court challenges
• There are others, such as EZ-Quant (DOD)
25
Major functional areas
• 1. Random number generation
• 2. Sample size determination
• 3. Attribute sampling
• 4. Variable sampling
• 5. Types of sampling
  - stratified
  - unrestricted
  - other
26
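To make "sample size determination" concrete, here is a sketch of the textbook normal-approximation formula for an attribute sample with a finite population correction: n0 = z^2 p(1 - p) / e^2, then n = n0 / (1 + (n0 - 1) / N). This is a generic formula and is not claimed to be the calculation RAT-STATS performs; the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def attribute_sample_size(population, expected_rate, precision, confidence=0.95):
    """Sample size for estimating an error rate to within +/- `precision`.

    Uses the normal approximation with a finite population correction.
    """
    # Two-sided z value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z * z * expected_rate * (1 - expected_rate) / precision ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

Using an expected rate of 0.5 gives the most conservative (largest) sample size when nothing is known about the error rate in advance.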
How it works
• Windows based (no Mac or Linux)
• Simple to install
• Some documentation
• Works with Excel, Access and text files
27
Advantages
• Comprehensive
• Withstood court challenges as to validity
• Does all the computations
• Provides basic documentation for work-papers
• Easy to install
• No license cost
28
Disadvantages
• Only certain confidence levels
• Little transparency (FOIA)
• Support?
29
Screen Shots
1. Random numbers
2. Variable sampling
30
Random numbers
31
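The screen shot is not included in the transcript. The idea of drawing a reproducible random selection can be sketched with Python's standard random module as a stand-in for RAT-STATS; the function name and parameters are illustrative. Recording the seed in the work-papers allows the same selection to be regenerated later.

```python
import random


def select_sample(population_size, sample_size, seed):
    """Return `sample_size` distinct item numbers from 1..population_size.

    The same seed always yields the same selection, so the draw can be
    reproduced for review.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))
```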
Variable sampling
32
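For variable sampling, a common calculation is the mean-per-unit estimate of a population total with a confidence interval. The sketch below uses the normal approximation and a finite population correction; it is a simplified illustration (a t value is often preferred for small samples) and does not claim to reproduce the RAT-STATS output. All names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def estimate_total(sample, population_size, confidence=0.90):
    """Mean-per-unit estimate of a population total from a simple random sample.

    Returns (point_estimate, lower_limit, upper_limit).
    Requires at least two sample items.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1)
    # Finite population correction for sampling without replacement.
    fpc = math.sqrt(1 - n / population_size)
    standard_error = population_size * std_dev / math.sqrt(n) * fpc
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    point = population_size * mean
    return point, point - z * standard_error, point + z * standard_error
```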
Wrap up Objective 4
• What is RAT-STATS?
• Audit Areas
• Random numbers
• Attribute sampling
• Variable sampling
Next topic is R
33
R
• World-wide development by statisticians and college professors
• Library of statistical routines
• Extensive plotting and charting capabilities
• R is 'GNU S'
34
Major functional areas
• 1. Statistical computing
• 2. Graphics
• 3. Linear regression and modeling
• 4. Statistical tests
• 5. Time series analysis
• 6. Data Classification
35
How it works
• Windows, Mac or Linux
• Relatively simple to install
• Extensive documentation
• Works with
  - Excel, Access
  - text files
  - many databases (including SQLite)
36
Audit areas
• Excellent capabilities for regression
• Does step-wise regression (quite costly in other packages)
• Sample planning
• Population statistics
• Charting/plotting as part of audit planning
37
Advantages
• Comprehensive
• Good charting and plotting capabilities
• Extensive statistical functions
• Easy to install
• No license cost
38
Disadvantages
• User interface
• Fairly steep learning curve
• Support?
39
Screen Shots
1. Stepwise regression
2. Plot - confidence/precision intervals
40
Stepwise regression
41
Confidence Intervals
42
Wrap up Objective 5
• What is R?
• What audit areas can it be used to address?
Next topic is Cephes
43
Cephes
• Federal Department of Energy at Oak Ridge Laboratories
• Library of mathematical and statistical routines (400+)
• Adaptation of earlier versions in FORTRAN
• Translated into C and Visual Basic
• Highly reliable and extensively tested
44
Major functional areas
• 1. Statistical computing
• 2. Mathematical computations
• 3. Probability
45
How it works
• Windows only
• Relatively simple to install
• Extensive documentation
• Works as stand-alone routines or can be called from Excel
46
Audit areas
• Sample calculations
• Random number generation
• Sample planning
• Population statistics
47
Advantages
• Reliable, extensive testing (IEEE)
• Extensive statistical functions
• Easy to install
• No license cost
48
Disadvantages
• Support?
49
Examples of probability functions
• Chi square distribution
• Complemented Chi square
• Inverse Chi square
• Normal distribution
• Inverse normal distribution
• Poisson distribution
• Inverse Poisson distribution
• Student's t distribution
50
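To show what a few of these functions compute, the sketch below uses Python's standard library as a stand-in; it does not call Cephes itself. The function names are illustrative. The normal functions rely on `statistics.NormalDist`, and the Poisson distribution function is summed directly from its definition.

```python
import math
from statistics import NormalDist


def normal_cdf(x):
    """Area under the standard normal curve from minus infinity to x."""
    return NormalDist().cdf(x)


def normal_inverse(p):
    """Value x such that normal_cdf(x) == p, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def poisson_cdf(k, mean):
    """Probability of k or fewer events when `mean` events are expected."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))
```

In audit work the inverse normal supplies the z value for a chosen confidence level, and the Poisson distribution function appears in attribute sampling tables.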
Examples of arithmetic and algebraic functions
• Square root
• Long integer square root
• Cube root
• Evaluate polynomial
• Round to nearest integer value
• Truncate upward to integer
• Truncate downward to integer
• Absolute value
51
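"Evaluate polynomial" is commonly implemented with Horner's rule, which needs only one multiplication and one addition per coefficient. The sketch below is a generic Python version with coefficients given from the highest power down; it is an illustration, not the Cephes source.

```python
def evaluate_polynomial(x, coefficients):
    """Evaluate a polynomial at x using Horner's rule.

    `coefficients` are ordered from the highest power to the constant term,
    so [1, 0, -3] represents x**2 - 3.
    """
    result = 0.0
    for coefficient in coefficients:
        result = result * x + coefficient
    return result
```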
Screen shots
1. Calculations with Excel VBA
2. Plot with confidence/precision intervals
52
Calculations with Excel VBA
53
Plot with confidence/precision intervals
54
Wrap up Objective 6
• What is Cephes?
• Useful for evaluation of random samples, linear regression, etc.
Next topic is Excel as a platform
55
Excel as an audit platform
• Extensive capabilities, generally underused
• Can be integrated with open source software
• ActiveX Data Objects (ADO)
• Visual Basic for Applications (VBA)
• Calling external routines
• COM Servers
56
ActiveX Data Objects
AuditNet article: "End User Database Access Using Excel"
http://www.auditnet.org/articles/MB200803.htm
The example uses SQLite
57
Visual Basic for Applications
• Very extensive capabilities
• Entire applications written in VBA
• Powerful audit tool
• Example library
58
Calling external routines
• Can be used to build scripts
• Then executed by external applications
• Excel - Shell command
• Provides ability to perform a variety of tasks, such as
  - charting and plotting using R
  - running database queries
59
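The "build a script, then hand it to an external application" pattern can be sketched as follows. The slides describe doing this from Excel with the Shell command; the example below uses Python's subprocess module as a stand-in and assumes the `Rscript` command-line program is installed. The file names, column name and function names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_r_script(csv_path, png_path, column="amount"):
    """Return R code that reads a CSV file and saves a histogram as a PNG.

    Paths should use forward slashes, since backslashes are escape
    characters inside R strings.
    """
    return (
        f'data <- read.csv("{csv_path}")\n'
        f'png("{png_path}")\n'
        f'hist(data${column}, main="Distribution of {column}")\n'
        "dev.off()\n"
    )


def run_r_script(script_text):
    """Write the script to a temporary file and run it with Rscript.

    Returns the process return code, or None if Rscript is not installed.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "plot.R"
        script_path.write_text(script_text)
        completed = subprocess.run(
            [rscript, str(script_path)], capture_output=True, text=True
        )
        return completed.returncode
```

Keeping script generation separate from execution makes it possible to review or save the generated script before it is run.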
COM Servers
• Makes routines directly accessible to Excel using "CreateObject"
• Cephes library is an example
• Many free COM servers available
• Simplifies Excel by "compartmentalizing" program logic
60
Advantages
• Already widely used
• Many "built-in" capabilities
• Macro language VBA widely understood
61
Disadvantages
Learning curve
Support?
62
Wrap up Objective 7
• Excel as an audit platform
• Uses include:
  - database queries
  - running R
  - complex stat calculations
63
Questions?
Contact info:
919-715-4791
[email protected]
64