Data Mining Journal Entries for
Fraud Detection: A Pilot Study
by Roger S. Debreceny &
Glen L. Gray
Discussed by
Severin Grabski
Objective
• Explore research issues related to the
application of statistical data mining to
fraud detection in journal entries
– Is this important?
– YES! Most significant frauds are not
conducted by the users of the ERP systems;
they are done “outside” of these
well-controlled systems.
• Was this accomplished?
– Maybe
Accomplished?
• Used Benford’s Law to examine journal
entries
• Statistically significant differences in
first-digit distributions were found
(chi-square test). Should these be
investigated?
– A 0% difference (Omicron) gives a statistically
significant p < 0.015. What does this tell me?
– Is a 1% difference between observed and
predicted indicative of a problem?
– Could use Mean Absolute Deviation
Entity    Total Dev   MAD
Beta      0.19        0.0211
Chi       0.03        0.0033
ChiEta    0.06        0.0067
ChiNu     0.11        0.0122
ChiPi     0.30        0.0333
Delta     0.06        0.0067
Eta       0.20        0.0222
EtaNu     0.10        0.0111
EtaPi     0.08        0.0089
Nu        0.34        0.0378
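In the table above, every MAD value equals Total Dev divided by 9, the number of first-digit bins (e.g., 0.19 / 9 ≈ 0.0211). A minimal Python sketch, assuming Total Dev is the sum of absolute differences between observed and expected first-digit proportions. `mean_absolute_deviation` and the hypothetical observed profile are illustrative, not the authors' code; the `mad_from_table` line only re-derives the MAD column from the Total Dev column.

```python
import math


def mean_absolute_deviation(observed, expected):
    """MAD = (1/k) * sum over k digit bins of |observed proportion - expected proportion|."""
    if observed.keys() != expected.keys():
        raise ValueError("observed and expected must cover the same digits")
    return sum(abs(observed[d] - expected[d]) for d in expected) / len(expected)


# Total absolute deviation per entity, taken from the table above.
TOTAL_DEV = {
    "Beta": 0.19, "Chi": 0.03, "ChiEta": 0.06, "ChiNu": 0.11, "ChiPi": 0.30,
    "Delta": 0.06, "Eta": 0.20, "EtaNu": 0.10, "EtaPi": 0.08, "Nu": 0.34,
}

# The first-digit test has k = 9 bins, so MAD = Total Dev / 9.
mad_from_table = {entity: round(total / 9, 4) for entity, total in TOTAL_DEV.items()}

# Hypothetical full profile: Benford proportions with 5 percentage points
# moved from leading 1s to leading 5s.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
observed = dict(benford)
observed[1] -= 0.05
observed[5] += 0.05
print(mean_absolute_deviation(observed, benford))
```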
Benford’s Law & First 5 Firms
[Figure: proportion of entries (0% to 35%) by first digit (1 to 9) for the Benford expectation and for Beta, Chi, ChiEta, ChiNu, and ChiPi.]
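The first-digit comparison behind the chart and the chi-square result can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes amounts arrive as floats, takes the leading nonzero digit of the absolute amount to the cent, and compares observed counts with Benford's expected proportions log10(1 + 1/d) using a Pearson chi-square statistic with 8 degrees of freedom. The helper names and the hard-coded 5% critical value (15.507) are illustrative choices.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CRITICAL_5PCT_DF8 = 15.507


def benford_first_digit_probs():
    """Expected Benford proportions: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Leading nonzero digit of |amount| (to the cent); None for a zero amount."""
    digits = f"{abs(amount):.2f}".replace(".", "").lstrip("0")
    return int(digits[0]) if digits else None


def first_digit_chi_square(amounts):
    """Pearson chi-square of observed first-digit counts against Benford's Law."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero amounts to test")
    counts = Counter(digits)
    expected = benford_first_digit_probs()
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items())


sample = [1250.00, 187.45, 19.99, 3400.00, 1.10, 245.00, 92.50, 1010.00]
stat = first_digit_chi_square(sample)
print(f"chi-square = {stat:.2f}; exceeds 5% critical value: {stat > CRITICAL_5PCT_DF8}")
```

The eight-amount sample is a toy; with so few entries the chi-square approximation is unreliable, which is the sample-size concern raised under "Other Questions".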
Accomplished?
• Identification of “violations” of Benford’s
first-digit law provides only a preliminary
indication
– Nigrini and Mittermaier (1997) recommend
using the first digit as an initial test of
reasonableness
Other “Benford’s Law” Digit Tests
• Second Digit Test
– This also only gives a preliminary indication
• First Two Digits Test
– Provides more direction
• Number Duplication
– Identify and rank order duplicate numbers
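The three follow-up tests above can be sketched in Python. The expected proportions are the standard Benford formulas (the second-digit law gives roughly 12.0% for a 0 and 8.5% for a 9, the pattern Carslaw's rounding tests rely on); the digit-extraction rule (to the cent, leading zeros dropped) and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def benford_second_digit_probs():
    """P(second digit = d) = sum_{k=1..9} log10(1 + 1/(10k + d)), d = 0..9."""
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)}


def benford_first_two_digit_probs():
    """P(first two digits = n) = log10(1 + 1/n), n = 10..99."""
    return {n: math.log10(1 + 1 / n) for n in range(10, 100)}


def significant_digits(amount):
    """Digits of |amount| to the cent with leading zeros stripped, e.g. 0.05 -> '5'."""
    return f"{abs(amount):.2f}".replace(".", "").lstrip("0")


def second_digit(amount):
    s = significant_digits(amount)
    return int(s[1]) if len(s) >= 2 else None


def first_two_digits(amount):
    s = significant_digits(amount)
    return int(s[:2]) if len(s) >= 2 else None


def rank_duplicates(amounts, top=10):
    """Number duplication test: exact amounts that recur, most frequent first."""
    counts = Counter(round(a, 2) for a in amounts)
    return [(amount, n) for amount, n in counts.most_common(top) if n > 1]
```

Observed second-digit or first-two-digit frequencies can then be compared against these expectations with the same chi-square or MAD calculations used for the first digit.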
Other Benford’s Law Research
• Carslaw (1988), comparing observed with expected
second-digit frequencies, found support for the
rounding up of income figures (more 0s and fewer
9s than expected).
• Thomas (1989), again using second digits, found
support for rounding up of income and rounding
down of losses.
• Nigrini
– (1994) used first two digit frequencies to analyze
payroll fraud, and
– (1996) used first two digit frequencies to examine tax
compliance
Fourth Digit Test
• Chi-square test for differences in the
distribution of the fourth digit
– “…distribution of the fourth digit for each
organization for all dollar amounts over $999.”
– Was this the fourth digit counted from the left or from the right?
– What if the transaction was for $100,000?
• While statistically significant differences
were found, should these be investigated?
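The ambiguity raised above can be made concrete with a small sketch. It assumes "amounts over $999" means a whole-dollar part of at least four digits and offers both readings of "fourth digit" as a hypothetical `from_left` flag; the discussant's $100,000 example yields 0 under either reading. Because Benford's fourth-digit distribution is very close to 10% per digit, the sketch approximates the expectation as uniform. None of this is taken from the paper itself.

```python
from collections import Counter


def fourth_digit(amount, from_left=True):
    """Fourth digit of the whole-dollar part of |amount|.

    Returns None for amounts under $1,000. from_left=True counts from the most
    significant digit; from_left=False counts from the units digit, i.e. it
    returns the thousands-place digit.
    """
    dollars = str(int(abs(amount)))
    if len(dollars) < 4:
        return None
    return int(dollars[3] if from_left else dollars[-4])


def uniform_chi_square(digits):
    """Pearson chi-square of digits 0-9 against a uniform distribution (9 df)."""
    n = len(digits)
    if n == 0:
        raise ValueError("no digits to test")
    counts = Counter(digits)
    expected = n / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Compare the two readings for a few amounts.
for amount in (123_456.78, 100_000.00, 4_321.00):
    print(amount, fourth_digit(amount), fourth_digit(amount, from_left=False))
```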
Three Digit Test
• Examined the last three digits of dollar
amounts
– Used the “top 5” last-three-digit patterns
– Found that 4 of 29 entities had 30-60% of
their transactions in the top 5
last-three-digit patterns
• It would be interesting to know whether these were
the entities that “failed” Benford’s Law
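A possible sketch of the last-three-digit screen follows. The transcript does not say whether "last three digits" includes cents, so this assumes the rightmost three digits of the whole-dollar amount. The 30% flag threshold echoes the 30-60% range reported but is a hypothetical cutoff, and the function names are illustrative. The flagged set could then be intersected with the entities that failed the Benford tests, as suggested above.

```python
from collections import Counter


def last_three_digits(amount):
    """Rightmost three digits of the whole-dollar part, zero-padded (1,500.00 -> '500')."""
    return f"{int(abs(amount)) % 1000:03d}"


def top_k_share(amounts, k=5):
    """Share of transactions whose last-three-digit pattern is one of the k most common."""
    if not amounts:
        return 0.0
    patterns = Counter(last_three_digits(a) for a in amounts)
    return sum(n for _, n in patterns.most_common(k)) / len(amounts)


def flag_entities(amounts_by_entity, k=5, threshold=0.30):
    """Entities whose top-k share meets a hypothetical review threshold."""
    shares = {e: top_k_share(a, k) for e, a in amounts_by_entity.items()}
    return {e: s for e, s in shares.items() if s >= threshold}
```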
Data Mining J/E Questions
• Would have liked a more reasoned/theoretical
approach in specifying where and why data mining
techniques should be applied
• Sources of J/Es?
– Do they influence the data mining?
• Unusual patterns between classes of J/Es?
• Does the class of J/E influence the nature of
the J/E (i.e., do any types of J/E have a higher
probability of fraud)?
• Evidence from Benford’s Law or rightmost digits?
• Underlying issues that will guide effective and
efficient data mining of J/Es
Descriptive Statistics
• Any way to group the firms by industry?
• What can be found based upon grouping
and analyzing by size?
Other Questions
• What other approaches (than Benford’s Law)
can be applied to mining journal entries?
• What is currently done by audit teams for
computerized analysis of journal entries?
• The analysis requires a “large enough”
number of journal entries to highlight
that fraud might be occurring. What if only a few
JEs are made? How sensitive is this
approach?
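One way to see the sensitivity issue, and the earlier question of whether a 1% difference matters, is that the chi-square statistic written in proportions is n * sum((o - e)^2 / e): holding the observed profile fixed, it grows linearly with the number of entries n, while MAD does not depend on n. The sketch assumes a hypothetical profile in which one percentage point of mass moves from leading 1s to leading 9s. With 100 entries the statistic stays far below the 5% critical value (15.507, 8 df); with 10,000 entries the same profile is significant.

```python
import math

CRITICAL_5PCT_DF8 = 15.507  # chi-square critical value, 8 df, alpha = 0.05


def benford_first_digit_probs():
    """Benford first-digit proportions for d = 1..9, as a list."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square_from_proportions(observed, expected, n):
    """Pearson chi-square expressed in proportions: n * sum((o - e)^2 / e)."""
    return n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def shifted_profile(shift=0.01):
    """Hypothetical profile: move `shift` of probability mass from digit 1 to digit 9."""
    p = benford_first_digit_probs()
    p[0] -= shift
    p[8] += shift
    return p


expected = benford_first_digit_probs()
observed = shifted_profile(0.01)
mad = sum(abs(o - e) for o, e in zip(observed, expected)) / 9  # same for every n

for n in (100, 1_000, 10_000):
    stat = chi_square_from_proportions(observed, expected, n)
    print(f"n={n:>6}: chi-square={stat:7.2f}, "
          f"significant={stat > CRITICAL_5PCT_DF8}, MAD={mad:.4f}")
```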
Confusion
• Number of organizations?
– 36 organizations
– 8 data sets had less than 1 year of data
– 1 data set was incomplete
– That leaves 27 (36 - 8 - 1); why 29 observations?
• Did you count each year for the 2
organizations that provided 2 years of data
as a separate observation (27 + 2 = 29)?
– What is the justification?
– Why not do a year-to-year comparison for
those organizations?
What’s Missing?
• Interpretation and more detailed analysis
of the data
– We know there are “violations” but never
learn whether there is actually fraudulent activity
• What are the other data mining techniques
that are planned?
• Analytical reasoning as to what tests
should be done or what is revealed by
certain tests
Data Mining Extensions
• Compare the entities with a “larger” average
number of line items per journal entry
(e.g., >10) as one pool?
• Alternatively, look at those in which the
maximum number of line items is large
(e.g., >100)
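The two pooling ideas above can be sketched as follows, assuming each journal-entry line item is available as an (entity, journal-entry id) pair. The cutoffs of 10 and 100 come from the slide; the data layout and function names are hypothetical. The usage example shows why the two criteria can pick different entities: entity B has a low average but one very large entry.

```python
from collections import defaultdict


def line_item_profile(lines):
    """lines: iterable of (entity, je_id) pairs, one per journal-entry line item.

    Returns {entity: (average line items per JE, maximum line items in any JE)}.
    """
    per_je = defaultdict(int)
    for entity, je_id in lines:
        per_je[(entity, je_id)] += 1
    by_entity = defaultdict(list)
    for (entity, _), count in per_je.items():
        by_entity[entity].append(count)
    return {e: (sum(c) / len(c), max(c)) for e, c in by_entity.items()}


def split_pools(profile, avg_cutoff=10, max_cutoff=100):
    """Pool 1: average > avg_cutoff; pool 2: maximum > max_cutoff (pools may overlap)."""
    high_avg = sorted(e for e, (avg, _) in profile.items() if avg > avg_cutoff)
    high_max = sorted(e for e, (_, mx) in profile.items() if mx > max_cutoff)
    return high_avg, high_max


# Usage: A has two 12-line entries; B has twenty 1-line entries and one 101-line entry.
lines = ([("A", "1")] * 12 + [("A", "2")] * 12
         + [("B", f"s{i}") for i in range(20)] + [("B", "big")] * 101)
profile = line_item_profile(lines)
print(split_pools(profile))
```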
Summary
• Objective – explore research issues
related to the application of statistical data
mining to fraud detection in journal entries
• Good first step – and this is a pilot study
• Would like more theoretical motivation for
tests & research issues
• Would have liked more data analysis
• Could I apply this in an audit?
• I’m not sure; more research is needed
Thank You