Decision Support Systems 50 (2011) 491–500
Detection of financial statement fraud and feature selection using data
mining techniques
P. Ravisankar a, V. Ravi a,⁎, G. Raghava Rao a, I. Bose b
a Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad 500 057, AP, India
b School of Business, The University of Hong Kong, Pokfulam Road, Hong Kong
⁎ Corresponding author. Tel.: +91 40 23534981x2042; fax: +91 40 23535157. E-mail addresses: [email protected] (P. Ravisankar), [email protected] (V. Ravi), [email protected] (G. Raghava Rao), [email protected] (I. Bose).
doi:10.1016/j.dss.2010.11.006
Article info
Article history:
Received 20 November 2009
Received in revised form 14 June 2010
Accepted 3 November 2010
Available online 12 November 2010
Keywords:
Data mining
Financial fraud detection
Feature selection
t-statistic
Neural networks
SVM
GP
Abstract
Recently, high profile cases of financial statement fraud have been dominating the news. This paper uses data
mining techniques such as Multilayer Feed Forward Neural Network (MLFF), Support Vector Machines (SVM),
Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR), and
Probabilistic Neural Network (PNN) to identify companies that resort to financial statement fraud. Each of
these techniques is tested on a dataset involving 202 Chinese companies and compared with and without
feature selection. PNN outperformed all the techniques without feature selection, and GP and PNN outperformed the others with feature selection, with nearly equal accuracies.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Financial fraud is a serious problem worldwide and more so in fast
growing countries like China. Traditionally, auditors are responsible
for detecting financial statement fraud. With the appearance of an
increasing number of companies that resort to these unfair practices,
auditors have become overburdened with the task of detection of
fraud. Hence, various techniques of data mining are being used to
lessen the workload of the auditors. Enron and Worldcom are the two
major scandals involving corporate accounting fraud, which arose
from the disclosure of misdeeds conducted by trusted executives of
large public corporations. Enron Corporation [17] was an American
energy company based in Houston, Texas. Before its bankruptcy in late
2001, Enron was one of the world's leading electricity, natural gas,
pulp and paper, and communications companies, with revenues
amounting to nearly $101 billion in 2000. Long Distance Discount
Services, Inc. (LDDS) began its operations in Hattiesburg, Mississippi
in 1983. The company's name was changed to LDDS WorldCom [18] in
1995, and later it became WorldCom. On July 21, 2002, WorldCom
filed for Chapter 11 bankruptcy protection in the largest such filing in
US history at that time.
Financial statements are a company's basic documents to reflect its
financial status [3]. A careful reading of the financial statements can
indicate whether the company is running smoothly or is in crisis. If the
company is in crisis, financial statements can indicate if the most
critical thing faced by the company is cash or profit or something else.
All the listed companies are required to publish their financial
statements every year and every quarter. The stockholders can form a
good idea about the companies’ financial future through the financial
statements, and can decide whether the companies’ stocks are worth
investing. The bank also needs the companies’ financial statements in
order to decide whether to grant loans to them. In a nutshell, the
financial statements are the mirrors of the companies’ financial status.
Financial statements are records of financial flows of a business.
Generally, they include balance sheets, income statements, cash flow
statements, statements of retained earnings, and some other statements. A detailed description of the items listed in the various
financial statements is given below:
• Balance sheet
A balance sheet is a statement of the book value of an organization at
a particular date, usually at the end of the fiscal year. A balance sheet
has three parts: assets, liabilities, and shareholders' equity. The
difference between the assets and the liabilities is known as the 'net
assets' or the 'net worth' of the company.
• Income statement
Income statements, also called Profit and Loss Statements for companies, indicate how net revenue (money received from the sale
of products and services before expenses are subtracted, also known
as the ‘top line’) is transformed into net income (the result after all
revenues and expenses have been accounted for, also known as the
‘bottom line’). The purpose of the income statement is to show
managers and investors whether the company made or lost money
during the period under consideration.
• Cash flow statement
A cash flow statement is a financial statement that shows incoming
and outgoing funds during a particular period. The statement shows
how changes in balance sheet and income accounts affect cash and
cash equivalents. As an analytical tool the statement of cash flows is
useful in determining the short-term viability of a company,
particularly its ability to pay bills.
• Statement of retained earnings
The statement of retained earnings, also known as ‘statement of owners'
equity’ and ‘statement of net assets’ for non-profit organizations, explains
the changes in company's retained earnings over the reporting period. It
breaks down changes affecting the account, such as profits or losses from
operations, dividends paid, and any other items charged or credited to
retained earnings.
Next, we will describe the key characteristics of
financial fraud that can be observed through the financial ratios calculated
on the basis of the financial statements published by companies.
1.1. Financial ratios
Financial ratios are a valuable and easy way to interpret the numbers
found in financial statements. They can help to answer critical
questions such as whether the business is carrying excess debt or
inventory, whether customers are paying according to terms, whether
the operating expenses are too high, and whether the company assets
are being used properly to generate income.
• Liquidity
Liquidity measures a company's capacity to pay its liabilities in the short
term. There are two ratios for evaluating liquidity. They are:
1) Current ratio = Total current assets / Total current liabilities
2) Quick ratio = (Cash + Accounts receivable + Any other quick
assets) / Current liabilities
The higher the ratios the stronger is the company's ability to pay its
liabilities as they become due, and the lower is the risk of default.
• Safety
Safety indicates a company's vulnerability to the risk of debt. There are three ratios for evaluating safety. They are:
1) Debt to equity = Total liabilities / Net worth
2) EBIT/Interest = Earnings before interest and taxes / Interest charges
3) Cash flow to current maturity of long-term debt = (Net profit +
Non-cash expenses) / Current portion of long-term debt
• Profitability
Profitability ratios measure the company's ability to generate a return on
its resources. There are four ratios to evaluate a company's profitability.
They include:
1) Gross profit margin = Gross profit / Total sales
2) Net profit margin = Net profit / Total sales
3) Return on assets = Net profit before taxes / Total assets
4) Return on equity = Net profit before taxes / Net worth
• Efficiency
Efficiency evaluates how well the company manages its assets. There
are four ratios to evaluate the efficiency of asset management:
1) Accounts receivable turnover = Total net sales / Accounts receivable
2) Accounts payable turnover = Cost of goods sold / Accounts payable
3) Inventory turnover = Cost of goods sold / Inventory
4) Sales to total assets = Total sales / Total assets
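To make the ratio definitions above concrete, here is a minimal Python sketch on invented figures; the function names and all numbers are ours for illustration only and do not come from the paper's dataset.

```python
# Hypothetical balance-sheet figures; not taken from the paper's data.
def liquidity_ratios(current_assets, current_liabilities, cash, accounts_receivable, other_quick_assets=0.0):
    """Current and quick ratios as defined above."""
    current_ratio = current_assets / current_liabilities
    quick_ratio = (cash + accounts_receivable + other_quick_assets) / current_liabilities
    return current_ratio, quick_ratio

def profitability_ratios(gross_profit, net_profit, total_sales, pretax_profit, total_assets, net_worth):
    """Gross/net margin, return on assets, and return on equity."""
    return {
        "gross_profit_margin": gross_profit / total_sales,
        "net_profit_margin": net_profit / total_sales,
        "return_on_assets": pretax_profit / total_assets,
        "return_on_equity": pretax_profit / net_worth,
    }

print(liquidity_ratios(current_assets=500.0, current_liabilities=250.0,
                       cash=80.0, accounts_receivable=120.0))
print(profitability_ratios(gross_profit=300.0, net_profit=90.0, total_sales=1000.0,
                           pretax_profit=120.0, total_assets=1500.0, net_worth=700.0))
```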
Financial statement fraud may be perpetrated to increase stock
prices or to get loans from banks. It may be done to distribute lesser
dividends to shareholders. Another probable reason may be to avoid
payment of taxes. Nowadays an increasing number of companies are
making use of fraudulent financial statements in order to cover up
their true financial status and make selfish gains at the expense of
stockholders. The fraud triangle is also known as Cressey's Triangle, or
Cressey's Fraud Triangle. The fraud triangle seeks to explain what
must be present for fraud to occur. The fraud triangle describes the
probability of financial reporting fraud which depends on three
factors: incentives/pressures, opportunities, and attitudes/rationalization of financial statement fraud [37,38]. The fraud triangle is
depicted in Fig. 1, and it is discussed below.
Fig. 1. Components of the fraud triangle.
When financial stability or profitability is threatened by economic,
industry, or entity operating conditions, or excessive pressure exists
for management to meet debt requirements, or personal net worth is
materially threatened, the management will face the incentives or
pressures to resort to fraudulent practice. Pressure can come in the
form of peer pressure, living a lavish lifestyle, a drug addiction, and
many other aspects that can influence someone to seek gains via
financial fraud. When there are significant accounting estimates that
are difficult to verify, or there is weak oversight over financial reporting, or high turnover or an ineffective internal accounting audit, there are
opportunities for fraud. For instance, a cashier can steal money out of
the cash register because it is there. If the cashier is required to drop
all cash into an underground safe for which he does not know the
combination, opportunity will not exist. When inappropriate or
inefficient communication and support of the entity's values is
evident, or a history of violation of laws is known, or management
has a practice of making overly aggressive or unrealistic forecasts,
then there are risks of fraudulent reporting due to attitudes/
rationalization. Rationalization is a grey area in the fraud triangle.
Opportunities and incentives exist or they don't. Rationalization
depends on the individuals and the circumstances they are facing
[19,37]. Understanding the fraud triangle is essential to evaluating
financial fraud. When someone is able to grasp the basic concept of
the fraud triangle, they are able to better understand financial frauds,
how they occur, why they occur, and what to do to stop them.
1.2. Variables related to financial statement fraud
Based on experts' knowledge, intuition, and previous research, it is
important to identify some key financial items that are relevant for
detection of financial statement fraud. These are listed below:
• Z-score: The Z-score was developed by Altman [2]. It is a formula for measuring the financial health of a company and works as a
tool to predict bankruptcy. It is used to detect financial statement fraud as well. The formula for the Z-score for public companies is given by (a small numerical sketch of this formula follows the list below):

Z-score = 1.2 × (Working capital / Total assets) + 1.4 × (Retained earnings / Total assets) + 3.3 × (Earnings before interest and taxes / Total assets) + 0.6 × (Market value of equity / Book value of total liabilities) + 0.999 × (Sales / Total assets)
• A high debt structure increases the likelihood of financial fraud as it
shifts the risk from equity owner to the debt owner. So the financial
ratios related to debt structure such as (i) Total debt/Total assets
and (ii) Debt/Equity need to be carefully considered when searching
for indications of fraud.
• An abnormal value reported as a measure of continuous growth
such as sales to growth ratio is also a factor that may be indicative of
fraudulent financial practice.
• Many items of the financial statements such as Accounts receivable,
Inventories, Gross margin etc. can be estimated to some degree
using subjective methods and different accounting methods can
often lead to different values even for the same company.
• According to previous research, many other financial ratios can be
considered for fraud detection, such as Net profit/Total assets,
Working capital/Total assets, Net profit/Sales, Current assets/
Current liabilities and so on.
• The tenure of CEO and CFO: According to the auditors’ experience
and previous research, the high turnover of CEO and CFO may
indicate the existence of financial fraud in the company.
• Some qualitative variables such as previous auditor's qualifications
can be considered to determine the likelihood of fraudulent bookkeeping.
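As noted in the Z-score item above, here is a minimal numerical sketch of the Z-score computation; the input figures are invented, and the conventional cut-offs mentioned in the comment are general background rather than anything claimed by the paper.

```python
# Hypothetical figures only; not from the paper's dataset.
def altman_z_score(working_capital, retained_earnings, ebit,
                   market_value_equity, total_liabilities, sales, total_assets):
    """Altman Z-score for public companies, as given in the formula above."""
    return (1.2 * working_capital / total_assets
            + 1.4 * retained_earnings / total_assets
            + 3.3 * ebit / total_assets
            + 0.6 * market_value_equity / total_liabilities
            + 0.999 * sales / total_assets)

z = altman_z_score(working_capital=250.0, retained_earnings=400.0, ebit=180.0,
                   market_value_equity=900.0, total_liabilities=600.0,
                   sales=1200.0, total_assets=1500.0)
print(round(z, 2))  # prints 2.67; values near or above ~3 are conventionally read as healthy
```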
Data mining has been applied in many aspects of financial analysis.
A few areas where data mining techniques have already been used include bankruptcy prediction, credit card approval, loan decision,
money-laundering detection, stock analysis, etc. However, research
related to the use of data mining for detection of financial statement
fraud is limited. The main objective of this research is to predict the
occurrence of financial statement fraud in companies as accurately as
possible using intelligent techniques. Financial accounting fraud can
be detected by a human expert by using his/her experiential/
judgemental knowledge, provided he/she has sufficient expertise.
However, in this case, human bias cannot be eliminated and the
judgments tend to be subjective. Hence, we resort to data-driven
approaches, which solely rely on the past data of fraudulent and
healthy companies and their financial ratios. When data mining
techniques (most of them barring a few statistical ones are artificial
intelligence based) are employed to solve these problems, they work
in an objective way by sifting through the records of fraudulent and
healthy companies. In the process, they discover knowledge which
can be used to predict whether a company at hand will perpetrate
financial accounting fraud in future. Data mining techniques have
another advantage in that they can handle a large number of records
and financial ratios efficiently. According to Kirkos et al. [23], artificial
intelligence methods have the theoretical advantage that they do not
impose arbitrary assumptions on the input variables. An auxiliary aim
of this research is to select the most important financial items that can
explain the financial statement fraud. The results obtained using this
research will be useful for auditors engaged in the prediction of
financial statement fraud. In fact, emerging companies can carefully monitor these financial statements to gain long-term advantages in the competitive market. Further, it will be useful for investors who
plan to invest in such companies.
The rest of the paper is organized as follows. Section 2 reviews the
research done in the area of financial statement fraud detection.
Section 3 provides an overview of the data mining techniques that are
used in this paper. Section 4 describes the feature selection phase of data mining. Section 5 presents the results and discusses the implications of these results. Finally, Section 6 concludes the paper.
2. Literature review
There has been a limited use of data mining techniques for
detection of financial statement fraud. The data mining techniques
used include decision trees, neural networks (NN), Bayesian belief
networks, case based reasoning, fuzzy rule-based reasoning, hybrid
methods, logistic regression, and text mining. Extant research in this
direction is reviewed in the following paragraphs.
According to Kirkos et al. [23], some estimates stated that fraud
cost US business more than $400 billion annually. Spathis et al. [42]
compared multi-criteria decision aids with statistical techniques such
as logit and discriminant analysis in detecting fraudulent financial
statements. A novel financial kernel for the detection of management
fraud is developed using support vector machines on financial data by
Cecchini et al. [9]. Huang et al. [20] developed an innovative fraud
detection mechanism on the basis of Zipf's Law. The purpose of this
technique is to assist auditors in reviewing the overwhelming
volumes of datasets and identifying any potential fraud records.
Kirkos et al. [23] used the ID3 decision tree and Bayesian belief
network to detect financial statement fraud successfully.
Sohl and Venkatachalam [41] used back-propagation NN for the
prediction of financial statement fraud. There are other researchers
who used different NN algorithms to detect financial reporting fraud.
Cerullo and Cerullo [10] explained the nature of fraud and financial
statement fraud along with the characteristics of NN and their
applications. They illustrated how NN packages could be utilized by
various firms to predict the occurrence of fraud. Calderon and Cheh [8]
examined the efficacy of NN as a potential enabler of business risk
based auditing. They employed different methods using NN as a tool
for research in the auditing and risk assessment domain. Further, they
identified several opportunities for future research that include
methodological issues related to NN modeling as well as specific
issues related to the application of NN for business risk assessment.
Koskivaara [25] investigated the impact of various preprocessing
models on the forecast capability of NN when auditing financial
accounts. Further, Koskivaara [26] proposed NN based support
systems as a possible tool for use in auditing. He demonstrated that
the main application areas of NN were detection of material errors,
and management fraud. Busta and Weinberg [7] used NN to
distinguish between ‘normal’ and ‘manipulated’ financial data. They
examined the digit distribution of the numbers in the underlying
financial information. The data analysis is based on Benford's law, which demonstrates that the digits of naturally occurring numbers are distributed in a predictable and specific pattern. They tested six
NN designs to determine the most effective model. In each design, the
inputs to the NN were the different subsets of the 34 variables. The
results showed that NN were able to correctly classify 70.8% of the
data on an average.
Feroz et al. [15] observed that the relative success of the NN
models was due to their ability to ‘learn’ what was important. The
perpetrators of financial reporting frauds had incentives to appear
prosperous as evidenced by high profitability. In contrast to
conventional statistical models replete with assumptions, the NN
used adaptive learning processes to determine what was important
in predicting targets. Thus, the NN approach was less likely to be
affected by accounting manipulations. The NN approach was well
suited to predicting the possible fraudsters because the NN ‘learnt’ the
characteristics of reporting violators despite managers’ intent to
obfuscate misrepresentations. Brooks [6] also applied various NN
models to detect financial statement fraud with great success. Fanning
and Cogger [13] used NN (AutoNet) for detecting management fraud.
The study offered an in-depth examination of important publicly
available predictors of fraudulent financial statements. The study
reinforced the efficiency of AutoNet in providing empirical evidence
regarding the merits of suggested red flags for fraudulent financial
statements. Ramamoorti et al. [37] provided an overview of the multilayer perceptron architecture and compared it with a Delphi study.
They found that internal auditors could benefit from using NN for
assessing risk. Zhang et al. [46] conducted a review of the published
papers that reported the use of NN in forecasting during the time
period 1988–98.
Aamodt and Plaza [1] and Kotsiantis et al. [27] used case based
reasoning to identify the fraudulent companies. Further, Deshmukh
and Talluru [12] demonstrated the construction of a rule-based fuzzy
reasoning system to assess the risk of management fraud and
proposed an early warning system by finding out 15 rules related to
the probability of management fraud. Pacheco et al. [34] developed a
hybrid intelligent system consisting of NN and a fuzzy expert system
to diagnose financial problems. Further, Magnusson et al. [30] used
text mining and demonstrated that the language of quarterly reports
provided an indication of the change in the company's financial status.
A rule-based system that consisted of too many if–then statements
made it difficult for marketing researchers to understand key drivers
of consumer behavior [22]. Variable selection was used in order to
choose a subset of the original predictive variables by eliminating
variables that were either redundant or possessed little predictive
information.
3. Methodology
The dataset used in this research was obtained from 202
companies that were listed in various Chinese stock exchanges, of
which 101 were fraudulent and 101 were non-fraudulent companies.
The data also contained 35 financial items for each of these
companies. Table 1 lists these financial items. Of these, 28 were
financial ratios reflecting liquidity, safety, profitability, and efficiency
of companies. We performed a log transformation on the entire dataset to reduce the wide range of its values. Then we normalized each of the independent
variables of the original dataset during the data preprocessing stage.
Furthermore, ten-fold cross-validation is performed to improve the
reliability of the result. Then, we analyzed the dataset using six data
mining techniques including MLFF, SVM, GP, GMDH, LR, and PNN. The
block diagram in Fig. 2 depicts the data flow. We chose the six
techniques because MLFF, GMDH and PNN fall under the NN category,
SVM comes from statistical learning theory, GP is an evolutionary
technique, and logistic regression is a traditional statistical technique
for classification. Thus, these methods had varied backgrounds and different theories to support them. In this way, we ensured that the problem at hand is analyzed by disparate models that had varying degrees of difficulty in implementation and also exhibited varying degrees of performance on different data mining problems. In other words, the problem is studied and analyzed comprehensively from all perspectives.
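The data flow just described (log transformation, per-variable normalization, and ten-fold cross-validation feeding a classifier) can be sketched as follows. This is only an illustrative sketch using scikit-learn as a stand-in for the tools actually used in the paper; the file name fraud_data.csv, the 'fraud' column, the use of log1p, standardization as the normalization step, and logistic regression as the example classifier are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical layout: 202 rows, 35 financial items plus a 0/1 'fraud' label.
df = pd.read_csv("fraud_data.csv")
X = np.log1p(df.drop(columns=["fraud"]).clip(lower=0))   # log transform (log1p tolerates zeros)
y = df["fraud"].values

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    scaler = StandardScaler().fit(X.iloc[train_idx])     # normalize using the training fold only
    clf = LogisticRegression(max_iter=1000)
    clf.fit(scaler.transform(X.iloc[train_idx]), y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(scaler.transform(X.iloc[test_idx]))))
print("mean 10-fold accuracy:", np.mean(scores))
```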
It is observed that some of the independent variables turned out to
be much more important for the prediction purpose whereas some
contributed negatively towards the classification accuracies of different
classifiers. So, a simple statistical technique using the t-statistic is used to
accomplish feature selection on the dataset by identifying the most
significant financial items that could detect the presence of financial
statement fraud. This is described in Section 4. The features having high
t-statistic values were more significant than others. For feature
selection, first we extracted the top 18 features (more than half of
the total 35 financial items) from the original dataset. Then the dataset
with the reduced feature set (only 18 financial items) was fed as input to
the above mentioned classifiers, which resulted in new combinations
such as t-statistic-MLFF, t-statistic-SVM, t-statistic-GP, t-statistic-GMDH, t-statistic-LR, and t-statistic-PNN. In order to conduct further analysis, we then extracted the top 10 features and the same process was repeated, i.e., the dataset with the reduced feature set (only 10 financial items) was fed as input to all of the above classifiers. A brief description of the different data mining techniques used in this research is provided below.
Table 1
Items from financial statements of companies that are used for detection of financial statement fraud.

No.  Financial items
1    Debt
2    Total assets
3    Gross profit
4    Net profit
5    Primary business income
6    Cash and deposits
7    Accounts receivable
8    Inventory/Primary business income
9    Inventory/Total assets
10   Gross profit/Total assets
11   Net profit/Total assets
12   Current assets/Total assets
13   Net profit/Primary business income
14   Accounts receivable/Primary business income
15   Primary business income/Total assets
16   Current assets/Current liabilities
17   Primary business income/Fixed assets
18   Cash/Total assets
19   Inventory/Current liabilities
20   Total debt/Total equity
21   Long term debt/Total assets
22   Net profit/Gross profit
23   Total debt/Total assets
24   Total assets/Capital and reserves
25   Long term debt/Total capital and reserves
26   Fixed assets/Total assets
27   Deposits and cash/Current assets
28   Capitals and reserves/Total debt
29   Accounts receivable/Total assets
30   Gross profit/Primary business profit
31   Undistributed profit/Net profit
32   Primary business profit/Primary business profit of last year
33   Primary business income/Last year's primary business income
34   Accounts receivable/Accounts receivable of last year
35   Total assets/Total assets of last year
3.1. Support vector machines (SVM)
SVM introduced by Vapnik [44] use a linear model to implement
nonlinear class boundaries by mapping input vectors nonlinearly into
a high-dimensional feature space. In the new space, an optimal
separating hyperplane is constructed. The training examples that are
closest to the maximum margin hyperplane are called support
vectors. All other training examples are irrelevant for defining the
binary class boundaries. SVM are simple enough to be analyzed
mathematically. In this sense, SVM may serve as a promising
alternative combining the strengths of conventional statistical
methods that are more theory-driven and easy to analyze, and
machine learning methods that are more data-driven, distribution-free, and robust. Recently, SVM have been used in financial applications such as credit rating, time series prediction, and insurance claim fraud detection. These studies reported that the performance of SVM
is comparable to and even better than other classifiers such as MLFF,
case based reasoning, discriminant analysis, and logistic regression.
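To make the idea concrete, a minimal scikit-learn sketch of an RBF-kernel SVM on synthetic stand-in data is given below; the paper itself ran its SVM in KNIME 2.0.0, so this snippet is only an illustrative substitute.

```python
# Synthetic stand-in data; the RBF kernel maps inputs into a high-dimensional
# feature space where a maximum-margin hyperplane is fitted, as described above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 35))
y_train = rng.integers(0, 2, size=150)          # 0 = non-fraudulent, 1 = fraudulent (toy labels)
X_test = rng.normal(size=(50, 35))

svm = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True).fit(X_train, y_train)
fraud_probability = svm.predict_proba(X_test)[:, 1]
print(fraud_probability[:5])
```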
Fig. 2. Architecture of different classifiers.
3.2. Genetic programming (GP)
GP [28] is an extension of genetic algorithms (GA). It is a search
methodology belonging to the family of evolutionary computation. GP
randomly generates an initial population of solutions. Then, the initial
population is manipulated using various genetic operators to produce
new populations. These operators include reproduction, crossover,
mutation, dropping condition, etc. The whole process of evolving from
one population to the next is called a generation. A high-level description of the GP algorithm comprises the following sequential steps [14]:
• Create a random population of programs, or rules, using the
symbolic expressions provided as the initial population.
• Evaluate each program or rule by assigning a fitness value according
to a predefined fitness function that can measure the capability of
the rule or program to solve the problem.
• Use the reproduction operator to copy existing programs into the
new generation.
• Generate the new population with crossover, mutation, or other
operators from a randomly chosen set of parents.
• Repeat the second to the fourth steps for the new population until a
predefined termination criterion is satisfied, or a fixed number of
generations is completed.
• The solution to the problem is the genetic program with the best
fitness within all generations.
In GP, the crossover operation is achieved by reproduction of two
parent trees. Two crossover points are then randomly selected in the
two offspring trees. Exchanging sub-trees, which are selected
according to the crossover point in the parent trees, generates the
final offspring trees. The offspring trees are usually different from
their parents in size and shape. Then, mutation operation is also
considered in GP. A single parental tree is first reproduced. Then a
mutation point is randomly selected from the reproduction, which
can be either a leaf node or a sub-tree. Finally, the leaf node or the sub-tree is replaced by a new leaf node or a randomly generated sub-tree.
Fitness functions ensure that the evolution goes toward optimization
by calculating the fitness value for each individual in the population.
The fitness value evaluates the performance of each individual in the
population.
GP is guided by the fitness function to search for the most efficient
computer program that can solve a given problem. A simple measure
of fitness [14] is adopted for the binary classification problem and is
given as follows:
Fitness = (No. of samples classified correctly) / (No. of samples used for training during evaluation)
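The fitness measure above can be written directly as code: a candidate program is treated as any callable mapping a feature vector to a 0/1 label. The toy rule, the feature index, and the tiny synthetic data below are purely hypothetical.

```python
def fitness(program, X_train, y_train):
    """Fraction of training samples the candidate program classifies correctly."""
    correct = sum(1 for x, y in zip(X_train, y_train) if program(x) == y)
    return correct / len(y_train)

# A toy "evolved" rule; the feature index and threshold are illustrative only.
toy_rule = lambda x: 1 if x[10] > 0.5 else 0

X_train = [[0.1 * i] * 20 for i in range(10)]   # tiny synthetic feature vectors
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(fitness(toy_rule, X_train, y_train))      # prints 0.9 for this toy rule
```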
The major considerations in applying GP to pattern classification
are:
• GP based techniques are free of the distribution of the data, and so
no a priori knowledge is needed about the statistical distribution of
the data.
• GP can directly operate on the data in its original form.
• GP can detect the underlying but unknown relationship that exists
among data items and express it as a mathematical expression.
• GP can discover the most important discriminating features of a
class during the training phase.
3.3. Multi-layer feedforward neural network (MLFF)
MLFF networks are among the most common NN structures, as they are simple and effective, and have found a home in a wide assortment of machine learning applications. An MLFF network starts with nodes arranged in
three layers—the input, hidden, and output layers. The input and
output layers serve as nodes to buffer input and output for the model,
respectively, and the hidden layer serves to provide a means for input
relations to be represented in the output. Before any data is passed to
the network, the weights for the nodes are random, which has the
effect of making the network much like a newborn's brain—developed
but without knowledge. MLFF are feed-forward NN trained with the
standard back-propagation algorithm. They are supervised networks
so they require a desired response to be trained. They learn how to
transform input data into a desired response. So they are widely used
for pattern classification and prediction. A multi-layer perceptron is
made up of several layers of neurons. Each layer is fully connected to
the next one. With one or two hidden layers, they can approximate
virtually any input–output map. They have been shown to yield
accurate predictions in difficult problems [39].
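A minimal sketch of such a network follows, using scikit-learn's MLPClassifier with one hidden layer and back-propagation training on synthetic stand-in data; the paper itself trained its MLFF models in Neuroshell 2.0, so this is only an illustrative substitute.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(202, 35))                  # synthetic stand-in for the 202 x 35 dataset
y = rng.integers(0, 2, size=202)

# One hidden layer; weights start random and are adjusted by back-propagation.
mlff = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                     max_iter=2000, random_state=0)
mlff.fit(X, y)
print("training accuracy:", mlff.score(X, y))
```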
3.4. Group method of data handling (GMDH)
GMDH was introduced by Ivakhnenko [21] in 1966 as an inductive
learning algorithm for modeling complex systems. It is a self-organizing approach that tests increasingly complicated models and
evaluates them using some external criterion on separate parts of the
data sample. GMDH is partly inspired by research in perceptrons and
learning filters. GMDH has influenced the development of several
techniques for synthesizing (or ‘self-organizing’) networks of polynomial nodes. GMDH attempts a hierarchic solution by trying out
many simple models, retaining the best of these models, and building
on them iteratively to obtain a composition (or feed-forward
network) of functions as the model. The building blocks of GMDH,
or polynomial nodes, usually have the quadratic form:

z = w0 + w1 x1 + w2 x2 + w3 x1² + w4 x2² + w5 x1 x2
where x1 and x2 are inputs, the wi are the components of the coefficient (or weight) vector w, and z is the node output. The coefficients are determined by solving
the linear regression equation with z = y, where y represents the
response vector. GMDH develops its model on a data set. The data set, including independent variables (x1, x2,..., xn) and one dependent variable y, is split into a training and a testing set. During the process of learning, a feed-forward multilayer NN is developed through the following steps:
• In the input layer of the network n units with an elementary transfer
function y = xi are constructed. These are used to provide values of
independent variables from the learning set to the successive layers
of the network.
• When constructing a hidden layer an initial population of units is
generated. Each unit corresponds to the Ivakhnenko polynomial form:
y = a + b x1 + c x2 + d x1² + e x1 x2 + f x2²   or   y = a + b x1 + c x2 + d x1 x2
where y is an output variable; x1, x2 are two input variables; and a, b,… f
are parameters.
• Parameters of all units in the layer are estimated using the learning set.
• The mean square error between the dependent variable y and the
response of each unit is computed for the testing set.
• Units are sorted in terms of the mean square error and just a few
units with minimal error survive. The rest of the units are deleted.
This step guarantees that only units with a good ability for
approximation are chosen.
• Next the hidden layers are constructed so that the mean square
error of the best unit decreases.
• Output of the network is considered as the response of the best unit
in the layer with the minimal error.
Table 2
Top 18 items selected from financial statements of companies by t-statistic based feature selection.

No.  Financial items
1    Net profit
2    Gross profit
3    Primary business income
4    Primary business income/Total assets
5    Gross profit/Total assets
6    Net profit/Total assets
7    Inventory/Total assets
8    Inventory/Current liabilities
9    Net profit/Primary business income
10   Primary business income/Fixed assets
11   Primary business profit/Primary business profit of last year
12   Primary business income/Last year's primary business income
13   Fixed assets/Total assets
14   Current assets/Current liabilities
15   Capitals and reserves/Total debt
16   Long term debt/Total capital and reserves
17   Cash and deposits
18   Inventory/Primary business income
The GMDH network learns in an inductive manner and builds a
function (called a polynomial model), that results in the minimum
error between the predicted value and expected output. The majority
of GMDH networks use regression analysis for solving the problem.
The first step is to decide the type of polynomial that the regression
will find. The initial layer is simply the input layer. The first layer is
created by computing regressions of the input variables and then
choosing the best ones. The second layer is created by computing
regressions of the values in the first layer along with the input
variables. This means that the algorithm essentially builds polynomials of polynomials. Again, only the best are chosen by the algorithm.
These are called survivors. This process continues until a pre-specified
selection criterion is met.
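A minimal sketch of one building block, a single polynomial node fitted by least squares as in the quadratic form given above, is shown below on synthetic data. A full GMDH network would generate many such nodes per layer, keep the best few on a held-out split, and stack further layers, as described in this section.

```python
import numpy as np

def fit_polynomial_node(x1, x2, y):
    """Return the six coefficients w0..w5 of one Ivakhnenko quadratic node."""
    design = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w

def node_output(w, x1, x2):
    return w[0] + w[1]*x1 + w[2]*x2 + w[3]*x1**2 + w[4]*x2**2 + w[5]*x1*x2

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = 1.0 + 2.0*x1 - 0.5*x2*x2 + rng.normal(scale=0.1, size=200)   # synthetic target
w = fit_polynomial_node(x1, x2, y)
print("fitted coefficients:", np.round(w, 2))
```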
3.5. Logistic regression (LR)
According to Panik [35], when dealing with logistic regression, the
response variable is taken to be dichotomous or binary (it takes on only
two possible values), i.e., yi =0 or 1 for all i=1,...,n. For instance, we can
have a situation in which the outcome of some process of observation is
either a success (we record a 1) or failure (we record a 0), or we observe
the presence (1) or absence (0) of some characteristic or phenomenon. In
addition, dichotomous variables are useful for making predictions, e.g., we
may ask the following: Will an individual make a purchase of a particular
item in the near future?
Here yi = 1 if yes, and yi = 0 if no.
According to Williams et al. [45], LR is a commonly used approach
for performing binary classification. It learns a set of parameters, {w0,
w}, that maximizes the likelihood of the class labels for a given set of
training data. Let xi ∈ R^d denote a (column) vector of d features representing the ith data point, and yi ∈ {0, 1} denote its
corresponding class label (e.g., clutter or mine). For a labeled
(training) data point, yi is known; for an unlabeled (testing) data
point, yi is unknown. Under the LR model, the probability of label
yi = 1 given xi is given by Eq. (1):
ψi ≡ p(yi = 1 | xi) = exp(w0 + w^T xi) / (1 + exp(w0 + w^T xi))   (1)
where w0 ∈ R and w ∈ R^d are the LR intercept and coefficients, respectively. For a set of N independent labeled data points {(xi, yi)}, i = 1, ..., N, the log-likelihood of the class labels can be written as Eq. (2):

l(w0, w) = Σ_{i=1}^{N} [(1 − yi) log(1 − ψi) + yi log ψi]   (2)
To maximize the log-likelihood in Eq. (2), a standard optimization
approach can be employed, since the gradient (and Hessian) of Eq. (2)
with respect to {w0, w} can be readily calculated. Once the LR
parameters {w0, w} have been learned, the probability that an
unlabeled testing data point xi belongs to each class can be obtained
using Eq. (1).
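The sketch below writes Eqs. (1) and (2) directly as code on synthetic data; in practice the maximization would be handed to a standard optimizer (the paper used KNIME 2.0.0 for LR), so this is only a didactic illustration.

```python
import numpy as np

def lr_probability(w0, w, X):
    """Eq. (1): p(y = 1 | x) = exp(w0 + w'x) / (1 + exp(w0 + w'x))."""
    z = w0 + X @ w
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w0, w, X, y):
    """Eq. (2): sum of (1 - y) log(1 - psi) + y log(psi)."""
    psi = np.clip(lr_probability(w0, w, X), 1e-12, 1 - 1e-12)
    return np.sum((1 - y) * np.log(1 - psi) + y * np.log(psi))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(202, 10)), rng.integers(0, 2, size=202)   # synthetic data
print(log_likelihood(0.0, np.zeros(10), X, y))   # equals 202 * log(0.5) for the null model
```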
3.6. Probabilistic neural network (PNN)
PNN is a feed-forward NN involving a one pass training algorithm
used for classification and mapping of data. PNN was introduced by
Specht [43] in 1990. It is a pattern classification network based on the
classical Bayes classifier, which is statistically an optimal classifier that
seeks to minimize the risk of misclassification. Any pattern classifier
places each observed data vector x = [x1, x2, x3 . . . xN]T in one of the
predefined classes ci, i = 1, 2,..., m where m is the number of possible
classes. The effectiveness of any classifier is limited by the number of
data elements that the vector x can have and the number of possible
classes m. The classical Bayes pattern classifier [40] implements the
Bayes conditional probability rule that the probability P(ci|x) of x
being in class ci is given by:
P(ci | x) = P(x | ci) P(ci) / Σ_{j=1}^{m} P(x | cj) P(cj)   (3)
where P(x|ci) is the conditional probability density function of x given class ci, and P(cj) is the probability of drawing data from class cj. Vector x is said to belong to a particular class ci if P(ci|x) > P(cj|x), ∀ j = 1, 2,..., m and j ≠ i. This input x is fed into each of the patterns in the pattern
layer. The summation layer computes the probability P(ci|x) that the
given input x is included in each of the classes ci that is represented by
the patterns in the pattern layer. The output layer selects the class for
which the highest probability is obtained in the summation layer. The
input is then made to belong to this class. The effectiveness of the
network in classifying input vectors depends on the value of the
smoothing parameter.
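A minimal sketch of the pattern, summation, and output layers described above follows, using Gaussian kernels with a single smoothing parameter sigma and equal class priors; the data are synthetic, and the paper itself used Neuroshell 2.0 for PNN.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    preds = []
    classes = np.unique(y_train)
    for x in X_test:
        # pattern layer: Gaussian kernel of x against every stored training pattern
        kernels = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2.0 * sigma ** 2))
        # summation layer: average kernel response per class; output layer: argmax
        class_scores = [kernels[y_train == c].mean() for c in classes]
        preds.append(classes[int(np.argmax(class_scores))])
    return np.array(preds)

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y_train = np.array([0] * 50 + [1] * 50)
print(pnn_predict(X_train, y_train, X_train[:5], sigma=1.0))   # expected: class 0 labels
```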
4. Feature selection
Feature selection is critical to data mining and knowledge based
authentication. The problem of feature selection has been well studied
in areas where datasets with a large number of features are available,
including machine learning, pattern recognition, and statistics.
Piramuthu [36] observed that about 80% of the resources in a majority
of data mining applications are spent on cleaning and preprocessing
the data, and developed a new feature selection method based on
Hausdorff distance for analyzing web traffic data. Feature selection is
of paramount importance for any learning algorithm; when it is done poorly (i.e., a poor set of features is selected), it may lead to problems related to incomplete information, noisy or irrelevant features, and a suboptimal set/mix of features, among others [45]. Mladenic
and Grobelnik [31] reviewed various feature selection methods in the
context of web mining. Chen and Liginlal [11] developed a maximum
entropy based feature selection technique for knowledge based
authentication.
Fig. 3. Architecture of different classifiers after feature selection (dataset with all features → t-statistic → top 10/18 features → MLFF/SVM/GP/GMDH/LR/PNN → final output).
In this study, we employed a feature selection phase by using the simple t-statistic technique. The t-statistic is one of the efficient feature selection techniques. The features are ranked according to the formula
shown below [16,29]. In fact, Liu et al. [29] were the first to propose
t-statistic for the purpose of feature selection in the field of
bioinformatics.
t-statistic = |μ1 − μ2| / sqrt(σ1²/n1 + σ2²/n2)   (4)
where μ1 and μ2 represent the means of the samples of fraudulent
companies and non-fraudulent companies for a given feature
respectively, σ1 and σ2 represent the standard deviation of the
samples of fraudulent companies and non-fraudulent companies for a
given feature respectively. n1 and n2 represent the number of samples
of fraudulent companies and non-fraudulent companies for a given
feature. The t-statistic values are computed for each feature and the
top 18 features with the highest t-statistic values are considered in the
first case and the top 10 features are considered in the second case. A
high t-statistic value indicates that the feature can highly discriminate
between the samples of fraudulent and non-fraudulent companies.
The top 18 financial features that are selected by the t-statistic based
feature selection are shown in Table 2. The feature subset formed with
the top 18 features is fed as input to MLFF/SVM/GP/GMDH/LR/PNN for
classification purpose in the first case. Similarly, the feature subset
formed with the top 10 features is fed as input to MLFF/SVM/GP/
GMDH/LR/PNN for classification purpose in the second case. The block
diagram for all these combinations is shown in Fig. 3. Ten-fold cross-validation is used to ensure better validity of the experiments. It
should be noted that the t-statistic is employed for feature selection
for each fold separately. It is observed that the same set of features did
not turn out to be best in each fold. Hence, we followed a frequency
based approach, whereby, the frequency of occurrence of each of the
features in top slots is computed and the features are then sorted in
the descending order of the frequency of occurrences. In this manner,
we selected the top 10 and top 18 features and reported them in
Table 2.
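A minimal sketch of this fold-wise ranking and frequency-based aggregation is given below on synthetic stand-in data; the per-fold classifier training that would follow the selection step is omitted.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import StratifiedKFold

def t_statistic(X, y):
    """Eq. (4): |mu1 - mu2| / sqrt(s1^2/n1 + s2^2/n2) for every feature."""
    X1, X2 = X[y == 1], X[y == 0]
    return np.abs(X1.mean(axis=0) - X2.mean(axis=0)) / np.sqrt(
        X1.var(axis=0, ddof=1) / len(X1) + X2.var(axis=0, ddof=1) / len(X2))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(202, 35)), rng.integers(0, 2, size=202)   # synthetic stand-in data

counts = Counter()
for train_idx, _ in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    top = np.argsort(t_statistic(X[train_idx], y[train_idx]))[::-1][:18]
    counts.update(top.tolist())
top18 = [f for f, _ in counts.most_common(18)]   # features most often ranked in the top 18
print(top18)
```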
5. Results and discussion
The dataset analyzed in this paper comprised 35 financial items for
202 companies, of which 101 were fraudulent and 101 were non-fraudulent. Since the financial items had a wide range, we first
performed natural logarithmic transformation, and then normalization during the data preprocessing phase. We employed the GP as
implemented in the tool Discipulus (available at www.rmltech.com
and downloaded on 20th August, 2008). For MLFF, GMDH, and PNN,
we employed Neuroshell 2.0 [33] and for SVM and LR we used KNIME
2.0.0 [24].
The sensitivity is the measure of the proportion of the number of
fraudulent companies predicted correctly as fraudulent by a particular
model to the total number of actual fraudulent companies. The
specificity is the measure of the proportion of the number of non-fraudulent companies predicted as non-fraudulent by a model to the
total number of actual non-fraudulent companies. In all cases, we
presented the average accuracies, sensitivities, specificities, and area
under the Receiver Operating Characteristic curve (AUC) for the test
data, averaged over 10-folds. We ranked the classifiers based on AUC.
First, the results of the 10-fold cross-validation method for the stand-alone techniques, viz. MLFF, SVM, GP, GMDH, LR, and PNN, without
feature selection are presented in Table 3. From Table 3 we observe
that PNN with 98.09% accuracy and 98.09% sensitivity outperformed
all other classifiers (see Table 3).
GP yielded the next best result with 94.14% accuracy and 95.09%
sensitivity. We also observe that PNN is the best classifier among all
others in terms of AUC as well. The best results obtained by Bose and
Wang [5], who employed canonical discriminant analysis (CDA),
classification and regression tree (C&RT) and exhaustive pruning NN
on the same dataset are also presented in Table 3 for ease of
comparison. From Table 3 we can observe that the results obtained
in this study are always superior to the results obtained by them for all
cases, except SVM and LR.
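For concreteness, the following sketch computes sensitivity, specificity, and AUC for one hypothetical fold; note that the AUC figures in Tables 3–8 appear to be reported on a 0–10,000 scale (e.g., 9809 alongside an accuracy of 98.09 corresponds to roughly 0.98 on the usual 0–1 scale), which is our reading of the tables rather than an explicit statement in the text.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical fold: true labels and predicted fraud probabilities.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)     # fraudulent firms correctly flagged
specificity = tn / (tn + fp)     # non-fraudulent firms correctly cleared
auc = roc_auc_score(y_true, y_score)
print(sensitivity, specificity, auc)
```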
As the next step, we used t-statistic for feature selection and
extracted the most important features. First, we considered the top
18 features for constructing the reduced feature subset. Later, this
feature subset is fed to all the above classifiers for the purpose of
classification. The average results of all the classifiers over all folds
with 18 features are presented in Table 4. From Table 4 we observe
that GP outperformed other classifiers with 92.68% accuracy and
90.55% sensitivity, whereas PNN came close behind with 95.64%
accuracy and 91.27% sensitivity (see Table 4). Furthermore, results based on AUC indicated that GP yielded the highest AUC, followed by PNN, whose AUC was marginally lower.
Table 3
Average results of dataset with all features using 10-fold cross-validation.

Classifier                  Accuracy  Sensitivity  Specificity  AUC
MLFF                        78.36     80.21        76.35        7827.90
SVM                         70.41     55.43        84.13        6978.00
GP                          94.14     95.09        93.05        9407.10
GMDH                        93.00     91.46        95.18        9331.85
LR                          66.86     63.32        70.66        6699.10
PNN                         98.09     98.09        98.09        9809.00
CDA [5]                     71.37     61.96        80.77        7136.5
C&RT [5]                    72.38     72.40        72.36        7238
Exhaustive pruning NN [5]   77.14     80.83        73.45        7714
Table 4
Average results of dataset with reduced features (top 18 features selected by t-statistic) and using 10-fold cross-validation.

Classifier  Accuracy  Sensitivity  Specificity  AUC
MLFF        78.77     76.98        81.28        7912.80
SVM         73.41     72.07        75.04        7355.55
GP          92.68     90.55        95.27        9290.95
GMDH        90.68     93.46        88.34        9089.95
LR          70.36     62.91        78.88        7089.50
PNN         95.64     91.27        94.16        9271.75
Table 5
Average results of dataset with reduced features (top 10 features selected by t-statistic) and using 10-fold cross-validation.

Classifier  Accuracy  Sensitivity  Specificity  AUC
MLFF        75.32     67.24        82.79        7501.65
SVM         72.36     73.60        69.68        7164.35
GP          89.27     85.64        93.16        8939.95
GMDH        88.14     87.44        89.25        8834.40
LR          70.86     65.23        76.46        7084.45
PNN         90.77     87.53        94.07        9079.85

Table 7
t-statistic values of average AUCs of GP compared to that of other classifiers with (top 18 features) feature selection.

Classifier compared  t-statistic at 10% level of significance
MLFF                 5.13*
SVM                  5.28*
GMDH                 0.69
LR                   6.40*
PNN                  0.08
This makes us infer that the selected feature subsets have a high discriminatory power and the ‘left-over’ features have very little to contribute to the success of financial fraud detection.
Furthermore, in order to conduct an exhaustive study over this
dataset, in the second set of experiment we considered only the top
10 features (based on the values of the t-statistics) for constructing
the reduced feature subset. The top 10 features can be seen in the first
ten rows of Table 2. We repeated the experiments as in the first case.
The average results for all the classifiers over all folds with 10 features
are presented in Table 5. From Table 5 we observe that PNN
outperformed other classifiers with 90.77% accuracy and 87.53%
sensitivity (see Table 5), whereas
GP came second with 89.27% accuracy and 85.64% sensitivity.
Moreover, results based on the AUC indicated that PNN yielded the highest AUC, followed by GP, whose AUC was only marginally lower.
In order to find out whether the difference in average AUCs is
statistically significant or not, we conducted a t-test between the top
performer and the remaining classifiers (i) without feature selection,
(ii) with feature selection including top 18 features, and (iii) with
feature selection including top 10 features. In case of the dataset
without feature selection, the t-statistic values between the average
AUCs obtained by PNN and that of other classifiers are presented in
Table 6. From Table 6 we observe that t-statistic values are more
than the critical value of the test statistic, which is 1.73 at the 10%
level of significance. Thus, we infer that PNN significantly outperformed other classifiers without feature selection. In case of the
dataset with feature selection and considering only the top 18
features, the t-statistic values between the average AUCs obtained
by GP and that of other classifiers are presented in Table 7. From this
table we can observe that t-statistic values are more than 1.73 in
case of MLFF, SVM and LR, whereas those values are less than 1.73 in
case of PNN and GMDH. From these results we can say that the GP
significantly outperformed all classifiers except GMDH and PNN.
Considering only the top 10 features, the t-statistic values between
the average AUCs obtained by PNN and that of other classifiers are
presented in Table 8. From this table we can observe that t-statistic
values are more than 1.73 in case of MLFF, SVM and LR, whereas
those values are less than 1.73 in case of GP and GMDH. From these
results we can say that PNN outperformed all classifiers except GP
and GMDH.
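A sketch of such a comparison is given below with synthetic fold-wise AUCs; the paper does not spell out the exact test variant, but an independent two-sample t-test with ten folds per classifier (18 degrees of freedom) matches the quoted critical value of 1.73 at the 10% level, so that is what is assumed here.

```python
import numpy as np
from scipy import stats

# Synthetic per-fold AUCs for two classifiers (stand-ins, not the paper's values).
auc_pnn = np.array([0.99, 0.97, 0.98, 0.99, 0.96, 0.98, 0.99, 0.97, 0.99, 0.98])
auc_lr  = np.array([0.68, 0.65, 0.70, 0.66, 0.64, 0.69, 0.67, 0.66, 0.68, 0.67])

t_value, p_value = stats.ttest_ind(auc_pnn, auc_lr)   # assumed independent two-sample test
print(f"t = {t_value:.2f}, significant at 10%: {t_value > 1.73}")
```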
When we take a close look at the top 10 and top 18 features shown
in Table 2, we observe that most of these features are associated with the firm's ability to generate profit or income.
Table 6
t-statistic values of average AUCs of PNN compared to that of other classifiers without feature selection.

Classifier compared  t-statistic at 10% level of significance
MLFF                 7.84*
SVM                  15.66*
GP                   2.11*
GMDH                 2.49*
LR                   11.58*
The * indicates that the result is statistically significant.
Among the top 10
features, eight features are associated with the profitability of the
firm. A closer look reveals that among the top 10 features, four are
associated with primary business income, and five are associated with
either gross or net profit earned by the firm. This indicated that a
fraudulent firm usually tried to inflate the profit or the income figures
in order to create an impressive financial statement. Any unusual
income or profit figures should be a reason for suspicion and further
investigation by an auditor.
When the present dataset of 35 dimensions (financial items) is
visualized using the tool Neucom [32] in the principal component
dimensions by plotting the first principal component on x-axis and
the second principal component on y-axis, we noticed three
predominant clusters and nine outliers. This provided a possible
reason for the spectacular performance of PNN because PNN is
tolerant to outliers [4]. While comparing the dataset with and
without feature selection, it is noticed that even after reducing the
number of features to almost one third of the original number, the
change in accuracies is at most 5% in all the cases except PNN, where
the accuracies are reduced by 8%. From this we can infer that the t-statistic is a simple and efficient feature selection technique for
picking up very significant features that ensured better accuracies.
Based on our experiments, we conclude that PNN without feature
selection outperformed methods such as MLFF, SVM, GP, GMDH, and
LR. After feature selection, GP performed well compared to all other
techniques, and PNN yielded marginally lower accuracy when the top 18 features are selected. Similarly, PNN outperformed all other techniques when the top 10 features are selected. Also, we concluded
that our results are much superior to an earlier study on the same
dataset.
It should be noted that while all the techniques have equal cost,
the technique that is preferred and recommended is totally dictated
by the dataset at hand. Since accuracy is a major concern for financial
analysts, we should select the technique that yields fewer misclassifications and consumes less time. This is because the
performance of all of these techniques depends on the dataset on
which they are used. Having said that, everything else (i.e.,
accuracies, sensitivity, specificity, etc.) being equal, we should select
that technique which is less cumbersome, easy to understand, and
easy to implement.
Table 8
t-statistic values of average AUCs of PNN compared to that of other classifiers with (top 10 features) feature selection.

Classifier compared  t-statistic at 10% level of significance
MLFF                 5.36*
SVM                  5.69*
GP                   0.41
GMDH                 0.83
LR                   6.35*
The * indicates that the result is statistically significant.
6. Conclusion and future research directions
This paper presents the application of intelligent techniques to
predict financial statement fraud in companies. The dataset consisting
of 202 Chinese companies is analyzed using the stand-alone
techniques like MLFF, SVM, GMDH, GP, LR, and PNN. Then, t-statistic
is used for feature subset selection and top 18 features are selected in
the first case and top 10 features are selected in the second case. With
the reduced feature subset the classifiers MLFF, SVM, GMDH, GP, LR,
and PNN are invoked again. Results based on AUC indicated that the
PNN was the top performer followed by GP which yielded marginally
less accuracies in most of the cases. Also, the results obtained in this
study are better than those obtained in an earlier study on the same
dataset. Ten-fold cross-validation is performed throughout the study.
Prediction of financial fraud is extremely important as it can save huge
amounts of money from being embezzled. Our study is an important
step in that direction that highlights the use of data mining for solving
this serious problem.
With regards to the future research directions, we can extend this
work by extracting ‘if–then’ rules from different classifiers. These
rules can be helpful for easy understanding of the prediction process
for the end user because they make the knowledge learnt by these
techniques transparent. This type of knowledge elicitation can help in
providing early warning. In addition to the data mining techniques
used in this research, hybrid data mining techniques that combine
two or more classifiers can be used on the same dataset. Also, text
mining algorithms for sentiment analysis of the textual description of
the financial statements can be used together with data mining
algorithms for assessing the financial items in the financial statements
to provide better prediction of financial statement fraud.
Acknowledgments
We are very thankful to Mr. Frank Francone for giving us
permission to use the Discipulus tool (demo version) for conducting
various numerical experiments reported in this paper. We want to
thank the three anonymous reviewers for their insightful comments
which helped to improve the quality of this paper.
References
[1] A. Aamodt, E. Plaza, Case-based reasoning: foundational issues, methodological
variations, and system approaches, Artificial Intelligence Communications 7 (1)
(1994) 39–59.
[2] E.I. Altman, Financial ratios, discriminant analysis and prediction of corporate
bankruptcy, The Journal of Finance 23 (4) (1968) 589–609.
[3] W.H. Beaver, Financial ratios as predictors of failure, Journal of Accounting
Research 4 (1966) 71–111.
[4] D.P. Berrar, C.S. Downes, W. Dubitzky, Multiclass cancer classification using gene
expression profiling and probabilistic neural networks, Proceedings of the Pacific
Symposium on Biocomputing, vol. 8, 2003, pp. 5–16.
[5] I. Bose, J. Wang, Data mining for detection of financial statement fraud in Chinese
companies, Working Paper, The University of Hong Kong, 2008.
[6] R.C. Brooks, Neural networks: a new technology, The CPA Journal Online, http://www.nysscpa.org/cpajournal/old/15328449.htm, 1994.
[7] B. Busta, R. Weinberg, Using Benford's law and neural networks as a review
procedure, Managerial Auditing Journal 13 (6) (1998) 356–366.
[8] T.G. Calderon, J.J. Cheh, A roadmap for future neural networks research in auditing
and risk assessment, International Journal of Accounting Information Systems 3
(4) (2002) 203–236.
[9] M. Cecchini, H. Aytug, G.J. Koehler, P. Pathak, Detecting management fraud in public companies, http://warrington.ufl.edu/isom/docs/papers/DetectingManagementFraudInPublicCompanies.pdf.
[10] M.J. Cerullo, V. Cerullo, Using neural networks to predict financial reporting fraud:
Part 1, Computer Fraud & Security 5 (1999) 14–17.
[11] Y. Chen, D. Liginlal, A maximum entropy approach to feature selection in
knowledge-based authentication, Decision Support Systems 46 (1) (2008)
388–398.
[12] A. Deshmukh, L. Talluru, A rule-based fuzzy reasoning system for assessing the
risk of management fraud, International Journal of Intelligent Systems in
Accounting, Finance & Management 7 (4) (1998) 223–241.
[13] K.M. Fanning, K.O. Cogger, Neural network detection of management fraud using
published financial data, International Journal of Intelligent Systems in Accounting, Finance, and Management 7 (1) (1998) 21–41.
[14] K.M. Faraoun, A. Boukelif, Genetic programming approach for multi-category
pattern classification applied to network intrusion detection, International
Journal of Computational Intelligence and Applications 6 (1) (2006) 77–99.
[15] E.H. Feroz, T.M. Kwon, V. Pastena, K.J. Park, The efficacy of red flags in predicting
the SEC's targets: an artificial neural networks approach, International Journal of
Intelligent Systems in Accounting, Finance, and Management 9 (3) (2000)
145–157.
[16] X. Fu, F. Tan, H. Wang, Y.Q. Zhang, R. Harrison, Feature similarity based
redundancy reduction for gene selection, Proceedings of the International
Conference on Data Mining (Las Vegas, NV, USA), June 26–29, 2006.
[17] http://en.wikipedia.org/wiki/Enron.
[18] http://en.wikipedia.org/wiki/MCI_Inc.
[19] http://www.examiner.com/x-17547-Financial-Fraud-Examiner~y2009m7d17Financial-Fraud-101-Understanding-the-Fraud-Triangle.
[20] S.-M. Huang, D.C. Yen, L.-W. Yang, J.-S. Hua, An investigation of Zipf's Law for fraud
detection, Decision Support Systems 46 (1) (2008) 70–83.
[21] A.G. Ivakhnenko, The group method of data handling—a rival of the method of
stochastic approximation, Soviet Automatic Control 13 (3) (1966) 43–55.
[22] Y. Kim, Toward a successful CRM: variable selection, sampling, and ensemble,
Decision Support Systems 41 (2) (2006) 542–553.
[23] E. Kirkos, C. Spathis, Y. Manolopoulos, Data mining techniques for the detection of
fraudulent financial statements, Expert Systems with Applications 32 (2007)
995–1003.
[24] KNIME 2.0.0. http://www.knime.org
[25] E. Koskivaara, Different pre-processing models for financial accounts when using
neural networks for auditing, Proceedings of the 8th European Conference on
Information Systems, vol. 1, 2000, pp. 326–332, Vienna, Austria.
[26] E. Koskivaara, Artificial neural networks in auditing: state of the art, The ICFAI
Journal of Audit Practice 1 (4) (2004) 12–33.
[27] S. Kotsiantis, E. Koumanakos, D. Tzelepis, V. Tampakas, Forecasting fraudulent
financial statements using data mining, International Journal of Computational
Intelligence 3 (2) (2006) 104–110.
[28] J.R. Koza, Genetic programming: on the programming of computers by means of
natural selection, MIT Press, Cambridge, MA, 1992.
[29] H. Liu, J. Li, L. Wong, A comparative study on feature selection and classification
methods using gene expression profiles and proteomic patterns, Genome
Informatics 13 (2002) 51–60.
[30] C. Magnusson, A. Arppe, T. Eklund, B. Back, H. Vanharanta, A. Visa, The language of
quarterly reports as an indicator of change in the company's financial status,
Information & Management 42 (4) (2005) 561–574.
[31] D. Mladenic, M. Grobelnik, Feature selection on hierarchy of web documents,
Decision Support Systems 35 (1) (2003) 45–87.
[32] Neucom, http://www.aut.ac.nz/research/research-institutes/kedri/research-centres/
centre-for-data-mining-and-decision-support-systems/neucom-project-homepage#download.
[33] Neuroshell 2.0, Ward Systems Inc. http://www.wardsystems.com.
[34] R. Pacheco, A. Martins, R.M. Barcia, S. Khator, A hybrid intelligent system applied
to financial statement analysis, Proceedings of the 5th IEEE conference on Fuzzy
Systems, vol. 2, 1996, pp. 1007–1012, New Orleans, LA, USA.
[35] M. Panik, Regression Modeling Methods, Theory, and Computation with SAS, CRC
Press, 2009.
[36] S. Piramuthu, On learning to predict web traffic, Decision Support Systems 35 (2)
(2003) 213–229.
[37] S. Ramamoorti, A.D. Bailey Jr., R.O. Traver, Risk assessment in internal auditing: a
neural network approach, International Journal of Intelligent Systems in
Accounting, Finance & Management 8 (3) (1999) 159–180.
[38] M. Ramos, Auditor's responsibility for fraud detection, Journal of Accountancy 195
(1) (2003) 28–35.
[39] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning Internal Representations by
Error Propagation, MIT Press, Cambridge, MA, 1986.
[40] M.F. Selekwa, V. Kwigizile, R.N. Mussa, Setting up a probabilistic neural network
for classification of highway vehicles, International Journal of Computational
Intelligence and Applications 5 (4) (2005) 411–423.
[41] J.E. Sohl, A.R. Venkatachalam, A neural network approach to forecasting model
selection, Information & Management 29 (6) (1995) 297–303.
[42] C. Spathis, M. Doumpos, C. Zopounidis, Detecting falsified financial statements: a
comparative study using multicriteria analysis and multivariate statistical
techniques, European Accounting Review 11 (3) (2002) 509–535.
[43] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 109–118.
[44] V. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing (S. Haykin, Ed.), John Wiley and Sons, 1998.
[45] D.P. Williams, V. Myers, M.S. Silvious, Mine classification with imbalanced data,
IEEE Geoscience and Remote Sensing Letters 6 (3) (2009) 528–532.
[46] G. Zhang, B.E. Patuwo, M.Y. Hu, Forecasting with artificial neural networks: the
state of the art, International Journal of Forecasting 14 (1) (1998) 35–62.
Pediredla Ravisankar has been working as a Software Engineer at Capgemini, Hyderabad,
since February 2010. He obtained his M.Tech (Information Technology) with
specialization in Banking Technology and Information Security from UoH and IDRBT,
Hyderabad (2009), and his M.Sc. (Physics) from UoH, Hyderabad (2007). He has published
papers in Knowledge-Based Systems, Information Sciences, and the International Journal
of Data Mining, Modeling and Management, as well as an IEEE conference paper. He has
been nominated for Marquis Who's Who in the World 2011. His research interests include
data mining, soft computing, evolutionary algorithms, neural networks and their
applications.
Vadlamani Ravi has been an Associate Professor at the Institute for Development and Research in
Banking Technology (IDRBT), Hyderabad, since April 2010. He obtained his Ph.D. in Soft
Computing from Osmania University, Hyderabad and RWTH Aachen, Germany (2001); MS
(Science and Technology) from BITS, Pilani (1991) and M.Sc. (Statistics & Operations
Research) from IIT, Bombay (1987). Prior to joining IDRBT, he worked as a faculty member at the
Institute of Systems Science (ISS), National University of Singapore for three years. Earlier,
he worked as Assistant Director at the Indian Institute of Chemical Technology (IICT),
Hyderabad. He was deputed to RWTH Aachen (Aachen University of Technology)
Germany under the DAAD Long Term Fellowship to carry out advanced research during
1997–1999. In a career spanning 22 years, Dr. Ravi has worked in the applications of Fuzzy
Computing, Neuro Computing, Soft Computing, Data Mining, Global/Multi-Criteria/
Combinatorial Optimization and Multivariate Statistics in Financial Engineering, Software
Engineering, Reliability Engineering, Chemical Engineering, Environmental Engineering,
Chemistry, Medical Entomology, Bioinformatics and Geotechnical Engineering. He
has published 93 papers in refereed international/national journals and conferences and
has contributed invited chapters to edited volumes. He edited the book “Advances in Banking Technology
and Management: Impact of ICT and CRM”, published by IGI Global, USA, in 2007. Further, he
is a referee for 25 International Journals of repute in Computer Science, Operations
Research, Computational Statistics, Economics and Finance. Moreover, he is an Editorial
board member of International Journal of Information Systems in the Service Sector
(IJISSS), IGI Global, USA, International Journal of Data Analysis Techniques and Strategies
(IJDATS), Inderscience Publications, Switzerland, International Journal of Information and
Decision Sciences (IJIDS), Inderscience Publications, Switzerland, International Journal of
Information Technology Project Management (IJITPM), IGI Global, USA. His current
research interests include Bankruptcy Prediction, CRM, Churn Prediction, FOREX rate
prediction, Risk Modeling and Asset Liability Management through Optimization, Software
reliability prediction, and Software development cost estimation. He is listed in Marquis Who's
Who in the World 2009 and 2010 and in Marquis Who's Who in Science and Engineering 2011.
He is also an Invited Member of the 2000 Outstanding Intellectuals of the 21st Century
2009/2010 and of the 100 Top Educators in 2009, both published by the International
Biographical Center, UK.
Gundumalla Raghava Rao has been working as a Research Associate at IDRBT since May 2009.
He obtained his M.Tech (Computer Science & Engineering) from the National Institute of
Technology, Rourkela, in 2008, and his B.Tech (Computer Science & Engineering) from
M.I.T.S, Rayagada, under Biju Patnaik University of Technology, Orissa. His research
interests include data mining.
Indranil Bose is an associate professor of Information Systems at the School of
Business, The University of Hong Kong. Prior to that, he was a faculty member at the
University of Texas at Arlington and at the University of Florida. He holds a B.Tech. from
the Indian Institute of Technology, an MS from the University of Iowa, and an MS and Ph.D. from
Purdue University. His research interests are in telecommunications, information
security, data mining, and supply chain management. His publications have appeared
in Communications of the ACM, Communications of AIS, Computers and Operations
Research, Decision Support Systems, Electronic Commerce Research & Applications,
Ergonomics, European Journal of Operational Research, Information & Management,
Information Systems and e-Business Management, Journal of the American Society for
Information Science and Technology, Journal of Organizational Computing and Electronic
Commerce, Operations Research Letters, among others. His research is supported by
several grants from academia and industry. He serves as Associate Editor/Editorial
Review Board Member of Communications of AIS, Information & Management, Journal of
Global Information Technology and Management, Information Resources Management
Journal, International Journal of Information Systems and Supply Chain Management,
Journal of Database Management, etc. He has also served as guest editor for
Communications of AIS, Decision Support Systems, European Journal of Information
Systems, and Journal of Organizational Computing and Electronic Commerce.