IJCST Vol. 2, Issue 1, March 2011
ISSN : 2229-4333(Print) | ISSN : 0976-8491(Online)
Software Bug Classification using
Suffix Tree Clustering (STC) Algorithm
1Naresh Kumar Nagwani, 2Dr. Shrish Verma
1Department of CS&E, National Institute of Technology Raipur.
2Department of ET&C, National Institute of Technology Raipur.
Abstract
Suffix Tree Clustering (STC) is one of the popular text clustering
algorithms. STC has a number of applications, the most
popular being web document clustering. Software bug data contains
a number of attributes such as bug-id, summary (title), description,
comments, status, version etc. Most of the important attributes
hold text data. Since software bug repositories consist
mostly of data in the form of text, STC can be applied to
create clusters of software bug records. In this paper the STC
algorithm is used for software bug classification. First, clusters
are created from the bug repositories, and then a label is
assigned to each cluster, which indicates the class of
the cluster. An STC implementation is available as part of
the Carrot2 framework. The designed technique is evaluated using
common clustering parameters.
Keywords
Software Bug Classification, STC Clustering, Bug Clustering,
Software Bug Repository.
I. Introduction
A bug is a defect in software. A bug indicates unexpected
behavior of some given requirement during software
development. During software testing, unexpected behavior
of requirements is identified by software testers or quality
engineers and marked as a bug. In this paper
defect and bug are used as synonyms. Bugs are managed and
tracked using a number of available tools such as Bugzilla, Perforce,
JIRA etc.
A. Bug Repositories
Most open source projects and larger projects manage
their software development data using some
tool. Bug tracking tools are used for managing the bugs
associated with the software. These bug tracking systems provide
online interfaces to the various users associated with the projects.
These tools internally manage the bug repositories where
all the bugs and related data are stored. For example, for the
Mozilla project the bugs are tracked using the Bugzilla tool. Bugzilla
provides all the Mozilla bugs in the form of an online repository. By
specifying the bug id in Mozilla's online repository, any user
can fetch the required bug information. The URL for Mozilla's
bug repository is "https://bugzilla.mozilla.org/show_bug.cgi?id=".
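As a small illustration of fetching a bug by id, the sketch below builds the page address from the base URL quoted above. The helper name bug_url is hypothetical and not part of Bugzilla; the sketch only builds the address and does not download anything.

```python
from urllib.parse import urlencode

# Base address quoted in the text; the bug id is appended as the "id" parameter.
BUGZILLA_SHOW_BUG = "https://bugzilla.mozilla.org/show_bug.cgi"


def bug_url(bug_id: int) -> str:
    """Return the page address for one bug id (hypothetical helper)."""
    if bug_id <= 0:
        raise ValueError("bug id must be a positive integer")
    return f"{BUGZILLA_SHOW_BUG}?{urlencode({'id': bug_id})}"
```

A retrieval step would request each address in turn and hand the returned page to a parser.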
B. Cluster Analysis
Cluster analysis is a statistical method that identifies groups
of objects which share similar characteristics.
A cluster is a collection of data objects that are similar to one
another within the same cluster and are dissimilar to the objects
in other clusters. A number of clustering algorithms are
available, and a number of techniques exist for measuring the
distances between the data points of clusters. Here some of the popular
distance functions and clustering algorithms are explained,
which will be used by the suggested data mining model.
36 International Journal of Computer Science and Technology
1. Major Clustering Approaches
There exist a large number of clustering algorithms. The
choice of clustering algorithm depends both on the type of
data available and on the particular purpose and application.
In general, major clustering methods can be classified into
the following categories: partitioning algorithms, hierarchical
algorithms, density-based, grid-based, and model-based algorithms.
C. Suffix Tree Clustering (STC) algorithm
The first clustering algorithm to take advantage of association
between words, not only their frequencies, was Suffix Tree
Clustering used in Grouper [30,31]. STC attempts to cluster
documents or search results according to identical phrases they
contain. What motivates incorporation of phrases into STC is
making use of proximity and order of words, which have more
descriptive power than keywords. Apart from forming clusters,
phrases can be used for labeling the clusters created. STC is
organized into two main phases: discovering phrase clusters
(also called base clusters) and combining similar ones to form
merged clusters (or simply clusters).
The Suffix Tree Clustering (STC) algorithm groups the input
texts according to the identical phrases they share [31]. The
rationale behind such an approach is that phrases, compared to
single keywords, have greater descriptive power. This results
from their ability to retain the relationships of proximity and
order between words. A great advantage of STC is that phrases
are used both to discover and to describe the resulting groups.
The Suffix Tree Clustering algorithm works in two main phases:
the base cluster discovery phase and the base cluster merging phase.
In the first phase a generalized suffix tree of all the texts' sentences
is built using words as basic elements. After all sentences
are processed, the tree nodes contain information about
the documents in which particular phrases appear. Using
that information, documents that share the same phrase are
grouped into base clusters, of which only those are retained
whose score exceeds a predefined Minimal Base Cluster Score.
In the second phase of the algorithm, a graph representing
relationships between the discovered base clusters is built
based on their similarity and on the value of the Merge
Threshold. Base clusters belonging to coherent subgraphs of
that graph are merged into final clusters.
A detailed example illustrating the STC algorithm along with
its evaluation based on standard Information Retrieval metrics
and user feedback is presented in [30,31].
STC algorithm
Step 1: Cleaning - Stemming, sentence boundary identification,
punctuation elimination.
Step 2: Suffix tree construction - Produces base clusters
(internal nodes); base clusters are scored based on size and
phrase score (which depends on length and word "quality").
Step 3: Merging base clusters - Highly overlapping clusters
are merged.
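The two phases described above can be sketched in a few lines. This is a simplified illustration, not the Carrot2 implementation: it enumerates word n-grams instead of building a real generalized suffix tree, scores a base cluster as document count times phrase length, and merges base clusters whose document sets overlap by more than an assumed threshold of 0.5 in both directions. All function and parameter names (base_clusters, merge, label, max_len, min_docs, min_score, threshold) are hypothetical.

```python
from itertools import combinations


def score(phrase, doc_ids):
    """Simplified base cluster score: number of documents times phrase length."""
    return len(doc_ids) * len(phrase)


def base_clusters(docs, max_len=3, min_docs=2, min_score=0):
    """Phase 1: map each shared word phrase to the set of documents containing it."""
    phrases = {}
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrases.setdefault(tuple(words[i:i + n]), set()).add(doc_id)
    # Keep phrases shared by enough documents and scoring above the minimum.
    return {p: d for p, d in phrases.items()
            if len(d) >= min_docs and score(p, d) >= min_score}


def merge(clusters, threshold=0.5):
    """Phase 2: merge base clusters connected by high mutual overlap."""
    names = list(clusters)
    parent = {p: p for p in names}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(names, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    merged = {}
    for p in names:
        group = merged.setdefault(find(p), {"phrases": [], "docs": set()})
        group["phrases"].append(p)
        group["docs"] |= clusters[p]
    return list(merged.values())


def label(group, clusters):
    """Use the highest-scoring phrase of a merged cluster as its label."""
    best = max(group["phrases"], key=lambda p: score(p, clusters[p]))
    return " ".join(best)
```

For four short bug summaries, two about a startup crash and two about a menu font, this sketch yields two merged clusters, and the longest shared phrase of each serves as its label. A production implementation would also apply the cleaning step and ignore stop words when scoring phrases.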
www.ijcst.com
Cluster labeling algorithms group a set of documents based
on a similarity score between them and then identify phrases
which are representative of the cluster. They pick commonly
occurring phrases in the documents and compute a score for
each phrase.
D. Carrot2 Framework
The Carrot2 framework was proposed by Stefanowski and Weiss in
2003. The Carrot2 framework is open source and is available at
http://www.cs.put.poznan.pl/dweiss/carrot/. It is a component-based
framework for text clustering. It allows substituting
components for input (i.e. snippets from other search engines),
filter, stemming, distance measure, clustering and output for
text clustering.
In this paper the web clustering technique STC is applied to software
bug classification. It is a classification-by-clustering technique,
which uses the web clustering algorithm STC for the classification
of software bugs. This paper is divided into six sections. Section
two discusses previous work in similar
fields. In section three the proposed methodology is explained in
detail; implementation and result evaluation are covered in
sections four and five. Finally, the conclusion and future scope
of the proposed work are given in section six.
II. Related and Previous Work Done
Work related to mining software repositories is discussed
in this section. Software bug repositories contain a huge number
of knowledge patterns.
The xFinder approach is proposed by Kagdi et al. to recommend
expert developers by mining the version archives of a system [19].
The basic premise of this approach is that the developers who
contributed substantial changes to a specific part of the source
code in the past are likely to best assist in its current or future
change. Some investigation and analysis of bug fixing is done
by Ayewah and Pugh [28]. Several past projects introduce and
refine an approach to finding fix-inducing commits that is based
on creating a link between the bug report database and the
code repository using commit messages [4,12,20,32].
The concepts of neighbors and link are introduced by Luo et al.
[6] for document clustering. Some of the common problems of
text clustering, such as big volume, high dimensionality and complex
semantics, are studied by Stefanowski et al. [16], and solutions
to these problems are also proposed. Some of the suggested
solutions are subspace clustering, ontology etc.
A generic open source framework for pre-processing of software
bug repositories was designed by Nagwani and Verma [24].
The framework was implemented in Java, including a GUI.
The framework was designed for extracting software
bugs from online software bug repositories and parsing the files
retrieved from them. All the parsed
data can be saved in a local database, and the user can also
fetch the bug records from the local database. The framework
is implemented in a generic way and can be plugged
into any online software bug repository.
A weighted bug similarity model is proposed by Nagwani and
Singh [27] for discovering similar bugs in software bug
repositories. In the proposed model a bug is transformed into
an object with a number of attributes. To measure the similarity
between two bugs, the similarities of all the attributes are calculated
and weights are assigned to the similarity values; then, using a
suitable threshold value, bugs can be marked as similar.
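The weighted scheme described above can be illustrated with a short sketch. The per-attribute measure used in [27] is not described here, so Jaccard overlap of word tokens is an assumption, as are the function names, the example weights and the 0.5 threshold.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap of two attribute values (assumed per-attribute measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def weighted_similarity(bug_a: dict, bug_b: dict, weights: dict) -> float:
    """Weighted average of the attribute similarities of two bugs."""
    total = sum(weights.values())
    weighted = sum(
        w * jaccard(bug_a.get(attr, ""), bug_b.get(attr, ""))
        for attr, w in weights.items()
    )
    return weighted / total


def is_similar(bug_a: dict, bug_b: dict, weights: dict, threshold: float = 0.5) -> bool:
    """Mark two bugs as similar when the weighted score reaches the threshold."""
    return weighted_similarity(bug_a, bug_b, weights) >= threshold
```

With weights such as {"summary": 2, "description": 1}, a match in the summary counts twice as much as a match in the description.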
A data mining model is designed and implemented in Java
by Nagwani and Verma [25] for predicting the fix duration of
newer incoming software bugs in software bug repositories.
The proposed model uses the similarity of textual information
in software bugs. For any newly created bug, all its
similar bugs are first discovered in the software bug repository,
then their average fix duration is calculated and used as the
predicted fix duration for the new bug. A GUI (Graphical User Interface)
bug mining model is proposed by Nagwani and Singh [26].
The model was proposed to discover similar and duplicate
GUI bugs for the graphical user interface of any software. The
similarities of GUI components and associated events were
counted for detecting similar and duplicate GUI bugs.
Grouper [30,31] is a snippet-based clustering engine based
on the HuskySearch meta-search engine. The main feature
of Grouper is the introduction of a phrase-analysis algorithm
called STC (Suffix Tree Clustering). In essence, the algorithm
builds a suffix tree of phrases in snippets; each representative
phrase becomes a candidate cluster; candidates with large
overlap are merged together. The main contribution of Grouper
lies in the complexity of the clustering algorithm, which
allows for very fast processing of large result sets. In [9] the
problem of improving cluster labels is studied. A machine
learning approach is used: given a corpus of training data,
the system is able to identify the most salient phrases and use
them as cluster titles. The algorithm is therefore supervised.
Various issues related to web clustering engines, such as acquisition
and preprocessing, are addressed by Carpineto et al. [5]. Cluster
label optimization and performance related issues are also
discussed in the study.
Eclipse [7] is a multi-language software development environment
comprising an integrated development environment (IDE) and
an extensible plug-in system. It is written primarily in Java and
can be used to develop applications in Java. Some of the
problems of text clustering, such as the very high dimensionality of the
data, the very large size of the databases and the understandability of
the cluster descriptions, are analyzed by Beil et
al. [8], and an approach is proposed which uses frequent item
(term) sets for text clustering. Such frequent sets are discovered
using algorithms for association rule mining, and clusters are then
created based on the frequent term sets.
Java [14] is a general-purpose, concurrent, class-based, object-oriented
language that is specifically designed to have as few
implementation dependencies as possible. JBoss Seam [15]
is an application framework for building Web 2.0
applications. JIRA [18] is an issue tracking, bug tracking and
project tracking tool for software development teams.
A semi-automated approach is proposed by Fluri et al. [1]
to discover patterns of source code change types using
agglomerative hierarchical clustering. They found that change
type patterns do describe development activities and affect the
control flow, the exception flow, or change the API. A standard
for the classification of software anomalies was provided by the
IEEE in 1993 [10] and was revised in 2009 [11]. It provides a
uniform approach to the classification of anomalies found in
software and its documentation. The processing
of anomalies discovered during any software life cycle phase
is described.
Various case studies which provide quantitative data on
categories of software faults, and the applicability of
these software fault category distributions to fault injection, are
studied by Ploski et al. [13]. Various challenges in hierarchical
document clustering are studied by Fung et al. [2]; they
focus on key challenges such as high dimensionality,
high volume of data, ease of browsing, and meaningful cluster
labels. A number of document clustering algorithms
are also reviewed.
The group of X. Wang et al. [34] has proposed discovering
semantically similar terms using WordNet. Several methods
have been implemented and evaluated. They have also
proposed the Semantic Similarity Retrieval Model (SSRM), a
general document similarity and information retrieval method
suitable for retrieval in conventional document collections and
the Web. Jalbert and Weimer [29] have proposed a system that
automatically classifies duplicate bug reports as they arrive to
save developer time. This system uses surface features, textual
semantics, and graph clustering to predict duplicate status.
III. Methodology
The overall methodology of software bug clustering using the STC
algorithm is represented in fig. 1, fig. 2 and fig. 3. The
overall methodology is divided into six stages.
Fig. 1: Retrieving Software Bugs from Online Repositories
Fig. 2: Software Bug Clustering Using STC Algorithm.
A. Data Access Layer to extract the data from local
database.
Software bugs are retrieved from online software bug repositories,
parsed and saved to the local database.
B. Bug to object transformation
Once a software bug record has been retrieved into the local database
and is available for data mining operations, it is
transformed into a Java object so that it can
be stored and processed further using the Java collection API
(Application Programming Interface).
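The transformation step can be pictured with a small record type. The paper performs this step in Java; the sketch below uses a Python dataclass purely for illustration, with field names taken from the bug attributes listed in the abstract. The names BugRecord, from_row and text are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BugRecord:
    """One software bug held in memory (illustrative field set)."""
    bug_id: int
    summary: str
    description: str = ""
    comments: List[str] = field(default_factory=list)
    status: str = ""
    version: str = ""

    def text(self) -> str:
        """Concatenate the textual attributes that a clustering step would read."""
        parts = [self.summary, self.description, *self.comments]
        return " ".join(p for p in parts if p)


def from_row(row: dict) -> BugRecord:
    """Build a BugRecord from a row fetched from the local database."""
    return BugRecord(
        bug_id=int(row["bug_id"]),
        summary=row.get("summary", ""),
        description=row.get("description", ""),
        comments=list(row.get("comments", [])),
        status=row.get("status", ""),
        version=row.get("version", ""),
    )
```

A list of such objects plays the role of the collection that is later handed to the clustering step.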
C. Applying stop word elimination and stemming
The popular text mining pre-processing techniques of stop
word elimination and stemming are applied here in order to
pre-process the software bug records. Most of the software
bug attributes are textual; hence they need to be prepared for
mining. Stop words carry little meaning for knowledge
discovery and hence need to be eliminated. Stemming is
required in order to unify the terms present in a text document,
so that knowledge patterns can be discovered effectively.
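A minimal sketch of the two pre-processing steps is shown below. The stop word list is a tiny illustrative sample, and naive_stem is a crude suffix stripper used as a stand-in; it is not Porter's algorithm, and it is not the Weka-based implementation used in this work.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to", "when"}

# Suffixes tried in order, longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list:
    """Lowercase, tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

After this step, words such as "crashes" and "crashing" map to the same term, which makes it more likely that two bug records share a phrase.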
D. Passing the software bug objects to the Carrot2
framework
Once the software bug records have been transformed into Java
objects and collected in
a Java collection, the collection is passed to the Carrot2 framework,
which is implemented in Java for web document clustering, for
clustering of the software bug records. The STC algorithm
is selected for performing the software bug clustering.
E. Output – Software bug clusters with labels
As soon as STC completes its task of creating clusters of software
bugs, the output is generated for each cluster with a cluster label
assigned to it. This label indicates the class of the cluster. This
method is an example of classification using clustering.
F. Calculating entropy and purity for each cluster.
Finally, various cluster parameters are calculated to evaluate the
performance of the software bug clustering algorithm. Four
parameters are evaluated for the algorithm:
purity, entropy, number of clusters created and time to
create the clusters in milliseconds.
Fig. 3: Steps in Evaluating the STC Algorithm for Software Bug
Clustering.
IV. Implementation
Implementation is done using open source technologies. Java
[14] is used as the primary programming language, Eclipse
[7] is used as the IDE (Integrated Development Environment),
MySql [23] is used as the local database for storing the software
bug records, and JDBC (Java Data Base Connectivity) is
used for reading software bug records from the MySql database into
Java.
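The local storage step can be sketched as follows. The implementation described here uses MySql accessed through JDBC; the sketch swaps in Python's built-in sqlite3 module so that it runs without a database server. The table layout and the function names open_store, save_bug and load_bugs are assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local bug store; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bugs ("
        "bug_id INTEGER PRIMARY KEY, summary TEXT, description TEXT, status TEXT)"
    )
    return conn


def save_bug(conn, bug_id, summary, description="", status=""):
    """Insert one parsed bug, replacing any earlier copy with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO bugs VALUES (?, ?, ?, ?)",
        (bug_id, summary, description, status),
    )
    conn.commit()


def load_bugs(conn, limit):
    """Fetch up to `limit` bug rows, ordered by bug id."""
    cur = conn.execute(
        "SELECT bug_id, summary, description, status FROM bugs "
        "ORDER BY bug_id LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

The limit parameter mirrors the experiments, which cluster a chosen number of bug records from each repository.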
The pre-processing of software bug records is done in two
phases. First, the stop words are eliminated using the Weka
API; in the second phase stemming is done using Porter's
stemmer, which is implemented in Java. The Carrot2 framework
is a Java implementation of web document clustering. It includes
an implementation of the STC web clustering algorithm, which is
used for the implementation of the proposed technique.
V. Experiments and Result Evaluation
Cluster quality is evaluated using various metrics. In this
paper four metrics are used: purity, entropy, the number
of clusters generated for different numbers of software bug
records, and the time to create the clusters. Purity assumes that
all samples of a cluster are predicted to be members of the
actual dominant class for that cluster. The cluster quality can
be judged on the basis of these parameters.
One way of measuring the quality of a clustering solution
is cluster purity. Let there be k clusters (the k in k-means) of
the dataset D, and let the size of cluster Cj be |Cj|. Let |Cj|class=i
denote the number of items of class i assigned to cluster j. The purity
of this cluster is given by
\[ purity(C_j) = \frac{1}{|C_j|} \max_i \left( |C_j|_{class=i} \right) \tag{1} \]
The overall purity of a clustering solution can be expressed
as a weighted sum of the individual cluster purities:
\[ purity = \sum_{j=1}^{k} \frac{|C_j|}{|D|} \, purity(C_j) \tag{2} \]
In general, the larger the value of purity, the better the clustering solution.
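Purity can be computed directly from the class labels of the items in each cluster. The sketch below assumes the usual definition, in which a cluster's purity is the share of its items that belong to its dominant class and the overall purity weights each cluster by its size. The function names and the example labels are illustrative.

```python
from collections import Counter


def cluster_purity(labels):
    """Fraction of items in one cluster that belong to its dominant class."""
    dominant_count = Counter(labels).most_common(1)[0][1]
    return dominant_count / len(labels)


def overall_purity(clusters):
    """Size-weighted sum of the purities of all clusters."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_purity(c) for c in clusters)
```

For three clusters with class labels ["ui", "ui", "ui", "crash"], ["crash", "crash"] and ["db", "ui", "db", "db"], the individual purities are 0.75, 1.0 and 0.75, and the overall purity is (4 × 0.75 + 2 × 1.0 + 4 × 0.75) / 10 = 0.8.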
Entropy is a measure of randomness or irregularity; looking for
the most random distribution corresponds to looking for the
distribution with maximal entropy. The entropy of a probability
distribution P = (p_1, ..., p_n) is
\[ H(P) = -\sum_{i=1}^{n} p_i \log p_i \tag{3} \]
The entropy is calculated as a function of the probability of a
software bug object belonging to a particular class.
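The entropy of a set of class labels can be computed as in the sketch below, which uses the standard Shannon form: the negative sum of p times log p over the class probabilities. The logarithm base is not stated in the text, so base 2 is an assumption, exposed here as a parameter.

```python
import math
from collections import Counter


def entropy(labels, base=2.0):
    """Shannon entropy of the class distribution of the given labels."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return -sum(p * math.log(p, base) for p in probs)
```

Under the base 2 assumption, labels spread evenly over 16 classes give an entropy of 4, while a set in which every item has the same class gives 0.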
Based upon the proposed methodology and cluster evaluation
parameters, the implementation is done and the parameters are
calculated. Three software bug repositories are used in the
experiment, with different numbers of software bug records. The
number of clusters generated for different numbers of software
bugs in the different software bug repositories is given in Table
1. The graph plotted for this data is shown in fig. 4. The time to
create clusters for different numbers of software bug records
in the different software bug repositories is given in Table 2. The
corresponding graph for the given values is plotted and shown
in fig. 5.
Table 1: Number of Clusters Created in STC Algorithm
Bug Repository    Number of Bugs
                  100     200     500     700     1000
Jboss-Seam        16      16      16      16      16
Mozilla           15      16      16      16      16
MySQL             16      16      16      16      16
Table 2: Time To Create Clusters in STC Algorithm (in milliseconds)
Bug Repository    Number of Bugs
                  100     200     500     700     1000
Jboss-Seam        593     625     719     782     828
Mozilla           656     750     938     1047    1203
MySQL             656     735     985     1110    1266
Fig. 5: Purity Measured for Created Clusters in STC
Algorithm.
The entropy values for different numbers of software bug records
and different software repositories are calculated and given in
Table 4. The corresponding values are plotted in
fig. 6.
Table 4: Entropy in STC Algorithm
Bug Repository    Number of Bugs
                  100     200     500     700     1000
Jboss-Seam        3.79    3.75    3.77    3.78    3.76
Mozilla           3.43    3.57    3.4     3.5     3.48
MySQL             3.81    3.63    3.73    3.66    3.63
Fig. 6: Entropy Measured for Created Clusters in STC
Algorithm.
Fig. 4: Time to Create Clusters in STC Algorithm.
The purity values calculated for the clusters created for
different numbers of software bug records in the different software
bug repositories are given in Table 3. For JBoss-Seam repository
the maximum purity value is achieved. These values are plotted
in the graph shown in fig. 5.
Table 3: Purity in STC Algorithm
Bug Repository    Number of Bugs
                  100     200     500     700     1000
Jboss-Seam        0.55    0.53    0.39    0.39    0.39
Mozilla           0.71    0.55    0.59    0.48    0.47
MySQL             0.18    0.19    0.2     0.2     0.16
VI. Conclusion and Future Work
In this paper the STC clustering algorithm is used for software
bug clustering, and it is demonstrated that STC can be
used effectively for software bug clustering. The Carrot2
framework is used in the implementation, and software bug
clusters are created using the STC algorithm. Various clustering
parameters are also calculated to evaluate the quality of the
clusters created. Using STC, a cluster label is assigned
to each created cluster, which indicates the class of the
cluster. This is therefore an effective way of classifying software bugs
in a short time, and the calculated cluster purity is acceptable.
The future scope of the proposed work includes applying more
pre-processing tasks to the software bug repositories in order to
reduce the mining time, comparing other web clustering
algorithms with STC, and designing hybrid algorithms
for better software bug classification.
References
[1] Beat Fluri, Emanuel Giger, Harald C. Gall, "Discovering
Patterns of Change Types", 2008 IEEE, pp. 463-466,
2008.
[2] Benjamin C. M. Fung, Ke Wang, Martin Ester, "Hierarchical
Document Clustering, The Encyclopedia of Data
Warehousing and Mining", John Wang (ed.), Idea Group,
pp. 1-7, 2005.
[3] "Bugzilla, An Open source web-based general-purpose bug
tracker and testing tool originally developed and used by
the Mozilla", [Online] Available : http://www.bugzilla.org.
[4] Williams, J. Spacco. "Szz revisited: verifying when changes
induce fixes". In DEFECTS ’08: Proceedings of the 2008
workshop on Defects in large software systems, pages
32–36, New York, NY, USA, 2008. ACM.
[5] Claudio Carpineto, Stanislaw Osiński, Giovanni Romano,
Dawid Weiss, "A Survey of Web Clustering Engines", ACM
Computing Surveys, Vol. 41, No. 3, Article 17, Publication
date: July 2009.
[6] Congnan Luo, Yanjun Li, Soon M. Chung, "Text document
clustering based on neighbors", Elsevier Data & Knowledge
Engineering 68 (2009), pp. 1271–1288.
[7] "Eclipse, A multi-language software development
environment comprising an integrated development
environment (IDE) and an extensible plug-in system" :
[Online] Available : http://www.eclipse.org/
[8] Florian Beil, Martin Ester, Xiaowei Xu, "Frequent Term-Based
Text Clustering", SIGKDD 02 Edmonton, Alberta,
Canada, 2002.
[9] H. J. Zeng, Q. C. He, Z. Chen, W. Y. Ma, J. Ma., "Learning
to Cluster Web Search Results". In Proceedings of the
ACM SIGIR Conference on Research and development in
information retrieval, pp. 210-217, 2004.
[10]IEEE Standard Classification for Software Anomalies, IEEE
Std 1044-1993, 1993.
[11]IEEE Standard Classification for Software Anomalies,
IEEE Std 1044-2009 (revision of IEEE Std 1044-1993),
2009.
[12]J. Sliwerski, T. Zimmermann, A. Zeller. "When do changes
induce fixes?", In MSR ’05: Proceedings of the 2005
international workshop on Mining software repositories,
pp. 1–5, New York, NY, USA, 2005. ACM.
[13]Jan Ploski, Matthias Rohr, Peter Schwenkenberg, Wilhelm
Hasselbring, Software Engineering Group, TrustSoft,
Research Issues in Software Fault Categorization, ACM
SIGSOFT Software Engineering Notes, Vol. 32 No. 6,
2007.
[14]"Java, Open source programming language": [Online]
Available : http://www.sun.com/java/.
[15]JBoss Seam, a web application framework developed by
Jboss. [Online] Available : https://jira.jboss.org/browse/
JBSEAM.
[16]Jerzy Stefanowski, Dawid Weiss, "Comprehensible and
Accurate Cluster Labels in Text Clustering", Conference
RIAO2007, Pittsburgh PA, U.S.A. May 30-June 1, 2007 C.I.D. Paris, France
[17]Jiawei Han, Micheline Kamber, "Data Mining: Concepts &
Techniques" 2nd Edition, Morgan Kaufmann Publishers,
ISBN 978-81-312-0535-8, 2009.
[18]"Jira, An issue tracking", bug tracking and project tracking
tool for software development teams: [Online] Available :
http://www.jira.com.
[19]Kagdi, H., Hammad, M., Maletic, J. I., "Who Can Help Me with
this Source Code Change?" in Proc. of IEEE International
Conference on Software Maintenance, Beijing, China,
September 28-October 3 2008.
[20]L. Aversano, L. Cerulo, C. Del Grosso. Learning from
bug-introducing changes to prevent fault prone code. In IWPSE
’07: Ninth international workshop on Principles of software
evolution, pp. 19–26, New York, NY, USA, 2007. ACM.
[21]Mozilla, a global community dedicated to building free,
open source products like Firefox web browser and
Thunderbird email software: [Online] Available : https://
bugzilla.mozilla.org/.
[22]"MySql Bugs", available at: [Online] Available : http://bugs.
mysql.com.
[23]"MySql", A relational database management system
(RDBMS) that runs as a server providing multi-user access
to a number of databases: [Online] Available : http://
mysql.com.
[24]Naresh Kumar Nagwani, Dr. Shrish Verma, “An Open Source
Framework for Data Pre-processing of Online Software Bug
Repositories”, CiiT International Journal of Data Mining
Knowledge Engineering, Vol. 1, No. 7, September 2009.
[25]Naresh Kumar Nagwani, Dr. Shrish Verma, “Predictive Data
Mining Model for Software Bug Estimation Using Average
Weighted Similarity”, IEEE 2nd International Advance
Computing Conference ( IEEE IACC 2010) to be held on
19-20th February, 2010 at Thapar University, Patiala
[26]Naresh Kumar Nagwani, Pradeep Singh - "Bug Mining
Model Based on Event-Component Similarity to Discover
Similar and Duplicate GUI Bugs", IEEE International
Advance Computing Conference, IACC-2009, Patiala,
Punjab, India. 2009, pp. 1388 - 1392 Location: Patiala,
India.
[27]Naresh Kumar Nagwani, Pradeep Singh - "Weight Similarity
Measurement Model Based, Object Oriented Approach for
Bug Databases Mining to Detect Similar and Duplicate
bugs", International Conference on Advances in Computing,
Communication and Control, ICAC-2009, ACM SIGART Conf
Id - 2009-16014, Mumbai, Maharashtra, India. 2009, pp.
202-207, [Online] Available : http://portal.acm.org/toc.cfm?id=1523103&type=proceeding&coll=portal&dl=
[28]Nathaniel Ayewah, William Pugh, "Learning from Defect
Removals", IEEE MSR 2009, pp. 179-182, 2009.
[29]Nicholas Jalbert, Westley Weimer, "Automated Duplicate
Detection for Bug Tracking Systems", IEEE International
Conference on Dependable Systems & Networks:
Anchorage, Alaska, June 24-27 2008, pp. 52-61, 2008.
[30]O. Zamir, O. Etzioni. "Grouper: A Dynamic Clustering
Interface for Web Search Results". Computer Networks
(1999) 31(11-16): pp. 1361-1374.
[31]O. Zamir, O. Etzioni. "Web Document Clustering: A
Feasibility Demonstration". In Proceedings of the ACM
SIGIR Conference on Research and Development in
Information Retrieval (SIGIR), 1998.
[32]Sunghun Kim, E. James Whitehead Jr., Yi Zhang, "Classifying
Software Changes: Clean or Buggy?", IEEE TRANSACTIONS
ON SOFTWARE ENGINEERING, Vol. 34, No. 2, MARCH/
APRIL 2008, pp. 181-196, 2008.
[33]"Trac, A Project management and bug/issue tracking
system", [Online] Available : http://trac.edgewall.org.
[34] Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, Jiasu Sun, "An
Approach to Detecting Duplicate Bug Reports using Natural
Language and Execution Information", ACM ICSE’08, May
10–18, 2008, Leipzig, Germany, pp. 461-470, 2008.
Naresh Kumar Nagwani was born on 15th
February 1980 at Raipur, India. He completed his
graduation in Computer Science & Engineering
in 2001 from Guru Ghasidas University, Bilaspur.
He completed his post graduation M.Tech.
in Information Technology from ABV- Indian
Institute of Information Technology, Gwalior in
2005. His areas of interest are DBMS, Data Mining, Text Mining
and Information Retrieval. His employment experience includes
SSCET Bhilai, Team Lead in Persistent Systems Limited and
NIT Raipur. Presently he is assistant professor at department
of computer science & engineering, National Institute of
Technology, Raipur.
Dr. Shrish Verma has completed his graduation
in Electronics & Telecommunication Engineering
and his post graduation M.Tech. in Computer
Engineering from Indian Institute of Technology,
Kharagpur. He has completed his PhD in
Engineering from Pt. Ravi Shankar Shukla
University Raipur. Presently he is head &
associate professor at department of information technology,
National Institute of Technology, Raipur.