Download Automatic Data Categorization in Issue Tracking Systems A

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Automatic Data Categorization in Issue Tracking Systems
A Research Preview
Thorsten Merten1
1
Bonn-Rhein-Sieg University
of Applied Sciences
Barbara Paech2
Bonn-Rhein-Sieg University of Applied Sciences, Department of Computer Science, Germany
[email protected], [email protected]
2
University of Heidelberg, Software Engineering Group, Germany
[email protected]
ITS DATA MINING
ISSUE TRACKING SYSTEM DATA
• Most data in an Issue Tracking System
(ITS) is stored in user defined text fields
• Data mining approaches try to extract
and use this data
• Categorization of ITS data can support
multiple SWE tasks, e.g. (cf. right
figure)
◦ creating a new sub-issue if necessary
◦ marking or changing wrongly
entered manual ITS data
◦ marking relevant information for a
certain task
◦ removing unnecessary information
• Issue Tracking Systems contain valuable
Software Engineering information, e.g.
◦ requirement and feature descriptions
◦ development and refactoring tasks
◦ bug reports and bug fixing tasks
◦ discussions and dissent
◦ ...
• This information fits in multiple
categories
◦ implementation ideas
◦ stack traces or error messages
◦ social interaction
◦ ...
OUR APPROACH
AN EXAMPLE
FOR
ALGORITHMIC IMPROVEMENTS
• To support automatic creation of change logs
• Some prerequisites can be used
◦ The separation of technical items and natural language
texts already works for developer mailing list classification
[1]
◦ Semi-supervised learning can be used to separate relevant
change log information but precision and recall are
generally low
• Preprocessing is very important in Text Mining [2, pp. 84]
• We can use the specialized algorithm to seperate technical
items as preprocessing step for general classification [5, 4]
CURRENT
AND
FUTURE WORK
• Literature Reviews on algorithms and solutions and ITS
supported SWE tasks
• Identification of ITS supported SWE tasks
• Identification of data categories to support these tasks
• Use of rule-based and specialized algorithms for
preprocessing
• General idea: Improvement of semi-supervised
learning by combining multiple methods.
• Experimental validation of algorithms and measuring
improvements on ITS tasks
http://www.brsu.de – http://www.inf.h-brs.de
AN EXAMPLE
FOR
TASK-BASED UTILIZATION
Identify
Documentation is a SWE task. The creation of a change log is a sub
discipline, which can be supported by ITS data.
Hypothesize We can improve the categorization of task-relevant and irrelevant natural language text passages by separating technical information.
Categorize We implement different algorithms to separate technical and natural language information and combine those (as preprocessing) with
different semi-supervised learning algorithms.
Evaluate
We apply the above categorization to create change logs and evaluate
by a) comparison with manually written change logs and b) surveying
developers. We may now go back to the categorize step and improve
the method.
REFERENCES
[1] A. Bacchelli, T. Dal Sasso, M. D’Ambros, and M. Lanza.
Content classification of development emails.
2012 34th International Conference on Software
Engineering (ICSE), pages 375–385, June 2012.
[2] J. Han, M. Kamber, and J. Pei.
Data Mining: Concepts and Techniques, Third Edition
(The Morgan Kaufmann Series in Data Management
Systems).
Morgan Kaufmann, 3rd edition, 2011.
[3] S. T. March and G. F. Smith.
Design and natural science research on information
technology.
Decision Support Systems, 15(4):251–266, 1995.
[4] R. E. Vlas and W. N. Robinson.
Two Rule-Based Natural Language Strategies for
Requirements Discovery and Classification in
Open-Source Software Development Projects.
Journal of Management Information Systems, pages
1–39, 2012.
[5] T. Zhang and B. Lee.
A Bug Rule Based Technique with Feedback for
Classifying Bug Reports.
In 2011 IEEE 11th International Conference on Computer
and Information Technology, pages 336–343. IEEE, Aug.
2011.
http://www.uni-heidelberg.de