Proposal for Outline re: Report
Draft version 2
May 9
Paper Title:
An Overview of Current Techniques for the Visualization of Evolution in Software from
Versioning Information
Paper Headings:
1. Introduction
2. Background
3. Data extraction
4. Data Mining
5. Visualization
6. Future Work
7. Conclusions
Paper Contents:
1. Introduction
 Understanding software development is still a work in progress
o Software engineering a “young” science
 One way to understand and learn from the software development process is
through the artifacts of the process itself
o Information contained in versioning systems
o Bug reports
o Documented communication between developers(email)
o Modification requests
 Given the sources mentioned above, we need
o A way to extract the information
o A way to build relations between distinct data
o A way to correlate and derive associations between distinct data
 Once we have created a knowledge repository of the information and its relations,
we can then do two things of value
o Data-mining: to derive or determine new facts or relations between
information
o Visualization: provide a visual medium to express the knowledge in the
repository in human readable form
 Benefit is to allow the human factor to discover new facts and
patterns
 Summarize motivation
o Enable visualization for easier understanding/comprehension
o Discover unseen patterns and dependencies
o Provide mid-project (inter-mortem) and post-mortem analysis
2. Background
 Software development process analysis is not a new idea (it has been around since
the 1950s), but
o Advent of versioning tools for development
o Evolution of a formalized process for development
 Software artifacts
o Communication and project tracking (e.g. email and MS Project)
 All have led to an abundance of information pertaining to software
development
 Version control systems and information contained
o Code itself, who did it and when
 Artifacts
o Bug reports
o Modification requests
o Email
o Log files
o Others?
 Extraction mechanisms
o Inherently there is a lack of automation for fact extraction and relation
building, since these systems were not designed for tracking the evolution of
software development
 Visualization Tools
o Two main areas where work has progressed
 The first is in the development of the software itself (e.g. MS Visual
Studio)
 The IDE is specific to its problem domain (e.g. building a GUI) and is a
production tool for code; it is not intended to support analysis of
project evolution
o The second is visualizing existing systems, e.g. Rigi
 As above, it is intended to capture a single snapshot of the current
state of a project, not to provide a historical overview
3. Data extraction
 Currently, the majority of approaches involve automating some part of the data
extraction process
 Contents of version control system
o Analysis of information encapsulated in the system
 What is contained and just as importantly what is not
o Techniques for extraction
 Different systems
 Bug reports
 Different extraction and correlation techniques
o Data-mining for unknown dependencies
o Relation discovery and building
 The effectiveness of automation of the extraction process
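One of the correlation steps above, tying bug reports to the files recorded in the version control system, can be sketched as follows. This is a minimal illustration and not a technique taken from any particular surveyed paper. It assumes, hypothetically, that developers mention bug numbers in check-in messages as "bug 42" or "#42"; the function name and the sample data are invented for the example.

```python
import re
from collections import defaultdict

# Assumed convention (hypothetical): log messages cite bugs as "bug 123" or "#123".
BUG_REF = re.compile(r"(?:bug\s*#?|#)(\d+)", re.IGNORECASE)

def link_bugs_to_files(commits):
    """Map each bug id mentioned in a log message to the files changed with it.

    `commits` is an iterable of (message, files) pairs.
    """
    links = defaultdict(set)
    for message, files in commits:
        for bug_id in BUG_REF.findall(message):
            links[int(bug_id)].update(files)
    return dict(links)

# Invented sample data standing in for an extracted version history.
commits = [
    ("Fix bug 42: null pointer in parser", ["parser.c", "parser.h"]),
    ("Refactor lexer", ["lexer.c"]),
    ("Follow-up for #42 and bug #7", ["parser.c", "main.c"]),
]
links = link_bugs_to_files(commits)
```

Check-ins whose messages cite no bug, such as the lexer refactoring above, stay unlinked, which illustrates why what is not contained in the system matters as much as what is.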
4. Data Mining
 Information and facts gathered from the data extraction can be associated or
correlated to derive new information about the software evolution and the
architecture
o Files checked in around the same period of time in a versioning system
may be related
o Bug reports may have some type of tracking data that can couple them to a
particular piece of the software model
o MRs can also contain information that can imply relations
o Other artifacts such as email can contain relevant information as well
 Time is a critical factor in examining the evolution of a software
system
o The ability to observe changes in the software architecture through time
o The ability to see what parts of the software have changed during the same
time period
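The idea that files checked in around the same time may be related can be sketched as a sliding time window over check-in records. This is only a sketch under stated assumptions: the record layout (timestamp in seconds, author, filename), the five-minute default window, and the function name are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

def co_change_counts(checkins, window_seconds=300):
    """Count how often two files are checked in within the same time window.

    `checkins` is a list of (timestamp_seconds, author, filename) tuples.
    Check-ins by one author are grouped into a change set for as long as each
    check-in follows that author's previous one by at most `window_seconds`.
    """
    counts = Counter()
    last_seen = {}  # author -> timestamp of their latest check-in
    current = {}    # author -> set of files in their open change set

    def flush(author):
        # Close the author's change set and count every pair of files in it.
        files = sorted(current.pop(author, set()))
        for pair in combinations(files, 2):
            counts[pair] += 1

    for timestamp, author, filename in sorted(checkins):
        if author in last_seen and timestamp - last_seen[author] > window_seconds:
            flush(author)
        current.setdefault(author, set()).add(filename)
        last_seen[author] = timestamp
    for author in list(current):
        flush(author)
    return counts

# Invented sample data: two bursts of activity by "ann", one check-in by "bob".
checkins = [
    (0, "ann", "a.c"), (60, "ann", "b.c"), (100, "bob", "c.c"),
    (5000, "ann", "a.c"), (5100, "ann", "b.c"), (5200, "ann", "d.c"),
]
pairs = co_change_counts(checkins)
```

Pairs with a high count are candidates for a hidden dependency. The window size is a tuning choice, and a pair that co-occurs only once is weak evidence.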
5. Visualization
 Creating a visual medium for data representation lets us take advantage
of the human factor: a user can infer new patterns and trends from a
diagram, whereas programming a computer to have the same analytic ability
can be difficult, if not impossible
 Types of visualization explored so far
 Graphs (Vertices and edges)
o Distance between nodes vs. density driven
 Column graphs – statistics and interpretations
 3-Dimensional object graphs
 The use of color
 Time versus density plots
 Which of the above is useful and in what context
 Conclusions of what has been useful with regard to visualizations so far
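As a rough illustration of a time versus density plot, the sketch below buckets changes per file over time and renders each file as a row of characters, with darker characters standing in for the color or intensity a graphical tool would use. The input layout, bucket size, character ramp, and function names are assumptions made for the example, not drawn from any surveyed tool.

```python
from collections import Counter

def change_density(changes, bucket_seconds):
    """Return {filename: [change count per time bucket]}.

    `changes` is a list of (timestamp_seconds, filename) pairs.
    """
    if not changes:
        return {}
    start = min(t for t, _ in changes)
    end = max(t for t, _ in changes)
    n_buckets = (end - start) // bucket_seconds + 1
    counts = Counter((name, (t - start) // bucket_seconds) for t, name in changes)
    files = sorted({name for _, name in changes})
    return {name: [counts[(name, b)] for b in range(n_buckets)] for name in files}

def render(density, shades=" .:#"):
    """Draw one text row per file; later characters in `shades` mean more changes."""
    rows = []
    for name, row in density.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in row)
        rows.append(f"{name:<10}|{cells}|")
    return "\n".join(rows)

# Invented sample data: a.c changes early, b.c changes mostly late.
changes = [(0, "a.c"), (10, "a.c"), (15, "b.c"), (95, "a.c"),
           (250, "b.c"), (260, "b.c"), (270, "b.c")]
density = change_density(changes, bucket_seconds=100)
print(render(density))
```

Even in this crude form, a reader can see at a glance which files were active in which period, which is the kind of pattern the human factor is expected to pick up.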
6. Future Work
 Random Thoughts
o Using a Visual software development tool for the product development
could provide a base-line metric for comparisons and fact finding
regarding the evolution
o There is a lack of common representation or exploration in the realm of
visual analysis and the medium and method involved
 A driving factor in the success of computers was the GUI,
which is anchored on visual metaphors (the desktop, the recycle
bin, files, folders). Metaphor also drives applications (spreadsheets
are based on the accounting ledgers of old).
 Is there a way to create a visual symbolic language for presenting
the structure of a software product, and for capturing
its evolution over time, that takes advantage of existing real-world
metaphors (e.g. a stop sign means stop) and can be understood at
a glance (visual inspection alone yields comprehension)?
 Current papers suggest that the somewhat successful software
strategies of today are iterative, or essentially feedback driven.
Current data mining seems focused on post-mortem analysis as a
learning tool. None of the papers suggest adaptations or
improvements to the process, or using the analysis from
the first day as one of the feedback tools that provide input during
the development process itself
 The data extraction sections of almost all papers don’t state what
seems to be an implicit conclusion suggested by each: the lack of an
integrated tool that can correlate MRs, bug reports, emails and
the version information to facilitate this kind of analysis
 All the papers jump through many hoops to extract and
correlate the information to track evolution and architecture
 Why do none of the papers suggest implementing a new
tool that would facilitate this for future projects?
 Question to check out: Are there already tools that completely
integrate the above information (common sense would
imply that there are) and, if so, why are they not in use?
What are the problems with them? Since the
documents and tools themselves (CVS, bug reports, MRs, etc.)
can be more or less independent of whatever language is being
used for the project, is it worthwhile to
develop a tool whose purpose is to provide a platform to
co-ordinate the software development but whose byproduct is the
ongoing ability to visualize the evolution and
information associated with the process? What information
would need to be tracked?
 What needs to be done based on the current state of research
 Suggested paths for future research
7. Conclusions
 Summary of everything
Definitions so far:
Software artifact – any document either hard copy or electronic that is produced in
association or conjunction with a software product
MR – modification request, software artifact requesting and/or tracking a change to a
software project/module