Survey
Proposal for Outline re: Report Draft version 2, May 9

Paper Title: An Overview of Current Techniques for the Visualization of Evolution in Software from Versioning Information

Paper Headings:
1. Introduction
2. Background
3. Data Extraction
4. Data Mining
5. Visualization
6. Future Work
7. Conclusions

Paper Contents:

1. Introduction
• Understanding software development is still a work in progress
  o Software engineering is a “young” science
• One way to understand and learn from the software development process is through the artifacts of the process itself
  o Information contained in versioning systems
  o Bug reports
  o Documented communication between developers (email)
  o Modification requests
• Given the sources mentioned above, we need
  o A way to extract the information
  o A way to build relations between distinct data
  o A way to correlate and derive associations between distinct data
• Once we have created a knowledge repository of the information and its relations, we can then do two things of value
  o Data mining: derive or determine new facts or relations between information
  o Visualization: provide a visual medium to express the knowledge in the repository in human-readable form
    - Benefit is to allow the human factor to discover new facts and patterns
• Summarize motivation
  o Enable visualization for easier understanding/comprehension
  o Discover unseen patterns and dependencies
  o Provide inter-mortem and post-mortem analysis

2. Background
• Software development process analysis is not a new idea (it has been around since the 1950s), but
  o Advent of versioning tools for development
  o Evolution of formalized process for development
    - Software artifacts
  o Communication and project tracking (e.g. email and MS Project)
• All have led to an abundance of information pertaining to software development
• Version control systems and information contained
  o Code itself, who did it and when
• Artifacts
  o Bug reports
  o Modification requests
  o Email
  o Log files
  o Others?
• Extraction mechanisms
  o Inherently there is a lack of automation for fact extraction and relation building
    - The sources above weren’t designed for tracking SD evolution
• Visualization tools
  o Two main areas where work has progressed
    - The first is in the development of the software itself (MS Visual Studio)
    - The IDE is problem-domain specific (i.e. build GUI) and is a production tool for code, not intended to support analysis of project evolution
  o Visualizing existing systems, e.g. Rigi
    - Like the above, intended to capture one snapshot in time of the current state of the project, not to provide a historical overview

3. Data Extraction
• Currently the majority of approaches involve automation of some part of the data extraction process
• Contents of version control system
  o Analysis of information encapsulated in the system
    - What is contained and, just as importantly, what is not
  o Techniques for extraction
    - Different systems
• Bug reports
• Different extraction and correlation techniques
  o Data mining for unknown dependencies
  o Relation discovery and building
• The effectiveness of automation of the extraction process

4. Data Mining
• Information and facts gathered from the data extraction can be associated or correlated to derive new information about the software evolution and the architecture
  o Files checked in around the same period of time in a versioning system may be related
  o Bug reports may have some type of tracking data that can couple them to a particular piece of the software model
  o MRs can also contain information that can imply relations
  o Other artifacts such as email can contain relevant information as well
• Time is a critical factor and component of the examination of a software’s evolution
  o The ability to observe changes in the software architecture through time
  o The ability to see what parts of the software have changed during the same time period

5. Visualization
• Creating a visual medium for data representation allows us to take advantage of the human factor: a user can infer new patterns and trends from a diagram, whereas it can be computationally difficult, if not impossible, to program the computer to have the same analytic ability
• Types of visualization explored so far
  o Graphs (vertices and edges)
    - Distance between nodes vs. density driven
  o Column graphs – statistics and interpretations
  o 3-dimensional object graphs
  o The use of color
  o Time versus density plots
• Which of the above is useful and in what context
• Conclusions of what has been useful with regard to visualizations so far

6. Future Work
• Random thoughts
  o Using a visual software development tool for the product development could provide a baseline metric for comparisons and fact finding regarding the evolution
  o There is a lack of common representation or exploration in the realm of visual analysis and the medium and method involved
    - A driving factor in the success of computers was the GUI, which is anchored on visual metaphors (the desktop, the recycle bin, files, folders). It also drives applications (spreadsheets based on the accounting ledgers of old).
    - Is there a way to create a visual symbolic language for presenting the structure of a software product, as well as capturing its evolution in time, that can take advantage of existing real-world metaphors (e.g. a stop sign means stop) and be understood at a glance (visual inspection alone yields comprehension)?
  o Current papers suggest that the somewhat successful software strategies of today are iterative, or essentially feedback driven. Current data mining seems focused on post-mortem analysis as a learning curve.
  o None of the papers suggest adaptations or improvements to the process, or using it during the process from the first day as part of the feedback tools to provide input during the development process itself
  o The data extraction sections of almost all papers don’t state what seems to be an innate conclusion suggested by each: the lack of an integrated tool which can correlate MRs, bug reports, emails and the version information to facilitate such analysis
    - All the papers jump through many hoops to extract and correlate the information to track evolution and architecture
    - Why do none of the papers suggest implementing a new tool that would facilitate this for future projects?
    - Question to check out: are there already tools to completely integrate the above information (common sense would imply that there are) and if so, why are they not in use? What are the problems with them if they aren’t?
  o Since the documents/tools themselves can be more or less language independent from whatever is being used for the project (CVS, bug reports, MRs, etc.), is it worthwhile to develop a tool whose purpose is to provide a platform to coordinate the software development but whose byproduct is the ongoing ability to visualize the evolution and information associated with the process?
    - What information would need to be tracked
• What needs to be done based on the current state of research
• Suggested paths for future research

7. Conclusions
• Summary of everything

Definitions so far:
Software artifact – any document, either hard copy or electronic, that is produced in association or conjunction with a software product
MR – modification request, a software artifact requesting and/or tracking a change to a software project/module
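
Illustrative sketch for section 4 (Data Mining): the idea that files checked in around the same period of time may be related can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions, not a technique taken from any of the surveyed papers. The record layout (file, author, timestamp), the five-minute window, the function names `group_checkins` and `co_change_counts`, and the sample data are all hypothetical; real check-in records would first have to be extracted from the versioning system.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def group_checkins(checkins, window=timedelta(minutes=5)):
    """Group check-ins by the same author into sets of files.

    A check-in joins the author's current group when it happens within
    `window` of that author's previous check-in (sliding window);
    otherwise it starts a new group. The window size is an assumption.
    """
    groups = []
    last = {}  # author -> (current group, time of previous check-in)
    for path, author, when in sorted(checkins, key=lambda c: c[2]):
        entry = last.get(author)
        if entry is not None and when - entry[1] <= window:
            group = entry[0]
        else:
            group = set()
            groups.append(group)
        group.add(path)
        last[author] = (group, when)
    return groups


def co_change_counts(groups):
    """Count how often each pair of files appears in the same group."""
    counts = Counter()
    for group in groups:
        for pair in combinations(sorted(group), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data: (file, author, check-in time).
checkins = [
    ("parser.c", "alice", datetime(2003, 5, 1, 10, 0)),
    ("parser.h", "alice", datetime(2003, 5, 1, 10, 2)),
    ("gui.c", "bob", datetime(2003, 5, 1, 10, 3)),
    ("parser.c", "alice", datetime(2003, 5, 2, 9, 0)),
    ("parser.h", "alice", datetime(2003, 5, 2, 9, 1)),
    ("lexer.c", "alice", datetime(2003, 5, 2, 9, 4)),
]

groups = group_checkins(checkins)
counts = co_change_counts(groups)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with a high count are candidates for a relation that the source code alone may not reveal. The count says nothing about why the files changed together, so it would still need to be correlated with bug reports, MRs or other artifacts.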