Download Slide 0 - Faculteit Wiskunde en Informatica

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Code and Pattern Mining in
C/C++
Aditya S. Deshpande
Namratha Nayak
Guides:
Dr. A.Serebrenik(TU/e)
P.Kourzanov, ir(NXP)
Y.Dajsuren, PDEng(Virage Logic)
Agenda
•
•
•
•
•
Introduction
Problem Definition
Data flow
Design patterns
Summary
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 1
Introduction
• Code mining – Process of extracting patterns
from source code.
• Design Patterns – A design pattern is a general
reusable solution to a commonly occurring problem
in software design.
• Streaming Data - Data streaming is the transfer of
data at a steady high-speed rate sufficient to support
such applications as high-definition television or a
radio signal.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 2
Problem Definition
• Lack of synchronisation between models and source
code.
• Significant amount of repetitive code in different
modules.
• Identifying patterns and integrating them in the
framework.
• Objective
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 3
Approach
• Study the Design flow models available.
• Study the various design pattern matching methods
and tools.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 4
Data flow models
• Kahn Process Networks.
• Synchronous Data Flow.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 5
Kahn Process Network - Introduction
• Processes communicate via FIFO.
• Parallel communication is organized as follows
• Autonomous computing stations are connected to each
other in a network by communication lines.
• A station computes on data coming on its input lines to
produce output on some or all of its output lines.
• Assumptions
• Communication lines are the only means of communication.
• Communication lines transmit info within a finite time.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 6
Kahn Process Network - Introduction
• Restrictions
• At any given time a computing station is either computing or
waiting for information on one of its input lines.
• Each computing station follows a sequential program.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 7
Kahn Process Network - Example
• From Kahn’s original 1974 paper
process f(in int u, in int v, out int w)
{
u
int i; bool b = true;
for (;;) {
f
w
i = b ? wait(u) : wait(v);
v
printf("%i\n", i);
send(i, w);
Process alternately reads from u
b = !b;
and v, prints the data value, and
writes it to w.
}
}
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 8
Kahn Process Network - Example
• From Kahn’s original 1974 paper
Process interface
includes FIFO’s.
process f(in int u, in int v, out int w)
{
int i; bool b = true;
wait() returns the next
for (;;) {
token in an input FIFO,
blocking if it’s empty
i = b ? wait(u) : wait(v);
printf("%i\n", i);
send(i, w);
send() writes a data value
b = !b;
on an output FIFO
}
}
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 9
SDF - Introduction
• Synchronous data flow graph (SDF) is a network of
synchronous nodes (also called blocks).
• For a synchronous node, the consumptions and
productions are known a priori.
• Homogeneous SDF
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 10
SDF - Delay
• Delay of signal processing
• Unit delay on arc between A and B, means
A
d
B
• nth sample consumed by B, is (n-1)th sample
produced by A.
• The arc is initialized with ‘d’ zero samples.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 11
SDF - Implementation
• Implementation requires:
• Buffering of the data samples passing between nodes
• Schedule nodes when inputs are available
• Dynamic implementation (= runtime) requires
• Runtime scheduler checks when inputs are available
and schedules nodes when a processor is free.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 12
SDF - Implementation
• Contribution of Lee-87:
• SDF graphs can be scheduled at compile time
• No overhead
• Compiler will:
• Determine the execution order of the nodes on one or
multiple processors or data path units
• Determine communication buffers between nodes.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 13
Design Patterns
• Describe solutions for common recurring problems
• Can be used in a wider context as they are defined
informally
• Documenting them in a software system simplifies
maintenance and program understanding
• Usually it is not documented, so there is a need to
discover design patterns from source code
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 14
Pattern Mining
• Structure of design pattern is searched in the source
code.
• Should include the main properties of the design
pattern
• Flexible to describe the slightly distorted pattern
occurrences.
• Helps to understand the relationships between the
different parts of a large system
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 15
Pattern Mining
• Reverse Engineering
• Analysis of a system to
− Identify the components and their interrelationships
− Create representations of the system in another form
• Why tools for Reverse Engineering?
• Existing legacy code
• High number of participants in code development
• Tools developed to mine the patterns from the
source code
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 16
Pattern Mining Tools
• Aspects in the different mining tools
• Programming Language : Tools for Java and C++
• Method used to discover design patterns : Graph Matching ,
Constraint Satisfaction Problem, pattern inference
• Intermediate Representation – Abstract Semantic Graph,
Abstract Syntax Tree, Matrix and Vector
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 17
Columbus – Design Pattern Mining Tool
• Reverse engineering framework
• Developed in cooperation between the Research
Group on Artificial Intelligence in Szeged, the
Software Technology Laboratory of the Nokia
Research Center and FrontEndART Ltd.
• Analyze large C/C++ projects and extract data
according to the Columbus Schema
• Supports project handling , data extraction , data
representation, data storage, filtering and
visualization
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 18
Columbus - Design Pattern Mining Tool
• Has a C/C++ extractor plug-in that performs the
parsing of the source code
• Information collected by the plug-in corresponds to
the Columbus Schema
• Schema captures C++ language at low detail(i.e,
Abstract Syntax Tree) and has the higher –level
elements(i.e., semantics of types)
• Supports various file formats for exporting the
extracted data
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 19
Other Pattern Mining Tools
• Other tools to be studied
• CPP2XMI
• Maisa
• CrocoPat
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 20
Issues to be considered
• Can the tools support NXP source Code?
• Would it be possible to add proprietary patterns to
these tools?
• Can these tools be extended to support other
languages like C?
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 21
Summary
• Overview of the Data flow models
• Introduced the design pattern mining tool Columbus
• Find the patterns present in the NXP source code
and check whether these can be mined using the
available tools
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 22
References
• E.A.Lee and D.G.Messerschmitt, “Synchronous data
flow”,Proc. IEEE, vol. 75, pp. 1235-1245, Sept 1987.
• G.Kahn, “The semantics of a simple language for
parallel programming”, Proc.IFIP congr., Stockholm,
Sweden, Aug.1974, pp.471-475
• Gamma, E., Helm, R., Johnson, R. and Vlissides, J.
Design Patterns - Elements of Reusable ObjectOriented Software. Addison-Wesley, 1995.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 23
References
• R. Ferenc, A. Beszedes, M. Tarkiainen, and T.
Gyimothy. Columbus – Reverse Engineering Tool
and Schema for C++. In Proceedings of the 6th
International Conference on Software Maintenance
(ICSM 2002), pages 172–181. IEEE Computer Society,
Oct. 2002.
• R. Ferenc , and A. Beszedes. Data Exchange with the
Columbus Schema for C++. In Proceedings of the 6th
European Conference on Software Maintenance and
Reengineering (CSMR 2002), pages 59–66. IEEE
Computer Society, Mar. 2002.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 24
References
• Z. Balanyi, and R. Ferenc. Mining Design Patterns
from C++ Source Code. In Proceedings of the 19th
International Conference on Software Maintenance
(ICSM 2003), pages 305–314. IEEE Computer Society,
Sept. 2003.
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 25
Questions
/ Faculteit Wiskunde en Informatica
22-5-2017
PAGE 26