Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Code and Pattern Mining in C/C++ Aditya S. Deshpande Namratha Nayak Guides: Dr. A.Serebrenik(TU/e) P.Kourzanov, ir(NXP) Y.Dajsuren, PDEng(Virage Logic) Agenda • • • • • Introduction Problem Definition Data flow Design patterns Summary / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 1 Introduction • Code mining – Process of extracting patterns from source code. • Design Patterns – A design pattern is a general reusable solution to a commonly occurring problem in software design. • Streaming Data - Data streaming is the transfer of data at a steady high-speed rate sufficient to support such applications as high-definition television or a radio signal. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 2 Problem Definition • Lack of synchronisation between models and source code. • Significant amount of repetitive code in different modules. • Identifying patterns and integrating them in the framework. • Objective / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 3 Approach • Study the Design flow models available. • Study the various design pattern matching methods and tools. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 4 Data flow models • Kahn Process Networks. • Synchronous Data Flow. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 5 Kahn Process Network - Introduction • Processes communicate via FIFO. • Parallel communication is organized as follows • Autonomous computing stations are connected to each other in a network by communication lines. • A station computes on data coming on its input lines to produce output on some or all of its output lines. • Assumptions • Communication lines are the only means of communication. • Communication lines transmit info within a finite time. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 6 Kahn Process Network - Introduction • Restrictions • At any given time a computing station is either computing or waiting for information on one of its input lines. • Each computing station follows a sequential program. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 7 Kahn Process Network - Example • From Kahn’s original 1974 paper process f(in int u, in int v, out int w) { u int i; bool b = true; for (;;) { f w i = b ? wait(u) : wait(v); v printf("%i\n", i); send(i, w); Process alternately reads from u b = !b; and v, prints the data value, and writes it to w. } } / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 8 Kahn Process Network - Example • From Kahn’s original 1974 paper Process interface includes FIFO’s. process f(in int u, in int v, out int w) { int i; bool b = true; wait() returns the next for (;;) { token in an input FIFO, blocking if it’s empty i = b ? wait(u) : wait(v); printf("%i\n", i); send(i, w); send() writes a data value b = !b; on an output FIFO } } / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 9 SDF - Introduction • Synchronous data flow graph (SDF) is a network of synchronous nodes (also called blocks). • For a synchronous node, the consumptions and productions are known a priori. • Homogeneous SDF / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 10 SDF - Delay • Delay of signal processing • Unit delay on arc between A and B, means A d B • nth sample consumed by B, is (n-1)th sample produced by A. • The arc is initialized with ‘d’ zero samples. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 11 SDF - Implementation • Implementation requires: • Buffering of the data samples passing between nodes • Schedule nodes when inputs are available • Dynamic implementation (= runtime) requires • Runtime scheduler checks when inputs are available and schedules nodes when a processor is free. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 12 SDF - Implementation • Contribution of Lee-87: • SDF graphs can be scheduled at compile time • No overhead • Compiler will: • Determine the execution order of the nodes on one or multiple processors or data path units • Determine communication buffers between nodes. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 13 Design Patterns • Describe solutions for common recurring problems • Can be used in a wider context as they are defined informally • Documenting them in a software system simplifies maintenance and program understanding • Usually it is not documented, so there is a need to discover design patterns from source code / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 14 Pattern Mining • Structure of design pattern is searched in the source code. • Should include the main properties of the design pattern • Flexible to describe the slightly distorted pattern occurrences. • Helps to understand the relationships between the different parts of a large system / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 15 Pattern Mining • Reverse Engineering • Analysis of a system to − Identify the components and their interrelationships − Create representations of the system in another form • Why tools for Reverse Engineering? • Existing legacy code • High number of participants in code development • Tools developed to mine the patterns from the source code / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 16 Pattern Mining Tools • Aspects in the different mining tools • Programming Language : Tools for Java and C++ • Method used to discover design patterns : Graph Matching , Constraint Satisfaction Problem, pattern inference • Intermediate Representation – Abstract Semantic Graph, Abstract Syntax Tree, Matrix and Vector / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 17 Columbus – Design Pattern Mining Tool • Reverse engineering framework • Developed in cooperation between the Research Group on Artificial Intelligence in Szeged, the Software Technology Laboratory of the Nokia Research Center and FrontEndART Ltd. • Analyze large C/C++ projects and extract data according to the Columbus Schema • Supports project handling , data extraction , data representation, data storage, filtering and visualization / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 18 Columbus - Design Pattern Mining Tool • Has a C/C++ extractor plug-in that performs the parsing of the source code • Information collected by the plug-in corresponds to the Columbus Schema • Schema captures C++ language at low detail(i.e, Abstract Syntax Tree) and has the higher –level elements(i.e., semantics of types) • Supports various file formats for exporting the extracted data / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 19 Other Pattern Mining Tools • Other tools to be studied • CPP2XMI • Maisa • CrocoPat / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 20 Issues to be considered • Can the tools support NXP source Code? • Would it be possible to add proprietary patterns to these tools? • Can these tools be extended to support other languages like C? / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 21 Summary • Overview of the Data flow models • Introduced the design pattern mining tool Columbus • Find the patterns present in the NXP source code and check whether these can be mined using the available tools / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 22 References • E.A.Lee and D.G.Messerschmitt, “Synchronous data flow”,Proc. IEEE, vol. 75, pp. 1235-1245, Sept 1987. • G.Kahn, “The semantics of a simple language for parallel programming”, Proc.IFIP congr., Stockholm, Sweden, Aug.1974, pp.471-475 • Gamma, E., Helm, R., Johnson, R. and Vlissides, J. Design Patterns - Elements of Reusable ObjectOriented Software. Addison-Wesley, 1995. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 23 References • R. Ferenc, A. Beszedes, M. Tarkiainen, and T. Gyimothy. Columbus – Reverse Engineering Tool and Schema for C++. In Proceedings of the 6th International Conference on Software Maintenance (ICSM 2002), pages 172–181. IEEE Computer Society, Oct. 2002. • R. Ferenc , and A. Beszedes. Data Exchange with the Columbus Schema for C++. In Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR 2002), pages 59–66. IEEE Computer Society, Mar. 2002. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 24 References • Z. Balanyi, and R. Ferenc. Mining Design Patterns from C++ Source Code. In Proceedings of the 19th International Conference on Software Maintenance (ICSM 2003), pages 305–314. IEEE Computer Society, Sept. 2003. / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 25 Questions / Faculteit Wiskunde en Informatica 22-5-2017 PAGE 26