Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Mining Software Data: Code Tao Xie University of Illinois at Urbana-Champaign http://web.engr.illinois.edu/~taoxie/ [email protected] Mining Software Engineering Data MAIN GOAL Transform static recordkeeping SE data to active data Make SE data actionable by uncovering hidden patterns and trends Bugzilla Mailings Code repository CVS Execution traces 2 Mining Software Engineering Data programming defect detection testing debugging maintenance … software engineering tasks data mining techniques code bases change history program states structural entities software engineering data https://sites.google.com/site/asergrp/dmse bug reports/nl … Mining Software Engineering Data programming defect detection testing debugging maintenance … software engineering tasks data mining techniques code bases change history program states structural entities software engineering data bug reports/nl … Motivation Programmers commonly reuse APIs of existing frameworks or libraries – – – Advantages: High productivity of development Challenges: Complexity and lack of documentation Consequences: • • – Spend more efforts in understanding APIs Introduce defects in API client code Solution: Mining API properties as common patterns across API client code Frame works 5 5 Solution-Driven Problem-Driven Where can I apply X miner? Basic mining algorithms E.g., association rule mining, frequent itemset mining… Advanced mining algorithms E.g., frequent partial order mining [ESEC/FSE 07] What patterns do we really need? New/adapted mining algorithms E.g., [ICSE 09], [ASE 09] 6 Mining Searching + Mining Traditional approaches Code repositories 1 patterns mining 2 Eclipse, Linux, … Often lack sufficient relevant data points (Eg. API call sites) Our new approaches Code repositories 1 2 … Open source code on the web N searching mining patterns Code search engine e.g., 7 7 7 Agenda Motivation Mining Sequence Association Rules (CAR-Miner) [ICSE 09] Detecting Exception-Handling Defects Mining Alternative Patterns (Alattin) [ASE 09] Detecting Neglected Condition Defects Conclusion 8 Exception Handling APIs throw exceptions during runtime errors Example: Session API of Hibernate framework throws HibernateException APIs expect client applications to implement recovery actions after exceptions occur Example: Hibernate Session API expects client application to rollback open uncommitted transactions after HibernateException occurs Failure to handle exceptions results in Fatal issues, e.g., database lock won’t be released if the transaction is not rolled back 9 Problem Addressed by CAR-Miner Use exception-handling specification to detect violations as defects Problem: Often specifications are not documented Solution: Mine specifications from existing API client code Challenges: Limited data points: Only from a few code bases searching + mining Limited expressiveness: Not sufficient to characterize common exception-handling behaviors: why? 10 Example Scenario 1 1.1: ... 1.2: OracleDataSource ods = null; Session session = null; Connection conn = null; Statement statement = null; 1.3: logger.debug("Starting update"); 1.4: try { 1.5: ods = new OracleDataSource(); 1.6: ods.setURL("jdbc:oracle:thin:scott/[email protected]:1521:catfish"); 1.7: conn = ods.getConnection(); 1.8: statement = conn.createStatement(); 1.9: statement.executeUpdate("DELETE FROM table1"); 1.10: connection.commit(); } 1.11: catch (SQLException se) { 1.13: logger.error("Exception occurred"); } 1.14: finally { 1.15: if(statement != null) { statement.close(); } 1.16: if(conn != null) { conn.close(); } 1.17: if(ods != null) { ods.close(); } } 1.18: } Defect: No rollback done when SQLException occurs Requires specification such as “Connection should be rolled back when a connection is created and SQLException occurs” Q: Should every connection instance has to be rolled back when SQLException occurs? Missing “conn.rollback()” 11 Example (cont.) Scenario 1 1.1: ... 1.2: OracleDataSource ods = null; Session session = null; Connection conn = null; Statement statement = null; 1.3: logger.debug("Starting update"); 1.4: try { 1.5: ods = new OracleDataSource(); 1.6: ods.setURL("jdbc:oracle:thin:scott/[email protected]:1521:catfish"); 1.7: conn = ods.getConnection(); 1.8: c statement = conn.createStatement(); 1.9: statement.executeUpdate("DELETE FROM table1"); 1.10: connection.commit(); } 1.11: catch (SQLException se) { 1.12: if (conn != null) { conn.rollback(); } 1.13: logger.error("Exception occurred"); } 1.14: finally { 1.15: if(statement != null) { statement.close(); } 1.16: if(conn != null) { conn.close(); } 1.17: if(ods != null) { ods.close(); } } 1.18: } Scenario 2 2.1: Connection conn = null; 2.2: Statement stmt = null; 2.3: BufferedWriter bw = null; FileWriter fw = null; 2.3: try { 2.4: fw = new FileWriter("output.txt"); 2.5: bw = BufferedWriter(fw); 2.6: conn = DriverManager.getConnection("jdbc:pl:db", "ps", "ps"); 2.7: Statement stmt = conn.createStatement(); 2.8: ResultSet res = stmt.executeQuery("SELECT Path FROM Files"); 2.9: while (res.next()) { 2.10: bw.write(res.getString(1)); 2.11: } 2.12: res.close(); 2.13: } catch(IOException ex) { logger.error("IOException occurred"); 2.14: } finally { 2.15: if(stmt != null) stmt.close(); 2.16: if(conn != null) conn.close(); 2.17: if (bw != null) bw.close(); 2.18: } Specification: “Connection creation => Connection rollback” Satisfied by Scenario 1 but not by Scenario 2 But Scenario 2 has no defect 12 Example (cont.) Simple association rules of the form “FCa => FCe” are not expressive Requires more general association rules (sequence association rules) such as (FCc1 FCc2) Λ FCa => FCe1, where FCc1 -> Connection conn = OracleDataSource.getConnection() FCc2 -> Statement stmt = Connection.createStatement() FCa -> stmt.executeUpdate() FCe1 -> conn.rollback() 13 Example (cont.) Simple association rules of the form “FCa => FCe” are not expressive Requires more general association rules (sequence association rules) such as (FCc1 FCc2) Λ FCa => FCe1, where FCc1 -> Connection conn = OracleDataSource.getConnection() FCc2 -> Statement stmt = Connection.createStatement() FCa -> stmt.executeUpdate() //Triggering Action FCe1 -> conn.rollback() 14 Example (cont.) Simple association rules of the form “FCa => FCe” are not expressive Requires more general association rules (sequence association rules) such as (FCc1 FCc2) Λ FCa => FCe1, where FCc1 -> Connection conn = OracleDataSource.getConnection() FCc2 -> Statement stmt = Connection.createStatement() FCa -> stmt.executeUpdate() FCe1 -> conn.rollback() //Recovery Action 15 Example (cont.) Simple association rules of the form “FCa => FCe” are not expressive Requires more general association rules (sequence association rules) such as (FCc1 FCc2) Λ FCa => FCe1, where FCc1 -> Connection conn = OracleDataSource.getConnection() FCc2 -> Statement stmt = conn.createStatement() //Context FCa -> stmt.executeUpdate() FCe1 -> conn.rollback() 16 CAR-Miner Approach Issue queries and collect relevant code examples. Eg: “lang:java java.sql.Statement executeUpdate” Construct exceptionflow graphs Open Source Projects on web 1 2 … … N Exception-Flow Graphs Static Traces Classes and Functions Extract classes and functions reused Input Application Check whether there are any exception-related defects Collect static traces Mine static traces Violations Sequence Association Rules Detect violations 17 CAR-Miner Approach Open Source Projects on web 1 2 … … N Exception-Flow Graphs Static Traces Classes and Functions Input Application Violations Sequence Association Rules 18 Exception-Flow-Graph Construction Based on a previous algorithm [Sinha&Harrold TSE 00] : normal execution path ----: exceptional execution path Exception-Flow-Graph Construction Prevent infeasible edges using a sound static analysis [Robillard&Murphy FSE 99] 20 CAR-Miner Approach Open Source Projects on web 1 2 … … N Exception-Flow Graphs Static Traces Classes and Methods Input Application Violations Sequence Association Rules 21 Static Trace Generation Collect static traces with the actions taken when exceptions occur A static trace for Node 7: “4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17” 22 Static Trace Generation Includes 3 sections: Normal functioncall sequence (4 -> 5 -> 6) Function call (7) Exception function-call sequence (15 -> 16 -> 17) A static trace for Node 7: “4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17” 23 Trace Post-Processing Identify and remove unrelated function calls using data dependency “4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17” 4: FileWriter fw = new FileWriter(“output.txt”) 5: BufferedWriter bw = new BufferedWriter(fw) ... 7: Statement stmt = conn.createStatement() ... Filtered sequence “6 -> 7 -> 15 -> 16“ 24 CAR-Miner Approach Open Source Projects on web 1 2 … … N Exception-Flow Graphs Static Traces Classes and Methods Input Application Violations Sequence Association Rules 25 Static Trace Mining Handle traces of each function call (triggering function call) individually Input: Two sequence databases with a one-to-one mapping • • normal function-call sequences (context) exception function-call sequences (recovery) Objective: Generate sequence association rules of the form (FCc1 ... FCcn) Λ FCa => FCe1 ... FCen Context Trigger Recovery 26 Mining Problem Definition Input: Two sequence databases with a one-to-one mapping Context Recovery Objective: To get association rules of the form FC1 FC2 ... FCm -> FE1 FE2 ... FEn where {FC1, FC2, ..., Fcm} Є SDB1 and {FE1, FE2, ..., Fen} Є SDB2 Existing association rule mining algorithms cannot be directly applied on multiple sequence databases 27 Mining Problem Solution Annotate the sequences to generate a single combined database Apply frequent subsequence mining algorithm [Wang and Han, ICDE 04] to get frequent sequences Transform mined sequences into sequence association rules (3 10) Λ FCa => (2 8) Context Trigger Recovery Rank rules based on the support assigned by frequent subsequence mining algorithm 28 CAR-Miner Approach Open Source Projects on web 1 2 … … N Exception-Flow Graphs Static Traces Classes and Methods Input Application Violations Sequence Association Rules 29 Violation Detection Analyze each call site of triggering call FCa Step 1: Extract context call sequence “CC1 CC2 ... CCm” from the beginning of the function to the call site of FCa Step 2: If CC1 CC2 ... CCm is super-sequence of FCc1 ... FCcn Report any missing function calls of {FCe1 ... FCen} in any exception path API client: (CC1 CC2 ... CCm) Λ FCa => Missing any? isSuperSeqOf API Rule: (FCc1 ... FCcn) Context Λ FCa => FCe1 ... FCen Trigger Recovery 30 Evaluation Research Questions: Do the mined rules represent real rules? 2. Do the detected violations represent real defects? 3. Does CAR-Miner perform better than WNminer [Weimer and Necula, TACAS 05]? 4. Do the sequence association rules help detect new defects? 1. 31 Subjects Internal Info: classes and methods belonging to the app External Info: classes and methods used by the app Code examples: #files collected through code search engine 32 RQ1: Real Rules Do the mined rules represent real rules? Real rules: 55% (Total: 294) Usage patterns: 3% False positives: 43% 33 RQ1: Distribution of Real Rules for Axion Distribution of rules based on ranks assigned by CAR-Miner #false positives is quite low between 1 to 60 rules 34 RQ2: Detected Violations Do the detected violations represent real defects? Total number of defects: 160 New defects not found by WN-Miner approach: 87 35 RQ2: Status of Detected Violations HsqlDB developers responded on the first 10 reported defects Accepted 7 defects Rejected 3 defects Reason given by HsqlDB developers for rejected defects: “Although it can throw exceptions in general, it should not throw with HsqlDB, So it is fine” 36 RQ3: Comparison with WN-miner Does CAR-Miner performs better than WN-miner? Found 224 new rules and missed 32 rules CAR-Miner detected most of the rules mined by WN-miner Two major factors: sequence association rules Increase in the data scope 37 RQ4: New defects by sequence association rules Do the sequence association rules detect new defects? Detected 21 new real defects among all applications 38 Agenda Motivation Mining Sequence Association Rules (CAR-Miner) [ICSE 09] Detecting Exception-Handling Defects Mining Alternative Patterns (Alattin) [ASE 09] Detecting Neglected Condition Defects Conclusion 39 Large Number of False Positives Existing approaches produce a large number of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others Frequent ways mine patterns Infrequent ways detect violations Mined Patterns 40 40 Example: java.util.Iterator.next() Java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements Code Example 2 Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } 41 Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 2 (6/1243) Sample 1 (1218 / 1243) 1243 code examples Mined Pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next” 42 Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Require more general patterns (alternative patterns): P1 or P2 P1 : boolean check on return of Iterator.hasNext before Iterator.next P2 : boolean check on return of ArrayList.size before Iterator.next Cannot be mined by existing approaches, since alternative P2 is infrequent 43 Our Solution: ImMiner Algorithm Mines alternative patterns of the form P1 or P2 Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 1243 code examples Sample 2 (6/1243) Sample 1 (1218 / 1243) P2 is infrequent among entire 1243 code examples P2 is frequent among code examples not supporting P1 44 Alternative Patterns ImMiner mines three kinds of alternative patterns of the general form “P1 or P2” Balanced: all alternatives (both P1 and P2) are frequent Imbalanced: some alternatives (P1) are frequent and others are infrequent (P2). Represented as “P1 or P^2” Single: only one alternative 45 ImMiner Algorithm Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively An input database with the following APIs for Iterator.next() Input database Mapping of IDs to APIs 46 ImMiner Algorithm: Frequent Alternatives Input database Frequent itemset mining (min_sup 0.5) Frequent item: 1 P1: boolean-check on the return of Iterator.hasNext() before Iterator.next() 47 ImMiner: Infrequent Alternatives of P1 Split input database into two databases: Positive and Negative Positive database (PSD) Negative database (NSD) Mine patterns that are frequent in NSD and are infrequent in PSD Reason: Only such patterns serve as alternatives for P1 Alternative Pattern : P “const check on the return of ArrayList.size() 2 before Iterator.next()” Alattin applies ImMiner algorithm to detect neglected conditions 48 Neglected Conditions Neglected conditions refer to Missing conditions that check the arguments or receiver of the API call before the API call Missing conditions that check the return or receiver of the API call after the API call One of the primary reasons for many fatal issues security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07] 49 Alattin Approach Phase 1: Issue queries and collect relevant code samples. Eg: “lang:java java.util.Iterator next” Phase 2: Generate pattern candidates Open Source Projects on web 1 2 … N Pattern Candidates … Classes and methods Extract classes and methods reused Application Under Analysis Phase 3: Mine alternative patterns Violations Detect neglected conditions Alternative Patterns Phase 4: Detect neglected conditions statically 50 Evaluation 1. 2. Research Questions: Do alternative patterns exist in real applications? How high percentage of false positives are reduced (with low or no increase of false negatives) in detected violations? 51 Subjects #Samples: #code examples collected from Google code search Two categories of subjects: 3 Java default API libraries 3 popular open source libraries 52 RQ1: Balanced and Imbalanced Patterns How high percentage of balanced and imbalanced patterns exist in real apps? Balanced patterns: 0% to 30% (average: 9.69%) Imbalanced patterns: 30% to 100% (average: 65%) for Java default API libraries 0% to 9.5% (average: 5%) for open source libraries Explanation: Java default API libraries provide more different ways of writing code compared to open source libraries 53 RQ2: False Positives and False Negatives How high % of false positives are reduced (with low or no increase of false negatives)? Applied mined patterns (“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”) in three modes: Existing mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ” P1 ,P2, ... , Pi Balanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ” “P1 or P2 or ... or Pi” Imbalanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ” “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”54 RQ2: False Positives and False Negatives Existing Mode vs Balanced Mode Application Existing Mode Balanced Mode Defects False Positives Defects False Positives % of reduction False Negatives Java Util 37 104 37 104 0 0 Java Transaction 51 105 51 105 0 0 Java SQL 56 143 56 90 37.06 0 BCEL 2 14 2 8 42.86 0 HSqlDB 1 0 1 0 0 0 Hibernate 10 9 10 8 11.11 0 15.17 0 AVERAGE/ TOTAL Balanced mode reduced false positives by 15.17% without any increase in false negatives 55 RQ2: False Positives and False Negatives Existing Mode vs Imbalanced Mode Application Existing Mode Imbalanced Mode Defects False Positives Defects False Positives % of reduction False Negatives Java Util 37 104 36 74 28.85 1 Java Transaction 51 105 47 76 27.62 4 Java SQL 56 143 53 81 43.36 3 BCEL 2 14 2 6 57.04 0 HSqlDB 1 0 1 0 0 0 Hibernate 10 9 10 8 11.11 0 28.01 8 AVERAGE/ TOTAL Imbalanced mode reduced false positives by 28% with quite small increase in false negatives 56 Conclusion Problem-driven methodology by identifying new problems, patterns mining algorithms, defects CAR-Miner [ICSE 09]: mining sequence association rules of the form (FCc1 ... FCcn) Λ FCa => (FCe1 ... Fcen) Context Trigger Recovery reduce false negatives Alattin [ASE 09]: mining alternative patterns classified into three categories: balanced, imbalanced, and single P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j reduce false positives 57 Thank You Questions? Tao Xie University of Illinois at Urbana-Champaign http://web.engr.illinois.edu/~taoxie/ [email protected] 58