Integrating Artificial Intelligence in Software Testing
Roni Stern and Meir Kalech, ISE Department, BGU
Niv Gafni, Yair Ofir and Eliav Ben-Zaken, Software Eng., BGU

Abstract
• Artificial Intelligence: diagnosis, planning
• Software Engineering: testing

Testing is Important
• Quite a few software development paradigms
• All of them recognize the importance of testing
• There are several types of testing
  – E.g., unit testing, black-box (functional) testing, ...
• Programmer: writes programs, follows the spec; goal: fix bugs!
• Tester: runs tests, follows a test plan; goal: find bugs!

Handling a Bug
1. Bugs are reported by the tester
   - "Screen A crashed on step 13 of test suite 4..."
2. Prioritized bugs are assigned to programmers
3. Bugs are diagnosed
   - What caused the bug?
4. Bugs are fixed
   - Hopefully...

Why is Debugging Hard?
• The developer needs to reproduce the bug
• Reproducing a bug correctly is non-trivial
  – Bug reports may be inaccurate or missing
  – Software may have stochastic elements (the bug occurs only once in a while)
  – Bugs may appear only in special cases
• Idea: let the tester provide more information!

Introducing AI!
• Tester: runs a test suite, finds a bug
• AI: computes diagnoses, plans the next tests
• Programmer: diagnoses and fixes the bug
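The tester/AI/programmer interaction described above can be sketched as a simple loop. This is only an illustrative sketch: the helper names (`run_test`, `diagnose`, `plan_test`, `tdp_loop`) are hypothetical stand-ins, not APIs of any actual TDP tool.

```python
def tdp_loop(initial_suite, run_test, diagnose, plan_test, max_rounds=20):
    """Sketch of the TDP interaction: the tester runs tests and reports
    bugs, the AI computes candidate diagnoses and plans extra tests,
    and the programmer receives a single diagnosis to fix."""
    observations = [(t, run_test(t)) for t in initial_suite]  # run the suite, observe bugs
    candidates = diagnose(observations)                       # AI computes diagnoses
    for _ in range(max_rounds):
        if len(candidates) <= 1:
            break                                 # unique candidate: hand off to the programmer
        t = plan_test(candidates)                 # AI plans a test that prunes candidates
        observations.append((t, run_test(t)))     # tester runs it
        candidates = diagnose(observations)
    return candidates
```

With stub implementations of the three callbacks (e.g., a single injected fault and a diagnoser that exonerates functions exercised by passing tests), the loop prunes one candidate per round until a unique diagnosis remains.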
Test, Diagnose and Plan (TDP)
1. The tester runs a set of planned tests (a test suite)
2. The tester finds a bug
3. AI generates possible diagnoses
4. If there is only one candidate, it is passed to the programmer
5. Else, AI plans new tests to prune the candidates

Diagnosis
• Find the reason for a problem given observed symptoms
• Requirement: knowledge of the diagnosed system
  – Given by experts
  – Learned by AI techniques

Model-Based Diagnosis
• Given a model of how the system works, infer what went wrong
• Example: IF ok(battery) AND ok(ignition) THEN start(engine)
• What if the engine doesn't start?

Where is MBD Applied?
• Printers (Kuhn and de Kleer 2008)
• Vehicles (Pernestål et al. 2006)
• Robotics (Steinbauer 2009)
• Spacecraft: Livingstone (Williams and Nayak 1996; Bernard et al. 1998)
• ...

Software is Difficult to Model
• Need to code how the software should behave
  – Specs are complicated to model
• Static code analysis ignores runtime behavior
  – Polymorphism
  – Instrumentation ...

Zoltar (Abreu et al. 2011)
• Constructs a model from the observations
• Observations should include execution traces
  – Functions visited during execution
  – Observed outcome: bug / no bug
• Weaker model: Ok(function1) → function1 outputs a valid value
  ⇒ A bug entails that at least one component was not Ok

Execution Matrix
• A key component in Zoltar is the execution matrix
• It is built from the observed execution traces
  – Observation 1 (BUG): F1 → F5 → F6
  – Observation 2 (BUG): F2 → F5
  – Observation 3 (OK):  F2 → F6 → F7 → F8

         F1  F2  F3  F4  F5  F6  F7  F8 | Bug
  Obs1    1   0   0   0   1   1   0   0 |  1
  Obs2    0   1   0   0   1   0   0   0 |  1
  Obs3    0   1   0   0   0   1   1   1 |  0

Diagnoses = Hitting Sets of Conflicts
• In every BUG trace, at least one function is faulty
• Observations:
  – Observation 1 (BUG): F1 → F5 → F6
  – Observation 2 (BUG): F2 → F5
• Conflicts:
  – Ok(F1) AND Ok(F5) AND Ok(F6)
  – Ok(F2) AND Ok(F5)
• Possible diagnoses are the hitting sets of the conflicts: {F5}, {F1,F2}, {F2,F6}

Software Diagnosis with Zoltar
• Bug report → Zoltar → set of candidate diagnoses → prioritize diagnoses

Plan More Tests
• Bug report → AI engine → set of possible diagnoses → suggest a new test to prune the candidates
• Goal: end with a single diagnosis

Pacman Example
• (A sequence of slides walks through a Pacman game run)
• Execution trace: F1 Move, F2 Stop, F3 Eat Pill
• Which diagnosis is correct?
• Knowledge:
  – Most probable cause: Eat Pill
  – Easiest to check: Stop

Objective
• Plan a sequence of tests to find the best diagnosis with minimal tester actions

1. Highest Probability (HP)
• Compute the probability of every function: P(Ci) = sum of the probabilities of the candidate diagnoses containing Ci
• Plan a test to check the most probable function
• [figure: example candidate diagnoses over F1, F2, F5, F6 with probabilities 0.2 / 0.4]

2. Lowest Cost (LC)
• Consider only functions Ci with 0 < P(Ci) < 1
• Test the closest function from this set

3. Entropy-Based
• A test α = a set of components
  – Ω+ = the set of candidates that assume α will pass
  – Ω- = the set of candidates that assume α will fail
  – Ω? = the set of candidates that are indifferent to α
• Probability that α passes = probability of the candidates in Ω+
• Choose the test with the lowest Entropy(Ω+, Ω-, Ω?), i.e., the highest information gain

4. Planning Under Uncertainty
• The outcome of a test is unknown ⇒ minimize the expected cost
• Formulate as a Markov Decision Process (MDP)
  – States: the set of performed tests and their outcomes
  – Actions: the possible tests
  – Transitions: Pr(Ω+), Pr(Ω-) and Pr(Ω?)
  – Reward (cost): the cost of the performed tests
• The current solver uses myopic sampling

Preliminary Experimental Results
• Synthetic code
• Setup
  – Generated random call graphs with 300 nodes
  – Generated code from the random call graphs
  – Injected 10/20 random bugs
  – Generated 15 random initial tests
• Run until a diagnosis with prob. > 0.9 is found
  – Results on 100 instances
• The probability-based approaches (HP and Entropy) perform worse than the cost-based approaches (LC and MDP)
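The conflicts-to-diagnoses step from the Zoltar slides can be sketched as a brute-force search for minimal hitting sets. The function name `minimal_hitting_sets` is a hypothetical helper, not part of Zoltar; real MBD engines use far more efficient algorithms, but on the slides' two-conflict example this naive enumeration reproduces the same three candidates.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts):
    """Brute-force enumeration of the minimal hitting sets of a list of
    conflict sets (feasible only for small examples)."""
    universe = sorted(set().union(*conflicts))
    minimal = []
    for r in range(1, len(universe) + 1):             # smallest sets first
        for combo in combinations(universe, r):
            s = set(combo)
            if all(s & c for c in conflicts):         # s hits every conflict
                if not any(m <= s for m in minimal):  # keep only minimal sets
                    minimal.append(s)
    return minimal

# Conflicts from the slides: Obs1 (BUG) visited F1, F5, F6; Obs2 (BUG) visited F2, F5
diagnoses = minimal_hitting_sets([{"F1", "F5", "F6"}, {"F2", "F5"}])
# yields the three candidate diagnoses from the slides: {F5}, {F1,F2}, {F2,F6}
```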
• MDP, which considers both cost and probability, outperforms the others
• 20 bugs is more costly for all algorithms than 10 bugs

Preliminary Experimental Results - NUnit
• Real code: NUnit, a well-known testing framework for C#
• Setup
  – Generated call graphs with 302 nodes from NUnit
  – Injected 10, 15, 20, 25 or 30 random bugs
  – Generated 15 random initial tests
• Run until a diagnosis with prob. > 0.9 is found
  – Results on 110 instances
• Similar results
• The cost increases with the probability bound
• MDP, which considers cost and probability, outperforms the others

Conclusion
• The Test, Diagnose and Plan (TDP) paradigm:
  – AI diagnoses the observed bug with an MBD algorithm
  – AI plans further tests to find the correct diagnosis
• Empowers the tester using AI
  – Bug diagnosis is done by the tester + AI

[final slide: SCM, server logs and source code feed an AI engine used by the QA tester and the developer]
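The entropy-based planner from the earlier slides can be sketched as follows. This is a simplified, hypothetical reading of the slides: it assumes deterministic faults (a test fails exactly when it executes a component the diagnosis marks as faulty), so the indifferent partition Ω? is empty, and it realizes "lowest entropy = highest information gain" as picking the test with the lowest expected remaining entropy over the candidate set. All names and the example priors are illustrative, not from the authors' system.

```python
import math

def H(probs):
    """Shannon entropy of a normalized distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_posterior_entropy(test, candidates):
    """test: set of functions the test executes.
    candidates: dict mapping frozenset(faulty functions) -> prior probability.
    Splits the candidates into Omega+ (test passes) and Omega- (test fails)
    and returns the expected entropy of the renormalized posterior."""
    omega_fail = [p for d, p in candidates.items() if d & test]        # Omega-
    omega_pass = [p for d, p in candidates.items() if not (d & test)]  # Omega+
    total = 0.0
    for group in (omega_pass, omega_fail):
        mass = sum(group)  # probability of this outcome
        if mass > 0:
            total += mass * H([p / mass for p in group])
    return total

def best_test(tests, candidates):
    """Lowest expected remaining entropy = highest expected information gain."""
    return min(tests, key=lambda t: expected_posterior_entropy(t, candidates))

# Illustrative priors over the three candidate diagnoses from the Pacman/Zoltar example
candidates = {frozenset({"F5"}): 0.5,
              frozenset({"F1", "F2"}): 0.3,
              frozenset({"F2", "F6"}): 0.2}
# Between a test exercising F5 and one exercising F1, the planner picks F5:
# its pass/fail split is 0.5/0.5, which prunes the most on average.
chosen = best_test([frozenset({"F5"}), frozenset({"F1"})], candidates)
```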