Integrating Artificial Intelligence in Software Testing
Roni Stern and Meir Kalech, ISE department, BGU
Niv Gafni, Yair Ofir and Eliav Ben-Zaken, Software Eng., BGU
Abstract
[Overview diagram: Artificial Intelligence (Diagnosis, Planning) intersects Software Engineering (Testing)]
Testing is Important
• Quite a few software development paradigms
• All of them recognize the importance of testing
• There are several types of testing
  – E.g., unit testing, black-box (functional) testing, …

Programmer
• Writes programs
• Follows spec.
• Goal: fix bugs!

Tester
• Runs tests
• Follows a test plan
• Goal: find bugs!
Handling a Bug
1. Bugs are reported by the tester
   – “Screen A crashed on step 13 of test suite 4…”
2. Prioritized bugs are assigned to programmers
3. Bugs are diagnosed
   – What caused the bug?
4. Bugs are fixed
   – Hopefully…
Why is Debugging Hard?
• The developer needs to reproduce the bug
• Reproducing a bug correctly is non-trivial
  – Bug reports may be inaccurate or missing
  – Software may have stochastic elements
    • Bug occurs only once in a while
  – Bugs may appear only in special cases

Let the tester provide more information!
Introducing AI!
[Workflow diagram: the Tester runs a test suite and finds a bug; the AI computes diagnoses and plans the next tests; the Programmer diagnoses and fixes the bug]
Test, Diagnose and Plan (TDP)
1. The tester runs a set of planned tests (test suite)
2. The tester finds a bug
3. AI generates possible diagnoses
4. If there is only one candidate – pass to programmer
5. Else, AI plans new tests to prune candidates
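The five steps above can be sketched as a simple loop. This is only an illustration of the TDP control flow; the helper names `run_tests`, `compute_diagnoses` and `plan_next_test` are hypothetical stand-ins, not the authors' actual API:

```python
def tdp_loop(test_suite, run_tests, compute_diagnoses, plan_next_test):
    """Sketch of the Test, Diagnose and Plan (TDP) loop.

    run_tests: executes tests and returns the observations (steps 1-2)
    compute_diagnoses: the MBD step, returns candidate diagnoses (step 3)
    plan_next_test: the AI planner, returns a test that prunes candidates (step 5)
    """
    observations = run_tests(test_suite)          # steps 1-2: run suite, observe
    while True:
        candidates = compute_diagnoses(observations)  # step 3
        if len(candidates) <= 1:                      # step 4: single candidate,
            return candidates                         # hand over to the programmer
        new_test = plan_next_test(candidates)         # step 5: plan a pruning test
        observations = observations + run_tests([new_test])
```

The loop terminates as soon as the diagnosis engine is left with at most one candidate.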
Diagnosis
Find the reason for a problem, given observed symptoms.
Requirement: knowledge of the diagnosed system
• Given by experts
• Learned by AI techniques
Model-Based Diagnosis
• Given: a model of how the system works
  ⇒ Infer what went wrong
• Example:
  IF ok(battery) AND ok(ignition) THEN start(engine)
• What if the engine doesn’t start?
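The inference in the engine example can be sketched in a few lines: when a rule's conclusion is observed to be false, the rule's premises form a conflict, i.e., at least one of them must be faulty (a minimal sketch, not a full MBD engine):

```python
def conflict_from_rule(premises, conclusion, observed_false):
    """Model-based diagnosis step (minimal sketch): if a rule's
    conclusion is observed to be false, not all premise components
    can be ok, so the premise set is a conflict."""
    if conclusion in observed_false:
        return set(premises)  # at least one of these components is faulty
    return None

# The engine example from the slide: the engine does not start.
conflict = conflict_from_rule({"battery", "ignition"}, "start(engine)",
                              observed_false={"start(engine)"})
# conflict == {"battery", "ignition"}: the battery or the ignition is faulty
```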
Where is MBD applied?
• Printers (Kuhn and de Kleer 2008)
• Vehicles (Pernestål et al. 2006)
• Robotics (Steinbauer 2009)
• Spacecraft – Livingstone (Williams and Nayak, 1996; Bernard et al., 1998)
• …
Software is Difficult to Model
• Need to code how the software should behave
  – Specs are complicated to model
• Static code analysis ignores runtime
  – Polymorphism
  – Instrumentation
  – …
Zoltar [Abreu et al. 2011]
• Constructs a model from the observations
• Observations should include execution traces
  – Functions visited during execution
  – Observed outcome: bug / no bug
• Weaker model:
  Ok(function1) → function1 outputs valid value
  ⇒ A bug entails that at least one component was not Ok
Execution Matrix
• A key component in Zoltar is the execution matrix
• It is built from the observed execution traces
  – Observation 1 (BUG): F1 → F5 → F6
  – Observation 2 (BUG): F2 → F5
  – Observation 3 (OK):  F2 → F6 → F7 → F8

        F1  F2  F3  F4  F5  F6  F7  F8  Bug
  Obs1   1   0   0   0   1   1   0   0    1
  Obs2   0   1   0   0   1   0   0   0    1
  Obs3   0   1   0   0   0   1   1   1    0
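Building the matrix from traces is mechanical: one row per observation, a 1 for every visited function, and a final bug column. A minimal sketch (not Zoltar's actual code):

```python
def execution_matrix(traces, functions):
    """Build an execution matrix from observed execution traces.

    traces: list of (visited_functions, bug_observed) pairs.
    Returns one row per observation: 1 if the function was visited,
    0 otherwise, plus a final bug column (1 = BUG, 0 = OK)."""
    rows = []
    for visited, bug in traces:
        rows.append([1 if f in visited else 0 for f in functions]
                    + [1 if bug else 0])
    return rows

# The three observations from the slide:
functions = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8"]
traces = [({"F1", "F5", "F6"}, True),
          ({"F2", "F5"}, True),
          ({"F2", "F6", "F7", "F8"}, False)]
```

Running `execution_matrix(traces, functions)` reproduces the three rows of the table above.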
Diagnosis = Hitting Sets of Conflicts
In every BUG trace, at least one function is faulty.
• Observations:
  – Observation 1 (BUG): F1 → F5 → F6
  – Observation 2 (BUG): F2 → F5
• Conflicts:
  – Ok(F1) AND Ok(F5) AND Ok(F6)
  – Ok(F2) AND Ok(F5)
• Possible diagnoses (hitting sets of the conflicts): {F5}, {F1,F2}, {F2,F6}
Software Diagnosis with Zoltar
[Pipeline diagram: Bug Report → Zoltar → Set of Candidate Diagnoses → Prioritize Diagnoses]
Plan More Tests
[Diagram: Bug Report → AI Engine → Set of Possible Diagnoses; the AI suggests new tests to prune candidates until a single diagnosis remains]
Pacman Example
[Sequence of slides stepping through a Pacman gameplay scenario]
Execution Trace
• F1: Move
• F2: Stop
• F3: Eat Pill

Which Diagnosis is Correct?
Knowledge
• Most probable cause: Eat Pill
• Easiest to check: Stop
Objective
Plan a sequence of tests to find the best diagnosis with minimal tester actions.
1. Highest Probability (HP)
• Compute the probability of every function Ci
  = sum of the probabilities of the candidates containing Ci
• Plan a test to check the most probable function
[Diagram: candidate probabilities over functions, e.g. F1: 0.2, F2: 0.2, F5: 0.4, F6: 0.4, updated after a test]
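The HP computation can be sketched directly from the definition. The candidate sets and probabilities below are illustrative numbers, not taken from the slide's diagram:

```python
def function_probabilities(candidates):
    """Highest Probability (HP) step: the fault probability of a
    function is the sum of the probabilities of all candidate
    diagnoses that contain it.

    candidates: list of (diagnosis_set, probability) pairs."""
    probs = {}
    for diagnosis, p in candidates:
        for f in diagnosis:
            probs[f] = probs.get(f, 0.0) + p
    return probs

# Illustrative candidates (diagnoses from the hitting-set slide,
# with made-up probabilities):
cands = [({"F5"}, 0.4), ({"F1", "F2"}, 0.2), ({"F2", "F6"}, 0.4)]
# HP would plan a test for the function with the highest summed probability.
```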
2. Lowest Cost (LC)
• Consider only functions Ci with 0 < P(Ci) < 1
• Test the closest function from this set
[Diagram: functions F1, F2, F5, F6 in the call graph]
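LC is a one-liner once the per-function probabilities are known. Here the notion of "closest" is abstracted into a hypothetical cost function (the slide measures distance in the call graph; the numbers below are made up):

```python
def lowest_cost_test(func_probs, cost):
    """Lowest Cost (LC) step: among functions whose fault probability
    is strictly between 0 and 1, test the cheapest one to check.

    func_probs: dict function -> fault probability
    cost: dict function -> cost of testing it (e.g. distance in the
    call graph; hypothetical values in this sketch)."""
    uncertain = [f for f, p in func_probs.items() if 0.0 < p < 1.0]
    return min(uncertain, key=lambda f: cost[f])
```

Note that functions with probability exactly 0 or 1 are skipped: testing them cannot change what we know.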
3. Entropy-based
• A test α = a set of components
  – Ω+ = set of candidates that assume α will pass
  – Ω− = set of candidates that assume α will fail
  – Ω? = set of candidates that are indifferent to α
• Prob. that α will pass = prob. of the candidates in Ω+
• Choose the test with the lowest Entropy(Ω+, Ω−, Ω?)
  = highest information gain
[Diagram: candidate probabilities, e.g. F1: 0.2, F2: 0.2, F5: 0.4, F6: 0.4]
4. Planning Under Uncertainty
• The outcome of a test is unknown
  ⇒ we would like to minimize the expected cost
• Formulate as a Markov Decision Process (MDP)
  – States: set of performed tests and their outcomes
  – Actions: possible tests
  – Transitions: Pr(Ω+), Pr(Ω−) and Pr(Ω?)
  – Reward (cost): cost of the performed tests
• The current solver uses myopic sampling
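A myopic (one-step lookahead) policy can be sketched as follows: score each test by its cost plus the expected candidate mass left undistinguished after its outcome is observed. This is one possible approximation under the same Ω+/Ω−/Ω? split as the entropy strategy, not the authors' actual solver:

```python
def myopic_test_choice(candidates, tests, cost):
    """One-step (myopic) approximation of the MDP policy (sketch).

    candidates: list of (diagnosis_set, probability) pairs
    tests: list of component sets
    cost: function mapping a test to its execution cost"""
    def score(alpha):
        pos = neg = ind = 0.0
        for d, p in candidates:
            if not d & alpha:
                pos += p          # candidate predicts the test passes
            elif d <= alpha:
                neg += p          # candidate predicts the test fails
            else:
                ind += p          # candidate is indifferent
        # Expected surviving candidate mass: with prob. pos the test
        # passes and the Omega+ mass survives, with prob. neg it fails
        # and the Omega- mass survives; indifferent mass always survives.
        expected_surviving = pos * pos + neg * neg + ind
        return cost(alpha) + expected_surviving
    return min(tests, key=score)
```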
Preliminary Experimental Results
• Synthetic code
• Setup
  – Generated random call graphs with 300 nodes
  – Generated code from the random call graphs
  – Injected 10/20 random bugs
  – Generated 15 random initial tests
• Run until a diagnosis with prob. > 0.9 is found
  – Results on 100 instances
Preliminary Experimental Results
• The probability-based approaches (HP and Entropy) perform worse than the cost-based approach (LC)
• MDP, which considers both cost and probability, outperforms the others
• 20 bugs is more costly for all algorithms than 10 bugs
Preliminary Experimental Results – NUnit
• Real code
• NUnit is a well-known testing framework for C#
• Setup
  – Generated call graphs with 302 nodes from NUnit
  – Injected 10, 15, 20, 25, 30 random bugs
  – Generated 15 random initial tests
• Run until a diagnosis with prob. > 0.9 is found
  – Results on 110 instances
Preliminary Experimental Results – NUnit
• Similar results
• The cost increases with the probability bound
• MDP, which considers both cost and probability, outperforms the others
Conclusion
• The Test, Diagnose and Plan (TDP) paradigm:
  – AI diagnoses the observed bug with an MBD algorithm
  – AI plans further tests to find the correct diagnosis
• Empowers the tester using AI
  – Bug diagnosis is done by tester + AI
[Architecture diagram: SCM, Server Logs and Source Code feed an AI Engine that supports the QA Tester and the Developer]