Using Visual Symptoms for Debugging
Presentation Failures in Web Applications
Sonal Mahajan, Bailan Li, Pooyan Behnamghader, William G. J. Halfond
University of Southern California
Los Angeles, California, USA
Work supported by NSF Grant CCF-1528163
Background Information
• What do we mean by presentation?
– “Look and feel” of the website in a browser
• What is a presentation failure?
– Web page rendering ≠ expected appearance
• Importance of presentation
– Aesthetics impact users’ evaluation [Tractinsky et al. 2006]
– Impacts trustworthiness and usability [Lindgaard et al. 2011]
2
Usage Scenario
Regression debugging: modify the current version of the web page to correct a bug or refactor the HTML structure.
[Figure: one web page (Menu | Contact, News, Username, Password, Sign in, About us | Feedback | FAQ) shown with its table-based layout (<table>, <tr>, <td>) and its div-based layout (<div>).]
3
Usage Scenario – Difficulties
[Figure: the developer compares the oracle (previous version) with the test web page. Both show Menu | Contact, News, Username, Password, Sign in, and About us | Feedback | FAQ; two visual differences in the test web page are marked Problem1 and Problem2.]
4
Usage Scenario – Difficulties
[Figure: oracle (previous version) and test web page side by side, with the developer.]
• Analyze the observed differences
• Explore the UI to find the fault: background color of the “Sign in” button
5
Usage Scenario – Difficulties
[Figure: oracle (previous version) and test web page side by side. The developer analyzes the observed difference and explores the UI to find the fault: background color of the “Sign in” button.]
Manual debugging is difficult
1. Complex interaction between HTML, CSS, and JS
2. Hundreds of HTML elements + CSS properties
3. Makes debugging labor intensive and error prone
Prior user study [Mahajan et al. ICST 2015]
• Correct fault identified in only 36% of test cases!
6
Limitations of Existing Techniques
• DOM comparison techniques (e.g., XBT)
– Not effective if DOM has changed significantly
• Invariant specification techniques (e.g., Selenium)
– Not practical, since all correctness properties need to be
provided
• Fighting layout bugs
– Checks application-independent problems only
Our approach: automate debugging of presentation failures
7
Three Key Insights
1. Visual differences can help diagnosis
[Figure: the “Sign in” button in the oracle and in the test page differs in color, a color-related presentation failure. Such visual differences are the visual symptoms.]
8
Definition
Visual symptom – boolean predicate describing the visual difference; gives clues to the fault

Visual Symptoms
1. Almost matched element
– only position changed
– e.g.: margin-top
– sub-image searching
2. Shift bottom element
– moved downwards
– e.g.: margin-top
– analyze diff. pixels
3. Page size changed
4. Added color
...
23. All diff. pixels in top of the element

CSS Properties (groups associated with the visual symptoms)
– margin-top, padding-top, etc.
– margin-top, margin-left, etc.
– height, width, padding, etc.
– background-color, color, etc.
...
– padding-top, border-top-width, etc.
9
Three Key Insights
2. Probabilistic correlations can help
identify faults
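Root cause: <button, background-color>
(T / F columns: probability that the root cause is / is not the fault)

Color Symptom | Size Symptom | T | F
F | F | 0.0 | 1.0
F | T | 0.2 | 0.8
T | F | 0.95 | 0.05
T | T | 0.7 | 0.3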
10
Three Key Insights
• Building probabilistic model
– Approaches
✗ Pool of known presentation failures
– Differences depend on page layout. Not generalizable.
✗ Historical data for the page
– Available only for mature pages
– Manual extraction from bug-tracking system
✔ 3. Probabilistic models can be automatically generated from the faulty test page
11
Our Approach
The goal is to automatically identify the fault of a presentation failure observed in a test page.
Input
1. Test page
2. Oracle image (previous version screenshot, mockup, etc.)
Phases
1. Detect presentation failures
2. Build the probabilistic model
3. Identify the most likely faults
Output
1. Ranked list of likely faults
12
P1. Detect Presentation Failures
Use WebSee [Mahajan et al. ICST 2015]
– Input: oracle image and test web page
– Visual comparison: computer vision technique, Perceptual Image Differencing (PID)
– Output: presentation failures
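The visual comparison step can be illustrated with a deliberately naive sketch. It is not WebSee and not perceptual image differencing: it assumes both screenshots are already decoded into equally sized grids of RGB tuples and reports every pixel that differs exactly. The names `diff_pixels`, `oracle`, and `test_page` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels

def diff_pixels(oracle: Image, test: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates at which the two screenshots differ."""
    same_shape = len(oracle) == len(test) and all(
        len(row_o) == len(row_t) for row_o, row_t in zip(oracle, test)
    )
    if not same_shape:
        raise ValueError("screenshots must have identical dimensions")
    return [
        (x, y)
        for y, (row_o, row_t) in enumerate(zip(oracle, test))
        for x, (p_o, p_t) in enumerate(zip(row_o, row_t))
        if p_o != p_t
    ]

# Toy 3x2 screenshots: one pixel changes color (assumed data, not from the study).
WHITE, GREEN, RED = (255, 255, 255), (0, 128, 0), (255, 0, 0)
oracle = [[WHITE, WHITE, WHITE], [WHITE, GREEN, WHITE]]
test_page = [[WHITE, WHITE, WHITE], [WHITE, RED, WHITE]]
difference = diff_pixels(oracle, test_page)
```

An empty result means no presentation failure is detected under this exact-match assumption; a perceptual technique such as PID would additionally tolerate differences that a human would not notice.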
13
P2. Build the Probabilistic Model
Model based on conditional probability: P(r | S)
r = fault = root cause <e, p>
e = HTML element
p = CSS property
S = set of visual symptoms
P(r | S) = probability that a potential root cause, r, is faulty given the observed set of visual symptoms, S.
14
P2. Build the Probabilistic Model
1. Generate data samples
– Inject faults into the test page
• Assign different values to potential root causes
– Observe visual symptoms
– Build truth table
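The sample generation step can be sketched as a simple enumeration, assuming the candidate values for each potential root cause are already chosen. Rendering the mutated page and observing its symptoms is left out. The function name `generate_data_samples` and the value lists are illustrative assumptions, not FieryEye's actual interface.

```python
from typing import Dict, List, Tuple

RootCause = Tuple[str, str]            # (HTML element, CSS property)
DataSample = Tuple[str, str, str]      # (HTML element, CSS property, injected value)

def generate_data_samples(candidate_values: Dict[RootCause, List[str]]) -> List[DataSample]:
    """Create one data sample per (root cause, injected value) pair."""
    samples = []
    for (element, prop), values in candidate_values.items():
        for value in values:
            # In a full pipeline, the page would be re-rendered with this value
            # and compared against the oracle to observe visual symptoms.
            samples.append((element, prop, value))
    return samples

# Assumed candidate values for a page with two potential root causes.
candidate_values = {
    ("p", "color"): ["blue"],
    ("div", "margin-top"): ["0px", "50px"],
}
samples = generate_data_samples(candidate_values)
```

With these assumed inputs the enumeration yields three data samples, one for `<p, color>` and two for `<div, margin-top>`.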
15
Truth Table – Example
Root Causes | Injected values | Added color | Almost matched element | Shift top element | Shift bottom element | Page size changed
<p, color> | blue | T | F | F | F | F
<div, margin-top> | 0px, 50px | F, F | T, T | T, F | F, T | F, T

Data samples for <div, margin-top>: <div, margin-top = 0px>, <div, margin-top = 50px>
16
P2. Build the Probabilistic Model
1. Generate data samples
– Inject faults into the test page
• Assign different values to potential root causes
– Observe true visual symptoms
– Build truth table
2. Calculate probabilities
– Individual symptoms and conditional probability
– Learn correlation between the root cause and
visual symptoms
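A minimal sketch of the probability calculation, assuming the truth table is available as a list of (root cause, symptoms observed as true) pairs. P(s|r) is estimated as the fraction of data samples for r in which symptom s was true. The data structure and the name `build_cpt` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

RootCause = Tuple[str, str]  # (HTML element, CSS property)

SYMPTOMS = [
    "added color",
    "almost matched element",
    "shift top element",
    "shift bottom element",
    "page size changed",
]

# Truth table rows: one entry per data sample, listing the symptoms that were true.
samples: List[Tuple[RootCause, Set[str]]] = [
    (("p", "color"), {"added color"}),
    (("div", "margin-top"), {"almost matched element", "shift top element"}),
    (("div", "margin-top"),
     {"almost matched element", "shift bottom element", "page size changed"}),
]

def build_cpt(samples, symptoms) -> Dict[RootCause, Dict[str, float]]:
    """Estimate P(s | r) as (#samples of r where s is true) / (#samples of r)."""
    true_counts = defaultdict(lambda: {s: 0 for s in symptoms})
    totals = defaultdict(int)
    for root_cause, observed in samples:
        totals[root_cause] += 1
        for symptom in observed:
            true_counts[root_cause][symptom] += 1
    return {
        r: {s: true_counts[r][s] / totals[r] for s in symptoms}
        for r in totals
    }

cpt = build_cpt(samples, SYMPTOMS)
```

For example, "shift top element" is true in one of the two `<div, margin-top>` samples, so its estimated probability is 0.5.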
17
Probabilities Calculation
Conditional probability
Bayes’ theorem
r = root cause
S = set of visual symptoms
18
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
r = root cause
S = set of visual symptoms
P(S|r) = Probability of the status of visual symptoms S given r is the faulty root cause
• Assumes visual symptoms are conditionally independent given the root cause
• Advantages
– Easier to calculate
– Parallelizable
19
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
r = root cause
S = set of visual symptoms
P(S|r) = Probability of the status of visual symptoms S given r is the faulty root cause
• Measure P(s|r) in data samples
• Observe visual symptoms for a seeded root cause
20
Conditional Probability Table – Example
Root Causes | Injected values | Added color | Almost matched element | Shift top element | Shift bottom element | Page size changed
<p, color> | blue | T | F | F | F | F
<div, margin-top> | 0px, 50px | F, F | T, T | T, F | F, T | F, T
21
Conditional Probability Table – Example
Root Causes | Injected values | Added color | Almost matched element | Shift top element | Shift bottom element | Page size changed
<p, color> | blue | 1.0 | 0.0 | 0.0 | 0.0 | 0.0
<div, margin-top> | 0px, 50px | 0.0 | 1.0 | 0.5 | 0.5 | 0.5
23
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
r = root cause
S = set of visual symptoms
P(r) = Relative probability of r being the faulty root cause
• Assume developers cause faults with uniform probability
r = <e, p>
e = HTML element
p = CSS property
24
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
P(r) = Relative probability of r being the faulty root cause
r = root cause
S = set of visual symptoms
r = <e, p>
e = HTML element
p = CSS property
25
P(p) Computation – Example
Total 2 properties in the page: color, margin-top
P(p) = 1/2 = 0.5 for each property
26
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
r = root cause
S = set of visual symptoms
P(S) = Probability of symptoms in S being T/F for a given page
• P(S) is independent of r
• Values of s ∈ S are given
27
Probabilities Calculation
$$P(r \mid S) = \frac{P(S \mid r)\,P(r)}{P(S)}$$
r = root cause
S = set of visual symptoms
P(e), P(S) = Constants
$$P(r \mid S) = \frac{\left[\prod_{s \in S} P(s \mid r)\right] P(e)\,P(p)}{P(S)}$$
28
P3. Identify Most Likely Root Causes
• for r ∈ R = {<e1, p1>, …, <en, pn>}
1. calculate P(p) for r = <e, p>
2. determine visual symptoms, S
3. for s ∈ S
look up P(s|r) in the model
4. calculate P(r | S)
• Rank root causes by their probabilities
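The ranking phase can be sketched as follows, under stated assumptions: the model is a nested dictionary of P(s|r) values taken from the deck's two-root-cause example, P(p) is uniform over the properties in the page, and the constants P(e) and P(S) are dropped, so the scores are unnormalized. Only the symptoms observed as true are multiplied, which mirrors the worked example in the deck. The function name `rank_root_causes` is hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple

RootCause = Tuple[str, str]  # (HTML element, CSS property)

# P(s | r) values from the deck's two-root-cause example.
cpt: Dict[RootCause, Dict[str, float]] = {
    ("p", "color"): {
        "added color": 1.0, "almost matched element": 0.0,
        "shift top element": 0.0, "shift bottom element": 0.0,
        "page size changed": 0.0,
    },
    ("div", "margin-top"): {
        "added color": 0.0, "almost matched element": 1.0,
        "shift top element": 0.5, "shift bottom element": 0.5,
        "page size changed": 0.5,
    },
}

def rank_root_causes(
    cpt: Dict[RootCause, Dict[str, float]], observed: List[str]
) -> List[Tuple[RootCause, float]]:
    """Score each root cause as prod_{s in S} P(s | r) * P(p), highest first."""
    properties = {prop for _, prop in cpt}
    p_property = 1.0 / len(properties)       # uniform P(p), an assumption from the deck
    scores = {
        r: prod(table[s] for s in observed) * p_property
        for r, table in cpt.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

observed_symptoms = ["almost matched element", "shift bottom element"]
ranking = rank_root_causes(cpt, observed_symptoms)
```

With the observed symptoms above, `<div, margin-top>` scores 1.0 * 0.5 * 0.5 = 0.25 and `<p, color>` scores 0.0, so `<div, margin-top>` is ranked first.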
29
Empirical Evaluation
• RQ1: How accurate is our approach in
identifying root causes of presentation
failures?
• RQ2: What are the computational
resources needed to run our approach?
30
Implementation
• Approach implemented in FieryEye (火眼)
• Building the probabilistic model
– Parallelized over 200 Amazon EC2 c4.large
instances
• Identifying visual symptoms
– Used OpenCV to compare screenshots,
extract color information, perform sub-image
searching, etc.
31
Experiment Protocol
Regression debugging activity: refactoring of web pages
1. Migrate HTML 4 to 5 (<div id=‘head’> to <header>)
2. Convert table-based layout to div based
3. Replace deprecated tags (<font> to CSS font)
For each subject, generate test cases
– Download page (H), take screenshot = oracle
– Refactor H to get H’
– Seed presentation failure in H’ to create a variant
– Run FieryEye on oracle and variant
Performance comparison
– WebSee, XPERT, Text Diff Tool (TDT) – diff
32
Subjects
Random URL generator (http://www.uroulette.com)

Subject | Size (Total RC) | Generated # test cases
Perl | 1,592 | 36
GTK | 1,121 | 30
Konqueror | 6,779 | 39
Amulet | 88 | 22
UCF | 2,415 | 47
33
RQ1: Accuracy
Ranking of the correct root cause in the result set
(Effort required to find the correct root cause)
• Other techniques do not rank root causes
• Adapted other techniques to report rank
Quantify a range in the way developers may
use the results
Ranking U = Upper bound on effort
Ranking L = Lower bound on effort
34
RQ1: Accuracy Results
[Figure: bar chart of average median rank and recall (%) per technique.]

Technique | Avg. median rank | Recall (%)
FieryEye | 7.9 | 100
WebSee-U | 38.9 | 65.6
WebSee-L | 10.2 | 65.6
XPERT-U | 244.3 | 57.6
XPERT-L | 29.7 | 57.6
TDT-U | 549.1 | 100
TDT-L | 64.3 | 100

• FieryEye rank = 7.9, WebSee rank-L = 10.2
• FieryEye recall = 100%, WebSee recall = 65.6%
35
RQ1: Accuracy Results
[Figure: cumulative frequency (%) versus ranking (1 to 49) for FieryEye, WebSee-U, WebSee-L, XPERT-U, XPERT-L, TDT-U, and TDT-L. A point (X, Y) means that in Y% of cases the correct root cause is ranked in the top X.]
Correct root cause in top 5
– FieryEye: 45% cases
– WebSee: 5% (U), 10% (L) cases
– XPERT, TDT: 1% (U and L)
36
RQ2: Computational Resources
[Figure: bar chart of time (sec) for FieryEye, WebSee, XPERT, and TDT, annotated “Fast but imprecise”.]
FieryEye
– Prediction: 17 sec
– Model building: 3 min
Model building cost
– 200 Amazon EC2 instances
– 1 c4.large = $0.11 per hour
– Cost = 200 * $0.0018/min * 3 min
– Model building cost = $1
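The cost figure can be checked with a few lines of arithmetic, using only the numbers quoted on the slide. The variable names are illustrative, and the per-minute price is rounded to four decimals to match the slide's $0.0018.

```python
instances = 200          # Amazon EC2 c4.large instances used for model building
price_per_hour = 0.11    # USD per c4.large instance-hour, as quoted on the slide
build_minutes = 3        # model-building time in minutes

# The slide rounds the per-minute price to four decimals.
price_per_minute = round(price_per_hour / 60, 4)
cost = instances * price_per_minute * build_minutes
print(f"price per minute: ${price_per_minute}, model building cost: ${cost:.2f}")
```

The result is about $1.08, which the slide reports as roughly $1.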
37
Summary
• Technique for finding root cause of
presentation failure
– Image processing to find visual symptoms
– Probabilistic models to predict root causes
• Empirical evaluation shows positive results
– Avg. median correct root cause rank = 7.9
– Prediction time = 17 sec
– Model building cost = $1
38
Thank you
Ranking U and L
• Techniques report HTML elements
– Add defined CSS properties
Set of root causes
• for e ∈ reported faulty HTML elements
{
if (e == incorrect faulty element)
{
rankingU = rankingU + e.getProps()
rankingL = rankingL + 1
}
else
{
rankingU = rankingU + e.getProps() / 2
rankingL = rankingL + e.getProps() / 2
}
}
WebSee: ranked list of
HTML elements
XPERT, TDT: unsorted
rankingU = rankingU / 2
rankingL = rankingL / 2
40
Definitions
Root cause – tuple <e, p>,
where e = HTML element and p = CSS property
Oracle image
Test web page
<div, margin-top>
41
Definitions
Visual symptom – boolean predicate describing the visual difference; gives clues to the root cause
[Figure: oracle image and test web page illustrating each symptom.]
1. Almost matched element
– only position changed
– e.g.: margin-top
– sub-image searching
2. Shift bottom element
– moved downwards
– e.g.: margin-top
– analyze diff. pixels
42
Full Running Example
[Figure: running example. The oracle (previous version) and the test web page both show Menu | Contact, News, Username, Password, Sign in, and About us | Feedback | FAQ.]
44
P2. Generate Data Samples – Example
2 HTML elements in the test web page
– <p> with property color
– <div> with property margin-top
2 Potential Root Causes
– <p, color>
– <div, margin-top>
45
P2. Generate Data Samples – Example
2 Potential Root Causes
– <p, color>
– <div, margin-top>
Inject values to get 3 Data Samples
– <p, color>: blue
– <div, margin-top>: 0px, 50px
Data sample 1: <p, color = blue>
1 Visual Symptom
- Added color
46
P2. Generate Data Samples – Example
Data sample 2: <div, margin-top = 0px>
2 Visual Symptoms
- Almost matched element
- Shift top element
47
P2. Generate Data Samples – Example
Data sample 3: <div, margin-top = 50px>
3 Visual Symptoms
- Almost matched element
- Shift bottom element
- Page size changed
48
Truth Table – Example
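Root Causes | Injected values | Added color | Almost matched element | Shift top element | Shift bottom element | Page size changed
<p, color> | blue | T | F | F | F | F
<div, margin-top> | 0px, 50px | F, F | T, T | T, F | F, T | F, T

Each row holds the data samples for one root cause; paired entries follow the order of the injected values.
49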
Conditional Probability Table – Example
Root Causes | Injected values | Added color | Almost matched element | Shift top element | Shift bottom element | Page size changed
<p, color> | blue | 1.0 | 0.0 | 0.0 | 0.0 | 0.0
<div, margin-top> | 0px, 50px | 0.0 | 1.0 | 0.5 | 0.5 | 0.5
52
P(p) Computation – Example
Total 2 properties in the page: color, margin-top
P(p) = 1/2 = 0.5 for each property
53
P3. – Example
• All root causes, R = {<p, color>, <div, margin-top>}
• r = <p, color>
1. P(p) = 0.5
2. S = {almost matched element, shift bottom element}
3. s1 = almost matched element, P(s1|r) = 0.0
s2 = shift bottom element, P(s2|r) = 0.0
P(s1|r) * P(s2|r) = 0.0 * 0.0 = 0.0
4. P(<p, color> | S) = 0.0 * 0.5 = 0.0
56
P3. – Example
• All root causes, R = {<p, color>, <div, margin-top>}
• r = <div, margin-top>
1. P(p) = 0.5
2. S = {almost matched element, shift bottom element}
3. s1 = almost matched element, P(s1|r) = 1.0
s2 = shift bottom element, P(s2|r) = 0.5
P(s1|r) * P(s2|r) = 1.0 * 0.5 = 0.5
4. P(<div, margin-top> | S) = 0.5 * 0.5 = 0.25
57
P3. – Example
• Rank root causes by their probabilities
1. ✔ P(<div, margin-top> | S) = 0.25
2. P(<p, color> | S) = 0.0
58