VLSI Systems Design—Experiments
Necessary steps:
• Explore the problem space
• Design experiment(s)
• Carry out experiment(s)
• Analyze results
software packages: R, Matlab, …
• Report results
Example: design a “better” transistor
What do we mean by “better”?
What FACTORS influence design?
• fabrication
• design
• environmental
For which of these is there random variation?
Which “factors” do we want to investigate?
SUMMARY—15 IMPORTANT POINTS FOR EXPERIMENTERS:
1. Even careful experimentation and observation may miss important facts; new experiments may cause old
conclusions to be discarded; EXPERIMENTS ARE NOT PROOFS.
2. It is just as important to report NEGATIVE results as to report POSITIVE results. The experimenter must
always accurately record and thoroughly report ALL results.
3. IGNORING IMPORTANT FACTORS CAN LEAD TO ERRONEOUS CONCLUSIONS, SOMETIMES WITH
TRAGIC RESULTS.
4. YOUR RESULTS ARE ONLY VALID FOR THE PART OF THE DATA-TREATMENT SPACE YOU HAVE
EXPLORED; YOU CANNOT CLAIM KNOWLEDGE OF WHAT YOU HAVE NOT EXPLORED
5. An experiment is worthless unless it can be REPEATED by other researchers using the same experimental
setup; experimenters have a duty to the research community to report enough about their experiment and data so
that other researchers can verify their claims
6. YOU ONLY GET ANSWERS TO THE QUESTIONS YOU ASK
7. If you are going to use a (pseudo-)RANDOM NUMBER GENERATOR, make sure its output behaves enough like a
sequence of TRUE RANDOM NUMBERS
8. An experiment must be repeated a SUFFICIENT NUMBER OF TIMES for the results to be attributed to more
than random error
9. Choosing the CORRECT MEASURE for the question you are asking is an important part of the experimental
design
10. Reporting CORRECT results, PROPERLY DISPLAYED, is an integral part of a well-done experiment
11. MISUSE OF GRAPH LABELING can lead to MISLEADING RESULTS AND INCORRECT CONCLUSIONS
12. INTERPOLATING your results to regions you have not explored can lead to INCORRECT CONCLUSIONS
13. IGNORING the “NULL HYPOTHESIS” when reporting your results can be very misleading
14. Don’t mistake CORRELATION for DEPENDENCE
15. Justify your choice of CURVE using VALID STATISTICS, not “appearance”
Topics
• Analyzing and Displaying Data
– Simple Statistical Analysis
– Comparing Results
– Curve Fitting
• Designing Experiments: Factorial Designs
– 2^k Designs Including Replications
– Full Factorial Designs
• Ensuring Data Meets Analysis Criteria
• Presenting Your Results; Drawing Conclusions
Example: A System
[Block diagram: Factors (experimental conditions) and System Inputs feed the System (a “black box”), which produces System Outputs / Responses (experimental results).]
Experimental Research
Define System
● Define system outputs first
● Then define system inputs
● Finally, define behavior (i.e., transfer function)
Identify Factors and Levels
● Identify system parameters that vary (many)
● Reduce parameters to important factors (few)
● Identify values (i.e., levels) for each factor
Identify Response(s)
● Identify time, space, etc. effects of interest
Design Experiments
● Identify factor-level experiments
Create and Execute System; Analyze Data
Define Workload
● Workloads are inputs that are applied to the system
● Workload can be a factor (but often isn't)
Create System
● Create the system so it can be executed
● Real prototype
● Simulation model
● Empirical equations
Execute System
● Execute the system for each factor-level binding
● Collect and archive response data
Analyze & Display Data
● Analyze data according to the experiment design
● Evaluate raw and analyzed data for errors
● Display raw and analyzed data to draw conclusions
Some Examples
Analog Simulation
– Which of three solvers is best?
– What is the system?
– Responses
  • Fastest simulation time
  • Most accurate result
  • Most robust to types of circuits being simulated
– Factors
  • Solver
  • Type of circuit model
  • Matrix data structure
Epitaxial growth
– New method using nonlinear temperature profile
– What is the system?
– Responses
  • Total time
  • Quality of layer
  • Total energy required
  • Maximum layer thickness
– Factors
  • Temperature profile
  • Oxygen density
  • Initial temperature
  • Ambient temperature
Basic Descriptive Statistics for a Random Sample X
• Mean
• Median
• Mode
• Variance / standard deviation
• Z scores: Z = (X – mean)/ (standard deviation)
• Quartiles, box plots
• Q-Q plot
Note: these can be deceptive. For example, if P(X = 0) = P(X = 100) = 0.5 and P(Y = 50) = 1,
then X and Y have the same mean (and nastier examples can be constructed)
home.oise.utoronto.ca/~thollenstein/Exploratory%20Data%20Analysis.ppt
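As a quick illustration of the note above, a minimal R sketch (the sample values are chosen only to mimic the two distributions):

# Sketch: two samples with the same mean but very different behavior
x <- c(0, 100, 0, 100, 0, 100)     # mimics P(X = 0) = P(X = 100) = 0.5
y <- rep(50, 6)                    # mimics P(Y = 50) = 1
mean(x); mean(y)                   # both 50
sd(x); sd(y)                       # about 54.8 vs. 0: the spread tells them apart
quantile(x)                        # quartiles
qqnorm(x); qqline(x)               # Q-Q plot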
SIMPLE MODELS OF DATA
Example: Evaluation of a new wireless network protocol
System: wireless network with new protocol
Workload:
  10 messages applied at a single source
  Each message has an identical configuration
Experiment output:
  Roundtrip latency per message (ms)
  Data file “latency.dat”

Msg. #   Latency (ms)
   1        22
   2        23
   3        19
   4        18
   5        15
   6        20
   7        26
   8        17
   9        19
  10        17

Mean: 19.6 ms
Variance: 10.71 ms²
Std Dev: 3.27 ms
Hypothesis:
The latency distribution is N(μ, σ²)
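A minimal R sketch reproducing these summary statistics (the values are copied from the table rather than read from latency.dat):

# Sketch: summary statistics for the latency sample
latency <- c(22, 23, 19, 18, 15, 20, 26, 17, 19, 17)   # ms
mean(latency)   # 19.6
var(latency)    # 10.71 (sample variance, ms^2)
sd(latency)     # 3.27 ms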
Verify Model Preconditions
Check randomness
  Use a plot of residuals around the mean
  Residuals “appear” random
Check normal distribution
  Use a quantile-quantile (Q-Q) plot
  The pattern adheres consistently along the ideal quantile-quantile line
http://itl.nist.gov/div898/software/dataplot/refman1/ch2/quantile.pdf
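A minimal sketch of both checks in R, using the latency sample defined above:

# Sketch: checking the model preconditions
resid <- latency - mean(latency)
plot(resid, main = "Residuals around the mean")  # should look random
abline(h = 0)
qqnorm(latency)   # Q-Q plot against the normal distribution
qqline(latency)   # points should follow this line if the data are normal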
Confidence Intervals
Sample mean vs. population mean
If many samples are collected, about a fraction 1 − α of the resulting intervals will contain the “true mean”
CI, > 30 samples:
  ( x̄ − z[1−α/2] · s/√n ,  x̄ + z[1−α/2] · s/√n )
CI, < 30 samples:
  ( x̄ − t[1−α/2; n−1] · s/√n ,  x̄ + t[1−α/2; n−1] · s/√n )
For the latency data, n = 10, α = 0.05:
  (17.26, 21.94)
Raj Jain, “The Art of Computer Systems Performance Analysis,” Wiley, 1991.
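A minimal R sketch of the small-sample (t-based) interval for the latency data:

# Sketch: 95% confidence interval for the mean latency (n = 10 < 30)
n <- length(latency)
half <- qt(1 - 0.05/2, df = n - 1) * sd(latency) / sqrt(n)
c(mean(latency) - half, mean(latency) + half)   # approximately (17.26, 21.94)
# Equivalent: t.test(latency)$conf.int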
Scatter and Line Plots
Resistance profile of a doped silicon epitaxial layer
Expect linear resistance increase as depth increases

Depth   Resistance
  1      1.689015
  2      4.486722
  3      7.915209
  4      6.362388
  5     11.830739
  6     12.329104
  7     14.011396
  8     17.600094
  9     19.022146
 10     21.513802
Linear Regression Statistics
(hypothesis: resistance = b0 + b1*depth + error)
model = lm(Resistance ~ Depth)
summary(model)

Residuals:
     Min       1Q   Median       3Q      Max
-2.11330 -0.40679  0.05759  0.51211  1.57310

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05863    0.76366  -0.077     0.94
Depth        2.13358    0.12308  17.336 1.25e-07 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.118 on 8 degrees of freedom   “variance of error: (1.118)²”
Multiple R-Squared: 0.9741, Adjusted R-squared: 0.9708
F-statistic: 300.5 on 1 and 8 DF, p-value: 1.249e-07   “evidence this estimate is valid”

The t values test the hypotheses b0 = 0 and b1 = 0: here b1 = 0 is rejected (p = 1.25e-07), while b0 = 0 cannot be rejected (p = 0.94).
(Using the R system; based on http://www.stat.umn.edu/geyer/5102/examp/reg.html)
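A minimal sketch of how this fit and the accompanying scatter/line plot might be produced (the variable names match the lm call above; entering the data as vectors is an assumption):

# Sketch: linear fit of resistance vs. depth, with scatter plot and fitted line
Depth <- 1:10
Resistance <- c(1.689015, 4.486722, 7.915209, 6.362388, 11.830739,
                12.329104, 14.011396, 17.600094, 19.022146, 21.513802)
model <- lm(Resistance ~ Depth)
summary(model)                       # coefficients, R^2, F-statistic as above
plot(Depth, Resistance, main = "Resistance profile of epitaxial layer")
abline(model)                        # overlay the fitted line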
Validating Residuals
The errors are only marginally normally distributed because of heavy “tails”
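A sketch of how the residuals might be examined, assuming the model object from the regression sketch above:

# Sketch: residual diagnostics for the Depth/Resistance fit
qqnorm(resid(model))                 # heavy tails show up as departures at the ends
qqline(resid(model))
plot(fitted(model), resid(model))    # residuals vs. fitted values
abline(h = 0)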
Comparing Two Sets of Data
Example: Consider two different wireless access points. Which one is faster?
Inputs: the same set of 10 messages communicated through both access points.
Approach:
  Take the difference of the paired data and determine the CI of the difference.
  If the CI straddles zero, we cannot tell which access point is faster.

Response (usecs):
Latency1  Latency2
   22        19
   23        20
   19        24
   18        20
   15        14
   20        18
   26        21
   17        17
   19        17
   17        18

CI95% = (−1.27, 2.87) usecs
The confidence interval straddles zero.
Thus, we cannot determine which access point is faster with 95% confidence.
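A minimal R sketch of the paired comparison:

# Sketch: paired comparison of the two access points
latency1 <- c(22, 23, 19, 18, 15, 20, 26, 17, 19, 17)
latency2 <- c(19, 20, 24, 20, 14, 18, 21, 17, 17, 18)
d <- latency1 - latency2
t.test(d)$conf.int            # 95% CI of the mean difference: about (-1.27, 2.87)
# Equivalent: t.test(latency1, latency2, paired = TRUE)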
Plots with error bars
Execution time of SuperLU linear system solution (Ax = b) on a parallel computer
For each matrix density p, the problem was run multiple times with the same matrix size but different values
The mean and CI were determined for each p to obtain the curve and error intervals
[Plot: execution time t vs. matrix density p, with error bars]
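A minimal sketch of how such an error-bar plot could be drawn in base R; the densities, means, and CI half-widths below are made-up placeholder values, not the SuperLU measurements:

# Sketch: mean execution time with error bars (illustrative values only)
p     <- c(0.1, 0.2, 0.3, 0.4, 0.5)   # hypothetical matrix densities
tmean <- c(50, 120, 240, 410, 640)    # hypothetical mean times
ci    <- c(5, 8, 12, 15, 20)          # hypothetical CI half-widths
plot(p, tmean, type = "b", xlab = "Matrix density p", ylab = "Execution time")
arrows(p, tmean - ci, p, tmean + ci,  # draw the error bars
       angle = 90, code = 3, length = 0.05)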
Curve Fitting
> model = lm(t ~ poly(p,4))
> summary(model)

Call:
lm(formula = t ~ poly(p, 4))

Residuals:
      1       2       3       4       5       6       7       8       9
-0.4072  0.7790  0.5840 -1.3090 -0.9755  0.8501  2.6749 -3.1528  0.9564

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 236.9444     0.7908 299.636 7.44e-10 ***
poly(p, 4)1 679.5924     2.3723 286.467 8.91e-10 ***
poly(p, 4)2 268.3677     2.3723 113.124 3.66e-08 ***
poly(p, 4)3  42.8772     2.3723  18.074 5.51e-05 ***
poly(p, 4)4   2.4249     2.3723   1.022    0.364
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.372 on 4 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 0.9999
F-statistic: 2.38e+04 on 4 and 4 DF, p-value: 5.297e-09
Model Validation: y’ = ax + b
R² – Coefficient of Determination
“How well does the data fit your model?”
What proportion of the “variability” is accounted for by the statistical model? (What is the ratio of explained variation to total variation?)
Suppose we have measurements y1, y2, …, yn with mean m
and predicted values y1’, y2’, …, yn’ (yi’ = a·xi + b, with error ei = yi − yi’)
SSE = sum of squared errors = ∑(yi − yi’)² = ∑ei²
SST = total sum of squares = ∑(yi − m)²
SSR = SST − SSE = regression (explained) sum of squares = ∑(m − yi’)²
R² = SSR/SST = (SST − SSE)/SST
R² is a measure of how good the model is; the closer R² is to 1, the better.
Example: Let SST = 1499 and SSE = 97. Then R² = (1499 − 97)/1499 ≈ 93.5%.
http://www-stat.stanford.edu/~jtaylo/courses/stats191/notes/simple_diagnostics.pdf
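A minimal R sketch computing R² directly from SSE and SST, assuming the Depth and Resistance vectors from the earlier regression sketch:

# Sketch: R^2 from its definition, compared with R's own value
fit <- lm(Resistance ~ Depth)
SSE <- sum(resid(fit)^2)
SST <- sum((Resistance - mean(Resistance))^2)
(SST - SSE) / SST          # about 0.974, matching Multiple R-Squared above
summary(fit)$r.squared     # same value reported by R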
Using the t-test
Consider the following data (“sleep.R”)

   extra group
1    0.7     1
2   -1.6     1
3   -0.2     1
4   -1.2     1
5   -0.1     1
6    3.4     1
7    3.7     1
8    0.8     1
9    0.0     1
10   2.0     1
11   1.9     2
12   0.8     2
13   1.1     2
14   0.1     2
15  -0.1     2
16   4.4     2
17   5.5     2
18   1.6     2
19   4.6     2
20   3.4     2

From “Introduction to R”, http://www.R-project.org
T.test result
> t.test(extra ~ group, data = sleep)

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.0794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y
     0.75      2.33

The p-value is the smallest significance level (1 − confidence) at which the null hypothesis is rejected.
Here p = 0.0794, so the difference can be claimed nonzero only at confidence levels up to about 92%, not at 95%.
Factorial Design
What “factors” need to be taken into account?
How do we design an efficient experiment to test all these
factors?
How much do the factors and the interactions among the
factors contribute to the variation in results?
Example: 3 factors a,b,c, each with 2 values: 8 combinations
But what if we want random order of experiments?
What if each of a,b,c has 3 values?
Do we need to run all experiments?
http://www.itl.nist.gov/div898/handbook/pri/section3/pri3332.htm
Standard Procedure – Full Factorial Design (Example)
Variables A, B, C: each with 3 values, Low, Medium, High (coded as -1, 0, 1)
“Signs Table”:

Run   A    B    C
 1   -1   -1   -1
 2   +1   -1   -1
 3   -1   +1   -1
 4   +1   +1   -1
 5   -1   -1   +1
 6   +1   -1   +1
 7   -1   +1   +1
 8   +1   +1   +1

1. Run the experiments in the table (“2-level, full factorial design”)
2. Repeat the experiments in this order n times by using rows 1,…,8, 1,…,8, … (“replication”)
3. Use step 2, but choose the rows randomly (“randomization”)
4. Use step 3, but add some “center point runs”; for example, run the case 0,0,0, then use the 8 rows, then run 0,0,0, …, finishing with a 0,0,0 case
In general, for 5 or more factors, use a
“fractional factorial design”
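A minimal R sketch of steps 1–3, generating the 2³ design with base R and randomizing replicated runs (no special packages assumed; n = 3 is an arbitrary choice):

# Sketch: build a 2-level full factorial design for A, B, C and randomize it
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))   # 8 rows
design                                   # step 1: run experiments in this order
n <- 3                                   # number of replications
replicated <- design[rep(1:8, n), ]      # step 2: rows 1..8 repeated n times
randomized <- replicated[sample(nrow(replicated)), ]   # step 3: random order
randomized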
2^k Factorial Design
Example: k = 2, factors are A and B, and the x’s are computed from the signs table:
  y = q0 + qA·xA + qB·xB + qAB·xAB
SST = total variation around the mean
    = ∑(yi − mean)²
    = SSA + SSB + SSAB
where SSA = 2²·qA² (variation allocated to A), and SSB, SSAB are defined similarly
Note: var(y) = SST/(2^k − 1)
Fraction of variation explained by A = SSA/SST
Example: 2^k Design
www.stat.nuk.edu.tw/Ray-Bing/ex-design/ex-design/ExChapter6.ppt

Factor              Levels
Line Length (L)     32, 512 words
No. Sections (K)    4, 16 sections
Control Method (C)  multiplexed, linear

Experiment Design
[Diagram: Address Trace → Cache → Misses]
Are all factors needed? If a factor has little effect on the variability of the output, why study it further?

  L    K    C    Misses
 32    4   mux
512    4   mux
 32   16   mux
512   16   mux
 32    4   lin
512    4   lin
 32   16   lin
512   16   lin

Encoded Experiment Design
Method?
a. Evaluate variation for each factor using only two levels each
b. Must consider interactions as well
   Interaction: the effect of a factor depends on the levels of another

  L    K    C    Misses
 -1   -1   -1
  1   -1   -1
 -1    1   -1
  1    1   -1
 -1   -1    1
  1   -1    1
 -1    1    1
  1    1    1
Example: 2^k Design (continued)
http://www.cs.wustl.edu/~jain/cse567-06/ftp/k_172kd/sld001.htm
Obtain Responses

  L    K    C    Misses
 -1   -1   -1      14
  1   -1   -1      22
 -1    1   -1      10
  1    1   -1      34
 -1   -1    1      46
  1   -1    1      58
 -1    1    1      50
  1    1    1      86
Analyze Results (Sign Table)

  I   L   K   C  LK  LC  KC  LKC   Miss Rate (yj)
  1  -1  -1  -1   1   1   1   -1        14
  1   1  -1  -1  -1  -1   1    1        22
  1  -1   1  -1  -1   1  -1    1        10
  1   1   1  -1   1  -1  -1   -1        34
  1  -1  -1   1   1  -1  -1    1        46
  1   1  -1   1  -1   1  -1   -1        58
  1  -1   1   1  -1  -1   1   -1        50
  1   1   1   1   1   1   1    1        86
qi: 40  10   5  20   5   2   3    1

Ex: y1 = 14 = q0 − qL − qK − qC + qLK + qLC + qKC − qLKC
Each qi = (1/8)·∑j (signij × yj); solve for the q’s

SSL = 2³·qL² = 800
SST = SSL + SSK + SSC + SSLK + SSLC + SSKC + SSLKC
    = 800 + 200 + 3200 + 200 + 32 + 72 + 8
    = 4512
%variation(L) = SSL/SST = 800/4512 = 17.7%
Effect   % Variation
  L         17.7
  K          4.4
  C         70.9
  LK         4.4
  LC         0.7
  KC         1.6
  LKC        0.2
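A minimal R sketch that reproduces these effects and percentages from the sign table and the miss counts:

# Sketch: compute 2^3 effects and % variation
y <- c(14, 22, 10, 34, 46, 58, 50, 86)
L <- c(-1,  1, -1,  1, -1,  1, -1,  1)
K <- c(-1, -1,  1,  1, -1, -1,  1,  1)
C <- c(-1, -1, -1, -1,  1,  1,  1,  1)
signs <- cbind(L, K, C, LK = L*K, LC = L*C, KC = K*C, LKC = L*K*C)
q   <- colSums(signs * y) / 8      # effects: 10, 5, 20, 5, 2, 3, 1
SS  <- 8 * q^2                     # 800, 200, 3200, 200, 32, 72, 8
SST <- sum(SS)                     # 4512
round(100 * SS / SST, 1)           # % variation per effect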
Full Factorial Design
Model:
  yij = m + ai + bj + eij
Effects are computed such that ∑ai = 0 and ∑bj = 0:
  m  = mean(y..)
  ai = mean(yi.) − m
  bj = mean(y.j) − m
Experimental Errors
  SSE = ∑ eij²
  SS0 = a·b·m²
  SSA = b·∑ai²
  SSB = a·∑bj²
  SST = SS0 + SSA + SSB + SSE
Full-Factorial Design Example
Determination of the speed of light – Morley Experiments
Factors: Experiment No. (Expt), Run No. (Run)
Levels:  Expt – 5 experiments
         Run  – 20 repeated runs

      Expt  Run  Speed
001     1     1    850
002     1     2    740
003     1     3    900
004     1     4   1070
<more data>
019     1    19    960
020     1    20    960
021     2     1    960
022     2     2    940
023     2     3    960
<more data>
096     5    16    940
097     5    17    950
098     5    18    800
099     5    19    810
100     5    20    870
Box Plots of Factors
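A minimal sketch of how such box plots might be drawn, assuming mm holds the Morley data used in the aov call below; R ships the same data as the built-in morley dataset:

# Sketch: box plots of Speed for each level of the two factors
mm <- morley                     # built-in copy of the Morley speed-of-light data
mm$Expt <- factor(mm$Expt)       # treat experiment and run numbers as factors
mm$Run  <- factor(mm$Run)
boxplot(Speed ~ Expt, data = mm, xlab = "Expt", ylab = "Speed")
boxplot(Speed ~ Run,  data = mm, xlab = "Run",  ylab = "Speed")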
Two-Factor Full Factorial
> fm <- aov(Speed ~ Run + Expt, data = mm)   # Determine ANOVA
> summary(fm)                                # Display ANOVA of factors
            Df Sum Sq Mean Sq F value   Pr(>F)
Run         19 113344    5965  1.1053 0.363209
Expt         4  94514   23629  4.3781 0.003071 **
Residuals   76 410166    5397
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Conclusion: variation across runs is acceptably small (Run is not significant, p ≈ 0.36), but variation across experiments is significant (Expt, p ≈ 0.003)
Visualizing Results: Tufte’s Principles
• Have a properly chosen format and design
• Use words, numbers, and drawing together
• Reflect a balance, a proportion, a sense of
relevant scale
• Display an accessible complexity of detail
• Have a story to tell about the data
• Draw in a professional manner
• Avoid content-free decoration, including “chart
junk”
Back to the transistor:
• What factors are there?
• Which ones do we want to investigate?
• How should we define our experiments?
• What role will randomness play?
(simulation/actual)
• How should we report the results?