CPSC 531: Output Data Analysis
Instructor: Anirban Mahanti
Office: ICT 745
Email: [email protected]
Class Location: TRB 101
Lectures: TR 15:30 – 16:45 hours
Slides primarily adapted from:
“The Art of Computer Systems Performance
Analysis” by Raj Jain, Wiley 1991.
[Chapters 12, 13, and 25]
CPSC 531: Data Analysis
Outline
• Measures of Central Tendency
   • Mean, Median, Mode
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal

Measures of Central Tendency (1)
• Sample mean: the sum of all observations divided by the total number of observations
   • Always exists and is unique
   • Gives equal weight to all observations
   • Strongly affected by outliers
• Sample median: list the observations in increasing order; the observation in the middle of the list is the median
   • For an even number of observations, take the mean of the middle two values
   • Always exists and is unique
   • Resistant to outliers (compared to the mean)

Measures of Central Tendency (2)
• Sample mode: plot a histogram from the observations and find the bucket with the peak frequency; the midpoint of this bucket is the mode
• The mode may not exist (e.g., when all samples have equal weight)
• More than one mode may exist (e.g., a bimodal distribution)
• If there is only one mode, the distribution is unimodal
[Figure: three example plots of PDF f(x) versus x, with the modes marked.]

Measures of Central Tendency (3)
• Is the data categorical?
   • Yes: use the mode
   • E.g., the most used resource in a system
• Is the total of interest?
   • Yes: use the mean
   • E.g., total response time for Web requests
• Is the distribution skewed?
   • Yes: use the median
      • The median is less influenced by outliers than the mean
   • No: use the mean. Why?

Common Misuses of Means (1)
• The usefulness of the mean depends on the number of observations and the variance
   • E.g., two response time samples: 10 ms and 1000 ms. The mean is 505 ms! A correct index, but useless.
• Using the mean without regard to skewness

                System A    System B
                   10           5
                    9           5
                   11           5
                   10           4
                   10          31
   Mean:           10          10
   Mode:           10           5
   Min, Max:     [9, 11]     [4, 31]

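The System A and System B figures can be checked with Python's standard `statistics` module. This is only a minimal sketch: the variable names are illustrative, and the median is added for comparison even though the slide does not list it.

```python
import statistics

# Observations as listed on the slide
system_a = [10, 9, 11, 10, 10]
system_b = [5, 5, 5, 4, 31]

for name, data in (("System A", system_a), ("System B", system_b)):
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
        "min,max:", (min(data), max(data)),
    )
```

Both systems have a mean of 10, but for System B the median and mode are 5: the single observation of 31 pulls the mean away from the typical value.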
Common Misuses of Means (2)
• Computing the mean of a product by multiplying means
   • The mean of a product equals the product of the means if the two random variables are independent
   • If x and y are correlated, then in general E(xy) ≠ E(x)E(y)
   • Avg. users in system: 23; avg. processes per user: 2. Avg. number of processes in the system? Is it 46?
   • No! The number of processes spawned by users depends on the load.

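A small numeric illustration of the point about correlated variables. The four measurements below are hypothetical, chosen only so that the averages match the slide (23 users, 2 processes per user) while the number of processes per user grows with the load.

```python
from statistics import mean

# Hypothetical measurements at four load levels (not data from the course):
# as more users log in, each user tends to run more processes.
users = [10, 20, 30, 32]
procs_per_user = [1, 2, 2, 3]

# Product of the two means: the tempting but wrong shortcut
product_of_means = mean(users) * mean(procs_per_user)

# Mean of the per-measurement products: the actual average number of processes
mean_of_product = mean(u * p for u, p in zip(users, procs_per_user))

print(product_of_means, mean_of_product)
```

With these numbers the product of the means is 46, while the average number of processes is 51.5, because the two quantities rise together.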
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal

Summarizing Variability
• Summarizing by a single number is rarely enough
   • Given two systems with the same mean, we generally prefer the one with less variability
[Figure: two histograms of frequency versus response time, each with mean = 2 s. In one, 80% of responses take about 1.5 s and 20% take about 4 s; in the other, 60% take about 0.001 s and 40% take about 5 s.]
• Indices of dispersion
   • Range, variance, 10- and 90-percentiles, semi-interquartile range, and mean absolute deviation

Range
• Easy to calculate: range = max – min
• In many scenarios, not very useful:
   • Min may be zero
   • Max may be an “outlier”
   • With more samples, max may keep increasing and min may keep decreasing → no “stable” point
• Range is useful if the system's performance is bounded

Variance and Standard Deviation
• Given a sample of n observations $\{x_1, x_2, \ldots, x_n\}$, the sample variance is calculated as:
   $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• Sample variance: $s^2$ (in the square of the unit of observation)
• Sample standard deviation: $s$ (in the unit of observation)
• Note the (n-1) in the variance computation
   • Only (n-1) of the n differences are independent
   • Given (n-1) differences, the nth difference can be computed
   • The number of independent terms is the degrees of freedom (df)

Standard Deviation (SD)
• Standard deviation and mean have the same units
   • Preferred!
   • E.g. (a) Mean = 2 s, SD = 2 s: high variability?
   • E.g. (b) Mean = 2 s, SD = 0.2 s: low variability?
• Another widely used measure: the coefficient of variation (C.O.V.)
   • C.O.V. = ratio of the standard deviation to the mean
   • C.O.V. does not have any units
   • C.O.V. shows the magnitude of variability
   • C.O.V. in (a) is 1 and in (b) is 0.1

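The sample variance (with its n-1 divisor), the standard deviation, and the C.O.V. can be computed in a few lines. This is a sketch under stated assumptions: `summarize` is a hypothetical helper name and the response times are made-up values in seconds.

```python
import statistics

def summarize(data):
    """Return (mean, sample variance, sample SD, C.O.V.) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    # Divide by n - 1: only n - 1 of the differences are independent
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    sd = variance ** 0.5
    return mean, variance, sd, sd / mean

response_times = [1.8, 2.0, 2.2, 1.9, 2.1]  # hypothetical sample, in seconds
mean, var, sd, cov = summarize(response_times)
print(f"mean={mean:.3f} s  variance={var:.4f} s^2  SD={sd:.3f} s  C.O.V.={cov:.3f}")
```

The hand-written variance agrees with `statistics.variance`, which also uses the n-1 divisor; `statistics.pvariance` divides by n instead and would give a smaller value.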
Percentiles, Quantiles, Quartiles
• Lower and upper bounds expressed in percents or as fractions
   • 90-percentile → 0.9-quantile
   • α-quantile: sort and take the [(n-1)α + 1]th observation
      • [] means round to the nearest integer
• Quartiles divide the data into parts at 25%, 50%, 75% → quartiles (Q1, Q2, Q3)
   • 25% of the observations ≤ Q1 (the first quartile)
   • The second quartile Q2 is also the median
• The range (Q3 – Q1) is the interquartile range
• (Q3 – Q1)/2 is the semi-interquartile range (SIQR)

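A minimal sketch of the quantile rule, "sort and take the [(n-1)α + 1]th observation". The function name and the data are hypothetical, and rounding halves upward is an assumption: the slide only says to round to the nearest integer.

```python
import math

def quantile(data, alpha):
    """alpha-quantile: the [(n-1)*alpha + 1]th observation of the sorted data."""
    ordered = sorted(data)
    position = (len(ordered) - 1) * alpha + 1
    rank = math.floor(position + 0.5)  # round to nearest; halves go up (assumption)
    return ordered[rank - 1]           # ranks are 1-based, list indices 0-based

data = [7, 2, 9, 4, 12, 5, 4, 8, 6]  # hypothetical observations
q1, q2, q3 = (quantile(data, a) for a in (0.25, 0.50, 0.75))
siqr = (q3 - q1) / 2
print("Q1, Q2, Q3:", q1, q2, q3, " SIQR:", siqr)
```

Library routines such as `statistics.quantiles` interpolate between observations by default, so they can return slightly different values from this pick-one-observation rule.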
Mean Absolute Deviation
• Mean absolute deviation is calculated as:
   $\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$

Influence of Outliers
• Range: considerably
• Sample variance: considerably, but less than the range
• Mean absolute deviation: less than the variance
   • Doesn’t square (i.e., magnify) the outliers
• SIQR: very resistant
• Use SIQR as the index of dispersion whenever the median is used as the index of central tendency

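To see how a single outlier moves each index of dispersion, the sketch below computes all four for a hypothetical sample and for the same sample with its largest value replaced by an extreme one. The helper names and the data are assumptions made for illustration.

```python
import math

def quantile(data, alpha):
    """[(n-1)*alpha + 1]th sorted observation, rounded to the nearest rank."""
    ordered = sorted(data)
    rank = math.floor((len(ordered) - 1) * alpha + 1 + 0.5)
    return ordered[rank - 1]

def dispersion(data):
    """Range, sample variance, mean absolute deviation, and SIQR of a sample."""
    n = len(data)
    mean = sum(data) / n
    return {
        "range": max(data) - min(data),
        "variance": sum((x - mean) ** 2 for x in data) / (n - 1),
        "mad": sum(abs(x - mean) for x in data) / n,
        "siqr": (quantile(data, 0.75) - quantile(data, 0.25)) / 2,
    }

clean = [2, 4, 4, 5, 6, 7, 8, 9, 12]          # hypothetical sample
with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 120]  # largest value replaced by an outlier

for label, sample in (("clean", clean), ("with outlier", with_outlier)):
    print(label, dispersion(sample))
```

For this data the range, variance, and mean absolute deviation all grow when the outlier is introduced, while the SIQR is unchanged, because the quartiles ignore the extreme value.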
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
   • Sample vs. Population
   • Confidence Interval for Mean
• Comparing Two Alternatives
• Transient Removal

Comparing Systems Using Sample Data
• The words “sample” and “example” have a common root: “essample” (French)
• One sample does not prove a theory; a sample is just an example
• The point is that definite statements cannot be made about the characteristics of all systems
• However, probabilistic statements about the range of most systems can be made
• The confidence interval concept serves as a building block

Sample versus Population
• Generate 1 million random numbers with mean $\mu$ and SD $\sigma$ and put them in an urn
• Draw a sample of n observations
   • $\{x_1, x_2, \ldots, x_n\}$ has mean $\bar{x}$ and standard deviation $s$
   • $\bar{x}$ is likely different from $\mu$!
• The population mean $\mu$ is unknown or impossible to obtain in many real-world scenarios
   • Therefore, obtain an estimate of $\mu$ from $\bar{x}$

Confidence Interval for the Mean
• Define bounds c1 and c2 such that:
   Prob{c1 < $\mu$ < c2} = 1 - $\alpha$
   • (c1, c2) is the confidence interval
   • $\alpha$ is the significance level
   • 100(1 - $\alpha$) is the confidence level
• Typically a small $\alpha$ is desired
   • Confidence level of 90%, 95%, or 99%
• One approach: take k samples, find the sample means, sort them, and take the [1+0.05(k-1)]th as c1 and the [1+0.95(k-1)]th as c2

Central Limit Theorem
• We do not need many samples. Confidence intervals can be determined from one sample because $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
• The SD of the sample mean, $\sigma/\sqrt{n}$, is called the standard error
• Using the CLT, a 100(1 - $\alpha$)% confidence interval for a population mean is
   $(\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n})$
   • $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of a unit normal variate (and is obtained from a table!)
   • $s$ is the sample SD

Confidence Interval Example
• CPU times obtained by repeating an experiment 32 times. The sorted set consists of
   {1.9, 2.7, 2.8, 2.8, 2.8, 2.9, 3.1, 3.1, 3.2, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 3.9, 4.1, 4.1, 4.2, 4.2, 4.4, 4.5, 4.5, 4.8, 4.9, 5.1, 5.1, 5.3, 5.6, 5.9}
   • Mean = 3.9, standard deviation (s) = 0.95, n = 32
• For a 90% confidence interval, $z_{1-\alpha/2}$ = 1.645, and we get 3.90 ± (1.645)(0.95)/sqrt(32) = (3.62, 4.17)

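The calculation in this example can be reproduced from the summary statistics alone (mean 3.9, s = 0.95, n = 32). The sketch below is one way to do it, with a hypothetical function name; it takes the normal quantile from `statistics.NormalDist` (Python 3.8 or later) instead of a table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence):
    """Large-sample confidence interval for the mean: mean +/- z * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0, 1)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_confidence_interval(mean=3.9, sd=0.95, n=32, confidence=0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

Because the inputs are the rounded summary values from the slide, the upper bound can differ from the slide's figure in the last printed digit.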
Meaning of Confidence Interval
• What does this mean? With 90% confidence, we can say the population mean is within the above bounds; that is, the chance of error is 10%.
   • E.g., take 100 samples and construct a CI from each. In about 10 cases, the interval will not contain the population mean.
[Figure: the interval from $\bar{x} - c$ to $\bar{x} + c$, centered on $\bar{x}$, with $\mu$ marked; there is a 90% chance that this interval contains $\mu$.]

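The "take many samples and count the misses" reading can be tried in a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with μ = 3.9 and σ = 0.95 (values borrowed from the earlier example purely for illustration), and the function name, seed, and number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(trials=1000, n=32, mu=3.9, sigma=0.95, confidence=0.90, seed=1):
    """Fraction of simulated samples whose confidence interval contains mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - mu) < half_width:  # interval contains mu
            hits += 1
    return hits / trials

print("fraction of intervals containing mu:", coverage())
```

The fraction should come out close to 0.90 rather than exactly 0.90: it is itself an estimate from a finite number of trials.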
Length of Confidence Interval
• Let $z_{1-\alpha/2}\, s/\sqrt{n} = c$
• Then, $z_{1-\alpha/2} = c\sqrt{n}/s$
• Larger s implies a wider confidence interval
• Larger n implies a narrower confidence interval
   • → with more observations, we are better able to predict the population mean
   • → the square-root-of-n relationship implies that increasing the number of observations by a factor of 4 only cuts the confidence interval by a factor of 2
• The confidence interval computation, as described here, works for n ≥ 30

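A quick numeric check of the square-root relationship, reusing z = 1.645 and s = 0.95 from the earlier example; the sample sizes 32 and 128 are chosen here only because one is four times the other.

```python
import math

def half_width(z, s, n):
    """Half-width c of the confidence interval: c = z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

z, s = 1.645, 0.95
c_32 = half_width(z, s, 32)
c_128 = half_width(z, s, 128)  # four times as many observations

print(f"n=32: c={c_32:.3f}   n=128: c={c_128:.3f}   ratio={c_32 / c_128:.2f}")
```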
What if n is not large?
• For smaller samples, we can construct confidence intervals only if the observations come from a normally distributed population
   $(\bar{x} - t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n})$
   • $t_{[1-\alpha/2;\,n-1]}$ is the $(1-\alpha/2)$-quantile of a t-variate with (n-1) degrees of freedom

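Testing for a Zero Mean
• Check whether a measured value is significantly different from zero
   • Determine the confidence interval
   • Then check whether zero is inside the interval
• The procedure is applicable to any other value a
[Figure: confidence intervals for the mean drawn against a line at 0. Intervals that include 0 are labeled “Mean is zero”; intervals that lie entirely to one side of 0 are labeled “Mean is nonzero”.]
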
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal

Comparing Two Alternatives
• Often interested in comparing systems
   • “naïve” VOD vs. “batching” VOD (assignment 3)
   • “SJF” vs. “FIFO” request scheduling (assignment 1)
• Statistical techniques for such comparisons:
   • Paired Observations
   • Unpaired Observations (we will omit this!)
   • Approximate Visual Test
• Did you use any of these in your assignments?

Paired Observations (1)
• n experiments with a one-to-one correspondence between the test on system A and the test on system B
   • No correspondence => unpaired
• This test uses the zero mean idea…
   • Treat the two samples as one sample of n pairs
   • For each pair, compute the difference
   • Construct a confidence interval for the difference
   • CI includes zero => systems not significantly different

Paired Observations (2)
• Six similar workloads were used on two systems: {(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3, 1.7)}. Is one system better?
• The performance differences are {-13.7, 13.1, -2.8, -1.1, -3.0, 5.6}
• Sample mean = -0.32, sample SD = 9.03 (sample variance = 81.62)
• CI = -0.32 ± t·sqrt(81.62/6) = -0.32 ± t(3.69)
• The 0.95-quantile of t with 5 degrees of freedom is 2.015
• 90% confidence interval = (-7.75, 7.11)
• The systems are not significantly different, as zero is in the CI

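The paired-observation example can be reproduced step by step. This is a sketch rather than a general routine: the t-quantile 2.015 is taken from the slide instead of being computed, since Python's standard library has no t-distribution, and the variable names are illustrative.

```python
import math
import statistics

# (system A, system B) measurements for the six workloads on the slide
pairs = [(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3, 1.7)]

# Treat the two samples as one sample of n differences
diffs = [a - b for a, b in pairs]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample SD, n - 1 divisor

T_095_5DF = 2.015  # 0.95-quantile of t with 5 degrees of freedom (from the slide)
half_width = T_095_5DF * sd_d / math.sqrt(n)
ci = (mean_d - half_width, mean_d + half_width)

zero_in_ci = ci[0] < 0 < ci[1]
print(f"mean={mean_d:.2f}  SD={sd_d:.2f}  90% CI=({ci[0]:.2f}, {ci[1]:.2f})")
print("not significantly different" if zero_in_ci else "significantly different")
```

For a different number of pairs or a different confidence level, the constant would have to be replaced by the matching t-quantile from a table or a statistics library.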
Approximate Visual Test
• Compute the confidence interval for each mean
• If the CIs don’t overlap, one system is better than the other
[Figure: three pairs of confidence intervals with their means marked, one pair per case below.]
   • CIs do not overlap => alternatives are different
   • CIs overlap and the mean of one is in the CI of the other => not significantly different
   • CIs overlap but the mean of one is not in the CI of the other => need more testing

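The three cases of the visual test translate directly into a few comparisons. The function below is a hypothetical helper written for illustration; the intervals in the usage lines are made-up numbers, and each interval is assumed to be given as a (low, high) pair.

```python
def visual_test(mean_a, ci_a, mean_b, ci_b):
    """Classify two alternatives from their means and confidence intervals."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    # Case 1: the intervals do not overlap at all
    if hi_a < lo_b or hi_b < lo_a:
        return "different"
    # Case 2: they overlap and one mean falls inside the other interval
    if lo_b <= mean_a <= hi_b or lo_a <= mean_b <= hi_a:
        return "not significantly different"
    # Case 3: they overlap, but neither mean is inside the other interval
    return "need more testing"

print(visual_test(10, (9, 11), 13, (12, 14)))
print(visual_test(10, (8, 12), 11, (9, 13)))
print(visual_test(10, (8, 12), 13.5, (11, 16)))
```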
Determining Sample Size
• Goal: find the smallest sample size n that provides the desired confidence in the results
• Method:
   • Take a small set of preliminary measurements
   • Estimate the variance from the measurements
   • Use the estimate to determine the sample size needed for the desired accuracy
• r% accuracy => ±r% at 100(1 - $\alpha$)% confidence:
   $\bar{x} \pm z\, \frac{s}{\sqrt{n}} = \bar{x}\left(1 \pm \frac{r}{100}\right)$, which gives $n = \left(\frac{100\, z\, s}{r\, \bar{x}}\right)^2$

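A sketch of the sample-size formula n = (100 z s / (r x̄))². The function name is hypothetical and the preliminary measurements (mean 20, SD 5) are invented for illustration; the result is rounded up because n must be a whole number of observations.

```python
import math
from statistics import NormalDist

def required_sample_size(mean, sd, r_percent, confidence):
    """Smallest n giving +/- r_percent accuracy at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((100 * z * sd / (r_percent * mean)) ** 2)

# Hypothetical preliminary measurements: mean 20, SD 5; want +/-5% at 95% confidence
n = required_sample_size(mean=20.0, sd=5.0, r_percent=5.0, confidence=0.95)
print("observations needed:", n)
```

Halving r roughly quadruples the required n, which is the same square-root relationship that governs the length of the confidence interval.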
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal

Transient Removal
• In many simulations, we are interested in steady-state performance
   • Remove the initial transient state
• However, defining exactly what constitutes the end of the transient state is difficult!
• Several heuristics have been developed:
   • Long runs
   • Proper initialization
   • Truncation
   • Initial data deletion
   • Moving average of replications
   • Batch means

Long Runs
• Use very long runs
• The impact of the transient state becomes negligible
• Wasteful use of resources
• How long is “long enough”?
• The Raj Jain text recommends that this method not be used in isolation

Batch Means
• Run the simulation for a long duration
• Divide the observations (N) into m batches, each of size n
• Compute the variance of the batch means, using the procedure shown, for n = 2, 3, 4, 5, …
   1) Compute the batch means: $\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}$, $i = 1, 2, \ldots, m$
   2) Compute the overall mean: $\bar{\bar{x}} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i$
   3) Compute the variance of the batch means: $\mathrm{Var}(\bar{x}) = \frac{1}{m-1} \sum_{i=1}^{m} (\bar{x}_i - \bar{\bar{x}})^2$
• Plot the variance vs. the batch size
[Figure: variance of batch means versus batch size n, with an early region labeled “Ignore” and the transient interval marked on the batch-size axis.]

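The three-step procedure can be sketched as a single function that is called once per batch size. The function name is hypothetical, the observation series is synthetic (a decaying transient on top of a steady level, invented for illustration), and dropping leftover observations that do not fill a complete batch is an assumption not stated on the slide.

```python
import math

def batch_means_variance(observations, n):
    """Variance of the batch means when the run is split into batches of size n."""
    m = len(observations) // n  # complete batches only; leftovers are dropped
    if m < 2:
        raise ValueError("need at least two complete batches")
    # 1) batch means
    batch_means = [sum(observations[i * n:(i + 1) * n]) / n for i in range(m)]
    # 2) overall mean
    overall_mean = sum(batch_means) / m
    # 3) variance of the batch means
    return sum((b - overall_mean) ** 2 for b in batch_means) / (m - 1)

# Synthetic run: steady level of 10 plus a transient that decays over time
observations = [10 + 5 * math.exp(-k / 20) for k in range(400)]
variances = {n: batch_means_variance(observations, n) for n in range(2, 41)}

for n in (2, 5, 10, 20, 40):
    print(f"batch size {n:2d}: variance of batch means = {variances[n]:.4f}")
```

Plotting `variances` against the batch size gives the curve from which the transient interval is read off.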