Traffic Prediction on the
Internet
Anne Denton
Outline
 Paper by Y. Baryshnikov, E. Coffman,
D. Rubenstein and B. Yimwadsana
 Solutions
 Time-Series prediction
 Our work for the KDD-cup 03
Time Series Prediction on the
Internet
By Y. Baryshnikov, E. Coffman, D.
Rubenstein and B. Yimwadsana
 Adjustment to “hot spots”
 Avoiding degradation, even “denial of
service”
 Can “hot spots” be predicted?
 Can predicted “hot spots” be avoided?
What are “hot spots”?
 Exceptionally large numbers of requests
 Spontaneous, short lifetime
 “Instant” ramp-up in traffic
 Only appears instant on long time scales
 Claim: time scale for increase larger than
time scale to react
 Why does increase take time?
 Passing on the word
 How good does a predictor have to be?
 Cost of missing a “hot spot” higher than
aggregate cost of false alarms (similar to
hurricane)
Examples
 Olympics (Nagano 98)
 Soccer World Cup (98)
 NASA (95)
What to do about “hot spots”?
 <Detour> “The Columbia Hotspot Rescue
Service: A Research Plan”
E. Coffman, P. Jelenkovic, J. Nieh, and D.
Rubenstein
 Approaches
 Deal ad hoc with high request rates
 Build a better network (expensive)
 Content delivery services
 Caching
 Extra bandwidth
 Suggested solution: use available and
underutilized resources
Hotspot Rescue Service
 Server-based approach
 Requires additional resources from
server when necessary
 Resources provided by other members of
Hotspot Rescue Service
 Peer-to-Peer approach
 Requires additional resources from client
when necessary
 Caching
Four Phases
 Prediction (see rest of presentation)
 Server-based: daemons
 P2P: plug-ins
 Replication
 Server-based: replication of objects
 P2P: identified cached copies
 More advanced: redistribution of traffic load
 Notification
 Modifications to DNS (Domain Name System)
 P2P system proactively announces hot objects
and indicates alternative locations?
 Termination
<End of Detour>
Tail of Distribution
 Requests per 10-second time slot
 X-axis: number of hits per time slot
 Y-axis: probability that that number
of hits will be exceeded
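The plot described here is a complementary CDF: for each hit count, the probability that a 10-second slot exceeds it. A minimal sketch of the empirical version, with made-up slot counts:

```python
import numpy as np

def exceedance_prob(hits_per_slot, threshold):
    """Empirical probability that hits in a slot exceed the threshold."""
    hits = np.asarray(hits_per_slot)
    return float((hits > threshold).mean())

# Hypothetical hits per 10-second slot.
slots = [3, 8, 15, 4, 22, 7, 9, 30, 5, 6]
print(exceedance_prob(slots, 10))  # 3 of 10 slots exceed 10 hits -> 0.3
```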
Time Scales
 Prediction relies on correlation
between values at different times
 Autocorrelation function: ∫ f(t) f(t + τ) dt
 Predictability on time scales of 5-30 min
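The autocorrelation integral above has a direct discrete analogue on a series of per-slot request counts. A sketch with hypothetical data (a periodic series, so correlation at the period is high):

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Discrete analogue of the integral of f(t) * f(t + tau) dt,
    # normalized by the zero-lag value so autocorrelation(x, 0) == 1.
    num = np.dot(x[:len(x) - lag], x[lag:]) if lag else np.dot(x, x)
    return num / np.dot(x, x)

# Hypothetical hits per slot, perfectly periodic with period 4.
hits = np.tile([10.0, 50.0, 90.0, 50.0], 25)
print(round(autocorrelation(hits, 4), 2))  # high correlation at the period: 0.96
```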
Prediction Algorithm
 Standard problem
 Signal processing
 Econometrics
 Internet traffic
 Particularly bursty
 Simplest model
 Linear extrapolation
Structure of Prediction
Algorithms
 Traffic observation
 # of requests in time unit (t-1,t]
 Usually 1s
 Prediction window
 Duration Wp > 0
 Advance notice Δ
 Prediction at time t:
 Mapping of observations in [t-Wp, t] to a number
pt ≥ 0 of requests predicted in the interval
[t+Δ, t+Δ+1] that is Δ units in the future
Linear Prediction
 Linear fit: least-squares linear fit
 pt = ft(t+Δ), where ft(s) = at·s + bt
 Minimizing Σi=t-Wp,…,t (ft(i) − ri)²
 Performance: O(W+T)
 W: Window size
 T: uptime duration
 Problems
 Prediction window size must match burstiness
parameters governing request flow
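The least-squares fit and extrapolation can be sketched in a few lines of NumPy; the sample counts and parameter values below are illustrative, not from the paper:

```python
import numpy as np

def linear_predict(counts, t, Wp, delta):
    """Least-squares line over the observations in [t - Wp, t],
    extrapolated delta units ahead: p_t = f_t(t + delta)."""
    times = np.arange(t - Wp, t + 1)
    obs = np.asarray(counts[t - Wp : t + 1], dtype=float)
    a, b = np.polyfit(times, obs, 1)  # f_t(s) = a*s + b, least squares
    return max(0.0, a * (t + delta) + b)  # request counts cannot be negative

counts = [2, 4, 6, 8, 10, 12]  # hypothetical requests per time unit
print(round(linear_predict(counts, t=5, Wp=4, delta=3), 2))  # 18.0
```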
Results
 Depends on properties of autocorrelation function
Conclusions of Paper
 Build a load-based taxonomy of web
server traffic
 Depends on technological,
sociological, and psychological factors
 Look for quantification of basic
patterns reflecting behavior
Do we agree ???
 Why cluster when we can classify!!
Our Approach
 Normally time series prediction uses
only data in that time series
 We use similarity to other instances
 E.g., other web sites
 Model-free
 Weighted Nearest Neighbor approach
 Problem:
 How to integrate time?
Typical Nearest Neighbor
Classification / Regression
 R(A1, …, An, C)
 Attributes Ai
 C class label (classification)
 or continuous variable (regression)
 Based on distance function on Ai
 K nearest neighbors
 Neighbors within a range
 Use kernel function to weight closer ones
higher
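A kernel-weighted k-nearest-neighbor regressor along these lines can be sketched as follows; the data are made up, and the 1/(1+w·d) weighting is the kernel the talk reports working well:

```python
import numpy as np

def knn_regress(X, y, query, k=3, w=1.0):
    """Kernel-weighted k-nearest-neighbor regression (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    # Weight closer neighbors higher with the 1/(1 + w*d) kernel.
    weights = 1.0 / (1.0 + w * dist[nearest])
    return float(np.dot(weights, y[nearest]) / weights.sum())

X = [[0, 0], [1, 0], [0, 1], [5, 5]]  # hypothetical attribute vectors
y = [1.0, 2.0, 2.0, 10.0]             # continuous target (regression)
print(knn_regress(X, y, query=[0, 0], k=3))  # distant outlier is ignored: 1.5
```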
Weighting of Attributes
 Some attributes are more important
than others
 Apply scaling to space
 Optimize weights through
 Hill-climbing
 Genetic Algorithm
 How does this generalize to a time series?
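The hill-climbing option above can be sketched as a greedy loop over per-attribute weight perturbations; the error function, step size, and iteration count here are all hypothetical stand-ins for a cross-validation error:

```python
import random

def hill_climb(weights, error, step=0.1, iters=500, seed=0):
    """Greedy hill climbing: perturb one weight at a time,
    keep the change only if the error decreases."""
    rng = random.Random(seed)
    weights = list(weights)
    best = error(weights)
    for _ in range(iters):
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = max(0.0, old + rng.choice([-step, step]))
        new = error(weights)
        if new < best:
            best = new
        else:
            weights[i] = old  # revert: no improvement
    return weights, best

# Toy error surface, minimized at weights (2.0, 0.5).
err = lambda v: (v[0] - 2.0) ** 2 + (v[1] - 0.5) ** 2
w, e = hill_climb([1.0, 1.0], err)
print(e < 0.05)
```

A genetic algorithm would replace the single-candidate loop with a population of weight vectors and crossover/mutation, trading more error evaluations for a better chance of escaping local minima.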
Our Answer
 Identify “relevant” sections in the
time series
 E.g. times with already high download
rates
 We’ll call each relevant section a
“prediction”
Predictions
 Each prediction contains information
about
 The nature of the time series
 The time instance in question, i.e. the
history of requests
 The actual change in requests
 Make a table of predictions
 Leads to a relation just as in the standard
classification / regression setting
Data Set
 Paper citations in “e-print ArXive”
 Background: KDD-cup 03
 Predict the change in citations in successive 3-month periods
 Only consider periods with at least 6 citations
 Evaluation: L1 distance (Manhattan distance)
between predicted and real difference
 Very close match between citation history
and request history
 Predict change in requests
 Only consider periods that already show large
number of requests
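The KDD-cup evaluation criterion (total L1 distance between predicted and actual changes) is simple to state in code; the numbers below are made up:

```python
def l1_score(predicted, actual):
    """Total Manhattan (L1) distance between predicted and actual changes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))

# Hypothetical predicted vs. actual citation changes for five papers.
print(l1_score([3, -1, 0, 2, 5], [4, -2, 0, 0, 7]))  # 1 + 1 + 0 + 2 + 2 = 6
```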
Attributes of a “Prediction”
 Quantitative attributes
 Number of citations in window
 Gradient of citations in window
 Aggregate number of citations up to and
through window (assume finite time series)
 Attribute values given by time series
 Keyword occurrences
 Author
 Number of revisions of papers
 Maximum time interval between revisions
 Country of origin
 Format
Similarity Function
 Common kernel function (Gaussian):
K(x0, x1) = exp(−(x0 − x1)² / (2σ²))
 What worked better:
K(x0, x1) = 1 / (1 + w·|x0 − x1|)
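Both kernels can be written down directly; σ and w are free parameters, and the values below are illustrative. The comparison shows why the inverse-linear kernel can work better: it keeps a heavy tail, so moderately distant neighbors still contribute.

```python
import math

def gaussian(x0, x1, sigma=2.0):
    """Gaussian kernel: exp(-(x0 - x1)^2 / (2 * sigma^2))."""
    return math.exp(-((x0 - x1) ** 2) / (2.0 * sigma ** 2))

def inverse_linear(x0, x1, w=1.0):
    """1 / (1 + w * |x0 - x1|): decays much more slowly in the tail."""
    return 1.0 / (1.0 + w * abs(x0 - x1))

# At distance 10 the Gaussian weight is essentially zero, while the
# inverse-linear kernel still assigns noticeable weight.
print(gaussian(0, 10) < 1e-4, round(inverse_linear(0, 10), 4))
```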
Plot of Similarity Function
[Plot: f(x) over x = 0 to 20, y from 0 to 1, comparing the Gaussian kernel with 1/(1+x)]
Accuracy
 No linear extrapolation data available
 Could lead to negative citations
 Comparison
 Default prediction: No change: 1851
 Very simple model (decrease by 0.3 in 3
months): 1532
 Prediction based on average of time series
(synchronized at first non-0): 1593
 Prediction based on quantitative attributes:
1465
 Full prediction (preliminary): 1357
 Weight optimized (very preliminary): reduction
1414 -> 1391
Results
[Chart: four data series plotted over 11 points; y-axis from 0 to 3000]
Conclusions
 Method works well for citation
prediction
 Yet to be tested for hot-spot
prediction