Download Open Problems in Streaming

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Open Problems in Streaming
Sampath Kannan
University of Pennsylvania
Outline
• Questions about the model(s)
• Relationship between streaming
model(s) and other models
• Algorithm design questions
Current Model(s)
• Input stream(s): one or more; read head can
only move to the right.
• Data order: stream is a sequence of items
permuted adversarially.
Var: Problem is about particular order in
the stream.
• Workspace: bounded; polylogarithmic?
• Approximate solutions; randomization; single
or multi pass.
Model too pessimistic?
Adversarial input ordering makes life
difficult.
Order-specific problems may not be
general enough.
What are the alternatives?
Average-case analysis?
Assume that the data in a stream is chosen
by an adversary but its ordering is random.
Open Problem: Is there a problem where
this really helps?
Conjecture: No! Simulate random order by
sampling from prefix of stream...
Prover-assisted streaming
Stream creator/prover allowed to annotate
stream (create separate stream of anntn).
Downstream verifier can use annotation to
compute... but must not be fooled by a
dishonest prover.
Element distinctness/set disjointness can be
verified with O(n) annotation [J. Zhang].
For what problems does prover-assistance
help?
Is there an example where annotation of
length o(n) makes streaming algorithm
possible?
Passes, space, approx. factor
What are the tradeoffs?
Munro and Paterson show trade-off
between space and number of passes
for exact computation of median.
Clustering: current trade-off for arbitrary
metric spaces:
O(n ) space for (21/  ) approximat ion.
Better tradeoffs?
Distributed Streams
[Gibbons & Tirthapura] study streaming
complexity where there are multiple
streams with one observer per stream.
Each observer computes a “sketch” from
which a function on the union of the streams
is computed.
If there is a canonical interleaving of streams
what functions of this canonical interleaving
can we compute?
Relation to other models
Worst-case one-round communication
complexity: Function f. Inputs partitioned
between Alice and Bob in worst possible
way. Number of bits Alice needs to
communicate for Bob to compute f is
WOCC(f).
Conj: PASS(f) = WOCC(f).
Issues: Non-uniformity in communication
complexity model: Even for the worst-case
partition, Alice and Bob could exploit the
fact that they know the partition.
Does this allow WOCC(f) < PASS (f)?
Can show: WOCC(f) < = PASS(f)
there exists f : WOCC(f) = 2
PASS(f) = 3 [Ishay,K
Strauss]
Algorithmic questions
Stream items: points from metric space
Compute diameter
(For d-dimensional
d Euclidean space
d

can approximate in O   space.)
 
(Feigenbaum, K, Zhang).
Better space by dimension reduction?
Clustering in specific metric spaces
Can we get constant factor approximation
with polylog space for Euclidean d-dimensional
space? Other specific metric spaces?
Computational geometry
If promised real convex hull has few sides
can we find approximate convex hull for a
stream of points?
Data Mining
Stream is interleaved output of one or
more Markov Chains. Many questions.
Given one or more Markov Chains and a
stream of states find max. probability
that stream generated by given chains.
Can be solved in space = product of number
of states of chains. (Batu, Guha, K)
Other questions:
Markov Chain(s) not given: Find most likely.
Assume good mix in stream.
Markov Chain(s) “hidden”... stream is not a
sequence of states but a sequence of edge
labels.
“Background” Markov Chain given. Find
motifs --- other Markov Chains that explain
“signals” in the stream.
Open Problem
Are there other open problems?
Related documents