Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Open Problems in Streaming Sampath Kannan University of Pennsylvania Outline • Questions about the model(s) • Relationship between streaming model(s) and other models • Algorithm design questions Current Model(s) • Input stream(s): one or more; read head can only move to the right. • Data order: stream is a sequence of items permuted adversarially. Var: Problem is about particular order in the stream. • Workspace: bounded; polylogarithmic? • Approximate solutions; randomization; single or multi pass. Model too pessimistic? Adversarial input ordering makes life difficult. Order-specific problems may not be general enough. What are the alternatives? Average-case analysis? Assume that the data in a stream is chosen by an adversary but its ordering is random. Open Problem: Is there a problem where this really helps? Conjecture: No! Simulate random order by sampling from prefix of stream... Prover-assisted streaming Stream creator/prover allowed to annotate stream (create separate stream of anntn). Downstream verifier can use annotation to compute... but must not be fooled by a dishonest prover. Element distinctness/set disjointness can be verified with O(n) annotation [J. Zhang]. For what problems does prover-assistance help? Is there an example where annotation of length o(n) makes streaming algorithm possible? Passes, space, approx. factor What are the tradeoffs? Munro and Paterson show trade-off between space and number of passes for exact computation of median. Clustering: current trade-off for arbitrary metric spaces: O(n ) space for (21/ ) approximat ion. Better tradeoffs? Distributed Streams [Gibbons & Tirthapura] study streaming complexity where there are multiple streams with one observer per stream. Each observer computes a “sketch” from which a function on the union of the streams is computed. If there is a canonical interleaving of streams what functions of this canonical interleaving can we compute? Relation to other models Worst-case one-round communication complexity: Function f. Inputs partitioned between Alice and Bob in worst possible way. Number of bits Alice needs to communicate for Bob to compute f is WOCC(f). Conj: PASS(f) = WOCC(f). Issues: Non-uniformity in communication complexity model: Even for the worst-case partition, Alice and Bob could exploit the fact that they know the partition. Does this allow WOCC(f) < PASS (f)? Can show: WOCC(f) < = PASS(f) there exists f : WOCC(f) = 2 PASS(f) = 3 [Ishay,K Strauss] Algorithmic questions Stream items: points from metric space Compute diameter (For d-dimensional d Euclidean space d can approximate in O space.) (Feigenbaum, K, Zhang). Better space by dimension reduction? Clustering in specific metric spaces Can we get constant factor approximation with polylog space for Euclidean d-dimensional space? Other specific metric spaces? Computational geometry If promised real convex hull has few sides can we find approximate convex hull for a stream of points? Data Mining Stream is interleaved output of one or more Markov Chains. Many questions. Given one or more Markov Chains and a stream of states find max. probability that stream generated by given chains. Can be solved in space = product of number of states of chains. (Batu, Guha, K) Other questions: Markov Chain(s) not given: Find most likely. Assume good mix in stream. Markov Chain(s) “hidden”... stream is not a sequence of states but a sequence of edge labels. “Background” Markov Chain given. Find motifs --- other Markov Chains that explain “signals” in the stream. Open Problem Are there other open problems?