ROLL NO.        NAME

CS 636 – Adv. Data Mining
Quiz 1 Solution (Time limit: 10 minutes)

1. (4 points) What is the significance of the Johnson-Lindenstrauss lemma in data stream analysis? Explain briefly and clearly.

The Johnson-Lindenstrauss lemma states that any set of n points in a high-dimensional space can be embedded into a k-dimensional space, with k = O(log n / ε^2), such that all pairwise distances are preserved within a factor of (1 ± ε). This lemma provides the theoretical foundation for random projection methods for dimensionality reduction, which is often necessary for analyzing high-dimensional data streams.

2. (6 points) Suppose a supermarket has an automatic sales and inventory monitoring system that collects real-time item sale and purchase information from multiple sensors. Define a data stream model for

a. (2 points) monitoring inventory in real time, and for signaling an item if its inventory falls below a threshold value.

The turnstile model would be most appropriate. Suppose s_1, s_2, s_3, ... is the item sale data stream, and p_1, p_2, p_3, ... is the item purchase data stream. Both streams have independent time clocks. At any instant i (the occurrence of a sale or purchase), the signal A_i[j] gives the inventory of item j (0 <= j <= number of items in store).

If at instant i, item j was sold, then A_i[j] = A_{i-1}[j] - 1
If at instant i, item j was purchased, then A_i[j] = A_{i-1}[j] + 1

At each instant i, A_i[j] is searched to find all j's for which A_i[j] is less than the threshold value.

b. (2 points) monitoring item sales over a fixed period of time (e.g. a week)

The cash-register model would be most appropriate for this scenario. Using the notation defined above, at the start of each week all A[j]'s are set to zero.

If at instant i from the start of the week, item j was sold, then A_i[j] = A_{i-1}[j] + 1

At any instant i from the start of the week, A_i[j] gives the number of units of item j sold so far.

c. (2 points) finding frequent sequential patterns in item sales.

The time-series model would be most appropriate for this scenario.
The signal A[j] would be the item ID of the jth item sold. That is, A is an evolving signal, with a new value appended whenever another item is sold.

CS 636 (Wi 04/05) – Dr. Asim Karim
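
The three stream models named in Question 2 can be illustrated with a minimal Python sketch. It is not part of the original solution and rests on several assumptions: the sale and purchase streams are taken to be already merged into one ordered list of (kind, item) events, so that the list position plays the role of instant i; every event moves exactly one unit; and the function names, item names, and starting inventory are hypothetical.

```python
from collections import defaultdict


def turnstile_inventory(events, initial, threshold):
    """Turnstile model: A[j] may go up (purchase) or down (sale).

    Returns the final inventory and, for every instant i, the sorted list
    of items whose inventory is below the threshold at that instant.
    """
    inventory = defaultdict(int, initial)
    low_history = []
    for i, (kind, item) in enumerate(events, start=1):
        # A_i[j] = A_{i-1}[j] + 1 on a purchase, A_{i-1}[j] - 1 on a sale.
        inventory[item] += 1 if kind == "purchase" else -1
        # Search A_i for all items below the threshold.
        low = sorted(j for j, count in inventory.items() if count < threshold)
        low_history.append((i, low))
    return dict(inventory), low_history


def cash_register_sales(events):
    """Cash-register model: counters start at zero and only increase."""
    sold = defaultdict(int)
    for kind, item in events:
        if kind == "sale":
            sold[item] += 1  # A_i[j] = A_{i-1}[j] + 1
    return dict(sold)


def time_series_sales(events):
    """Time-series model: A[j] is the ID of the jth item sold."""
    return [item for kind, item in events if kind == "sale"]


# Hypothetical merged event stream for one week.
events = [
    ("sale", "milk"),
    ("sale", "milk"),
    ("purchase", "eggs"),
    ("sale", "eggs"),
    ("sale", "milk"),
]
final_inventory, low_history = turnstile_inventory(
    events, initial={"milk": 3, "eggs": 1}, threshold=1
)
weekly_sales = cash_register_sales(events)
sale_sequence = time_series_sales(events)
```

Rescanning every counter at each instant mirrors the search step described in part (a). Since only one counter changes per event, a practical monitor could check just the updated item instead.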