Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia, lookup

Transcript
```ROLL NO.
NAME
CS 636 – Adv. Data Mining
Quiz 1 Solution
(Time limit: 10 minutes)
1. (4 points) What is the significance of the Johnson-Lindenstrauss lemma in data
stream analysis? Explain briefly and clearly.
The Johnson-Lindenstrauss lemma states that embedding a n-dimensional space into a kdimensional space (k = O(log n) preserves distances within a small bounded range. This
lemma provides the theoretical foundation for random projection methods for
dimensionality reduction, which is often necessary for analyzing highly detailed data
streams.
2. (6 points) Suppose a supermarket has an automatic sales and inventory monitoring
system that collects real-time item sales and purchases information from multiple
sensors. Define a data stream model for
a. (2 points) monitoring inventory in real-time, and for signaling an item if its
inventory falls below a threshold value.
The turnstile model would be most appropriate. Suppose s1, s1, s3…is the item sale data
stream, and p1, p2, p3,… is the item purchase data stream. Both streams have independent
time clocks. At any instant i (the occurrence of sale or purchase), the signal Ai[j] gives
the inventory of item j ( 0 <= j <= number of items in store).
If at instant i, item j was sold, then
Ai[j] = Ai-1[j] – 1
If at instant I, item j was purchased, then
Ai[j] = Ai-1[j] + 1
At each instant i, Ai[j] is searched to find all j’s for which Ai[j] is less than the threshold
value.
b. (2 points) monitoring item sales over a fixed period of time (e.g. a week)
The cash-register model would be most appropriate for this scenario. Using the notations
defined above, at the start of each week all A[j]’s are set to zero.
If at instant i from start of week, item j was sold, then
Ai[j] = Ai-1[j] + 1
At any instant i from the start of the week, Ai[j] would give the number of items j sold so
far.
c. (2 points) finding frequent sequential patterns in item sales.
The time-series model would be most appropriate for this scenario. The signal A[j] would
be the item ID of the jth item sold. That is, A is an evolving signal with a new value
appended whenever another item is sold.
CS 636 (Wi 04/05) – Dr. Asim Karim
Page 1 of 1
```
Related documents