Midterm - NYU Computer Science

NAME _______________________________________
Data Warehousing/Mining
Midterm
Part 1 (30 points)
1) Which of the following statements about data warehouses is not true?
a) Data warehouses store current and historical data of interest to managers throughout the company.
b) Data warehouses make multidimensional analysis possible.
c) The principles involved in data warehouses and data marts make them incompatible.
d) Data mining of data warehouses often produces useful results.
2) A relational database developer refers to a record as a(n)
a) Criteria
b) Relation
c) Tuple
d) Attribute
3) Two advantages of object-oriented databases, relative to relational databases, are that they can
a) Access data faster and use Structured Query Language (SQL)
b) Use Structured Query Language (SQL) and store associations among data
c) Store associations among data and store more data
d) Store more types of data and access data faster
4) Data mining tasks can be divided into two categories:
a) Descriptive and analytical
b) Descriptive and predictive
c) Predictive and analytical
d) Analytical and quantitative
5) Which of the following is an objective measure of associative pattern interestingness?
a) Support
b) Variance
c) Completeness
d) Efficiency
6) Comparing OLTP and OLAP, OLAP is used for complex queries
a) True
b) False
7) What allows data to be modeled and viewed in multiple dimensions?
a) Fact table
b) Measure
c) Data Cube
d) Relational databases
8) Which of the following is not an example of a multidimensional database model?
a) Star Schema
b) Snowflake Schema
c) Constellation Schema
d) Moon Schema
9) Which is not an example of an OLAP operation?
a) Roll-down
b) Drill-down
c) Pivot
d) Slice and Dice
10) Binning and clustering are examples of which type of data transformation?
a) Aggregation
b) Normalization
c) Generalization
d) Smoothing
Part 2 (170 points)
1) Describe the 7 steps involved in data mining as a process of knowledge discovery. (20 points)
2) Suppose that one needs to record three measures in a data cube: min, average and median. Design an
efficient computation and storage method for each measure given that the cube allows data to be
deleted incrementally (i.e., in small portions at a time) from the cube. (20 points)
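As a study aid for question 2, the following is a minimal sketch of one possible design, not the expected answer. It assumes a hypothetical per-cell store named `CellMeasures` (the class and its method names are illustrative, not from the exam). Average is kept as a running sum and count, which can be adjusted directly on deletion. Min cannot be recovered from a single stored value once that value is deleted, so the sketch keeps a count per distinct value. Median depends on the whole distribution, so it is computed from the same value counts.

```python
from collections import Counter
import statistics


class CellMeasures:
    """Hypothetical store for one cube cell that supports inserts and deletes."""

    def __init__(self):
        self.total = 0           # running sum, enough to maintain the average
        self.count = 0           # running count
        self.values = Counter()  # value -> multiplicity, used for min and median

    def insert(self, x):
        self.total += x
        self.count += 1
        self.values[x] += 1

    def delete(self, x):
        if self.values[x] == 0:
            raise KeyError(f"value {x!r} is not stored in this cell")
        self.total -= x
        self.count -= 1
        self.values[x] -= 1
        if self.values[x] == 0:
            del self.values[x]   # drop exhausted values so min() stays correct

    def average(self):
        return self.total / self.count

    def minimum(self):
        # Scans the distinct values; a sorted structure would make this faster.
        return min(self.values)

    def median(self):
        # Expands the counts back into individual values before taking the median.
        return statistics.median(self.values.elements())


cell = CellMeasures()
for v in [3, 1, 4, 1, 5]:
    cell.insert(v)
cell.delete(1)
print(cell.average(), cell.minimum(), cell.median())
```

The trade-off in this sketch is storage: the average needs two numbers per cell, while min and median need space proportional to the number of distinct values.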
3) In the real world, tuples with missing values for some attributes are a common occurrence. Describe
two methods for handling this problem. (15 points)
4) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are: 13,
15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, and 72.
(15 points)
a) What are the mean and median?
b) What is the mode of the data?
c) Give the five-number summary of the data.
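For checking hand arithmetic on question 4, the following is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative. Quartile conventions differ between textbooks and tools; this sketch assumes the module's default "exclusive" method, so Q1 and Q3 may not match a course's preferred definition.

```python
import statistics

# Age values copied from question 4 (27 observations, already sorted).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 72]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(ages)

# Quartile cut points; the default method is "exclusive" (an assumption here).
q1, q2, q3 = statistics.quantiles(ages, n=4)
five_number = (min(ages), q1, q2, q3, max(ages))

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("modes:", modes)
print("five-number summary:", five_number)
```

Using `multimode` rather than `mode` matters if the data has more than one most-frequent value, since `mode` reports only one of them.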
5) Discuss two issues to consider during data integration. (20 points)
6) Design a data warehouse for a regional weather bureau. The weather bureau has about 10000 probes
which are scattered throughout various land and ocean locations to collect basic weather data including
air pressure, temperature and precipitation for each hour. All data are sent to the central station which
has collected such data for over 10 years. Your design should facilitate efficient querying and online
analytical processing and derive general weather patterns in space. (Make sure to include a star
schema) (40 points)
7) Consider the following multifeature cube query: Grouping by all subsets of {item, region, month}, find
the minimum shelf life in 1999 for each group, and the fraction of the total sales due to tuples whose
price is less than $100 and whose shelf life is within 25% of the minimum shelf life, and within 50% of
the minimum shelf life. (30 points)
a) Draw the multifeature cube graph for the query.
b) Express the query in extended SQL.
c) Is this a distributive multifeature cube? Why or why not?
8) Describe 3 challenges to data mining regarding data mining methodology and user interaction. (15 points)
9) How is the data warehouse different from a database? How are they similar? (15 points)