Download IT6702-Data warehousing and Data Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

K-means clustering wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Cluster analysis wikipedia , lookup

K-nearest neighbors algorithm wikipedia , lookup

Transcript
VALLIAMMAI ENGINEERING COLLEGE
SRM Nagar, Kattankulathur – 603 203
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
QUESTION BANK
VI SEMESTER
IT 6702-DATAWARE HOUSING AND DATA MINING
Regulation – 2013
Academic Year 2016 – 17
Prepared by
Mr.K.SHANMUGAM, Assistant Professor/CSE
VALLIAMMAI ENGINEERING COLLEGE
SRM Nagar, Kattankulathur – 603 203.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
QUESTION BANK
SUBJECT
: IT6702-DATAWAREHOUSING AND DATAMINING
SEM / YEAR : VI-Third year
UNIT -1- DATA WAREHOUSING
Data warehousing Components –Building a Data warehouse –- Mapping the Data
Warehouse to a Multiprocessor Architecture – DBMS Schemas for Decision Support –
Data Extraction, Cleanup, and Transformation Tools –Metadata.
Q.N
O
1. 2. 3. PART A
QUESTIONS
How is data ware house different from a database? Identify the
similarity.
Differentiate metadata and data mart.
COMPETEN
CE
Remember
LEVE
L
BTL-1
Understand
BTL-2
Analyze why one of the biggest challenges when designing a data Analyze
ware house is the data placement and distribution strategy.
How would you evaluate the goals of data mining?
Evaluate
BTL-4
Remember
BTL-1
6. List the two ways the parallel execution of the tasks within SQL
statements can be done.
What elements would you use to relate the design of data warehouse?
Apply
BTL-3
7. Define Data mart?
Remember
BTL-1
8. Define star schema?
Remember
BTL-1
9. What is Data warehousing? Explain the benefits of Data warehousing.
Evaluate
BTL-5
10. Why data transformation is essential in the process of Knowledge
Remember
discovery? Describe it.
Describe the alternate technologies used to improve the performance Understand
in data warehouse environment?
Distinguish STARjoin and STARindex .
Understand
BTL-1
How would you show your understanding in Multidimensional data
model?
Formulate What conclusion can you draw in Bitmapped index.
Apply
BTL-3
Create
BTL-6
4. 5. 11. 12. 13. 14. BTL-5
BTL-2
BTL-2
15. Explain Vendor solutions?
Analyze
BTL-4
16. Explain the features of Metadata repository in data warehousing?
Analyze
BTL-4
17. Define Metadata repository?
Remember
BTL-1
18. Discuss the main ideas in Prims solution.
Understand
BTL-2
19. Illustrate the benefits of metadata repository.
Apply
BTL-3
20. Design the data warehouse architecture.
Create
BTL-6
1 PART B
What is data warehouse? Give the Steps for design and construction of
Understand
Data Warehouses and explain with three tier architecture diagram.
2 Diagrammatically illustrate and
preprocessing techniques:
(i)
Data cleaning
(ii)
Data Integration
(iii)
Data transformation
(iv)
Data reduction
3 Analyze
(i).Draw the data warehouse architecture and explain its components.
(ii).Explain the different types of OLAP tools.
(i).Describe in detail about Mapping the Data warehouse to a Remember
multiprocessor architecture?
(ii).Describe in detail on data warehouse Metadata.
4 5 6 7 8 9 10 discuss
the
following
(i).Explain the steps in building a data warehouse.
(ii).Analyze the information needed to support DBMS schemas for
Decision support.
(i).Discuss in detail about access tools types?
ii) Describe the overall architecture of data warehouse?
i)Discuss in detail about Vendor approaches.
ii) Discuss in detail about access to legacy data.
i) Describe in detail about data extraction?
ii) Describe in detail about transformation tools?
(i).Suppose that a data warehouse consists of four dimensions
customer, product, salesperson and sales time, and the three measure
sales Amt(in rupees), VAT(in rupees) and payment_type(in rupees).
Draw the different classes of schemas that are popularly used for
modeling data warehouses and explain it.
(ii).How would you explain Metadata implementation with examples?
Describe in detail about i) Bitmapped indexing
ii)STARjoin and index.
Apply
BTL-2
BTL-3
BTL-4
BTL-1
Analyze
BTL-4
understand
BTL-2
Understand
BTL-2
Remember
BTL-1
Evaluate
BTL-5
Remember
BTL-1
11 12 13 14 (i).What is data Preprocessing? Explain the various data pre
Analyze
processing techniques.
(ii).Explain the basic methods for data cleaning.
(i).Describe the steps involved in Knowledge discovery in
Remember
databases(KDD).
(ii).Draw the diagram and Describe the architecture of data mining
system.
i)Demonstrate in detail about Data marts .
Apply
ii)Demonstrate data warehouse administration and management.
(i).Generalize the potential performance problems with star
Create
schema.
(ii).Design and discuss about the star and snowflake schema
models of a Data warehouse.
UNIT 2- BUSINESS ANALYSIS
BTL-4
BTL-1
BTL-3
BTL-6
Reporting and Query tools and Applications – Tool Categories – The Need for Applications
– Cognos Impromptu – Online Analytical Processing (OLAP) – Need – Multidimensional
Data Model – OLAP Guidelines – Multidimensional versus Multirelational OLAP –
Categories of Tools – OLAP Tools and the Internet.
1
2
PART A
Can you list the categories of tools.
How would you describe the basic functions used in reporting tools?
Remember
Remember
BTL-1
BTL-1
3
Can you explain the tool to corporate data.
Analyze
BTL-4
4
Define the various access types to the data stored in a data warehouse?
Remember
BTL-1
5
Apply
BTL-3
6
What would demonstrate the result if an organization uses a familiar
approach to build a query and reporting environment for the data
warehouse?
What is a reporting tool? Give examples for managed query tools.
Understand
BTL-2
7
Explain the theme of SGF and DEF?
Analyze
BTL-4
8
Compare ADF with SGF.
Analyze
BTL-4
9
Illustrate what changes would you make to solve the problem in
portability and supported platforms.
Discuss the need for OLAP.
Apply
BTL-3
Understand
BTL-2
Apply
BTL-3
12
How would you classify the ideas of multidimensional data model
with multirelational OLAP?
Which parameter and design choices determine the MOLAP?
Create
BTL-6
13
Summarize to improve the purpose of ROLAP in OLAP.
Understand
BTL-2
14
Define MQE?
Remember
BTL-1
15
Explain the difference between pilot software?
Evaluate
BTL-5
10
11
16
Generalize the function of OLAP tools in the internet.
Create
BTL-6
17
Describe MOLAP and ROLAP?
Understand
BTL-2
18
How would you explain the results for fusion components?
Evaluate
BTL-5
19
Describe the information needed to support the views of IBI FOCUS
fusion?
List out the key features in business applications using OLAP?
Remember
BTL-1
Remember
BTL-1
20
1
PART B
What is business analysis? List and discuss the basic features that are Remember
provided by reporting and query tools used for business analysis.
BTL-
2
(i).Explain the feature of Cognous impromotu business analysis tool.
Analyze
BTL-4
3
(i).Compare OLTP and OLAP systems.
(ii).Summarize the various OLAP operations in the Multidimensional
Data Model.
(iii).Explain the different types of OLAP servers?
i)How would you describe in detail about reporting query?
ii) Describe in detail about application of reporting query?
(i). Write about the multidimensional data model. Describe, how is it
used in data warehousing.
(ii). Describe in detail about tool categories in query tools.
(i).Examine how a client/server application can be constructed using
Power Builder painters.
(ii).Illustrate the three functional components offers to support the
life cycle of Forte.
(i). Describe the need for OLAP.
(ii).Discuss in detail about the OLAP guidelines.
(i). Diagrammatically illustrate and describe the architecture of
MOLAP and ROLAP.
(ii). Identify the Major difference between MOLAP and ROLAP.
Describe in detail about
(i).Managed query Environment(MQE)
(ii). State of the Market.
i). Discuss in detail about the OLAP tools.
ii) Discuss in detail about application in the internet?
Explain in detail about the need for the architectural styles,trends and
major players of OLAP.
Examine the relevant examples discuss multidimensional online
analytical processing and multi relational online analytical processing.
Evaluate
BTL-5
Understand
BTL-2
Remember
BTL-1
Apply
BTL-3
Understand
BTL-2
Remember
BTL-1
Remember
BTL-1
Understand
BTL-2
Analyze
BTL-4
Apply
BTL-3
Create
BTL-6
4
5
6
7
8
9
10
11.
12
13
Generalize the topic on
(i). Pilot software
(ii). Arbor Essbase web.
14
Explain in detail about the FOCUS fusion components.
Analyze
BTL-4
UNIT 3- DATA MINING
Introduction – Data – Types of Data – Data Mining Functionalities – Interestingness of
Patterns –Classification of Data Mining Systems – Data Mining Task Primitives –
Integration of a Data Mining System with a Data Warehouse – Issues –Data Preprocessing.
1
PART A
Define Data mining? List out the steps in data mining?
Remember
BTL-1
Remember
BTL-1
3
List the steps involved in the process of KDD. How does it relate to
data mining.
List the ways in which interesting patterns should be mined?
Remember
BTL-1
4
Compare drill down with roll up approach.
Analyze
BTL-4
5
Describe what are the other kinds of data?
Understand
BTL-2
6
How would you illustrate key distribution center?
Apply
BTL-3
7
Analyze data characterization related to data discrimination.
Analyze
BTL-4
8
Define association and correlations?
Remember
BTL-1
9
List the primitives that specify a data mining task.
Remember
BTL-1
10
Evaluate the major tasks of data preprocessing.
Evaluate
BTL-5
11
Understand
BTL-2
12
Are all patterns generated are interesting and useful? Give reasons to
justify.
Classify different types of reductions.
Apply
BTL-3
13
Distinguish between data cleaning and noisy data.
Understand
BTL-2
14
Explain the principle elements of missing values in data cleaning?
Analyze
BTL-4
15
Discuss the roles of noisy data in data preprocessing.
Analyze
BTL-2
16
Can you develop the strategies in data reduction?
Create
BTL-6
17
Show how the attribute selection set is important in data reduction.
Apply
BTL-3
18
BTL-5
19
Consider
the
following
set
of
data
X
= Evaluate
{15,27,62,35,39,50,44,44,22,98} Do preprocessing using smoothing
by bin means and bin boundary to smooth the data, using a bin of
depth 3.Evaluate it.
Formulate the ideas in correlation analysis.
Create
20
Define an efficient procedure for cleaning the noisy data?
BTL-1
2
Remember
BTL-6
1
2
3
4
PART B
i) Demonstrate in detail about data mining steps in the process of Apply
knowledge discovery?
ii) List the application area of data mining?
Evaluate
Explain in detail about the Master Data
Management.
Create
(i).What is interestingness of a pattern? What approach would you
designed to mine interestingness of patterns?
(ii). Design and discuss in detail about integration of data mining
system with a data warehouse?
(i).Discuss whether or not each of the following activities is a data Understand
mining task.
1. Credit card fraud detection using transaction records.
2. Dividing the customers of a company according to their gender.
3. Computing the total sales of a company
4. Predicting the future stock price of a company using historical
records.
5. Monitoring seismic waves for earthquake activities.
BTL-3
BTL-5
BTL-6
BTL-2
(ii). Discuss on descriptive and predictive data mining tasks with
illustrations.
5
6
7
8
9
10
11
(i).State and Explain the various classification of data mining Analyze
systems with example?
(ii). Explain the various data mining functionalities in detail.
Suppose that the data for analysis include the attributed age.The age Apply
values
for
the
data
tuples
are
13,15,16,19,20,20,21,22,22,25,25,25,25,30,33,33,35,35,35,35,
36,40,45,46,52,70.
(i).use smoothing by bin means to smooth the above data using a bin
depth of 3. Illustrate your steps.
(ii). Classify the various methods for data smoothing.
(i). Distinguish between data generalization and characterizations.
(ii). Sketch the various phases of data mining and explain the
different steps involved in preprocessing with their significance
before mining, Give an example for each process.
i)Describe in detail about the issues of data mining
ii) Describe in detail about the applications of data mining?
i)Describe in detail about data integrity preprocessing
ii) Describe in detail about reduction in data preprocessing?
i)Describe in detail about various data transformation techniques
ii)Describe in detail about and discretization techniques?
List and explain the primitives for specifying a data mining task.
BTL-4
BTL-3
Understand BTL-2
Remember BTL-1
Remember BTL-1
Remember
BTL-1
Remember
BTL-1
12
13
14.
(i). How data mining system can be classified? discuss it.
Understand
(ii).Give the architecture of a typical data mining system.
Explain the following schemes used for integration of a data mining Analyze
system with a database or data warehouse system.
(i). No coupling
(ii). Loose coupling
(iii). Semi tight coupling
(iv). Tight coupling
Using Equi-depth binning method, partition the data given below into Analyze
4 bins and perform smoothing according to the following methods.
1. Smoothing by bin means
2. Smoothing by bin median
3. Smoothing by bin boundaries
24,25,26,27,28,56,67,70,70,75,78,89,89,90,91,94,95,96,100,102,103,
107,109,112.
(ii).What motivated data mining? Why is it important?
(iii).Explain the evaluation of database technology.
BTL-2
BTL-4
BTL-4
UNIT-4- ASSOCIATION RULE MINING AND CLASSIFICATION
Mining Frequent Patterns, Associations and Correlations – Mining Methods – Mining
various Kinds of Association Rules – Correlation Analysis – Constraint Based Association
Mining – Classification and Prediction - Basic Concepts - Decision Tree Induction Bayesian Classification – Rule Based Classification – Classification by Back propagation –
Support Vector Machines – Associative Classification – Lazy Learners – Other
Classification Methods – Prediction.
1
PART A
Define correlation and market basket analysis.
Remember
BTL-1
2
Formulate the principle frequent itemset and closed itemset.
Create
BTL-6
3
How would you explain the principle of Apriori algorithm?
How can the efficiency of an Apriori algorithm be improved?
Define Data pruning? State the need for pruning phase in
decision tree construction.
Compare the advantages of FP growth algorithm over apriori
algorithm.
Explain how will you generate association rules from
frequent itemsets?
What is naïve Bayesian classification? How is it differing
from Bayesian classification?
Discuss association rule mining .list the two interesting
measures of an association rule.
Describe the uses of multilevel mining with reduced
support?
Discuss the features of Decision tree induction.
Evaluate
BTL-5
Remember
BTL-1
Analyze
BTL-4
Analyze
BTL-4
Analyze
BTL-4
Understand
BTL-2
Understand
BTL-2
Understand
BTL-2
4
5
6
7
8
9
10
11
How would you evaluate attribute selection measures?
Evaluate
BTL-5
12
List the two interesting measures of an association rule.
Remember
BTL-1
13
Define decision tree induction.
Remember
BTL-1
14
Illustrate would result if a tree is not pruned.
Apply
BTL-3
15
Apply
BTL-3
16
How would you show your understanding about pessimistic
pruning?
Discuss decision tree induction?
Understand
BTL-2
17
What inference can you formulate with Bayes theorem?
Create
BTL-6
18
Demonstrate the Bayes classification methods.
Apply
BTL-3
19
Define Lazy leaners?
Remember
BTL-1
20
Define state vector machine?
Remember
BTL-1
Analyze
BTL-4
Understand
BTL-2
Remember BTL-1
Apply BTL-3
Evaluate
BTL-5
1
2
3
4
5
6
PART B
(i).Compare Classification and Prediction.
(ii). Explain the issues regarding classification and prediction.
(iii).Write and explain the algorithm for mining frequent item
sets without candidate generation.
i)How would you summarize in detail about mining methods?
ii)Summarize in detail about various kinds of association rules
Describe in detail about what you learned to develop constraint
and correlation based association mining?
(i).Develop an algorithm for classification using decision trees.
Illustrate the algorithm with a relevant example.
(ii).What approach would you use to apply decision tree
induction?
(i).What is Classification? What are the features of Bayesian
classification explain in detail with an example.
(ii). Explain how the Bayesian Belief Networks are trained to
perform classification.
(i). Giving concrete example , explain a method that performs Understand
frequent item set mining by using the prior knowledge of
frequent item set properties.
BTL-2
7
(ii). Discuss in detail the constraint based association mining.
(i).Examine in detail about Lazy learners with examples?
Remember
BTL-1
8
(ii). Describe with an example the process of mining multilevel
association rules from transactional databases.
(i).Describe in detail about frequent pattern classification?
Understand
BTL-2
(ii). Write an algorithm for FP-Tree Construction and discuss
9
10
11.
how frequent itemsets are generated from FP-Tree.
Consider a home finance loan to predict the housing loan Apply
payment. Design a general hierarchical a structure and analyze
the factors using rule discovery techniques to accurately predict
the number of loan payments in a given quarter/year. Loan is
availed for a period of 20 to 25 years, but an average life span
of the loan exists for only 7 to 10 years due to payment.
Make necessary assumptions: Maintenance record of the
customer details and details of the prevailing interest rates,
borrower characteristics, account dare, fine tune loan
prepayment such as interest rates and fees in order to maximize
the profits of the company. Elaborately discuss the association
rule mining issues. Also Examine on the multi level
association rules and find if you could relate any relation on
from the above application.
i)Generalize the Bayes theorem of posterior probability and Create
explain the working of a Bayesian classifier with an example.
ii)Formulate rule based classification techniques.
Explain and Apply the Apriori algorithm for discovering Analyze
frequent item sets of the table.
Trans ID
Items Purchased
101
Milk,bread,eggs
102
Milk,juice
103
Juice,butter
104
Milk,bread,eggs
105
Coffee,eggs
106
Coffee
107
Coffee,Juice
108
109
110
12.
13.
14.
BTL-3
BTL-6
BTL-4
Milk,bread,cookies,eggs
Cookies,butter
Milk,bread
Use 0.3 for the minimum support value. Illustrate each step of
the Apriori Algorithm.
(i).Define classification? With an example explain how support Remember
vector machines can be used for classification.
(ii). What are the prediction techniques supported by a data
mining systems?
(i). Explain the algorithm for constructing a decision tree from Analyze
training samples.
(ii). Write Bayes theorem.
1.Describe in detail about the following Classification methods Remember
(i). Bayesian classification
(ii). Fuzzy set approach
(iii). Genetic algorithms.
2. Describe in detail Classification by Back propagation.
BTL-1
BTL-4
BTL-1
UNIT 5- CLUSTERING AND TRENDS IN DATA MINING
Cluster Analysis - Types of Data – Categorization of Major Clustering Methods – Kmeans–Partitioning Methods – Hierarchical Methods - Density-Based Methods –Grid
Based Methods –Model-Based Clustering Methods – Clustering High Dimensional Data Constraint – Based ClusterAnalysis – Outlier Analysis – Data Mining Applications.
PART A
1
BTL-1
2
Identify what changes would you make to solve the problem in Remember
cluster analysis.
Define K-means partitioning?
Remember
3
What is a STING? List the approaches in grid based methods?
Remember
BTL-1
4
Explain why a cluster has to be evaluated?
Analyze
BTL-4
5
Illustrate the intrinsic methods in cluster analysis.
Apply
BTL-3
6
How do you explain the extrinsic and intrinsic methods?
Evaluate
BTL-5
7
Define biclustering?
Remember
BTL-1
8
Apply BTL-3
Apply BTL-3
Create BTL-6
11
How would you classify the difference between bicluster and
spectral clustering?
What the services are provided by clustering graph and network
data. Classify it.
Formulate
the role of application and challenges in
clustering.
Organize the design goals of graph clustering methods.
Create BTL-6
12
Classify the hierarchical clustering methods.
Analyze
BTL-4
13
Distinguish between hard constraints and soft constraints.
Understand
BTL-2
14
Define outlier. Examine the outlier analysis.
Remember
BTL-1
15
Discuss the challenges of outlier detection.
Understand BTL-2
16
Distinguish between Classification and clustering.
Understand BTL-2
17
Evaluate what information is used by outlier detection method.
Evaluate
BTL-5
18
Give the categorization of major clustering methods.
Understand
BTL-2
19
Define semi supervised method?
Remember
BTL-1
20
Explain the typical phases of outlier detection methods?
Analyze
BTL-4
9
10
PART B
BTL-1
1
2
3
4
5
6
7
8
9
10
11
12.
13.
14
(i).Based on what you know how will you explain cluster
analysis
(ii).Analze the Requirements of clustering in Data Mining.
(i).Describe in detail about categorization of major clustering
methods?
(ii).List oput the General applications of Clustering.
What is clustering? Describe in detail about the features of Kmeans partioning method?
i) Explain in detail about hierarchical based method?
ii)Explain in detail about density based methods?
i)What is grid based clustering? With an example explain an
algorithm for grid based clustering.
ii) Examine in detail about proximity based approach?
(i).Demonstrate in detail about model based clustering
methods.
(ii).Illustrate the topic on
1. CLIQUE
2. DBSCAN
(i).Demonstrate on clustering high dimensional data?
(ii).Consider five points { X 1, X 2, X 3, X 4, X 5 } with the
following coordinates as a two dimensional sample for
clustering:
X 1 = (0,2.5); X 2 = (0,0); X 3 = (1.5,0); X 4 = (5,0); X 5 = (5,2)
Illustrate the K-means partitioning algorithm using the above
data set.
i)How would you discuss the outlier analysis .
ii)Discuss in detail about the various detection techniques in
outlier.
(i).Will you explain in detail about data mining applications.
(ii). With relevant examples summarize in detail about
constraint based cluster analysis.
Design statistical approaches in outlier detection with neat
design and with examples.
(i). Discuss the various clustering method in Datamining.
(ii).Describe the working of PAM (Partioning Around
Medoids) algorithm with suitable example.
(i). Disucss in detail about the different types of data in cluster
analysis.
(ii). Discuss the following clustering algorithm using examples.
1. K.means
2. K-medoid.
(i). Write the Comparison between CLARA and CLARANS.
(ii).Discuss the use of Outlier analysis.
What is outlier mining important? Briefly describe the
different approaches behind statistical –based outlier detection,
distance based outlier detection and deviation based outlier
detection.
Analyze
BTL-4
Remember BTL-1
Remember BTL-1
Analyze
BTL-4
Remember
BTL-1
Apply
BTL-3
Apply BTL-3
Understand BTL-2
Evaluate
BTL-5
Create
BTL-6
Understand
BTL-2
Understand
BTL-2
Analyze
BTL-4
Remember
BTL-1