GOVERNMENT ENGINEERING COLLEGE-MODASA
MODASA
Question Bank
Faculty: Prof. C. H. Makwana
Subject Name: Data Warehousing and Data Mining
Subject Code: 171601
Semester: 7th
Department: Information Technology
Term: ODD 2015
Chapter 1: Introduction to Data Warehousing
1. Define the following terms:
1. Data Mart
2. Enterprise Warehouse
3. Virtual Warehouse
(Summer 2014)
2. Explain the Data Cube with all OLAP operations in brief. (Winter 2013)
• Explain the different OLAP operations with examples. (Summer 2015)
• What is a Cuboid? Explain the various OLAP operations on a data cube with a suitable example. (Winter 2014)
3. Differentiate OLAP vs. OLTP. (Winter 2014, Nov/Dec 2011)
• Differentiate between an Operational Database System and a Data Warehouse. (Summer 2014)
4. Explain the Star, Snowflake, and Fact Constellation schemas for a Multidimensional Database. (Winter 2013)
• Explain the Star and Fact Galaxy schemas used in a data warehouse for a multidimensional database. (Winter 2014)
• Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.
1) Draw a star schema diagram for the data warehouse.
2) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004? (Summer 2015)
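For part 2) of the doctor/patient question above, one possible sequence of operations is: roll up time from day to year, slice on year = 2004, then roll up patient to "all". The sketch below illustrates that sequence in plain Python. The fact rows, doctor names, and function names are hypothetical and exist only for illustration; a real warehouse would run these operations through an OLAP engine or SQL.

```python
from collections import defaultdict

# Hypothetical fact rows at the base cuboid [day, doctor, patient] with the measure charge.
visits = [
    ("2003-12-30", "Dr. A", "P1", 40.0),
    ("2004-01-15", "Dr. A", "P1", 50.0),
    ("2004-03-02", "Dr. A", "P2", 70.0),
    ("2004-03-02", "Dr. B", "P3", 30.0),
    ("2005-06-11", "Dr. B", "P1", 90.0),
]

def roll_up_time_to_year(rows):
    """Roll up the time dimension from day to year, summing charge."""
    cube = defaultdict(float)
    for day, doctor, patient, charge in rows:
        cube[(day[:4], doctor, patient)] += charge
    return dict(cube)

def slice_year(cube, year):
    """Slice: keep only the cells for one year and drop the time dimension."""
    return {(d, p): c for (y, d, p), c in cube.items() if y == year}

def roll_up_patient_to_all(cube):
    """Roll up the patient dimension to 'all', leaving one total per doctor."""
    totals = defaultdict(float)
    for (doctor, _patient), charge in cube.items():
        totals[doctor] += charge
    return dict(totals)

by_year = roll_up_time_to_year(visits)
year_2004 = slice_year(by_year, "2004")
fee_per_doctor_2004 = roll_up_patient_to_all(year_2004)
print(fee_per_doctor_2004)
```

The order of the roll-up and the slice could be swapped without changing the totals; the version above simply follows the dimension hierarchy upward before selecting.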
5. Differentiate Fact table vs. Dimension table. (Winter 2014)
6. Explain the three-tier architecture of a Data Warehouse. (Winter 2013, Nov/Dec 2011)
7. Discuss the applications of data warehousing and data mining in the government sector. (Winter 2014)
Chapter 2: Introduction to Data Mining
1. Define Data Mining. Explain the various application areas of Data Mining techniques and the Knowledge Discovery Process in brief. (Winter 2013)
• Define the term “Data Mining”. With the help of a suitable diagram, explain the process of knowledge discovery from databases.
2. Define KDD. How are data mining techniques applied over multimedia, temporal, and spatial databases to extract useful knowledge? (Winter 2014)
• Explain the KDD process in detail. (Nov/Dec 2011)
3. What is Data Mining? List the challenges to data mining regarding data mining methodology and user-interaction issues. (Summer 2014)
4. Write a short note on Spatial, Legacy, and Multimedia Databases. (Summer 2014)
5. List and describe the major issues in data mining. (Nov/Dec 2011)
6. Explain the metadata repository. (Nov/Dec 2011)
Chapter 3: Data Preprocessing and Data Mining Primitives
1. What is data cleaning? Discuss various ways of handling missing values during data cleaning.
• List and describe methods for handling missing values in data cleaning. (Summer 2014, Nov/Dec 2011)
2. What is noise? Explain the different techniques to remove noise from data. (Summer 2015, Summer 2014)
• What is noise? Describe the possible reasons for noisy data. Explain the different techniques to remove noise from data. (Nov/Dec 2011)
• What is noise? Explain data smoothing methods as a noise removal technique: partition the given data into equal-frequency bins of size 3, then smooth by bin means, by bin medians, and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22 (Winter 2014)
• Explain the following:
1. Binning method
2. How to handle missing values in a data set. (Winter 2013)
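The binning question in item 2 above (data 10, 2, 19, 18, 20, 18, 25, 28, 22 with bins of size 3) can be checked with a short sketch. This is a minimal illustration, not a prescribed solution. The function names are made up for this example. In this particular data every middle value is equidistant from both bin boundaries, so the tie-breaking rule (ties go to the lower boundary) is an assumption of the sketch.

```python
from statistics import median

data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
BIN_SIZE = 3

def equal_frequency_bins(values, size):
    """Sort the values and cut them into consecutive bins of equal size."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer bin boundary (min or max).
    Assumption: a value equidistant from both boundaries goes to the lower one."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = equal_frequency_bins(data, BIN_SIZE)
by_means = smooth_by_means(bins)
by_medians = smooth_by_medians(bins)
by_boundaries = smooth_by_boundaries(bins)
print(bins, by_means, by_medians, by_boundaries, sep="\n")
```

For this data the sorted bins are [2, 10, 18], [18, 19, 20], and [22, 25, 28], and the bin means and bin medians coincide at 10, 19, and 25.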
3. What are the major challenges of mining a huge amount of data in comparison with mining a small amount of data? (Summer 2015)
4. Define sampling. Explain the different types of sampling techniques with examples. (Summer 2015)
5. Discuss why analytical data characterization is needed and how it can be performed. (Summer 2014)
6. How is the dissimilarity between objects computed when they are described by the following types of variables:
1) Interval-scaled variables
2) Asymmetric binary variables
3) Categorical variables
(Summer 2015)
7. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
i) Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
(Winter 2014)
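The two normalizations asked for in Question 7 follow the usual formulas: min-max maps v to (v - min) / (max - min) scaled onto the new range, and z-score maps v to (v - mean) / standard deviation. The sketch below applies both to the value 45. The function and variable names are illustrative, and the standard deviation 20.64 is taken from the question as given (it matches the sample standard deviation of the listed ages).

```python
from statistics import mean, stdev

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]
VALUE = 45
GIVEN_STD = 20.64  # standard deviation stated in the question

def min_max(v, values, new_min=0.0, new_max=1.0):
    """Min-max normalization of v onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min

def z_score(v, mu, sigma):
    """Z-score normalization of v given a mean and a standard deviation."""
    return (v - mu) / sigma

minmax_45 = min_max(VALUE, ages)                # (45 - 13) / (72 - 13) = 32 / 59
zscore_45 = z_score(VALUE, mean(ages), GIVEN_STD)

print(f"min-max: {minmax_45:.3f}")
print(f"z-score: {zscore_45:.3f}")
print(f"sample std of ages: {stdev(ages):.2f}")
```

With these inputs the min-max result is 32/59, about 0.542, and the z-score is about 0.478.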
8. Explain Data Discretization and Concept Hierarchy Generation in brief. (Winter 2013)
• What is a concept hierarchy? List and explain the types of concept hierarchy in detail. (Winter 2014, Summer 2014)
9. Explain data transformation in data mining. (Nov/Dec 2011)
Chapter 4: Concept Description and Association Rule Mining
1. Explain attribute removal and attribute generalization with an example. (Summer 2014)
2. What are measures? List and explain the types of measures. (Summer 2014)
3. Explain Market Basket Analysis with its use and Association Rules in brief. (Winter 2013)
4. Write an algorithm for finding frequent item-sets using candidate generation.
• Describe the list of techniques for improving the efficiency of Apriori-based mining.
• State the Apriori Property. Generate large itemsets and association rules using the Apriori algorithm on the following data set, with minimum support and minimum confidence set to 50% and 75% respectively.
TID     Items Purchased
T101    Cheese, Milk, Cookies
T102    Butter, Milk, Bread
T103    Cheese, Butter, Milk, Bread
T104    Butter, Bread
(Winter 2014)
• Explain the Apriori Algorithm for finding Frequent Item-sets. (Winter 2013)
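The Apriori exercise in Question 4 (transactions T101 to T104, minimum support 50%, minimum confidence 75%) can be traced with the sketch below. It is a minimal level-wise implementation written for this small data set, assuming the usual join-and-prune candidate generation. The function names are illustrative and not taken from any library.

```python
from itertools import combinations

transactions = {
    "T101": {"Cheese", "Milk", "Cookies"},
    "T102": {"Butter", "Milk", "Bread"},
    "T103": {"Cheese", "Butter", "Milk", "Bread"},
    "T104": {"Butter", "Bread"},
}
MIN_SUPPORT = 0.5      # fraction of transactions
MIN_CONFIDENCE = 0.75

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    min_count = MIN_SUPPORT * len(transactions)
    all_items = sorted({item for t in transactions.values() for item in t})
    level = [frozenset([i]) for i in all_items
             if support_count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        previous = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = [c for c in candidates if support_count(c) >= min_count]
    return frequent

def strong_rules_from(frequent):
    """Generate rules lhs -> rhs whose confidence meets MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

frequent = apriori()
strong_rules = strong_rules_from(frequent)
for lhs, rhs, confidence in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={confidence:.2f}")
```

On this data the only frequent 3-itemset is {Bread, Butter, Milk} with support count 2, and five rules reach the 75% confidence threshold, for example Cheese -> Milk and Butter -> Bread.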
5. List two shortcomings of the algorithms which helped in improving the efficiency of the Apriori algorithm. Discuss any TWO variations of the Apriori algorithm that improve its efficiency. (Summer 2014)
6. State how the partitioning method may improve the efficiency of association mining. (Summer 2014)
7. Why is a strong association rule not always interesting? Explain with an example. (Summer 2015)
8. How can multilevel association rules be mined efficiently using a concept hierarchy? (Summer 2015)
9. Explain Outlier Analysis Techniques. (Winter 2013)
Chapter 5: Classification and Clustering
1. Explain the following terms:
1. Information Gain
2. Mean, Median, Mode. (Winter 2013)
• Short note: Information gain, Gain ratio, Gini index.
2. Explain rule-based classification and case-based reasoning in detail. (Nov/Dec 2011)
3. What is decision tree induction? Write the basic algorithm for inducing a decision tree from training tuples. (Summer 2015)
• Briefly outline the major steps of decision tree classification. (Summer 2014)
• Explain Classification with the Decision Tree Induction method. (Winter 2013)
4. List the strengths and weaknesses of a neural network as a classifier. (Summer 2015)
5. Explain how the accuracy of a classifier can be measured. How does the Bagging strategy help in improving classifier accuracy? (Winter 2014)
6. What is supervised learning? Using the given table, show how the ROOT splitting attribute is selected using the InfoGain measure in the overall process of decision tree induction.
No   Outlook    Temperature  Humidity  Windy  Class
1    Sunny      Hot          High      False  N
2    Sunny      Hot          High      True   N
3    Overcast   Hot          High      False  P
4    Rain       Mild         High      False  P
5    Rain       Cool         Normal    False  P
6    Rain       Cool         Normal    True   N
7    Overcast   Cool         Normal    True   P
8    Sunny      Mild         High      False  N
9    Sunny      Cool         Normal    False  P
10   Rain       Mild         Normal    False  P
11   Sunny      Mild         Normal    True   P
12   Overcast   Mild         High      True   P
13   Overcast   Hot          Normal    False  P
14   Rain       Mild         High      True   N
(Winter 2014)
7. What are neural networks? Describe the various factors which make them useful for classification and prediction in data mining. Explain how the topology of a neural network is designed. (Winter 2014)
8. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve Bayesian classification. (Summer 2014)
• Explain Bayes’ Theorem and Naïve Bayesian Classification. (Winter 2013)
10. Write the typical requirements of clustering in data mining.
11. What is Cluster Analysis? List and explain the requirements of clustering in data mining. (Summer 2014)
12. How can distance be computed for attributes that have missing values in a K-Nearest Neighbor classifier? (Summer 2015)
• Explain the k-means and k-medoids algorithms for clustering. (Winter 2013, Nov/Dec 2011)
• How does the K-Means clustering method differ from the K-Medoids clustering method? Discuss the process of K-Means clustering. Also outline the major drawbacks of the K-Means clustering technique. (Winter 2014)
• Write the steps of the k-medoids clustering algorithm with its limitations. (Summer 2014)
13. Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9)
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show
1) The three cluster centers after the first round of execution
2) The final three clusters
(Summer 2015)
14. Explain Linear Regression and Non-linear Regression techniques of prediction. (Winter 2014)
• Explain linear regression. What are the reasons for not using the linear regression model to estimate the output data? (Summer 2015)
• Explain Linear Regression with an example. (Winter 2013)
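For Question 6 of this chapter (the 14-row table with Outlook, Temperature, Humidity, Windy, and Class), root selection by information gain can be sketched as follows. The code computes the entropy of the class labels, subtracts the weighted entropy after splitting on each attribute, and picks the attribute with the largest gain. The rows are copied from the question's table; the function names are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Rows 1..14 from the table: (Outlook, Temperature, Humidity, Windy, Class)
ROWS = [
    ("Sunny", "Hot", "High", "False", "N"),
    ("Sunny", "Hot", "High", "True", "N"),
    ("Overcast", "Hot", "High", "False", "P"),
    ("Rain", "Mild", "High", "False", "P"),
    ("Rain", "Cool", "Normal", "False", "P"),
    ("Rain", "Cool", "Normal", "True", "N"),
    ("Overcast", "Cool", "Normal", "True", "P"),
    ("Sunny", "Mild", "High", "False", "N"),
    ("Sunny", "Cool", "Normal", "False", "P"),
    ("Rain", "Mild", "Normal", "False", "P"),
    ("Sunny", "Mild", "Normal", "True", "P"),
    ("Overcast", "Mild", "High", "True", "P"),
    ("Overcast", "Hot", "Normal", "False", "P"),
    ("Rain", "Mild", "High", "True", "N"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, index):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[index] for r in rows}:
        subset = [r[-1] for r in rows if r[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

gains = {name: info_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Root attribute:", root)
```

With 9 tuples of class P and 5 of class N, the class entropy is about 0.940 bits, and Outlook gives the largest gain (about 0.247), so it would be chosen as the root.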
15. Differentiate Classification and Clustering. (Winter 2013)
16. Short Note: Distributive and Holistic measures.
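For Question 13 of this chapter (eight points, three clusters, initial centers A1, B1, and C1), the k-means rounds can be traced with the sketch below. It alternates assignment to the nearest center and recomputation of each center as the mean of its members until the assignment stops changing. The variable names are illustrative, and the sketch assumes no cluster ever becomes empty, which holds for this data.

```python
from math import dist

POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
INITIAL_CENTERS = ["A1", "B1", "C1"]

def assign(centers):
    """Assign every point to its nearest center by Euclidean distance."""
    clusters = [[] for _ in centers]
    for name, point in POINTS.items():
        nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
        clusters[nearest].append(name)
    return clusters

def recompute(clusters):
    """New center of each cluster is the mean of its member points.
    Assumes every cluster has at least one member."""
    return [
        (sum(POINTS[n][0] for n in c) / len(c), sum(POINTS[n][1] for n in c) / len(c))
        for c in clusters
    ]

centers = [POINTS[n] for n in INITIAL_CENTERS]
center_history = []   # centers after each round
clusters = None
while True:
    new_clusters = assign(centers)
    if new_clusters == clusters:   # assignment unchanged: converged
        break
    clusters = new_clusters
    centers = recompute(clusters)
    center_history.append(centers)

first_round_centers = center_history[0]
print("Centers after round 1:", first_round_centers)
print("Final clusters:", clusters)
```

After the first round the centers are (2, 10), (6, 6), and (1.5, 3.5); the procedure settles on the clusters {A1, B1, C2}, {A3, B2, B3}, and {A2, C1}.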
Chapter 6: Advance Topics of Data Mining and its Applications
1. Explain the different types of web mining with suitable examples. (Summer 2014)
• What is a web log? Explain web structure mining and web usage mining in detail. (Winter 2014)
• Explain the use of Data Mining techniques in Web/Internet Technology.
• What are the challenges for effective resource and knowledge discovery in mining the World Wide Web?
2. Explain the information retrieval methods used in text mining. (Winter 2014, Summer 2014)
3. Explain mining in the following databases with examples.
1. Temporal Databases
2. Sequence Databases
3. Spatial Databases and Spatiotemporal Databases. (Winter 2013)
4. Explain the methodologies for stream data processing and stream data systems. (Nov/Dec 2011)