NEHRU ARTS AND SCIENCE COLLEGE
T.M PALAYAM, COIMBATORE
PG & RESEARCH DEPARTMENT OF COMPUTER SCIENCE
QUESTION BANK
CLASS: I MSc CS
SUBJECT NAME: DATA MINING AND WAREHOUSING
UNIT-1
SECTION-A
ONE MARKS:
1. The Data accessed is usually a different version from that of the original
operational database.
a) Query
b) Data
c) Output
d) Model
2. The Output of the data mining query probably is not a subset of the database.
a) Query
b) Data
c) Output
d) Model
3. A Predictive Model makes a prediction about values of data using known
results found from different data.
a) Predictive model
b) Descriptive model c) Both a & b
d) None
4. A Descriptive model identifies patterns or relationships in data.
a) Predictive model
b) Descriptive model c) Both a & b
d) None
5. Classification maps data into predefined groups or classes.
a) Classification
b) Regression
c) Prediction
d) Time series
6. Regression is used to map a data item to a real-valued prediction variable.
a) Classification
b) Regression
c) Prediction
d) Time series
7. Clustering is similar to classification except that the groups are not predefined.
a) Regression
b) Clustering
c) Association
d) Summarization
8. Summarization maps data into subsets with associated simple descriptions.
a) Query
b) Model
c) Summarization
d) Association
9. Link analysis is alternatively referred to as Affinity analysis.
a) Clustering
b) Prediction
c) Link analysis d) None
10. Sequential analysis is also known as Sequence discovery.
a) Selection
b) Sequential analysis c) Preprocessing d) Data mining
11. Both a & b are used to determine sequential patterns in data.
a) Sequential analysis
b) Sequence discovery c) Both a & b d) None
12. KDD stands for Knowledge Discovery in Databases.
a) Knowledge Discovery in Databases
b) Knowledge Detection in Databases
c) Knowledge Discovery in Data mining
d) Knowledge Domain in Databases
13. KDD is the process of finding useful information and patterns in data.
a) CAD
b) DTD
c) KDD
d) CD
14. The KDD process consists of 5 steps.
a) 3
b) 4
c) 5
d) 6
15. The step in which the data needed for the data mining process is obtained from many
different & heterogeneous data sources is Selection.
a) Transformation
b) Data mining
c) Selection
d) Evaluation
16. The step in which the data from different sources is converted into a common format for
processing is Transformation.
a) Transformation
b) Data mining
c) Selection
d) Evaluation
17. The step that deals with incorrect or missing data in the data to be used by the process is Pre-processing.
a) Transformation
b) Data mining
c) Selection
d) Pre-processing
18. Visualization refers to the visual presentation of data.
a) Graphical
b) Icon based c) Visualization
d) Pixel based
19. Geometric techniques include the box plot and scatter diagram techniques.
a) Graphical
b) Icon based c) Visualization
d) Geometric
20. Attributes in the database that are not of interest to the data mining task
being developed are called Irrelevant data.
a) Missing data
b) Irrelevant data
c) Multimedia data d) None
21. A conventional database schema composed of many different
attributes leads to High dimensionality.
a) High dimensionality b) Low dimensionality c) Medium Dimensionality
d) All of these
22. Outliers are data entries that do not fit nicely into the derived model.
a) Large dataset
b) Outliers
c) Selection
d) Application
23. A large database can be viewed as using Approximation.
a) Large dataset
b) Outliers
c) Selection
d) Approximation
24. In segmentation, a database is partitioned into disjoint groupings of similar
tuples called Segments.
a) Segments
b) Association c) Dimensional
d) Outliers
25. Data mining consists of 3 parts.
a) 3 b) 4 c) 5 d) 6
SECTION-B
5 MARKS:
1. Write a short note on Data mining Vs Knowledge discovery in databases.
2. Write a short note on Development of Data mining.
3. Write a short note on Summarization.
4. Write a short note on Sequence Discovery.
5. Write a short note on Social implications of data mining.
SECTION-C
8 MARKS:
1. Explain in detail about Data mining from a database perspective.
2. Explain in detail about,
i) Classification
ii) Regression
iii) Time series analysis
3. Explain in detail about,
i) Prediction
ii) Clustering
iii) Association Rules
4. Explain in detail about Data mining Issues.
5. Explain in detail about Data mining Metrics.
UNIT-2
SECTION-A
ONE MARKS:
1. Parametric models describe the relationship between input & output through the
use of algebraic equations.
a) Parametric model b) Non-parametric model c) Both a & b d) None
2. The squared error is often examined for a specific prediction to measure
accuracy rather than to look at the average difference.
a) RMS b) Squared error c) Unbiased
d) Biased
3. RMS stands for Root Mean Square.
a) Root Mean Square
b) Root Median Square c) Range Mean Square d) Range Median Square
4. The RMS may also be used to estimate error or as another statistic to describe a
distribution.
a) RMS b) Squared error c) Unbiased
d) Biased
5. Point estimation refers to the process of estimating a population parameter.
a) Parametric model
b) Non-parametric model c) Both a & b d) Point estimation
6. MLE stands for Maximum Likelihood Estimate.
a) Maximum Likelihood Estimate b) Maximum Likelihood Effort
c) Maximum Likelihood Error d) Maximum Likelihood Extent
7. Expectation Maximization algorithm is an approach that solves the estimation
problem with incomplete data.
a) RMS b) Squared error c) Unbiased
d) Expectation Maximization
8. Frequency Distribution provides an even better model of data.
a) Histogram b) Frequency distribution c) Both a & b d) None
9. Hypothesis testing attempts to find a model that explains the observed data by
first creating a hypothesis.
a) Alternative hypothesis b) Hypothesis testing c) Both a & b d) None
10. Correlation can be used to evaluate the strength of a relationship between two
variables.
a) Linear b) Correlation c) Hypothesis
d) RMS
11. Linear regression assumes that a linear relationship exists between the input
data and the output data.
a) Linear regression b) Correlation c) Hypothesis
d) RMS
12. A Decision tree is a predictive modeling technique used in classification tasks.
a) Decision tree
b) Correlation
c) Input database
d) Binary search
13. A Decision tree is a tree where the root and each internal node is labeled with
a question.
a) Input tree
b) Output tree
c) Decision tree
d) All of these
14. Decision tree consists of 3 parts.
a) 2
b) 3 c) 4
d) 5
15. Neural networks are also known as Artificial Neural Networks.
a) Artificial Neural Networks
b) Artificial Neural data
c) Artificial Network data
d) Artificial Neural interface
16. ANN stands for Artificial Neural Networks.
a) Artificial Neural Networks
b) Artificial Neural data
c) Artificial Network data
d) Artificial Neural interface
17. A neural network consists of 3 parts.
a) 2 b) 3 c) 4 d) 5
18. An activation function may also be known as a Firing rule.
a) Firing rule b) Threshold c) Linear d) All of these
19. An activation function is sometimes called Both a & b.
a) Processing element function b) Squashing function c) Both a & b d) None
20. The linear threshold function is also called Both a & b.
a) Ramp function b) Piecewise linear function c) Both a & b d) None
21. Genetic algorithms are examples of evolutionary computing methods and are
optimization-type algorithms.
a) Gaussian law
b) Genetic algorithm c) Hyperbolic tangent d) None
22. A Genetic algorithm is a computational model consisting of 5 parts.
a) 3 b) 4 c) 5 d) 6
23. The precise algorithm that indicates how to combine the given set of
individuals to produce new ones is the Crossover algorithm.
a) Crossover algorithm
b) Genetic algorithm
c) Hyperbolic tangent
d) None
24. A Linear activation function produces a linear output value based on the input.
a) Linear
b) Threshold
c) Activation
d) Genetic algorithm
25. A neural network consists of 2 parts.
a) 2 b) 3
c) 4
d) 5
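Worked illustration for items 2 to 4 above: a minimal Python sketch of the squared error of a single prediction, the root mean squared error over several predictions, and the RMS of a set of values used as a descriptive statistic. The function names and the sample numbers are illustrative assumptions, not taken from the syllabus text.

```python
import math


def squared_error(predicted, actual):
    """Squared error of one specific prediction."""
    return (predicted - actual) ** 2


def root_mean_squared_error(predicted, actual):
    """Square root of the mean of the squared errors over paired values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum(squared_error(p, a) for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mean_sq)


def rms(values):
    """Root mean square of a list of values (a statistic describing a distribution)."""
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    print(squared_error(3, 1))
    print(root_mean_squared_error([2, 4], [0, 0]))
    print(rms([3, 4]))
```

The squared error looks at one prediction at a time, while the two root-mean-square functions summarize many values in a single number.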
SECTION-B
5 MARKS:
1. Write a short note on Point estimation.
2. Write a short note on Models based on summarization.
3. Write a short note on Bayes Theorem.
4. Write a short note on Hypothesis Testing.
5. Write a short note on Regression & Correlation.
SECTION-C
8 MARKS:
1. Explain in detail about Similarity measures.
2. Explain in detail about Decision trees.
3. Explain in detail about neural networks.
4. Explain in detail about Activation functions.
5. Explain in detail about Genetic algorithms.
UNIT-3
SECTION-A
ONE MARKS:
1. Regression problems deal with estimation of an output value based on input
values.
a) Classification b) Data Mining c) Regression d) Statistical
2. ROC Stands for Both a & b.
a) Relative Operating Characteristic b) Receiver Operating Characteristic
c) Both a & b
d) None
3. KNN Stands for K Nearest Neighbors.
a) K nearest Neighbors b) K Notification Neighbors c) K Notation Neighbors
d) None
4. CART is a technique that generates a binary decision tree.
a) KNN b) CART c) ROC
d) RRC
5. RBF Stands for Both a & b.
a) Radial Function b) Radial Basis Function c) Both a & b d) None
6. RBF is a class of functions whose value decreases with the distance from a
central point.
a) RBF b) KNN c) CART d) ROC
7. A Perceptron is a single neuron with multiple inputs & one output.
a) Perceptron
b) Rule based algorithm c) Generating Rules d) None
8. Multiple Independent approaches can be applied to a classification problem.
a) Multiple Dependent b) Multiple Independent c) Both a & b d) None
9. DCS Stands for Dynamic Classifier Selection.
a) Data Classifier Selection
b) Date Class Selection
c) Dynamic Classifier Selection
d) Dynamic Class Selection.
10. AVC Stands for Attribute Value Class.
a) Attribute Value Class
b) Attribute Virtual Class
c) Attribute Virtual Collections
d) Attribute Value Collections.
11. CART Stands for Classification & Regression Trees.
a) Class & Regression Trees
b) Classification & Regression Trees
c) Class & Rotational Trees
d) Classification & Rotational Trees
12. A Subtree is replaced by a leaf node if this replacement results in an error rate
close to that of the original tree.
a) Selection Tree
b) Sub Tree
c) Regression Tree d) None
13. The ID3 technique for building a decision tree is based on information theory &
attempts to minimize the expected number of comparisons.
a) ID2
b) ID3
c) Both a & b
d) None
14. A tuple is classified based on the region into which it falls.
a) Tuple
b) Decision Tree c) Sub Tree d) Classification
15. The approach in which the data are divided into regions based on class is Division.
a) Division b) Prediction
c) tuple
d) Tree
16. The approach in which formulas are generated to predict the output class value is Prediction.
a) Division b) Prediction
c) tuple
d) Tree
17. Classification accuracy is usually calculated by determining the percentage of
tuples placed in the correct class.
a) Classification
b) Division
c) Trees
d) Prediction
18. Missing Data values cause problems during both the training phase & the
classification process.
a) Decision tree
b) Missing data
c) Classification tree
d) Prediction tree
19. Missing Data in the training data must be handled & may produce an
inaccurate result.
a) Decision tree
b) Missing data
c) Classification tree
d) Prediction tree
20. There are 3 methods used to solve the classification problem.
a) 2 b) 3
c) 4 d) 5
21. The Logistic curve gives a value between 0 & 1 so it can be interpreted as the
probability of class membership.
a) Plain curve
b) Logistic curve
c) Linear curve
d) Non-linear curve
22. Regression can be used to perform classification using 2 approaches.
a) 2 b) 3
c) 4
d) 5
23. The common classification scheme based on the use of distance measures is
KNN.
a) KNN
b) CART
c) SRT
d) ROC
24. Solving the classification problem using decision trees is a 2 step process.
a) 2
b) 3
c) 4
d) 5
25. Pruning removes redundant comparisons or removes subtrees to achieve better
performance.
a) Pruning
b) KNN
c) Training tree
d) Decision tree
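Worked illustration for items 21 and 23 above: a minimal Python sketch of the logistic curve, whose output lies between 0 and 1, and of K nearest neighbors classification based on a distance measure. The coefficient names `c0` and `c1`, the Euclidean distance, the majority vote and the sample data are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def logistic(x, c0=0.0, c1=1.0):
    """Logistic curve; the result is strictly between 0 and 1 for moderate inputs,
    so it can be read as a probability of class membership.
    (Very large negative c0 + c1 * x would overflow math.exp.)"""
    return 1.0 / (1.0 + math.exp(-(c0 + c1 * x)))


def knn_classify(training, point, k=3):
    """Classify `point` by majority vote among its k nearest training tuples.

    `training` is a list of (coordinates, class_label) pairs.
    Distance is Euclidean (math.dist, Python 3.8+).
    """
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training tuples")
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-attribute training tuples with two classes.
    training = [
        ((1.0, 1.0), "short"),
        ((1.2, 0.9), "short"),
        ((5.0, 5.0), "tall"),
        ((5.2, 4.8), "tall"),
        ((4.9, 5.1), "tall"),
    ]
    print(logistic(0.0))
    print(knn_classify(training, (1.1, 1.0), k=3))
```

An odd value of k avoids tied votes when there are only two classes.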
SECTION-B
5 MARKS:
1. Write a short note on Issues in classification.
2. Write a short note on Regression.
3. Write a short note on Bayesian classification.
4. Write a short note on Simple approach.
5. Write a short note on K Nearest Neighbors.
SECTION-C
8 MARKS:
1. Explain in detail about Decision tree based algorithm.
2. Explain in detail about,
i) ID3
ii) C4.5
3. Explain in detail about Neural Network based algorithms.
4. Explain in detail about,
i) CART
ii) Scalable DT techniques
5. Explain in detail about Rule based Algorithm.
UNIT-4
SECTION-A
ONE MARKS:
1. A Data Warehouse is a repository of subjectively selected & adapted
operational data.
a) Data
b) Data Warehouse
c) Data mart
d) None
2. OLAP Stands for Online Analytical Processing.
a) Online Analytical Processing
b) Online Analytical Problem
c) Online Analytical Process
d) Online Analytical Proceeding
3. OLAP is prepared periodically but is directly based on detailed reference.
a) Data
b) Data Warehouse
c) Data mart
d) OLAP
4. The individual departmental components are called Data marts.
a) Data
b) Data Warehouse
c) Data mart
d) Data Morphing
5. Data can be classified into 3 categories.
a) 2
b) 3
c) 4
d) 5
6. Both a & b data originate from operational systems & are normally kept in a
conventional database system.
a) Reference
b) Transaction
c) Both a & b d) None
7. Denormalized data is the basis for OLAP tools.
a) Reference
b) Transaction
c) Both a & b d) Denormalized data
8. Data marts can be classified into 2 groups.
a) 2
b) 3
c) 4
d) 5
9. Data Warehouse is only a collection of data marts.
a) Data
b) Data Warehouse
c) Data mart
d) None
10. The data mart is loaded with data from a data warehouse by means of a Load
Program.
a) Data
b) Data Warehouse
c) Data mart
d) Load Program
11. Metadata describes the details about the data in a data warehouse or data mart.
a) Data
b) Data Warehouse
c) Data mart
d) Meta data
12. A formal data is required to be built for a large data mart which may also have
some processing.
a) Data
b) Data Warehouse
c) Data mart
d) Formal data
13. Reference data stored in addition to basic data in the data mart help & enable
the end users of the data mart.
a) External data
b) Internal data
c) Reference data d) None
14. Monitoring mainly relates data usage to data content Tracking.
a) Tracking
b) Data Warehouse
c) Data mart
d) None
15. OLTP Stands for Online Transaction Processing.
a) Online Transaction Processing
b) Online Transaction Process
c) Online Transaction Problem
d) Online Transaction Proceeding
16. A very popular & early approach for achieving analytical processing is Both a
& b.
a) Star schema
b) Collection model c) Both a & b d) None
17. The star schema provides a multidimensional view.
a) Star schema b) Collection model c) Data model d) None
18. OLAP Tools can be broadly classified into 2 categories.
a) 2
b) 3
c) 4
d) 5
19. MOLAP tools presuppose the data to be present in a multidimensional database.
a) OLAP
b) MOLAP
c) ROLAP d) None
20. MOLAP based products organize, navigate & analyse data typically in an
aggregate form.
a) MOLAP
b) OLAP
c) ROLAP
d) None
21. ROLAP is the latest & fastest growing OLAP segment in the market.
a) MOLAP
b) OLAP
c) ROLAP
d) None
22. MQE Stands for Managed Query Environment.
a) Managed Query Environment
b) Managed Query Enhanced
c) Managed Quality Environment
d) Memory Query Environment
23. OLAP is to enable capability for users to perform limited analysis directly
against RDBMS products.
a) MOLAP
b) OLAP
c) ROLAP
d) None
24. The analytical data used by PowerPlay is stored in multidimensional data sets
called PowerCubes.
a) MOLAP
b) OLAP
c) ROLAP
d) PowerCubes
25. IBI is a multidimensional database technology for OLAP & data warehousing.
a) MOLAP
b) OLAP
c) ROLAP
d) IBI
SECTION-B
5 MARKS:
1. Write a short note on Types of Data Mart.
2. Write a short note on Software Components for a data mart.
3. Write a short note on loading a data mart.
4. Write a short note on Metadata for a data mart.
5. Write a short note on OLAP Tools & Internet.
SECTION-C
8 MARKS:
1. Explain in detail about Characteristics of a Data Warehouse.
2. Explain in detail about other aspects of Data mart.
3. Explain in detail about Security in a Data mart.
4. Explain in detail about OLAP Tools.
5. Explain in detail about Data Modeling.
UNIT-5
SECTION-A
ONE MARKS:
1. A Data Warehouse can be built using Both a & b approaches.
a) Top-down
b) Bottom-up
c) Both a & b
d) None
2. Metadata defines the contents & location of the data in the data warehouse.
a) Metadata
b) Metaphor
c) Data Warehousing d) None
3. OLAP application on a data warehouse are not calling for every stringent
a) MOLAP
b) OLAP
c) ROLAP
d) IBI
4. CASE tools are used to design the database in the data warehouse.
a) MOLAP
b) OLAP
c) ROLAP
d) CASE
5. A fact table is a large central table in a dimensional design that has a multipart
key.
a) Fact table
b) OLAP
c) ROLAP
d) IBI
6. A disk controller supports a certain amount of data throughput.
a) Disk Problem b) Disk Controller
c) Disk Schedule d) None
7. A Data Warehouse can be Internet or intranet enabled, as per choice.
a) Data mart
b) Data Warehouse c) Both a & b d) None
8. A Data Warehouse cannot be purchased & installed.
a) Data mart
b) Data Warehouse c) Both a & b d) None
9. The important means of preparing the government to face the challenges of the
new millennium is Data Mining.
a) Data Mining
b) Data Warehouse c) Both a & b d) None
10. Data Mining can be performed for analysis & knowledge discovery.
a) Data mart
b) Data Warehouse c) Data Mining d) None
11. MIS Stands for Management Information System.
a) Management Information System
b) Management Input System
c) Management Information Software
d) Memory Information System
12. The other sectors can be categorized in to 5 types.
a) 3
b) 4
c) 5
d) 6
13. Economic affairs are the budget & expenditure data & annual economic
survey.
a) Economic affairs b) Tourism c) Audit
d) Revenue
14. Revenue is the customs data, central excise data & commercial taxes data.
a) Economic affairs b) Tourism c) Audit
d) Revenue
15. Programme Implementation is central projects data for Monitoring.
a) Scheduling
b) Monitoring
c) Auditing
d) None
16. Commerce & trade can be analyzed & converted into a data warehouse.
a) Commerce & trade b) Schedule c) Trading
d) All of these
17. Drinking water census can be effectively utilized by OLAP & data mining
technologies.
a) Economic affairs b) Tourism c) Drinking water
d) Revenue
18. Data warehouse can be built for state plan data on all sectors in Planning.
a) Sector
b) Planning
c) Drinking water
d) Data Warehouse
19. Community needs assessment data, immunization data, data from national
programmes is in Health.
a) Health
b) Planning
c) Drinking waters
d) None
20. Land use pattern can also be analyzed in a warehousing environment.
a) Land use pattern b) Planning
c) Drinking water
d) None
21. Monitoring progress made on implementation of rural development
programmes.
a) Monitoring
b) Planning
c) Drinking water
d) None
22. The government departments have largely been satisfied with developing MIS.
a) MIS
b) Planning
c) Drinking water
d) None
23. The forecasting model can be strengthened for more accurate forecasting by
taking into account the external factors.
a) Planning
b) Forecasting model
c) Drinking water
d) None
24. Data Mining technologies have extensive potential applications in the
government.
a) Data Mining
b) Planning
c) Drinking water
d) None
25. Tourism exchange earnings data & hotels, travel & transportation data.
a) Planning
b) Tourism
c) Both a & b d) None
SECTION-B
5 MARKS:
1. Write a short note on Metadata.
2. Write a short note on Tools for Data Warehousing.
3. Write a short note on Distribution of Data.
4. Write a short note on Data Content.
5. Write a short note on Performance Considerations.
SECTION-C
8 MARKS:
1. Explain in detail about Data Warehouse Architectural Strategies &
Organizational issues.
2. Explain in detail about National Data Warehouses.
3. Explain in detail about other areas for data warehousing & Data Mining.
4. Explain in detail about Design Consideration.
5. Explain in detail about Applications of Data Warehousing.