NEHRU ARTS AND SCIENCE COLLEGE
T.M PALAYAM, COIMBATORE
PG & RESEARCH DEPARTMENT OF COMPUTER SCIENCE
QUESTION BANK
CLASS: I MSc CS
SUBJECT NAME: DATA MINING AND WAREHOUSING

UNIT-1

SECTION-A ONE MARKS:
1. The Data accessed is usually a different version from that of the original operational database.
   a) Query b) Data c) Output d) Model
2. The Output of the data mining query probably is not a subset of the database.
   a) Query b) Data c) Output d) Model
3. A Predictive Model makes a prediction about values of data using known results found from different data.
   a) Predictive model b) Descriptive model c) Both a & b d) None
4. A Descriptive model identifies patterns or relationships in data.
   a) Predictive model b) Descriptive model c) Both a & b d) None
5. Classification maps data into predefined groups or classes.
   a) Classification b) Regression c) Prediction d) Time series
6. Regression is used to map a data item to a real-valued prediction variable.
   a) Classification b) Regression c) Prediction d) Time series
7. Clustering is similar to classification except that the groups are not predefined.
   a) Regression b) Clustering c) Association d) Summarization
8. Summarization maps data into subsets with associated simple descriptions.
   a) Query b) Model c) Summarization d) Association
9. Link analysis is alternatively referred to as Affinity analysis.
   a) Clustering b) Prediction c) Link analysis d) None
10. Sequential analysis is also known as Sequence discovery.
   a) Selection b) Sequence analysis c) Preprocessing d) Data mining
11. Both a & b are used to determine sequential patterns in data.
   a) Sequence analysis b) Sequence discovery c) Both a & b d) None
12. KDD stands for Knowledge Discovery in Databases.
   a) Knowledge Discovery in Databases b) Knowledge Detection in Databases c) Knowledge Discovery in Data mining d) Knowledge Domain in Databases
13. KDD is the process of finding useful information and patterns in data.
   a) CAD b) DTD c) KDD d) CD
14.
The KDD process consists of 5 steps.
   a) 3 b) 4 c) 5 d) 6
15. Obtaining the data needed for the data mining process from many different & heterogeneous data sources is Selection.
   a) Transformation b) Data mining c) Selection d) Evaluation
16. Converting the data from different sources into a common format for processing is Transformation.
   a) Transformation b) Data mining c) Selection d) Evaluation
17. Handling incorrect or missing data in the data to be used by the process is Pre-processing.
   a) Transformation b) Data mining c) Selection d) Pre-processing
18. Visualization refers to the visual presentation of data.
   a) Graphical b) Icon based c) Visualization d) Pixel based
19. Geometric techniques include the box plot and scatter diagram techniques.
   a) Graphical b) Icon based c) Visualization d) Geometric
20. Some attributes in the database might not be of interest to the data mining task being developed; this is Irrelevant data.
   a) Missing data b) Irrelevant data c) Multimedia data d) None
21. A conventional database schema may be composed of many different attributes; this is High dimensionality.
   a) High dimensionality b) Low dimensionality c) Medium dimensionality d) All of these
22. Outliers: there are often many data entries that do not fit nicely into the derived model.
   a) Large dataset b) Outliers c) Selection d) Application
23. A large database can be viewed as using Approximation.
   a) Large dataset b) Outliers c) Selection d) Approximation
24. In segmentation, a database is partitioned into disjoint groupings of similar tuples called Segments.
   a) Segments b) Association c) Dimensional d) Outliers
25. Data mining consists of 3 parts.
   a) 3 b) 4 c) 5 d) 6

SECTION-B 5 MARKS:
1. Write a short note on Data mining Vs Knowledge discovery in databases.
2. Write a short note on Development of Data mining.
3. Write a short note on Summarization.
4. Write a short note on Sequence Discovery.
5. Write a short note on Social implications of data mining.

SECTION-C 8 MARKS:
1.
Explain in detail about Data mining from a database perspective.
2. Explain in detail about: i) Classification ii) Regression iii) Time series analysis
3. Explain in detail about: i) Prediction ii) Clustering iii) Association Rules
4. Explain in detail about Data mining Issues.
5. Explain in detail about Data mining Metrics.

UNIT-2

SECTION-A ONE MARKS:
1. A Parametric model describes the relationship between input & output through the use of algebraic equations.
   a) Parametric model b) Non-parametric model c) Both a & b d) None
2. The squared error is often examined for a specific prediction to measure accuracy rather than to look at the average difference.
   a) RMS b) Squared error c) Unbiased d) Biased
3. RMS stands for Root Mean Square.
   a) Root Mean Square b) Root Median Square c) Range Mean Square d) Range Median Square
4. The RMS may also be used to estimate error or as another statistic to describe a distribution.
   a) RMS b) Squared error c) Unbiased d) Biased
5. Point estimation refers to the process of estimating a population parameter.
   a) Parametric model b) Non-parametric model c) Both a & b d) Point estimation
6. MLE stands for Maximum Likelihood Estimate.
   a) Maximum Likelihood Estimate b) Maximum Likelihood Effort c) Maximum Likelihood Error d) Maximum Likelihood Extent
7. The Expectation Maximization algorithm is an approach that solves the estimation problem with incomplete data.
   a) RMS b) Squared error c) Unbiased d) Expectation Maximization
8. Frequency Distribution provides an even better model of data.
   a) Histogram b) Frequency distribution c) Both a & b d) None
9. Hypothesis testing attempts to find a model that explains the observed data by first creating a hypothesis.
   a) Alternative hypothesis b) Hypothesis testing c) Both a & b d) None
10. Correlation can be used to evaluate the strength of a relationship between two variables.
   a) Linear b) Correlation c) Hypothesis d) RMS
11.
Linear regression assumes that a linear relationship exists between the input data and the output data.
   a) Linear regression b) Correlation c) Hypothesis d) RMS
12. A Decision tree is a predictive modeling technique used in classification tasks.
   a) Decision tree b) Correlation c) Input database d) Binary search
13. A Decision tree is a tree where the root and each internal node is labeled with a question.
   a) Input tree b) Output tree c) Decision tree d) All of these
14. A Decision tree consists of 3 parts.
   a) 2 b) 3 c) 4 d) 5
15. Neural networks are also known as Artificial Neural Networks.
   a) Artificial Neural Networks b) Artificial Neural data c) Artificial Network data d) Artificial Neural interface
16. ANN stands for Artificial Neural Networks.
   a) Artificial Neural Networks b) Artificial Neural data c) Artificial Network data d) Artificial Neural interface
17. A neural network consists of 3 parts.
   a) 2 b) 3 c) 4 d) 5
18. An activation function may also be known as a Firing rule.
   a) Firing rule b) Threshold c) Linear d) All of these
19. An activation function is sometimes called Both a & b.
   a) Processing element function b) Squashing function c) Both a & b d) None
20. The linear threshold function is also called Both a & b.
   a) Ramp function b) Piecewise function c) Both a & b d) None
21. Genetic Algorithms are examples of evolutionary computing methods and are optimization-type algorithms.
   a) Gaussian law b) Genetic algorithm c) Hyperbolic tangent d) None
22. A Genetic algorithm is a computational model consisting of 5 parts.
   a) 3 b) 4 c) 5 d) 6
23. The precise algorithm that indicates how to combine the given set of individuals to produce new ones is the Crossover algorithm.
   a) Crossover algorithm b) Genetic algorithm c) Hyperbolic tangent d) None
24. A Linear activation function produces a linear output value based on the input.
   a) Linear b) Threshold c) Activation d) Genetic algorithm
25. A neural network consists of 2 parts.
   a) 2 b) 3 c) 4 d) 5

SECTION-B 5 MARKS:
1.
Write a short note on Point estimation.
2. Write a short note on Models based on summarization.
3. Write a short note on Bayes Theorem.
4. Write a short note on Hypothesis Testing.
5. Write a short note on Regression & Correlation.

SECTION-C 8 MARKS:
1. Explain in detail about Similarity measures.
2. Explain in detail about Decision trees.
3. Explain in detail about Neural networks.
4. Explain in detail about Activation functions.
5. Explain in detail about Genetic algorithms.

UNIT-3

SECTION-A ONE MARKS:
1. Regression problems deal with estimation of an output value based on input values.
   a) Classification b) Data Mining c) Regression d) Statistical
2. ROC stands for Both a & b.
   a) Relative Operating Characteristic b) Receiver Operating Characteristic c) Both a & b d) None
3. KNN stands for K Nearest Neighbors.
   a) K Nearest Neighbors b) K Notification Neighbors c) K Notation Neighbors d) None
4. CART is a technique that generates a binary decision tree.
   a) KNN b) CART c) ROC d) RRC
5. RBF stands for Both a & b.
   a) Radial Function b) Radial Basis Function c) Both a & b d) None
6. RBF is a class of functions whose value decreases with the distance from a central point.
   a) RBF b) KNN c) CART d) ROC
7. A Perceptron is a single neuron with multiple inputs & one output.
   a) Perceptron b) Rule based algorithm c) Generating Rules d) None
8. Multiple Independent approaches can be applied to a classification problem.
   a) Multiple Dependent b) Multiple Independent c) Both a & b d) None
9. DCS stands for Dynamic Classifier Selection.
   a) Data Classifier Selection b) Date Class Selection c) Dynamic Classifier Selection d) Dynamic Class Selection
10. AVC stands for Attribute Value Class.
   a) Attribute Value Class b) Attribute Virtual Class c) Attribute Virtual Collections d) Attribute Value Collections
11. CART stands for Classification & Regression Trees.
   a) Class & Regression Trees b) Classification & Regression Trees c) Class & Rotational Trees d) Classification & Rotational Trees
12. A Subtree is replaced by a leaf node if this replacement results in an error rate close to that of the original tree.
   a) Selection Tree b) Sub Tree c) Regression Tree d) None
13. The ID3 technique for building a decision tree is based on information theory & attempts to minimize the expected number of comparisons.
   a) ID2 b) ID3 c) Both a & b d) None
14. A tuple is classified based on the region into which it falls.
   a) Tuple b) Decision Tree c) Sub Tree d) Classification
15. Dividing the data into regions based on class is Division.
   a) Division b) Prediction c) Tuple d) Tree
16. Generating formulas to predict the output class value is Prediction.
   a) Division b) Prediction c) Tuple d) Tree
17. Classification accuracy is usually calculated by determining the percentage of tuples placed in the correct class.
   a) Classification b) Division c) Trees d) Prediction
18. Missing Data values cause problems during both the training phase & the classification process.
   a) Decision tree b) Missing tree c) Classification tree d) Prediction tree
19. Missing Data in the training data must be handled & may produce an inaccurate result.
   a) Decision tree b) Missing tree c) Classification tree d) Prediction tree
20. There are 3 methods used to solve the classification problem.
   a) 2 b) 3 c) 4 d) 5
21. The Logistic curve gives a value between 0 & 1, so it can be interpreted as the probability of class membership.
   a) Plain curve b) Logistic curve c) Linear curve d) Non-linear curve
22. Regression can be used to perform classification using 2 approaches.
   a) 2 b) 3 c) 4 d) 5
23. The common classification scheme based on the use of distance measures is KNN.
   a) KNN b) CART c) SRT d) ROC
24. The classification problem using decision trees is a 2-step process.
   a) 2 b) 3 c) 4 d) 5
25. Pruning removes redundant comparisons or removes subtrees to achieve better performance.
   a) Pruning b) KNN c) Training tree d) Decision tree

SECTION-B 5 MARKS:
1. Write a short note on Issues in classification.
2. Write a short note on Regression.
3. Write a short note on Bayesian classification.
4. Write a short note on Simple approach.
5. Write a short note on K Nearest Neighbors.

SECTION-C 8 MARKS:
1. Explain in detail about Decision tree based algorithms.
2. Explain in detail about: i) ID3 ii) C4.5
3. Explain in detail about Neural Network based algorithms.
4. Explain in detail about: i) CART ii) Scalable DT techniques
5. Explain in detail about Rule based algorithms.

UNIT-4

SECTION-A ONE MARKS:
1. A Data Warehouse is a repository of subjectively selected & adapted operational data.
   a) Data b) Data Warehouse c) Data mart d) None
2. OLAP stands for Online Analytical Processing.
   a) Online Analytical Processing b) Online Analytical Problem c) Online Analytical Process d) Online Analytical Proceeding
3. OLAP is prepared periodically but is directly based on detailed reference.
   a) Data b) Data Warehouse c) Data mart d) OLAP
4. The individual departmental components are called Data marts.
   a) Data b) Data Warehouse c) Data mart d) Data Morphing
5. Data can be classified into 3 categories.
   a) 2 b) 3 c) 4 d) 5
6. Both a & b data originates from operational systems & is normally kept in a conventional database system.
   a) Reference b) Transaction c) Both a & b d) None
7. Denormalized data is the basis for OLAP tools.
   a) Reference b) Transaction c) Both a & b d) Denormalized data
8. Data marts can be classified into 2 groups.
   a) 2 b) 3 c) 4 d) 5
9. Data Warehouse is only a collection of data marts.
   a) Data b) Data Warehouse c) Data mart d) None
10. The data mart is loaded with data from a data warehouse by means of a Load Program.
   a) Data b) Data Warehouse c) Data mart d) Load Program
11. Metadata describes the details about the data in a data warehouse or data mart.
   a) Data b) Data Warehouse c) Data mart d) Metadata
12.
A formal data is required to be built for a large data mart which may also have some processing.
   a) Data b) Data Warehouse c) Data mart d) Formal data
13. Reference data stored in addition to basic data in the data mart helps & enables the end users of the data mart.
   a) External data b) Internal data c) Reference data d) None
14. Monitoring mainly relates data usage to data content Tracking.
   a) Tracking b) Data Warehouse c) Data mart d) None
15. OLTP stands for Online Transaction Processing.
   a) Online Transaction Processing b) Online Transaction Process c) Online Transaction Problem d) Online Transaction Proceeding
16. A very popular & early approach for achieving analytical processing is Both a & b.
   a) Star schema b) Collection model c) Both a & b d) None
17. The Star schema provides a multidimensional view.
   a) Star schema b) Collection model c) Data model d) None
18. OLAP tools can be broadly classified into 2 categories.
   a) 2 b) 3 c) 4 d) 5
19. MOLAP tools presuppose the data to be present in a multidimensional database.
   a) OLAP b) MOLAP c) ROLAP d) None
20. MOLAP based products organize, navigate & analyse data typically in an aggregate form.
   a) MOLAP b) OLAP c) ROLAP d) None
21. ROLAP is the latest & fastest growing OLAP segment in the market.
   a) MOLAP b) OLAP c) ROLAP d) None
22. MQE stands for Managed Query Environment.
   a) Managed Query Environment b) Managed Query Enhanced c) Managed Quality Environment d) Memory Query Environment
23. OLAP is to enable capability for users to perform limited analysis directly against RDBMS products.
   a) MOLAP b) OLAP c) ROLAP d) None
24. The analytical data used by PowerPlay is stored in multidimensional data sets called PowerCubes.
   a) MOLAP b) OLAP c) ROLAP d) PowerCubes
25. IBI is a multidimensional database technology for OLAP & data warehousing.
   a) MOLAP b) OLAP c) ROLAP d) IBI

SECTION-B 5 MARKS:
1. Write a short note on Types of Data Mart.
2. Write a short note on Software Components for a data mart.
3.
Write a short note on Loading a data mart.
4. Write a short note on Metadata for a data mart.
5. Write a short note on OLAP Tools & Internet.

SECTION-C 8 MARKS:
1. Explain in detail about Characteristics of a Data Warehouse.
2. Explain in detail about other aspects of Data mart.
3. Explain in detail about Security in a Data mart.
4. Explain in detail about OLAP Tools.
5. Explain in detail about Data Modeling.

UNIT-5

SECTION-A ONE MARKS:
1. A Data Warehouse can be built using Both a & b.
   a) Top-down b) Bottom-up c) Both a & b d) None
2. Metadata defines the contents & location of the data in the data warehouse.
   a) Metadata b) Metaphor c) Data Warehousing d) None
3. OLAP application on a data warehouse are not calling for every stringent
   a) MOLAP b) OLAP c) ROLAP d) IBI
4. A CASE tool is used to design the database in the data warehouse.
   a) MOLAP b) OLAP c) ROLAP d) CASE
5. A Fact table is a large control table in a dimensional design that has a multi-part key.
   a) Fact table b) OLAP c) ROLAP d) IBI
6. A Disk controller supports a certain amount of data throughput.
   a) Disk Problem b) Disk Controller c) Disk Schedule d) None
7. A Data Warehouse can be internet or intranet enabled as the choice.
   a) Data mart b) Data Warehouse c) Both a & b d) None
8. A Data Warehouse cannot be purchased & installed.
   a) Data mart b) Data Warehouse c) Both a & b d) None
9. The important means of preparing the government to face the challenges of the new millennium is Data Mining.
   a) Data Mining b) Data Warehouse c) Both a & b d) None
10. Data Mining can be performed for analysis & knowledge discovery.
   a) Data mart b) Data Warehouse c) Data Mining d) None
11. MIS stands for Management Information System.
   a) Management Information System b) Management Input System c) Management Information Software d) Memory Information System
12. The other sectors can be categorized into 5 types.
   a) 3 b) 4 c) 5 d) 6
13. Economic affairs are the budget & expenditure data & annual economic survey.
   a) Economic affairs b) Tourism c) Audit d) Revenue
14. Revenue is the customs data, central excise data & commercial taxes data.
   a) Economic affairs b) Tourism c) Audit d) Revenue
15. Programme Implementation is central projects data for Monitoring.
   a) Scheduling b) Monitoring c) Auditing d) None
16. Commerce & trade can be analyzed & converted into a data warehouse.
   a) Commerce & trade b) Schedule c) Trading d) All of these
17. Drinking water census can be effectively utilized by OLAP & data mining technologies.
   a) Economic affairs b) Tourism c) Drinking water d) Revenue
18. A data warehouse can be built for state plan data on all sectors in Planning.
   a) Sector b) Planning c) Drinking water d) Data Warehouse
19. Community needs assessment data, immunization data & data from national programmes are in Health.
   a) Health b) Planning c) Drinking water d) None
20. Land use pattern can also be analyzed in a warehousing environment.
   a) Land use pattern b) Planning c) Drinking water d) None
21. Monitoring progress made on implementation of rural development programmes.
   a) Monitoring b) Planning c) Drinking water d) None
22. The government departments have largely been satisfied with developing MIS.
   a) MIS b) Planning c) Drinking water d) None
23. The Forecasting model can be strengthened for more accurate forecasting by taking into account the external factors.
   a) Planning b) Forecasting model c) Drinking water d) None
24. Data Mining technologies have extensive potential applications in the government.
   a) Data Mining b) Planning c) Drinking water d) None
25. Tourism exchange earnings data & hotels, travel & transportation data.
   a) Planning b) Tourism c) Both a & b d) None

SECTION-B 5 MARKS:
1. Write a short note on Metadata.
2. Write a short note on Tools for Data Warehousing.
3. Write a short note on Distribution of Data.
4. Write a short note on Data Content.
5. Write a short note on Performance Considerations.

SECTION-C 8 MARKS:
1.
Explain in detail about Data Warehouse Architectural Strategies & Organizational issues.
2. Explain in detail about National Data Warehouses.
3. Explain in detail about other areas for Data Warehousing & Data Mining.
4. Explain in detail about Design Considerations.
5. Explain in detail about Applications of Data Warehousing.