040020304 – Data Mining Models and Methods 2014

UNIT-1 Data Mining

Long Answer Questions

1. Give an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?
2. How is a data warehouse different from a database? How are they similar?
3. Describe why concept hierarchies are useful in data mining.
4. Suppose your task as a software engineer at a university is to design a data mining system to examine the university course database, which contains information such as the name, address, and status (e.g. UG or PG) of each student, the courses taken, and the SGPA. Describe the architecture you would choose. What is the purpose of each component of this architecture?
5. Describe three challenges to data mining regarding data mining methodology and user interaction issues.
6. What are the major challenges of mining a huge amount of data (such as billions of tuples) in comparison with mining a small amount of data (such as a few hundred tuples)?
7. Outline the major research challenges of data mining in one specific application domain, such as stream data analysis, spatiotemporal data analysis, or bioinformatics.
8. Taking fraudulence detection as an example, propose two methods that can be used to detect outliers and discuss which one is more reliable.
9. Discuss what kind of interesting knowledge can be mined from spatial data streams, with limited time and resources.
10. Describe the differences between the following approaches for the integration of a data mining system with a database or data warehouse system: no coupling, loose coupling, semitight coupling, and tight coupling. State which approach you think is the most popular, with appropriate reasons.

Short Answer Questions

1. What is the importance of data mining for knowledge discovery?
2. Which step do you have to perform after data cleaning in the KDD process?
3. Write down any two major components of a data mining system.
4. Are all of the patterns that you find using the data mining process interesting? State any two reasons to justify your answer.
5. Differentiate between sequence and time series databases.
6. How does information differ from knowledge?
7. What is the use of the knowledge base in a typical data mining system?
8. How do descriptive data mining tasks differ from predictive data mining tasks?
9. What is the purpose of support and confidence in relation to association rules?
10. Does classification differ from prediction? Give any two reasons to justify your answer.

Fill in the blanks

1. _______________ is the process of identifying a valid, potentially useful and ultimately understandable structure in data.
2. ______ includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
3. Data mining is the task of discovering _____ from large amounts of data where the data can be stored in databases, data warehouses, or other information repositories.
4. _____ can be mined from many different kinds of databases.
5. A ______________ makes statistical decisions using experimental data.
6. _____________ investigates how computers can learn or improve their performance based on data.
7. _______________ is a class of machine learning techniques that make use of both labeled and unlabeled examples when learning a model.
8. The learning process is _______________ since the input examples are not class labeled.
9. ____________ operations include drill-down, roll-up, and pivot.
10. ____________ analysis describes and models regularities or trends for objects whose behavior changes over time.
11. _________ can be designed to support ad hoc and interactive data mining.
12. A pattern represents _______ if it is easily understood by humans.
13. A ________ is a repository for long-term storage of data from multiple sources, organized so as to facilitate management decision making.
14. A _______________ system has the potential to generate thousands or even millions of patterns, or rules.
15. A ________________ is a set of mathematical functions that describes the behavior of the objects in a target class in terms of random variables and their associated probability distributions.
16. _____________ is a machine learning approach that lets users play an active role in the learning process.
17. The postal code recognition problem, which uses a set of handwritten postal code images, is an example of _______________ learning.
18. ____________ technologies provide historical, current, and predictive views of business operations.
19. A ________________ is a specialized computer server that searches for information on the web.
20. Clustering can also facilitate _________________ formation, that is, the organization of observations into a hierarchy of classes that group similar events together.

Multiple Choice Questions

1. Data transformation includes which of the following?
   a) A process to change data from a detailed level to a summary level
   b) A process to change data from a summary level to a detailed level
   c) Joining data from one source into various sources of data
   d) Separating data from one source into various sources of data
2. Which is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science?
   a) Data warehouse
   b) KDD Process
   c) Data Mining
   d) DBMS
3. A database may contain data objects that do not comply with the general behavior or model of the data. These objects are ______.
   a) Classification
   b) Clustering
   c) Outliers
   d) Noisy data
4. Which is the repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site?
   a) Data mining
   b) Data warehouse
   c) Data integration
   d) Data transformation
5. A relational database is a collection of ________, each of which is assigned a unique name.
   a) Tables
   b) Attributes
   c) Tuples
   d) Entities
6. Which database consists of a file where each record represents a transaction?
   a) Relational database
   b) Advanced database
   c) Object-relational database
   d) Transactional database
7. Object-relational databases are constructed based on a/an __________.
   a) Object-oriented databases
   b) Object-relational data model
   c) Transactional data model
   d) Geo-relational data model
8. Maps can be represented in _________ format.
   a) Raster
   b) Vector
   c) Raster and vector
   d) None of the above
9. A ______ database typically stores relational data that includes time-related attributes.
   a) Temporal
   b) Sequence
   c) Multimedia
   d) Heterogeneous
10. Which database consists of a set of interconnected, autonomous component databases?
   a) Heterogeneous database
   b) Homogeneous database
   c) Multimedia database
   d) None of the above
11. Many applications include the generation and analysis of a new kind of data, called ________.
   a) Legacy database
   b) Stream data
   c) Web usage mining
   d) Spatiotemporal database
12. Which functionality describes the comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes?
   a) Data discrimination
   b) Data characterization
   c) Class/concept description
   d) Data summarization
13. Which of the following is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown?
   a) Classification
   b) Clustering
   c) Prediction
   d) Outlier analysis
14. What can be designed to support ad hoc and interactive data mining?
   a) Data mining program
   b) Data mining query language
   c) Data query language
   d) None of the above

040020304 – Data Mining Models and Methods Ms. Priti Prajapati 2013