040020304 – Data Mining Models and Methods
2014
UNIT -1 Data Mining
Long Answer Questions
1. Give an example where data mining is crucial to the success of a business. What data
mining functions does this business need? Can they be performed alternatively by data
query processing or simple statistical analysis?
2. How is a data warehouse different from a database? How are they similar?
3. Describe why concept hierarchies are useful in data mining.
4. Suppose your task as a software engineer at a university is to design a data mining
system to examine the university course database, which contains information such as the
name, address, and status (e.g., UG or PG) of each student, the courses taken, and SGPA.
Describe the architecture you would choose. What is the purpose of each component of
this architecture?
5. Describe three challenges to data mining regarding data mining methodology and user
interaction issues.
6. What are the major challenges of mining a huge amount of data (such as billions of
tuples) in comparison with mining a small amount of data (such as a few hundred
tuples)?
7. Outline the major research challenges of data mining in one specific application domain,
such as stream data analysis, spatiotemporal data analysis, or bioinformatics.
8. Taking fraud detection as an example, propose two methods that can be used to
detect outliers and discuss which one is more reliable.
9. Discuss what kind of interesting knowledge can be mined from spatial data streams,
with limited time and resources.
10. Describe the differences between the following approaches for the integration of a data
mining system with a database or data warehouse system: no coupling, loose coupling,
semitight coupling, and tight coupling. State which approach you think is the most
popular, and give appropriate reasons.
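As a possible starting point for question 8, the sketch below shows one simple statistical way to flag outliers: the interquartile-range (IQR) rule applied to transaction amounts. The function name `iqr_outliers`, the sample amounts, and the multiplier of 1.5 are illustrative assumptions, not part of the course material, and the rule is only one candidate method among several.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical transaction amounts; the last one is unusually large.
amounts = [20, 22, 25, 21, 23, 24, 26, 22, 950]
flagged = iqr_outliers(amounts)
print(flagged)  # [950]
```

A rule like this only considers one attribute at a time, which is worth keeping in mind when comparing it with a second method.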
Short Answer Questions
1. What is the importance of data mining for knowledge discovery?
2. Which step must you perform after data cleaning in the KDD process?
3. Write down any two major components of a data mining system.
4. Are all of the patterns that you find using a data mining process interesting? State any
two reasons to justify your answer.
5. Differentiate between sequence and time-series databases.
6. How does information differ from knowledge?
7. What is the use of the knowledge base in a typical data mining system?
8. How does a descriptive data mining task differ from a predictive one?
9. What is the purpose of support and confidence in relation to association rules?
10. Does classification differ from prediction? Give any two reasons to justify your answer.
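For question 9, a small worked example may help make the two measures concrete. The sketch below uses the standard definitions: support is the fraction of transactions containing an itemset, and confidence of a rule A => B is support(A and B) divided by support(A). The transactions and the helper names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread => milk
rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread transactions
```

Here the rule bread => milk holds in half of all transactions and in two thirds of the transactions that contain bread.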
Fill in the blanks
1. _______________ is the process of identifying a valid, potentially useful and ultimately
understandable structure in data.
2. ______ includes data cleaning, data integration, data selection, data transformation, data
mining, pattern evaluation, and knowledge presentation.
3. Data mining is the task of discovering _____ from large amounts of data where the data
can be stored in databases, data warehouses, or other information repositories.
4. _____ can be mined from many different kinds of databases.
5. A ______________ makes statistical decisions using experimental data.
6. _____________ investigates how computers can learn or improve their performance based
on data.
7. _______________ is a class of machine learning techniques that make use of both labeled
and unlabeled examples when learning a model.
8. The learning process is _______________ since the input examples are not class labeled.
9. ____________ operations include drill-down, roll-up, and pivot.
10. ____________ analysis describes and models regularities or trends for objects whose
behavior changes over time.
11. _________ can be designed to support ad hoc and interactive data mining.
12. A pattern represents _______ if it is easily understood by humans.
13. A ________ is a repository for long-term storage of data from multiple sources, organized
so as to facilitate management decision making.
14. A _______________ system has the potential to generate thousands or even millions of
patterns, or rules.
15. A ________________ is a set of mathematical functions that describes the behavior of the
object in a target class in terms of random variables and their associated probability
distributions.
16. _____________ is a machine learning approach that lets users play an active role in the
learning process.
17. The postal code recognition problem, which uses a set of handwritten postal code images,
is an example of _______________ learning.
18. ____________ technologies provide historical, current, and predictive views of business
operations.
19. A ________________ is a specialized computer server that searches for information on the
web.
20. Clustering can also facilitate _________________ formation, that is, the organization of
observations into a hierarchy of classes that group similar events together.
Multiple Choice Questions
1. Data transformation includes which of the following?
a) A process to change data from a detailed level to a summary level
b) A process to change data from a summary level to a detailed level
c) Joining data from one source into various sources of data
d) Separating data from one source into various sources of data
2. Which is an interdisciplinary field, the confluence of a set of disciplines including
database systems, statistics, machine learning, visualization, and information
science?
a) Data warehouse
b) KDD Process
c) Data Mining
d) DBMS
3. A database may contain data objects that do not comply with the general behavior or
model of data. These objects are ______.
a) Classification
b) Clustering
c) Outlier
d) Noisy data
4. Which is a repository of information collected from multiple sources, stored under a
unified schema, and usually residing at a single site?
a) Data mining
b) Data warehouse
c) Data integration
d) Data transformation
5. A relational database is a collection of ________, each of which is assigned a unique name.
a) Tables
b) Attributes
c) Tuples
d) Entities
6. Which database consists of a file in which each record represents a transaction?
a) Relational database
b) Advanced database
c) Object-relational database
d) Transactional database
7. Object-relational databases are constructed based on an/a __________.
a) Object-oriented databases
b) Object-relational data model
c) Transactional data model
d) Geo-relational data model
8. Maps can be represented in _________ format.
a) Raster
b) Vector
c) Raster and vector
d) None of the above
9. A ______ database typically stores relational data that include time-related attributes.
a) Temporal
b) Sequence
c) Multimedia
d) Heterogeneous
10. Which database consists of a set of interconnected, autonomous component
databases?
a) Heterogeneous database
b) Homogeneous database
c) Multimedia database
d) None of the above
11. Many applications include the generation and analysis of a new kind of data, called
________.
a) Legacy database
b) Stream data
c) Web usage mining
d) Spatiotemporal database
12. Which functionality describes the comparison of the general features of target class data
objects with the general features of objects from one or a set of contrasting classes?
a) Data discrimination
b) Data characterization
c) Class/ concept description
d) Data Summarization
13. Which of the following is the process of finding a model that describes and distinguishes
data classes or concepts, for the purpose of being able to use the model to predict the
class of objects whose class label is unknown?
a) Classification
b) Clustering
c) Prediction
d) Outlier Analysis
14. What can be designed to support ad hoc and interactive data mining?
a) Data mining program
b) Data mining query language
c) Data query language
d) None of the above
Ms. Priti Prajapati
2013