1. Non-trivial extraction of ____________, previously unknown and potentially useful information from data [A]
A)Implicit
B)Explicit
C)A&B
D)None of these
2. Traditional techniques may be unsuitable due to [C]
A)Enormity of data
B)High dimensionality
C)A&B
D)None of these
3. Each record contains a set of ______________, one of the attributes is the ______________ [C]
A)Class & attribute
B)Class & object
C)Attribute and class
D)Class & methods
4. Finding a model for class attribute as a function of the values of other attributes is called [A]
A)Classification
B)Clustering
C)Regression
D)None of these
5. ___________ is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes [B]
A)Regression
B)Clustering
C)Classification
D)None of these
6. ____________________ is a data mining (machine learning) technique used to fit an equation to a dataset. [C]
A)Clustering
B)Classification
C)Regression
D)None of these
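As an illustration of question 6, fitting an equation to a dataset can be sketched with ordinary least squares for a straight line. This is a minimal sketch and not part of the original question bank; the function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # prints 2.0 1.0
```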
7. Challenges of data mining [C]
A)Scalability
B)Dimensionality
C)A&B
D)None of these
8. A _______________ attribute has only a finite or countably infinite set of values [B]
A)Continuous
B)Discrete
C)Final
D)None of these
9. Real values can only be measured and represented using a finite number of digits in ______ attributes [A]
A)Continuous
B)Discrete
C)Final
D)None of these
10. _______________ is an example of a discrete attribute [D]
A)Temperature
B)Height
C)Weight
D)Zip codes
11. _______________ is an example of a continuous attribute [C]
A)Counts
B)Set of words
C)Temperature
D)A&B
12. Data that consists of a collection of records, each of which consists of a fixed set of attributes, is called [B]
A)Data matrix
B)Record data
C)Document data
D)None of these
13. _____________ is a special type of record data, where each record involves a set of items. [C]
A)Data matrix
B)Record data
C)Transaction data
D)Document data
14. _______________ are data objects with characteristics that are considerably different than most of the other data objects in the data set [A]
A)Outliers
B)Missing values
C)Duplicate data
D)None of these
15. Combining two or more attributes (or objects) into a single attribute (or object) is called [A]
A)Aggregation
B)Sampling
C)Feature creation
D)Binarization
16. ________________ is the technique used for both the preliminary investigation of the data and the final data analysis. [C]
A)Aggregation
B)Feature creation
C)Sampling
D)Binarization
17. Which approaches can be used effectively for feature subset selection? [C]
A)Brute-force approach
B)Embedded approaches
C)A&B
D)None of these
18. ____________ is the technique that creates new attributes that can capture the important information in a data set much more efficiently than the original attributes [D]
A)Aggregation
B)Feature creation
C)Sampling
D)Feature Creation
20._______________is the methodology used in Feature creation
A)Feature Extraction
B)Feature Construction
C)A&B
D)None of these
UNIT-II
1. __________________________ is the node that has no incoming edges and zero or more outgoing edges [A]
A)Root Node
B)Internal Nodes
C)Leaf
D)Terminal Nodes
2. ______________________ are the nodes each of which has exactly one incoming edge and two or more outgoing edges [B]
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)Root Node
3. ________________ are the nodes each of which has exactly one incoming edge and no outgoing edges [D]
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)A&C
4. The test condition for a _______________ attribute generates two potential outcomes [A]
A)Binary
B)Nominal
C)Ordinal
D)None of these
5. _______________ attributes can have many values; their test condition can be expressed in two ways [B]
A)Binary
B)Nominal
C)Ordinal
D)None of these
6. ______________ attributes can also produce binary or multiway splits, and their values can also be grouped [C]
A)Binary
B)Nominal
C)Ordinal
D)None of these
7. For __________ attributes, the test condition can be expressed as a comparison test (A<V) or (A>=V) with binary outcomes [D]
A)Binary
B)Nominal
C)Ordinal
D)Continuous
8. Impurity measures such as entropy and Gini index tend to favor attributes that have a large number of distinct values. A splitting criterion used to overcome this problem is [A]
A)Gain ratio
B)Get ratio
C)Receive ratio
D)Others
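To make question 8 concrete, the following sketch computes entropy, the Gini index, and the gain ratio for a candidate split. It is an illustrative sketch only; the function names and the tiny label lists are hypothetical. The gain ratio is taken here as information gain divided by split information, where split information is the entropy of the attribute's own value distribution.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (base 2) of a list of class labels or attribute values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the split divided by its split information."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    weighted = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - weighted
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]   # hypothetical class labels
record_id = [1, 2, 3, 4]              # one distinct value per record
binary_attr = ["a", "a", "b", "b"]    # two values, also a perfect split
print(gain_ratio(record_id, labels), gain_ratio(binary_attr, labels))
```

Both attributes separate the classes perfectly, so their information gain is identical, but the attribute with one distinct value per record has a larger split information and therefore a lower gain ratio.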
9. ___________ approach is typically used with classification techniques that can be parameterized to obtain models with different levels of complexity [C]
A)Verification
B)Validity
C)Validation
D)A&B
10. In the _____________ approach, the tree-growing algorithm is halted before generating a fully grown tree that perfectly fits the entire training data [A]
A)Pre Pruning
B)Post Pruning
C)A&B
D)None of these
1. The purpose of preprocessing is to transform the raw input data into an appropriate format for
subsequent analysis
2.Data mining tasks are generally divided into two major categories
3.The attribute to be predicted is commonly known as the target or dependent variable
4.Association analysis is used to discover patterns that describe strongly associated features in the data
5.Anomaly detection is the task of identifying observations whose characteristics are significantly
different from the rest of the data
6.A data set can often be viewed as a collection of data objects.
7. An attribute is a property or characteristic of an object that may vary; either from one object to
another or from one time to another.
8.Nominal and ordinal attributes are collectively referred to as categorical or qualitative attributes
9.Much data mining work assumes that the data set is a collection of records
10.The best discretization and binarization approach is the one that "produces the best result for the
data mining algorithm that will be used to analyze the data."
11.A variable transformation refers to a transformation that is applied to all the values of a variable
12.The similarity between two objects is a numerical measure of the degree to which the two objects are
alike
13.The dissimilarity between two objects is a numerical measure of the degree to which the two objects
are different
14.Similarity measures between objects that contain only binary attributes are called similarity
coefficients
15. EDA stands for Exploratory Data Analysis
16.The mode of a categorical attribute is the value that has the highest frequency.
17.Two of the most widely used summary statistics are the mean and median.
18.To overcome problems with the traditional definition of a mean, the notion of a trimmed mean is
sometimes used
19.Data visualization is the display of information in a graphic or tabular format.
20.ECDF stands for empirical cumulative distribution function
21.A multidimensional representation of the data, together with all possible totals (aggregates), is
known as a data cube.
22. Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific
value for one or more dimensions
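Items 17 and 18 in the list above mention the mean, the median, and the trimmed mean. The sketch below shows how a single outlier pulls the mean while the median and trimmed mean stay stable. It is a minimal illustration, assuming the common definition in which a percentage p is chosen and the top and bottom (p/2)% of the sorted values are discarded; the function name `trimmed_mean` and the sample values are hypothetical.

```python
from statistics import mean, median

def trimmed_mean(values, p):
    """Mean after discarding the top and bottom (p/2)% of the sorted values."""
    if not 0 <= p < 100:
        raise ValueError("p must be in [0, 100)")
    ordered = sorted(values)
    k = int(len(ordered) * p / 200)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

data = [1, 2, 3, 4, 5, 90]  # hypothetical values with one outlier
print(mean(data), median(data), trimmed_mean(data, 40))  # prints 17.5 3.5 3.5
```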
23. Data mining is ______________________ [ d ]
A. knowledge mining from data
B. knowledge extraction
C. data analysis
D. all
24. OLAP stands for ________________ [ b ]
A. off-line analytical processing
B. on-line analytical processing
C. off-line analytical program
D. on-line analytical program
25. Which of the following databases is used to store time-related data? [ a ]
A Temporal databases
B Relational databases
C Transactional databases
D Spatial databases
26. ___________ database stores the audio, video and images [ b ]
A Relational
B multimedia
C spatial
D text
27. In which preprocessing technique multiple data sources may be combined [ d ]
A. Data cleaning
B. Data Transformation
C. Data Selection
D. Data Integration
28. Many people treat data mining as a synonym for another popularly used term. [ a ]
A Knowledge discovery in databases
B Knowledge inventory in databases
C Knowledge acceptance in databases
D Knowledge disposal in databases
29. ___ Mining tasks characterize the general properties of the data in the database. [ a ]
A Descriptive
B Predictive
C Metadata
D Data
30. ___ is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. [ c ]
A Data characterization
B Data summarization
C Data discrimination
D None
31. KDD stands for knowledge discovery from data
32. OLTP stands for on-line transaction processing
33. The domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns is called [ a ]
A Knowledge base
B Data Mining Engine
C Graphical user interface
D Both (a) & (b)
34. __________ reduction reduces the data size by removing attributes [ a ]
A Stepwise forward Selection
B Dimensionality
C Data Compression
D None of above
35. In _____ the attribute data are scaled so as to fall within a small specified range [ b ]
A Smoothing
B Normalization
C Classification
D None of above
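Question 35 describes scaling attribute data into a small specified range. One common way to do this is min-max normalization, sketched below. This is an illustrative example only; the function name `min_max_normalize`, its parameters, and the sample values are hypothetical, and other schemes (such as z-score normalization) exist.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values are identical; map them to the lower bound
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

print(min_max_normalize([10, 20, 30]))  # prints [0.0, 0.5, 1.0]
```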
36. _______ consists of sequences of values or events obtained over repeated measurements of time [ a ]
A Time Series database
B Spatial database
C Test database
D Multimedia database
37. Data mining refers to ___________ knowledge from large amounts of data [ c ]
A. refreshing
B. deleting
C. extracting
D. all
38. ____ Database consists of a file where each record represents a transaction [ c ]
A. Spatial
B. Spatio Temporal
C. Transactional
D. Data stream
39. Which one is used to identify the redundancy in numerical attributes [ a ]
A. Correlation coefficient
B. Chi-square test
C. Both A&B
D. None
40. Which data reduction technique removes the irrelevant, weakly relevant, or redundant attributes [ b ]
A. Data cube aggregation
B. Attribute Subset selection
C. Numerosity reduction
D. None
19 The process of grouping a set of physical or abstract objects into classes of similar objects is called ______________ [ a ]
A Clustering
B Classification
C Segmenting
D None
20 Which one is used to identify the redundancy in categorical attributes [ b ]
A. Correlation coefficient
B. Chi-square test
C. Both A&B
D. None
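The two redundancy checks named in the questions above can be sketched as follows: the Pearson correlation coefficient for a pair of numerical attributes, and the chi-square statistic for a contingency table of two categorical attributes. This is a minimal sketch with hypothetical function names and made-up data; it computes the statistics only and does not perform the significance lookup.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two numerical attributes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical data: y is exactly 2 * x, so the correlation is 1
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
# Hypothetical 2x2 table of observed counts
print(chi_square_statistic([[10, 20], [20, 10]]))
```

A correlation close to +1 or -1, or a large chi-square value relative to the table's degrees of freedom, suggests that one attribute may be redundant given the other.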
1) The Input Data For A Classification Task Is A Collection Of ……………………
A) Files
B) Entities
C) Records
D) All Of These
2) The Target Function Is Also Known Informally As A ………………
A) Clusters
B) Classification Model
C) Informal Function
D) None of the above
3) Which node has no incoming edges and zero or more outgoing edges?
A) Root B) Parent
C) Child
D) All of these
4) In which algorithm is a decision tree grown in a recursive fashion by partitioning records
into successive subsets?
A) Decision B) Hunt’s C) Recursive
D) Quick sort
5) The ………… function extends the decision tree by creating a new node
A) Createnode ()
B) Startnode () C) New node () D) Extendnode ()
6) Decision tree induction is ………… for building classification models
A) Clusters approach B) A nonparametric approach C) Informal approach D) Classification approach
7) The border between two neighbouring regions of different classes is known as a …………
A) Decision boundary B) Parent boundary C) Child boundary
D) neighbouring boundary
8) ………… provides another way to partition the data into homogeneous, nonrectangular regions
A) Files induction B) Entities induction C) Constructive induction
D) All of these
9) The estimated error helps the learning algorithm to do …………..selection
A) Model selection B) Estimated selection C) Algorithm selection D) Extend selection
10)The difference in entropy is known as the ……………..
A) Information gain B) Induction gain C) Constructive gain
D) Entropy gain
11) After building the decision tree, a ………… step can be performed to reduce the size of the decision tree.
A) Files pruning B) Entities pruning C) Tree-pruning D) All of these
12) Decision trees that are too large are susceptible to a phenomenon known as
A) Model fitting B) Estimated fitting C) Algorithm fitting D) Overfitting
13) In Web usage mining, it is important to distinguish accesses made by human users from those due to ...
A) Files pruning B) Web pruning C) Web robots D) Web snake
14) Web mining techniques are used to analyse ………… browsing behaviour
A) Human B) Entities C) Tree D) All of these
15) In the ………… method, the original data with labelled examples is partitioned into two disjoint sets, called the training and the test sets, respectively
A) Straec B) Roll up C) Holdout D) Roll down
16) The ………… method can be repeated several times to improve the estimation of a classifier's performance
A) Analyse B) Bootstrap C) Holdout D) Random
17) In the ………… approach, the training records are sampled with replacement
A) Bootstrap B) Entities C) Tree D) All of these
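The partitioning schemes described in questions 15 to 17 can be sketched as below: a holdout split into two disjoint sets, and a bootstrap sample drawn with replacement, where records never drawn serve as the test set. This is an illustrative sketch; the function names, the default test fraction, and the seeds are assumptions rather than anything specified in the questions.

```python
import random

def holdout_split(records, test_fraction=1 / 3, seed=0):
    """Partition records into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def bootstrap_sample(records, seed=0):
    """Draw len(records) training records with replacement.

    Records that are never drawn form the test set.
    """
    rng = random.Random(seed)
    n = len(records)
    chosen = [rng.randrange(n) for _ in range(n)]
    chosen_set = set(chosen)
    train = [records[i] for i in chosen]
    test = [records[i] for i in range(n) if i not in chosen_set]
    return train, test

records = list(range(12))  # hypothetical record identifiers
train, test = holdout_split(records, test_fraction=0.25)
print(len(train), len(test))  # prints 9 3
boot_train, boot_test = bootstrap_sample(records)
print(len(boot_train))  # prints 12
```

Repeating `holdout_split` with different seeds and averaging the resulting accuracies corresponds to the repeated holdout idea in question 16.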
18) To determine the confidence interval, we need to establish the ………… that governs the accuracy measure
A) Probability distribution B) Statistic distribution C) Tree distribution D) Data distribution
19) The task of predicting the class labels of test records can also be considered as a …………
A) Task experiment B) Random experiment C) Binomial experiment D) None of the above
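Questions 18 and 19 treat test-set prediction as a binomial experiment. Under that view, a confidence interval for accuracy can be sketched with the simple normal approximation to the binomial distribution. This is a hedged illustration: the function name is hypothetical, and the plain normal-approximation interval used here is only one of several possible interval formulas, chosen for brevity.

```python
from math import sqrt

def accuracy_confidence_interval(correct, n, z=1.96):
    """Normal-approximation confidence interval for classifier accuracy.

    correct: number of correctly classified test records
    n:       total number of test records
    z:       standard normal quantile (1.96 for roughly 95% confidence)
    """
    acc = correct / n
    margin = z * sqrt(acc * (1 - acc) / n)
    # Clamp to [0, 1] because accuracy is a proportion
    return max(0.0, acc - margin), min(1.0, acc + margin)

# Hypothetical result: 80 of 100 test records classified correctly
low, high = accuracy_confidence_interval(80, 100)
print(round(low, 4), round(high, 4))  # prints 0.7216 0.8784
```

The interval narrows as the number of test records grows, which is why accuracy estimates from small test sets should be read with caution.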
20) Model overfitting may arise in learning algorithms that employ a methodology known as
A) Multiple comparison procedure B) Comparison procedure C) Tree D) All of these