UNIT-I
1. Non-trivial extraction of ____________, previously unknown and potentially useful information from data
A) Implicit
B) Explicit
C) A & B
D) None of these
2.Traditional Techniques may be unsuitable due to
A)Enormity of data
B) High dimensionality
C) A & B
D)None of these
3.Each record contains a set of______________, one of the attributes is the ______________
A)Class& Attribute
B)Class&object
C) Attribute and class
D) Class & Methods
4.Finding a model for class attribute as a function of the values of other attributes is called
A)classification
B)Clustering
C)Regression
D)None of these
5.___________is a process of partitioning a set of data (or objects) into a set of meaningful subclasses
A) Regression
B) Clustering
C)Classification
D)None of these
6.____________________is a data mining (machine learning) technique used to fit an equation to a
dataset.
A) Clustering
B) Classification
C)Regression
D)None of these
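A regression fit of the kind question 6 describes can be sketched in a few lines; this minimal pure-Python least-squares example uses made-up data and is illustrative only:

```python
# Least-squares fit of a line y = slope*x + intercept (regression).
# The data below are invented: y is generated exactly from y = 2x + 1,
# so the fit should recover slope 2 and intercept 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope is covariance(x, y) divided by variance(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
```

On noisy real data the recovered coefficients would only approximate the generating equation, but the fitting step is the same.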
7.Challenges of Data Mining
A)Scalability
B)Dimensionality
C)A&B
D)None of these
8._______________attribute has only a finite or countably infinite set of values
A)continuous
B)Discrete
C)Final
D)None of these
9.Real values can only be measured and represented using a finite number of digits in ______
attributes
A)Continuous
B)discrete
C)Final
D)None of these
10. _______________ is example for Discrete Attribute
A)temperature
B)Height
C) Weight
D)Zip codes
11._______________ is Example for Continuous Attribute
A)counts
B)set of words
c)temperature
D) A&B
12. Data that consists of a collection of records, each of which consists of a fixed set of attributes, is called
[B]
A)data Matrix
B)Record data
C) Document data
D) None of these.
13. ____________ is a special type of record data, where each record involves a set of items.
A)Data Matrix
B)Record data
C) Transaction Data
D) Document Data
14. _______________are data objects with characteristics that are considerably different than most
of the other data objects in the data set
A)Outliers
B)Missing values
C) Duplicate data
D) None of these
15.Combining two or more attributes (or objects) into a single attribute (or object) is called
A)Aggregation
B)Sampling
C)Feature creation
D)Binarization
16. ________________ is the technique used for both the preliminary investigation of the data and the final data analysis.
A)Aggregation
B)Feature creation
C)Sampling
D)Binarization
17. Feature subset selection effectively uses which of the following approaches?
A) Brute-force approach
B) Embedded approaches
C)A&B
D)None of these
18. ____________ is the technique that creates new attributes that can capture the important information in a data set much more efficiently than the original attributes
A)Aggregation
B)Feature creation
C)Sampling
D) None of these
20._______________is the methodology used in Feature creation
A)Feature Extraction
B)Feature Construction
C)A&B
D)None of these
UNIT-II
1.__________________________is the node that has no incoming edges and zero or more outgoing
edges
A)Root Node
B)Internal Nodes
C)Leaf
D)Terminal Nodes
2. ______________________ are the nodes, each of which has exactly one incoming edge and two or more outgoing edges
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)Root Node
3. ________________ are the nodes, each of which has exactly one incoming edge and no outgoing edges
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)A&C
4.The test condition for a _______________generates two potential outcomes
A)Binary
B)Nominal
C)Ordinal
D)None of these
5. _______________ attributes can have many values; their test condition can be expressed in two ways
A)Binary
B)Nominal
C)Ordinal
D)None of these
6. ______________ attributes can also produce binary or multiway splits, and their values can also be grouped
A)Binary
B)Nominal
C)Ordinal
D)None of these
7. For __________ attributes, the test condition can be expressed as a comparison test (A < V) or (A >= V) with binary outcomes
A)Binary
B)Nominal
C)Ordinal
D)Continuous
8. ____________ overcomes the bias of impurity measures such as entropy and the Gini index, which tend to favor attributes that have a large number of distinct values
A)Gain Ratio
B) Get ratio
C) Receive ratio
D)Others
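The impurity measures named in question 8 are simple enough to compute directly; this minimal sketch uses made-up class counts:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as absolute counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given as absolute counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

A pure node gives entropy([10, 0]) = 0, while an evenly mixed node gives entropy([5, 5]) = 1. An attribute with many distinct values tends to produce many small, nearly pure partitions, which is exactly the bias that gain ratio corrects for.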
9. ___________ approach is typically used with classification techniques that can be parameterized to obtain models with different levels of complexity
A)Verification
B)validity
C)Validation
D)A&B
10. Which of the following databases is used to store time-related data?
A Temporal databases
B Relational databases
C Transactional databases
D Spatial databases
11. ___________ database stores audio, video, and images
A. Relational
B. Multimedia
C. Spatial
D. Text
12. In which preprocessing technique may multiple data sources be combined?
A. Data cleaning
B. Data Transformation
C. Data Selection
D. Data Integration
13.Many people treat data mining as a synonym for another popularly used term.
A Knowledge discovery in databases
B Knowledge inventory in databases
C Knowledge acceptance in databases
D Knowledge disposal in databases
14. ___ Mining tasks characterize the general properties of the data in the database.
A Descriptive
B Predictive
C Metadata
D Data
15. ___ is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
A Data characterization
B Data summarization
C Data discrimination
D None
16. __________ reduction reduces the data size by removing attributes
A Stepwise forward Selection
B Dimensionality
C Data Compression
D None of above
17. In which algorithm is a decision tree grown in a recursive fashion by partitioning records into successive subsets?
A) Decision
B) Hunt’s
C) Recursive
D) Quick sort
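The recursive partitioning question 17 refers to can be sketched briefly; the attribute name and records below are invented for illustration, and only one split level is shown:

```python
# One level of Hunt's-style recursive partitioning: split records on a
# binary attribute, then label each resulting subset by majority class.
records = [({"windy": 0}, "play"), ({"windy": 0}, "play"),
           ({"windy": 1}, "stay"), ({"windy": 1}, "stay"),
           ({"windy": 1}, "play")]

def majority(recs):
    # Most frequent class label among a list of (attributes, class) records.
    labels = [cls for _, cls in recs]
    return max(set(labels), key=labels.count)

def split(recs, attr):
    # Partition on the attribute value, then label the two leaves.
    left = [r for r in recs if r[0][attr] == 0]
    right = [r for r in recs if r[0][attr] == 1]
    return {0: majority(left), 1: majority(right)}

tree = split(records, "windy")
```

A full induction would recurse on each impure subset instead of stopping at one level, choosing the split attribute by an impurity measure such as entropy or the Gini index.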
18. Decision tree induction is ____________ for building classification models
A) Clusters approach
B) a nonparametric approach
C) Informal approach
D) Classification approach
19. After building the decision tree, a ____________ step can be performed to reduce the size of the decision tree.
A)Files Pruning
B) Entities Pruning
C) tree-pruning
D) All of these
20. Decision trees that are too large are susceptible to a phenomenon known as
A) Model fitting
B) Estimated fitting
C) Algorithm fitting
D) over fitting.
UNIT-III
1. ………………….. is a measure which refers to a condition that covers a greater number of tuples.
A.Entropy
B. Information
C. statistical test
D. Rule quality
2. …………………... classifier compares a particular test tuple with its equivalent training tuple.
A. Rule based
B. Bayesian
C. Nearest neighbour
D. Case-based reasoning
3. If-then rules can be extracted directly from training data using ……………………..
A. Sequential covering algorithm
B. Apriori algorithm
C. FP-growth algorithm
D. Back-propagation algorithm
4. ………………. is based on the assumption that there exist class conditional independencies among the subsets of variables
A.Nearest –neighbor classifier
B. Rule-based classifier
C. Bayesian classifier
D. Artificial neural networks
5. The amount of information required to change an impure partition into a pure partition is given by $\mathrm{Info}(D) =$ ……………………..
A. $-\sum_{j=1}^{m} p_j \log_2 p_j$
B. $\sum_{j=1}^{m} p_j \log_2 p_j$
C. $p_j - \sum_{j=1}^{m} p_j \log_2 p_j$
D. $p_j + \sum_{j=1}^{m} p_j \log_2 p_j$
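The Info(D) formula in question 5 can be checked numerically; this sketch assumes a two-class partition with 9 and 5 tuples (the counts are illustrative, chosen to match the classic 14-tuple weather example):

```python
from math import log2

# Info(D) = -sum_j p_j * log2(p_j), where p_j is the proportion of
# tuples in D belonging to class j.
counts = [9, 5]
total = sum(counts)
info_d = -sum((c / total) * log2(c / total) for c in counts)
```

For these counts info_d comes out to about 0.940 bits; a pure partition would give 0.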
6. The hybrid combination of neural networks, fuzzy logic and probabilistic reasoning is known as ….
A. Soft computing
B. Parallel computing
C. Cloud computing
D. None
7. …………… problem can be solved by introducing a hidden layer between the input and output layers
A. XOR
B. SOM
C. Multi balance
D. None
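Question 7's claim can be demonstrated with a tiny two-layer network; the weights below are hand-picked (a common textbook choice), not learned:

```python
def step(x):
    # Threshold activation: fires (1) when the weighted input exceeds 0.
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: one OR-like unit and one AND-like unit.
    h1 = step(a + b - 0.5)   # fires if at least one input is 1
    h2 = step(a + b - 1.5)   # fires only if both inputs are 1
    # Output: "OR but not AND", which is exactly XOR.
    return step(h1 - h2 - 0.5)
```

A single-layer perceptron cannot represent XOR because its classes are not linearly separable; the hidden layer maps the inputs into a space where a single threshold unit can separate them.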
8. ……………. is one of the ways to improve the accuracy of decision tree induction
A. Bagging
B. Boosting
C.SVM
D.Training
9. In boosting, votes depend on the ……………………………..
A. Accuracy
B. Error rate
C. Output
D. Input
10. …………………….. approach does not return all the extracted values
A. Visualization
B. Template-based
C. Bayesian
D. none
11. …………….. is a circumstance in which hidden variables cause the observed relationship between a pair of variables to reverse its direction or disappear
A. e –estimate
B.Simpson’s paradox
C. A&B
D. None
12. The statistical classifier that can predict the probabilities of class membership is called …….
A. Rule-based classifier
B. Bayesian classifier
C. Nearest neighbour classifier
D. None
13. Weight adjustment of connections in order to predict the correct class membership is called ……
A. Connectionist learning
B. Connectionist reading
C. Connectionist adjustment
D. Connectionist imbalance
14. The misclassification rate or error rate (ER) of a classifier is calculated using ER = ……….
A. 1 + A(R)
B. 1 - E(R)
C. 1 + E(R)
D. 1 - A(R)
15. A rule-based classifier uses a set of ………………. rules for classification [ a]
A. if-then
B. Bayesian
C. for loop
D. nearest
16. The closeness between two training tuples a1 and a2 is given by dist(a1, a2) = …….. [b]
A. $-\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
B. $\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
C. $\sqrt{-\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
D. $\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
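The distance in question 16 is the ordinary Euclidean distance; a minimal sketch (the tuples used below are illustrative):

```python
from math import sqrt

def dist(a1, a2):
    # Euclidean distance: square root of the sum of squared
    # attribute-by-attribute differences between two tuples.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a1, a2)))
```

A nearest-neighbour classifier assigns a test tuple the class of the training tuple(s) that minimize this distance; for example dist([0, 0], [3, 4]) is 5.0.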
17. ………. is an algorithm used for classification of both linear and non-linear data
A.Support vector machine
B. Simple vector machine
C.System vector machine
D. support violating machine
18. ………… is the process of removing noise and handling missing data
A. Data base
B. data mining
C. data cleaning
D. data discrimination
19. ………… classifiers develop decision boundaries that have arbitrary shapes, providing flexible
model representation
A. Rule based
B. Bayesian
C. case based reasoning
D. nearest neighbour
20. ……………….. problem is a condition where the distribution of data sets is imbalanced [a]
A. Class imbalance
B. Class balanced
C. Class equal
D. Class diagram
UNIT-IV
1. Which operation generates new candidate k-itemsets based on frequent (k-1)-itemsets found in the previous iteration?
a. candidate generation
b. candidate pruning
c. a & b
d. none
2. Which operation eliminates some of the candidate k-itemsets using support-based pruning?
a. k-items
b. generation
c. brute force
d. candidate pruning
3. -------------- method considers every k-itemset as a potential candidate
a. generation
b. pruning
c. brute-force
d. none
4. Which strategy trims the exponential search space based on the support measure?
a. generation
b. support-based pruning
c. apriori
d. monotone
5. -------------- is the first association rule mining algorithm that pioneered the use of support-based pruning
a. apriori
b. pruning
c. anti
d. monotone
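The support counting and support-based pruning behind questions 1-5 can be sketched on a toy basket data set; the transactions and threshold below are made up:

```python
from itertools import combinations

# Toy transaction data set and an absolute support threshold.
transactions = [{"bread", "milk"},
                {"bread", "diapers", "beer"},
                {"milk", "diapers", "beer"},
                {"bread", "milk", "diapers"}]
minsup = 2

def support(itemset):
    # Number of transactions that contain every item in the itemset.
    return sum(1 for t in transactions if itemset <= t)

# Frequent 1-itemsets: single items meeting the support threshold.
items = {i for t in transactions for i in t}
f1 = [frozenset({i}) for i in sorted(items) if support(frozenset({i})) >= minsup]

# Candidate 2-itemsets are generated only from frequent 1-itemsets and
# then pruned by support -- the Apriori idea in miniature.
f2 = [a | b for a, b in combinations(f1, 2) if support(a | b) >= minsup]
```

The full Apriori algorithm repeats this generate-and-prune cycle level by level (breadth first over the itemset lattice) until no new frequent itemsets are found.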
6. ----------- lowering the support threshold often results in more itemsets
a. threshold
b. apriori
c. support threshold
d. all
7. The number of items is also known as ------------
a. i/o costs
b. dimensionality
c. algorithm
d. width
8. The Apriori algorithm makes repeated passes over the data set of ________ at runtime
a. transactions
b. width
c. items
d. rule generation
9. A search for frequent item sets can be viewed as a ____________
a. specific
b. specific to general
c. traversal of itemset lattice
d. frequent
10. The Apriori algorithm traverses the lattice in a ------------- manner
a. depth first
b. back tracking
c. greedy
d. breadth first
11. The FP-growth algorithm encodes the data set using a compact data structure called an
a. FP-growth
b. transaction
c. data
d. FP-tree
12.the representation on the left is called
a. vertical
b. horizontal
c. fp-side
d. straight
13. The first step is to gather all the paths containing node e; these paths are known as
a. postfix paths
b. suffix paths
c. prefix paths
d.none
14. --------------- approach requires a user-friendly environment to keep the human user in the loop
a. template
b. visualization
c. a&b
d.none
15. Association rules that contain continuous attributes are commonly known as
a. statics
b. association
c. quantitative association
d. discretion
16. A concept hierarchy can be represented using a ----------
a. DAG
b. cyclic
c. taxonomy
d. transaction
17. ------------- approach allows the user to constrain the type of patterns
a. visual
b. based
c. template based
d.none
18. Some associations may appear or disappear when conditioned upon the value of certain variables; this is known as
a. paradox
b. simpson’s paradox
c. synopsis
d. contingency
19. When a low-frequency item such as caviar appears together with high-frequency items, such patterns are called ---------
a. cross-product
b. cross relation
c. skewed
d.cross-support
20. The ---------- uses a level-wise approach for generating association rules
A.apriori algorithm
b.depth
c. a&b
d.none
UNIT-V
1. The scheme in which each object is represented by the index of the prototype associated with its cluster is known as
a. vector quantization
b. compression
c. a&b d.none
2.--------------analysis groups data objects based on information found in the data
a. cluster
b. partitional
c. a&b
d.none
3.The centroids are indicated by the ------ symbol
a.-
b.*
c.+
d./
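One assignment-and-update step of prototype-based (k-means) clustering, which questions 1-3 refer to; the 1-D points and initial centroids below are invented:

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 9.0]  # arbitrary initial prototypes, k = 2

def nearest(p):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))

# Assign each point to its nearest prototype, then recompute each
# centroid as the mean of its assigned points.
clusters = [[p for p in points if nearest(p) == j] for j in range(len(centroids))]
centroids = [sum(c) / len(c) for c in clusters]
```

The full algorithm repeats these two steps until the centroids stop moving; here one iteration already moves the prototypes to 2.0 and 11.0, the means of the two obvious groups.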
4. How many basic steps are there for generating a hierarchical clustering?
a.1
b.2
c.3
d.4
5. A hierarchical clustering is displayed graphically using a tree-like diagram called a -------
a. divisive
b. dendrogram
c. a&b
d.none
6. Which is the worst-case time complexity of the DBSCAN algorithm?
a. O(m^2)
b. O(m)
c. a&b
d.none
7. Supervised measures are often called
a. supervised
b. internal indices
c. external indices
d. none
8. Unsupervised measures are often called
a. External indices
b. Internal indices
c. a & b
d. none
9.CLUTO means
a. clustering toolkit
b. clustering tool key
c. a & b
d. statistical
10. A cluster is modeled as a -------- distribution
a. clusters
b. fuzzy
c. a&b
d. self
11. SOM stands for
a. Self-organizing maps
b. Self-organization mat
c. a & b
d. none
12.The probability of the data regarded as a function of the parameters is called a
a. likelihood function
b. like function
c. a&b
d.none
13. Which is a density-based clustering algorithm?
a. CLIQUE
b. CLAQUE
c. DENCLEU
d. DENCLUE
14. Which step is part of the Chameleon algorithm?
a. sparsification
b. graph partitioning
c. a & b
d.none
15.Which is a highly efficient clustering technique for data in Euclidean vector spaces
a. DBSCAN
b. BIRCA
c. BIRCD
d. BIRCH
16.Which is a clustering algorithm that uses a variety of different techniques
a. DBSCAN
b. CLIQUE
c. DENCLUE
d. CURE
17. Which clustering technique was specifically designed for clustering space
a. OPOSSUM
b. opossim
c. a & b
d. none
18. CLIQUE means
a. CLUSTERING IN QUEST
b. CLUSTING
c. a & b
d. none
19. Which theory allows an object to belong to a set with a degree of membership between 0 and 1?
a. fuzzy
b. clusters
c. a&b
d. fuzzy set
20. Which is a characteristic of clustering algorithms?
a. scalability
b. parameter selection
c. a & b
d. none