UNIT-I
1. Non-trivial extraction of ____________, previously unknown and potentially useful information from data
A) Implicit
B) Explicit
C) A & B
D) None of these
2.Traditional Techniques may be unsuitable due to
A)Enormity of data
B) High dimensionality
C) A & B
D)None of these
3.Each record contains a set of______________, one of the attributes is the ______________
A)Class& Attribute
B)Class&object
C) Attribute and class
D) Class & Methods
4.Finding a model for class attribute as a function of the values of other attributes is called
A)classification
B)Clustering
C)Regression
D)None of these
5.___________is a process of partitioning a set of data (or objects) into a set of meaningful subclasses
A) Regression
B) Clustering
C)Classification
D)None of these
6.____________________is a data mining (machine learning) technique used to fit an equation to a
dataset.
A) Clustering
B) Classification
C)Regression
D)None of these
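A regression fit of the kind question 6 describes can be sketched in a few lines; this minimal pure-Python least-squares example uses made-up data and is illustrative only:

```python
# Least-squares fit of a line y = slope*x + intercept (regression).
# The data below are invented: y is generated exactly from y = 2x + 1,
# so the fit should recover slope 2 and intercept 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope is covariance(x, y) divided by variance(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
```

On noisy real data the recovered coefficients would only approximate the generating equation, but the fitting step is the same.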
7.Challenges of Data Mining
A)Scalability
B)Dimensionality
C)A&B
D)None of these
8._______________attribute has only a finite or countably infinite set of values
A)continuous
B)Discrete
C)Final
D)None of these
9.Real values can only be measured and represented using a finite number of digits in ______
attributes
A)Continuous
B)discrete
C)Final
D)None of these
10. _______________ is example for Discrete Attribute
A)temperature
B)Height
C) Weight
D)Zip codes
11._______________ is Example for Continuous Attribute
A)counts
B)set of words
c)temperature
D) A&B
12. Data that consists of a collection of records, each of which consists of a fixed set of attributes, is called
[B]
A)data Matrix
B)Record data
C) Document data
D) None of these.
13. ____________ is a special type of record data, where each record involves a set of items.
A)Data Matrix
B)Record data
C) Transaction Data
D) Document Data
14. _______________are data objects with characteristics that are considerably different than most
of the other data objects in the data set
A)Outliers
B)Missing values
C) Duplicate data
D) None of these
15.Combining two or more attributes (or objects) into a single attribute (or object) is called
A)Aggregation
B)Sampling
C)Feature creation
D)Binarization
16. ________________ is the technique used for both the preliminary investigation of the data and the final data analysis.
A)Aggregation
B)Feature creation
C)Sampling
D)Binarization
17. Feature subset selection effectively uses which of the following approaches?
A) Brute-force approach
B) Embedded approaches
C)A&B
D)None of these
18. ____________ is the technique that creates new attributes that can capture the important information in a data set much more efficiently than the original attributes
A)Aggregation
B)Feature creation
C)Sampling
D) None of these
20._______________is the methodology used in Feature creation
A)Feature Extraction
B)Feature Construction
C)A&B
D)None of these
UNIT-II
1.__________________________is the node that has no incoming edges and zero or more outgoing
edges
A)Root Node
B)Internal Nodes
C)Leaf
D)Terminal Nodes
2. ______________________ are the nodes, each of which has exactly one incoming edge and two or more outgoing edges
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)Root Node
3. ________________ are the nodes, each of which has exactly one incoming edge and no outgoing edges
A)Terminal Nodes
B)Internal Nodes
C)Leaf
D)A&C
4.The test condition for a _______________generates two potential outcomes
A)Binary
B)Nominal
C)Ordinal
D)None of these
5. _______________ attributes can have many values; their test condition can be expressed in two ways
A)Binary
B)Nominal
C)Ordinal
D)None of these
6. ______________ attributes can also produce binary or multiway splits, and their values can also be grouped
A)Binary
B)Nominal
C)Ordinal
D)None of these
7. For __________ attributes, the test condition can be expressed as a comparison test (A < V) or (A >= V) with binary outcomes
A)Binary
B)Nominal
C)Ordinal
D)Continuous
8. ____________ overcomes the bias of impurity measures such as entropy and the Gini index, which tend to favor attributes that have a large number of distinct values
A)Gain Ratio
B) Get ratio
C) Receive ratio
D)Others
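The impurity measures named in question 8 are simple enough to compute directly; this minimal sketch uses made-up class counts:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as absolute counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given as absolute counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

A pure node gives entropy([10, 0]) = 0, while an evenly mixed node gives entropy([5, 5]) = 1. An attribute with many distinct values tends to produce many small, nearly pure partitions, which is exactly the bias that gain ratio corrects for.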
9. ___________ approach is typically used with classification techniques that can be parameterized to obtain models with different levels of complexity
A)Verification
B)validity
C)Validation
D)A&B
10. Which of the following databases is used to store time-related data?
A Temporal databases
B Relational databases
C Transactional databases
D Spatial databases
11. ___________ database stores audio, video, and images
A. Relational
B. Multimedia
C. Spatial
D. Text
12. In which preprocessing technique may multiple data sources be combined?
A. Data cleaning
B. Data Transformation
C. Data Selection
D. Data Integration
13.Many people treat data mining as a synonym for another popularly used term.
A Knowledge discovery in databases
B Knowledge inventory in databases
C Knowledge acceptance in databases
D Knowledge disposal in databases
14. ___ Mining tasks characterize the general properties of the data in the database.
A Descriptive
B Predictive
C Metadata
D Data
15. ___ is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
A Data characterization
B Data summarization
C Data discrimination
D None
16. __________ reduction reduces the data size by removing attributes
A Stepwise forward Selection
B Dimensionality
C Data Compression
D None of above
17. In which algorithm is a decision tree grown in a recursive fashion by partitioning records into successive subsets?
A) Decision
B) Hunt’s
C) Recursive
D) Quick sort
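The recursive partitioning question 17 refers to can be sketched briefly; the attribute name and records below are invented for illustration, and only one split level is shown:

```python
# One level of Hunt's-style recursive partitioning: split records on a
# binary attribute, then label each resulting subset by majority class.
records = [({"windy": 0}, "play"), ({"windy": 0}, "play"),
           ({"windy": 1}, "stay"), ({"windy": 1}, "stay"),
           ({"windy": 1}, "play")]

def majority(recs):
    # Most frequent class label among a list of (attributes, class) records.
    labels = [cls for _, cls in recs]
    return max(set(labels), key=labels.count)

def split(recs, attr):
    # Partition on the attribute value, then label the two leaves.
    left = [r for r in recs if r[0][attr] == 0]
    right = [r for r in recs if r[0][attr] == 1]
    return {0: majority(left), 1: majority(right)}

tree = split(records, "windy")
```

A full induction would recurse on each impure subset instead of stopping at one level, choosing the split attribute by an impurity measure such as entropy or the Gini index.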
18. Decision tree induction is ____________ for building classification models
A) Clusters approach
B) a nonparametric approach
C) Informal approach
D) Classification approach
19. After building the decision tree, a ____________ step can be performed to reduce the size of the decision tree.
A)Files Pruning
B) Entities Pruning
C) tree-pruning
D) All of these
20. Decision trees that are too large are susceptible to a phenomenon known as
A) Model fitting
B) Estimated fitting
C) Algorithm fitting
D) over fitting.
UNIT-III
1. ………………….. is a measure which refers to a condition that covers a greater number of tuples.
A.Entropy
B. Information
C. statistical test
D. Rule quality
2. …………………... classifier compares a particular test tuple with its equivalent training tuple.
A. Rule based
B. Bayesian
C. Nearest neighbour
D. Case-based reasoning
3. If-then rules can be extracted directly from training data using ……………………..
A. Sequential covering algorithm
B. Apriori algorithm
C. FP-growth algorithm
D. Back-propagation algorithm
4. ………………. is based on the assumption that there exist class conditional independencies among the subsets of variables
A.Nearest –neighbor classifier
B. Rule-based classifier
C. Bayesian classifier
D. Artificial neural networks
5. The amount of information required to change an impure partition into a pure partition is given by $\mathrm{Info}(D) =$ ……………………..
A. $-\sum_{j=1}^{m} p_j \log_2 p_j$
B. $\sum_{j=1}^{m} p_j \log_2 p_j$
C. $p_j - \sum_{j=1}^{m} p_j \log_2 p_j$
D. $p_j + \sum_{j=1}^{m} p_j \log_2 p_j$
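The Info(D) formula in question 5 can be checked numerically; this sketch assumes a two-class partition with 9 and 5 tuples (the counts are illustrative, chosen to match the classic 14-tuple weather example):

```python
from math import log2

# Info(D) = -sum_j p_j * log2(p_j), where p_j is the proportion of
# tuples in D belonging to class j.
counts = [9, 5]
total = sum(counts)
info_d = -sum((c / total) * log2(c / total) for c in counts)
```

For these counts info_d comes out to about 0.940 bits; a pure partition would give 0.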
6. The hybrid combination of neural networks, fuzzy logic and probabilistic reasoning is known as ….
A. Soft computing
B. Parallel computing
C. Cloud computing
D. None
7. …………… problem can be solved by introducing a hidden layer between the input and output layers
A. XOR
B. SOM
C. Multi balance
D. None
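Question 7's claim can be demonstrated with a tiny two-layer network; the weights below are hand-picked (a common textbook choice), not learned:

```python
def step(x):
    # Threshold activation: fires (1) when the weighted input exceeds 0.
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: one OR-like unit and one AND-like unit.
    h1 = step(a + b - 0.5)   # fires if at least one input is 1
    h2 = step(a + b - 1.5)   # fires only if both inputs are 1
    # Output: "OR but not AND", which is exactly XOR.
    return step(h1 - h2 - 0.5)
```

A single-layer perceptron cannot represent XOR because its classes are not linearly separable; the hidden layer maps the inputs into a space where a single threshold unit can separate them.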
8. ……………. is one of the ways to improve the accuracy of decision tree induction
A. Bagging
B. Boosting
C.SVM
D.Training
9. In boosting, votes depend on the ……………………………..
A. Accuracy
B. Error rate
C. Output
D. Input
10. …………………….. approach does not return all the extracted values
A. Visualization
B. Template-based
C. Bayesian
D. none
11. …………….. is a circumstance in which hidden variables cause the observed relationship between a pair of variables to reverse its direction or disappear
A. e –estimate
B.Simpson’s paradox
C. A&B
D. None
12. The statistical classifier that can predict the probabilities of class membership is called …….
A. Rule-based classifier
B. Bayesian classifier
C. Nearest neighbour classifier
D. None
13. Weight adjustment of connections in order to predict the correct class membership is called ……
A. Connectionist learning
B. Connectionist reading
C. Connectionist adjustment
D. Connectionist imbalance
14. The misclassification rate or error rate (ER) of a classifier is calculated using ER = ……….
A. 1 + A(R)
B. 1 - E(R)
C. 1 + E(R)
D. 1 - A(R)
15. A rule-based classifier uses a set of ………………. rules for classification [ a]
A. if-then
B. Bayesian
C. for loop
D. nearest
16. The closeness between two training tuples a1 and a2 is given by dist(a1, a2) = …….. [b]
A. $-\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
B. $\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
C. $\sqrt{-\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
D. $\sqrt{\sum_{i=1}^{n}(a_{1i} - a_{2i})^2}$
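The distance in question 16 is the ordinary Euclidean distance; a minimal sketch (the tuples used below are illustrative):

```python
from math import sqrt

def dist(a1, a2):
    # Euclidean distance: square root of the sum of squared
    # attribute-by-attribute differences between two tuples.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a1, a2)))
```

A nearest-neighbour classifier assigns a test tuple the class of the training tuple(s) that minimize this distance; for example dist([0, 0], [3, 4]) is 5.0.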
17. ………. is an algorithm used for classification of both linear and non-linear data
A.Support vector machine
B. Simple vector machine
C.System vector machine
D. support violating machine
18. ………… is the process of removing noise and handling missing data
A. Data base
B. data mining
C. data cleaning
D. data discrimination
19. ………… classifiers develop decision boundaries that have arbitrary shapes, providing flexible
model representation
A. Rule based
B. Bayesian
C. case based reasoning
D. nearest neighbour
20. ……………….. problem is a condition where the distribution of data sets is imbalanced [a]
A. Class imbalance
B. Class balanced
C. Class equal
D. Class diagram
UNIT-IV
1. Which operation generates new candidate k-itemsets based on frequent (k-1)-itemsets found in the previous iteration?
a. candidate generation
b. candidate pruning
c. a & b
d. none
2. Which operation eliminates some of the candidate k-itemsets using support-based pruning?
a. k-items
b. generation
c. brute force
d. candidate pruning
3. -------------- method considers every k-itemset as a potential candidate
a. generation
b. pruning
c. brute-force
d. none
4. Which strategy trims the exponential search space based on the support measure?
a. generation
b. support-based pruning
c. apriori
d. monotone
5. -------------- is the first association rule mining algorithm that pioneered the use of support-based pruning
a. apriori
b. pruning
c. anti
d. monotone
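The support counting and support-based pruning behind questions 1-5 can be sketched on a toy basket data set; the transactions and threshold below are made up:

```python
from itertools import combinations

# Toy transaction data set and an absolute support threshold.
transactions = [{"bread", "milk"},
                {"bread", "diapers", "beer"},
                {"milk", "diapers", "beer"},
                {"bread", "milk", "diapers"}]
minsup = 2

def support(itemset):
    # Number of transactions that contain every item in the itemset.
    return sum(1 for t in transactions if itemset <= t)

# Frequent 1-itemsets: single items meeting the support threshold.
items = {i for t in transactions for i in t}
f1 = [frozenset({i}) for i in sorted(items) if support(frozenset({i})) >= minsup]

# Candidate 2-itemsets are generated only from frequent 1-itemsets and
# then pruned by support -- the Apriori idea in miniature.
f2 = [a | b for a, b in combinations(f1, 2) if support(a | b) >= minsup]
```

The full Apriori algorithm repeats this generate-and-prune cycle level by level (breadth first over the itemset lattice) until no new frequent itemsets are found.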
6. ----------- lowering the support threshold often results in more itemsets
a. threshold
b. apriori
c. support threshold
d. all
7. The number of items is also known as ------------
a. i/o costs
b. dimensionality
c. algorithm
d. width
8. The Apriori algorithm makes repeated passes over the data set of ________ at runtime
a. transactions
b. width
c. items
d. rule generation
9. A search for frequent item sets can be viewed as a ____________
a. specific
b. specific to general
c. traversal of itemset lattice
d. frequent
10. The Apriori algorithm traverses the lattice in a ------------- manner
a. depth first
b. back tracking
c. greedy
d. breadth first
11. The FP-growth algorithm encodes the data set using a compact data structure called an
a. FP-growth
b. transaction
c. data
d. FP-tree
12.the representation on the left is called
a. vertical
b. horizontal
c. fp-side
d. straight
13. The first step is to gather all the paths containing node e; these paths are known as
a. postfix paths
b. suffix paths
c. prefix paths
d.none
14. --------------- approach requires a user-friendly environment to keep the human user in the loop
a. template
b. visualization
c. a&b
d.none
15. Association rules that contain continuous attributes are commonly known as
a. statics
b. association
c. quantitative association
d. discretion
16. A concept hierarchy can be represented using a ----------
a. DAG
b. cyclic
c. taxonomy
d. transaction
17. ------------- approach allows the user to constrain the type of patterns
a. visual
b. based
c. template based
d.none
18. Some associations may appear or disappear when conditioned upon the value of certain variables; this is known as
a. paradox
b. simpson’s paradox
c. synopsis
d. contingency
19. When a low-frequency item such as caviar appears together with high-frequency items, such patterns are called ---------
a. cross-product
b. cross relation
c. skewed
d.cross-support
20. The ---------- uses a level-wise approach for generating association rules
A.apriori algorithm
b.depth
c. a&b
d.none
UNIT-V
1. The scheme in which each object is represented by the index of the prototype associated with its cluster is known as
a. vector quantization
b. compression
c. a&b d.none
2.--------------analysis groups data objects based on information found in the data
a. cluster
b. partitional
c. a&b
d.none
3.The centroids are indicated by the ------ symbol
a.-
b.*
c.+
d./
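One assignment-and-update step of prototype-based (k-means) clustering, which questions 1-3 refer to; the 1-D points and initial centroids below are invented:

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 9.0]  # arbitrary initial prototypes, k = 2

def nearest(p):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))

# Assign each point to its nearest prototype, then recompute each
# centroid as the mean of its assigned points.
clusters = [[p for p in points if nearest(p) == j] for j in range(len(centroids))]
centroids = [sum(c) / len(c) for c in clusters]
```

The full algorithm repeats these two steps until the centroids stop moving; here one iteration already moves the prototypes to 2.0 and 11.0, the means of the two obvious groups.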
4. How many basic steps are there for generating a hierarchical clustering?
a.1
b.2
c.3
d.4
5. A hierarchical clustering is displayed graphically using a tree-like diagram called a -------
a. divisive
b. dendrogram
c. a&b
d.none
6. Which is the worst-case time complexity of the DBSCAN algorithm?
a. O(m^2)
b. O(m)
c. a&b
d.none
7. Supervised measures are often called
a. supervised
b. internal indices
c. external indices
d. none
8. Unsupervised measures are often called
a. External indices
b. Internal indices
c. a & b
d. none
9.CLUTO means
a. clustering toolkit
b. clustering tool key
c. a & b
d. statistical
10. A cluster is modeled as a -------- distribution
a. clusters
b. fuzzy
c. a&b
d. self
11. SOM stands for
a. Self-organizing maps
b. Self-organization mat
c. a & b
d. none
12.The probability of the data regarded as a function of the parameters is called a
a. likelihood function
b. like function
c. a&b
d.none
13. Which is a density-based clustering algorithm?
a. CLIQUE
b. CLAQUE
c. DENCLEU
d. DENCLUE
14. Which step is part of the Chameleon algorithm?
a. sparsification
b. graph partitioning
c. a & b
d.none
15.Which is a highly efficient clustering technique for data in Euclidean vector spaces
a. DBSCAN
b. BIRCA
c. BIRCD
d. BIRCH
16.Which is a clustering algorithm that uses a variety of different techniques
a. DBSCAN
b. CLIQUE
c. DENCLUE
d. CURE
17. Which clustering technique was specifically designed for clustering space
a. OPOSSUM
b. opossim
c. a & b
d. none
18. CLIQUE means
a. CLUSTERING IN QUEST
b. CLUSTING
c. a & b
d. none
19. Which theory allows an object to belong to a set with a degree of membership between 0 and 1?
a. fuzzy
b. clusters
c. a&b
d. fuzzy set
20. Which is a characteristic of clustering algorithms?
a. scalability
b. parameter selection
c. a & b
d. none