1. Non-trivial extraction of ______, previously unknown and potentially useful information from data. [A]
   A) Implicit  B) Explicit  C) A & B  D) None of these
2. Traditional techniques may be unsuitable due to ______. [C]
   A) Enormity of data  B) High dimensionality  C) A & B  D) None of these
3. Each record contains a set of ______; one of the attributes is the ______. [C]
   A) Class & attribute  B) Class & object  C) Attribute and class  D) Class & methods
4. Finding a model for the class attribute as a function of the values of other attributes is called ______. [A]
   A) Classification  B) Clustering  C) Regression  D) None of these
5. ______ is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes. [B]
   A) Regression  B) Clustering  C) Classification  D) None of these
6. ______ is a data mining (machine learning) technique used to fit an equation to a dataset. [C]
   A) Clustering  B) Classification  C) Regression  D) None of these
7. Challenges of data mining include ______. [C]
   A) Scalability  B) Dimensionality  C) A & B  D) None of these
8. A ______ attribute has only a finite or countably infinite set of values. [B]
   A) Continuous  B) Discrete  C) Final  D) None of these
9. In ______ attributes, real values can only be measured and represented using a finite number of digits. [A]
   A) Continuous  B) Discrete  C) Final  D) None of these
10. ______ is an example of a discrete attribute. [D]
    A) Temperature  B) Height  C) Weight  D) Zip codes
11. ______ is an example of a continuous attribute. [C]
    A) Counts  B) Set of words  C) Temperature  D) A & B
12. Data that consists of a collection of records, each of which consists of a fixed set of attributes, is called ______. [B]
    A) Data matrix  B) Record data  C) Document data  D) None of these
13. ______ is a special type of record data, where each record involves a set of items. [C]
    A) Data matrix  B) Record data  C) Transaction data  D) Document data
14. ______ are data objects with characteristics that are considerably different from most of the other data objects in the data set. [A]
    A) Outliers  B) Missing values  C) Duplicate data  D) None of these
15. Combining two or more attributes (or objects) into a single attribute (or object) is called ______. [A]
    A) Aggregation  B) Sampling  C) Feature creation  D) Binarization
16. ______ is a technique used for both the preliminary investigation of the data and the final data analysis. [C]
    A) Aggregation  B) Feature creation  C) Sampling  D) Binarization
17. Which approaches are used for feature subset selection? [C]
    A) Brute-force approach  B) Embedded approaches  C) A & B  D) None of these
18. ______ is the technique that creates new attributes that can capture the important information in a data set much more efficiently than the original attributes. [B]
    A) Aggregation  B) Feature creation  C) Sampling  D) Binarization
20. ______ is a methodology used in feature creation. [C]
    A) Feature extraction  B) Feature construction  C) A & B  D) None of these

UNIT-II

1. ______ is the node that has no incoming edges and zero or more outgoing edges. [A]
   A) Root node  B) Internal nodes  C) Leaf  D) Terminal nodes
2. ______ are nodes, each of which has exactly one incoming edge and two or more outgoing edges. [B]
   A) Terminal nodes  B) Internal nodes  C) Leaf  D) Root node
3. ______ are nodes, each of which has exactly one incoming edge and no outgoing edges. [D]
   A) Terminal nodes  B) Internal nodes  C) Leaf  D) A & C
4. The test condition for a ______ attribute generates two potential outcomes. [A]
   A) Binary  B) Nominal  C) Ordinal  D) None of these
5. ______ attributes can have many values, so their test condition can be expressed in two ways (a multiway split or a binary split). [B]
   A) Binary  B) Nominal  C) Ordinal  D) None of these
6. ______ attributes can also produce binary or multiway splits, and their values can be grouped as long as the grouping does not violate the order of the values. [C]
   A) Binary  B) Nominal  C) Ordinal  D) None of these
7. For ______ attributes, the test condition can be expressed as a comparison test (A < v) or (A >= v) with binary outcomes. [D]
   A) Binary  B) Nominal  C) Ordinal  D) Continuous
8. Impurity measures such as entropy and the Gini index tend to favor attributes that have a large number of distinct values. Which splitting criterion compensates for this? [A]
   A) Gain ratio  B) Get ratio  C) Receive ratio  D) Others
9. The ______ approach is typically used with classification techniques that can be parameterized to obtain models with different levels of complexity. [C]
   A) Verification  B) Validity  C) Validation  D) A & B
10. In the ______ approach, the tree-growing algorithm is halted before generating a fully grown tree that perfectly fits the entire training data. [A]
    A) Pre-pruning  B) Post-pruning  C) A & B  D) None of these

1. The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis.
2. Data mining tasks are generally divided into two major categories: predictive tasks and descriptive tasks.
3. The attribute to be predicted is commonly known as the target or dependent variable.
4. Association analysis is used to discover patterns that describe strongly associated features in the data.
5. Anomaly detection is the task of identifying observations whose characteristics are significantly different from the rest of the data.
6. A data set can often be viewed as a collection of data objects.
7. An attribute is a property or characteristic of an object that may vary, either from one object to another or from one time to another.
8. Nominal and ordinal attributes are collectively referred to as categorical or qualitative attributes.
9. Much data mining work assumes that the data set is a collection of records.
10. The best discretization and binarization approach is the one that "produces the best result for the data mining algorithm that will be used to analyze the data."
11. A variable transformation refers to a transformation that is applied to all the values of a variable.
12. The similarity between two objects is a numerical measure of the degree to which the two objects are alike.
13. The dissimilarity between two objects is a numerical measure of the degree to which the two objects are different.
14. Similarity measures between objects that contain only binary attributes are called similarity coefficients.
15. EDA stands for Exploratory Data Analysis.
16. The mode of a categorical attribute is the value that has the highest frequency.
17. Two of the most widely used summary statistics are the mean and the median.
18. To overcome problems with the traditional definition of a mean, the notion of a trimmed mean is sometimes used.
19. Data visualization is the display of information in a graphic or tabular format.
20. ECDF stands for empirical cumulative distribution function.
21. A multidimensional representation of the data, together with all possible totals (aggregates), is known as a data cube.
22. Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions.
23. Data mining is ______. [D]
    A. Knowledge mining from data  B. Knowledge extraction  C. Data analysis  D. All
24. OLAP stands for ______. [B]
    A. Off-line analytical processing  B. On-line analytical processing  C. Off-line analytical program  D. On-line analytical program
25. Which of the following databases is used to store time-related data? [A]
    A. Temporal databases  B. Relational databases  C. Transactional databases  D. Spatial databases
26. A ______ database stores audio, video and images. [B]
    A. Relational  B. Multimedia  C. Spatial  D. Text
27. In which preprocessing technique may multiple data sources be combined? [D]
    A. Data cleaning  B. Data transformation  C. Data selection  D. Data integration
28. Many people treat data mining as a synonym for another popularly used term, ______. [A]
    A. Knowledge discovery in databases  B. Knowledge inventory in databases  C. Knowledge acceptance in databases  D. Knowledge disposal in databases
29. ______ mining tasks characterize the general properties of the data in the database. [A]
    A. Descriptive  B. Predictive  C. Metadata  D. Data
30. ______ is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. [C]
    A. Data characterization  B. Data summarization  C. Data discrimination  D. None
31. KDD stands for knowledge discovery from data.
32. OLTP stands for on-line transaction processing.
33. The domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns is called the ______. [A]
    A. Knowledge base  B. Data mining engine  C. Graphical user interface  D. Both A & B
34. ______ reduction reduces the data size by removing attributes. [B]
    A. Stepwise forward selection  B. Dimensionality  C. Data compression  D. None of the above
35. In ______, the attribute data are scaled so as to fall within a small specified range. [B]
    A. Smoothing  B. Normalization  C. Classification  D. None of the above
36. A ______ consists of sequences of values or events obtained over repeated measurements of time. [A]
    A. Time-series database  B. Spatial database  C. Text database  D. Multimedia database
37. Data mining refers to ______ knowledge from large amounts of data. [C]
    A. Refreshing  B. Deleting  C. Extracting  D. All
38. A ______ database consists of a file where each record represents a transaction. [C]
    A. Spatial  B. Spatio-temporal  C. Transactional  D. Data stream
39. Which one is used to identify redundancy in numerical attributes? [A]
    A. Correlation coefficient  B. Chi-square test  C. Both A & B  D. None
40. Which data reduction technique removes irrelevant, weakly relevant, or redundant attributes? [B]
    A. Data cube aggregation  B. Attribute subset selection  C. Numerosity reduction  D. None
19. The process of grouping a set of physical or abstract objects into classes of similar objects is called ______. [A]
    A. Clustering  B. Classification  C. Segmenting  D. None
20. Which one is used to identify redundancy in categorical attributes? [B]
    A. Correlation coefficient  B. Chi-square test  C. Both A & B  D. None

1) The input data for a classification task is a collection of ______.
   A) Files  B) Entities  C) Records  D) All of these
2) The target function is also known informally as a ______.
   A) Clusters  B) Classification model  C) Informal function  D) None of the above
3) Which node has no incoming edges and zero or more outgoing edges?
   A) Root  B) Parent  C) Child  D) All of these
4) In which algorithm is a decision tree grown in a recursive fashion by partitioning records into successive subsets?
   A) Decision  B) Hunt's  C) Recursive  D) Quick sort
5) The ______ function extends the decision tree by creating a new node.
   A) Createnode()  B) Startnode()  C) New node()  D) Extendnode()
6) Decision tree induction is a ______ approach for building classification models.
   A) Clusters approach  B) Nonparametric approach  C) Informal approach  D) Classification approach
7) The border between two neighbouring regions of different classes is known as a ______.
   A) Decision boundary  B) Parent boundary  C) Child boundary  D) Neighbouring boundary
8) ______ provides another way to partition the data into homogeneous, nonrectangular regions.
   A) Files induction  B) Entities induction  C) Constructive induction  D) All of these
9) The estimated error helps the learning algorithm to do ______.
   A) Model selection  B) Estimated selection  C) Algorithm selection  D) Extend selection
10) The difference in entropy is known as the ______.
    A) Information gain  B) Induction gain  C) Constructive gain  D) Entropy gain
11) After building the decision tree, a ______ step can be performed to reduce the size of the decision tree.
    A) Files pruning  B) Entities pruning  C) Tree-pruning  D) All of these
12) Decision trees that are too large are susceptible to a phenomenon known as ______.
    A) Model fitting  B) Estimated fitting  C) Algorithm fitting  D) Overfitting
13) In Web usage mining, it is important to distinguish accesses made by human users from those due to ______.
    A) Files pruning  B) Web pruning  C) Web robots  D) Web snake
14) Web mining techniques are used to analyse ______ browsing behaviour.
    A) Human  B) Entities  C) Tree  D) All of these
15) In the ______ method, the original data with labelled examples is partitioned into two disjoint sets, called the training and the test sets, respectively.
    A) Straec  B) Roll up  C) Holdout  D) Roll down
16) The ______ method can be repeated several times to improve the estimation of a classifier's performance.
    A) Analyse  B) Bootstrap  C) Holdout  D) Random
17) In the ______ approach, the training records are sampled with replacement.
    A) Bootstrap  B) Entities  C) Tree  D) All of these
18) To determine the confidence interval, we need to establish the ______ that governs the accuracy measure.
    A) Probability distribution  B) Statistic distribution  C) Tree distribution  D) Data distribution
19) The task of predicting the class labels of test records can also be considered as a ______.
    A) Task experiment  B) Random experiment  C) Binomial experiment  D) None of the above
20) Model overfitting may arise in learning algorithms that employ a methodology known as ______.
    A) Multiple comparison procedure  B) Comparison procedure  C) Tree  D) All of these
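
Several questions above mention the impurity measures used to choose decision-tree splits (entropy and the Gini index). As a minimal sketch, assuming a node is summarised by a list of class counts, the two measures can be computed as follows; the function names are illustrative only and do not come from any library.

```python
from math import log2


def entropy(counts):
    """Entropy of a node, -sum(p_i * log2(p_i)), over non-empty classes."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * log2(p) for p in probs)


def gini(counts):
    """Gini index of a node, 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# Equal class counts: the most impure two-class node.
print(entropy([3, 3]), gini([3, 3]))  # 1.0 0.5
# All records in one class: a pure node.
print(entropy([6, 0]), gini([6, 0]))  # 0.0 0.0
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split that produces purer children is preferred.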

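The questions on information gain ("the difference in entropy") and on the gain ratio can be illustrated with a small sketch. It assumes each child node of a split is given as a list of class counts; the helper names and the example splits are hypothetical. The split information used by the gain ratio is the entropy of the child-node sizes, so a split into many small partitions is penalised.

```python
from math import log2


def entropy(counts):
    """Entropy of a list of class counts (or of partition sizes)."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by the split information of the partition."""
    split_info = entropy([sum(child) for child in children])
    return information_gain(parent, children) / split_info


parent = [5, 5]                       # 5 records of each class
binary = [[5, 0], [0, 5]]             # two pure children
many = [[1, 0]] * 5 + [[0, 1]] * 5    # ten single-record children (e.g. an ID attribute)
print(information_gain(parent, binary), gain_ratio(parent, binary))        # 1.0 1.0
print(information_gain(parent, many), round(gain_ratio(parent, many), 3))  # 1.0 0.301
```

Both splits have the same information gain, but the ten-way split has a much lower gain ratio, which is the bias toward many-valued attributes that the gain ratio is meant to correct.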

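Statement 14 names similarity coefficients for objects with only binary attributes. A minimal sketch of two common ones, the simple matching coefficient (SMC) and the Jaccard coefficient, is shown below; it assumes both objects are equal-length lists of 0/1 values, and the helper names are hypothetical.

```python
def match_counts(x, y):
    """Count f11, f00, f10, f01 for two equal-length binary vectors."""
    pairs = list(zip(x, y))
    f11 = pairs.count((1, 1))
    f00 = pairs.count((0, 0))
    f10 = pairs.count((1, 0))
    f01 = pairs.count((0, 1))
    return f11, f00, f10, f01


def smc(x, y):
    """Simple matching coefficient: matching attributes / all attributes."""
    f11, f00, f10, f01 = match_counts(x, y)
    return (f11 + f00) / (f11 + f00 + f10 + f01)


def jaccard(x, y):
    """Jaccard coefficient: ignores 0-0 matches (undefined if both vectors are all zeros)."""
    f11, _, f10, f01 = match_counts(x, y)
    return f11 / (f11 + f10 + f01)


x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(smc(x, y), jaccard(x, y))  # 0.7 0.0
```

SMC counts shared absences as agreement, so sparse vectors look similar; Jaccard ignores 0-0 matches and reports no similarity here because the two objects share no 1s.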

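Statements 16 to 18 refer to the mode, the mean, the median and the trimmed mean. The sketch below uses Python's `statistics` module for the first three; `trimmed_mean` is a hypothetical helper that assumes a percentage p is given and drops the top and bottom (p/2)% of the sorted values before averaging.

```python
import statistics


def trimmed_mean(values, p):
    """Drop the top and bottom (p/2)% of the sorted values, then average the rest."""
    data = sorted(values)
    k = int(len(data) * p / 200)  # values removed from each end
    return statistics.mean(data[k:len(data) - k])


values = [1, 2, 3, 4, 5, 90]          # one extreme value
print(statistics.mean(values))         # 17.5
print(statistics.median(values))       # 3.5
print(trimmed_mean(values, 40))        # 3.5
print(statistics.mode(["red", "blue", "red", "green"]))  # red
```

The single extreme value drags the ordinary mean far from the bulk of the data, while the median and the trimmed mean stay close to it.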

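Statement 20 mentions the empirical cumulative distribution function (ECDF). As a sketch, assuming a plain list of numeric observations, the ECDF at x is the fraction of observations less than or equal to x; `ecdf` is a hypothetical helper built on the standard `bisect` module.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F with F(x) = fraction of sample values that are <= x."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(data, x) / n

    return F


F = ecdf([3, 1, 4, 1, 5])
print(F(0), F(1), F(3.5), F(5))  # 0.0 0.4 0.6 1.0
```

The result is a step function that jumps by 1/n at each observation (by 2/n at the repeated value 1) and reaches 1.0 at the largest value.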

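Question 35 describes normalization, where attribute values are scaled to fall within a small specified range. One common form is min-max normalization; the sketch below assumes the target range defaults to [0.0, 1.0], and `min_max_normalize` is a hypothetical name.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


print(min_max_normalize([200, 300, 400, 600, 1000]))  # [0.0, 0.125, 0.25, 0.5, 1.0]
```

Because the mapping is linear, the relative spacing of the original values is preserved; a different target range, such as [-1.0, 1.0], can be passed through `new_min` and `new_max`.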

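Question 39 uses the correlation coefficient to detect redundancy between numerical attributes. A minimal sketch of the Pearson correlation coefficient follows; the attribute names and values are hypothetical. A value close to +1 or -1 suggests that one attribute can be largely derived from the other and may be redundant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


price = [10, 20, 30, 40, 50]
price_with_tax = [11, 22, 33, 44, 55]   # hypothetical attribute derived from price
stock = [50, 40, 30, 20, 10]
print(round(pearson(price, price_with_tax), 6))  # 1.0
print(round(pearson(price, stock), 6))           # -1.0
```

Rounding is used only because floating-point arithmetic can leave the result a few units away from exactly 1.0 or -1.0.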

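For categorical attributes, the quiz answer is the chi-square test. As a sketch, assuming the two attributes have already been summarised as a contingency table of observed counts (the counts below are hypothetical), the Pearson chi-square statistic compares each observed count with the count expected if the attributes were independent.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = preferred reading (fiction, non-fiction), columns = group (M, F).
table = [[250, 200],
         [50, 1000]]
print(round(chi_square(table), 2))  # 507.94
```

A 2 x 2 table has (2 - 1)(2 - 1) = 1 degree of freedom, and the critical value at the 0.001 significance level is 10.828. A statistic far above that suggests the two attributes are correlated, so one of them may be redundant.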

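The holdout and bootstrap questions describe two ways of forming training and test sets. The sketch below assumes the records are a Python list; `holdout_split` and `bootstrap_split` are hypothetical helpers. The holdout version shuffles and splits into disjoint sets, while the bootstrap version draws N records with replacement for training and uses the records never drawn as the test set.

```python
import random


def holdout_split(records, test_fraction=1 / 3, seed=0):
    """Shuffle, then split into disjoint (training, test) sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def bootstrap_split(records, seed=0):
    """Sample N records with replacement for training; records never drawn form the test set."""
    rng = random.Random(seed)
    n = len(records)
    drawn = [rng.randrange(n) for _ in range(n)]
    chosen = set(drawn)
    train = [records[i] for i in drawn]
    test = [records[i] for i in range(n) if i not in chosen]
    return train, test


records = list(range(30))
train, test = holdout_split(records)
print(len(train), len(test))  # 20 10
boot_train, boot_test = bootstrap_split(records)
print(len(boot_train), len(boot_test))
```

Repeating `holdout_split` with different seeds and averaging the resulting accuracies gives the repeated-holdout estimate mentioned in the quiz. For large N, a bootstrap sample contains on average about 63.2% of the distinct records.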

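The last questions treat prediction on N test records as a binomial experiment and ask for a confidence interval on accuracy. A minimal sketch follows. It assumes a normal approximation to the binomial, with z = 1.96 for 95% confidence. Solving that approximation for the true accuracy p gives the interval below, which has the form of the Wilson score interval. `accuracy_interval` is a hypothetical name.

```python
from math import sqrt


def accuracy_interval(acc, n, z=1.96):
    """Confidence interval for the true accuracy, given observed accuracy acc on n test records."""
    z2 = z * z
    centre = 2 * n * acc + z2
    spread = z * sqrt(z2 + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z2)
    return (centre - spread) / denom, (centre + spread) / denom


lo, hi = accuracy_interval(0.8, 100)
print(f"{lo:.3f} {hi:.3f}")  # 0.711 0.867
```

With the same observed accuracy, a larger test set gives a narrower interval, because the estimate is based on more binomial trials.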