CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE APR2016
ASSESSMENT_CODE BT9001_APR2016

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 34709
QUESTION_TEXT Write a note on the process of knowledge discovery.
SCHEME OF EVALUATION
KDD (knowledge discovery in databases) is the overall process of discovering useful knowledge from data. Its steps are:
• Develop an understanding of the application domain
• Create a target dataset
• Data cleaning and preprocessing
• Data reduction and projection
• Selection of the appropriate data mining task
• Selection of the algorithm
• Data mining
• Interpretation and visualization
• Consolidating the discovered knowledge

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 34711
QUESTION_TEXT Write a note on Business Intelligence infrastructure.
SCHEME OF EVALUATION
1. The information warehouse layer
2. Customer intelligence
3. Business decisions
4. Operation intelligence
5. Application layer
6. Business Intelligence requirements
7. Presenting
8. Portals can reduce the overall infrastructure cost
9. Web-based portals

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73678
QUESTION_TEXT What is the need for data mining in security systems?
SCHEME OF EVALUATION
1. Data mining offers a convenient way to monitor large computer networks and thereby provide security. The data mining system does this by developing a profile of the typical activities of each user in the network.
2. To improve accuracy, data mining programs are used to analyze audit data and extract features that can distinguish normal activities from intrusions.
3. To improve efficiency, the computational costs of features are analyzed and a multiple-model, cost-based approach is used to produce detection models with low cost and high accuracy.
4. To improve usability, adaptive learning algorithms are used to facilitate model construction and incremental updates. Unsupervised anomaly detection algorithms are used to reduce the reliance on labeled data.
5. Security of network systems is becoming increasingly important.
Intrusion detection systems (IDSs) have thus become a critical technology.
6. IDS models generalize from both known attacks and normal behavior in order to detect unknown attacks.
7. Data mining based IDSs have higher false positive rates than traditional handcrafted signature-based methods, which can make them unusable in real systems.
8. The anomaly detection algorithms explore the use of information-theoretic measures (entropy, conditional entropy, relative entropy, information gain and information cost) to capture intrinsic characteristics of normal data, and use these measures to guide the building and evaluation of anomaly detection models.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73680
QUESTION_TEXT What is unsupervised learning? When is it required for analysis?
SCHEME OF EVALUATION
Unsupervised learning is a class of problems in which one seeks to determine how the data are organized. It is closely related to the problem of density estimation in statistics. (2 M)
Clustering is one form of unsupervised learning. Unsupervised learning is required whenever an estimate of group operations is wanted. Clustering is the method by which like records are grouped together, usually to give the end user a high-level view of what is going on in the database. Clustering is sometimes used to mean segmentation, which most marketing people will tell you is useful for getting a bird's-eye view of the business. Clustering is a form of learning by observation rather than learning by examples. More precisely, clustering is the partitioning of a data set into subsets so that the data in each subset share common characteristics. (4 M)
The resulting groups can be used to help understand the business better and can also be exploited to improve future performance through predictive analytics. For example, data mining can warn you that there is a high probability that a specific customer won't pay on time.
This is based on an analysis of customers with similar characteristics.
Four clustering methods are listed:
1. K-means
2. Hierarchical
3. Agglomerative (bottom-up hierarchical)
4. Divisive (top-down hierarchical)
(4 M)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 118705
QUESTION_TEXT Explain Data Transformation process and Data Reduction process.
SCHEME OF EVALUATION
Data transformation process: In data transformation, the data are transformed or consolidated into forms appropriate for mining. Data transformation can involve the following:
• Smoothing
• Aggregation
• Generalization
• Normalization
• Attribute construction
Data reduction process: Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results. Strategies for data reduction include the following:
1. Data cube aggregation
2. Dimension reduction
3. Data compression
4. Numerosity reduction
5. Discretization and concept hierarchy generation

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 118707
QUESTION_TEXT Briefly explain the applications of data mining with reference to business applications and scientific applications.
SCHEME OF EVALUATION
Business applications: Data mining has emerged this decade as a key technology for areas such as business intelligence, marketing, and so forth. For the purposes of discussion, the application and business domains considered here include telecommunications, medical devices, space science (vehicle health management and scientific instrumentation), targeted marketing, and mining … (Explain in detail)
Scientific applications using data mining: Recent progress in scientific and engineering applications has accumulated huge volumes of high-dimensional data, stream data, unstructured and semi-structured data, and spatial and temporal data.
Highly scalable and sophisticated data mining tools for such applications represent one of the most active research frontiers of data mining. Here, we outline the related challenges in several emerging domains … (Explain in detail)
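
Illustration for QUESTION_ID 73680: the scheme names K-means as a clustering method without describing how it works. The following is a minimal sketch of the usual assign-and-update loop, not a reference implementation. The function name k_means, the choice to seed the centroids with the first k points, and the customer records are all assumptions made for illustration; none of them come from the scheme.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_means(points, k, max_iter=100):
    """Group points into k clusters; returns (centroids, labels)."""
    # Assumption: seed with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # no centroid moved, so the clustering is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical customer records: (monthly spend, days late on payment)
customers = [(10.0, 1.0), (12.0, 2.0), (11.0, 0.0),
             (50.0, 30.0), (52.0, 28.0), (48.0, 32.0)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

On this made-up data the loop separates the prompt payers from the late payers, which is the kind of grouping of like records that the scheme describes. Real data would normally be normalized first (one of the data transformation steps listed under QUESTION_ID 118705), since K-means is sensitive to the scale of each attribute.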