CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE Jan2017
ASSESSMENT_CODE BT9001_Jan2017

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 34714
QUESTION_TEXT Discuss Clustering and segmentation Software.
SCHEME OF EVALUATION
1. BayesiaLab
2. ClustanGraphics3
3. CViz Cluster visualization
4. Neusciences
5. PolyAnalyst
6. StarProbe
Free Open Source
1. AutoClass
2. CLUTO
3. Databionic
4. David Dowe Mixture
5. MCLUST/EMCLUST
6. PermutMatrix
7. PROXIMUS
(Scheme: 10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73675
QUESTION_TEXT Explain the A Priori Algorithm.
SCHEME OF EVALUATION
Initialize: k := 1, C1 = all the 1-itemsets;
Read the database to count the support of C1 to determine L1
L1 := {frequent 1-itemsets};
k := 2 // k represents the pass number //
While (Lk-1 ≠ ∅) do
Begin
    Ck := gen_candidate_itemsets with the given Lk-1
    Prune(Ck)
    For all transactions t ∈ T do
        increment the count of all candidates in Ck that are contained in t;
    Lk := all candidates in Ck with minimum support
    k := k + 1;
End
Answer := ∪k Lk;

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73677
QUESTION_TEXT Explain the objectives of using data mining in business.
SCHEME OF EVALUATION
The objectives of using data mining in business are:
1. To help data engineers in a large corporation investigate the bad debts database and uncover useful patterns in selecting targets for debt recovery, thereby dramatically improving the corporation’s debt recovery.
2. To understand the difference between the results of the average practitioner and those of the quality practitioner.
3. To find the right balance between software, intellectual property and so forth.
4. The business expert not only uses the results of data mining but also evaluates them, and this evaluation should be a continual source of guidance for the data mining process.
5. The process must be thoroughly domain-oriented rather than technically oriented, and the tools must support an interactive, incremental and iterative style of work.
6. Data mining techniques can be implemented rapidly on existing software and hardware platforms across D&B to enhance the value of existing resources, and can be integrated with new products and systems as they are brought on-line.
7. The commercial success of data mining lies in providing true value to the business person in a form that can be used and understood by the business community.
8. A data mining tool aims to empower the business analyst to explore and understand the dataset in relation to his/her own knowledge, rather than aiming to replace the analyst with some automated data-discovery algorithm.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73679
QUESTION_TEXT Differentiate between data warehouse and business intelligence.
SCHEME OF EVALUATION
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning etc. (5 M)
Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business, such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and expectations etc. Business intelligence refers to systems and technologies that give decision-makers the means to extract personalized, meaningful information about their business and industry, information not typically available from internal systems alone. This includes advanced decision support tools and the backroom systems and databases that support these tools. Business intelligence encompasses any and all decision support activities, whether operational, tactical or strategic.
(5 M)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 73680
QUESTION_TEXT What is unsupervised learning? When is it required for analysis?
SCHEME OF EVALUATION
Unsupervised learning is a class of problems in which one seeks to determine how the data are organized. Unsupervised learning is closely related to the problem of density estimation in statistics. (2 M)
Clustering is one form of unsupervised learning. This unsupervised learning is required whenever an estimate of group operations is wanted. Clustering is the method by which like records are grouped together. Usually this is done to give the end user a high-level view of what is going on in the database. Clustering is sometimes used to mean segmentation, which most marketing people will tell you is useful for coming up with a bird's-eye view of the business. Clustering is a form of learning by observation rather than learning by examples. Clustering is the classification of similar objects into different groups or, more precisely, the partitioning of a data set into subsets, so that the data in each subset share common characteristics. (4 M)
The resulting clusters can be used to help understand the business better and can also be exploited to improve future performance through predictive analytics. For example, data mining can warn you that there’s a high probability that a specific customer won’t pay on time. This is based on an analysis of customers with similar characteristics.
There are 4 clustering methods:
1. K-means
2. Hierarchical
3. Agglomerative
4. Divisive
(4 M)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 118705
QUESTION_TEXT Explain Data Transformation process and Data Reduction process.
SCHEME OF EVALUATION
Data transformation process: In data transformation, the data are transformed or consolidated into forms appropriate for mining. Data transformation can involve the following:
• Smoothing
• Aggregation
• Generalization
• Normalization
• Attribute construction
Data Reduction: Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results. Strategies for data reduction include the following:
1. Data cube aggregation
2. Dimension reduction
3. Data compression
4. Numerosity reduction
5. Discretization and concept hierarchy generation
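
Illustration for question 118705: three of the operations listed above (normalization, smoothing and aggregation) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The function names, the choice of min-max scaling for normalization, bin means for smoothing, and the sample figures are all hypothetical and are not taken from the scheme.

```python
from statistics import mean


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization (assumed min-max variant): rescale into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def smooth_by_bin_means(values, bin_size):
    """Smoothing (assumed bin-means variant): sort, split into bins of
    bin_size values, and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def aggregate_by_key(records):
    """Aggregation: sum the amounts that share a key, e.g. rolling
    individual sales up to quarterly totals."""
    totals = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0) + amount
    return totals


# Made-up sample data, for illustration only.
incomes = [200, 300, 400, 600, 1000]
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
sales = [("Q1", 100), ("Q1", 150), ("Q2", 80)]

print(min_max_normalize(incomes))      # [0.0, 0.125, 0.25, 0.5, 1.0]
print(smooth_by_bin_means(prices, 3))  # [9, 9, 9, 22, 22, 22, 29, 29, 29]
print(aggregate_by_key(sales))         # {'Q1': 250, 'Q2': 80}
```

Aggregation appears under both headings of the answer: rolling detailed records up to totals is a transformation, and it also reduces the volume of data to be mined, which is the idea behind data cube aggregation.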
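
Illustration for question 73675: the A Priori pseudocode in that scheme can be turned into a short runnable Python sketch. The version below is one possible reading, not a reference implementation. It assumes that minimum support is given as an absolute transaction count, that candidates are generated by joining pairs of frequent (k-1)-itemsets, and that pruning drops any candidate with an infrequent (k-1)-subset. The function name `apriori` and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset (frozenset): support count}.

    min_support is assumed to be an absolute number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count every 1-itemset (C1) and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    answer = dict(current)

    k = 2  # pass number
    while current:
        prev = set(current)
        # Generate Ck by joining pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan the transactions and count the candidates each one contains.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Lk: candidates that reach the minimum support.
        current = {s: c for s, c in counts.items() if c >= min_support}
        answer.update(current)  # Answer is the union of all Lk
        k += 1
    return answer


# Made-up market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 3, this sample yields four frequent 1-itemsets and four frequent 2-itemsets. The only 3-itemset that survives pruning, {bread, milk, diapers}, occurs in just two baskets, so the third pass produces nothing and the loop ends.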