CUSTOMER_CODE
SMUDE
DIVISION_CODE
SMUDE
EVENT_CODE
Jan2017
ASSESSMENT_CODE BT9001_Jan2017
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
34714
QUESTION_TEXT
Discuss Clustering and segmentation Software.
SCHEME OF EVALUATION
1. BayesiaLab
2. ClustanGraphics3
3. CViz Cluster visualization
4. Neusciences
5. PolyAnalyst
6. StarProbe
Free / Open Source
1. AutoClass
2. CLUTO
3. Databionic
4. David Dowe Mixture
5. MCLUST/EMCLUST
6. PermutMatrix
7. PROXIMUS
(Scheme: 10 marks)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73675
QUESTION_TEXT
Explain the A Priori Algorithm.
SCHEME OF EVALUATION
Initialize: k := 1; C1 := all the 1-itemsets;
Read the database to count the support of C1 and determine L1;
L1 := {frequent 1-itemsets};
k := 2; // k represents the pass number
While (Lk-1 ≠ ∅) do
Begin
    Ck := gen_candidate_itemsets(Lk-1);
    Prune(Ck);
    For all transactions t ∈ T do
        increment the count of all candidates in Ck that are contained in t;
    Lk := all candidates in Ck with at least the minimum support;
    k := k + 1;
End
Answer := ∪k Lk;
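The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the marking scheme: the function name apriori, the use of frozenset for itemsets, and treating min_support as an absolute transaction count are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is assumed to be an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count the 1-itemsets (C1) and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2  # pass number
    while current:
        prev = set(current)
        # Generate Ck by joining pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the candidates that are contained in each transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Lk: candidates that reach the minimum support.
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

For example, with the four transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} and a minimum support of 2, the sketch keeps three 1-itemsets and the pairs {bread, milk} and {bread, butter}; the candidate {bread, milk, butter} is pruned because its subset {milk, butter} is not frequent.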
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73677
QUESTION_TEXT
Explain the objectives of using data mining in business.
SCHEME OF EVALUATION
The objectives of using data mining in business are:
1. To help data engineers in a large corporation investigate the bad debts
database and uncover useful patterns in selecting targets for debt recovery,
thereby dramatically improving the corporation’s debt recovery.
2. To understand the difference between the results of the average
practitioner and those of the quality practitioner.
3. To find the right balance between software, intellectual property and so
forth.
4. The business expert not only uses the results of data mining but also
evaluates them, and this evaluation should be a continual source of guidance
for the data mining process.
5. The process must be thoroughly domain-oriented rather than technically
oriented, and the tools must support an interactive, incremental and iterative
style of work.
6. Data mining techniques can be implemented rapidly on existing software
and hardware platforms across D&B to enhance the value of existing
resources and can be integrated with new products and systems as they are
brought on-line.
7. The commercial success of data mining lies in providing true value to the
business person in a form that can be used and understood by the business
community.
8. A data mining tool aims to empower the business analyst to explore and understand
the dataset in relation to his/her own knowledge, rather than aiming to replace the
analyst with some automated data-discovery algorithm.
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73679
QUESTION_TEXT
Differentiate between data warehouse and business intelligence.
SCHEME OF EVALUATION
Data warehousing deals with all aspects of managing the development,
implementation and operation of a data warehouse or data mart, including
metadata management, data acquisition, data cleansing, data transformation,
storage management, data distribution, data archiving, operational reporting,
analytical reporting, security management, backup/recovery planning etc. (5 M)
Business intelligence, on the other hand, is a set of software tools that enable
an organization to analyze measurable aspects of its business such as sales
performance, profitability, operational efficiency, effectiveness of marketing
campaigns, market penetration among certain customer groups, cost trends,
anomalies and expectations etc. Business intelligence is used to refer to
systems and technologies that provide the business with the means for
decision-makers to extract personalized meaningful information about their
business and industry, not typically available from internal systems alone. This
includes advanced decision support tools and backroom systems and
databases to support these tools. Business intelligence encompasses any and
all decision support activities, whether operational, tactical or strategic. (5 M)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73680
QUESTION_TEXT
What is un-supervised learning? When is it required for analysis?
SCHEME OF EVALUATION
Unsupervised learning is a class of problems in which one seeks to determine
how the data are organized. Unsupervised learning is closely related to the
problem of density estimation in statistics. (2 M)
Clustering is one form of unsupervised learning. This unsupervised learning is
required whenever an estimate of group operations is wanted.
Clustering is the method by which like records are grouped together. Usually
this is done to give the end user a high-level view of what is going on in the
database. Clustering is sometimes used to mean segmentation, which most
marketing people will tell you is useful for coming up with a bird's-eye view of
the business. Clustering is a form of learning by observation rather than learning by
examples.
Clustering is the classification of similar objects into different groups or, more
precisely, the partitioning of a data set into subsets, so that the data in each
subset share common characteristics. (4 M)
These clusters can be used to help understand the business better and can also be
exploited to improve future performance through predictive analytics. For example, data
mining can warn you that there’s a high probability that a specific customer
won’t pay on time. This is based on an analysis of customers with similar
characteristics.
There are 4 clustering methods:
1. K-means
2. Hierarchical
3. Agglomerative
4. Divisive (4 M)
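As an illustration of the first method in the list, K-means can be sketched in a few lines of Python. This is a minimal sketch, not part of the marking scheme: the function name kmeans, the restriction to two-dimensional points, and the choice to pass the starting centroids in explicitly (rather than picking them at random) are assumptions made to keep the example deterministic.

```python
def kmeans(points, centroids, max_iter=100):
    """Group (x, y) points around k centroids by repeated assign/update steps."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Called with like records such as (1, 1), (1, 2), (2, 1) and (8, 8), (9, 8), (8, 9) and one starting centroid near each group, the sketch returns the two groups as separate clusters, which is the "grouping of like records" described above.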
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
118705
QUESTION_TEXT
Explain Data Transformation process and Data Reduction process.
SCHEME OF EVALUATION
Data transformation process:
In data transformation, the data are transformed or consolidated into
forms appropriate for mining. Data transformation can involve the
following:
• Smoothing
• Aggregation
• Generalization
• Normalization
• Attribute construction
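Two of the transformations listed above, normalization and smoothing, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: min-max scaling and smoothing by bin means are only one common way of doing each, and the function names min_max_normalize and smooth_by_bin_means are hypothetical, chosen for this example.

```python
from statistics import mean


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them all to the lower bound.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def smooth_by_bin_means(values, bin_size):
    """Smoothing: sort the values, split them into bins of bin_size items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed
```

For instance, min_max_normalize([10, 20, 30]) gives [0.0, 0.5, 1.0], and smoothing the sorted values 4, 8, 15, 21, 21, 24, 25, 28, 34 in bins of three replaces them with 9, 9, 9, 22, 22, 22, 29, 29, 29.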
Data Reduction:
Data reduction techniques can be applied to obtain a reduced
representation of the data set that is much smaller in volume, yet closely
maintains the integrity of the original data. That is, mining on the
reduced data set should be more efficient yet produce the same (or
almost the same) analytical results.
Strategies for data reduction include the following:
1. Data cube aggregation
2. Dimension reduction
3. Data compression
4. Numerosity reduction
5. Discretization and concept hierarchy generation
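Two of the strategies listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the roll-up of quarterly rows to yearly totals stands in for data cube aggregation, equal-width binning stands in for discretization, and the function names and the (year, quarter, amount) row layout are hypothetical.

```python
from collections import defaultdict


def aggregate_by_year(quarterly_sales):
    """Data cube aggregation: roll (year, quarter, amount) rows up to
    one total per year, so fewer rows carry the same yearly information."""
    totals = defaultdict(float)
    for year, _quarter, amount in quarterly_sales:
        totals[year] += amount
    return dict(totals)


def equal_width_bins(values, n_bins):
    """Discretization: replace each numeric value by the index of the
    equal-width interval it falls into (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

In both cases the reduced representation is smaller than the original data (fewer rows, or a handful of interval labels instead of raw numbers) while yearly totals and coarse value ranges are preserved.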