CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE APR2016
ASSESSMENT_CODE BT9001_APR2016
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
34709
QUESTION_TEXT
Write a note on the process of knowledge discovery.
SCHEME OF EVALUATION
KDD (knowledge discovery in databases) is the overall process of discovering
useful knowledge from data. Its steps are:
• Develop an understanding of the application domain
• Create a target dataset
• Data cleaning and preprocessing
• Data Reduction and projection
• Selection of appropriate data mining task
• Selection of algorithm
• Data Mining
• Interpretation and visualization
• Consolidating discovered knowledge
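The steps listed above can be read as a pipeline in which each stage feeds the next. The following is a minimal sketch only: the records, the attribute names and the trivial mean-threshold "model" are hypothetical and were chosen for illustration, not taken from the syllabus.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value.
RAW = [
    (1, 25, 120.0),
    (2, 31, None),
    (3, 47, 560.0),
    (4, 52, 610.0),
    (5, 23, 90.0),
]

def select_target(records):
    """Create the target dataset: keep only the attributes of interest."""
    return [(age, spend) for _, age, spend in records]

def clean(records):
    """Data cleaning: drop rows that contain a missing value."""
    return [r for r in records if None not in r]

def reduce_and_project(records):
    """Data reduction and projection: keep a single feature (spend)."""
    return [spend for _, spend in records]

def mine(values):
    """Data mining: a deliberately trivial model that splits at the mean."""
    threshold = mean(values)
    return {
        "threshold": threshold,
        "high": [v for v in values if v > threshold],
        "low": [v for v in values if v <= threshold],
    }

def interpret(pattern):
    """Interpretation: turn the discovered pattern into a readable statement."""
    return (f"{len(pattern['high'])} high spenders and "
            f"{len(pattern['low'])} low spenders around {pattern['threshold']:.1f}")

def kdd(records):
    """Run the stages in the order given in the list of steps."""
    return interpret(mine(reduce_and_project(clean(select_target(records)))))

print(kdd(RAW))
```

In practice the process is iterative: a poor result at the interpretation stage usually sends the analyst back to cleaning or to the choice of algorithm.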
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
34711
QUESTION_TEXT
Write a note on Business Intelligence infrastructure.
SCHEME OF EVALUATION
1. The information warehouse layer
2. Customer intelligence
3. Business decisions
4. Operation intelligence
5. Application layer
6. Business Intelligence requirements
7. Presenting
8. Portals can reduce the overall infrastructure cost
9. Web-based portals
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73678
QUESTION_TEXT
What is the need of data mining in security systems?
SCHEME OF EVALUATION
1. Data mining offers a convenient way to monitor large computer networks
and so helps to secure them. The data mining system does this by
developing a profile of the typical activities of each user in the network.
2. To improve the accuracy, data mining programs are used to analyze audit
data and extract features that can distinguish normal activities from intrusions.
3. To improve efficiency, the computational costs of features are analyzed
and a multiple-model cost-based approach is used to produce detection
models with low cost and high accuracy.
4. To improve usability, adaptive learning algorithms are used to facilitate
model construction and incremental updates. Unsupervised anomaly detection
algorithms are used to reduce the reliance on labeled data.
5. Security of network systems is becoming increasingly important. Intrusion
detection systems have thus become a critical technology.
6. IDS models generalize from both known attacks and normal behavior in
order to detect unknown attacks.
7. Data mining based IDSs tend to have higher false positive rates than traditional
handcrafted signature-based methods, which can make them unusable in real
systems unless accuracy is improved.
8. The anomaly detection algorithms explore the use of information theoretic
measures, i.e. entropy, conditional entropy, relative entropy, information gain and
information cost to capture intrinsic characteristics of normal data and use such
measures to guide the process of building and evaluating anomaly detection models.
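The information-theoretic measures named in point 8 can be computed directly from event counts. The sketch below uses the standard textbook definitions of entropy, conditional entropy, information gain and relative entropy (Kullback-Leibler divergence). The function names and the tiny audit sample are assumptions made for illustration. Information cost is omitted because the text does not define it.

```python
from collections import Counter
from math import log2

def entropy(items):
    """H(X) in bits, estimated from the observed frequencies."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def conditional_entropy(xs, ys):
    """H(X | Y): the entropy of xs that remains once ys is known."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(xs, ys):
    """Reduction in the entropy of xs obtained by observing ys."""
    return entropy(xs) - conditional_entropy(xs, ys)

def relative_entropy(p, q):
    """D(p || q) for distributions given as dicts of event -> probability.

    Assumes q[e] > 0 wherever p[e] > 0.
    """
    return sum(pe * log2(pe / q[e]) for e, pe in p.items() if pe > 0)

# Hypothetical audit sample: a class label and one feature per event.
labels = ["normal", "normal", "normal", "attack"]
traffic = ["low", "low", "low", "high"]

print(round(entropy(labels), 4))
print(round(information_gain(labels, traffic), 4))
```

A feature with a high information gain separates normal activity from intrusions well. A large relative entropy between a training distribution and live traffic suggests that the model of normal behavior no longer fits.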
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
73680
QUESTION_TEXT
What is un-supervised learning? When is it required for analysis?
SCHEME OF EVALUATION
Unsupervised learning is a class of problems in which one seeks to determine
how the data are organized. Unsupervised learning is closely related to the
problem of density estimation in statistics. (2 M)
Clustering is one form of unsupervised learning. Unsupervised learning is
required whenever records must be grouped and no labeled examples are
available to define the groups.
Clustering is the method by which like records are grouped together. Usually
this is done to give the end user a high level view of what is going on in the
database. Clustering is sometimes used to mean segmentation, which most
marketing people will tell you is useful for coming up with a bird's-eye view of
the business. Clustering is a form of learning by observation rather than learning by
examples.
Clustering is the classification of similar objects into different groups or more
precisely, the partitioning of a data set into subsets, so that the data in each
subset share common characteristics. (4 M)
The resulting groups can be used to help understand the business better and also exploited to
improve future performance through predictive analytics. For example, data
mining can warn you that there’s a high probability that a specific customer
won’t pay on time. This is based on an analysis of customers with similar
characteristics.
Four clustering methods are commonly listed:
1. K-means
2. Hierarchical
3. Agglomerative (hierarchical, bottom-up merging)
4. Divisive (hierarchical, top-down splitting) (4 M)
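As an illustration of the first method in the list, the following is a minimal k-means sketch in plain Python. It assumes numeric points given as tuples and, for simplicity, takes the first k points as the initial centroids. Real implementations usually pick the starting centroids at random or with a seeding heuristic. The sample points are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeated assign-and-update steps."""
    # Assumption for this sketch: the first k points are the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional records with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied at any point. The groups emerge only from the distances between records, which is what makes the method unsupervised.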
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
118705
QUESTION_TEXT
Explain Data Transformation process and Data Reduction process.
SCHEME OF EVALUATION
Data transformation process:
In data transformation, the data are transformed or consolidated into
forms appropriate for mining. Data transformation can involve the
following:
• Smoothing
• Aggregation
• Generalization
• Normalization
• Attribute construction
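Three of the transformations listed above can be sketched in a few lines of Python. The function names and sample values are assumptions made for illustration. Smoothing is shown as smoothing by bin means and normalization as min-max scaling, each of which is only one of several possible techniques.

```python
def smooth_by_bin_means(values, bin_size):
    """Smoothing: sort, split into equal-size bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([sum(bin_values) / len(bin_values)] * len(bin_values))
    return smoothed

def aggregate_by_key(rows):
    """Aggregation: total the amounts per key, for example sales per quarter."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # all values equal, nothing to scale
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
print(aggregate_by_key([("Q1", 100), ("Q1", 50), ("Q2", 70)]))
print(min_max_normalize([10, 20, 30]))
```

Normalization matters most for distance-based methods such as clustering, where an attribute with a large numeric range would otherwise dominate the result.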
Data Reduction:
Data reduction techniques can be applied to obtain a reduced
representation of the data set that is much smaller in volume, yet closely
maintains the integrity of the original data. That is, mining on the
reduced data set should be more efficient yet produce the same (or
almost the same) analytical results.
Strategies for data reduction include the following:
1. Data cube aggregation
2. Dimension reduction
3. Data compression
4. Numerosity reduction
5. Discretization and concept hierarchy generation
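Two of these strategies are easy to show in code. The sketch below assumes simple random sampling without replacement as the numerosity reduction technique and equal-width binning as the discretization technique. Both are common choices, but neither is the only one. The function names, the seed and the sample values are hypothetical.

```python
import random

def sample_without_replacement(records, n, seed=0):
    """Numerosity reduction: keep a random sample of n records.

    The fixed seed is an assumption made so the sketch is repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(records, n)

def equal_width_bins(values, k):
    """Discretization: map each value to one of k equal-width interval indexes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        if width == 0:
            labels.append(0)  # all values identical, single interval
        else:
            # The maximum value falls on the upper edge, so clamp it into the last bin.
            labels.append(min(int((v - lo) / width), k - 1))
    return labels

ages = [0, 5, 10, 15, 20]
print(equal_width_bins(ages, 2))

reduced = sample_without_replacement(list(range(100)), 10)
print(len(reduced))
```

The reduced data set is smaller in volume, and mining on it is expected to give the same or almost the same analytical results only if the sample or the bins still reflect the original distribution.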
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
118707
QUESTION_TEXT
Briefly explain the applications of data mining with reference to business
applications and scientific applications.
SCHEME OF EVALUATION
Business Applications: Data mining has emerged this decade as a key
technology for areas such as business intelligence, marketing, and so
forth. The application and business domains considered here include
telecommunications, medical devices, space
science (vehicle health management and scientific instrumentation),
targeted marketing, and mining…………….(Explain in detail)
Scientific applications using data mining: Recent progress in scientific
and engineering applications has accumulated huge volumes of high-dimensional
data, stream data, unstructured and semi-structured data, and
spatial and temporal data. Highly scalable and sophisticated data mining
tools for such applications represent one of the most active research
frontiers of data mining. Here, we outline the related challenges in
several emerging domains……………. (Explain in detail)