UNIT – IV
Part – A
1. Define classification.
Data classification is a two-step process. In the first step, a model is
built describing a predetermined set of data classes or concepts. In the
second step, the predictive accuracy of the model is estimated and, if
acceptable, the model is used for classification of new data.
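The two steps above can be sketched in pure Python. This is a toy illustration, not an algorithm from the text: the attribute values and the "most frequent class per value" model are invented for the example.

```python
# Toy sketch of the two-step classification process (data invented).

# Step 1: learn a model from training samples with known class labels.
# The "model" here is simply the most frequent class per attribute value.
train = [("<=30", "no"), ("<=30", "no"), ("31..40", "yes"), (">40", "yes")]

by_value = {}
for age, label in train:
    by_value.setdefault(age, []).append(label)
model = {age: max(set(ls), key=ls.count) for age, ls in by_value.items()}

# Step 2: use the model to classify a sample whose class label is unknown.
print(model["<=30"])  # no
```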
2. Define prediction.
Prediction can be viewed as the construction and use of a model to
assess the class of an unlabelled sample, or to assess the value of an
attribute that a given sample is likely to have.
3. What is the difference between classification and regression?
Classification is used to predict discrete or nominal values, while
regression is used to predict continuous or ordered values.
4. What is supervised and unsupervised learning?
If the class label of each training sample is provided, it is called
supervised learning.
If the class label of each training sample is not known and the
number or set of classes to be learned is not known in advance, it is called
unsupervised learning.
5. What is accuracy of a model?
The accuracy of a model on a given test is the percentage of test set
samples that are correctly classified by the model. If accuracy is considered
acceptable, the model can be used to classify future data tuples or objects for
which the class label is not known.
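The accuracy computation is a direct percentage; a minimal sketch (the predicted and actual labels below are invented for illustration):

```python
# Accuracy = percentage of test-set samples whose predicted class
# matches the known class label (labels invented for this example).
predicted = ["yes", "no", "yes", "yes", "no"]
actual    = ["yes", "no", "no",  "yes", "no"]

correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = 100.0 * correct / len(actual)
print(accuracy)  # 80.0
```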
6. What are the preprocessing steps for classification and prediction?
The following preprocessing steps may be applied:
- Data cleaning
- Relevance analysis
- Data transformation
7. How are various classification methods compared?
- Predictive accuracy
- Speed
- Robustness
- Scalability
- Interpretability
8. What is a decision tree?
A decision tree is a flowchart-like tree structure, where each internal
node represents a test on an attribute, each branch represents an outcome of
the test, and each leaf node represents a class or class distribution. The
topmost node in the tree is the root node.
9. What is tree pruning?
Tree pruning attempts to identify and remove branches that reflect
noise or outliers in the training data with the goal of improving classification
accuracy on unseen data.
10. Define information gain.
The information gain measure is used to select the test attribute at
each node in the tree. Such a measure is referred to as an attribute
selection measure or measure of goodness of split. The attribute with the
highest information gain is chosen as the test attribute for the current node.
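Information gain is the expected information (entropy) of the class distribution minus the weighted entropy after partitioning on a test attribute. A small self-contained sketch (the 9/5 class counts and the two-way partition are invented for illustration):

```python
import math

def entropy(labels):
    """Expected information (entropy) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

# Invented training set: 9 "yes" and 5 "no" samples overall.
labels = ["yes"] * 9 + ["no"] * 5

# Partition induced by a hypothetical test attribute with two values.
partitions = [["yes"] * 6 + ["no"] * 1,
              ["yes"] * 3 + ["no"] * 4]

# Gain = entropy before split - weighted entropy after split.
gain = entropy(labels) - sum(len(p) / len(labels) * entropy(p)
                             for p in partitions)
print(round(gain, 3))  # 0.152
```

The attribute whose partition yields the largest such gain would be chosen as the test attribute for the current node.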
11. How does tree pruning work?
There are two approaches to tree pruning:
a) In the prepruning approach, a tree is "pruned" by halting its construction
early, e.g. by deciding not to further split the training samples at a
given node. Upon halting, the node becomes a leaf.
b) In the postpruning approach, branches are removed from a fully grown
tree. A subtree at a given node is pruned by replacing it with a leaf,
labeled with the most frequent class among the subtree's samples.
12. How are classification rules extracted from a decision tree?
The knowledge represented in a decision tree can be extracted
and represented in the form of classification IF-THEN rules. One rule is
created for each path from the root to a leaf node.
E.g. IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
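Such rules can be represented and fired programmatically. A toy sketch (the rule conditions and class labels below are illustrative, not authoritative):

```python
# Each rule is (conditions, class label); one rule per root-to-leaf path.
rules = [
    ({"age": "<=30", "student": "yes"}, "yes"),
    ({"age": "<=30", "student": "no"}, "no"),
]

def classify(sample, rules):
    # Fire the first rule whose conditions all match the sample.
    for conditions, label in rules:
        if all(sample.get(attr) == val for attr, val in conditions.items()):
            return label
    return None  # no rule covers this sample

print(classify({"age": "<=30", "student": "yes"}, rules))  # yes
```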
13. What are the problems to which a decision tree is prone?
Decision trees are prone to the following problems:
1) Fragmentation – the number of samples at a given branch becomes
so small as to be statistically insignificant.
2) Repetition – occurs when an attribute is repeatedly tested along a
given branch of the tree.
3) Replication – duplicate subtrees exist within the tree.
14. What is attribute construction?
Attribute construction is an approach for dealing with the problems of
fragmentation, repetition and replication in a decision tree. In this method, the
limited representation of the given attributes is improved by creating new
attributes based on the existing ones.
15. What is the scalability issue in decision tree algorithms?
Most decision tree algorithms have the restriction that the training
samples should reside in main memory. In data mining applications, very
large training sets of millions of samples are common. Hence, this restriction
limits the scalability of such algorithms.
16. What is an exception threshold?
In decision tree induction, the partitioning of a subset is continued until
the proportion of samples in the given subset falls below a threshold value,
known as the exception threshold.
17. What are Bayesian classifiers?
Bayesian classifiers are statistical classifiers. They can predict
class membership probabilities such as the probability that a given sample
belongs to a particular class.
18. What is class conditional independence?
Naïve Bayesian classifiers assume that the effect of an attribute
value on a given class is independent of the values of the other attributes.
This assumption is called class conditional independence.
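Under this assumption, P(X|C) factors into a product of per-attribute probabilities, which makes the classification computation cheap. A toy numeric sketch (the priors and conditional probabilities below are invented counts for illustration):

```python
# Toy naive Bayesian computation for a sample with age="<=30",
# student="yes" (all probabilities invented for this example).
priors = {"yes": 9 / 14, "no": 5 / 14}        # P(class)
p_age = {"yes": 2 / 9, "no": 3 / 5}           # P(age="<=30" | class)
p_student = {"yes": 6 / 9, "no": 1 / 5}       # P(student="yes" | class)

# Class conditional independence: P(X|C) = product over attributes.
scores = {c: priors[c] * p_age[c] * p_student[c] for c in priors}
best = max(scores, key=scores.get)
print(best)  # yes
```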
19. What are the two components of a belief network?
The two components of a belief network are
1) A directed acyclic graph, where each node represents a random variable
and each arc represents a probabilistic dependence
2) A conditional probability table (CPT) for each variable
20. What are the three steps in the learning algorithm of Bayesian network?
a) Compute the gradients
b) Take a small step in the direction of the gradient
c) Normalize the weights
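One iteration of steps (b) and (c) can be sketched as follows. The weights, gradients, and learning rate below are invented placeholders; in a real implementation the gradients in step (a) would be derived from the network's CPT entries.

```python
# Sketch of one gradient-ascent iteration (all numbers invented).
weights = [0.2, 0.5, 0.3]
gradients = [0.1, -0.2, 0.05]   # step (a): assumed already computed
learning_rate = 0.1

# Step (b): take a small step in the direction of the gradient.
weights = [w + learning_rate * g for w, g in zip(weights, gradients)]

# Step (c): renormalize so the weights stay a valid distribution.
total = sum(weights)
weights = [w / total for w in weights]
print(round(sum(weights), 6))  # 1.0
```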
Part – B
1. Explain classification by decision tree induction in detail. (16)
2. What is Bayesian classification? Explain in detail. (16)
3. a) Explain the different classification methods. (8)
b) What is prediction? Give an account of different types of regression. (8)
4. What is partitioning? Explain different partitioning methods for cluster
analysis.
5. Explain the various hierarchical methods for cluster analysis. (16)