SMU-DDE-Assignments-Scheme of Evaluation
PROGRAM: MSC BIOINFORMATICS
SEMESTER: 4
SUBJECT CODE & NAME: BI0041 DATA WAREHOUSING AND DATA MINING
BK ID: B0038
DRIVE: WINTER 2015
MARKS: 60

Q. No | Criteria | Marks | Total Marks
Q1. Define a border set. Prove that every subset of any item set must contain either a frequent set
or a border set. (Unit 4, Page No. 104)
Total marks: 10
A. Definition of border set: An item set is a border set if it is not a
frequent set, but all its proper subsets are frequent sets. (2 marks)
Explanation of why every subset of any item set must contain either a
frequent set or a border set. (8 marks)
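The border set definition can be checked mechanically. The sketch below is illustrative only: the transactions, the item names, and the minimum support count `min_count` are made up, and the helper names are hypothetical. It also hints at the usual argument: an infrequent item set that is not itself a border set has an infrequent proper subset, and repeating this step on ever smaller sets must stop at a border set.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def is_frequent(itemset, transactions, min_count):
    return support_count(itemset, transactions) >= min_count

def is_border_set(itemset, transactions, min_count):
    """True if `itemset` is not frequent but all its proper subsets are."""
    if is_frequent(itemset, transactions, min_count):
        return False
    items = sorted(itemset)
    # Sizes 0 .. len(items) - 1 enumerate exactly the proper subsets.
    for size in range(len(items)):
        for subset in combinations(items, size):
            if not is_frequent(frozenset(subset), transactions, min_count):
                return False
    return True

# Made-up transaction database (assumption, not from the course text).
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "d"}),
]
min_count = 2  # assumed minimum support count

print(is_border_set(frozenset({"a", "b", "c"}), transactions, min_count))  # True
print(is_border_set(frozenset({"a", "d"}), transactions, min_count))       # False
print(is_border_set(frozenset({"d"}), transactions, min_count))            # True
```

Here {a, d} is infrequent but not a border set, because its subset {d} is already infrequent; {d} itself is a border set, since its only proper subset is the empty set, which every transaction contains.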
Q2. Discuss the following with suitable examples in the context of Data Cleaning.
(Unit 3, Page No. 73)
a. Missing Values
b. Noisy Data
Total marks: 10
A. a. Five methods of filling the missing values: (5 marks)
• Ignore the tuple: This is usually done when the class label is
missing (assuming the mining task involves classification or
description). This method is not very effective unless the
tuple contains several attributes with missing values. It is
especially poor when the percentage of missing values per
attribute varies considerably.
• Fill in the missing value manually: In general, this approach is
time consuming and may not be feasible given a large data set
with many missing values.
• Use a global constant to fill in the missing value: Replace all
missing attribute values by the same constant, such as a label
like “Unknown” or −∞. If missing values are replaced by, say,
“Unknown,” then the mining program may mistakenly think
that they form an interesting concept, since they all have a
value in common, that of “Unknown.” Hence, although this
method is simple, it is not recommended.
• Use the attribute mean to fill in the missing value: For
example, suppose that the average income of All Electronics
customers is $28,000. Use this value to replace the missing
value for income.
• Use the attribute mean for all samples belonging to the same
class as the given tuple.
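Three of the five methods above (global constant, attribute mean, and class-wise attribute mean) can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the income values, the class labels, the use of `None` as the missing-value marker, and the function names are all assumptions made for the example.

```python
from statistics import mean

MISSING = None  # assumed marker for a missing value in this sketch

def fill_with_constant(values, constant="Unknown"):
    """Replace every missing value by the same global constant."""
    return [constant if v is MISSING else v for v in values]

def fill_with_mean(values):
    """Replace every missing value by the mean of the known values."""
    known = [v for v in values if v is not MISSING]
    attribute_mean = mean(known)
    return [attribute_mean if v is MISSING else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace a missing value by the mean of known values in the same class.

    Assumes every class has at least one known value.
    """
    by_class = {}
    for v, c in zip(values, labels):
        if v is not MISSING:
            by_class.setdefault(c, []).append(v)
    class_mean = {c: mean(vs) for c, vs in by_class.items()}
    return [class_mean[c] if v is MISSING else v for v, c in zip(values, labels)]

# Made-up data: incomes with two missing entries and a class label per tuple.
incomes = [20000, None, 36000, 50000, None, 70000]
labels = ["low", "low", "low", "high", "high", "high"]

print(fill_with_constant(incomes))
print(fill_with_mean(incomes))                # missing entries become 44000
print(fill_with_class_mean(incomes, labels))  # 28000 for "low", 60000 for "high"
```

The class-wise variant fills the two gaps with different values, which is why it is usually preferred over a single attribute mean when the classes differ strongly.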
b. Explanation of noisy data in the context of data cleaning, with an
example: (5 marks)
Noise is random error or variance in a measured variable.
Data smoothing techniques to remove noise are:
• Binning
• Clustering
• Combined computer and human inspection
• Regression
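As one possible example for binning, the sketch below smooths sorted values by bin means using equal-frequency bins. The price list and the bin size of 3 are illustrative assumptions, and `smooth_by_bin_means` is a hypothetical helper name.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins,
    and replace each value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Made-up sorted prices, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value is replaced by a local average of its neighbours, so small random fluctuations inside a bin disappear while the overall trend of the data is kept.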
Q3. Explain any five major differences between operational database systems and data
warehouses. (Unit 2, Page No. 43)
Total marks: 10
A. Differences between operational database systems and data
warehouses in terms of the following parameters: (5 x 2 marks)
• Users and system orientation
• Data contents
• Database design
• View
• Access patterns
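The "access patterns" parameter can be illustrated with a small sketch. It assumes the usual contrast (short, atomic update transactions on the operational side; mostly read-only, summarizing queries on the warehouse side) and uses a single in-memory SQLite table purely for convenience. The `sales` table, its columns, and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("north", 2014, 100.0), ("north", 2015, 150.0),
     ("south", 2014, 80.0), ("south", 2015, 120.0)],
)

# Operational style: a short, atomic transaction that touches one record.
with conn:
    conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))

# Warehouse style: a read-only query that summarizes many records.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 260.0), ('south', 200.0)]
```

A real warehouse would normally be a separate, historical, consolidated store rather than the same table; the example only contrasts the two query styles.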
Q4. Define data reduction. Explain the techniques of data reduction. (Unit 3, Page No. 81)
Total marks: 10
A. Definition of data reduction: (1 mark)
Data reduction is a technique applied to obtain a reduced
representation of the data set that is much smaller in volume, yet
closely maintains the integrity of the original data.
Explanation of the following strategies for data reduction: (9 marks)
1. Data cube aggregation, where aggregation operations are applied
to the data in the construction of a data cube.
2. Dimension reduction, where irrelevant, weakly relevant, or
redundant attributes or dimensions may be detected and removed.
3. Data compression, where encoding mechanisms are used to reduce
the data set size.
4. Numerosity reduction, where the data are replaced or estimated by
alternative, smaller data representations such as parametric models
(which need store only the model parameters instead of the actual
data), or nonparametric methods such as clustering, sampling, and the
use of histograms.
5. Discretization and concept hierarchy generation, where raw data
values for attributes are replaced by ranges or higher conceptual
levels.
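Two of these strategies are easy to sketch: aggregation (strategy 1) rolls quarterly figures up to yearly totals, and an equal-width histogram (one form of numerosity reduction, strategy 4) replaces raw values by bucket counts. The data, the bucket count, and the function names below are illustrative assumptions.

```python
from collections import defaultdict

def aggregate_by_year(quarterly_sales):
    """Roll up {(year, quarter): amount} to {year: total amount}."""
    totals = defaultdict(float)
    for (year, _quarter), amount in quarterly_sales.items():
        totals[year] += amount
    return dict(totals)

def equal_width_histogram(values, num_buckets):
    """Count how many values fall into each of `num_buckets` equal-width ranges."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: everything lands in the first bucket
        return [len(values)] + [0] * (num_buckets - 1)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts

# Made-up quarterly sales figures.
quarterly_sales = {
    (2014, 1): 100, (2014, 2): 120, (2014, 3): 90, (2014, 4): 110,
    (2015, 1): 130, (2015, 2): 140,
}
print(aggregate_by_year(quarterly_sales))  # {2014: 420.0, 2015: 270.0}

# Made-up attribute values, reduced from 12 numbers to 4 bucket counts.
values = [1, 1, 5, 5, 5, 8, 10, 10, 14, 15, 15, 20]
print(equal_width_histogram(values, 4))    # [5, 3, 3, 1]
```

In both cases the reduced form is much smaller than the input yet still answers the summary questions (yearly totals, rough value distribution) that the full data would.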
Q5. Define data mining. Explain the major issues in data mining.
(Unit 1, Page No. 11, 27)
Total marks: 10
A. Definition of data mining: (3 marks)
Data mining refers to extracting or “mining” knowledge from large
amounts of data.
Explanation of the major issues in data mining: (7 marks)
• Mining different kinds of knowledge in databases
• Interactive mining of knowledge at multiple levels of
abstraction
• Incorporation of background knowledge
• Data mining query languages and ad hoc data mining
• Presentation and visualization of data mining results
• Handling noisy or incomplete data
• Pattern evaluation
Q6. Describe any five requirements for clustering in data mining.
(Unit 6, Page No. 179)
Total marks: 10
A. Explanation of any five of the following requirements of clustering in
data mining: (5 x 2 marks)
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to determine
input parameters
• Ability to deal with noisy data
• Insensitivity to the order of input records
• High dimensionality
• Constraint-based clustering
*A: Answer
Note: Please provide keywords, short answers, specific terms, and specific examples (wherever necessary).
***********