ST. ANN’S COLLEGE OF ENGINEERING & TECHNOLOGY: CHIRALA
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
LESSON PLAN
Subject: Data Warehousing And Data Mining
Name: CH. VEERABABU
Academic Year: 2015-16
Year & Sem/Section: IV-I-SEM 'A'

Unit No. | Sub Topic Names | No. of classes required
I | Introduction to DWDM: What is Data Mining, Motivating Challenges, Origins of Data Mining, Data Mining Tasks, Types of Data: Attributes and Measurements, Types of Data Sets, Data Quality | 06
II | Data Preprocessing, Measures of Similarity and Dissimilarity: Basics, Similarity and Dissimilarity between Simple Attributes, Dissimilarities between Data Objects, Examples of Proximity Measures, Similarity Measures for Binary Data, Jaccard Coefficient, Cosine Similarity, Extended Jaccard Coefficient, Correlation, Exploring Data: Data Set, Summary Statistics | 10
III | Basic Concepts of Data Warehouse, Data Warehousing Modelling: Data Cube and OLAP, Data Warehouse Implementation: Efficient Data Cube Computation, Partial Materialization, Indexing OLAP Data, Efficient Processing of OLAP Queries | 05
IV | Basic Concepts of Classification, General Approach to Solving a Classification Problem, Working of Decision Tree Induction: Working of a Decision Tree, Building a Decision Tree, Methods for Expressing Attribute Test Conditions, Measures for Selecting the Best Split, Algorithm for Decision Tree Induction, Model Overfitting: Due to Presence of Noise, Due to Lack of Representative Samples, Evaluating the Performance of a Classifier: Holdout Method, Random Subsampling, Cross-Validation, Bootstrap | 09
V | Classification Alternative Techniques: Bayesian Classifier: Bayes Theorem, Using Bayes Theorem for Classification, Naive Bayes Classifier, Bayes Error Rate, Model Representation, Model Building | 06
VI | Association Analysis: Problem Definition, Frequent Item Set Generation: Apriori Principle, Frequent Item Set Generation in the Apriori Algorithm, Candidate Generation and Pruning, Support Counting, Rule Generation, Compact Representation of Frequent Item Sets, FP-Growth Algorithms | 07
VII | Overview: Types of Clustering, Basic K-Means, K-Means Additional Issues, Bisecting K-Means, K-Means and Different Types of Clusters, Strengths and Weaknesses, K-Means as an Optimization Problem | 07
VIII | Agglomerative Hierarchical Clustering, Basic Agglomerative Hierarchical Clustering Algorithm, Specific Techniques, DBSCAN, Traditional Density: Center-Based Approach, Strengths and Weaknesses | 06
TOTAL | | 56
TEXT BOOKS:
1. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson.
2. Data Mining: Concepts and Techniques, 3/e, Jiawei Han, Micheline Kamber, Elsevier.
REFERENCE BOOKS:
1. Introduction to Data Mining with Case Studies, 2nd ed., G. K. Gupta, PHI.
2. Data Mining: Introductory and Advanced Topics, Dunham, Sridhar, Pearson.
3. Data Warehousing, Data Mining & OLAP, Alex Berson, Stephen J. Smith, TMH.
4. Data Mining Theory and Practice, Soman, Diwakar, Ajay, PHI, 2006.
FACULTY MEMBER
HEAD OF THE DEPARTMENT