UNIT Marks
. What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
. What is the significance of a subject-oriented data warehouse?
It focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing, and provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

. Why do we use an integrated data warehouse?
It is constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources. When data is moved to the warehouse, it is converted.
. What is the role of the time-variant feature in a data warehouse?
The time horizon for the data warehouse is significantly longer than that of operational systems: an operational database holds current-value data, while data warehouse data provide information from a historical perspective (e.g., the past several years). Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a time element.

. What is meant by the nonvolatile nature of a data warehouse?
The warehouse is a physically separate store of data transformed from the operational environment. Operational updates of data do not occur in the data warehouse environment.
. State the difference between the data warehouse approach and traditional heterogeneous DB integration.
Traditional heterogeneous DB integration: build wrappers/mediators on top of the heterogeneous databases; a query-driven approach, in which a dictionary is used to translate a query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
Data warehouse: update-driven and high-performance; information from heterogeneous sources is integrated in advance and stored in the warehouse for direct query and analysis.
. List the distinct features of OLTP vs. OLAP.
User and system orientation: customer vs. market.
Data contents: current, detailed vs. historical, consolidated.
Database design: ER + application vs. star + subject.
View: current, local vs. evolutionary, integrated.
Access patterns: update vs. read-only but complex queries.
. Why do we need a separate data warehouse?
Different functions and different data:
missing data: decision support requires historical data, which operational DBs do not typically maintain;
data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources;
data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
. Give the conceptual modeling of a data warehouse.
Modeling data warehouses: dimensions and measures.
Star schema: a fact table in the middle connected to a set of dimension tables.
Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore called a galaxy schema or fact constellation.
. Define the distributive category of data warehouse measures.
A measure is distributive if the result derived by applying the function to n aggregate values is the same as that derived by applying the function to all the data without partitioning (e.g., count(), sum(), min(), max()).

. Define the algebraic category of data warehouse measures.
A measure is algebraic if it can be computed by an algebraic function with M arguments, where M is a bounded integer, each of which is obtained by applying a distributive aggregate function (e.g., average() = sum()/count()).

. Define the holistic category of data warehouse measures.
A measure is holistic if there is no constant bound on the storage size needed to describe a subaggregate (e.g., median()).
. List the OLAP operations and their functionality.
Roll-up (drill-up): summarize data by climbing up a concept hierarchy or by dimension reduction.
Drill-down (roll-down): the reverse of roll-up; move from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.
Slice and dice: project and select.
Pivot (rotate): reorient the cube, visualization, 3-D to a series of 2-D planes.
Other operations: drill across (involving more than one fact table); drill through (through the bottom level of the cube to its back-end relational tables, using SQL).

. What are the different views regarding the data warehouse design process?
Four views:
a. Top-down view: allows selection of the relevant information necessary for the data warehouse.
b. Data source view: exposes the information being captured, stored, and managed by operational systems.
c. Data warehouse view: consists of fact tables and dimension tables.
d. Business query view: sees the perspectives of data in the warehouse from the view of the end user.

. Define enterprise warehouse.
An enterprise warehouse collects all of the information about subjects spanning the entire organization.
. What is meant by a data mart?
A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific selected groups, such as a marketing data mart. Independent vs. dependent (sourced directly from the warehouse) data marts.

. Define virtual warehouse.
A set of views over operational databases. Only some of the possible summary views may be materialized.

. What are the back-end tools and utilities of a data warehouse?
Data extraction: get data from multiple, heterogeneous, and external sources.
Data cleaning: detect errors in the data and rectify them when possible.
Data transformation: convert data from legacy or host format to warehouse format.
Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions.
Refresh: propagate the updates from the data sources to the warehouse.

. What are the applications of data warehousing?
Three kinds of data warehouse applications:
Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, and graphs.
Analytical processing: multidimensional analysis of data warehouse data; supports basic OLAP operations: slice-dice, drilling, pivoting.
Data mining: knowledge discovery from hidden patterns; supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.

. Why do we need online analytical mining?
High quality of data in data warehouses: a DW contains integrated, consistent, cleaned data.
Available information processing structure surrounding data warehouses: ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools.
OLAP-based exploratory data analysis: mining with drilling, dicing, pivoting.
Online selection of data mining functions: integration and swapping of multiple mining functions, algorithms, and tasks.

Marks
. Write short notes on data warehouse metadata.
. Explain the conceptual modeling of data warehouses.
. Give the architecture of a data warehouse and explain its usage.
. Explain the operations performed on a data warehouse, with examples.
. State the difference between OLTP and OLAP in detail.
UNIT
. What is meant by data cleaning?
Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.

. List the multidimensional measures of data quality.
i) Accuracy ii) Completeness iii) Consistency iv) Timeliness v) Believability vi) Value added vii) Interpretability viii) Accessibility

. Why do we need data preprocessing?
Data in the real world is dirty:
i) incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data;
ii) noisy: containing errors or outliers;
iii) inconsistent: containing discrepancies in codes or names.

. Define data integration.
Integration of multiple databases, data cubes, or files.

. Why do we need data transformation?
Data transformation converts the data into forms appropriate for mining:
o min-max normalization
o z-score normalization
o normalization by decimal scaling
o attribute construction: new attributes constructed from the given ones

. What is meant by data discretization?
It can be defined as part of data reduction, but with particular importance, especially for numerical data.

. Define concept hierarchy.
It reduces the data by collecting and replacing low-level concepts (such as numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).

. Define data reduction.
Data reduction obtains a reduced representation of the data that is much smaller in volume but produces the same or similar analytical results.

. Why do we need data mining primitives and languages?
Leaving mining entirely to the system is unrealistic because the patterns found could be too many but uninteresting; with primitives and a query language, the user directs what is to be mined.

. What is the discretization process involved in data preprocessing?
It reduces the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
. What tasks should be considered in the design of GUIs based on a data mining query language?
i) Data collection and data mining query composition
ii) Presentation of discovered patterns
iii) Hierarchy specification and manipulation
iv) Manipulation of data mining primitives
v) Interactive multilevel mining

. Define data mining query language (DMQL).
i) A DMQL can provide the ability to support ad-hoc and interactive data mining.
ii) By providing a standardized language like SQL, we
   a) hope to achieve an effect similar to the one SQL has had on relational databases,
   b) lay a foundation for system development and evolution, and
   c) facilitate information exchange, technology transfer, commercialization, and wide acceptance.
A data mining system communicates with users by expressing these primitives in a data mining query language.

. What are the types of knowledge to be mined?
Descriptive vs. predictive data mining:
i) Descriptive mining: describes concepts or task-relevant data sets in concise, summarative, informative, discriminative forms.
ii) Predictive mining: based on data and analysis, constructs models for the database and predicts the trends and properties of unknown data.

. List the five primitives for the specification of a data mining task.
i) task-relevant data
ii) the kind of knowledge to be mined
iii) background knowledge
iv) interestingness measures
v) presentation and visualization techniques to be used for displaying the discovered patterns

. What are the types of coupling of a data mining system with a DB/DW system?
No coupling: flat-file processing; not recommended.
Loose coupling: fetching data from the DB/DW.
Semi-tight coupling: enhanced DM performance; provides efficient implementations of a few data mining primitives in the DB/DW system, e.g., sorting, indexing, aggregation, histogram analysis, multiway join, and precomputation of some statistical functions.
Tight coupling: a uniform information processing environment; DM is smoothly integrated into the DB/DW system, and a mining query is optimized based on mining query analysis, data structures, indexing, and query processing methods.

. What is the strength of data characterization?
i) An efficient implementation of data generalization.
ii) Computation of various kinds of measures, e.g., count(), sum(), average(), max().
iii) Generalization and specialization can be performed on a data cube by roll-up and drill-down.
. Give the basic principle of attribute-oriented induction.
a. Collect the task-relevant data (the initial relation) using a relational database query.
b. Perform generalization by attribute removal or attribute generalization.
c. Apply aggregation by merging identical, generalized tuples and accumulating their respective counts.
d. Interactive presentation with users.

. How is attribute-oriented induction done?
Attribute removal: remove attribute A if there is a large set of distinct values for A but either there is no generalization operator on A, or A's higher-level concepts are expressed in terms of other attributes.
Attribute generalization: if there is a large set of distinct values for A and there exists a set of generalization operators on A, then select an operator and generalize A.
Attribute-threshold control: typical, specified, or default.

. What is the basic algorithm for attribute-oriented induction?
InitialRel: query processing of the task-relevant data, deriving the initial relation.
PreGen: based on the analysis of the number of distinct values in each attribute, determine a generalization plan for each attribute: removal, or how high to generalize.
PrimeGen: based on the PreGen plan, perform generalization to the right level to derive a prime generalized relation, accumulating the counts.
Presentation: user interaction (adjust levels by drilling); presentation of data summarization at multiple levels of abstraction; mapping into rules, cross tabs, visualization presentations.

. Give the list of limitations of data characterization.
i) It handles only dimensions of simple nonnumeric data and measures of simple aggregated numeric values.
ii) Lack of intelligent analysis: it cannot tell which dimensions should be used and what level the generalization should reach.

. State the difference between characterization and OLAP.
Similarity: presentation of data summarization at multiple levels of abstraction; interactive drilling, pivoting, slicing, and dicing.
Differences: automated desired-level allocation; dimension relevance analysis and ranking when there are many relevant dimensions; sophisticated typing on dimensions and measures; analytical characterization: data dispersion analysis.

. Define histogram analysis.
A univariate graphical method. Frequency histograms consist of a set of rectangles that reflect the counts or frequencies of the classes present in the given data.

. What is meant by boxplot analysis?
Data is represented with a box. The ends of the box are at the first and third quartiles, i.e., the height of the box is the IQR. The median is marked by a line within the box. Whiskers: two lines outside the box extend to the minimum and maximum. Outliers (usually, values beyond 1.5 x IQR past the quartiles) are plotted individually.

. What is meant by a quantile plot?
It displays all of the data, allowing the user to assess both the overall behavior and unusual occurrences, and plots quantile information: for a data value xi (with the data sorted in increasing order), fi indicates that approximately a fraction fi of the data are below or equal to the value xi.

. Define quantile-quantile (q-q) plot.
It graphs the quantiles of one univariate distribution against the corresponding quantiles of another, allowing the user to view whether there is a shift in going from one distribution to another.

. Define scatter plot.
Each pair of values is treated as a pair of coordinates and plotted as a point in the plane.

. Give the definition of a loess curve.
A smooth curve fitted to a scatter plot to give better perception of the pattern of dependence, set by two parameters: a smoothing parameter and the degree of the polynomials that are fitted by the regression.

. How can we measure the dispersion of data?
Quartiles, outliers, and boxplots:
Quartiles: Q1 (25th percentile) and Q3 (75th percentile).
Interquartile range: IQR = Q3 - Q1.
Five-number summary: min, Q1, median, Q3, max.
Boxplot: the ends of the box are the quartiles, the median is marked, whiskers extend outside the box, and outliers are plotted individually.
Outlier: usually, a value more than 1.5 x IQR above Q3 or below Q1.
Variance and standard deviation.
Marks
. Why do we preprocess the data? Explain how data preprocessing techniques can improve the quality of the data.
. What is data cleaning? List and explain the various techniques used for data cleaning.
. List out and describe the primitives for specifying a data mining task.
. How is attribute-oriented induction implemented? Explain with an example.
. Explain the major tasks in data preprocessing.
. Describe how concept hierarchies and data generalization are useful in data mining.

Unit
. State the rule measures for finding associations.
Support: the probability that a transaction contains {X, Y, Z}.
Confidence: the conditional probability that a transaction having {X, Y} also contains Z.

. What are the applications of association rule mining?
Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
. What are the different ways to find associations?
Boolean vs. quantitative associations, based on the types of values handled:
  buys(X, "SQLServer") ^ buys(X, "DMBook") => buys(X, "DBMiner")   (Boolean)
  age(X, "..") ^ income(X, "..K") => buys(X, "PC")   (quantitative)
Single-dimension vs. multiple-dimensional associations (see the examples above).
Single-level vs. multiple-level analysis: e.g., what brands of beers are associated with what brands of diapers?

. Why is counting the supports of candidates a problem?
The total number of candidates can be very huge, and one transaction may contain many candidates.

. Give the method to find the supports of candidates.
Candidate itemsets are stored in a hash tree. A leaf node of the hash tree contains a list of itemsets and counts; an interior node contains a hash table. The subset function finds all the candidates contained in a transaction.

. List the methods to improve Apriori's efficiency.
Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent.
Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.
Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
Sampling: mining on a subset of the given data with a lower support threshold, plus a method to determine the completeness.
Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent.
. Give the method for mining frequent patterns using the FP-tree structure.
For each item, construct its conditional pattern base and then its conditional FP-tree. Repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty, or it contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern).

. List the major steps to mine an FP-tree.
Construct the conditional pattern base for each node in the FP-tree.
Construct the conditional FP-tree from each conditional pattern base.
Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far. If a conditional FP-tree contains a single path, simply enumerate all the patterns.

. What are the advantages of the FP-tree structure?
Completeness: it never breaks a long pattern of any transaction and preserves complete information for frequent pattern mining.
Compactness: it reduces irrelevant information (infrequent items are gone); frequency-descending ordering means more frequent items are more likely to be shared; the tree is never larger than the original database (if node-links and counts are not counted).

. What is meant by the node-link property?
For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header table.

. Define the prefix path property.
To calculate the frequent patterns for a node ai in a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count should carry the same count as node ai.
. Define multiple-level association rules.
Items often form hierarchies, and items at the lower level are expected to have lower support. Rules regarding itemsets at the appropriate levels can be quite useful. The transaction database can be encoded based on dimensions and levels, and we can explore shared multi-level mining.

. What is meant by uniform support?
Uniform support uses the same minimum support threshold for all levels. With one minimum support threshold, there is no need to examine itemsets containing any item whose ancestors do not have minimum support. However, lower-level items do not occur as frequently: if the support threshold is too high, low-level associations are missed; if it is too low, too many high-level associations are generated.

. What is the principle of frequent pattern growth (the pattern growth property)?
Let a be a frequent itemset in DB, B be a's conditional pattern base, and b be an itemset in B. Then a ∪ b is a frequent itemset in DB iff b is frequent in B.

. Why is frequent pattern growth fast?
FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection, because there is no candidate generation and no candidate test, it uses a compact data structure, it eliminates repeated database scans, and its basic operations are counting and FP-tree building.

. What is meant by an iceberg query?
It computes aggregates over one attribute or a set of attributes only for those groups whose aggregate values are above a certain threshold.
. What do you mean by reduced support?
Reduced support uses a reduced minimum support at lower levels. There are four search strategies:
a. Level-by-level independent
b. Level-cross filtering by k-itemset
c. Level-cross filtering by single item
d. Controlled level-cross filtering by single item

. Define two-step (multi-step) mining.
First apply a rough/cheap operator (superset coverage), then apply an expensive algorithm on a substantially reduced candidate set.

. What is the functionality of superset mining?
It preserves all the positive answers: it allows a false-positive test but not a false-negative test.

. Why is progressive refinement suitable here?
A mining operator can be expensive or cheap, fine or rough; step-by-step refinement trades speed against quality.
. What are the limitations of ARCS?
Only quantitative attributes may appear on the LHS of rules, and only two attributes on the LHS (the 2-D limitation). An alternative to ARCS: non-grid-based, equi-depth binning, with clustering based on a measure of partial completeness.

. State distance-based association rules.
This is a dynamic discretization process that considers the distance between data points.

. What are categorical and quantitative attributes?
Categorical attributes: a finite number of possible values, with no ordering among the values.
Quantitative attributes: numeric, with an implicit ordering among the values.

. Define quantitative association rules.
Quantitative attributes are dynamically discretized into bins based on the distribution of the data.

. Give the two-step mining of spatial associations.
Step 1: rough spatial computation as a filter, using the MBR or R-tree for rough estimation.
Step 2: detailed spatial algorithm as refinement, applied only to those objects that have passed the rough spatial association test (no less than min_support).
Marks
. Explain the methods to improve Apriori's efficiency. Describe the join and prune steps in the Apriori algorithm.
. Discuss the approaches for mining multi-dimensional association rules from transactional databases. Give suitable examples.
. Explain how mining of frequent itemsets is done, with an example. Construct the FP-tree for a given transaction DB.
. A database has four transactions. Let min_sup = … and min_conf = ….
TID   Date      Items bought
T…    …/…/…    {K, A, D, B}
T…    …/…/…    {D, A, C, E, B}
T…    …/…/…    {C, A, B, E}
T…    …/…/…    {B, A, D}
i) Find all frequent itemsets using Apriori and FP-growth, respectively.
ii) List all the strong association rules matching the following meta-rule, where X is a variable representing a customer and each item term denotes a variable representing an item:
    ∀X ∈ transactions, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3)
. Discuss the following in detail: association mining, support, confidence, rule measures.
c.c. Give the role of prediction in datamining. predicts categorical class labels b. List the
typical Applications of classification and prediction credit approval target marketing medical
diagnosis treatment effectiveness analysis .TID Frequent Itemsets f.a. classifies data
constructs a model based on the training set and the values class labels in a classifying
attribute and uses it in classifying new data . Define Supervised learning classification a.p f. It
models continuousvalued functions. Supervision The training data observations.a.m.c.p unit .
predicts unknown or missing values .b c.b. are accompanied by labels indicating the class of
the observations . What is the functionality of Classification process a.a.p f.e.m.. i.
measurements. etc.b.m f.
observations. New data is classified based on the training set .b. etc. What is the process
involved in Data Preparation Preprocess data in order to reduce noise and handle missing
values Remove the irrelevant or redundant attributes Generalize and/or normalize data .
What is meant by Unsupervised learning clustering a. How we can evaluate classification
methods scalability o time to construct the model o time to use the model o handling noise
and missing values o efficiency in diskresident databases o understanding and insight
provded by the model s . with the aim of establishing the existence of classes or clusters in
the data . Given a set of measurements. The class labels of training data is unknown b.
. What is meant by a decision tree?
A flowchart-like tree structure in which internal nodes test attributes, branches correspond to test outcomes, and leaf nodes hold class labels or a class distribution.

. What are the phases involved in decision tree induction?
a. Tree construction: partition the examples recursively, in a top-down recursive divide-and-conquer manner, based on selected attributes.
b. Tree pruning: identify and remove branches that reflect noise or outliers.

. What are the conditions to stop the partitioning?
a. All samples for a given node belong to the same class.
b. There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
c. There are no samples left.

. State the functionality of the greedy algorithm.
The tree is constructed top-down, recursively, in a divide-and-conquer manner. All attributes are assumed to be categorical; continuous-valued attributes are discretized in advance.

. Define the Gini index.
A measure of impurity used to select split attributes: gini(T) = 1 − Σ pj², summing over the classes j, where pj is the relative frequency of class j in T. Each attribute is evaluated to get the possible split values, and the measure can be modified for continuous-valued attributes.

. State the two approaches to avoid overfitting.
a. Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold.
b. Postpruning: remove branches from a fully grown tree to get a sequence of progressively pruned trees, then decide which is the best pruned tree.

. Why use decision tree induction in data mining?
a. Relatively faster learning speed than other classification methods.
b. Convertible to simple and easy-to-understand classification rules.
c. SQL queries can be used for accessing databases.
d. Comparable classification accuracy with other methods.

. What is meant by information gain?
The attribute with the highest information gain, i.e., the greatest reduction in the expected information (entropy) needed to classify a sample, is chosen as the splitting attribute.
. Why do we need Bayesian classification?
Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.
Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

. State Bayes' theorem.
Given training data D, the posteriori probability of a hypothesis h, P(h|D), follows Bayes' theorem: P(h|D) = P(D|h)P(h) / P(D). For classification, since P(C|X) = P(X|C)P(C) / P(X), a sample X is assigned to the class C for which P(X|C)P(C) is maximum.

. What is meant by the k-nearest neighbor algorithm?
All instances correspond to points in the n-dimensional space, and the nearest neighbors are defined in terms of Euclidean distance. The target function can be discrete- or real-valued. For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to the query point xq.
. Define the case-based reasoning approach.
Instances are represented by rich symbolic descriptions (e.g., function graphs). Multiple retrieved cases may be combined. There is tight coupling between case retrieval, knowledge-based reasoning, and problem solving.

. State the functionality of the rough set approach.
Rough sets are used to approximately or "roughly" define equivalence classes: a class C is bracketed by a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C). Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix is used to reduce the computation intensity.

. What is the role of genetic algorithms?
GA is based on an analogy to biological evolution and the notion of survival of the fittest. Each rule is represented by a string of bits; an initial population is created consisting of randomly generated rules. Offspring are generated by crossover and mutation. The fitness of a rule is represented by its classification accuracy on a set of training examples, and a new population is formed consisting of the fittest rules and their offspring.

. Define prediction in relation to classification.
i) Prediction is similar to classification: first construct a model, then use the model to predict unknown values; the major method for prediction is regression.
ii) Prediction is different from classification: classification refers to predicting a categorical class label, while prediction models continuous-valued functions.
. How can we estimate error rates?
Partition (training-and-testing): use two independent data sets, a training set and a test set; used for data sets with a large number of samples.
Cross-validation: divide the data set into k subsamples, then repeatedly use all but one subsample as training data and the remaining subsample as test data (k-fold cross-validation); for data sets of moderate size.
Bootstrapping (leave-one-out): for small-size data sets.

. What are the types of prediction?
Linear regression: Y = a + bX, where a and b are the two parameters specifying the line, estimated from the data at hand by applying the least-squares criterion to the known values of Y and X.
Multiple regression: Y = b0 + b1·X1 + b2·X2. Many nonlinear functions can be transformed into this form.
Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g., probability p(a, b, c, d) = αab βac χad δbcd.
Nonlinear regression.
. What is meant by boosting?
Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor. Boosting increases classification accuracy, is applicable to decision trees or Bayesian classifiers, and requires only linear time and constant space.

. State the role of cluster analysis.
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Cluster analysis is the grouping of a set of data objects into clusters; clustering is unsupervised classification, with no predefined classes.

. Give the applications of clustering.
i) Pattern recognition.
ii) Spatial data analysis: create thematic maps in GIS by clustering feature spaces; detect spatial clusters and explain them in spatial data mining.
iii) Image processing.
iv) Economic science, especially market research.
v) WWW: document classification; clustering Web log data to discover groups of similar access patterns.
. What are the requirements for clustering in data mining?
Scalability; ability to deal with different types of attributes; discovery of clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; ability to deal with noise and outliers; insensitivity to the order of input records; high dimensionality; incorporation of user-specified constraints; interpretability and usability.

. What are outliers?
Objects that are considerably dissimilar from the remainder of the data.

. What are the algorithms used for clustering?
Partitioning algorithms: construct various partitions and then evaluate them by some criterion.
Hierarchical algorithms: create a hierarchical decomposition of the set of data (or objects) using some criterion.
Density-based algorithms: based on connectivity and density functions.
Grid-based algorithms: based on a multiple-level granularity structure.
Model-based algorithms: a model is hypothesized for each of the clusters, and the idea is to find the best fit of that model to the data.
Marks
. Describe the working of the PAM (Partitioning Around Medoids) algorithm.
. Briefly outline the major steps of decision tree classification.
. Explain the attribute-selection measures in decision tree induction and outline the major steps involved in it.
. Discuss Bayesian classification with its theorem.
. What is prediction? Explain the various prediction techniques.
. Discuss the different types of clustering methods.

Unit
. Which attribute is said to be a set-valued attribute?
An attribute whose value is a set; it can be generalized to higher-level concepts, or to general behavior of the set, such as the number of elements in the set, the types or value ranges in the set, or the weighted average for numerical data (e.g., video games).

. How are sequence-valued attributes generalized?
In the same way as set-valued attributes, except that the order of the elements in the sequence should be observed in the generalization.

. Define plan mining.
Plan mining is the extraction of important or significant generalized (sequential) patterns from a planbase, a large collection of plans.
. What is meant by Spatial trend analysis Detecting changes and trends along a spatial dimension (e.g., distance from an ocean).
. Define Description-based retrieval systems Systems that build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation; this is intensive if performed manually.
. Give the methods for computing a spatial data cube On-line aggregation: collect and store pointers to spatial objects in a spatial data cube (expensive and slow; needs efficient aggregation techniques). Precompute and store all the possible combinations (huge space overhead). Precompute and store rough approximations in a spatial data cube (accuracy tradeoff).
. What are the steps needed to mine spatial associations Two-step mining of spatial association: a rough spatial computation serves as a filter for rough estimation against minimum support, followed by detailed spatial computation on the patterns that pass the filter.
. Discuss some of the applications using a data mining system.
. Explain the mining WWW process.
. Explain the concepts involved in multimedia databases.
. Describe how multidimensional analysis is performed in a data mining system.
. Give some examples of text-based databases and explain how they are implemented using a data mining system.
. List the descriptors present in multidimensional analysis.
. Describe the trends that cover data mining systems in detail.
. How is a spatial database helpful in a data mining system.
. Explain the ways in which descriptive mining of complex data objects is identified, with an example.

Marks
. What is the requirement for maintaining a time-series database A time-series database stores sequences of values changing with time; trend analysis decomposes such a series into trend, cyclic, seasonal, and irregular components.
. State the process involved in Content-based retrieval systems They support retrieval based on the image content itself, such as color histograms, texture, visual characteristics (edge, layout vector), and wavelet transforms.
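The content-based retrieval answer above, ranking images by color histograms, can be illustrated with a toy sketch. This is an illustrative addition: the "images" are hypothetical flat lists of color-bin indices standing in for pixels, and the archive names are invented.

```python
# Toy content-based retrieval sketch: represent each image as a normalized
# color histogram and rank archive images by histogram intersection with
# the query image.
def histogram(pixels, bins=4):
    h = [0] * bins
    for p in pixels:
        h[p] += 1
    total = len(pixels)
    return [count / total for count in h]  # normalize so sizes compare fairly

def intersection(h1, h2):
    """Histogram intersection similarity: 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical "images": lists of color-bin indices in place of real pixels.
query = histogram([0, 0, 1, 2, 2, 2])
archive = {
    "sunset": histogram([2, 2, 2, 2, 1, 0]),
    "forest": histogram([1, 1, 1, 1, 3, 3]),
}
ranked = sorted(archive,
                key=lambda name: intersection(query, archive[name]),
                reverse=True)
print(ranked)  # "sunset" ranks above "forest"
```

Real systems replace the toy bins with histograms over actual color spaces, plus texture and shape features, but the rank-by-similarity step works the same way.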