Data Mining and Data Warehousing
Data Warehousing-Cubing Algorithms
Imagination
While computing data cubes, we came across the concept of iceberg cubes, which
materialize only the part of a data cube that satisfies a minimum threshold. An
iceberg cube keeps only those cells whose support is at least 'k', where 'k' is
a threshold. All cells with support less than 'k' are pruned, thereby reducing
the size of the data cube.
This is done to reduce the size of the data cube without losing much of the
information. But is there another use for a threshold on support? Is there some
kind of pattern in the data with high support? If so, how can we find such
patterns, and where are they used?
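To make the threshold concrete, here is a minimal sketch of a naive iceberg cube in Python. It is not one of the cubing algorithms covered in this module: it simply counts every group-by and then discards cells with support below 'k'. The function name iceberg_cube and the sample rows are assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, k):
    """Map each cuboid (a tuple of dimension names) to its cells with count >= k."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(range(len(dims)), size):
            # Count how many rows fall into each cell of this group-by.
            counts = Counter(tuple(row[i] for i in group) for row in rows)
            # Iceberg condition: keep only cells with support of at least k.
            kept = {cell: n for cell, n in counts.items() if n >= k}
            cube[tuple(dims[i] for i in group)] = kept
    return cube

# Hypothetical sample data: (city, item, year)
rows = [
    ("Delhi", "pen", 2011),
    ("Delhi", "pen", 2012),
    ("Delhi", "ink", 2011),
    ("Pune", "pen", 2011),
]
cube = iceberg_cube(rows, ("city", "item", "year"), k=2)
print(cube[("city", "item")])
```

The cells that survive are exactly the value combinations that occur at least 'k' times, which is why a support threshold can also be read as a way of finding frequent patterns in the data.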
Insights
Cube computation is a memory-intensive operation. Algorithms for computing
cubes should therefore be memory-efficient and should reuse precomputed values
to avoid recomputing redundant parts of the data cube.
Cubing algorithms usually follow a bottom-up or a top-down approach. Bottom-up
cubing algorithms start from the base cuboid and aggregate on different
attributes to generate higher-level cuboids, so that only the cuboids of the
previous level are needed to compute the next level. Top-down cubing
algorithms, on the other hand, start from the apex cuboid and use iceberg
conditions to avoid constructing cells whose support is below a threshold,
thereby avoiding useless computation. The direction names depend on how the
cuboid lattice is drawn, so some sources swap them. There are also hybrid
cubing algorithms that combine the top-down and bottom-up approaches for
efficient cube computation.
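The reuse of previous-level cuboids described above can be sketched as follows. This is only an illustration of the idea, not the chunk-based array algorithm itself: the raw rows are scanned once to build the base cuboid, and every other cuboid is aggregated from an already computed cuboid with one more dimension. The function name full_cube_from_base and the choice of the smallest available parent are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations

def full_cube_from_base(rows, n_dims):
    """Compute every cuboid as a Counter, reading the raw rows only once."""
    all_dims = tuple(range(n_dims))
    cube = {all_dims: Counter(rows)}  # base cuboid: one cell per distinct row
    for size in range(n_dims - 1, -1, -1):
        for dims in combinations(all_dims, size):
            # Candidate parents: computed cuboids with exactly one extra dimension.
            parents = [p for p in cube
                       if len(p) == size + 1 and set(dims) <= set(p)]
            # Assumption of this sketch: aggregate from the parent with fewest cells.
            parent = min(parents, key=lambda p: len(cube[p]))
            keep = [parent.index(d) for d in dims]
            child = Counter()
            for cell, n in cube[parent].items():
                child[tuple(cell[i] for i in keep)] += n
            cube[dims] = child
    return cube

# Hypothetical sample data: (city, item, year); dimensions are indexed 0, 1, 2
rows = [
    ("Delhi", "pen", 2011),
    ("Delhi", "pen", 2012),
    ("Delhi", "ink", 2011),
    ("Pune", "pen", 2011),
]
cube = full_cube_from_base(rows, 3)
print(dict(cube[(0,)]))  # support per city
```

Aggregating from a small parent instead of the raw rows is what keeps the number of cells touched low; a real implementation would also control how many cuboids are held in memory at once.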
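Glossary
Iceberg cube: A cube that contains only the cells satisfying a given threshold
condition (the iceberg condition), such as a minimum support, which allows
Apriori-style pruning.
Materialization: Precomputing and storing the cells of a cube (all of its
cuboids, or only a chosen subset) so that queries can be answered without
recomputing them.
BUC: A recursive method for computing the ROLAP data cube. It starts at the
apex cuboid and expands toward the base cuboid, pruning with the iceberg
condition. It is called bottom-up because its authors draw the lattice with the
apex cuboid at the bottom.
Multiway array aggregation: A chunking-based method for computing the MOLAP
data cube, aggregating from the base cuboid toward the apex cuboid. Under the
same lattice convention it is also referred to as top-down cubing.
Star cubing: An algorithm that integrates the advantages of both top-down and
bottom-up cubing.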
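As an illustration of the BUC entry above, here is a minimal BUC-style recursion in Python. It is a simplified sketch, not the published algorithm: it partitions in memory with a dictionary instead of sorting, and it uses a fixed dimension order. The names buc, expand, and the "*" placeholder for an aggregated dimension are assumptions; "*" must not occur as a real value in the data.

```python
from collections import defaultdict

ALL = "*"  # placeholder for "aggregated over this dimension"

def buc(rows, n_dims, min_sup):
    """Return {cell: count} for every cell whose count is at least min_sup."""
    result = {}

    def expand(part, start, cell):
        # 'part' has already passed the support check, so record its cell.
        result[tuple(cell)] = len(part)
        for d in range(start, n_dims):
            # Partition the current rows on dimension d.
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                # Iceberg pruning: a partition below min_sup cannot contain
                # any qualifying cell, so it is never expanded further.
                if len(sub) >= min_sup:
                    cell[d] = value
                    expand(sub, d + 1, cell)
            cell[d] = ALL  # reset before moving to the next dimension

    if len(rows) >= min_sup:
        expand(rows, 0, [ALL] * n_dims)
    return result

# Hypothetical sample data: (city, item, year)
rows = [
    ("Delhi", "pen", 2011),
    ("Delhi", "pen", 2012),
    ("Delhi", "ink", 2011),
    ("Pune", "pen", 2011),
]
cells = buc(rows, 3, min_sup=2)
for cell, count in cells.items():
    print(cell, count)
```

The pruning step is where the Apriori property is used: if a partition has fewer than min_sup rows, every more specific cell inside it has even fewer, so the whole branch is skipped.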
Resources
For your convenience, you can get the following inside the Learn More Quadrant:
Iceberg cube and Definitions PPT
Cubing Heuristics PPT
Materialization and Cubing algorithms PPT
JIT lecture on Multi array cubing PPT