Download MSc in Bioinformatics 4 MBI403 ‑ DATA WAREHOUSING AND

PROGRAM: MSc in Bioinformatics
SEMESTER: 4
SUBJECT: MBI403 - DATA WAREHOUSING AND DATA MINING
BOOK ID: B1633
SESSION: Winter 2015
Ver : MScBI_1308

Columns: No, Question/Answer key, Marks, Total Marks

Q 1 [Total Marks: 10]
Explain the Top-Down and Bottom-up data warehouse development methodologies. (Unit 1; Section 1.5)

A 1 [Marks: 10]
Top-down and Bottom-up data warehouse development methodologies
Although Data Warehouses can be designed in a number of different ways, they all share a number of important characteristics.
• Most Data Warehouses are Subject Oriented. This means that the information in the Data Warehouse is stored in a way that allows it to be connected to objects or events that occur in reality.
• Another characteristic frequently seen in Data Warehouses is that they are Time Variant.
• A time-variant Data Warehouse allows changes in the information to be monitored and recorded over time.
• All the programs used by a particular institution are stored in the Data Warehouse and integrated together.

Q 2 [Total Marks: 10]
Explain the functionalities and advantages of data warehouses. (Unit 1; Section 1.3)

A 2 [Marks: 10]
Functionalities
Data Warehouses provide the following functionality:
• Roll-up: Data is summarized with increasing generalization.
• Drill-down: Increasing levels of detail are revealed.
• Pivot: Cross tabulation, that is, rotation, is performed.
• Slice and Dice: Projection operations are performed on the dimensions.
• Sorting: Data is sorted by ordinal value.
• Selection: Data is available by value or range.

Advantages
• A Data Warehouse provides a common data model for data, regardless of the data source. This makes it easier to report and analyze information than it would be if multiple data models from disparate sources were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the Data Warehouse, inconsistencies are identified and resolved.
  This greatly simplifies reporting and analysis.
• Information in the Data Warehouse is under the control of Data Warehouse users, so even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods.
• Because they are separate from operational systems, Data Warehouses provide fast retrieval of data without slowing down operational systems.
• Data Warehouses facilitate Decision Support System applications such as trend reports.

Q 3 [Total Marks: 10]
Describe Hyper cube and Multicube. (Unit 6; Section 6.6)

A 3 [Marks: 10]
Hypercube and Multicube
• Multidimensional databases can present their data to an application using two types of cubes: hypercubes and multicubes.
• A hypercube is a cube with more than three dimensions, for example four.
• In the hypercube model, as shown in the following illustration, all data appears logically as a single cube.
• This intuitive representation is a hypercube, a representation that accommodates more than three dimensions.
• At a lower level of simplification, a hypercube can very well accommodate three dimensions…..
• Multicube: In the multicube model, data is segmented into a set of smaller cubes, each of which is composed of a subset of the available dimensions. This means the data can be viewed through several cubes, each covering different dimensions.

Q 4 [Total Marks: 10]
List and explain the strategies for data reduction. (Unit 10; Section 10.5)

A 4 [Marks: 10]
The strategies for data reduction:
1) Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
2) Dimension reduction, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
3) Data compression, where encoding mechanisms are used to reduce the data set size.
4) Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models.
5) Discretization and concept hierarchy generation, where raw data values for attributes are replaced by ranges or higher conceptual levels. Concept hierarchies allow the mining of data at multiple levels of abstraction and are a powerful tool for data mining.

Q 5 [Total Marks: 10]
Describe K-means method for clustering. List its advantages and drawbacks. (Unit 12; Section 12.4)

A 5
K-means method for clustering [Marks: 5]
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.
• The procedure classifies a given data set into a certain number of clusters (assume k clusters) fixed a priori.
• The main idea is to define k centroids, one for each cluster.
• The basic step of k-means clustering is simple.
• In the beginning, we determine the number of clusters K and assume a centroid, or center, for each of these clusters.

Advantages and drawbacks [Marks: 5]
• Advantages:
  • With a large number of variables, K-Means may be computationally faster than hierarchical clustering (if K is small).
  • K-Means may produce tighter clusters than hierarchical clustering, especially if the clusters are globular.
• Drawbacks:
  • It does not do well with overlapping clusters.
  • The clusters are easily pulled off-center by outliers.
  • Each record is either inside or outside of a given cluster.

Q 6 [Total Marks: 10]
Describe multilevel databases and web query systems. (Unit 13; Section 13.5)

A 6
Multilevel Databases [Marks: 5]
Several researchers have proposed a multilevel database approach to organizing Web-based information.
• The main idea behind these proposals is that the lowest level of the database contains primitive semi-structured information stored in various web repositories, such as hypertext documents.

Web Query Systems [Marks: 5]
Many web-based query systems and languages have been developed recently. They attempt to use standard database query languages such as SQL, structural information about web documents, and even natural language processing to accommodate the types of queries used in World Wide Web searches.
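
Illustration for A 5 (K-means): the procedure outlined in A 5 can be sketched in a few lines of Python. This is a minimal sketch and not part of the original answer key. The function name k_means, the use of the first k records as initial centroids, the Euclidean distance, and the sample points are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, k, max_iter=100):
    """Plain K-means on a list of equal-length numeric tuples.

    Assumption for this sketch: the first k points are the initial centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its member records.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Hypothetical sample data: seven two-dimensional records, K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 0, 1, 1, 1, 1, 1]
```

Each record receives exactly one label, which matches the drawback listed in A 5 that a record is either inside or outside a given cluster. Because every centroid is a plain mean, a single distant record shifts it noticeably, which is the outlier sensitivity noted in the same list.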

