CUSTOMER_CODE: SMUDE
DIVISION_CODE: SMUDE
EVENT_CODE: APR2016
ASSESSMENT_CODE: MCA5043_APR2016

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 11570
QUESTION_TEXT: List and explain the web content mining problems.
SCHEME OF EVALUATION: There are 5 problems. Each explanation carries 2 marks.

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 72812
QUESTION_TEXT: What is Data Loading in a Data Warehouse? Explain the different types of Data Loading.
SCHEME OF EVALUATION:
Data Loading implies the physical movement of data from the computer(s) storing the source database(s) to the computer that will store the data warehouse database, assuming it is different. (1 mark)

Data Loading Types:
Initial Load (3 marks): Populating all the Data Warehouse tables for the very first time. Creating indexes on initial loads or full refreshes requires special consideration, because index creation during mass loads can be too time-consuming. So drop the indexes prior to the loads to make the loads go quicker, and rebuild or regenerate the indexes when the loads are complete.
Incremental Load (3 marks): Applying ongoing changes from the source systems in a periodic manner, as necessary. Changes to the source systems are always tied to specific times, irrespective of whether or not they are based on explicit time stamps in the source systems.
Full Refresh (3 marks): Completely erasing the contents of one or more tables and reloading them with fresh data. This type of data application involves periodically rewriting the entire Data Warehouse. Partial refreshes, which rewrite only specific tables, are also possible, but they are rare because every dimension table is intricately tied to the fact table. As far as the data application modes are concerned, a full refresh is similar to the initial load. However, in the case of a full refresh, data already exists in the target tables before the incoming data is applied.
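The three load types can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. This is a minimal illustration under stated assumptions: the table `fact_sales`, the index `idx_sales_date`, the sample rows and the use of an `updated_at` column to detect changes are all hypothetical, and a production warehouse would use its own bulk-load utilities. The helper `_bulk_load` mirrors the advice above to drop indexes before a mass load and rebuild them afterwards.

```python
import sqlite3

# Hypothetical source extract: (order_id, amount, updated_at as ISO date).
SOURCE_ROWS = [
    (1, 100.0, "2016-04-01"),
    (2, 250.0, "2016-04-02"),
    (3, 75.0, "2016-04-03"),
]


def create_target(conn):
    """Create the hypothetical warehouse table and its secondary index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_date ON fact_sales(updated_at)")
    conn.commit()


def _bulk_load(conn, rows):
    """Drop the index, mass-insert rows, then rebuild the index afterwards."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_date")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_date ON fact_sales(updated_at)")
    conn.commit()


def initial_load(conn, rows):
    """Populate the empty table for the very first time."""
    _bulk_load(conn, rows)


def incremental_load(conn, rows, last_load_time):
    """Apply only the source changes stamped after the previous load."""
    # ISO date strings compare correctly as plain strings.
    changed = [row for row in rows if row[2] > last_load_time]
    # SQLite-specific upsert on the primary key: new rows inserted, changed rows replaced.
    conn.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)", changed)
    conn.commit()
    return len(changed)


def full_refresh(conn, rows):
    """Erase the existing contents, then reload the table with fresh data."""
    conn.execute("DELETE FROM fact_sales")
    _bulk_load(conn, rows)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_target(conn)
    initial_load(conn, SOURCE_ROWS[:2])
    print("after initial load:", row_count(conn))
    print("changed rows applied:", incremental_load(conn, SOURCE_ROWS, "2016-04-02"))
    print("after incremental load:", row_count(conn))
    full_refresh(conn, SOURCE_ROWS)
    print("after full refresh:", row_count(conn))
```

The difference between the modes shows up in the data already present: the initial load starts from an empty table, the incremental load touches only rows newer than the last load time, and the full refresh deletes existing rows before reloading.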

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 72814
QUESTION_TEXT: Define Data Mining. Differentiate between Data Mining and DBMS.
SCHEME OF EVALUATION:
Data Mining: It is the search for relationships and global patterns that exist in large databases but are hidden among vast amounts of data, such as the relationship between patient data and their medical diagnoses. It is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques. (2 marks)
DBMS vs Data Mining (8 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 126111
QUESTION_TEXT: Explain the basic tasks involved in Data transformation.
SCHEME OF EVALUATION:
Selection: This takes place at the beginning of the whole process of data transformation. You select either whole records or parts of several records from the source systems. The task of selection usually forms part of the extraction function itself. (2 marks)
Splitting/Joining: This task includes the types of data manipulation you need to perform on the selected parts of source records. Sometimes you will split the selected parts even further during data transformation. Joining of parts selected from many source systems is more widespread in the Data Warehouse environment. (2 marks)
Conversion: This is an all-inclusive task. It includes a large variety of rudimentary conversions of single fields for two primary reasons: one, to standardize among the data extracted from disparate source systems, and the other, to make the fields usable and understandable to the users. (2 marks)
Summarization: Sometimes you may find that it is not feasible to keep data at the lowest level of detail in your Data Warehouse.
It may be that none of your users ever needs data at the lowest granularity for analysis or querying. (2 marks)
Enrichment: This task is the rearrangement and simplification of individual fields to make them more useful for the Data Warehouse environment. You may use one or more fields from the same input record to create a better view of the data for the Data Warehouse. This principle is extended when one or more fields originate from multiple records, resulting in a single field for the Data Warehouse. (2 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 126114
QUESTION_TEXT: Discuss the following data warehouse schemas: a. Star schema b. Snowflake schema
SCHEME OF EVALUATION:
a. Star schema (5 marks)
b. Snowflake schema (5 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 126115
QUESTION_TEXT: Explain the categories of web mining.
SCHEME OF EVALUATION:
Web mining can be broadly divided into three categories:
a) Web content mining
b) Web structure mining
c) Web usage mining

a) Web content mining: Web content mining targets knowledge discovery in which the main objects are traditional collections of multimedia documents, such as images, video and audio, that are embedded in or linked to web pages. Web content mining can be approached from two points of view: the agent-based approach and the database approach. The first approach aims at improving information finding and filtering. The second approach aims at modeling the data on the web in a more structured form, so that standard database querying mechanisms and data mining applications can be used to analyze it. The problems and challenges of web content mining are data/information extraction, web information integration, opinion extraction from online sources, knowledge synthesis, and segmenting web pages and detecting noise.

b) Web structure mining: This focuses on analysis of the link structure of the web, and one of its purposes is to identify more preferable documents. The different objects are linked in some way.
The appropriate handling of the links can reveal potential correlations and thereby improve the predictive accuracy of the learned models. The goal of web structure mining is to generate a structural summary of the web site and its web pages. Based on the topology, web structure mining categorizes the web pages and generates information such as the similarity and relationships between different web sites. Web structure mining can also take another direction: discovering the structure of the web document itself. This type of structure mining can be used to reveal the structure of web pages.

c) Web usage mining: This focuses on techniques that can predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data; it tries to extract useful information from the secondary data derived from the interactions of users while they surf the web. Web usage mining collects data from web log records to discover the access patterns of users on web pages. The knowledge gained can be used in personalization, system improvement, site modification, business intelligence and usage characterization. In general, there are mainly four kinds of data mining techniques applied in the web mining domain to discover user navigation patterns: association rule mining, sequential pattern mining, clustering and classification.
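Web structure mining uses the link graph to rank documents and find the more preferable ones. As one illustration, the sketch below applies PageRank, a well-known link-analysis algorithm that the scheme above does not itself name, to a small hypothetical site graph. The page names, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for clarity, not tuned values.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure using power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: most pages link back to "home".
SITE = {
    "home": ["products", "about"],
    "products": ["home", "cart"],
    "about": ["home"],
    "cart": ["home"],
    "blog": ["home", "products"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(SITE).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.3f}")
```

Because nearly every page links to `home`, it ends up with the highest score, while `blog`, which nothing links to, keeps only the baseline share. Scores always sum to 1, so they can be compared across pages of the same site.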
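A minimal sketch of web usage mining on log data. Records are grouped into per-user sessions, and page-to-page transitions that occur in at least `min_support` sessions are reported. This is a very simple form of the sequential pattern mining named above. The `(user, timestamp, page)` record layout, the 30-minute session timeout and the sample log are hypothetical. Real server logs, for example in Common Log Format, would first need parsing.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log records: (user_id, unix_seconds, page).
LOG = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u2", 10, "/home"), ("u2", 70, "/products"),
    ("u2", 5000, "/home"), ("u2", 5030, "/about"),
    ("u3", 20, "/home"), ("u3", 90, "/products"), ("u3", 150, "/cart"),
]

SESSION_GAP = 30 * 60  # assumed inactivity timeout, in seconds


def sessionize(records, gap=SESSION_GAP):
    """Split each user's time-ordered page visits into sessions."""
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))
    sessions = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            # A long pause ends the current session and starts a new one.
            if ts - prev_ts > gap:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def frequent_transitions(sessions, min_support=2):
    """Keep page -> next-page transitions seen in at least min_support sessions."""
    counts = Counter()
    for session in sessions:
        # Count each transition at most once per session (session-level support).
        counts.update(set(zip(session, session[1:])))
    return {pair: c for pair, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    for (src, dst), support in sorted(frequent_transitions(sessionize(LOG)).items()):
        print(f"{src} -> {dst}: {support} sessions")
```

With the sample log, user `u2` is split into two sessions by the long gap, giving four sessions in total. `/home -> /products` appears in three of them and `/products -> /cart` in two, so both pass the threshold. `/home -> /about` appears once and is dropped.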