CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE APR2016
ASSESSMENT_CODE MIT401_APR2016

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 11570
QUESTION_TEXT List and explain the web content mining problems.
SCHEME OF EVALUATION
There are 5 problems. Each explanation carries 2 marks.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 11572
QUESTION_TEXT Explain the various characteristics of a data warehouse.
SCHEME OF EVALUATION
1. Subject oriented
2. Integrated
3. Non-volatile
4. Time variant
(2.5 marks each) (10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72809
QUESTION_TEXT Explain binning methods for data smoothing, given the sorted data for price (in dollars): 4, 8, 15, 21, 21, 24, 25, 28, 34.
SCHEME OF EVALUATION
Partition into (equidepth) bins:
Bin 1: 4, 8, 15
Bin 2: 21, 21, 24
Bin 3: 25, 28, 34
Smoothing by bin means:
Bin 1: 9, 9, 9
Bin 2: 22, 22, 22
Bin 3: 29, 29, 29
Smoothing by bin boundaries:
Bin 1: 4, 4, 15
Bin 2: 21, 21, 24
Bin 3: 25, 25, 34
(10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72811
QUESTION_TEXT Explain the major Source Data Components of Data Warehouse Architecture.
SCHEME OF EVALUATION
Source Data Component
a. Production Data: (2.5 marks) This category of data comes from the various operational systems of the enterprise. The significant and disturbing characteristic of production data is disparity. The great challenge is to standardize and transform the disparate data from the various production systems, convert it, and integrate the pieces into useful data for storage in the Data Warehouse.
b. Internal Data: (2.5 marks) In every organization, users keep their “private” spreadsheets, documents, customer profiles, and sometimes even departmental databases. This is the internal data, parts of which could be useful to the Data Warehouse for analysis. Internal data adds complexity to the process of transforming and integrating the data before it can be stored in the Data Warehouse.
c. Archived Data: (2.5 marks) Operational systems are primarily intended to run the current business. In every operational system, old data is periodically taken and stored in archived files. The circumstances in the organization dictate how often and which portions of the operational databases are archived for storage. Some data is archived after a year.
d. External Data: (2.5 marks) Most executives depend on data from external sources for a high percentage of the information they use. They use statistics relating to their industry produced by external agencies. They use market share data of competitors. They use standard values of financial indicators for their business to check on their performance.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72814
QUESTION_TEXT Define Data Mining. Differentiate between Data Mining and DBMS.
SCHEME OF EVALUATION
Data Mining: It is the search for the relationships and global patterns that exist in large databases but are hidden among vast amounts of data, such as the relationship between patient data and their medical diagnosis. It is the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques. (2 marks)
DBMS vs. Data Mining (8 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 126115
QUESTION_TEXT Explain the categories of web mining.
SCHEME OF EVALUATION
Web mining can be broadly divided into three categories:
a) Web content mining.
b) Web structure mining.
c) Web usage mining.
a) Web content mining: Web content mining targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, that are embedded in or linked to web pages. Web content mining can be viewed from two points of view: the agent-based approach and the database approach. The first approach aims at improving information finding and filtering.
The second approach aims at modeling the data on the web in a more structured form, so that standard database querying mechanisms and data mining applications can be applied to analyze it. Web content mining problems and challenges are data/information extraction, web information integration, opinion extraction from online sources, knowledge synthesis, and segmenting web pages and detecting noise.
b) Web structure mining: This focuses on analysis of the link structure of the web, and one of its purposes is to identify more preferable documents. The different objects are linked in some way. Appropriate handling of the links could reveal potential correlations and thereby improve the predictive accuracy of the learned models. The goal of web structure mining is to generate a structural summary of the web site and web page. Based on the topology, web structure mining categorizes the web pages and generates information such as the similarity and relationship between different web sites. Web structure mining can also take another direction: discovering the structure of the web document itself. This type of structure mining can be used to reveal the structure of web pages.
c) Web usage mining: This focuses on techniques that could predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data; it tries to discover useful information from the secondary data derived from the interactions of users while surfing the web. Web usage mining collects data from web log records to discover user access patterns of web pages. The resulting knowledge can be used in personalization, system improvement, site modification, business intelligence, and usage characterization. In general, there are mainly 4 kinds of data mining techniques applied to the web mining domain to discover user navigation patterns: association rule mining, sequential pattern mining, clustering, and classification.
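
As an illustration of the first technique listed for web usage mining, the following is a minimal sketch of association rule mining over web log records. It is not part of the evaluation scheme. The log records, the page names, the one-session-per-user simplification, and the thresholds `min_support` and `min_confidence` are all assumptions made for this example, and only rules between pairs of pages are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-parsed web log records: (user_id, page). Illustrative only.
LOG_RECORDS = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/products"), ("u3", "/cart"),
    ("u4", "/home"), ("u4", "/about"),
]


def build_sessions(records):
    """Group visited pages by user; each user is treated as one session (assumption)."""
    sessions = {}
    for user, page in records:
        sessions.setdefault(user, set()).add(page)
    return list(sessions.values())


def association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) rules over page pairs."""
    n = len(sessions)
    page_counts = Counter(page for s in sessions for page in s)
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, sup, conf in association_rules(build_sessions(LOG_RECORDS)):
        print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

With this sample data, a rule such as `/home -> /products` passes both thresholds because the two pages appear together in three of the four sessions. A real log would first need cleaning and proper session identification, which this sketch skips.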
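
The binning answer given for QUESTION_ID 72809 can be reproduced with a short sketch. The function names below are hypothetical, and two choices are assumptions not stated in the scheme: bin means are rounded to the nearest integer, and a value equally close to both boundaries is assigned to the lower one.

```python
def equidepth_bins(sorted_values, n_bins):
    """Split already-sorted values into n_bins bins holding the same number of values."""
    if len(sorted_values) % n_bins != 0:
        raise ValueError("number of values must be divisible by n_bins")
    depth = len(sorted_values) // n_bins
    return [sorted_values[i:i + depth] for i in range(0, len(sorted_values), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean (rounded; an assumption)."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        # Ties go to the lower boundary (an assumption).
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


if __name__ == "__main__":
    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
    bins = equidepth_bins(prices, 3)
    print("Bins:      ", bins)
    print("Means:     ", smooth_by_means(bins))
    print("Boundaries:", smooth_by_boundaries(bins))
```

Run on the price data from the question, the sketch gives the same three bins and the same smoothed values as the scheme of evaluation.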