Paper Presentation

Authors:
Xiaofang Li, Changzhou Institute of Technology, Changzhou, China
Yingchi Mao, Hohai University, Nanjing, China

Guided by: Prof. Meiliu Lu
Presenting: Pranavi Appana, Neelam Baviskar, Pallavi Vardhamane

Agenda
● Background and Problem Statement
● Challenges and Problem Solution
● Improved ETL Framework
● Dynamic Mirror Replication Technology
● Performance Evaluation
● Opinion
● Project Proposal

Background & Problem Definition
Why use a real-time data warehouse?
The load cycle of a traditional data warehouse is fixed and long, so it cannot respond to rapid data changes in a timely manner. A real-time data warehouse, by contrast, can capture rapid data changes and process real-time data.
Problem statements:
1. To get real-time data access, without processing delay, from the real-time data warehouse
2. To avoid query contention between OLAP queries and OLTP updates

Challenges and Problem Solutions
Challenges
- Enabling real-time ETL
- Data aggregation operations are not synchronized with the real-time data
Solutions
- Improved ETL framework
- Dynamic mirror replication technology

Improved ETL framework
Fig 1: The pre-processing framework for real-time data warehouse

Dynamic mirror replication technology
Dynamic mirror creation and allocation
- Create the mirror files and initiate the bucket link
Dynamic mirror release
- Load the data into the warehouse and release the dynamic storage area (DSA)
The procedure of query processing
- Retrieve the data image in the dynamic data storage based on the obtained data_id and perform processing

Performance Evaluation
Experiment Settings
- OLAP query response time at different update intervals
Experimental Results
- OLAP query response time for different DSA sizes
- Query response time at different update intervals

Our opinion on the research paper
We agree with the solutions the authors propose for enabling real-time ETL and for resolving the data/query contention problem. As an additional solution, we suggest using MetaMatrix with DataMigrator.
This will help solve the above problems and improve query efficiency.

References
Xiaofang Li, Yingchi Mao. Real-Time Data ETL Framework for Big Real-Time Data Analysis. 2015 IEEE International Conference on Information and Automation, pp. 1289-1294, August 2015. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7279485

Project Proposal
Team: Pranavi Appana, Neelam Baviskar, Pallavi Vardhamane
Spring 2016

Agenda
● Background and Motivation
● Purpose and Scope
● Queries
● Objectives
● Resources
● Schedule
● References

Background and Motivation
Dataset: https://bythenumbers.sco.ca.gov/browse?utf8=%E2%9C%93&page=1
The Government of California provides a list of relevant financial reports in the above dataset. The dataset details the expenditures, revenues and state income of all departments, with income generated in the form of fees, penalties and taxes.

Purpose and Scope: DW and DM
Purpose: Develop a tool/web application, for use by state employees and the public, that gives important financial information on the government’s funding and income.
Scope: Multiple relevant datasets are available, covering billions of data records. We limit the scope to customer-specific requirements such as city, county, department and yearly datasets of financial data.

Queries
1. What is the county-wise and city-wise state income?
2. Under which business category (taxes, fees) was this state income generated?
3. Which sub-departments are responsible for the maximum income collection?
4. Determine the expenditures for a particular department. Example: sewage, water, safety, public departments
5. Estimate and determine which cities/counties are in profit or running in debt due to high expenses.

Objectives
- Analyze, clean and prune the data.
- Create sample data marts for different generic purposes to solve the problems.
- Design the database schema.
- Design a data warehouse application.
- Load data into the warehouse and perform user queries.
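
As a rough illustration of how Queries 1 and 5 above might run against a small data mart, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL/MariaDB database planned for the project. The table names, column names and sample rows are all hypothetical and are not taken from the State Controller's dataset.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only.
# Revenue rows: (county, city, department, category, amount_usd)
SAMPLE_REVENUE = [
    ("Sacramento", "Sacramento", "Water", "fee", 120.0),
    ("Sacramento", "Folsom", "Sewage", "tax", 80.0),
    ("Yolo", "Davis", "Safety", "penalty", 50.0),
]
# Expenditure rows: (county, city, department, amount_usd)
SAMPLE_EXPENDITURE = [
    ("Sacramento", "Sacramento", "Water", 90.0),
    ("Sacramento", "Folsom", "Sewage", 100.0),
    ("Yolo", "Davis", "Safety", 20.0),
]


def build_mart(conn):
    """Create two assumed fact tables and load the sample rows."""
    conn.execute(
        "CREATE TABLE revenue (county TEXT, city TEXT, department TEXT, "
        "category TEXT, amount_usd REAL)"
    )
    conn.execute(
        "CREATE TABLE expenditure (county TEXT, city TEXT, department TEXT, "
        "amount_usd REAL)"
    )
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?, ?)", SAMPLE_REVENUE)
    conn.executemany("INSERT INTO expenditure VALUES (?, ?, ?, ?)", SAMPLE_EXPENDITURE)
    conn.commit()


def income_by_county_and_city(conn):
    """Query 1: total income grouped by county and city."""
    return conn.execute(
        "SELECT county, city, SUM(amount_usd) FROM revenue "
        "GROUP BY county, city ORDER BY county, city"
    ).fetchall()


def net_position_by_city(conn):
    """Query 5: income minus expenditure per city (negative means a deficit)."""
    return conn.execute(
        "SELECT r.county, r.city, r.income - COALESCE(e.spent, 0) AS net "
        "FROM (SELECT county, city, SUM(amount_usd) AS income "
        "      FROM revenue GROUP BY county, city) AS r "
        "LEFT JOIN (SELECT county, city, SUM(amount_usd) AS spent "
        "           FROM expenditure GROUP BY county, city) AS e "
        "ON r.county = e.county AND r.city = e.city "
        "ORDER BY r.county, r.city"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_mart(connection)
    for row in income_by_county_and_city(connection):
        print("income:", row)
    for row in net_position_by_city(connection):
        print("net:", row)
```

Aggregating revenue and expenditure separately before joining avoids double counting when a city has several rows in both tables. A real schema would likely add a year column and dimension tables for department and category.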

Resources
- Data visualization: OffVis
- Database development: MySQL, MariaDB
- Data warehouse: PHP, HTML5, CSS
- Data mining: RapidMiner, WEKA

Schedule
Week 1: Data analysis, cleaning and pruning. Designing the schema.
Week 2: Creating data mart samples. Designing the warehouse application.
Week 3: Applying data mining. Applying query processing.
Week 4: Testing and documentation.

References
California State Controller’s Office, Government Financial Reports, Datasets, https://bythenumbers.sco.ca.gov/browse?utf8=%E2%9C%93&page=1
This website gives consolidated information about government expenditures in the state of California.
Xiaofang Li, Yingchi Mao. Real-Time Data ETL Framework for Big Real-Time Data Analysis. 2015 IEEE International Conference on Information and Automation, pp. 1289-1294, August 2015. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7279485

Thank You!