DATA MINING AND DATA WAREHOUSING

PRESENTED BY:
R.V. Ravi Kiran, Computer Science Engineering, Email id: [email protected], Ph no: 9440966469
P. Nagesh, Computer Science Engineering (3/4 B.Tech), Email id: [email protected], Ph no: 9701706236
S.K. Chaitanya, Computer Science Engineering (3/4 B.Tech), Email id: [email protected], Ph no: 9491893712
Gayathri Vidya Parishad College Of Engineering, Visakhapatnam.

ABSTRACT:
Data Mining is a concept that is taking off in the commercial sector as a means of finding useful information out of gigabytes of data. While products for the commercial environment are starting to become available, tools for a scientific environment are much rarer (or even non-existent). Yet scientists have long had to search through reams of printouts and rooms full of tapes to find the gems that make up scientific discovery. This paper will explore some of the ad hoc methods generally used for Data Mining in the scientific community, including such things as scientific visualization, and outline how some of the more recently developed products used in the commercial environment can be adapted to scientific Data Mining.

A Data Warehouse is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we discuss Data Warehouse design using star and snowflake schemas. We frequently use the Star schema, as it has more advantages than the other schemas. Snowflake schemas normalize dimensions to eliminate redundancy.

Both Data Mining and Data Warehousing are important in the present competitive market, with applications such as Customer Retention, Marketing, Risk Assessment, Fraud Detection and others.

FIG. DATA ANALYSIS

INTRODUCTION:
In today's fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are powerful assets, provided they can be integrated and utilized to enhance customer experiences.
The ability to access meaningful data, and to move and share data throughout an organization between departments, officers and business partners in a timely, efficient manner through the use of familiar query and analytical tools, is critical.

With the current trends in centralization of an organization's data in large databases, particularly in a commercial environment, the process of extracting useful information has become more formalized and the term Data Mining has been coined for it. In one of the first papers on commercial Data Mining, Evangelos Simoudis of IBM defined it as: "The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions". This definition has a definite business flavor, and much of IBM's development of Data Mining has been in this direction.

In practice, Data Mining is a process which can take on different approaches depending on the type of data involved and the objectives desired. As this is still very much an evolving discipline, much work is being undertaken to determine standard processes for the varied environments. Further, as the context in which the data is gathered is often an important component, this must be factored into any analysis.

FIG. HOW DATA IS SHARED

DEF: A Database is a collection of non-redundant data which is sharable between different applications.

WHAT IS DATAMINING?
FIG. DATAMINING PROCESS
Data Mining is defined as "the non-trivial extraction of implicit, previously unknown, potentially useful and understandable knowledge from data". In practical terms, Data Mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
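As a minimal sketch of what "finding correlations among fields" can mean in practice, the following Python example scans every pair of numeric fields in a small table and reports the pairs whose Pearson correlation is strong. The field names, the sample values and the 0.8 threshold are hypothetical choices made for illustration; real mining tools work on far larger tables and use many other kinds of pattern besides linear correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_fields(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical customer records, stored as one list per field (column).
customers = {
    "age": [23, 35, 47, 52, 61],
    "income": [21, 40, 58, 66, 79],
    "visits_per_month": [9, 4, 6, 3, 8],
}

strong_pairs = correlated_fields(customers)
```

With this sample data only the `age` and `income` fields pass the threshold; `visits_per_month` is only weakly related to either of them.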
LATEST TRENDS IN TECHNOLOGIES AND METHODS:
There are a number of Data Mining trends, in terms of technologies and methodologies, which are currently being developed and researched. The trends identified include the following.

DISTRIBUTED/COLLECTIVE DATAMINING:
The mining of information located in different places, in different physical locations, is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers a different approach to traditional analysis, by using a combination of localized data analysis together with a "global data model".
FIG. DISTRIBUTED DATAMINING

SPATIAL AND GEOGRAPHIC DATA MINING:
"The extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases" is known as spatial Data Mining. The applications are useful in remote sensing, medical, navigation, and related uses.
FIG. SPATIAL DATAMINING

HYPERTEXT & HYPERMEDIA DATAMINING:
Hypertext and Hypermedia Data Mining can be characterized as mining data which includes text, hyperlinks and text markups.

TIME SERIES/SEQUENCE DATAMINING:
Another important area in Data Mining centers on the mining of time series and sequence-based data, that is, data whose order matters. Sequential pattern mining focuses on the identification of sequences.
FIG. SEQUENTIAL DATAMINING

FIG. DATAMINING

PHENOMENAL DATAMINING:
Phenomenal Data Mining focuses on the relationships between data and the phenomena which are inferred from the data.

APPLICATIONS OF DATAMINING:
Data Mining collects, stores and organizes data for use in areas such as:
  Data Mining and customer relationship management (CRM) software for solving business decision problems
  Privacy of data in Insurance companies and Government agencies
  Fraud detection in Telecommunications and stock exchanges
  Medical diagnosis to detect abnormal patterns
  Airline reservation to maximize seat utilization
FIG. APPLICATIONS OF DATAMINING

WHAT IS DATA WAREHOUSING?
A Data Warehouse is a single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
FIG. DATA WAREHOUSE

A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data.

The characteristics of Data Warehouses are:
  Subject oriented
  Integrated
  Non-volatile
  Time-variant
FIG. DATAWAREHOUSING

SUBJECT ORIENTED: The data in the warehouse is defined in business terms and is grouped under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns. This is achieved through data modeling.

INTEGRATED: Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

NON-VOLATILE: Once loaded into the Data Warehouse, the data is not updated. It acts as a stable resource for consistent reporting and comparative analysis.

TIME-VARIANT: All data in the Data Warehouse is time stamped at the time of entry into the warehouse, or when it is summarized within the warehouse, to act as a chronological record and to provide historical and trend analysis possibilities.

FIG. PROCESS OF DATA WAREHOUSING

ARCHITECTURE OF DATA WAREHOUSE:
Three common architectures in Data Warehouses are:
  Data Warehouse Architecture (Basic)
  Data Warehouse Architecture (with a Staging Area)
  Data Warehouse Architecture (with a Staging Area and Data Marts)
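To make the INTEGRATED, NON-VOLATILE and TIME-VARIANT characteristics more concrete, the following is a minimal Python sketch of a staging step placed between two source systems and the warehouse, in the spirit of the architecture with a staging area. Everything named here is a hypothetical assumption for illustration: the two sources (`orders_db`, `shipping_csv`), their field names, and the `stage` and `Warehouse` helpers are not taken from any real product.

```python
from datetime import datetime, timezone

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def stage(record, source):
    """Map a source-specific record onto one consistent format (integration)."""
    if source == "orders_db":       # hypothetical source: kilograms, 'cust_id'
        return {"customer_id": record["cust_id"], "weight_kg": record["weight_kg"]}
    if source == "shipping_csv":    # hypothetical source: pounds, 'CustomerNo'
        return {
            "customer_id": record["CustomerNo"],
            "weight_kg": round(record["weight_lb"] * LB_TO_KG, 3),
        }
    raise ValueError(f"unknown source: {source}")


class Warehouse:
    """Append-only store: rows are time stamped on entry and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, staged, loaded_at=None):
        row = dict(staged)
        # Time-variant: every row records when it entered the warehouse.
        row["loaded_at"] = loaded_at or datetime.now(timezone.utc)
        self._rows.append(row)

    def rows(self):
        # Non-volatile: hand out copies so loaded rows cannot be modified.
        return [dict(r) for r in self._rows]


warehouse = Warehouse()
warehouse.load(stage({"cust_id": 7, "weight_kg": 2.5}, "orders_db"))
warehouse.load(stage({"CustomerNo": 7, "weight_lb": 10.0}, "shipping_csv"))
```

After staging, both rows use the same field name (`customer_id`) and the same unit (kilograms), which is the naming-conflict and unit-of-measure resolution described under INTEGRATED.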
FIG. DATAWAREHOUSE ARCHITECTURE

DATA WAREHOUSE ARCHITECTURE (BASIC):
In the basic architecture, the metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.

DATA WAREHOUSE ARCHITECTURE WITH A STAGING AREA:
Operational data has to be cleaned and processed before it is put into the warehouse. Rather than doing this programmatically, most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.
FIG. DATAWAREHOUSE WITH STAGING AREA

DATA WAREHOUSE ARCHITECTURE WITH STAGING AREA & DATA MARTS:
We may want to customize our warehouse's architecture for different groups within our organization. We can do this by adding data marts, which are systems designed for a particular line of business.
FIG. DATA WAREHOUSE WITH STAGING AREA & DATA MARTS

PROCESSES WITHIN A DATA WAREHOUSE:
  Extract and load the data
  Clean and transform data into a form that can cope with large data volumes and provide good query performance
  Backup and archive data
  Manage queries, and direct them to the appropriate data sources

SCHEMAS IN DATA WAREHOUSE:
A schema is a collection of database objects, including tables, views, indexes, and synonyms. The commonly used schemas are the Star schema and the Snowflake schema.

STAR SCHEMA:
The star schema is the simplest schema. The entity-relationship diagram of this schema resembles a star. The center of the star consists of a large fact table and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and a number of dimension tables.

The main advantages of star schemas are:
  They provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
  They are widely supported by a large number of business intelligence tools.
A star join is a primary key to foreign key join of the dimension tables to a fact table.

SNOWFLAKE SCHEMA:
The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. The diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.

CONCLUSION:
Data Mining is a new term and formalism for a process that has been undertaken by scientists for generations. The massive increase in the volume of data collected or generated for analysis with the use of computers has made it an essential tool. However, despite the more formal approach, Data Mining is something that scientists perform on an ad hoc basis and can easily adapt to. Many of the methods used for the analysis of the data were originally developed to process scientific data and are used unchanged.

A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The determination of which schema model should be used for a Data Warehouse should be based upon the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.

As a final point, the biggest database of all, the Internet, is becoming more and more important, and while there is useful information, extracting that from the terabytes being added daily is an enormous task. The techniques of Data Mining are applicable here more than in any other domain. However, to make use of it takes time, effort and, above all, people with a knowledge of the field, to differentiate the true solutions from the infeasible ones.
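As a supplementary illustration of the STAR SCHEMA section, the sketch below builds a tiny star schema in an in-memory SQLite database and runs one star join against it. The table and column names (`dim_product`, `dim_store`, `fact_sales`) and the sample rows are hypothetical, and SQLite is used only because it ships with Python; nothing here is specific to the warehouse products discussed in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the points of the star.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: the center of the star, with one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_store   VALUES (10, 'Visakhapatnam'), (20, 'Hyderabad');
    INSERT INTO fact_sales  VALUES (1, 10, 100, 50.0), (1, 20, 40, 20.0), (2, 10, 5, 75.0);
""")

# Star join: the primary keys of the dimension tables are joined to the
# foreign keys in the fact table, then the facts are summed per group.
star_join = """
    SELECT p.category, s.city, SUM(f.units) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
"""
sales_by_category_and_city = conn.execute(star_join).fetchall()
```

In a snowflake version of the same design, `category` would move out of `dim_product` into its own table referenced by a key. That removes the repeated category text from the dimension, at the cost of one more join in every query that needs it.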