PAPER PRESENTATION ON

Presented by:

CONTACT DETAILS

RAMSWAROOP SINGH T
BRANCH: CSE
ROLL NO: 05C71A0547
CONTACT NO: 9966952101
EMAIL ID: [email protected]

K VINAY KUMAR
BRANCH: IT
ROLL NO: 05C71A1218
CONTACT NO: 9885522506
EMAIL ID: [email protected]

ELLENKI COLLEGE OF ENGG. & TECH., PATEL GUDA

INDEX

ABSTRACT
INTRODUCTION
WHAT IS DATA MINING?
WHAT IS DATA WAREHOUSING?
HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
APPLICATIONS
ADVANTAGES
DISADVANTAGES
CONCLUSION
REFERENCES

ABSTRACT

We live in the age of information. Data is the most valuable resource of an enterprise. In today's competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. There is a tremendous amount of data generated by day-to-day business operational applications. In addition, there is valuable data available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years.

Data warehousing has emerged as an increasingly popular and powerful concept of applying information technology to turn these huge islands of data into meaningful information for better business. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and the constraints of data mining and data warehousing and their advancements over the earlier technologies.

INTRODUCTION

Data Warehousing

A data warehouse can be defined as any centralized data repository which can be queried for business benefit. Warehousing makes it possible to:

o Extract archived operational data
o Overcome inconsistencies between different legacy data formats
o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
o Incorporate additional or expert information

Data Mining

Data mining is not an "intelligence" tool or framework. Data typically drawn from an enterprise data warehouse is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends from analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing. An implementation of data mining in an organization will serve as a guide to uncover inherent trends and tendencies in historical information, as well as allow for statistical predictions, groupings and classification of data.

Typical data warehousing implementations in organizations will allow users to ask and answer questions such as "How many sales were made, by territory, by sales person between the months of May and June in 1999?" Data mining will allow business decision makers to ask and answer questions such as "Who is my core customer that purchases a particular product we sell?" or "Geographically, how well would a line of products sell in a particular region and who would purchase them, given the sale of similar products in that region?"

WHAT IS DATA MINING?

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

WHAT IS DATA WAREHOUSING?

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

According to Bill Inmon, author of Building the Data Warehouse and the guru who is widely considered to be the originator of the data warehousing concept, there are generally four characteristics that describe a data warehouse:

- Subject-oriented: data are organized according to subject instead of application, e.g. an insurance company using a data warehouse would organize its data by customer, premium, and claim, instead of by different products (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
- Integrated: When data reside in many separate applications in the operational environment, the encoding of data is often inconsistent. For instance, in one application gender might be coded as "m" and "f", in another by 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data is transformed to "m" and "f".
- Time-variant: The data warehouse contains a place for storing data that are five to 10 years old, or older, to be used for comparisons, trends, and forecasting. These data are not updated.
- Non-volatile: Data are not updated or changed once they enter the data warehouse; they are only loaded and accessed.

An Overview of Data Mining Techniques

This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme:

1) Classical Techniques such as statistics, neighborhoods and clustering, and
2) Next Generation Techniques such as trees, networks and rules.

Each section will describe a number of data mining algorithms at a high level, focusing on the "big picture" so that the reader will be able to understand how each algorithm fits into the landscape of data mining techniques.

HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?

Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called "data mining." Assembling the information in one place is called "data warehousing."

All the information is stored in information repositories. The data warehouse takes the cleaned and integrated data. The data taken by the data warehouse is selected and transformed, and the useful data is sent through data mining. The data which is sent through data mining is evaluated and presented.

APPLICATIONS

Data Warehousing

- Insulate data - i.e. the current operational information
  o Preserves the security and integrity of mission-critical OLTP applications
  o Gives access to the broadest possible base of data.
- Retrieve data - from a variety of heterogeneous operational databases
  o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
  o Metadata - information describing the model and definition of the source data elements
- Data cleansing - removal of certain aspects of operational data, such as low-level transaction information, which slow down the query times.
- Transfer - processed data transferred to the data warehouse, a large database on a high performance box.

Data Mining

- Medicine - drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
- Finance - stock market prediction, credit assessment, fraud detection etc.
- Marketing/sales - product analysis, buying patterns, sales prediction, target mailing, identifying `unusual behavior' etc.
- Knowledge Acquisition
- Scientific discovery - superconductivity research, etc.
- Engineering - automotive diagnostic expert systems, fault detection etc.

ADVANTAGES:

- Enhances end-user access to a wide variety of data.
- Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area / country for the last two years.
- A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management (CRM).

DISADVANTAGES:

Data mining systems rely on databases to supply the raw data for input, and this raises problems in that databases tend to be dynamic, incomplete, noisy, and large. Other problems arise as a result of the adequacy and relevance of the information stored.

Limited Information

A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are not present, nor can they be requested from the real world. Inconclusive data causes problems because if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about a given domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the red blood cell count of the patients.

Missing data can be treated by discovery systems in a number of ways, such as:

- Simply disregard missing values
- Omit the corresponding records
- Infer missing values from known values
- Treat missing data as a special value to be included additionally in the attribute domain
- Or average over the missing values using Bayesian techniques.

FUTURE VIEWS

The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics.

Predictive analytics should be used to get to know the customer, segment and predict customer behavior, and forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support, as well as the fragility of the resulting predictive model; but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.

CONCLUSION:

Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users' ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. The data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.

- Data mining has a lot of potential
- Diversity in the field of application
- Estimated market for data mining is $500 million

REFERENCES:

1. Books Referred:
   a. Data Mining: Concepts and Techniques - Jiawei Han
   b. Data Mining Techniques - Arun K. Pujari
   c. Decision Support and Data Warehouse Systems - Efrem G. Mallach
2. Internet Sites Availed:
   a. www.kluweronline.nl
   b. www.internet2.com
   c. www.the-datamine.com
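
As an illustration of the warehouse-style question quoted in the introduction ("How many sales were made, by territory, by sales person between the months of May and June in 1999?"), the following is a minimal Python sketch of that kind of aggregate count. The record layout (the keys territory, sales_person and date), the function name and the sample rows are all assumptions made up for this example; the paper does not describe a schema or a query tool.

```python
from collections import Counter
from datetime import date


def count_sales(sales, start, end):
    """Count sales per (territory, sales person) with start <= date <= end."""
    counts = Counter()
    for sale in sales:
        if start <= sale["date"] <= end:
            counts[(sale["territory"], sale["sales_person"])] += 1
    return counts


# Made-up sample rows, for illustration only.
sales = [
    {"territory": "North", "sales_person": "A", "date": date(1999, 5, 3)},
    {"territory": "North", "sales_person": "A", "date": date(1999, 6, 20)},
    {"territory": "South", "sales_person": "B", "date": date(1999, 5, 15)},
    {"territory": "South", "sales_person": "B", "date": date(1999, 7, 1)},  # outside the range
]

counts = count_sales(sales, date(1999, 5, 1), date(1999, 6, 30))
for (territory, person), n in sorted(counts.items()):
    print(territory, person, n)
```

A question of this kind only summarizes what already happened; the data mining questions quoted alongside it (who the core customer is, how a product line would sell in a region) need a predictive model rather than a count.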
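
The "Integrated" characteristic described under WHAT IS DATA WAREHOUSING? (gender coded as "m" and "f" in one application and as 0 and 1 in another, then transformed to "m" and "f" in the warehouse) can be sketched as a small lookup step. This is only a sketch under stated assumptions: the source names app_a and app_b are hypothetical, and the paper does not say which of 0 and 1 stands for which gender, so the mapping 0 to "m" and 1 to "f" is an arbitrary choice for illustration.

```python
# Hypothetical per-source mappings onto the warehouse convention "m" / "f".
GENDER_MAPS = {
    "app_a": {"m": "m", "f": "f"},
    "app_b": {0: "m", 1: "f"},  # assumption: 0 means male, 1 means female
}


def to_warehouse_gender(source, value):
    """Translate a source-specific gender code into the warehouse code."""
    try:
        return GENDER_MAPS[source][value]
    except KeyError:
        # Unknown source or unknown code: fail loudly instead of guessing.
        raise ValueError(f"no gender mapping for {value!r} from {source!r}") from None


# Made-up operational rows, for illustration only.
operational_rows = [("app_a", "f"), ("app_b", 0), ("app_b", 1)]
warehouse_values = [to_warehouse_gender(src, val) for src, val in operational_rows]
print(warehouse_values)
```

Raising an error for an unmapped code, rather than passing it through, keeps inconsistent encodings from reaching the warehouse unnoticed.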
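
Three of the missing-data treatments listed under DISADVANTAGES (omit the corresponding records, infer missing values from known values, treat missing data as a special value) can be sketched in plain Python as below. Several things here are assumptions rather than anything the paper specifies: missing values are represented as None, "infer from known values" is implemented as simple mean imputation, the marker string "unknown" is arbitrary, and the patient rows (with an rbc field echoing the red blood cell count example) are made up. Averaging over missing values with Bayesian techniques is not shown.

```python
from statistics import mean


def omit_records(records, field):
    """Drop every record whose value for `field` is missing (None)."""
    return [r for r in records if r.get(field) is not None]


def infer_from_known(records, field):
    """Fill missing numeric values with the mean of the known values."""
    known = [r[field] for r in records if r.get(field) is not None]
    fill = mean(known)  # raises StatisticsError if no value is known
    return [
        {**r, field: r[field] if r.get(field) is not None else fill}
        for r in records
    ]


def treat_as_special(records, field, marker="unknown"):
    """Replace missing values with an extra value added to the attribute domain."""
    return [
        {**r, field: r[field] if r.get(field) is not None else marker}
        for r in records
    ]


# Made-up sample data, for illustration only.
patients = [
    {"id": 1, "rbc": 4.0},
    {"id": 2, "rbc": None},
    {"id": 3, "rbc": 5.0},
]

print(omit_records(patients, "rbc"))
print(infer_from_known(patients, "rbc"))
print(treat_as_special(patients, "rbc"))
```

Each function returns a new list and leaves the input untouched, so the strategies can be compared on the same data.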