DATA MINING IN APPLIED WORLD

Submitted by: Mr. AMIT V. MALWADE, FINAL YEAR IT
Guided by: Prof. Mr. NITIN R. CHOPDE
Dept of IT, CBSCEM AMRAVATI.

What is data?

Data are a set of values of one or more variables. A variable is something that can take different values. Values can be numbers or names, depending on the variable:
• Numeric, e.g. weight
• Counting, e.g. number of injuries
• Ordinal, e.g. competitive level (values are numbers/names)
• Nominal, e.g. gender (values are names)

What is a data warehouse?

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.
Data warehousing combines data management and data analysis.
A data webhouse is a distributed data warehouse that is implemented over the web with no central data repository.
Goal: to integrate enterprise-wide corporate data into a single repository from which users can easily run queries.

Key features of data warehousing:
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile

Data warehouse models

Enterprise warehouse: Collects all of the information about subjects spanning the entire organization.
Data mart: Usually implemented on low-cost departmental servers that are UNIX or Windows/NT-based.
Virtual warehouse:
i) It is a set of views over operational databases.
ii) It is easy to build but requires excess capacity on operational database servers.

What is data mining?

Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

Architecture of data mining

The figure above shows the simple architecture of data mining. It consists of the following steps:
• Data cleaning
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge discovery

How does data mining work?
Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order.
Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper example is an instance of associative mining.
Sequential patterns: Data is mined to anticipate behaviour patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

What kind of data can be mined?

Flat files: Flat files are simple data files in text or binary format with a structure known by the data mining algorithm to be applied.
Relational databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships. Tables have columns and rows, where columns represent attributes and rows represent tuples.
Multimedia databases: Multimedia databases include video, images, audio and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system.

Data mining technologies

OLAP: Data warehouse systems serve users or knowledge workers in the role of data analysis and decision-making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are called on-line analytical processing (OLAP) systems.
OLTP: The job of earlier on-line operational systems was to perform transaction and query processing, so they are also termed on-line transaction processing (OLTP) systems.
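To make the contrast between the two kinds of system concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sales table, its columns, and all figures are invented for illustration and do not come from any real system. A single in-memory SQLite database stands in for what would normally be separate operational and warehouse systems.

```python
import sqlite3

# Hypothetical sales table; names and figures are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, month TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each recording one transaction.
rows = [
    ("East", "2024-01", 120.0),
    ("East", "2024-02", 80.0),
    ("West", "2024-01", 200.0),
    ("West", "2024-02", 50.0),
    ("West", "2024-02", 25.0),
]
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO sales (region, month, amount) VALUES (?, ?, ?)", rows
    )

# OLTP-style read: fetch one record by its key.
single = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (3,)
).fetchone()

# OLAP-style read: aggregate over two dimensions (region and month).
by_region_month = conn.execute(
    "SELECT region, month, SUM(amount) FROM sales "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()

# Roll up to a coarser level of aggregation (region only).
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

print(single)
print(by_region_month)
print(by_region)
```

The keyed lookup touches a single row, which is typical of transaction processing. The two GROUP BY queries scan the whole table and summarise it at different levels of detail, which is the kind of work an analytical system is built for.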
Difference between OLTP and OLAP

• Users and system orientation
• Data contents
• Database design
• View
• Access patterns

Features of OLAP

Multidimensional views of data:
i) It provides the foundation for analytical processing through flexible access to information.
ii) It must be able to analyze data across any dimensions at any level of aggregation, with equal functionality and ease.
Calculation-intensive capabilities:
i) The real test of an OLAP application is its ability to perform complex calculations; it must be able to do more than simple aggregation.
ii) Analytical processing systems are judged on their ability to create information from data.
Time intelligence: True OLAP systems should understand the sequential nature of time.

Advantages of data mining

• Marketing/Retailing
• Banking/Crediting
• Law enforcement
• Researchers

Disadvantages of data mining

• Security issues
• Misuse of information

Conclusion

Data mining is often treated as a synonym for knowledge discovery. There is much work to be done in the area of knowledge discovery and data mining, and its future depends on developing tools and techniques that yield useful knowledge without causing undue threats to individuals’ privacy.

References

• Mukesh Mohania, Sunil Samtani, John F. Roddick, Yahiko Kambayashi, "Advances and research directions in data warehousing technology".
• www.wikipedia.com
• Hans-Peter Kriegel, Karsten M. Borgwardt, Peer Kröger, Alexey Pryakhin, Matthias Schubert, Arthur Zimek, "Future trends in data mining".
• Jiawei Han and Micheline Kamber, book on data mining (2006), Elsevier Inc.
• www.springerlink.com

THANK YOU