Intelligent Information Processing
Cao Sanxing [email protected]
Communication University of China

Course Methodology
  Bilingual Lectures
  Discussions
  Seminars
  Lab Work
  Short Paper Assignments
  Paper Examination
  Teaching Assistant

Contents
  Introduction to IIP
  Data Mining and Knowledge Discovery
  Fuzzy Theory and Applications
  Knowledge Representation
  Wavelet Analysis
  Information Fusion
  Evolutionary Computing and Synergetic Computing
  Media Knowledge Engineering

Bibliography (textbook and references)
  T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, Berlin, 2001.3
  P. Giudici, Applied Data Mining: Statistical Methods for Business and Industry, Wiley, 2003
  Gao Jun, Introduction to Intelligent Information Processing Methods, China Machine Press, 2004.6
  Chen Yongli, Li Jinggong et al., Fuzzy Set Theory and Its Applications, Science Press, 2006.9
  S. Mallat, A Wavelet Tour of Signal Processing, China Machine Press
  T. J. Ross, Fuzzy Logic with Engineering Applications, translated by Qian Tonghui et al., Publishing House of Electronics Industry
  T. Dean, J. Allen and Y. Aloimonos, Artificial Intelligence: Theory and Practice, Addison Wesley; Publishing House of Electronics Industry, China, 2004.4
  J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, HEP, 2001.10
  J. S. Albus, Intelligent Systems: Architecture, Design, Control, Wiley, PHEI, 2004.8

Chapter I Introduction to IIP

Key Issues:
  Concept of Intelligent Information Processing
  The Intelligence Environment

1.1 Intelligence
"I believe that understanding of intelligence involves understanding how knowledge is acquired, represented, and stored; how intelligent behavior is generated and learned; how motives, and emotions, and priorities are developed and used; how sensory signals are transformed into symbols; how symbols are manipulated to perform logic, to reason about the past, and plan for the future; and how the mechanisms of intelligence produce the phenomena of illusion, belief, hope, fear, and dreams, and yes, even kindness and love. To understand these functions at a fundamental level, I believe, would be a scientific achievement on the scale of nuclear physics, relativity, and molecular genetics." (James Albus)

Intelligence = Wisdom + Capacity
Intelligence = Behavior + Reasoning + Adaptability

1.2 Artificial Intelligence
Artificial Intelligence is the science that uses computers to simulate the functionality of thinking.
"Thinking is Computing." (Turing, 1946)

[Figure: The Machine Learning Model. Components: The Environment, Learning Modules, Knowledge Base(s), Executive Modules. Labels: Learning, Representation, Reasoning]

1.3 Computational Intelligence
CI = NN + EC + FS
CI as a subset of AI?
CI as a domain other than AI?
CI: Computational Intelligence
NN: Neural Networks
EC: Evolutionary Computation
FS: Fuzzy Systems

[Figure: The intelligence environment. Labels: The Environment; Carbon-based / Silicon-based Systems; Sensory; Intelligent Behaviors; World View (Data + Knowledge); Algorithms + PR; Reasoning, Abstraction, Summarization (CI)]

Chapter II Data Mining and Knowledge Discovery

Content:
  2.1 Why Data Mining
  2.2 Concept and Basis of Data Mining
  2.3 Data Mining Functionalities
  2.4 Data Mining: Important Issues
  2.5 Concept and Basis of Data Warehousing
  2.6 The Multidimensional Data Model
  2.7 Data Warehouse: Architecture and Implementation
  2.8 Further Development of Data Warehousing and Mining

Key Issues:
  Relation of Data and Knowledge
  Concept of Data Warehouse

2.1 Why Data Mining
Motivation leading to data mining:
  Wide availability of huge amounts of data
  Imminent need for turning data into useful information
Data mining is a natural result of the evolution of information technology.
Necessity is the mother of invention.
Evolution of database technology:
  1960s: Primitive file processing >> Database
  1970s: Hierarchical, network >> Relational, query languages
  1980s: Wide adoption of relational technology; research on new advanced data models
  1990s ~ new century: Great boost of database and information systems, OLAP, data warehousing, data mining

[Figure: Stages of the evolution. Data Collection and Database Creation; Database Management Systems; Advanced Database Systems; Web-based Database Systems; Data Warehousing and Data Mining; New Generation of Integrated Information Systems]

Data warehouse: a repository of multiple heterogeneous data sources, organized under a unified schema at a single site in order to facilitate management decision making.

Data warehousing technologies include:
  Data cleansing
  Data integration
  On-Line Analytical Processing (OLAP)

OLAP: analysis techniques with functionalities such as
  Summarization, consolidation, aggregation
  Ability to view information from different angles
OLAP includes the basic functionalities of data mining, and knowledge management based on data models.

DATA RICH … INFORMATION POOR
The fast-growing, tremendous amount of data, collected and stored in large and numerous databases, has far exceeded human capability for comprehension without powerful tools.
[Figure: TOMBS of data … DATABASES … EXPERT SYSTEMS / KNOWLEDGE BASES … DATA WAREHOUSES WITH DATA MINING … GOLDEN NUGGETS of knowledge]
By reshaping databases into a data warehouse, and with the introduction of effective data mining techniques, knowledge can be discovered in and extracted from huge amounts of data.

2.2 Concept and Basis of Data Mining

Key Issues:
  Data Mining Process
  The Basis of Data Mining

Data mining: extracting (mining) knowledge from large amounts of data.
Related terms:
  Knowledge mining from databases
  Knowledge extraction
  Data/pattern analysis
  Data archaeology
  Data dredging
  KDD: Knowledge Discovery in Databases

Data Mining Process:
  Data cleaning
  Data integration
  Data transformation
  Data mining
  Pattern evaluation
  Knowledge presentation

[Figure: The data mining process, from bottom to top. Databases; Cleaning and Integration; Selection and Transformation; Data Mining; Evaluation and Presentation; Knowledge]

Data mining is sometimes interactive with the user or the knowledge base.
A broader view: data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.
[Figure labels: Patterns; Flat files]

Data mining architecture:
  Data description and storage basis: database, data warehouse or other information repository
  Database or data warehouse server
  Knowledge base
  Data mining engine
  Pattern evaluation module
  GUI

[Figure: Data mining architecture, from top to bottom. Graphical user interface; Pattern evaluation; Data mining engine (with Knowledge base); Database or data warehouse server; Data cleaning, Data integration, Filtering; Database, Data warehouse]

From a data warehouse perspective, data mining can be viewed as an advanced stage of OLAP.

Targets of the knowledge mined:
  Decision making
  Process control
  Information management
  Query processing

Data mining is interdisciplinary: database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, spatial data analysis.

The Basis of Data Mining
Data mining can be carried out on different types of data stores:
  Relational Databases
  Data Warehouses
  Transactional Databases
  Advanced Database Systems and Advanced Database Applications
Therefore, data mining is considered one of the most important frontiers in database systems.
Advanced Database Systems and Advanced Database Applications
  Object-Oriented Databases
  Object-Relational Databases
  Spatial Databases
  Temporal Databases and Time-Series Databases
  Text Databases and Multimedia Databases
  Heterogeneous Databases and Legacy Databases
  The World Wide Web

Relational Databases
  DBMS: data storage; data access (concurrent, shared, distributed); ensuring consistency and security
  Relational database constitution: tables; attributes, tuples
  Relational data access: database queries written in a relational query language, usually SQL, or with the assistance of a GUI.

Mining Relational Databases
  On the basis of statistical relational queries: sum, avg, count, max, min
  Further: searching for trends and data patterns; deviation detection
  Example: analysis of customer data to predict the credit risk of new customers based on their income, age and previous credit information.

Mining Data Warehouses
A data warehouse facilitates the mining of useful knowledge by:
  Collecting information from multiple sources;
  Storing the information under a unified schema;
  Preparing the data by cleaning, transformation, and integration;
  Organizing data around major subjects;
  Providing information from a historical perspective, typically summarized.
A data warehouse is usually modeled by a multidimensional database structure.
  Dimension <-> attribute / set of attributes
  Cell <-> the value of some aggregate measure
The actual physical structure of a data warehouse may be:
  A relational data store
  A multidimensional data cube

[Figure: Data sources in Beijing, Shanghai, Guangzhou and Hong Kong; Clean, Transform, Integrate, Load; Data warehouse; Query and Analysis tools; Clients]

Mining Data Warehouses: Data Warehouses and Data Marts
  A data warehouse collects information about subjects that span an ENTIRE ORGANIZATION (enterprise-wide).
  A data mart is a departmental subset of a data warehouse. It focuses on selected subjects (department-wide).

Mining Data Warehouses
Important: data warehouses are suitable for On-Line Analytical Processing, thanks to the multidimensional data views and the pre-computed summarized data.
  Drill-down
  Roll-up

Mining Transactional Databases
A transactional database consists of a file where each record represents a transaction.
A transaction typically includes a unique transaction ID and a list of the items making up the transaction.
Transactional database mining is best suited to market basket data analysis.

Mining Advanced Database Systems and Applications
  Object-oriented Databases
  Object-relational Databases
  Spatial Databases
  Temporal / Time-Series Databases
  Text / Multimedia Databases
  Heterogeneous / Legacy Databases
  World Wide Web

2.3 Data Mining Functionalities
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks.
Descriptive Mining Tasks: to characterize the general properties of the data in the database or other mining basis.
Predictive Mining Tasks: to perform inference on the current data in order to make predictions.
Basic categories of data mining tasks:
  Descriptive
  Predictive

REQUIREMENTS:
  It is important for a data mining system to be capable of mining multiple kinds of patterns, to accommodate different user expectations or applications.
  Data mining systems should be able to discover patterns at various granularities.
  Data mining systems should also allow users to specify hints to guide or focus the search for interesting patterns.

Different kinds of data mining functionalities:
  Concept/Class Description
  Association Analysis
  Classification and Prediction
  Cluster Analysis
  Outlier Analysis
  Evolution Analysis

2.3.1 Concept/Class Description: Characterization and Discrimination
Data can be associated with classes or concepts. It can be useful to describe individual classes and concepts in summarized, concise and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
Concept/class descriptions can be derived via:
  Data characterization
  Data discrimination
  Both

Data characterization: a summarization of the general characteristics or features of a target class of data.
Methods of data characterization:
  Data cube-based OLAP roll-up
  Attribute-oriented induction

Data discrimination: a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
Discrimination descriptions are usually expressed in rule form: discriminant rules.

2.3.2 Association Analysis
Association analysis: the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data.
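Association rules of the form X => Y are usually reported together with a support and a confidence value. The slides do not define the two measures, so the sketch below assumes the standard definitions: support is the fraction of transactions that contain both X and Y, and confidence is that support divided by the support of X alone. The transactions and the helper names `support` and `confidence` are hypothetical and serve only as an illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical market-basket transactions (illustrative data, not from the slides).
transactions: List[FrozenSet[str]] = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
    frozenset({"milk", "butter"}),
    frozenset({"milk", "bread", "jam"}),
]


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in db if wanted <= t)
    return hits / len(db)


def confidence(lhs: Iterable[str], rhs: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Assumed definition: support(lhs and rhs together) / support(lhs)."""
    lhs_support = support(lhs, db)
    if lhs_support == 0:
        raise ValueError("the left-hand side never occurs in the data")
    return support(frozenset(lhs) | frozenset(rhs), db) / lhs_support


rule_support = support({"milk", "bread"}, transactions)
rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(f"milk => bread [support = {rule_support:.0%}, confidence = {rule_confidence:.0%}]")
```

Here three of the five transactions contain both items (support 3/5), and three of the four transactions containing milk also contain bread (confidence 3/4).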
Formal description of an association rule:
  X => Y
  A1 ∧ A2 ∧ … ∧ Am => B1 ∧ B2 ∧ … ∧ Bn
Example:
  age (x, “20…29”) ∧ income (x, “20k…29k”) => buys (x, “MP3 Player”) [support = 2%, confidence = 60%]

2.3.3 Classification and Prediction
Classification: the process of finding a set of models that describe and distinguish data classes or concepts, so that the models can be used to predict the class of objects whose class label is unknown.
The derived model is based on the analysis of a set of training data.
The derived model can be expressed as:
  Classification (IF-THEN) rules
  Decision trees
  Mathematical formulae
  Neural networks
Classification can be used for predicting the class label of data objects. This is closely related to prediction.
Classification and prediction may need to be preceded by relevance analysis, which attempts to identify attributes that do not contribute to the classification or prediction process. These attributes can then be excluded.

2.3.4 Cluster Analysis
  To analyze data objects without consulting a known class label.
  To find different patterns or clusters that can be used in classifying objects.
  To generate class labels.
Each cluster that is formed can be viewed as a class of objects.
Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together.

2.3.5 Outlier Analysis
Outliers: data objects that do not comply with the general behavior or model of the data.
Most data mining methods discard outliers as noise or exceptions.
In some applications, such as fraud detection, rare events or outliers can be more interesting.
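As a rough illustration of flagging outliers rather than discarding them, the sketch below applies a z-score rule: a value is reported when it lies more than a chosen number of standard deviations from the mean. The z-score rule, the threshold of 2.0, the function name `zscore_outliers` and the payment amounts are all assumptions made for this example; the slides do not prescribe a specific test.

```python
from statistics import mean, pstdev
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing can deviate
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical payment amounts; one is far larger than the rest.
amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.6, 49.1, 500.0]
flagged = zscore_outliers(amounts, threshold=2.0)
print(flagged)
```

In a fraud-detection setting the flagged values would be passed on for inspection instead of being dropped as noise. With very small samples a single extreme value inflates the standard deviation, which is why a lower threshold than the default is used here.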
Outliers may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures where objects that are a substantial distance from any other cluster are considered outliers.

2.3.6 Evolution Analysis
To describe and model the trends or regularities for objects whose behavior changes over time.
  Time-series data analysis
  Sequence or periodicity pattern matching
  Similarity-based data analysis

2.4 Data Mining: Important Issues

2.4.1 Filtering of Patterns Found
Brief view: a data mining system could potentially generate thousands or even millions of patterns or rules.
  Are all the patterns interesting?
  What makes a pattern interesting?
  Can a data mining system generate ALL the interesting patterns?
  Can a data mining system generate ONLY the interesting patterns?

A pattern is interesting if it is:
  Easily understood by humans.
  Valid on new or test data with some degree of certainty.
  Potentially useful.
  Novel.
AN INTERESTING PATTERN ALWAYS REPRESENTS KNOWLEDGE.

There are some objective measures of pattern interestingness:
  Support
  Confidence
Generally speaking, each interestingness measure is associated with a threshold, which may be controlled by users.
Rules below the threshold likely reflect noise, exceptions, or minority cases and are probably of less value.

Objective measures: INSUFFICIENT … unless combined with subjective measures.
Subjective measures reflect the needs and interests of a particular user.
Subjective interestingness measures are based on user beliefs in the data.
These measures find patterns interesting if they are UNEXPECTED, that is, contradicting a user's belief.

Completeness of a data mining algorithm:
Can a data mining system generate ALL the interesting patterns?
  Unrealistic and inefficient.
  User-provided constraints and interestingness measures should be used to focus the search.

Optimization of a data mining system:
Can a data mining system generate ONLY the interesting patterns?
  Progress has been made in this direction. However, it remains a challenging issue in data mining.

2.4.2 Classification of Data Mining Systems

[Figure: Data Mining at the confluence of Database technology, Information science, Statistics, Machine learning, Visualization, and Other disciplines]

Classification according to the kinds of databases mined:
  Relational, transactional, object-oriented, object-relational, data warehouse mining.
  Spatial, time-series, text, multimedia, WWW mining.

Classification according to the kinds of knowledge mined:
  Characterization, discrimination, association, classification, clustering, outlier analysis, evolution analysis.
  Different granularities (levels of abstraction) of knowledge mined: general knowledge, primitive-level knowledge, knowledge at multiple levels.
  Systems that mine data regularities vs. systems that mine data irregularities.

Classification according to the kinds of techniques utilized:
  Degree of user interaction involved: autonomous systems, interactive exploratory systems, query-driven systems.
  Methods of data analysis employed: database-oriented, data warehouse-oriented, machine learning, statistics, visualization, pattern recognition, neural networks…

Classification according to the applications adapted:
  Finance data mining systems, telcos, DNA, stock markets, web, e-mail...
  Media? Media!

2.4.3 Other Major Issues in Data Mining and Data Warehousing

Mining methodology and user interaction issues:
  Mining different kinds of knowledge in databases
  Interactive mining of knowledge at multiple levels of abstraction
  Incorporation of background knowledge
  Data mining query languages and ad hoc data mining
  Presentation and visualization of data mining results
  Handling noisy or incomplete data
  Pattern evaluation: the interestingness problem

Performance issues:
  Efficiency and scalability of data mining algorithms
  Parallel, distributed and incremental mining algorithms

Issues relating to the diversity of database types:
  Handling of relational and complex types of data
  Mining information from heterogeneous databases and global information systems

Data warehousing outline:
  B.1. What is a Data Warehouse?
  B.2. The Multidimensional Data Model
  B.3. Data Warehouse Architecture
  B.4. Data Warehouse Implementation
  B.5. Further Development of Data Cube Technology
  B.6. From Data Warehousing to Data Mining

2.5 Concept and Basis of Data Warehousing
Data warehousing provides architectures and tools for business executives to systematically organize, understand and use their data to make strategic decisions.
In the last decade, many firms have spent large budgets on building enterprise-wide data warehouses. Data warehousing is considered to be THE LATEST MUST-HAVE MARKETING WEAPON.

Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases.
Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated historical data for analysis.

What is a data warehouse?
Definition of a data warehouse by W. H. Inmon: a data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision making process.

Subject-oriented:
  A data warehouse is organized around major subjects, such as customer, supplier, product and sales.
  It is not a database that concentrates on the day-to-day operations and transaction processing of an organization.
  Data warehouses typically provide a simple and concise view of particular subject issues by excluding data that are not useful in decision support.

Integrated:
  A data warehouse is usually constructed by integrating multiple heterogeneous sources.
  Data cleansing and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures and so on.
Time-variant (dynamic):
  Data are stored in a data warehouse to provide information from a historical perspective, usually a period of several years.
  Every key structure in the data warehouse contains an element of time, either implicitly or explicitly.

Nonvolatile (retentive):
  A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment.
  Due to this separation, a data warehouse does not require transaction processing, recovery and concurrency control mechanisms.
  It usually requires only 2 operations in data accessing:
    Initial data loading
    Data access

What is data warehousing?
Data warehousing is the process of constructing and using data warehouses.
The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
The utilization of a data warehouse often necessitates a collection of decision support technologies.

Data warehouses are used for:
  Increasing customer focus
  Repositioning products and managing product portfolios
  Analyzing operations and looking for sources of profit
  Managing customer relationships

Differences between Operational Database Systems and Data Warehouses
OLTP vs. OLAP:
  Users and system orientation
  Data contents
  Database design
  View
  Access patterns

Feature                     | OLTP                                  | OLAP
Characteristic              | operational processing                | informational processing
Orientation                 | transaction                           | analysis
User                        | clerk, DBA, database professional     | knowledge worker (e.g., manager, executive, analyst)
Function                    | day-to-day operations                 | long-term informational requirements, decision support
DB design                   | ER based, application-oriented        | star/snowflake, subject-oriented
Data                        | current; guaranteed up-to-date        | historical; accuracy maintained over time
Summarization               | primitive, highly detailed            | summarized, consolidated
View                        | detailed, flat relational             | summarized, multidimensional
Unit of work                | short, simple transaction             | complex query
Access                      | read/write                            | mostly read
Focus                       | data in                               | information out
Operations                  | index/hash on primary key             | lots of scans
Number of records accessed  | tens                                  | millions
Number of users             | thousands                             | hundreds
DB size                     | 100 MB to GB                          | 100 GB to TB
Priority                    | high performance, high availability   | high flexibility, end-user autonomy
Metric                      | transaction throughput                | query throughput, response time

Why have a separate data warehouse?
To help promote the high performance of both operational and analytical systems. THEY ARE DIFFERENT.
Operational databases are tuned for:
  Indexing, searching, queries
  Concurrency control
  Raw data processing
Data warehouses are designed to support:
  Complex queries
  Calculation over large groups of data
  Multidimensional data views
  Historical, consolidated data processing

2.6 The Multidimensional Data Model
The multidimensional data model is the basis of data warehouses and OLAP tools.
The multidimensional data model views data in the form of a data cube.

Basic Ideas of Data Cubes
What is a data cube?
A data cube allows data to be modeled and viewed in multiple dimensions.
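To make the idea of viewing the same data in multiple dimensions concrete, the sketch below stores a tiny cube as a dictionary from (time, item, location) to a numeric measure and sums it along any subset of the dimensions. The cell values, the `DIMENSIONS` tuple and the helper `aggregate_view` are hypothetical; this is a minimal illustration, not how an OLAP server stores a cube.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Tuple

DIMENSIONS = ("time", "item", "location")

# Hypothetical base cells: (time, item, location) -> units sold.
base_cells: Dict[Tuple[str, str, str], int] = {
    ("Q1", "phone", "Beijing"): 10,
    ("Q1", "computer", "Beijing"): 20,
    ("Q2", "phone", "Beijing"): 15,
    ("Q1", "phone", "Shanghai"): 5,
    ("Q2", "computer", "Shanghai"): 30,
}


def aggregate_view(cells, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, measure in cells.items():
        totals[tuple(key[p] for p in positions)] += measure
    return dict(totals)


by_time = aggregate_view(base_cells, ("time",))                      # roll up item and location
by_item_location = aggregate_view(base_cells, ("item", "location"))  # roll up time only
grand_total = aggregate_view(base_cells, ())                         # roll up everything

# Every subset of the dimensions gives one possible grouping of the same data.
all_groupings = [
    combo
    for size in range(len(DIMENSIONS) + 1)
    for combo in combinations(DIMENSIONS, size)
]
print(by_time, grand_total, len(all_groupings))
```

Keeping fewer dimensions corresponds to rolling up; keeping more corresponds to drilling down. With three dimensions and no concept hierarchies there are 2^3 = 8 such groupings of the same data cube.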
A data cube is defined by dimensions and facts.
Dimensions: the perspectives or entities with respect to which an organization wants to keep records, e.g. time, item, branch, location.
Dimension table: the relational table that implements a dimension.
A multidimensional data model is typically organized around a central theme.
Facts: numerical measures, or quantities by which we want to analyze relationships between dimensions.
Fact tables: relational tables that store the names of the facts, or measures, as well as keys to each of the related dimension tables.

2-D view of the sales data by time (quarter) and item (type), Location = “Shanghai”:

time (quarter) | home entertainment | computer | phone | security
Q1             | 605                | 825      | 14    | 400
Q2             | 680                | 952      | 31    | 512
Q3             | 812                | 1023     | 30    | 501
Q4             | 927                | 1038     | 38    | 580

3-D view of the sales data by time, item and location. The slide shows one block per location (Beijing, Shanghai, Guangzhou, Hong Kong); each block lists home ent., comp., phone, sec.:

time | block 1           | block 2             | block 3          | block 4
Q1   | 854  882  89  623 | 1087  968  38  872  | 818  746  43 591 | 605  825 14 400
Q2   | 943  890  64  698 | 1130 1024  41  925  | 894  769  52 682 | 680  952 31 512
Q3   | 1032 924  59  789 | 1034 1048  45 1002  | 940  795  58 728 | 812 1023 30 501
Q4   | 1129 992  63  870 | 1142 1091  54  984  | 978  864  59 784 | 927 1038 38 580

[Figure: 3-D data cube of the sales data. Axes: Location (cities: BJ = Beijing, SH = Shanghai, GZ = Guangzhou, HK = Hong Kong); Time (quarters: Q1 to Q4); Item (types: H = Home entertainment, C = Computer, P = Phone, S = Security)]
4-D cube: see the book.

Cuboid
  Construction of a lattice of cuboids (group-by)
  Base cuboid: the cuboid that holds the lowest level of summarization

Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
  2-D relational databases: Entity-Relationship model
  Data warehouse: multidimensional data model

The Star Schema
In a star schema, a data warehouse contains:
  A large central table (fact table)
  A set of smaller attendant tables (dimension tables), one for each dimension

[Figure: Star schema]
  time dimension table: time_key, day, day_of_the_week, month, quarter, year
  sales fact table: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
  item dimension table: item_key, item_name, brand, type, supplier_type
  branch dimension table: branch_key, branch_name, branch_type
  location dimension table: location_key, street, city, province_or_state, country

The Snowflake Schema
It is a variant of the star schema model.
In a snowflake schema model, some dimensions are normalized, so the data are further split into additional tables.
The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form, for the purpose of reducing redundancies. (See the detailed explanation in the book.)

[Figure: Snowflake schema]
  time dimension table: time_key, day, day_of_the_week, month, quarter, year
  sales fact table: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
  item dimension table: item_key, item_name, brand, type, supplier_key
  supplier dimension table: supplier_key, supplier_type
  branch dimension table: branch_key, branch_name, branch_type
  location dimension table: location_key, street, city_key
  city dimension table: city_key, city, province_or_state, country

The Fact Constellation Schema
Sophisticated applications may require multiple fact tables to share dimension tables.
This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
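The relationship between a fact table and its dimension tables can be sketched in a few lines: each fact row carries foreign keys into the dimension tables, and an analysis query joins on those keys and aggregates a measure. The column names below follow the slides (time_key, item_key, dollars_sold, units_sold, quarter, type), while the rows, the reduced set of dimensions and the helper `rollup` are hypothetical and only illustrate the layout.

```python
from collections import defaultdict

# Dimension tables, keyed by their surrogate keys (hypothetical rows).
time_dim = {
    1: {"quarter": "Q1", "year": 2004},
    2: {"quarter": "Q2", "year": 2004},
}
item_dim = {
    10: {"item_name": "phone A", "type": "phone"},
    11: {"item_name": "laptop B", "type": "computer"},
}

# Central fact table: foreign keys into the dimension tables plus the measures.
sales_fact = [
    {"time_key": 1, "item_key": 10, "dollars_sold": 400.0, "units_sold": 4},
    {"time_key": 1, "item_key": 11, "dollars_sold": 825.0, "units_sold": 3},
    {"time_key": 2, "item_key": 10, "dollars_sold": 512.0, "units_sold": 5},
    {"time_key": 2, "item_key": 11, "dollars_sold": 952.0, "units_sold": 4},
]


def rollup(fact_rows, dim_table, key_column, attribute, measure):
    """Join each fact row to one dimension table and sum `measure` per `attribute` value."""
    totals = defaultdict(float)
    for row in fact_rows:
        dim_row = dim_table[row[key_column]]  # follow the foreign key
        totals[dim_row[attribute]] += row[measure]
    return dict(totals)


dollars_by_quarter = rollup(sales_fact, time_dim, "time_key", "quarter", "dollars_sold")
dollars_by_type = rollup(sales_fact, item_dim, "item_key", "type", "dollars_sold")
print(dollars_by_quarter)
print(dollars_by_type)
```

In a snowflake layout the lookup would follow one more key (for example from an item row to a separate supplier table), and in a fact constellation a second fact table would reuse the same dimension dictionaries.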
Tables of the fact constellation schema:
sales fact table: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
shipping fact table: item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
time dimension table: time_key, day, day_of_the_week, month, quarter, year
item dimension table: item_key, item_name, brand, type, supplier_type
branch dimension table: branch_key, branch_name, branch_type
location dimension table: location_key, street, city, province_or_state, country
shipper dimension table: shipper_key, shipper_name, location_key, shipper_type

Examples for Defining Star, Snowflake and Fact Constellation Schemas
The Data Mining Query Language 'DMQL'

Cube Definition:
    define cube <cube_name> [<dimension_list>] : <measure_list>
Dimension Definition:
    define dimension <dimension_name> as (<attribute_or_subdimension_list>)

Example for Star Schema Definition
    define cube sale_star [time, item, branch, location] :
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

Example for Snowflake Schema Definition
    define cube sale_snowflake [time, item, branch, location] :
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city(city_key, city, province_or_state, country))

Example for Fact Constellation Schema Definition
    define cube sales [time, item, branch, location] :
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

    define cube shipping [time, item, shipper, from_location, to_location] :
        dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales

Categorization and Computation of Data Cube Measures
A data cube measure is a numerical function that can be evaluated at each point in the data cube space.
A measure value is computed for a given point by aggregating the data corresponding to the respective dimension-value pairs defining the given point.

Measures can be organized into 3 categories:
Distributive: count(), sum(), min(), max()
Algebraic: avg(), min_N(), max_N(), standard_deviation()
Holistic: median(), mode(), rank()

Introduction of Concept Hierarchies
A Concept Hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
[Figure: concept hierarchy for location. all > country (Canada, USA) > province_or_state (British Columbia, Ontario, New York, Illinois) > city (Vancouver, Victoria, Toronto, Ottawa, New York, Buffalo, Chicago)]

Concept Hierarchies that are common to many applications may be predefined in the data mining system.
Concept Hierarchies could also be defined by discretizing or grouping values for a given dimension or attribute.
[Figure: concept hierarchy for price obtained by discretization. ($0…$1000] at the top, split into $200-wide ranges ($0…$200], ($200…$400], … and then into $100-wide ranges ($0…$100], ($100…$200], …, ($900…$1000]]
[Figure: orderings of attributes. street < city < province_or_state < country for location; day < {month < quarter; week} < year for time]
OLAP Operations in the Multidimensional Data Model
- Roll-up
- Drill-down
- Slice and dice
- Pivot (Rotate)
- Other OLAP operations (drill-across, drill-through)

Roll-up: aggregation on a data cube, performed by
- Climbing up a concept hierarchy
- Dimension reduction
Drill-down: navigation from less detailed data to more detailed data, performed by
- Stepping down a concept hierarchy
- Introducing additional dimensions
Slice and dice
- Slice: a selection on one dimension, resulting in a subcube
- Dice: definition of a subcube by a selection on two or more dimensions
Pivot (Rotate): a visualization operation that rotates the data axes in view, in order to provide an alternative presentation of the data.
[Figure: dimensions and their abstraction levels. location (continent, country, province_or_state, city, street); customer (group, category, name); item (type, category, brand, name); time (year, quarter, month, day)]

B.3. Data Warehouse Architecture
Basics for the Design and Construction of Data Warehouses
The Design of a Data Warehouse: A Business Analysis Framework
To design an effective data warehouse, one needs to understand and analyze business needs and construct a business analysis framework.
Four views regarding the design of a data warehouse:
- The top-down view
- The data source view
- The data warehouse view
- The business query view
Building and using a data warehouse is a complex task since it requires:
- Business skills
- Technology skills
- Program management skills
The Process of Data Warehouse Design
1. Choose a business process to model;
2. Choose the grain (granularity) of the business process;
3. Choose the dimensions that will apply to each fact table record;
4. Choose the measures that will populate each fact table record.

A Three-tier Data Warehouse Architecture
What is a data warehouse architecture like?
[Figure: layers of the architecture, top to bottom. Front-end Tools: Query/Report, Analysis, Data Mining. OLAP Server: Data-Warehouse-oriented OLAP Server, Data-Mart-oriented OLAP Server. Data Warehouse Server: Data Warehouse, Data Marts, Administration, Monitoring, Metadata Repository. Data: Operational Databases, External Sources.]

From the architecture point of view, there are 3 data warehouse models:
- Enterprise warehouse
- Data mart
- Virtual warehouse
The top-down development of an enterprise warehouse serves as a systematic solution and minimizes integration problems, but it is expensive.
A recommended method for the development of data warehouse systems is to implement the warehouse in an incremental and evolutionary manner.
[Figure: recommended development approach. Define a high-level corporate data model; build data marts and the enterprise data warehouse with model refinement; arrive at distributed data marts and a multi-tier data warehouse.]

Types of OLAP Servers:
- Relational OLAP (ROLAP) Servers
- Multidimensional OLAP (MOLAP) Servers
- Hybrid OLAP (HOLAP) Servers

B.4. Data Warehouse Implementation
Efficient Computation of Data Cubes
The compute cube Operator and its Implementation
    define cube sales [item, city, year] : sum(sales_in_dollars)
    compute cube sales
Multi-way Array Aggregation in the Computation of Data Cubes

B.5. Further Development of Data Cube Technology
Discovery-driven Exploration of Data Cubes
Complex Aggregation at Multiple Granularities: Multifeature Cubes
Other Developments

B.6. From Data Warehousing to Data Mining
Data Warehouse Usage
3 kinds of Data Warehouse application:
- Information Processing
- Analytical Processing
- Data Mining
From Data Warehousing to Data Mining: From OLAP to OLAM
OLAP: On-Line Analytical Processing
OLAM: On-Line Analytical Mining
On-Line Analytical Mining integrates OLAP with data mining and mining knowledge in multidimensional databases.
Reasons why OLAM is important:
- High quality of data in data warehouses
- Available information processing infrastructure surrounding data warehouses
- OLAP-based exploratory data analysis
- On-line selection of data mining functions

Chapter III Fuzzy Theory and Application

OBJECTIVES
1. To introduce fuzzy sets and how they are used
2. To define some types of uncertainty and study what methods are used with each of the types
3. To define fuzzy numbers, fuzzy logic and how they are used
4. To study methods of how fuzzy sets can be constructed
5. To see how fuzzy set theory is used and applied in cluster analysis

OUTLINE
I. INTRODUCTION and BASICS – Lecture 1
A. Why fuzzy sets
1. Data/complexity reduction
2. Control and fuzzy logic
3. Pattern recognition and cluster analysis
4. Decision making
B. Types of uncertainty
1. Deterministic, interval, probability
2. Fuzzy set theory, possibility theory
II. FUZZY SETS AND SYSTEMS – Lecture 2
A. Definitions
1. Sets – classical sets, fuzzy sets, rough sets, fuzzy interval sets, type-2 fuzzy sets
2. Fuzzy numbers
B. Operations on fuzzy sets
1. Union
2. Intersection
3. Complement
C. Operations on fuzzy numbers
1. Arithmetic
2. Relations, equations
3. Fuzzy functions and the extension principle
III. FUZZY THEORY APPLICATION – Lecture 3
A. Introduction
B. Fuzzy propositions
C. Fuzzy hedges
D. Composition, calculating outputs
E. Defuzzification / action
IV. FUZZY SET METHODS: Cluster analysis – Lecture 4

Lecture 1 INTRODUCTION AND BASICS
Fuzzy sets are sets that have gradations of belonging.
EXAMPLES: Green, BIG, Near
Classical sets: either an element belongs or it does not.
EXAMPLES:
- Set of integers – a real number is an integer or not
- You are either in an airplane or not
- Your bank account is x yuan, y jiao and z fen

A. Why fuzzy sets?
- Modeling with uncertainty requires more than probability theory
- There are problems where boundaries are gradual
EXAMPLES: What is the boundary of China? Is the boundary a mathematical curve? What is the area of China? Is the area a real number?
1. Data reduction – driving a car, computing with language
2. Control and fuzzy logic
a. Appliances, automatic gear shifting in a car
b. Subway systems (control outperformed humans in giving smoother rides)
Example: Temperature control in NASA space shuttles
IF x AND y THEN z is A
IF x IS Y THEN z is A
If the temperature is hot and increasing very fast, then the air conditioner fan is set to very fast and the air conditioner temperature is coldest.
There are four types of propositions we will study later.
3. Pattern recognition, cluster analysis
- A digital TV company that issues IC cards wants to discover whether or not a card is lost or being illegally used prior to a customer reporting it missing … etc.
- An Internet company wants to know what groups (sex, age, ethnic, profession, income level…) of users are accessing its portal content.
4. Decision making
- Locate digital transmitters to optimally cover a given area
- Locate service centers to optimally cover the digital TV user network
- Position a satellite to cover the greatest number of satellite TV users
- Design a content service in the following way: I want the service to be very popular, temporally optimized, last a rather long time, and have a cost that is acceptable to subscribers.

B. Types of Uncertainty
1. Deterministic – the difference between a known real number value and its approximation is a real number (a single number). Here one has error. For example, if we know the answer x must be the square root of 2 and we have an approximation y, then the error is x-y (or if you wish, y-x).
2. Interval – uncertainty is an interval. For example, measuring pi using Archimedes' approach.
3. Probabilistic – uncertainty is a probability distribution function
4. Fuzzy – uncertainty is a fuzzy membership function
5. Possibilistic – uncertainty is a possibility distribution function, generated by nested sets
[Figure: types of sets (from Klir & Yuan)]

Error, uncertainty – information/data is often imprecise, incoherent, incomplete
DEFINITION: The error is the difference between the exact value (a real number) and a value at hand (an approximation). As such, when one talks about error, one presupposes that there exists a "true" (real number) value.
DEFINITION: The precision is the maximum number of digits that are used to measure an approximation. It is the property of the instrument that is being used to measure or calculate the (exact) value. When a subset is being used to measure/calculate, it corresponds to the subset that can no longer be subdivided. It depends on the granularity of the input/output pairs (object/value pairs) or the resolution being used.
DEFINITION: Accuracy is the number of correct digits in an approximation.
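The deterministic and interval cases above can be made concrete with a short Python sketch. It computes the error of an approximation to the square root of 2, counts matching leading digits as a rough stand-in for accuracy, and brackets pi with Archimedes' classical bounds 223/71 < pi < 22/7. The helper name `matching_digits` and the approximation 1.414 are illustrative assumptions, and the digit count is deliberately naive: it ignores rounding edge cases such as 0.9999 versus 1.0000.

```python
import math
from fractions import Fraction

def matching_digits(exact: float, approx: float, places: int = 10) -> int:
    """Count the leading decimal digits on which the two values agree (naive)."""
    a = f"{exact:.{places}f}".replace(".", "")
    b = f"{approx:.{places}f}".replace(".", "")
    count = 0
    for digit_a, digit_b in zip(a, b):
        if digit_a != digit_b:
            break
        count += 1
    return count

# Deterministic uncertainty: the error is a single real number.
x = math.sqrt(2.0)   # treated here as the exact value
y = 1.414            # hypothetical approximation
error = x - y
print(f"error = {error:.6f}")
print("matching digits:", matching_digits(x, y))

# Interval uncertainty: the unknown value is only known to lie in an interval.
lower, upper = Fraction(223, 71), Fraction(22, 7)
assert lower < math.pi < upper
width = float(upper - lower)
print(f"pi lies in [{float(lower):.5f}, {float(upper):.5f}], width {width:.5f}")
```

The first case reports one number as the uncertainty, while the second reports a whole interval whose width measures how much is still unknown.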
For example, a GPS reading is (x,y) +/- …

DEFINITION: An item of information is an 'A-O-V-C' quadruple (attribute, object, value, confidence) (definition from Dubois & Prade, Possibility Theory)
Attribute: a function that attaches a value to the object; for example: area, position, color. It is the recipe that tells us how to obtain an output (value) from an input (object).
Object: the entity (domain or input); for example, Sicily for area, my shirt for color, or room 4.2 for temperature.
Value: the assignment or output of the attribute; for example, 211,417.6 sq. km. for Sicily or green for shirt.
Confidence: reliability of the information

VAGUENESS: lack of sharp distinction or boundaries, our ability to discriminate between different states of an event, undecidability (is a glass half full or half empty?)
AMBIGUITY: a one-to-many relationship; for example, she is tall, he is happy. There are a variety of alternatives:
1. Non-specificity: Suppose one has a heart blockage and is prescribed a treatment. In this case "treatment" is non-specific in that it can be an angioplasty, medication, or surgery (to name three alternatives).
2. Dissonance/contradiction: One physician says to operate and another says go to Hainan.

LECTURE SUMMARY
INTRODUCTION and BASICS – Lecture 1
SET THEORY, PROBABILITY, POSSIBILITY THEORY, FUZZY SET THEORY, ROUGH SET THEORY
A. Why fuzzy sets
1. Data/complexity reduction
2. Control and fuzzy logic
3. Pattern recognition and cluster analysis
4. Decision making
B. Types of uncertainty
1. Deterministic, interval, probability
2. Fuzzy set theory, possibility theory

ASSIGNED QUESTIONS:
1. Understand and explain the A-O-V-C quadruple of an information item in Chinese.
2. By reading over the 1st and 2nd sections of Book Chapter II, present the algorithmic manipulations of the fuzzy set.

Example – Surface modeling
Surface models
- The problem: Given a set of readings of the bottom of the ocean whose values are uncertain, generate a surface that explicitly incorporates this uncertainty mathematically and visually
- The approach: Consistent fuzzy surfaces
- Here we just introduce the associated ideas

Imprecision in Points: Fuzzy Points
[Figures: fuzzy points (from Jorge dos Santos)]

Transformation of real-valued functions to fuzzy functions
Instead of a real-valued function $z = f(x)$ or $z = f(x, y)$, let's now consider a fuzzy function $\tilde{z} = \tilde{f}(x)$ or $\tilde{z} = \tilde{f}(x, y)$, where every element $x$ or $(x, y)$ is associated with a fuzzy number $\tilde{z}$.

Statement of the Interpolation Problem
Knowing the values $\{\tilde{z}_i\}$ of a fuzzy function over a finite set of points $\{x_i\}$ (2D) or $\{(x_i, y_i)\}$ (3D), interpolate over the domain in question to obtain a (nested) set of surfaces that represent the uncertainty in the data.

Computing surfaces
Given a data set of fuzzy numbers $\tilde{z}_i$ (1-D triangular fuzzy numbers, written $a/b/c$):
$\tilde{p}(x) = \sum_{i=1}^{N} \tilde{z}_i L_i(x)$
$[\tilde{p}(x)]_\alpha = \sum_{i=1}^{N} z_i(\alpha) L_i(x)$

Computing surfaces – Example
$\tilde{z}_1 = 0.5/1.5/2$, $\tilde{z}_2 = 0.75/1/1.5$
$L_1(x) = x + 2$, $L_2(x) = 3x - 1$
$x = 1 \Rightarrow L_1(1) = 1 + 2 = 3$, $L_2(1) = 3 \cdot 1 - 1 = 2$
$\Rightarrow \tilde{p}(1) = 3\tilde{z}_1 + 2\tilde{z}_2 = 1.5/4.5/6 + 1.5/2/3 = 3/6.5/9$
$[\tilde{p}(1)]_{0} = [3, 9]$
$[\tilde{p}(1)]_{0.5} = [4.75, 7.75]$
$[\tilde{p}(1)]_{1} = [6.5, 6.5]$

Fuzzy Interpolating Polynomial
Utilizing alpha-levels to obtain fuzzy polynomials, we have:
$[\tilde{p}(x)]_\alpha \equiv [p_\alpha^-(x), p_\alpha^+(x)] = \{ z \in \mathbb{R} : z = p_d(x),\ d_i \in [\tilde{z}_i]_\alpha \}$
[Figure: fuzzy interpolating polynomial $\tilde{p}(x)$ with lower and upper curves $p_\alpha^-(x)$ and $p_\alpha^+(x)$ through the alpha-level bounds $z_1^{-\alpha}, z_1^{+\alpha}$ at $x_1$ and $z_2^{-\alpha}, z_2^{+\alpha}$ at $x_2$ (from Jorge dos Santos & Lodwick)]

Consistent Fuzzy Surfaces (curves)
The surfaces (curves) are defined enforcing the following properties:
1. The surfaces are defined analytically via the fuzzy functions; that is, they model the uncertainty directly using fuzzy functions $\tilde{z} = \tilde{f}(x)$ or $\tilde{z} = \tilde{f}(x, y)$.
2. All fuzzy surfaces maintain the characteristics of the generating method. That is, if splines are being used, then all generated fuzzy surfaces have the continuity and smoothness conditions associated with the splines being used.

2-D Example (from Jorge dos Santos & Lodwick)
x_i     0     15    25    50    90    121   143   165   200
z_i^-   19.5  14.9  5.8   -3.9  39.0  22.3  32.1  29.4  2.5
z_i^1   20.0  15.0  6.0   -4.0  40.0  23.0  33.0  30.0  3.0
z_i^+   20.3  15.6  6.3   -4.2  41.2  23.7  34.0  30.1  3.2
[Figures: fuzzy curves for the 2-D example via Lagrange polynomial, linear spline, cubic spline and consistent cubic spline, z plotted against x (from Jorge dos Santos & Lodwick)]
[Figure: details of the consistent fuzzy cubic spline (from Jorge dos Santos & Lodwick)]
[Figure: another representation/view of the fuzzy points (from Jorge dos Santos & Lodwick)]

3-D Example (from Jorge dos Santos & Lodwick)
[Figure: fuzzy surface via triangulation (from Jorge dos Santos & Lodwick)]
[Figure: fuzzy surfaces via linear splines (from Jorge dos Santos & Lodwick)]
[Figure: fuzzy surfaces via cubic splines (from Jorge dos Santos & Lodwick)]

EXAMPLES
Cidalia Fonte will go over the ideas introduced here in more detail at a later time.

Example 1. Tejo River
- The problem
The dimension of water bodies, and consequently their position, is subject to variation over time, especially in regions which are frequently flooded or subject to tidal variations, creating considerable uncertainty in positioning these geographical entities. River Tejo is an example, since frequent floods occur in several places along its bed. The region near the village of Constância, where rivers Tejo and Zezere meet, was chosen for this example.
A fuzzy geographical entity corresponding to rivers Tejo and Zezere is considered a fuzzy set. To generate this fuzzy entity, the membership function has to be constructed. This was done using a Digital Elevation Model of the region, created from the contours of the 1:25 000 map of the Army Geographical Institute of Portugal, and information regarding the daily means of the river water level registered in the hydrometric station of Almourol, located in the vicinity, from 1982 to 1990.
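One plausible way to build such a membership function, sketched here in Python, is to take the membership of a terrain point as the fraction of recorded days on which the water level reached the point's elevation. This rule, the function name `membership`, and the sample water levels are assumptions made for illustration; the slides do not spell out the exact formula the authors used.

```python
from bisect import bisect_left
from typing import Sequence

def membership(elevation: float, daily_levels: Sequence[float]) -> float:
    """Fraction of days whose water level is at or above `elevation`."""
    levels = sorted(daily_levels)
    # Index of the first level >= elevation; every later entry also counts.
    first = bisect_left(levels, elevation)
    return (len(levels) - first) / len(levels)

# Hypothetical daily mean water levels in meters (not the Almourol record).
levels = [20.5, 21.0, 21.0, 22.5, 24.0, 21.5, 20.8, 26.0]

for z in (20.0, 21.0, 23.0, 27.0):
    print(z, membership(z, levels))
```

Under this rule a point that was under water on every recorded day gets membership 1, and a point above the highest recorded level gets membership 0, with intermediate elevations graded in between.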
The variation of the water level during these years is shown on the next slide:
[Figure: water level from 1982 to 1990, in meters above the 20 m level (from Cidalia Fonte & Lodwick)]

Example 1 (figures from Cidalia Fonte & Lodwick)
The membership function of points to the fuzzy set is given by:
$\mu_T(x, y) = f[z(x, y)]$
[Figure: f(z), from 0% to 100%, plotted against altitude z (20 to 30)]
[Figure: map showing the river limits represented on the map, the line corresponding to the maximum water level registered during the considered period, and the line corresponding to the region always submerged during the considered period]

Example 2 – Landcover/use (figures from Cidalia Fonte & Lodwick)
[Figure: a) land cover classes: water regions, vegetation, bareland; b) $\mu_{Water\ regions}(x, y)$; c) $\mu_{Vegetation}(x, y)$; d) $\mu_{Bareland}(x, y)$; membership levels shown: 1, 0.75, 0.5, 0.25, 0]
Example 2 – Landcover/use continued
[Figure: GIS display of a) $\mu_{forest}(x, y)$, b) $\mu_{grass}(x, y)$, c) $\mu_{wet\ regions}(x, y)$, each ranging from 0 to 1]

Chapter IV Knowledge Representation
Formal Logic and Intelligent Systems
Concepts of Knowledge Representation
Rules and Frames
Knowledge Representation Examples
Key Issue of the Chapter: Concepts of Knowledge Representation

Chapter V Wavelet Analysis
Content
- Concepts in Wavelet Analysis
- Characteristics of Wavelet Transform
- Computational Features of Wavelet Transform
- Inverse Wavelet Transform
- Reconstruction Kernel
- Categories of Wavelets
Emphases
- Multi-resolution Analysis Theory
- Orthogonal Wavelet Transform
- Wavelet Packet Analysis

§5.1 Concepts in Wavelet Analysis
Definition of the Wavelet Transform
Given a basis function $\psi(t)$, suppose:
$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right)$
where a, b are constants, and a > 0.
Obviously, the functions $\psi_{a,b}(t)$ result from shifting and scaling the basis function $\psi(t)$. As a and b change continuously, a set of functions $\psi_{a,b}(t)$ is produced.
For an $x(t) \in L^2(R)$ (this indicates that x(t) is square-integrable), the Wavelet Transform of x(t) is defined as:
$WT_x(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt = \int x(t)\, \psi^{*}_{a,b}(t)\, dt = \langle x(t), \psi_{a,b}(t) \rangle$
Since a, b and t are continuous variables, this definition of the Wavelet Transform is known as the Continuous Wavelet Transform (CWT). CWT is the basis of Wavelet Analysis studies.
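The definition can be checked numerically. The Python sketch below approximates WT_x(a, b) by a Riemann sum, using the real-valued Mexican hat function as the mother wavelet. The wavelet choice, the integration window, the step size, and the names `mexican_hat`, `cwt_point` and `signal` are all assumptions made for illustration, and the 1/sqrt(a) factor is the conventional normalization assumed here.

```python
import math

def mexican_hat(t: float) -> float:
    """Unnormalized Mexican hat wavelet; real-valued, so psi* equals psi."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_point(x, a: float, b: float,
              t_min: float = -40.0, t_max: float = 40.0, dt: float = 0.01) -> float:
    """Approximate WT_x(a, b) = (1/sqrt(a)) * integral of x(t) psi((t - b)/a) dt."""
    if a <= 0:
        raise ValueError("scale a must be positive")
    steps = int(round((t_max - t_min) / dt))
    total = 0.0
    for k in range(steps + 1):
        t = t_min + k * dt
        total += x(t) * mexican_hat((t - b) / a)
    return total * dt / math.sqrt(a)

def signal(t: float) -> float:
    """Test signal: the wavelet itself, scaled by a = 2 and shifted to b = 5."""
    return mexican_hat((t - 5.0) / 2.0)

# Scan the time shift b at the matching scale a = 2.
shifts = [3.0, 4.0, 5.0, 6.0, 7.0]
coeffs = [cwt_point(signal, 2.0, b) for b in shifts]
best_shift = shifts[coeffs.index(max(coeffs))]
print("largest coefficient at b =", best_shift)
```

Because the transform is an inner product with the shifted and scaled wavelet, the coefficient is largest where the wavelet lines up with the matching feature in the signal, which is why the scan over b peaks at the shift built into the test signal.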
Important Concepts in Wavelet Transform
$WT_x(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt = \int x(t)\, \psi^{*}_{a,b}(t)\, dt = \langle x(t), \psi_{a,b}(t) \rangle$
b: Time Shift
a: Scaling Factor
$\psi(t)$: the Basic Wavelet or the Mother Wavelet
$\psi_{a,b}(t)$: the Wavelet Basis, the set of functions produced by shifting and scaling the Mother Wavelet

The Wavelet Transform could be understood as the inner product of the signal x(t) with a set of Wavelet Basis Functions.
The Mother Wavelet could be a real or a complex function.

Functionality of Time Shift (Variable b)
[Figure: $\phi(t)$, its shifted version $\phi(t - b)$, and its shifted and scaled version $\phi\!\left(\frac{t-b}{a}\right)$ with a = 2]

The Expression of Wavelet Transform in the Frequency Domain
$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \Leftrightarrow \Psi_{a,b}(\Omega) = \sqrt{a}\, \Psi(a\Omega)\, e^{-j\Omega b}$
$WT_x(a, b) = \frac{1}{2\pi} \langle X(\Omega), \Psi_{a,b}(\Omega) \rangle = \frac{\sqrt{a}}{2\pi} \int_{-\infty}^{+\infty} X(\Omega)\, \Psi^{*}(a\Omega)\, e^{j\Omega b}\, d\Omega$

§5.2 Characteristics of Wavelet Transform
The Constant-Q Feature
$Q = \Delta\Omega / \Omega_0$ (bandwidth versus central frequency)
$\frac{\Delta\Omega / a}{\Omega_0 / a} = \frac{\Delta\Omega}{\Omega_0} = Q$
[Figure: time-frequency cells for three scales]
a        central frequency   bandwidth   time width
a = 1/2  2Ω0                 2ΔΩ         Δt/2
a = 1    Ω0                  ΔΩ          Δt
a = 2    Ω0/2                ΔΩ/2        2Δt

What is Information Fusion?
Chapter VI Data and Information Fusion

“Information fusion is an information process dealing with the:
• association, correlation, and combination of data and information from
• single and multiple sensors or sources to achieve
• refined estimates of parameters, characteristics, events, and behaviors for observed entities in an observed field of view.”
• It is sometimes implemented as a fully automatic process or as a human-aiding process for analysis and/or decision support.

Data and Information Fusion: Process & Functional Model

Most simply:
• Multiple types of data carrying various types of information (redundant and complementary)
• “Associated” or “correlated” to the same object, event, or behavior
• So that estimation algorithms (mathematical techniques) or automated reasoning methods (artificial intelligence techniques) can produce better estimates than those based on any single type of data
In short: multiple types of data, related to things of interest, to improve estimates about those things.
These basic ideas are transferable to many types of problems.

[Figure: functional model of the data fusion domain, with the following blocks.]
• Sensors and sources: intel sources; air, surface, and space surveillance; instrumented, intelligent manufacturing systems; patient-monitoring systems; intelligent transportation systems
• Level 0 processing: sub-object data association & estimation
• Level 1 processing: object refinement
• Level 2 processing: situation refinement
• Level 3 processing: impact assessment
• Level 4 processing: process refinement
• Database management system: support database, fusion database
• Human-computer interaction
• World state of interest: identity of parts, vehicles, organs, tumors; movement, location; status or “situation” (benign, critical)
• Adaptive logic, e.g.: active sensor control, fusion process control
• Human engineering: decision-aiding, visualization/display, trust in automation

How is Data/Information Fusion Done?
• L0: Sub-object association/correlation: signal processing; combinatoric optimization; numerical/statistical estimation techniques
• L1: Association/correlation: combinatoric optimization, operations research (OR), statistical methods. Object refinement: broad range of estimation methods
• L2: Situation refinement
• L3: Impact assessment: numerical/statistical estimation techniques; knowledge-based, symbolic techniques
• L4: Adaptive control: sensor and source management (info-theoretic techniques and intelligent control theory); process adaptation (control-theoretic techniques)
• Human-computer interaction: broad range of HF/HE techniques
• Further techniques listed: numerical/statistical estimation techniques; combinatoric optimization; pattern recognition techniques; knowledge-based, symbolic techniques
• Inputs: application domain knowledge; intel sources; air, surface, and space surveillance

A Basic Issue: “Association”
• What measurement goes with what entity?
• Because we then use (fuse) those multiple measurements to get an improved (fused) estimate of something
[Figure: a measurement/observable compared with an estimate propagated to the measurement time, giving a “closeness” score.]

How to formulate an approach to this problem?
• How it is that observations are related to the entities or objects (a notion of “closeness”, a “score”)
• Things that can affect expected observations: measurement error (instrumentation, environmental effects) and estimation/prediction error
• Optimally assigning the observations to an estimation process which is estimating a parameter of interest for the entity/object (formal, mathematical)
Leads to the formulation of a classic OR assignment problem with the usual repertoire of solutions.
Now exploit the multiple observational data for a fused estimate.

Data Fusion “Processing Tree”
[Figure: tree of data fusion nodes (FN) and resource management nodes (MN). Labels: HW Radar, Off-Board, IRST, RWR, Missile, Target, Core Processing, User Interface, Defensive Mgmt., Countermeasure, RFCM, Expendables. FN = Data Fusion Node; MN = Resource Management Node.]
Architecting these systems can be difficult.

Data Fusion Tree Node
• Input: prior data fusion nodes & sources
• Data preparation (common referencing)
  - Detect and resolve data conflicts
  - Convert data to common time and coordinate frame
  - Compensate for source misalignments
• Data association (data correlation): hypothesis generation, hypothesis evaluation, hypothesis selection
  - Gating and generation of feasible and confirmed association hypotheses
  - Scoring of data associations
  - Select, delete, or feedback data associations
• State estimation & prediction
  - Estimate/predict object & aggregate states (kinematics, attributes, ID), from each perspective (blue, red)
  - Estimate sensor/source misalignments
  - Feed forward source/sensor status
• Output: user or next fusion node
• Also shown: source/sensor status; resource management controls

Non-Defense Applications
• Condition-based maintenance (multiply-instrumented, high-value equipment: predict/estimate “health” from sensor data)
• Multi-spectral mammography (tumor detection from multiple imagery)
• Intelligent transportation systems (intersection collision avoidance from side-looking radars and acoustic sensors)

Fusion-Based Automatic Object Recognition (A Precursor to Visualization)
[Figure: BRDM-2 target model; collected vs. predicted imagery for SAR only (34° pose) and E-O only (88° pose); SAR component match scores, E-O component match scores, and fused component match scores.]
• Generate Hypothesis (e.g.
BRDM-2, 34° pose, articulation x)
• Predict Measurements
• Evaluate Component-Level Match: Actual vs. Predicted
• Select Hypothesis with Best Match
[Figure scale: match scores, from low to high.]

Summary
• Data fusion is an information process embodied in software, involving estimation algorithms to extract maximum information from multiple observations
• It is a maturing field of study requiring innovation in application
• It has been successfully employed in a broad range of applications
• Applicability to anthropometrical needs warrants consideration

Reference material (originally in Chinese)

[Figure: layered data warehouse architecture.]
• Layers: front-end tool layer; OLAP service layer; data warehouse service layer; data layer
• Components: query system; data analysis; data mining; data-warehouse-oriented OLAP service; data-mart-oriented OLAP service; management system; monitoring system; metadata store; data warehouse; data marts; transaction database; external data sources

Glossary
• deduction: 演绎
• induction: 归纳
• derivation: 导出,引出
• abduction: 推断,不明推论
• inference: 推论
• predicate calculus: 谓词演算
• propositional network: 命题网络
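
Returning to the association issue raised in this chapter (a “closeness” score, gating, an assignment problem, then a fused estimate), the steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: scalar positions with known variances and independent errors, a gate value of 9.0, brute-force enumeration of assignments, and inverse-variance weighting for the fused estimate. The names `closeness`, `associate`, `fuse`, and `GATE` are hypothetical and do not come from the lecture material; a practical system would likely use a dedicated assignment solver instead of enumeration.

```python
from itertools import permutations

GATE = 9.0  # assumed gate: reject pairs more than 3 standard deviations apart


def closeness(z, var_z, x_pred, var_pred):
    """Squared measurement-to-prediction distance, normalized by the combined variance."""
    return (z - x_pred) ** 2 / (var_z + var_pred)


def associate(tracks, measurements, gate=GATE):
    """Pick the assignment hypothesis with the lowest total closeness score.

    tracks and measurements are lists of (value, variance) pairs.
    Returns a tuple whose entry i is the index of the measurement assigned
    to track i, or None if no hypothesis passes the gate.
    Brute-force enumeration; assumes at least as many measurements as tracks.
    """
    best, best_cost = None, float("inf")
    # Hypothesis generation: every one-to-one assignment of measurements to tracks.
    for hyp in permutations(range(len(measurements)), len(tracks)):
        # Hypothesis evaluation: score each pairing in this hypothesis.
        scores = [closeness(*measurements[m], *tracks[t]) for t, m in enumerate(hyp)]
        if any(s > gate for s in scores):
            continue  # gating: drop hypotheses containing an infeasible pairing
        cost = sum(scores)
        # Hypothesis selection: keep the lowest total score seen so far.
        if cost < best_cost:
            best, best_cost = hyp, cost
    return best


def fuse(x_pred, var_pred, z, var_z):
    """Inverse-variance weighted combination of a prediction and a measurement."""
    w_pred, w_z = 1.0 / var_pred, 1.0 / var_z
    var = 1.0 / (w_pred + w_z)
    return var * (w_pred * x_pred + w_z * z), var


if __name__ == "__main__":
    tracks = [(10.0, 4.0), (20.0, 4.0)]        # (predicted position, variance)
    measurements = [(19.0, 1.0), (11.0, 1.0)]  # (measured position, variance)
    hypothesis = associate(tracks, measurements)
    for t, m in enumerate(hypothesis):
        print("track", t, "measurement", m, "fused", fuse(*tracks[t], *measurements[m]))
```

Under the independence assumption, the fused variance returned by `fuse` is smaller than either input variance, which is one concrete reading of the claim that fusion produces better estimates than any single type of data.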