Short Course on Risk Management in Water Resources
Held at DCE, BUET on 13-14 March, 2010

Risk-based Information and Dissemination System
Dr. A.K.M. Saiful Islam
Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)

Outline
• What is an Information System, and the benefits of using Information Systems in Risk Management
• Database Management System
• Geographic Information System
• Data Mining and Knowledge Discovery
• Web-based Information System

What is an Information System?
An information system can be defined technically as a set of interrelated components that collect (or retrieve), process, store, and distribute information to support decision making and control in an organization.
Information technology is a contemporary term that describes the combination of computer technology (hardware and software) with telecommunications technology (data, image, and voice networks).

Information System Diagram
[Diagram: Input, Processing, Output, Storage, and Feedback, set within the Environment]

Importance of Information Systems in Risk Management
• Information systems are used in almost every aspect of business today.
• To understand a business we must understand its information systems.
• They are rapidly becoming part of our everyday life.

Legacy of Information Systems Used for Decision Making
• Transaction Processing Systems (TPS)
• Management Information Systems (MIS)
• Decision Support Systems (DSS)
• Expert Systems (ES)
• Executive Information Systems (EIS)
• Geographic Information System (GIS)

Geographic Information System (GIS)
An information system that is used to input, store, retrieve, manipulate, analyze, and output geographically referenced (geospatial) data, in order to support decision making for planning and management of land use, natural resources, environment, transportation, urban facilities, and other administrative records.

GIS Overlay of Spatial Data
Two different object layers can be overlaid to produce a new layer.

An Application of GIS and Remote Sensing: Estimation of Potato Yield Using Remote Sensing Data
Dr. Sujit Kumar Bala
Dr. Saiful Islam

Study Area and Location of Field Control Points

Vegetation Indices
Normalized Difference Vegetation Index (NDVI): Healthy plants have high NDVI values because of their high reflectance of infrared light and relatively low reflectance of red light.
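
As a minimal sketch of how such an index can be computed per pixel, the snippet below assumes the standard definition NDVI = (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectances. The function name `ndvi` and the sample reflectance values are hypothetical and are not taken from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Return the Normalized Difference Vegetation Index for one pixel.

    Assumes nir and red are reflectances expressed on the same scale.
    """
    total = nir + red
    if total == 0:
        # No signal in either band: the ratio is undefined, so this sketch returns 0.0.
        return 0.0
    return (nir - red) / total


# Hypothetical (NIR, Red) reflectance pairs, for illustration only.
samples = {
    "dense vegetation": (0.50, 0.08),
    "sparse vegetation": (0.30, 0.15),
    "bare soil": (0.25, 0.20),
}

for label, (nir, red) in samples.items():
    print(f"{label}: NDVI = {ndvi(nir, red):.2f}")
```

The larger the infrared reflectance is relative to the red reflectance, the closer the value gets to 1, which matches the statement that healthy plants have high NDVI values.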
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$

MODIS Satellite Image Processing
• MODIS data are organized on a 10 x 10 degree grid covering the whole world.
• The Bangladesh grid tile is at h=26, v=06.
[Figures: Raw image of LAI as HDF-5 file; MODIS image of LAI; LAI over Bangladesh; LAI over Munshigonj district; LAI plot using 50 field data points]

Criteria of Classification
• Agricultural area: Δ LAI > 0.5
• Non-agricultural area: Δ LAI < 0.5
• Potato: 0.5 < NDVI < 0.9 and 0.5 < LAI < 2.5

Phenological Curve Based on NDVI

Life Matrix of Potato
Index | NDVI | LAI | Definition | Metric
OnT | 16 days | 17 days | Intersection of forward lag and smooth curve | Starting date of VI high period
OnV | 0.45 | 0.91 | Value of VI at forwards intersection | VI at start of high period
EndT | 88 days | 92 days | Intersection of backwards lag and smooth curve | End date of VI high period
EndV | 0.52 | 1.02 | Value of VI at backwards intersection | VI at end of high period
MaxT | 63 days | 66 days | Time of maximum raw corrected VI | Date of maximum VI
MaxV | 0.85 | 2.15 | Maximum value of corrected raw VI | Maximum VI
DurT | 72 days | 75 days | Time from forwards to backwards intersections | Length of VI high period
RanV | 0.40 | 1.24 | Difference between minimum and maximum value of smooth curve | Amplitude of season
RIN | 0.01 | 0.025 | Slope of line from forwards intersection to raw maximum | Rate of VI increase
RDN | 0.02 | 0.048 | Slope of line from raw maximum value to backwards intersection | Rate of VI decrease
TINDVI | 20 days | 63 days | Integrated area under smooth VI curve | “Magnitude” of season
DurNT | 25 days | 26 days | Time from backwards to forwards intersection | Length of VI low period
RRINDN | 0.50 | 0.52 | Rate of increase/rate of decrease | “Quality” of season

Life Matrix (Contd..)
Index | NDVI | LAI | Definition | Metric
HRanTO | 30 days | 29 days | Time of half range value at onset; equals OnV+(RanV/2) when rising | Start of active growing season
HRanVO | 0.65 | 1.52 | Half range value at onset: OnV+(RanV/2) | VI at start of active growing season
HRanTE | 80 days | 86 days | Time of half range value at end; equals EndV+(RanV/2) when falling | End of active growing season
HRanVE | 0.65 | 1.33 | Half range value at end: EndV+(RanV/2) | VI at end of active growing season
HDurT | 44 days | 56 days | Duration of period from HRanTO to HRanTE | Duration of active growing season
SMMaxT | 56 days | 56 days | Time of maximum smooth VI curve | Date of peak of season
SMMaxV | 0.79 | 1.93 | Maximum value of smooth VI curve | Value at peak of season
SMMinT | 96 days | 94 days | Time of minimum smooth VI curve | Date of season minimum
SMMinV | 0.33 | 0.74 | Minimum value of smooth VI curve | Value of season minimum

Changes of mean values of indices in the study area of Munshigonj District with days after plantation of potato
Days after plantation | 3 | 11 | 19 | 27 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96
NDVI | 0.50 | 0.50 | 0.53 | 0.58 | 0.61 | 0.70 | 0.70 | - | 0.73 | 0.72 | 0.69 | 0.63 | 0.53
FPAR | 0.51 | 0.49 | 0.54 | 0.57 | 0.60 | 0.68 | 0.67 | 0.69 | 0.70 | 0.69 | 0.69 | 0.66 | 0.57
LAI | 0.88 | 0.79 | 0.98 | 1.09 | 1.17 | 1.48 | 1.42 | 1.53 | 1.58 | 1.56 | 1.50 | 1.41 | 1.08

Chronological Plot of Vegetation Indices
[Figure: LAI, NDVI, and fPAR (vegetation index) plotted against days after plantation]

Spatial Distribution of NDVI

Yield and NDVImax Correlation
• Based on Upazilla data: y = 38.373x + 4.2526, R² = 0.793 (x axis: NDVI; y axis: Yield (Ton/ha))
• Based on field data: y = 8.7507x - 4.041, R² = 0.659 (x axis: NDVI; y axis: Yield (t/ha)*10)

Effect of Temperature on NDVI
• Maximum rate of growth occurred at lowest temperature.
• Temperature distribution map using MODIS thermal data.

Using Satellite Data and GIS to Investigate Drought
Mr. Hasan Murad
Dr. Saiful Islam

Drought
According to McMahon and Diaz Arenas (1982), “Drought is a period of abnormally dry weather sufficiently prolonged for the lack of precipitation to cause a serious hydrological imbalance and carries connotations of a moisture deficiency with respect to man’s usage of water.”

Study Area
[Map of the North-West region]

SPI Calculation
Mathematically, the Standardized Precipitation Index (SPI) is calculated based on the equation:
$$\mathrm{SPI} = \frac{X_i - X_m}{\sigma}$$
where $X_i$ is the monthly rainfall record of the station, $X_m$ is the rainfall mean, and $\sigma$ is the standard deviation.
Monthly rainfall data from 2000 to 2008 at 6 rainfall stations are used as input to the SPI program.
http://drought.unl.edu/monitor/spi/program/spi_program.htm#program

Meteorological Drought: Dry Year
3-month interpolated SPI for 2006 (dry year): August, September, October

Drought Using NDVI Anomaly: Rainfall and NDVI for the Year 2006

Final Task Needed: Classification of Drought Risks

Database Management System
WFM 6103: Hydrologic Information System
© Dr. Akm Saiful Islam

Benefits of Using a Database
• Data integrity
• Consistent entries
• Data validation rules
• Ease of data entry with forms
• Minimize duplicate data entry
• Easy reporting

Database Management System
• Information systems process and manage data.
• Data management involves “capturing”, “retrieval”, and “storage” of data.
• Today’s DBMSs are based on sophisticated software and powerful computer hardware.
• Well-known DBMS software includes ORACLE, Microsoft SQL Server, Sybase, and MySQL (free download), among others.

Data Models
1. Hierarchical
2. Network
3. Relational
4. Object oriented

Relational Model
Based on two important concepts:
• Key of relation: one to one, one to many, many to many
• Primary attribute: an attribute whose values cannot be duplicated
[Diagram: Student Table * to * Course Table, a many-to-many relationship]

Student table
Student ID | Name | CourseID
1 | Mr. X | 001
2 | Mr. X | 002
3 | Mr. Y | 003

Course table
CourseID | Title | Credit
001 | RS & GIS in WM | 3
002 | Watershed Hydrology | 3
003 | Risk Management | 3

Relational DM: Terminology
• A collection of data entities is typically known as a file.
• An individual data entity is typically known as a record.
• Different attributes of a record are typically known as fields.
• A key is a field or a set of fields that uniquely identifies a record.
[Figure labels: File (table), Fields, Records; Key: Product Category + Product Type + Year]

A Word on Keys
• A key may be a field or set of fields that are used to identify the record.
• A primary key is a minimal set of fields that uniquely identifies the record.
• A foreign key is a field that is a primary key in another relation.
[Figure labels: Primary Key, Foreign Keys]

Multidimensional Database
• In the multidimensional view, two dimensions are viewed at one time and the others are available to page through.
• Finding and analysing large numbers of records is possible.
• Car colour can easily be placed, as a page, along with years. Then page, row, and column dimensions can be exchanged.

Types of Databases
Operational Databases
Because of their efficient storage and the speed of small additions or updates, E-R / relational DBs are used (mostly) for transaction processing, and we refer to these systems as Operational Systems or Operational Databases. Sometimes the terms “Transaction” and “Production” are used as well.
Analytical Databases
Data that is used for decision-making purposes is typically stored in a different form than operational data. Analytical Databases, or Analytical Systems, store data in a way that allows for long and complicated interactions with relatively few users.

What is a Data Warehouse?
• A Data Warehouse (DW) is an IS designed to support analytical tasks.
• It integrates information from a variety of sources and/or applications.
• It supports (relatively) few users with long interactions.
• Data in a data warehouse cannot be changed!
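
The relational concepts introduced above (primary keys, foreign keys, and many-to-many relationships) can be sketched with Python's built-in `sqlite3` module. The table and column names (`student`, `course`, `enrollment`) are assumptions chosen for illustration. The slides show the course ID stored directly in the student table, whereas this sketch uses a separate link table, which is one common way to model a many-to-many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    credit    INTEGER NOT NULL
);
-- Link table: one row per (student, course) pair.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Mr. X"), (2, "Mr. Y")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    ("001", "RS & GIS in WM", 3),
    ("002", "Watershed Hydrology", 3),
    ("003", "Risk Management", 3),
])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, "001"), (1, "002"), (2, "003")])
conn.commit()

# Join through the link table to list which student takes which course.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
    ORDER BY s.student_id, c.course_id
""").fetchall()
for name, title in rows:
    print(name, "-", title)

# A foreign key must refer to an existing primary key value.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, '999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("Unknown course rejected:", fk_rejected)
```

The composite primary key on the link table prevents the same student from being enrolled in the same course twice, which is the "cannot be duplicated" property of a primary attribute.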

A Data Warehouse Consists of:
• A Large Physical Database: This is the actual “warehouse.” It includes the data, as well as metadata (data about the organisation of the data in the Data Warehouse), and the processing logic used to process the data.
• The Logical Data Warehouse: This contains all the metadata, business rules, and processing logic, as well as the information required to access the actual data (same as the Data Warehouse model).
• Data Marts: These are subsets of the data warehouse, used for functional, departmental, or regional purposes. Data Marts are built gradually, and are connected via the logical data warehouse.
• DSS and EIS: These are NOT part of the data warehouse, but they are applications that use the data warehouse.

[Diagram: A Company; Operational Systems (Transactions) and Analytical Systems (Decisions); Distribution Data Mart, Human Resources Data Mart, Sales Data Mart]

Example
Corporate Data Model
CUSTOMER_INVOICE: Invoice_ID, Invoice_Date, Customer_ID, Customer_Address, Description, Message, Status
Data Warehouse Data Model
CUSTOMER_INVOICE: Invoice_ID, Invoice_Date, Customer_ID, Customer_Address
Data unlikely to be used for DSS (here Description, Message, and Status) is not carried into the Data Warehouse model.

Adding an Element of Time
• Data may change over time, but Operational Systems do not always “record” such change.
• The time element is added only if it does not exist.
Corporate Data Model
CUSTOMER: Customer_ID, Name, Birth_Date, Marital_Status, Credit_Rating
Data Warehouse Data Model
CUSTOMER: Customer_ID, Snapshot_Date, Name, Birth_Date, Marital_Status, Credit_Rating

Example of Derived Data
Corporate Data Model
INVOICE: Invoice_ID, Product_ID, Product_Code, Quantity, Unit_Price
Data Warehouse Data Model
INVOICE: Invoice_ID, Product_ID, Product_Code, Quantity, Unit_Price, Total_Amount, Product_Cost
Derived data: Total_Amount, Product_Cost

Data Marts
A Data Mart is also a “Data Warehouse,” but usually for a single “subject” area.
It is common to model this single subject area using a star schema design.
(A data mart may have more than one star schema.)
• A star schema usually consists of a “central” table, called the fact table, and a set of satellite tables, known as dimensions or dimension tables.
• The fact table has multiple joins which connect it to the dimensions.
• Dimension tables have a single join which connects them to the fact table.

On-Line Analytical Processing (OLAP)
• Codd’s OLAP was an ambitious idea: the OLAP concept tried to introduce a new range of analytical systems that embrace the enterprise.
• Nowadays OLAP is used mainly by end-users, and usually sits on top of a Data Mart or a Data Warehouse.
• OLAP technology allows end-users to interact with the data and perform (at least) basic analysis of it.
• OLAP is most popular as a (data) visualisation tool.

Multidimensional Tables
• Multidimensional tables are also known as hypercubes or datacubes.
• The datacube to the right was generated by TM/1 Perspectives, in Excel.
• The dialog box shows the dimensions that define this table.

Drill-Down
Quarter 1 has been drilled down to the individual months.

Roll-up
The data have been rolled up from the individual Quarters to the year.

Example of Downloading Historical Climate Data: Canadian Climate Data
www.cccma.bc.ec.gc.ca/hccd/index.shtml
• Limited number of stations
• Requires registration
3. Selecting a Site
• By province or station name
• Or by proximity
• By city (limited)
• Location and elevation similar

Data Mining or KDD

Data Mining
To fully complete the picture, at least in terms of methods and techniques used for extracting knowledge from data, there should be some discussion of Data Mining, or Knowledge Discovery in Databases (KDD).
“Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.” (Fayyad et al. 1996)

Mining and KDD
Data Mining and Knowledge Discovery in Databases are often used synonymously.
With respect to the KDD process:
• Data analysts, statisticians, and MIS people tend to use the term Data Mining.
• Artificial Intelligence and Machine Learning researchers tend to use the term Knowledge Discovery in Databases.
• In the Artificial Intelligence and Machine Learning fields, the term data mining refers to the step(s) of the KDD process in which a particular algorithm or method is applied in order to discover knowledge.

Disciplines Behind Knowledge Discovery
• Machine Learning and Artificial Intelligence, with methods such as Rule Induction, Case-based Reasoning, Neural Networks, and Genetic Algorithms.
• Uncertainty Methods, in particular methods originating from Statistical Science. These include Decision Trees, Bayesian Methods, Fuzzy Logic, Clustering, as well as Classical Statistics and Probability Theory.
• Database Techniques, namely Association Rules.
Technologies such as Data Warehousing, Data Marts, and OLAP are generally referred to as “enabling technologies” for data mining. That is, they are not strictly required (KDD can be applied without them), but they are of primary significance because using them benefits the application of KDD.

Neural Nets
Artificial Neural Networks, or simply Neural Networks, are used for classification and prediction.
A mathematical network is modelled with inputs and outputs to theoretical neurons.
This structure resembles the human brain's network of neurons, and is used “to create a system that could solve difficult problems and display behaviour that was much more complex than the simple pieces that made it up” (Berson & Smith 1997).

ANN Layers

Example of Using ANN for Flood Forecasting of Bangladesh

Data and Methodology
• Water level data: BWDB daily water level data of 10 major stations on the three major rivers: Brahmaputra, Ganges, and Meghna
• Remote sensing data: TRMM 3B42 data with a spatial resolution of 0.25 degree (50°S to 50°N) and a temporal resolution of 3 hours

Figure-2: GBM Basin
Figure-3: GBM Basin grid of 0.5 deg resolution. The grid consists of 1056 cells (22 rows × 48 columns)
Figure-4: Flow network of GBM Basin
Figure-5: Flow network and GBM Basin grid
Figure-6: Neural network
Figure-7: Neural network and GBM Basin grid
Figure-8: Neural network and sub-basins
Figure-9: Three-hourly TRMM 3B42 rainfall data with 0.25 degree spatial resolution

Measured vs Predicted WL
Predicted vs Observed

Web-based Information System

XML
• An Extensible Markup Language (XML) document describes the structure of data.
• XML and HyperText Markup Language (HTML) have a similar syntax; both are derived from Standard Generalized Markup Language (SGML).
• HTML is a small SGML application used on the web (a DTD and a set of processing conventions).
• XML has no mechanism to specify the format for presenting data to the user.
• An XML document resides in its own file with an ‘.xml’ extension.
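
To illustrate that an XML document carries structure rather than presentation, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`stations`, `station`, `id`, `river`, `level`, `unit`) and all values are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags name the data, nothing says how to display it.
document = """
<stations>
  <station id="S1">
    <river>Ganges</river>
    <level unit="m">12.4</level>
  </station>
  <station id="S2">
    <river>Meghna</river>
    <level unit="m">5.1</level>
  </station>
</stations>
"""

root = ET.fromstring(document)

# Walk the tree and pull values out by element name and attribute name.
records = []
for station in root.findall("station"):
    level = station.find("level")
    records.append({
        "id": station.get("id"),
        "river": station.findtext("river"),
        "level": float(level.text),
        "unit": level.get("unit"),
    })

for record in records:
    print(record)
```

Any program that knows the element names can recover the values, while how they are shown to a user is left to a separate layer.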

Converting Relational Database to XML
Example: Export the following data into XML and group books by store.
Relational Database:
• Store (sid, name, phone)
• Book (bid, title, authors)
• StoreBook (sid, bid, price, stock)
[Diagram: Store (sid, name, phone) and Book (bid, title, authors) linked through StoreBook (price, stock)]

Converting Relational Database to XML (Cont’d)
XML:
<store>
  <name> … </name>
  <phone> … </phone>
  <book>
    <title>… </title>
    <authors> … </authors>
    <price> … </price>
  </book>
  <book>…</book>
  …
</store>

XML Example
<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>
[Tree diagram: BOOKS is the root; its child book (id 123, loc="library") has author Hull, title California, and year 1995; its child article (id 555, ref 123) has author Su and title Purdue]

Some Companion W3C Recommendations
• XML Schema: an XML-based alternative to DTD; more powerful, supports namespaces and data types
• XPath: a language for addressing parts of an XML document, used by XSLT
• Extensible Stylesheet Language (XSL): an XML vocabulary for specifying formatting semantics
• XSLT: a language for transforming XML documents into other XML documents

Thank You
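
As a closing illustration of the relational-to-XML export described above, the following sketch groups books by store using Python's standard `xml.etree.ElementTree` module. The Store, Book, and StoreBook layouts follow the schema in the example; the sample rows, the root element name `stores`, and the helper name `export_stores` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows for the three relations.
stores = [(1, "City Books", "555-0101"), (2, "Campus Store", "555-0102")]  # (sid, name, phone)
books = [(10, "California", "Hull"), (11, "Purdue", "Su")]                 # (bid, title, authors)
store_books = [(1, 10, 20.0, 3), (1, 11, 15.5, 1), (2, 10, 21.0, 2)]       # (sid, bid, price, stock)


def export_stores(stores, books, store_books):
    """Build a <stores> tree with one <store> per store and its <book> children."""
    book_by_id = {bid: (title, authors) for bid, title, authors in books}
    root = ET.Element("stores")
    for sid, name, phone in stores:
        store_el = ET.SubElement(root, "store")
        ET.SubElement(store_el, "name").text = name
        ET.SubElement(store_el, "phone").text = phone
        # The join on sid and bid becomes nesting: each store lists the books it sells.
        for sb_sid, bid, price, _stock in store_books:
            if sb_sid != sid:
                continue
            title, authors = book_by_id[bid]
            book_el = ET.SubElement(store_el, "book")
            ET.SubElement(book_el, "title").text = title
            ET.SubElement(book_el, "authors").text = authors
            ET.SubElement(book_el, "price").text = str(price)
    return root


root = export_stores(stores, books, store_books)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The key columns sid and bid do not appear in the output because the nesting itself expresses the relationship, which matches the target shape in the example (name, phone, then book elements with title, authors, and price).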