Lecture 2: Data Exploration

Jianfei Chen
School of Geographical Sciences
GuangZhou University
GuangZhou, 510405 China
Email: [email protected]

Chapter Outline

9.1 Introduction
9.2 Data Exploration
    9.2.1 Descriptive Statistics
    9.2.2 Graphs
    9.2.3 Dynamic Graphs
    9.2.4 Data Exploration and GIS
9.3 Vector Data Query
    9.3.1 Attribute Data Query
        Box 9.1 Query Operations in ArcGIS
        9.3.1.1 Logical Expressions
        9.3.1.2 Type of Operation
        9.3.1.3 Examples of Query Operation
        9.3.1.4 Relational Database Query
        9.3.1.5 Use SQL to Query a Database
        Box 9.2 More Examples of SQL Statement
    9.3.2 Spatial Data Query
        9.3.2.1 Feature Selection by Cursor
        9.3.2.2 Feature Selection by Graphic
        9.3.2.3 Feature Selection by Spatial Relationship
        Box 9.3 Expressions of Spatial Relationship in ArcView
        9.3.2.4 Combination of Attribute and Spatial Data Queries
9.4 Raster Data Query
    9.4.1 Query by Cell Value
    9.4.2 Query Using Graphic Method
9.5 Charts
9.6 Geographic Visualization
    9.6.1 Data Classification
        9.6.1.1 Data Classification for Visualization
        Box 9.4 Data Classification Methods
        9.6.1.2 Data Classification for Creating New Data
    9.6.2 Data Aggregation
    9.6.3 Map Comparison
Applications: Data Exploration
    Task 1: Select Feature by Location
    Task 2: Select Feature by Graphic
    Task 3: Query Attribute Data from a Joint Table
    Task 4: Query Attribute Data from a Relational Database
    Task 5: Combine Spatial and Attribute Data Query
    Task 6: Query Raster Data

What is Data Exploration?

Data exploration is data-centered query and analysis. It allows the user to examine the general trends in the data, to take a close look at data subsets, and to focus on possible relationships between datasets. The purpose of data exploration is to better understand the data and to provide a starting point in formulating research questions and hypotheses.

Data Exploration and GIS

1. Data exploration in GIS is functionally similar to exploratory data analysis and dynamic graphics in statistics.
2. Exploratory data analysis advocates the use of a variety of techniques for examining data more effectively as the first step in statistical analysis and as a precursor to more formal and structured data analysis. Dynamic graphics enhances exploratory data analysis by using multiple and dynamically linked windows and by letting the user directly manipulate data points in charts and diagrams.
3. Data exploration in GIS uses interactive and dynamically linked visual tools. Maps (both vector- and raster-based), graphs, and tables are displayed in multiple windows and dynamically linked.

Graphics for Statistics

Line Graph
Bubbleplot
Bar Chart
Cumulative Frequency Graph
Boxplot
Scatter Plot

Graphs for Spatial Data

3D plot
Variogram cloud

Dynamic Graphs: Brushing

Vector Data Query

1. Attribute data query
2. Spatial data query

Attribute data query

1. Logical expressions
2. Type of operation
3. Relational database query
4. SQL

Logical Expressions

1. A simple logical expression contains two operands and a logical operator, e.g., “class” = 2.
2. Boolean connectors of AND, OR, XOR, and NOT connect two or more expressions in a query statement.

The shaded portion represents the complement of data subset A (top), the union of data subsets A and B (middle), and the intersection of A and B (bottom).

Three types of operation may be performed on the subset of 40 records: add more records to the subset (+2), remove records from the subset (-5), or select a smaller subset (20).

[Diagram: Soil Theme Table (musym) -> Comp.dbf (musym, muid) -> Forest.dbf (muid, plantsym) -> Plantnm.dbf (plantsym, comname)]

The keys relating three dBASE files in MUIR and the feature attribute table. The field comname in plantnm.dbf contains the common plant names.
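
The Boolean connectors and the three types of operation described under Logical Expressions can be sketched with Python sets. This is only an illustration: the record IDs, the membership rules for subsets A and B, and all variable names are assumptions made for the example. Only the counts (a subset of 40 records, +2, -5, and a smaller subset of 20) come from the slide.

```python
# Hypothetical record IDs 1..100; the lecture does not say which records exist.
all_records = set(range(1, 101))

# Two data subsets, each defined by a simple logical expression (assumed rules).
A = {r for r in all_records if r <= 40}         # 40 records
B = {r for r in all_records if 31 <= r <= 60}   # 30 records

# Boolean connectors expressed as set operations.
not_A = all_records - A    # NOT A: the complement of A
A_or_B = A | B             # A OR B: the union
A_and_B = A & B            # A AND B: the intersection
A_xor_B = A ^ B            # A XOR B: in A or in B, but not in both

# Three types of operation on the current selection of 40 records.
selected = set(A)
added = selected | {41, 42}                         # add 2 records -> 42
removed = selected - {1, 2, 3, 4, 5}                # remove 5 records -> 35
narrowed = {r for r in selected if r % 2 == 0}      # select from the subset -> 20

print(len(not_A), len(A_or_B), len(A_and_B), len(A_xor_B))
print(len(added), len(removed), len(narrowed))
```

A GIS package keeps the current selection much like `selected` here, so each new query either grows it, shrinks it, or filters within it.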
Relational Database Query

Relation 1: Parcel

    PIN    Sale date   Acres   Zone code   Zoning
    P101   1-10-98     1.0     1           residential
    P102   10-6-68     3.0     2           commercial
    P103   3-7-97      2.5     2           commercial
    P104   7-30-78     1.0     1           residential

Relation 2: Owner

    PIN    Owner
    P101   Wang
    P101   Chang
    P102   Smith
    P102   Jones
    P103   Costello
    P104   Smith

The key PIN relates the parcel and owner tables and allows use of SQL with both tables.

SQL

SQL (Structured Query Language) is a standard query language designed for relational databases. The basic syntax of SQL is

    select <attribute list>
    from <table>
    where <condition>

The select keyword selects field(s) from a database, the from keyword selects table(s) from a database, and the where keyword specifies the condition or criteria for data query.

Simple SQL

    select Sale_date
    from Parcel
    where PIN = 'P101'

More SQL

    select Parcel.Sale_date
    from Parcel, Owner
    where Parcel.PIN = Owner.PIN AND Owner_name = 'Costello'

The same select and from clauses can be combined with other conditions:

    where Parcel.PIN = Owner.PIN AND Owner_name like 'C%'

    where Parcel.PIN = Owner.PIN AND Owner_name in ('Wang', 'Smith', 'Jones')

Spatial Data Query

1. Feature selection by graphics
2. Feature selection by spatial relationship
3. Combination of attribute and spatial data queries

A circle with a specified radius is drawn around Sun Valley. The circle is then used as a graphic object to select point features within the circular area.

Feature Selection by Spatial Relationship

1. Containment: selects features that fall completely within features used for selection. Examples include finding schools within a selected county, and finding state parks within a selected state.
2. Intersect: selects features that intersect features used for selection. Examples include selecting land parcels that intersect a proposed road, and finding settlements that intersect an active fault line.
3. Proximity/Adjacency: selects features that are within a specified distance of features used for selection; adjacency is the case in which the distance is zero. Examples of spatial adjacency include selecting land parcels that are adjacent to a flood zone, and finding vacant lots that are adjacent to a new theme park.

Combination of Attribute and Spatial Data Queries

Find gas stations that are within one mile of a freeway exit in southern California and have annual revenues exceeding $2 million. Either of two approaches works:

1. Locate all freeway exits in the study area, and draw a circle around each exit with a 1-mile radius. Select gas stations within the circles through spatial data query. Then use attribute data query to find gas stations that have annual revenues exceeding $2 million.
2. Locate all gas stations in the study area, and select those stations with annual revenues exceeding $2 million through attribute data query. Next, use spatial data query to narrow the selection of gas stations to those within 1 mile of a freeway exit.

[Figure: spatial data query combined with attribute data query]

Geographic Visualization

Geographic visualization, sometimes called cartographic visualization, refers to the use of maps for setting up a context for processing visual information and for formulating research questions or hypotheses. Geographic visualization therefore has the same objective and the same types of interactivity as exploratory data analysis.

Methods for Geographic Visualization

1. Data classification
2. Spatial aggregation
3. Map comparison

The top map shows rate of unemployment in 1997 as either above or below the national average of 4.9%. The bottom map uses the mean and standard deviation (SD) for data classification.

The top map shows percent population change by state, 1990–2000. The darker the symbol, the higher the percent increase. The bottom map shows percent population change by region.

An example of using multiple maps in data exploration. In this view of deer relocations in SE Alaska, the focus is on the distribution of deer relocations along the clearcut/old forest edge.
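
The two approaches to the gas station query described under Combination of Attribute and Spatial Data Queries can be sketched in plain Python. This is a minimal illustration, not how a GIS package implements selection: the coordinates (planar, in miles), station names, revenues, and the helper functions `near_exit` and `high_revenue` are all hypothetical. The point is that the spatial and attribute filters can be applied in either order and give the same result.

```python
import math

# Hypothetical freeway exits and gas stations; coordinates are planar, in miles.
exits = [(0.0, 0.0), (10.0, 0.0)]
stations = [
    {"name": "A", "xy": (0.5, 0.5), "revenue": 2_500_000},
    {"name": "B", "xy": (0.3, 0.0), "revenue": 1_200_000},
    {"name": "C", "xy": (5.0, 5.0), "revenue": 3_000_000},
    {"name": "D", "xy": (10.0, 0.9), "revenue": 2_100_000},
]

def near_exit(station, radius=1.0):
    """Spatial query: is the station inside the circle around any exit?"""
    return any(math.dist(station["xy"], e) <= radius for e in exits)

def high_revenue(station, threshold=2_000_000):
    """Attribute query: does annual revenue exceed the threshold?"""
    return station["revenue"] > threshold

# Approach 1: spatial data query first, then attribute data query.
within_circles = [s for s in stations if near_exit(s)]
approach_1 = [s["name"] for s in within_circles if high_revenue(s)]

# Approach 2: attribute data query first, then spatial data query.
top_earners = [s for s in stations if high_revenue(s)]
approach_2 = [s["name"] for s in top_earners if near_exit(s)]

print(approach_1, approach_2)
```

In practice the order may still matter for speed: running the cheaper or more selective filter first leaves fewer features for the second query.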

A bivariate map showing the combinations of (1) rate of unemployment in 1997, either > or <= the national average, and (2) rate of income change 1996–98, either > or <= the national average.

Thank You!