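Urban Spatial Information Technology
Chapter 10: Data Exploration
胡嘉骢 (Hu Jiacong)
School of Real Estate
Ph.D.
Associate Professor
Head, Department of Urban Planning
E-mail: [email protected]
Mobile: 13411361496 (611496)
QQ: 4519210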
CHAPTER 11
DATA EXPLORATION
• 11.1 Data Exploration
• 11.2 Attribute Data Query
• 11.3 Spatial Data Query
• 11.4 Raster Data Query
• 11.5 Geographic Visualization
CHAPTER 11
DATA EXPLORATION
• Data exploration is the beginning of GIS analysis
• What do you do with a database of dozens of layers and hundreds of attributes?
• Data exploration lets you examine trends and focus on relationships
• It helps you understand the data better
• It links maps, graphs, and tables
11.1 Data Exploration
• Exploratory data analysis
– Statistical analysis
• Dynamic graphics
• Data visualization
– Finding Gestalt (finding patterns and properties in a data set)
– Posing queries
– Making comparisons
11.1.1 Descriptive Statistics
• Summarize values of a data set
– Range
– Median
– Mean
– Mode
– Quantile analysis
– Variance
– Standard deviation
– Z score
• GIS packages offer descriptive statistics
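The statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `values` list is hypothetical sample data, and sample (not population) variance and standard deviation are assumed.

```python
import statistics

# Hypothetical attribute values (for example, % change for seven areas).
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation

summary = {
    "range": max(values) - min(values),
    "median": statistics.median(values),
    "mean": mean,
    "mode": statistics.mode(values),
    "quartiles": statistics.quantiles(values, n=4),  # three cut points
    "variance": statistics.variance(values),  # sample variance
    "std_dev": stdev,
}

# Z score: how many standard deviations each value lies from the mean.
z_scores = [(v - mean) / stdev for v in values]

for name, value in summary.items():
    print(f"{name}: {value}")
```

A GIS package would typically report the same summary for a selected field.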
11.1.2 Graphs
• Visual display of data
• Numerous possibilities
Figure 11.1
A line graph.
Figure 11.2
A histogram.
Figure 11.3
A cumulative distribution graph.
Figure 11.4
A scatterplot plotting % persons under 18 years old in 2000 against % population change, 1990–2000. A weak positive relationship, with a correlation coefficient of 0.376, is present between the two variables.
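The correlation coefficient quoted for a scatterplot is usually Pearson's r. The sketch below computes it from scratch. The two lists are hypothetical values, not the census data behind Figure 11.4, and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations.
pop_change = [1.0, 2.0, 3.0, 4.0, 5.0]
under_18 = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(pop_change, under_18)
print(round(r, 3))
```

Values of r near 0 indicate a weak linear relationship, and values near +1 or -1 indicate a strong one.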
Figure 11.5
A bubble plot showing % population change, 1990–2000, along the x-axis; % persons under 18 years old in 2000 along the y-axis; and state population in 2000 by the bubble size.
Figure 11.6
A boxplot based on the % population change, 1990–2000, data set.
Figure 11.7
Boxplot (a) suggests that the data values follow a normal distribution. Boxplot (b) shows a positively skewed distribution, with data values concentrated near the low end and a long tail toward the high end. The x’s in (b) may represent outliers, which are more than 1.5 box lengths from the end of the box. Boxplot (c) shows a negatively skewed distribution, with data values concentrated near the high end and a long tail toward the low end.
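The outlier rule in the caption (more than 1.5 box lengths beyond the end of the box) can be sketched as follows. The box length is the interquartile range. The sample values and the function name `boxplot_outliers` are hypothetical, and the "inclusive" quartile method is one of several conventions a package might use.

```python
import statistics


def boxplot_outliers(values):
    """Return values lying more than 1.5 box lengths beyond the box."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data with one unusually high value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(boxplot_outliers(values))
```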
Figure 11.8
A QQ plot plotting the % population change, 1990–2000, data values against the standardized values from a normal distribution.
Figure 11.9
A 3-D plot showing annual precipitation at 105 weather stations in Idaho. A north-to-south decreasing trend is apparent in the plot.
11.1.3 Dynamic Graphics
• Graphs are displayed in multiple, dynamically linked windows
• Data points can be manipulated directly
– Pose a query in one window and get the response in another window
• Multiple linked windows provide an optimal framework for posing queries
Brushing
Figure 11.10
The scatterplot on the left is dynamically linked to the map on the right. The
“brushing” of two data points in the scatterplot highlights the corresponding
states (Washington and New Mexico) on the map.
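Brushing can be thought of as selecting records in one view and highlighting the records with the same keys in another view. The sketch below assumes both views share a record key. The record names, field names, values, and the `brush` helper are all hypothetical.

```python
# Hypothetical records shared by the scatterplot and the map, keyed by ID.
records = {
    "A": {"pop_change": 5.0, "under_18": 22.0},
    "B": {"pop_change": 20.0, "under_18": 26.0},
    "C": {"pop_change": 18.0, "under_18": 28.0},
    "D": {"pop_change": 30.0, "under_18": 25.0},
}


def brush(records, x_field, y_field, x_range, y_range):
    """Return keys of records whose point falls inside the brush rectangle."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    return {
        key
        for key, rec in records.items()
        if x_min <= rec[x_field] <= x_max and y_min <= rec[y_field] <= y_max
    }


# Brush a rectangle in the scatterplot view.
selected = brush(records, "pop_change", "under_18", (15.0, 25.0), (24.0, 30.0))

# The linked map view highlights the features that carry the same keys.
map_highlight = {key: key in selected for key in records}
print(map_highlight)
```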
Other Dynamic Graphic Manipulation Methods
• Rotation
• Deletion
• Transformation
11.1.4 Data Exploration and GIS
• Similar to exploratory data analysis in statistics, with two differences
– In GIS it involves both spatial and attribute data
– The media for data exploration in GIS include maps and map features
11.2 Attribute Data Query
• Searches attribute data to retrieve a data subset
• The selected subset can be examined in a table, displayed in charts, or linked to map features
• Query expressions must be interpretable by the GIS
11.2.1 SQL (Structured Query Language)
• Data query language designed for relational
databases
• Designed by IBM in the 1970s and used by many
commercial database management systems
SQL Structure (Syntax)
• select <attribute list>
• from <relation>
• where <condition>
• The select keyword selects fields
• The from keyword selects tables
• The where keyword specifies the condition or criterion for the data query
Figure 11.11
PIN relates the owner and parcel tables and allows use of SQL with both tables.
SQL Examples
• Queries sale date of parcel coded P101
• select Parcel.Sale_date
• from Parcel
• where Parcel.PIN = 'P101'
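To try this query outside a GIS, one option is Python's built-in `sqlite3` module with an in-memory database. The table layout follows the field names used in the query. The rows, including the sale dates, are hypothetical and are not taken from Figure 11.11.

```python
import sqlite3

# In-memory database holding a hypothetical Parcel table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Parcel "
    "(PIN TEXT PRIMARY KEY, Sale_date TEXT, Acres REAL, Zone_code INTEGER)"
)
conn.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?, ?)",
    [
        ("P101", "1998-01-10", 1.0, 1),
        ("P102", "1968-10-06", 3.0, 2),
        ("P103", "1995-03-07", 2.5, 2),
    ],
)

# select <attribute list> from <relation> where <condition>
rows = conn.execute(
    "SELECT Parcel.Sale_date FROM Parcel WHERE Parcel.PIN = ?", ("P101",)
).fetchall()
print(rows)
```

The `?` placeholder passes the parcel code as a parameter instead of pasting it into the SQL text.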
SQL Examples
• Queries parcels larger than 2 acres that are zoned commercial
• select Parcel.PIN
• from Parcel
• where Parcel.Acres > 2 AND Parcel.Zone_code = 2
SQL Examples
• Queries sale date of parcel owned by Costello
• select Parcel.Sale_date
• from Parcel, Owner
• where Parcel.PIN = Owner.PIN AND Owner_name = 'Costello'
• This query involves two tables, which must be joined first
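The two-table query can be sketched the same way with `sqlite3`. The condition `Parcel.PIN = Owner.PIN` performs the join on the shared key. All rows are hypothetical, and the second owner name is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Parcel (PIN TEXT, Sale_date TEXT);
    CREATE TABLE Owner (PIN TEXT, Owner_name TEXT);
    INSERT INTO Parcel VALUES ('P101', '1998-01-10'), ('P102', '1968-10-06');
    INSERT INTO Owner VALUES ('P101', 'Wang'), ('P102', 'Costello');
    """
)

# The join condition links each parcel to its owner through PIN.
query = """
SELECT Parcel.Sale_date
FROM Parcel, Owner
WHERE Parcel.PIN = Owner.PIN AND Owner_name = ?
"""
sale_dates = [row[0] for row in conn.execute(query, ("Costello",))]
print(sale_dates)
```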
11.2.2 Query Expressions
• The where expression consists of Boolean expressions and Boolean connectors
Boolean Expressions
• Contains two operands and a logical operator
• Example: Parcel.PIN = 'P101'
• Operators include =, <, >, >=, <=, <>
Boolean Connectors
• AND, OR, XOR, NOT
• Used to connect two or more expressions
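The four connectors can be illustrated on a small list of records. The parcel rows below are hypothetical. Zone code 2 is treated as commercial, as in the SQL example. The helper names `select`, `is_large`, and `is_commercial` are invented for this sketch.

```python
# Hypothetical parcel records.
parcels = [
    {"PIN": "P101", "Acres": 1.0, "Zone_code": 1},
    {"PIN": "P102", "Acres": 3.0, "Zone_code": 2},
    {"PIN": "P103", "Acres": 2.5, "Zone_code": 1},
    {"PIN": "P104", "Acres": 1.0, "Zone_code": 2},
]


def is_large(rec):
    return rec["Acres"] > 2


def is_commercial(rec):
    return rec["Zone_code"] == 2


def select(records, condition):
    """Return the PINs of records that satisfy the condition."""
    return [rec["PIN"] for rec in records if condition(rec)]


both = select(parcels, lambda r: is_large(r) and is_commercial(r))  # AND
either = select(parcels, lambda r: is_large(r) or is_commercial(r))  # OR
one_only = select(parcels, lambda r: is_large(r) != is_commercial(r))  # XOR
not_large = select(parcels, lambda r: not is_large(r))  # NOT

print(both, either, one_only, not_large)
```

XOR keeps records that satisfy exactly one of the two expressions, which is why P102 (large and commercial) is excluded from `one_only`.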
Figure 11.12
The shaded portion represents the complement of data subset A (top), the union of data subsets A and B (middle), and the intersection of A and B (bottom).
11.2.3 Type of Operation
• Selecting a subset divides the data into two groups
– The selected records
– The unselected records
• Three types of operations can then be applied to the selected subset
– Add more records
– Remove records
– Select a smaller subset
Figure 11.13
Three types of operation may be performed on the subset of 40 records:
add more records to the subset (+2), remove records from the subset (-5),
or select a smaller subset (20).
11.2.4 Examples of Query Operations
• Select a data subset and add more records to it
• Select a data subset and switch selection
• Select a data subset and select a smaller subset
from it
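These operations map naturally onto set operations on record IDs. The sketch below uses ten hypothetical record IDs. The variable names are illustrative and do not correspond to any GIS package's commands.

```python
# Ten hypothetical record IDs and an initial selection.
all_ids = set(range(1, 11))
selected = {1, 2, 3, 4}

# Add more records to the subset (set union).
added = selected | {7, 8}

# Switch the selection (complement within the whole data set).
switched = all_ids - selected

# Select a smaller subset from the current subset (here: even IDs only).
narrowed = {i for i in selected if i % 2 == 0}

print(sorted(added), sorted(switched), sorted(narrowed))
```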
11.2.5 Relational Database Query
• A relational database often consists of many tables
• A query on one table also selects the related records in the other tables
• Requires an understanding of the structure of the database
• The tables can be either joined or related
Figure 11.14
The keys relating three dBASE files in the MUIR database and
the soil attribute table.
11.3 Spatial Data Query
• Retrieves a data subset from a layer by working directly with features
• Features are selected using a cursor, a graphic, or the spatial relationship between features
• Results can be displayed on a map, linked to records in a table, displayed in charts, or saved as a new data set for further processing
11.3.1 Feature Selection by Cursor
• Select features by pointing at them or by dragging a box around them
11.3.2 Feature Selection by Graphic
• Uses a graphic, such as a circle, box, line, or polygon, to select features that fall inside or are intersected by the graphic
• Examples: selecting restaurants within a one-mile radius of a hotel, selecting land parcels that intersect a proposed highway, or finding owners of land parcels within a proposed nature reserve
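Selection by a circle reduces to a distance test against the circle's center. The sketch below assumes planar coordinates measured in miles. The restaurant IDs, coordinates, and the `select_by_circle` helper are hypothetical.

```python
import math


def select_by_circle(features, center, radius):
    """Return names of point features inside or on the circle."""
    cx, cy = center
    return [
        name
        for name, (x, y) in features.items()
        if math.hypot(x - cx, y - cy) <= radius
    ]


# Hypothetical restaurant locations (planar coordinates in miles).
restaurants = {
    "R1": (0.2, 0.3),
    "R2": (1.5, 0.0),
    "R3": (-0.6, 0.7),
    "R4": (0.0, 1.0),
}
hotel = (0.0, 0.0)

nearby = select_by_circle(restaurants, hotel, 1.0)
print(nearby)
```

With latitude and longitude instead of planar coordinates, a geodesic distance would be needed in place of `math.hypot`.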
Figure 11.15
Select features by a circle centered at Sun Valley.
11.3.3 Feature Selection by Spatial Relationship
• Select features based on their spatial
relationship to other features
• In same layer or in different layers
• Containment, intersect, proximity
Containment
• Select features that fall completely within the features used for selection
• Examples: schools within a particular county, state parks within a particular state
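For point features, containment can be tested with the ray-casting (even-odd) rule, sketched below. This is one common technique, not necessarily what a given GIS package uses. The county outline and school locations are hypothetical, and points lying exactly on the boundary are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Even-odd test: count edge crossings of a ray cast to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical county outline and school locations.
county = [(0, 0), (4, 0), (4, 3), (0, 3)]
schools = {"S1": (1, 1), "S2": (5, 1), "S3": (3.5, 2.5)}

contained = [name for name, pt in schools.items() if point_in_polygon(pt, county)]
print(contained)
```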
Intersect
• Select features that intersect other features
• Examples: land parcels that intersect a proposed road, urban areas that intersect a fault line
Proximity
• Select features within a specified distance of other features
• Example: state parks within ten miles of an interstate highway
• Adjacency: the features to be selected and the selection features share a common boundary
11.3.4 Combining Attribute and Spatial Data Queries
• Used when data exploration requires both attribute and spatial queries
• Example: gas stations that are within one mile of a freeway exit and have an annual revenue exceeding $2 million
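A combined query can be sketched as a spatial filter followed by an attribute filter. The station names, coordinates (planar, in miles), revenues, and exit locations below are hypothetical, and `near_any` is an illustrative helper.

```python
import math

# Hypothetical gas stations and freeway exits (planar coordinates in miles).
stations = [
    {"name": "G1", "xy": (0.5, 0.0), "revenue": 2_500_000},
    {"name": "G2", "xy": (0.3, 0.4), "revenue": 1_200_000},
    {"name": "G3", "xy": (5.0, 5.0), "revenue": 3_000_000},
    {"name": "G4", "xy": (9.6, 0.0), "revenue": 2_100_000},
]
exits = [(0.0, 0.0), (10.0, 0.0)]


def near_any(xy, targets, max_dist):
    """True if xy lies within max_dist of at least one target point."""
    return any(math.dist(xy, t) <= max_dist for t in targets)


# Spatial query: stations within one mile of a freeway exit.
near_exit = [s for s in stations if near_any(s["xy"], exits, 1.0)]

# Attribute query on the spatially selected subset.
result = [s["name"] for s in near_exit if s["revenue"] > 2_000_000]
print(result)
```

The two filters can be applied in either order and give the same result.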
11.4 Raster Data Query
• The concept and some methods are the same as for vector data query
• There are practical differences
11.4.1 Query by Cell Value
• The operand is the raster itself, rather than a field as in vector data query
• A Boolean statement separates cells that satisfy the query statement from those that do not
Figure 11.16
Raster data query: slope = 2 and aspect = 1. Selected cells are coded 1 and others 0 in the output raster.
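The query in Figure 11.16 can be sketched cell by cell with plain Python lists standing in for rasters. The cell values below are hypothetical and are not those shown in the figure.

```python
# Hypothetical 3 x 3 slope and aspect rasters with coded cell values.
slope = [
    [1, 2, 2],
    [2, 2, 3],
    [1, 3, 3],
]
aspect = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 1, 1],
]


def raster_query(slope, aspect):
    """Code a cell 1 where slope = 2 and aspect = 1, otherwise 0."""
    return [
        [1 if s == 2 and a == 1 else 0 for s, a in zip(slope_row, aspect_row)]
        for slope_row, aspect_row in zip(slope, aspect)
    ]


output = raster_query(slope, aspect)
for row in output:
    print(row)
```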
11.4.2 Query by Select Features
• Query by using features such as points, circles, boxes, or polygons
11.5 Geographic Visualization
• Cartographic visualization
• Using maps to process visual information
• Data classification, spatial aggregation, map
comparison
11.5.1 Data Classification
• Groups data based on statistics
Figure 11.17
Two classification schemes: above or below the national average (a), and mean and standard deviation (SD) (b).
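Both schemes can be sketched in a few lines. The rates are hypothetical, and the mean of the sample stands in for the national average. The one-SD class breaks are an assumption, since the figure's exact breaks are not given here.

```python
import statistics

# Hypothetical rates for six areas.
rates = [2.0, 4.0, 4.0, 5.0, 5.0, 10.0]

mean = statistics.mean(rates)
sd = statistics.pstdev(rates)  # population standard deviation


def above_or_below(value):
    """Scheme (a): two classes split at the average."""
    return "above" if value > mean else "at or below"


def mean_sd_class(value):
    """Scheme (b): four classes with breaks at -1 SD, the mean, and +1 SD."""
    z = (value - mean) / sd
    if z < -1:
        return "more than 1 SD below"
    if z < 0:
        return "0 to 1 SD below"
    if z <= 1:
        return "0 to 1 SD above"
    return "more than 1 SD above"


scheme_a = [above_or_below(v) for v in rates]
scheme_b = [mean_sd_class(v) for v in rates]
print(scheme_a)
print(scheme_b)
```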
11.5.2 Spatial Aggregation
• Groups data spatially
Figure 11.18
Two levels of spatial aggregation: by state (a), and by region (b).
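Moving from the state level to the region level amounts to grouping state values by region and summarizing each group. The sketch below sums hypothetical counts. The state codes, region names, and the `aggregate` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical state-level counts and a state-to-region lookup.
state_population = {"S1": 100, "S2": 250, "S3": 400, "S4": 50}
state_region = {"S1": "West", "S2": "West", "S3": "East", "S4": "East"}


def aggregate(values, grouping):
    """Sum values over the group that each key belongs to."""
    totals = defaultdict(int)
    for key, value in values.items():
        totals[grouping[key]] += value
    return dict(totals)


region_population = aggregate(state_population, state_region)
print(region_population)
```

Rates and percentages should be recomputed from aggregated counts, not summed or averaged across states.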
11.5.3 Map Comparison
• Compare data from different layers to examine
relationships
Figure 11.19
An example of map comparison. Deer relocations tend to be concentrated along the clear-cut/old forest edge.
Other Options
• Place all layers on a screen and view them one at a time
• Use a set of adjacent views
• Use map symbols to show multiple data sets
Figure 11.20
A bivariate map: (1) rate of unemployment in 1997, either above or below the national average, and (2) rate of income change, 1996–1998, either above or below the national average.
Thank you!