2001 Systems Engineering Capstone Conference • University of Virginia
G.R.A.S.P. (GEOSPATIAL REPOSITORY FOR ANALYSIS AND SAFETY PLANNING):
DESIGN AND IMPLEMENTATION OF A PROTOTYPICAL SPATIAL DATA REPOSITORY
FOR A MAJOR METROPOLITAN AREA
Student Team: Kristin Beuerle, Amy Garten, Wes McCoubrie, Ryan Smith
Faculty Advisor: Dr. Donald Brown, Department of Systems Engineering
Staff Advisor: Jason Dalton
Client Advisors: Elizabeth Groff, Eric Jefferis, and Debra Stoe, National Institute of Justice,
810 7th Street, Washington, D.C., [email protected]
KEYWORDS: Geographical Information System
(GIS), Data Warehouse, Geospatial Data, Safety
Planning, Data Analysis
ABSTRACT
Public safety in the United States, particularly in
urban centers throughout the country, has always been
of concern to public officials, law enforcement, and
citizens. To aid in formulating objective data-supported
public safety strategies, city decision makers often
employ researchers to conduct analyses. Researchers
conducting these analyses frequently seek out
information from multiple city data sources to ensure
the highest possible level of accuracy. Unfortunately,
under the previous model, researchers had to establish
communication with, and gain access permissions
from, each separate data source. Furthermore, each data
source potentially could have supplied data in a
different format. This procedure, both time consuming
and inefficient, substantially hindered the speed at
which information was delivered to appropriate
decision makers.
The National Institute of Justice (NIJ), the major research
component of the United States Department of Justice,
enlisted the NIJ Capstone team to aid in streamlining
this safety research process. Specifically, the Capstone
team analyzed the feasibility of implementing a data
repository that integrates data from dispersed sources.
After establishing this feasibility, the Capstone team
then implemented GRASP, a Geospatial Repository for
Analysis and Safety Planning, in Charlotte, North
Carolina as a proof of concept (Beuerle, 2001).
INTRODUCTION
Before implementing policies and programs
regarding public safety within United States urban
areas, city decision makers must be confident that they
are combating the correct problem in the most effective
manner. Thus, to gain a clearer understanding of the
situation, decision makers often hire researchers to
conduct thorough analyses on current safety aspects of
the city. Since correlations among different safety
elements frequently reveal previously undetected
information, researchers more often than not include
information from a variety of sources in their analyses.
Therefore, the decision makers base their policies on
information founded in objective data analyses rather
than general observations or impressions.
Unfortunately, prior to GRASP, Charlotte’s data
were dispersed among many separate departments,
making it time consuming for researchers to collect
necessary data (Personal communication, 2000). While
the Police and Planning Commission Departments did
have access to a centralized data source, data were
shared only between them and were not easily
accessible to researchers. In addition, these
departments did not directly have access to or share
data with the Fire, Tax, Business Records, or
Engineering and Building Standards Departments. To
gather all the data needed, researchers had to physically
contact and meet with each department. In developing
GRASP, the Capstone team did this preliminary data
collection for the researchers and made the data
available via an Internet interface.
Fully exploring the interplay of various factors when
deciding the most appropriate safety measures also
links to many larger issues. For instance, as illustrated
by the Internet, information sharing not only broadens
learning opportunities but also sparks curiosity since
information is so readily available. Similarly, the
Capstone team anticipates that GRASP will promote
and help to establish the importance of data sharing.
With data collaboration as a guiding principle,
researchers will hopefully find it easier and more
enticing to conduct multi-source analyses.
PROBLEM ANALYSIS
The main goal of this project was to create and
implement an integrated data access model for spatially
referenced safety data to be used for research and
policy planning in an urban jurisdiction. In working
toward this goal, the team researched relevant material,
thoroughly defined the project, developed the data
model and feasibility analysis, and began system
implementation (USDOJ, 2000).
Project Definition
To formally define the project, the team
developed a Vision Statement, Descriptive and
Normative Scenarios, Conceptual Requirements, and a
Goal Tree. First, the Vision Statement defines the
project’s focus and scope. Once the National Institute
of Justice approved the Vision Statement, the team then
developed the Descriptive and Normative Scenarios
which detail the current and ideal data acquisition
processes, respectively, and present an example for
further clarification in which a researcher attempts to
analyze juvenile crime data in relation to curfew
policies. Next, the team translated the client’s wishes
into structured attributes of the ideal system, also
known as Conceptual Requirements, on which to base
the data model design. Finally, the Goal Tree extends
the Conceptual Requirements into itemized goals,
objectives, and indices of performance to measure
success. The team also laid out, in the form of a
Gantt chart, the schedule of tasks it had to meet to
fulfill the project’s vision.
Pilot City Selection
The selection of Charlotte, North Carolina as the
pilot GRASP city involved a structured decision
process. Initially, the Capstone team formulated a list
of potential sites – Charlotte, North Carolina and
Richmond, Fairfax, and Newport News, Virginia –
based on recommendations from the National Institute
of Justice. Officials from the Police, Planning
Commission, Tax, and Transportation Departments
were contacted via the phone and emailed background
project information as well as a request for information
concerning the current format and storage method of
data, the amount and type of available data, and the
city’s willingness to participate in this study. Once this
information was collected, the Capstone team evaluated
each city on the following criteria: distance from
Charlottesville, availability of data, format of data, data
release method, data quality, time spanned, amount of
data, willingness of the area to cooperate, and interest
level of data. Charlotte received the highest score, and
after final approval from the National Institute of
Justice, it officially became the pilot site for GRASP.
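A structured evaluation of this kind can be sketched as a weighted sum over the criteria listed above. The paper does not report the weights, rating scale, or individual scores the team used, so the equal weights, the 1-5 scale, and the "Site A" / "Site B" ratings below are placeholder assumptions, not the team's data.

```python
# Criteria named in the paper; everything else here is an illustrative assumption.
CRITERIA = [
    "distance_from_charlottesville",
    "data_availability",
    "data_format",
    "data_release_method",
    "data_quality",
    "time_spanned",
    "amount_of_data",
    "willingness_to_cooperate",
    "interest_level_of_data",
]


def weighted_score(ratings, weights):
    """Return the weighted sum of one site's ratings across all criteria."""
    return sum(weights[c] * ratings[c] for c in CRITERIA)


def rank_sites(site_ratings, weights):
    """Return (site, score) pairs sorted from highest to lowest score."""
    scored = [(site, weighted_score(r, weights)) for site, r in site_ratings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical equal weights and placeholder ratings on a 1-5 scale.
weights = {c: 1.0 for c in CRITERIA}
site_ratings = {
    "Site A": {c: 4 for c in CRITERIA},
    "Site B": {c: 3 for c in CRITERIA},
}
ranking = rank_sites(site_ratings, weights)
```

Keeping the weights separate from the ratings makes it easy to check how sensitive the final ranking is to the importance assigned to any one criterion.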
DATA COLLECTION
Although initial contacts showed the Charlotte
administrators to be extremely excited about
participating in the GRASP project, it was still
necessary to visit each department and present the
project to them in person. This allowed the Capstone
team to 1) explain the concept of the system in more
detail and 2) better emphasize the importance of their
data to our success. Two separate trips were made to
the city, November 22nd and February 21st-23rd, each
packed with informational meetings with the different
department heads and other key employees.
As the team had discovered during phone
conversations, the entire city was extremely helpful and
technology oriented. We were originally anticipating
having to dig through file cabinets for the data, but
were pleasantly surprised by their current database
structure. Charlotte is indeed a forward-thinking city
and we anticipate that GRASP will meet more
technology-oriented obstacles as it is expanded to other
cities.
There were, however, a number of recurring
concerns that arose as we presented the concept. The
Police Department, in particular, was concerned with
the implications of security breaches and was not
willing to release any kind of sensitive data. A number
of other departments noted that some form of feedback
loop was necessary to provide them with a preliminary
notice as to what kinds of studies are going to be
conducted. This would give the Fire Department, for
example, a chance to prepare an official statement in
response to a new report detailing a rise in fire
fatalities.
With an assurance that these concerns and
suggestions would make their way into the final system,
each department agreed to contribute data. CDs were
burnt by the City Planning Department, Tax, and
Engineering, while the Police Department, Fire, and
Business Services would send the data via an FTP site
that the Capstone team created.
Upon receipt, the data was immediately put
through a data cleansing process that separated relevant
fields from those that would be of no interest to safety
researchers. The following data organization hierarchy
used by the Engineering and Building Standards
Department was adopted to add structure to the 2
gigabytes of data that was collected (Police Data not
included in original hierarchy):
• Police Data
• Demographic Data
• Employment Data
• Boundary and District Data
  o North Carolina
  o Regional
  o USA
  o Mecklenburg
    ▪ Government and Political
    ▪ School
• Land Development Data
  o Building Permits
  o Facilities
  o Land Cover
  o New Development
  o Schools
  o Infrastructure
    ▪ Airports
    ▪ Parks
    ▪ Rail
    ▪ Roads
    ▪ Sewer
    ▪ Utilities
    ▪ Water
  o Physical Environment
    ▪ Environmental
    ▪ Hydrology
    ▪ Vegetation
    ▪ Soils
All of this data varies in both amount and type. The
Police department uses ESRI products, MS Access,
SPSS, SAS, and Oracle 8i to store spatial and tabular
data. The Fire department still uses a mainframe
system, and in order to analyze or create shape files
they have to copy data from a shadow file into Excel
and then geocode it themselves. The Business Services
department does not house any spatial data but it does
collect important data, relevant to the system, and stores
it using both SQL Server 6.5 and SQL Server 7.0. The
Tax department also uses a SQL Server/ESRI
combination to hold their spatial and
tabular data.
The power of the GRASP system lies in the fact that
this time consuming process is taken off the shoulders
of the researcher and absorbed by the system. The data
collection process is only performed during the initial
systems set-up in a GRASP city and is kept up to date
through electronic updates handled by the
administrator. The Capstone team performed all the
hard work of gathering the data, so that the researcher
can focus more on the actual study.
IMPLEMENTATION
With the Charlotte, NC data collected, cleaned, and
safely stored within the Systems Engineering
department, the project could now make the transition
from conceptual design to actual implementation. The
first design decision that had to be made was what
software package to use as the back-end of GRASP to
house the spatial and tabular data we collected.
Software Selection
The benefits that spatial data can provide to tabular
data led to a rather difficult decision between two
options of software: a Database Management System
(DBMS) that holds spatial data or a DBMS that would
hold only relational tables with the map files stored
separately on the server. The functionality between the
two is significantly different: a spatial DBMS allows
for higher querying capability by querying spatial files
directly, while the second option does not query the
spatial layers. This added capability, however, comes
with a significantly higher cost. Other aspects we
looked at while researching software alternatives
included infrastructure, data marts and Online
Analytical Processing (OLAP), and supporting
technologies (Gonzales, 2000).
The NIJ decided to use a non-spatial option for three
reasons. One, the purpose of the database is to make a lot
of data available to researchers and other analysts who
have their own software to use for the analysis. Two,
the initial costs to build the GRASP system were lower.
Three, future costs for adding new data layers would be
less expensive. The final decision was to use SQL
Server 2000 because of its increased (with respect to its
previous versions) scalability, responsiveness, and
security. Two companies using SQL Server have in
fact recently won grand prizes in the Database
Scalability Program (Microsoft, 2000). The 2000
edition made SQL Server just as capable of handling
the responsibilities of the GRASP system as Oracle 8i,
and its costs were significantly lower (Garten, 2001).
The GRASP prototype was implemented using SQL
Server 2000 running on a Dell PowerEdge 1400
server. HTML was used to do the web page front-end
design, while the use of ColdFusion enabled the web
page to interact dynamically with the data warehouse.
Figure 1 illustrates a high-level design of the overall
system (Smith, 2001).
Figure 1: Conceptual design of GRASP
Technical Notes
Database Development
GRASP’s spatial data is stored in a separate file on
the server, and the name of the file and its location are
stored inside a table in SQL Server. While the table is
not related to any other tables, it does store searchable
metadata and researcher access levels for each layer.
The data stored inside the relational model consists
of data received from Business Services, Police data,
and GRASP functionality tables. Putting the data into
one format was a relatively simple process. All of the
GRASP data is imported directly into SQL Server using
Data Transformation Services (DTS), a service of
SQL Server 2000 that helps to extract, transform,
and load data from heterogeneous sources.
There was also a surprising amount of overlap
amongst spatial data received from different sources.
The best source was determined based on data
completeness and level of detail.
Also, some data was changed from its original
format before being entered into our system. Business
Services sent their data to us in one big table. But in
order to reduce redundancy in our system, speed up
querying, and reduce the amount of data we will need
updated in the future, we split this table into three: one
containing a list of license types and their codes,
another containing each business, and a middle table
that relates the two together through the business ID
and each license code associated with it. In the future,
except for new license codes or small changes, most
updates will only change two of these tables.
In order to provide the functionality to the front-end
system called for by the National Institute of Justice,
administrative tables were created in the relational
model. These consist of tables containing user
information, information on the organizations that a
researcher is working for, access levels, each project
title and its description, and queries performed under
each project. There is also a table that relates a user to
their projects and the organization for which the project
is being done. Along with the administrative tables are
a table that holds information about the spatial layers
available to the users and a table that holds all the fields
that come in the spatial layers and relates the field to its
respective layer.
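The table of spatial layers described above, which records each layer's file name, location, searchable metadata, and access level, might be sketched as follows. This is only an illustration: Python's built-in sqlite3 stands in for SQL Server 2000, every table and column name is hypothetical, access levels are assumed to be integers where a higher number means broader access, and the sample rows are invented.

```python
import sqlite3


def create_catalog(conn):
    """Create hypothetical catalog tables for layers stored as files on the server."""
    conn.executescript("""
        CREATE TABLE spatial_layers (
            layer_id         INTEGER PRIMARY KEY,
            name             TEXT NOT NULL,
            file_path        TEXT NOT NULL,    -- where the layer file lives on the server
            description      TEXT,             -- searchable metadata
            min_access_level INTEGER NOT NULL  -- lowest access level allowed to download
        );
        CREATE TABLE layer_fields (
            layer_id   INTEGER NOT NULL REFERENCES spatial_layers(layer_id),
            field_name TEXT NOT NULL
        );
    """)


def find_layers(conn, access_level, keyword):
    """Return (name, file_path) for layers the user may see whose metadata matches."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        """SELECT name, file_path FROM spatial_layers
           WHERE min_access_level <= ?
             AND (name LIKE ? OR description LIKE ?)
           ORDER BY name""",
        (access_level, pattern, pattern),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_catalog(conn)
# Invented sample rows for illustration only.
conn.executemany(
    "INSERT INTO spatial_layers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Roads", "layers/roads.shp", "Infrastructure: road centerlines", 1),
        (2, "Parks", "layers/parks.shp", "Infrastructure: park boundaries", 1),
        (3, "Restricted roads", "layers/restricted.shp", "Sample restricted layer", 3),
    ],
)
visible = find_layers(conn, access_level=1, keyword="road")
```

Because only the file path is stored in the database, the layer files themselves never pass through the relational engine; the catalog simply decides which files a given researcher is allowed to request.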
This database design was entered into Microsoft’s
SQL Server 2000, the database management system
chosen for the back-end of GRASP. Although this
software did not provide the spatial capabilities that
would have allowed us to make our repository totally
relational, it did have a number of advantages over
other systems. First, it came with a price tag
significantly less than its major competitors, a major
positive when developing any sort of prototype.
Second, the Capstone team had previous experience
using SQL Server, which allowed us to move right from
design to implementation without spending time
learning the intricacies of a new system.
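The three-way split of the Business Services table described earlier (license types, businesses, and a middle table relating the two) can be illustrated with a small sketch. Python's sqlite3 is used here in place of SQL Server 2000, and the table names, column names, and sample rows are all assumptions, since the paper does not list the actual Business Services fields.

```python
import sqlite3

SCHEMA = """
CREATE TABLE license_types (
    license_code TEXT PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE businesses (
    business_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE business_licenses (
    business_id  INTEGER NOT NULL REFERENCES businesses(business_id),
    license_code TEXT NOT NULL REFERENCES license_types(license_code),
    PRIMARY KEY (business_id, license_code)
);
"""


def load_normalized(conn, flat_rows):
    """Split flat (business_id, name, license_code, description) rows into three tables."""
    conn.executescript(SCHEMA)
    for business_id, name, code, description in flat_rows:
        # INSERT OR IGNORE drops the repeated values that the single flat table carried.
        conn.execute("INSERT OR IGNORE INTO license_types VALUES (?, ?)", (code, description))
        conn.execute("INSERT OR IGNORE INTO businesses VALUES (?, ?)", (business_id, name))
        conn.execute("INSERT OR IGNORE INTO business_licenses VALUES (?, ?)", (business_id, code))
    conn.commit()


# Hypothetical sample rows standing in for the one big table that was received.
flat_rows = [
    (1, "Example Cafe", "FOOD", "Food service"),
    (1, "Example Cafe", "ALC", "Alcohol sales"),
    (2, "Example Market", "FOOD", "Food service"),
]
conn = sqlite3.connect(":memory:")
load_normalized(conn, flat_rows)
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("license_types", "businesses", "business_licenses")
}
```

In this arrangement a new business holding existing license types touches only the businesses table and the middle table, which matches the observation that most updates change two of the three tables.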
Web Interface Development
Although it may not do as much work as the
database, the graphical user interface is by far the most
crucial component in the entire GRASP system. It is
simultaneously the face and brains of GRASP,
providing a psychologically satisfying view into the
complications of the database. It provides both the first
impression as well as a lasting mental image of what
the system is all about. A functional, friendly, and
engaging interface can draw researchers back time after
time while the opposite could render the efforts of the
entire project useless.
Using previously created storyboards of each page,
the visual elements of the site were created and laid into
place using HTML. The actual functionality of the
interface, which includes being able to generate queries
that search the database and embed logic into the web
pages, was done using the ColdFusion Application
Server package. This allowed the web site to connect
to the SQL Server 2000 database server containing all
the spatial, tabular, and image data. It also allowed for
the intended functionalities of the web design to be
realized, which include logging in users, providing
security, creating dynamic pages from the database, and
most importantly providing the requested data to the
user. ColdFusion Markup Language (CFML) is a tag-based
language that we used to create these
functionalities in the otherwise HTML-only web pages.
CFML tags were the tools we employed to select the
rows of data from tables and the entire selected shape
files, manipulate the user-requested data files
stored on the server, and zip them for download.
Site Layout and Functionality
Figure 2 illustrates the GRASP splash screen that
appears when a user first reaches the system using a
web browser. On the left toolbar, there are text boxes
that allow previous users to log into the system using the
password supplied by the NIJ. The log-in and password
keep the data secure from hackers while limiting the
range of database access granted to a single researcher.
New users will need to register themselves on the next
page.
Figure 2: GRASP Welcome screen found on
the world-wide-web.
New users will then reach a registration page, as
shown in Figure 3. This page collects information both
about the researcher and the study that is being
performed. This specific data is then incorporated into
the feedback loop mechanism, an automatically
generated email from the server that sends a
preliminary notice to the Charlotte departments
concerning the kind of study that is going to be
performed using their data. The application is sent
automatically via email to the NIJ for review, and a
log-in and password are either granted or denied. The
email that the system sends to the NIJ or GRASP
administrator has all the information the user entered
about themselves and their project and a link to an
approval web page. The web page has the same
information presented in the email, but also has
approve and disapprove buttons, and the GRASP administrator
can grant or deny access to the user by clicking the
appropriate button. This makes the approval process as
fast and easy as possible for the administrator.
Figure 3: New User Registration screen
With the study approved, a user can then proceed to
select the kind and amount of data to download. Figure
4 illustrates a typical tabular data querying page. Here
the user is able to customize data queries by selecting
the table, date ranges, and even specific fields of that
table. These specific queries can then be saved into a
project file for later use. This feature will be of
particular use to those researchers who perform a study
of the same variables at regular intervals. They can
simply log back into the system, pull up their old query,
and replicate their study using the most up-to-date
information.
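The saved-query feature can be sketched as storing the user's selection (table, fields, and date range) rather than the results, so that re-running it always reads the latest data. The sketch below uses Python's sqlite3 in place of the ColdFusion and SQL Server 2000 stack; the `incidents` table, its columns, the project name, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical incident table and columns; the paper does not list GRASP's actual schema.
ALLOWED_FIELDS = {"incidents": {"incident_id", "incident_date", "offense", "district"}}


def build_query(table, fields, start_date, end_date):
    """Return (sql, params) selecting the chosen fields within a date range."""
    if table not in ALLOWED_FIELDS:
        raise ValueError(f"unknown table: {table}")
    unknown = set(fields) - ALLOWED_FIELDS[table]
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Identifiers come from the whitelist above; the dates are bound as parameters.
    sql = (
        f"SELECT {', '.join(fields)} FROM {table} "
        "WHERE incident_date BETWEEN ? AND ? ORDER BY incident_date"
    )
    return sql, (start_date, end_date)


saved_queries = {}  # project name -> saved selection, so a study can be re-run later


def save_query(project, table, fields, start_date, end_date):
    """Remember a selection under a project name."""
    saved_queries[project] = (table, list(fields), start_date, end_date)


def run_saved(conn, project):
    """Re-run a saved selection against the current contents of the database."""
    sql, params = build_query(*saved_queries[project])
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incidents (incident_id INTEGER, incident_date TEXT, offense TEXT, district TEXT)"
)
# Invented rows for illustration only; dates are ISO strings so they sort correctly.
conn.executemany("INSERT INTO incidents VALUES (?, ?, ?, ?)", [
    (1, "1999-06-01", "burglary", "D1"),
    (2, "2000-08-15", "vandalism", "D2"),
    (3, "2000-12-31", "burglary", "D1"),
])
save_query("curfew-study", "incidents", ["incident_date", "offense"], "2000-01-01", "2000-12-31")
rows = run_saved(conn, "curfew-study")
```

Checking the requested table and field names against a fixed list before building the SQL text is one way to let users pick columns freely without letting arbitrary input reach the database.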
Figure 4: Typical querying screen
Data Dispersion Method
After the user selects the data they want, whether it is
tabular incident data or shape layer files, the final
component of the GRASP system involves getting the
user-selected data from the SQL database to the actual
researchers. We handled this differently for the two
types of data.
The interface provides two different ways to
download the data. For layers that have shape files, the
user can search through the available layers and select the
ones they want, and the system, utilizing a custom
CFML tag, will compress the selected layers into a
single zip file and download the file to the
requesting user. This compression significantly
improves transfer times, allowing huge image and
shape files, for example, to be sent with minimal
difficulties. A typical shape file may be several
megabytes in size and a typical user may want to
download several shape files at a time. The system
compresses all the requested files into one zip file that
is much smaller and will download much faster.
Tabular incident data is handled differently because
our system can allow more flexibility to the user. For
incident tables, the user selects the incident table he or
she wants and then has the ability to select the
individual fields from that table, and the date range they
want for those incidents (see Figure 4). Then the
system exports the data into a comma-delimited text file
which the researcher can import into any application
they want to analyze the data. This allows the user the
ability to download only the incident data they want,
rather than having to select all of the incidents in the
entire table. For example, this would be very helpful if
a researcher is studying a trend of incidents over the
past year. This researcher would only have to
download the select fields they want for the last year,
rather than the entire dataset over its complete time
span, which could be 10 years or more. This minimizes
the download time for the user and maximizes
convenience by giving them only the specific data they
request (McCoubrie, 2001).
Data Update Method
The first step in ensuring that the safety researcher
receives the most comprehensive and up-to-date data
available is to ensure that the system itself contains the
correct data. The entire GRASP concept would be
undermined if the data collection process were overly
bulky or time consuming. As a result, each
contributing department in Charlotte has agreed upon a
regular update schedule in which they will provide new
data on a monthly or quarterly basis. A password
protected FTP site has been established that allows
those departments to send their data directly over the
Internet. This data is then automatically uploaded into
the repository back-end.
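The paper does not describe how the automatic upload works, so the following is only a sketch of one possible approach: a scheduled job scans the directory where the FTP site deposits files, loads each comma-delimited file into a table, and archives the file. The directory names, the one-table-per-file convention, the sample file, and the use of Python's sqlite3 in place of SQL Server 2000 are all assumptions, and file names are assumed to come from trusted departments.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path


def load_incoming(conn, incoming_dir, processed_dir):
    """Load every CSV in incoming_dir into a table named after the file, then archive it."""
    loaded = {}
    processed_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(incoming_dir.glob("*.csv")):
        with path.open(newline="") as handle:
            records = list(csv.reader(handle))
        if not records:
            continue  # empty file: nothing to load
        header, body = records[0], records[1:]
        # Minimal cleansing step: skip rows whose column count does not match the header.
        rows = [row for row in body if len(row) == len(header)]
        table = path.stem
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        conn.commit()
        # Move the file out of the incoming directory so it is not loaded twice.
        shutil.move(str(path), str(processed_dir / path.name))
        loaded[table] = len(rows)
    return loaded


# Demonstration with an invented file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp) / "incoming"
    processed = Path(tmp) / "processed"
    incoming.mkdir()
    (incoming / "fire_incidents.csv").write_text(
        "incident_id,incident_date\n1,2001-01-05\n2,2001-02-11\nbad-row\n"
    )
    conn = sqlite3.connect(":memory:")
    loaded = load_incoming(conn, incoming, processed)
    remaining = list(incoming.glob("*.csv"))
    archived = (processed / "fire_incidents.csv").exists()
```

Archiving each file after it is loaded gives the administrator a record of what was received and keeps a monthly or quarterly job from inserting the same update twice.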
In an attempt to ease the workload on the
participating departments, the prototype GRASP system
includes a recommendation for the hiring of a full-time
database administrator. It is this person’s job to make
sure that the data is properly downloaded, cleansed, and
inserted into the database. The administrator’s job
description also includes ensuring system security and
continuously maintaining the system to make sure it
stays on-line.
TESTING
The Capstone team employed a two-phase testing
process that included both developer and end-user
evaluations. First, the Capstone team, as the original
developers, worked through every segment of the
system in search of bugs and functionality problems.
GRASP was then made available for critique by
researchers within the NIJ. The system was evaluated
for a week by these researchers, who recorded their
comments, suggestions, and errors encountered. These
notes were collected by the Capstone team and
incorporated into the final design.
CONCLUSIONS AND RECOMMENDATIONS
This project reached a high level of success because
of the level of cooperation it achieved with the host
city. The organizations we worked with in Charlotte
were very forward thinking and understood the
importance of our project. Future implementations of
GRASP or similar projects must help the participating
organizations see how the project will benefit everyone
involved. The organizations must be made active
stakeholders in the system and have some incentive to
cooperate. Our project had very aggressive goals for
such a short timeline and the extensive willingness to
cooperate we encountered was a tremendous help in
being able to keep up with the timeline we set.
As we progressed through the design process, from
goal development to testing, it became clear what a
tremendous potential this project has to revolutionize
the research community. Typical data collection can
take anywhere from five to seven months. With the
GRASP system implemented, this time frame is cut to
five to seven days.
REFERENCES
Beuerle, K. 2001. “The Design and Implementation of
GRASP: Geospatial Repository for Analysis and
Safety Planning.” Technical Report. Department of
Systems Engineering, University of Virginia,
Charlottesville, VA.
Garten, A. 2001. “Building an Integrated Data Model for
GRASP (Geospatial Repository for Analysis and Safety
Planning).” Technical Report. Department of Systems
Engineering, University of Virginia, Charlottesville,
VA.
Gonzales, M. L. 2000. “Last One Standing” Online.
http://www.intelligententerprise.com.
McCoubrie, W. 2001. “G.R.A.S.P. (Geospatial
Repository for Analysis and Safety Planning): Design
and Implementation of a Prototypical Spatial Data
Repository for a Major Metropolitan Area.” Technical
Report. Department of Systems Engineering,
University of Virginia, Charlottesville, VA.
Smith, E.R. 2001. “Design and Implementation of
GRASP.” Technical Report. Department of Systems
Engineering, University of Virginia, Charlottesville,
VA.
United States Department of Justice (USDOJ). 2000.
“Statement of Work.” Office of Justice Programs.
Unpublished department report.
Microsoft. 2001. “Microsoft Customers Win Grand
Prizes in Database Scalability Program” Online.
http://www.wintercorp.com/MicrosoftRelease021201.htm
BIOGRAPHIES
Kristin Beuerle, a 4th-year Systems Engineering major,
calls Pasadena, MD her home. Besides hanging with
the Capstone team, Kristin enjoys singing with Jubilate
and playing IM softball, even despite the fact that she
broke her hand first semester during softball playoffs.
After a great deal of relaxing and traveling during the
summer, Kristin will begin her job in September at
Deloitte Consulting in Washington, D.C.
Amy Garten is a 4th-year Systems Engineering student
from the proud state of West Virginia. When she is not
twirling around with the University Dance Club or
having fun with her Sigma Kappa gals, Amy can
usually be found watching cheesy romantic comedies
on her digital cable. In July, Amy will begin at Cap
Gemini Ernst & Young in Northern Virginia.
Ryan Smith, formally known as the ice hockey legend
“Firewall,” hails from Springfield, VA right next to one
of DC’s finest prisons. While many of his SE
classmates make their way up to NOVA post
graduation, Ryan will be venturing out to sunny San
Fran to work for Intraspect, a knowledge management
software development company. He will be missed, but
as wise men have said, ‘They will all come crawling
back someday.’
Wes Yes McCoubrie, straight out of New Jersey, is
ironically also a Systems Engineering major here at
UVA. In addition to being the current president of the
Club Baseball team and a member of Pi Lambda Phi,
Wes has repeatedly impressed the team with his mad
presentation skills. After traveling through Europe,
Wes will be joining Amy at Cap Gemini Ernst & Young.