2001 Systems Engineering Capstone Conference • University of Virginia

G.R.A.S.P. (GEOSPATIAL REPOSITORY FOR ANALYSIS AND SAFETY PLANNING): DESIGN AND IMPLEMENTATION OF A PROTOTYPICAL SPATIAL DATA REPOSITORY FOR A MAJOR METROPOLITAN AREA

Student Team: Kristin Beuerle, Amy Garten, Wes McCoubrie, Ryan Smith
Faculty Advisor: Dr. Donald Brown, Department of Systems Engineering
Staff Advisor: Jason Dalton
Client Advisors: Elizabeth Groff, Eric Jefferis, and Debra Stoe, National Institute of Justice, 810 7th Street, Washington, D.C.

KEYWORDS: Geographical Information System (GIS), Data Warehouse, Geospatial Data, Safety Planning, Data Analysis

ABSTRACT

Public safety in the United States, particularly in urban centers, has always been of concern to public officials, law enforcement, and citizens. To aid in formulating objective, data-supported public safety strategies, city decision makers often employ researchers to conduct analyses. Researchers conducting these analyses frequently seek out information from multiple city data sources to ensure the highest possible level of accuracy. Unfortunately, under the previous model, researchers had to establish communication with, and gain access permissions from, each data source separately. Furthermore, each data source could potentially supply data in a different format. This procedure, both time consuming and inefficient, substantially hindered the speed at which information was delivered to the appropriate decision makers. The National Institute of Justice (NIJ), the major research component of the United States Department of Justice, enlisted the NIJ Capstone team to aid in streamlining this safety research process. Specifically, the Capstone team analyzed the feasibility of implementing a data repository that integrates data from dispersed sources.
After establishing this feasibility, the Capstone team implemented GRASP, a Geospatial Repository for Analysis and Safety Planning, in Charlotte, North Carolina as a proof of concept (Beuerle, 2001).

INTRODUCTION

Before implementing policies and programs regarding public safety within United States urban areas, city decision makers must be confident that they are combating the correct problem in the most effective manner. To gain a clearer understanding of the situation, decision makers often hire researchers to conduct thorough analyses of current safety conditions in the city. Since correlations among different safety elements frequently reveal previously undetected information, researchers more often than not include information from a variety of sources in their analyses. As a result, decision makers can base their policies on objective data analyses rather than on general observations or impressions.

Unfortunately, prior to GRASP, Charlotte's data were dispersed among many separate departments, making it time consuming for researchers to collect the necessary data (Personal communication, 2000). While the Police and Planning Commission Departments did have access to a centralized data source, data were shared only between them and were not easily accessible to researchers. In addition, these departments did not directly have access to or share data with the Fire, Tax, Business Records, or Engineering and Building Standards Departments. To gather all the data needed, researchers had to contact and meet with each department in person. In developing GRASP, the Capstone team did this preliminary data collection for the researchers and made the data available via an Internet interface.

Fully exploring the interplay of various factors when deciding on the most appropriate safety measures also links to many larger issues. For instance, as illustrated by the Internet, information sharing not only broadens
learning opportunities but also sparks curiosity since information is so readily available. Similarly, the Capstone team anticipates that GRASP will promote and help to establish the importance of data sharing. With data collaboration as a guiding principle, researchers will hopefully find it easier and more enticing to conduct multi-source analyses.

PROBLEM ANALYSIS

The main goal of this project was to create and implement an integrated data access model for spatially referenced safety data to be used for research and policy planning in an urban jurisdiction. In working toward this goal, the team researched relevant material, thoroughly defined the project, developed the data model and feasibility analysis, and began system implementation (USDOJ, 2000).

Project Definition

To formally capture the project definition, the team developed a Vision Statement, Descriptive and Normative Scenarios, Conceptual Requirements, and a Goal Tree. First, the Vision Statement defines the project's focus and scope. Once the National Institute of Justice approved the Vision Statement, the team developed the Descriptive and Normative Scenarios, which detail the current and ideal data acquisition processes, respectively. For further clarification, the scenarios present an example in which a researcher attempts to analyze juvenile crime data in relation to curfew policies. Next, the team translated the client's wishes into structured attributes of the ideal system, known as Conceptual Requirements, on which to base the data model design. Finally, the Goal Tree extends the Conceptual Requirements into itemized goals, objectives, and indices of performance to measure success. The team also laid out, in the form of a Gantt chart, the schedule of tasks it had to meet to fulfill the project's vision.

Pilot City Selection

The selection of Charlotte, North Carolina as the pilot GRASP city involved a structured decision process.
Initially, the Capstone team formulated a list of potential sites based on recommendations from the National Institute of Justice: Charlotte, North Carolina, and Richmond, Fairfax, and Newport News, Virginia. Officials from the Police, Planning Commission, Tax, and Transportation Departments were contacted by phone and emailed background project information, as well as a request for information concerning the current format and storage method of data, the amount and type of available data, and the city's willingness to participate in this study. Once this information was collected, the Capstone team evaluated each city on the following criteria: distance from Charlottesville, availability of data, format of data, data release method, data quality, time spanned, amount of data, willingness of the area to cooperate, and interest level of data. Charlotte received the highest score and, after final approval from the National Institute of Justice, officially became the pilot site for GRASP.

DATA COLLECTION

Although initial contacts showed the Charlotte administrators to be extremely excited about participating in the GRASP project, it was still necessary to visit each department and present the project in person. This allowed the Capstone team to 1) explain the concept of the system in more detail and 2) better emphasize the importance of their data to our success. The team made two trips to the city, on November 22nd and February 21st-23rd, each packed with informational meetings with department heads and other key employees. As the team had discovered during phone conversations, the city as a whole was extremely helpful and technology oriented. We had anticipated having to dig through file cabinets for the data, but were pleasantly surprised by the city's current database structure.
Charlotte is indeed a forward-thinking city, and we anticipate that GRASP will meet more technology-oriented obstacles as it is expanded to other cities.

There were, however, a number of recurring concerns that arose as we presented the concept. The Police Department, in particular, was concerned with the implications of security breaches and was not willing to release any kind of sensitive data. A number of other departments noted that some form of feedback loop was necessary to give them preliminary notice of what kinds of studies are going to be conducted. This would give the Fire Department, for example, a chance to prepare an official statement in response to a new report detailing a rise in fire fatalities.

With an assurance that these concerns and suggestions would make their way into the final system, each department agreed to contribute data. The City Planning, Tax, and Engineering Departments burned CDs, while the Police, Fire, and Business Services Departments agreed to send their data via an FTP site that the Capstone team created.

Upon receipt, the data was immediately put through a data cleansing process that separated relevant fields from those that would be of no interest to safety researchers. The following data organization hierarchy, used by the Engineering and Building Standards Department, was adopted to add structure to the 2 gigabytes of data that were collected (Police Data was not included in the original hierarchy):

• Police Data
• Demographic Data
• Employment Data
• Boundary and District Data
    o North Carolina
    o Regional
    o USA
    o Mecklenburg
        - Government and Political
        - School
• Land Development Data
    o Building Permits
    o Facilities
    o Land Cover
    o New Development
    o Schools
    o Infrastructure
        - Airports
        - Parks
        - Rail
        - Roads
        - Sewer
        - Utilities
        - Water
    o Physical Environment
        - Environmental
        - Hydrology
        - Vegetation
        - Soils

All of this data varies in both amount and type. The Police Department uses ESRI products, MS Access, SPSS, SAS, and Oracle 8i to store spatial and tabular data. The Fire Department still uses a mainframe system; in order to analyze data or create shape files, staff have to copy data from a shadow file into Excel and then geocode it themselves. The Business Services Department does not house any spatial data, but it does collect important data relevant to the system and stores it using both SQL Server 6.5 and SQL Server 7.0. The Tax Department also uses a SQL Server/ESRI combination to hold its spatial and tabular data.

The power of the GRASP system lies in the fact that this time-consuming process is taken off the shoulders of the researcher and absorbed by the system. The data collection process is performed only during the initial system set-up in a GRASP city, and the data is kept up to date through electronic updates handled by the administrator. The Capstone team performed all the hard work of gathering the data so that the researcher can focus on the actual study.

IMPLEMENTATION

With the Charlotte, NC data collected, cleaned, and safely stored within the Systems Engineering department, the project could make the transition from conceptual design to actual implementation. The first design decision was which software package to use as the back-end of GRASP to house the spatial and tabular data we collected.

Software Selection

The benefits that spatial data can provide to tabular data led to a rather difficult decision between two software options: a Database Management System (DBMS) that holds spatial data, or a DBMS that holds only relational tables, with the map files stored separately on the server.
The functionality of the two differs significantly: a spatial DBMS offers greater querying capability because it queries spatial files directly, while the second option does not query the spatial layers. This added capability, however, comes at a significantly higher cost. Other aspects we examined while researching software alternatives included infrastructure, data marts and Online Analytical Processing (OLAP), and supporting technologies (Gonzales, 2000).

The NIJ decided to use a non-spatial option for three reasons. First, the purpose of the database is to make a large amount of data available to researchers and other analysts who have their own software for the analysis. Second, the initial costs to build the GRASP system were lower. Third, adding new data layers in the future would be less expensive.

The final decision was to use SQL Server 2000 because of its increased scalability, responsiveness, and security relative to previous versions. Two companies using SQL Server have in fact recently won grand prizes in the Database Scalability Program (Microsoft, 2001). The 2000 edition made SQL Server just as capable of handling the responsibilities of the GRASP system as Oracle 8i, and its costs were significantly lower (Garten, 2001).

The GRASP prototype was implemented using SQL Server 2000 running on a Dell PowerEdge 1400 server. HTML was used for the web page front-end design, while ColdFusion enabled the web pages to interact dynamically with the data warehouse. Figure 1 illustrates a high-level design of the overall system (Smith, 2001).

Figure 1: Conceptual design of GRASP

Technical Notes

Database Development

GRASP's spatial data is stored in separate files on the server, and the name and location of each file are stored in a table in SQL Server. While this table is not related to any other tables, it does store searchable metadata and researcher access levels for each layer. The data stored inside the relational model consists of data received from Business Services, Police data, and GRASP functionality tables.

Putting the data into one format was a relatively simple process. All of the GRASP data is imported directly into SQL Server using Data Transformation Services (DTS), a service of SQL Server 2000 that helps to extract, transform, and load data from heterogeneous sources. There was also a surprising amount of overlap among the spatial data received from different sources. In those cases the best source was determined based on data completeness and level of detail.

Some data was also changed from its original format before being entered into our system. Business Services sent us their data in one big table. To reduce redundancy in our system, speed up querying, and reduce the amount of data that will need to be updated in the future, we split this table into three: one containing a list of license types and their codes, another containing each business, and a middle table that relates the two through the business ID and each license code associated with it. In the future, except for new license codes or small changes, most updates will only change two of these tables.

To provide the front-end functionality called for by the National Institute of Justice, administrative tables were created in the relational model. These consist of tables containing user information, information on the organizations that a researcher works for, access levels, each project title and its description, and the queries performed under each project. There is also a table that relates a user to their projects and to the organization for which each project is being done. Alongside the administrative tables are a table that holds information about the spatial layers available to users and a table that lists every field in the spatial layers and relates each field to its layer.

This database design was entered into Microsoft's SQL Server 2000, the database management system chosen for the back-end of GRASP. Although this software did not provide the spatial capabilities that would have allowed us to make our repository totally relational, it did have a number of advantages over other systems. First, it came with a price tag significantly lower than those of its major competitors, a major positive when developing any sort of prototype. Second, the Capstone team had previous experience using SQL Server, which allowed us to move right from design to implementation without spending time learning the intricacies of a new system.

Web Interface Development

Although it may not do as much work as the database, the graphical user interface is by far the most crucial component in the entire GRASP system. It is simultaneously the face and brains of GRASP, providing a psychologically satisfying view into the complications of the database. It provides both the first impression and a lasting mental image of what the system is all about. A functional, friendly, and engaging interface can draw researchers back time after time, while the opposite could render the efforts of the entire project useless.

Using previously created storyboards of each page, the visual elements of the site were created and laid into place using HTML. The actual functionality of the interface, which includes generating queries that search the database and embedding logic into the web pages, was implemented using the ColdFusion Application Server package.
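Returning to the Business Services data discussed under Database Development, the three-table split can be sketched as follows. This is a minimal illustration, not the team's actual schema: it uses Python's built-in sqlite3 module in place of SQL Server 2000 and DTS, and every table name, column name, and sample row is a hypothetical placeholder.

```python
import sqlite3

# Hypothetical flat extract: one row per (business, license) pair, so the
# business name and license description repeat. Not real Charlotte data.
FLAT_ROWS = [
    (101, "Example Diner", "R1", "Restaurant"),
    (101, "Example Diner", "A2", "Alcohol sales"),
    (102, "Example Garage", "M3", "Auto repair"),
    (103, "Example Cafe", "R1", "Restaurant"),
]

SCHEMA = """
CREATE TABLE license_type (
    license_code TEXT PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE business (
    business_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Middle table relating each business to each license code it holds.
CREATE TABLE business_license (
    business_id  INTEGER NOT NULL REFERENCES business(business_id),
    license_code TEXT    NOT NULL REFERENCES license_type(license_code),
    PRIMARY KEY (business_id, license_code)
);
"""


def normalize(flat_rows):
    """Load flat rows into the three tables and return the open connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for business_id, name, code, description in flat_rows:
        # INSERT OR IGNORE keeps the first copy of each repeated value.
        conn.execute(
            "INSERT OR IGNORE INTO license_type VALUES (?, ?)", (code, description)
        )
        conn.execute(
            "INSERT OR IGNORE INTO business VALUES (?, ?)", (business_id, name)
        )
        conn.execute(
            "INSERT OR IGNORE INTO business_license VALUES (?, ?)", (business_id, code)
        )
    conn.commit()
    return conn


def table_counts(conn):
    """Return the row count of each of the three tables."""
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("license_type", "business", "business_license")
    }


def rebuild_flat(conn):
    """Join the three tables back into the original flat shape."""
    return conn.execute(
        """
        SELECT b.business_id, b.name, lt.license_code, lt.description
        FROM business_license AS bl
        JOIN business AS b ON b.business_id = bl.business_id
        JOIN license_type AS lt ON lt.license_code = bl.license_code
        ORDER BY b.business_id, lt.license_code
        """
    ).fetchall()


if __name__ == "__main__":
    conn = normalize(FLAT_ROWS)
    print(table_counts(conn))
    for row in rebuild_flat(conn):
        print(row)
```

In this layout, a business that gains a license adds one row to the middle table only, and a new business adds rows to the business and middle tables. That is consistent with the observation that most updates change only two of the three tables, while the join shows that no information from the original flat table is lost.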
ColdFusion allowed the web site to connect to the SQL Server 2000 database server containing all the spatial, tabular, and image data. It also allowed the intended functions of the web design to be realized, which include logging in users, providing security, creating dynamic pages from the database, and, most importantly, providing the requested data to the user. ColdFusion Markup Language (CFML) is a tag-based language that we used to create these functions in the otherwise HTML-only web pages. CFML tags were the tools we employed to select rows of data from tables and entire selected shape files, and to manipulate the user-requested data files stored on the server, zipping them into [...] found on the World Wide Web.

Site Layout and Functionality

Figure 2 illustrates the GRASP splash screen that appears when a user first reaches the system using a web browser. On the left toolbar, there are text boxes that allow returning users to log into the system using the password supplied by the NIJ. The log-in and password keep the data secure from hackers while limiting the range of database access granted to a single researcher.

Figure 2: GRASP Welcome screen

New users must register and are taken to a registration page, shown in Figure 3. This page collects information about both the researcher and the study that is being performed. This information is then incorporated into the feedback loop mechanism, an automatically generated email from the server that sends a preliminary notice to the Charlotte departments concerning the kind of study that is going to be performed using their data. The application is sent automatically via email to the NIJ, where it is reviewed, and a log-in and password are either granted or denied. The email that the system sends to the NIJ or GRASP administrator contains all the information the user entered about themselves and their project, along with a link to an approval web page. The web page presents the same information as the email, but also has approve and disapprove buttons, so the GRASP administrator can grant or deny access to the user by clicking the appropriate button. This makes the approval process as fast and easy as possible for the administrator.

Figure 3: New User Registration screen

With the study approved, a user can then proceed to select the kind and amount of data to download. Figure 4 illustrates a typical tabular data querying page. Here the user is able to customize data queries by selecting the table, date ranges, and even specific fields of that table. These queries can then be saved into a project file for later use. This feature will be of particular use to researchers who study the same variables at regular intervals. They can simply log back into the system, pull up their old query, and replicate their study using the most up-to-date information.

Figure 4: Typical querying screen

Data Dispersion Method

After the user selects the data they want, whether tabular incident data or shape layer files, the final component of the GRASP system involves getting the selected data from the SQL database to the researcher. The interface handles the two types of data differently.

For layers that have shape files, the user can search through the available layers and select the ones they want. The system, utilizing a custom CFML tag, then compresses the selected layers into a single zip file and downloads the file to the requesting user. This compression significantly improves transfer times, allowing huge image and shape files, for example, to be sent with minimal difficulty. A typical shape file may be several megabytes in size, and a typical user may want to download several shape files at a time. The system compresses all the requested files into one zip file that is much smaller and downloads much faster.

Tabular incident data is handled differently because our system can give the user more flexibility. The user selects the incident table they want and can then select individual fields from that table and the date range they want for those incidents (see Figure 4). The system then exports the data into a comma-delimited text file, which the researcher can import into any application they want to use to analyze the data. This allows the user to download only the incident data they want, rather than having to select all of the incidents in the entire table. For example, this would be very helpful to a researcher studying a trend of incidents over the past year. This researcher would only have to download the selected fields for the last year, rather than the entire dataset over its complete time span, which could be 10 years or more. This minimizes the download time for the user and maximizes convenience by giving them only the specific data they request (McCoubrie, 2001).

Data Update Method

The first step in ensuring that the safety researcher receives the most comprehensive and up-to-date data available is to ensure that the system itself contains the correct data. The entire GRASP concept would be undermined if the data collection process were overly bulky or time consuming. As a result, each contributing department in Charlotte has agreed upon a regular update schedule in which it will provide new data on a monthly or quarterly basis.
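The two download paths described under Data Dispersion Method can be sketched as follows. This is an illustration only: the prototype used CFML tags against SQL Server 2000, whereas this sketch uses Python's standard sqlite3, csv, and zipfile modules, and the table name, column names, file names, and function names are all hypothetical.

```python
import csv
import io
import sqlite3
import zipfile

# Hypothetical incident table layout; the real GRASP schema is not given.
INCIDENT_COLUMNS = ("incident_id", "incident_date", "offense", "district")


def export_incidents_csv(conn, fields, start_date, end_date):
    """Return the chosen fields of incidents in a date range as CSV text.

    Dates are assumed to be ISO strings (YYYY-MM-DD), which compare
    correctly as text.
    """
    # Column names cannot be bound as parameters, so accept only known ones.
    unknown = [f for f in fields if f not in INCIDENT_COLUMNS]
    if not fields or unknown:
        raise ValueError(f"invalid field selection: {unknown or 'empty'}")
    sql = (
        f"SELECT {', '.join(fields)} FROM incident "
        "WHERE incident_date BETWEEN ? AND ? "
        "ORDER BY incident_date, incident_id"
    )
    rows = conn.execute(sql, (start_date, end_date)).fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)  # header row with the selected field names
    writer.writerows(rows)
    return buffer.getvalue()


def zip_layers(layer_files):
    """Bundle selected layer files ({name: bytes}) into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in layer_files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def demo_connection():
    """Build an in-memory incident table with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incident (incident_id INTEGER PRIMARY KEY, "
        "incident_date TEXT, offense TEXT, district TEXT)"
    )
    conn.executemany(
        "INSERT INTO incident VALUES (?, ?, ?, ?)",
        [
            (1, "1999-06-15", "burglary", "A"),
            (2, "2000-08-01", "vandalism", "B"),
            (3, "2000-11-20", "burglary", "A"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(export_incidents_csv(
        conn, ["incident_date", "offense"], "2000-01-01", "2000-12-31"
    ))
    archive = zip_layers({"parks.shp": b"\x00" * 10000, "parks.dbf": b"\x00" * 5000})
    print(len(archive), "bytes after compression")
```

Restricting the field list to known column names is a design choice of this sketch rather than something the paper describes; it keeps a user-built query from injecting arbitrary SQL while still letting the researcher pick fields and a date range.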
A password-protected FTP site has been established that allows those departments to send their data directly over the Internet. This data is then automatically uploaded into the repository back-end. To ease the workload on the participating departments, the prototype GRASP system includes a recommendation to hire a full-time database administrator. It is this person's job to make sure that the data is properly downloaded, cleansed, and inserted into the database. The administrator's job description also includes ensuring system security and continuously maintaining the system to make sure it stays on-line.

TESTING

The Capstone team employed a two-phase testing process that included both developer and end-user evaluations. First, the Capstone team, as the original developers, worked through every segment of the system in search of bugs and functionality problems. GRASP was then made available for critique by researchers within the NIJ. These researchers evaluated the system for a week and recorded their comments, suggestions, and any errors they encountered. The Capstone team collected these notes and incorporated them into the final design.

CONCLUSIONS AND RECOMMENDATIONS

This project reached a high level of success because of the level of cooperation it achieved with the host city. The organizations we worked with in Charlotte were very forward thinking and understood the importance of our project. Future implementations of GRASP or similar projects must help the participating organizations see how the project will benefit everyone involved. The organizations must be made active stakeholders in the system and have some incentive to cooperate. Our project had very aggressive goals for such a short timeline, and the extensive willingness to cooperate that we encountered was a tremendous help in keeping up with the timeline we set.
As we progressed through the design process, from goal development to testing, it became clear what tremendous potential this project has to revolutionize the research community. Typical data collection can take anywhere from five to seven months. With the GRASP system implemented, this time frame is cut to five to seven days.

REFERENCES

Beuerle, K. 2001. “The Design and Implementation of GRASP: Geospatial Repository for Analysis and Safety Planning.” Technical Report. Department of Systems Engineering, University of Virginia, Charlottesville, VA.

Garten, A. 2001. “Building an Integrated Data Model for GRASP (Geospatial Repository for Analysis and Safety Planning).” Technical Report. Department of Systems Engineering, University of Virginia, Charlottesville, VA.

Gonzales, M. L. 2000. “Last One Standing.” Online. http://www.intelligententerprise.com.

McCoubrie, W. 2001. “G.R.A.S.P. (Geospatial Repository for Analysis and Safety Planning): Design and Implementation of a Prototypical Spatial Data Repository for a Major Metropolitan Area.” Technical Report. Department of Systems Engineering, University of Virginia, Charlottesville, VA.

Microsoft. 2001. “Microsoft Customers Win Grand Prizes in Database Scalability Program.” Online. http://www.wintercorp.com/MicrosoftRelease021201.htm

Smith, E.R. 2001. “Design and Implementation of GRASP.” Technical Report. Department of Systems Engineering, University of Virginia, Charlottesville, VA.

United States Department of Justice (USDOJ). 2000. “Statement of Work.” Office of Justice Programs. Unpublished department report.

BIOGRAPHIES

Kristin Beuerle, a 4th-year Systems Engineering major, calls Pasadena, MD her home. Besides hanging with the Capstone team, Kristin enjoys singing with Jubilate and playing IM softball, despite the fact that she broke her hand first semester during softball playoffs. After a great deal of relaxing and traveling during the summer, Kristin will begin her job in September at Deloitte Consulting in Washington, D.C.

Amy Garten is a 4th-year Systems Engineering student from the proud state of West Virginia. When she is not twirling around with the University Dance Club or having fun with her Sigma Kappa gals, Amy can usually be found watching cheesy romantic comedies on her digital cable. In July, Amy will begin at Cap Gemini Ernst & Young in Northern Virginia.

Ryan Smith, formally known as the ice hockey legend “Firewall,” hails from Springfield, VA, right next to one of DC's finest prisons. While many of his SE classmates make their way up to NOVA post graduation, Ryan will be venturing out to sunny San Fran to work for Intraspect, a knowledge management software development company. He will be missed, but as wise men have said, ‘They will all come crawling back someday.’

Wes Yes McCoubrie, straight out of New Jersey, is ironically also a Systems Engineering major here at UVA. In addition to being the current president of the Club Baseball team and a member of Pi Lambda Phi, Wes has repeatedly impressed the team with his mad presentation skills. After traveling through Europe, Wes will be joining Amy at Cap Gemini Ernst & Young.