Precipitation Data Analysis
Database Creation for Raw Precipitation Data, Analysis, and Ease of Use

Steven Arhart, Kelly Chappell, Jaqueline Pacheco, Dan Tracer
Department of Civil and Environmental Engineering
University of Utah, Salt Lake City, Utah

Abstract: Large amounts of data have created a need for more sophisticated data management software. The most common type of program used to organize large amounts of data is a database, which stores data in a format that is easier for computers to process. Datasets from precipitation gauges in and around the Salt Lake Valley were obtained and imported into a Microsoft Access database. Erroneous data were then removed and the remaining data were organized. In Microsoft Access, queries were created to produce different results and tables. These queries are designed to be simple and ask only for basic specifications such as dates and gauges. The software program R was connected to the Access database to explore the data sets and create daily precipitation plots and statistical analyses of the precipitation data. A HydroServer Lite website was also created that allows a user to view the data and the specific location of each gauge.

I. INTRODUCTION

The National Climate Data Center (NCDC) and Salt Lake County (County) installed and maintain numerous weather stations in the Salt Lake Valley and the surrounding mountains. These stations record numerous weather variables, such as temperature, humidity, and precipitation. Using data from these stations, the project set out to achieve the following goals:
• Create a database using Microsoft Access for the storage and manipulation of precipitation data from the sites described below.
• Demonstrate scripting abilities by connecting the database to a statistical analysis program such as R.
• Write queries to manipulate the data into a file type that is supported by the storm water modeling software SWMM.
• Establish a set of data analysis queries in Microsoft Access that will enable a novice user to conduct data analysis with no prior computer programming knowledge.
• Populate a website that uses HydroServer Lite for easy retrieval of the precipitation data.

Figure 1 – Map of five precipitation gauges

The map above shows the five weather stations that were selected for analysis of precipitation rates. Two stations are operated by the NCDC: one near Mountain Dell Reservoir (gage number 425892) in Parley’s Canyon, Utah, and one at the Salt Lake International Airport (gage number 427598). The stations operated by Salt Lake County are located at the Rose Park Fire Station (gage number 10 at 285 N. 1000 W.), Red Butte Gardens (gage number 20 at 2200 E. Red Butte Canyon Rd.), and the Salt Lake County Complex (gage number 30 at 2001 S. State St.).

Precipitation data from these stations were obtained in the “.txt” file format and then imported into a database using Microsoft Access. This is a good way to store large amounts of data and make them accessible for other programs to query, analyze, or model.

As shown in Figure 2, the data collected at the monitoring sites are written by the loggers as “.txt” files. These files are then imported into the Access database, which stores all of the data and makes them available to sort and analyze. Depending on the end goal, the data may then be exported as a file suitable for HydroServer Lite or SWMM, or be plotted and further analyzed using R.

Microsoft Excel is a popular program for handling large amounts of data, but it was not created for that purpose and lacks many of the tools and the stability that databases possess. Many researchers have not transitioned to databases and query languages because of the steep learning curve associated with them. This paper will enable a user with little to no experience in database management and querying to create a database with their raw data, perform simple querying and manipulation, and make the data available to the general public.

Figure 2 – Data Path

This paper details the creation of a database using Microsoft Access, the manipulation of raw precipitation data from several sites in and around Salt Lake City, instructions for manipulating and querying the data, and the creation of a website that uses HydroServer Lite to allow data querying and manipulation through a user’s web browser. HydroServer Lite does not have as many analysis and querying features, but it gives the user a simpler interface for interacting with and manipulating the data.

II. MANIPULATION OF RAW PRECIPITATION DATA

To make the data usable, they must be converted into a format that can be imported into Access, where querying and manipulation can be performed. The raw data files from the County precipitation gauges were obtained from Shannon Reynolds, a PhD student at the University of Utah who was working with the data. The data were collected from weather stations and automatically compiled on the County’s computers for analysis. The files were column-separated text files with headings consisting of Date, Time, 24 hr Time, Gauge Number, Increment, and Monthly Total. These data were imported into the database with minimal manipulation; a full set of instructions is located in Section I of the Appendix. The raw NCDC data had varying data types and break locations that required additional manipulation to get into a format that Microsoft Access would recognize. Again, a full set of instructions for a new user is located in Section II of the Appendix.

Using Access, the manipulated precipitation data can be formatted so that they can be read by the storm water modeling software Storm Water Management Model (SWMM). Detailed instructions to complete this task using SWMM and Excel are located in Section III of the Appendix.

III. ANALYSIS USING R

Another tool that is useful when analyzing these data is the software R. With basic knowledge of this program, a user can connect R to a variety of database management systems (DBMS) such as Microsoft Access. R can connect not only to the database as a whole but also to separate queries or tables within the database. For example, R can access the table with just the NCDC precipitation data. Once R is connected to the database, whether it is in Microsoft Access, MySQL, SQLite, Oracle, etc., many different types of analyses can be conducted. For the rainfall datasets created in Microsoft Access, R was used to carry out the following steps for simple statistical purposes. Short examples of the script are shown in the report, and the full script is in the appendix. The following list is the procedure that was written as a script in R.

a) Installing the package RODBC allows R to create a connection to different DBMS. After the package was loaded with library(RODBC), a channel was created with odbcConnectAccess2007(testdb), where testdb is the path to the precipitation database. Other functions for connecting to a Microsoft Access database include odbcConnect and odbcDriverConnect. odbcConnect can be set up so that a username and password are required for access to the database from R. odbcDriverConnect allows the user to choose different drivers such as MySQL, Oracle, SQL Server, etc.

b) The sqlFetch function reads the following two queries from our Microsoft Access database: DailyRainfallValueSLC_Raw_Data_Crosstab and DailyRainfallValueNCDC_Raw_Data_Crosstab.

c) A summary of these data was then constructed, which shows the minimum, maximum, mean value, etc. An example of these summaries of precipitation values is shown in Figure 3.

Figure 3 - Statistical Analysis of Precipitation Data

d) The next major command that was used was plot. This command creates a plot of the data and controls the text, size, color, ticks, etc. Figure 4 - Precipitation Comparison for 2009 shows a plot of the daily precipitation totals for the NCDC and County rainfall gauges. A number of observations can be made from the plots.

From the summary above, it can be seen that the data shown in Figure 4 range from 03/18/2008 to 11/20/2009. Although the plot covers more than 2009, only the 2009 label shows up because it marks the only change of calendar year in these data. This date range appears because the array Prec1 is the basis of the plot and is the controlling dataset. Prec1 is a selection of the daily precipitation values from Salt Lake County gauge #10. In comparison, the data in Figure 5 - Precipitation Comparison for 2007 through 2010 extend from 01/01/2007 to 01/01/2010. The date range is much larger because the array Prec2 (daily precipitation from NCDC gauge 425892) is now the controlling dataset. Also notice, in both plots, the horizontal lines formed by the grouping of the red NCDC data points. These lines form because of the resolution of the data. The NCDC daily values are reported only as 0, 0.1, 0.2, 0.3…0.9, whereas the Salt Lake County rain gauges are tipping buckets, which record data in a different manner that allows two significant figures.

Figure 5 - Precipitation Comparison for 2007 through 2010

Lastly, in Salt Lake City the monthly precipitation varies drastically from month to month. June, July, and August are dry months in the state of Utah; in other words, it is common to go weeks without any precipitation.
This can be seen in Figure 5: for June, July, and August the density of data points is much lower than for months like February through May.

Figure 4 - Precipitation Comparison for 2009

e) The last part of the analysis using R was to calculate the percent exceedance for the datasets. Using the annual and monthly precipitation records from 1983-2010 provided by NOAA, observations were made to determine a value for comparing our data sets with the historical data. From Figure 6 it can be seen that Salt Lake City averages 0.6 – 2.0 inches of rain each month, varying from dry to wet months. From these data, a precipitation value of 0.5 in/day was chosen for comparison. By contrast, June, July, and August are dry months and may only get 0.5-1.0 inches of rain over the entire month.

Figure 6 – Monthly Precipitation from 1981-2010 for SLC

Running the lines of code shown in Figure 7 displays the percent of data points above 0.5 in/day. The County gauge recorded daily rainfall greater than 0.5 inches 1% of the time. The NCDC gauge recorded daily rainfall greater than 0.5 inches 3.8% of the time.

Figure 7 – Percent Exceedance of 0.5 in/day Precipitation

R is therefore a useful tool for analyzing datasets and presenting them graphically. There are many different ways to represent the same results, and minor changes to a script can completely change the datasets and/or results. This report provides a brief example of some of the numerous queries and manipulations that can be performed on a database using R. The attached appendices show the specific queries and code that were used to obtain the results stated earlier and provide the reader with an introduction to the program.

IV. SHORT EXPLANATION OF TABLES AND QUERIES

The database is made up of several tables and queries that sort, manipulate, and store the raw data. These tables and queries are explained further in the “User’s Manual” in the appendix, but they are briefly described here.
Tables
• SLC_Raw_Data – Stores all of the raw data from Salt Lake County rain gages. Gages 10, 20, and 30 are currently in the database, but additional gages can easily be added.
• NCDC_Raw_DataT – Stores all of the raw data from NCDC rain gages. Gages 425892 and 427598 are currently in the database, but additional gages can easily be added.
• Export_To_SWMMT – This table is populated from the NCDC and SLC raw data tables and is in a format that can be exported to the SWMM software.
• Export_To_HydroserverLite – This table is populated from the NCDC and SLC raw data tables and is in a format that can be uploaded to HydroServer Lite.

Queries
The SLC queries perform the same way as the NCDC queries, with the exception of the AddDataFromNewNCDC_TableToNCDC_Raw_DataT query. To avoid repetition, only the NCDC queries are described below.
• AddDataFromNCDC_Raw_DataToExport_ToHydroserverLite – Populates the Export_To_HydroserverLite table from the NCDC_Raw_DataT table by gage number and date range.
• AddDataFromNCDC_Raw_DataTtoExport_ToSWMMT – Populates the Export_To_SWMMT table from the NCDC_Raw_DataT table by gage number and date range.
• AddDataFromNewNCDC_TableToNCDC_Raw_DataT – Adds data from a newly imported NCDC table to the NCDC_Raw_DataT table.
• DailyRainfallValueNCDC_Raw_Data_Crosstab – Computes daily rainfall values by gage number and date range. The results can be sorted or exported if desired.
• Delete_Export_To_HydroserverLite – Deletes the data in the Export_To_HydroserverLite table to allow for a different data set.
• Delete_Export_To_SWMM – Deletes the data in the Export_To_SWMMT table.
• EnterDateRangeAndGageNCDC_Raw_Data – Allows the user to input gage numbers and date ranges to sort data or weed out erroneous data. The results can be sorted and exported if desired.
• Find duplicates for NCDC_Raw_Data – Allows the user to find duplicate data in the NCDC raw data table. These data can then be deleted or kept for further analysis, and can be exported and sorted if desired.
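As a rough illustration of what the DailyRainfallValue crosstab queries listed above compute, the following Python sketch sums individual readings into daily totals with one column per gage. The function name daily_totals and the sample records are hypothetical; only the gage numbers, the date-range filter, and the exclusion of values of 999 or higher are taken from the database description. It sketches the logic only and is not the Access query itself.

```python
from collections import defaultdict
from datetime import date


def daily_totals(records, gages, start, end):
    """Sum precipitation per (date, gage), skipping values of 999 or higher.

    records: iterable of (gage_number, date, inches) tuples (hypothetical layout).
    Returns {date: {gage: total or None}} ordered by date.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for gage, day, value in records:
        if gage not in gages or not (start <= day <= end):
            continue  # outside the requested gages or date range
        if value >= 999:
            continue  # treated as a missing/erroneous reading
        totals[day][gage] += value
    # One row per date, one column per gage; None where a gage has no data.
    return {
        day: {g: (round(cols[g], 2) if g in cols else None) for g in gages}
        for day, cols in sorted(totals.items())
    }


# Hypothetical sample readings for the two NCDC gages.
records = [
    (425892, date(2009, 1, 5), 0.1),
    (425892, date(2009, 1, 5), 0.2),
    (427598, date(2009, 1, 5), 0.1),
    (425892, date(2009, 1, 6), 999.99),  # excluded by the 999 cutoff
    (425892, date(2009, 1, 7), 0.3),
    (425892, date(2011, 1, 1), 0.5),     # outside the date range
]
table = daily_totals(records, [425892, 427598], date(2007, 1, 1), date(2010, 1, 1))
for day, cols in table.items():
    print(day, cols)
```

Rounding to two decimals is a choice made here to hide floating-point noise in the sums; the Access query does not necessarily do the same.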
Figure 8 - Tables and Queries

V. EXAMPLE OF SQL STATEMENTS

A significant amount of SQL was required to create the queries in the database. We will not go through all of the SQL statements in detail, but will briefly discuss a few examples to highlight the SQL and the thought process behind it.

Once an NCDC text file is imported as a table in the database, it is convenient to add its data to one table that holds all of the NCDC data (Raw_Data), because queries and additional tables can then be based on this Raw_Data table instead of on many tables and unions. This helps protect the database and makes it more user friendly. The NCDC data come with several inherent problems: the date format, the station format, and the hourly precipitation (HPCP) value. The date in the text file is in the format yyyymmdd (e.g., 19671205), which Access recognizes only as an integer and not as a date. The station that comes from NCDC is text (e.g., COOP:425892). To make sure that SWMM and other software will recognize it as a gauge number, the “COOP:” prefix needs to be dropped. Additionally, the HPCP values are in hundredths of inches and should be converted to inches for accurate analysis. To fix these problems, an append query was created to add the data from the newly imported table (in this example SLC_IAP_1948_2011) to the “NCDC_Raw_DataT” table while adding three new fields: Hpcp/100 (converts the original Hpcp to inches), NewDate (converts OldDate to display as mm/dd/yyyy), and Gauge # (converts Station to an integer number). The sample SQL code is given below:

INSERT INTO NCDC_Raw_DataT ( Station, Station_Name, Elevation, Latitude, Longitude, OldDate, [Time], Hpcp, Flag, [Hpcp/100], NewDate, [Gauge #] )
SELECT SLC_IAP_1948_2011.Station AS Expr1,
    SLC_IAP_1948_2011.Station_Name AS Expr2,
    SLC_IAP_1948_2011.Elevation AS Expr3,
    SLC_IAP_1948_2011.Latitude AS Expr4,
    SLC_IAP_1948_2011.Longitude AS Expr5,
    SLC_IAP_1948_2011.OldDate AS Expr6,
    SLC_IAP_1948_2011.Time AS Expr7,
    SLC_IAP_1948_2011.Hpcp AS Expr8,
    SLC_IAP_1948_2011.Flags AS Expr9,
    [Hpcp]/100 AS Expr10,
    Mid([OldDate],5,2) & "/" & Right([OldDate],2) & "/" & Left([OldDate],4) AS Expr11,
    Right([Station],6) AS Expr12
FROM SLC_IAP_1948_2011;

Figure 9 – SQL statement

To upload these data into SWMM, the needed fields are Gauge_ID, Year_, Month_, Day_, Hour_, Minute_, and Precip_Value. The following SQL statement shows how to break the date and time fields into multiple fields and add the data to the Export_To_SWMMT table. It also excludes any precipitation values of 999 or higher. The sample SQL code is given below:

INSERT INTO Export_To_SWMMT ( Gauge_ID, Year_, Month_, Day_, Hour_, Minute_, Precip_Value )
SELECT Right([Station],6) AS Gauge_ID,
    Year([NewDate]) AS Year_,
    Month([NewDate]) AS Month_,
    Day([NewDate]) AS Day_,
    Hour([Time]) AS Hour_,
    Minute([Time]) AS Minute_,
    NCDC_Raw_DataT.[Hpcp/100] AS Precip_Value
FROM NCDC_Raw_DataT
WHERE (((Right([Station],6))=[Enter Gauge #])
    AND ((NCDC_Raw_DataT.[Hpcp/100])<999)
    AND ((NCDC_Raw_DataT.NewDate)>=[Enter Start Date] And (NCDC_Raw_DataT.NewDate)<=[Enter End Date]));

Figure 10 – SQL statement

HydroServer Lite also requires a specific data format in order to work. The fields are LocalDateTime, in the form yyyy-mm-dd hh:mm:ss, and DataValue. A query therefore had to be built to take the date and time fields in Raw_Data, convert them to a single yyyy-mm-dd hh:mm:ss field, and add the data to the Export_To_HydroserverLite table. From this table a .csv file can be created to upload to HydroServer Lite. The sample SQL code is given below:

INSERT INTO Export_To_HydroserverLite ( LocalDateTime, DataValue )
SELECT [NewDate] & " " & [Time] AS Expr1,
    NCDC_Raw_DataT.[Hpcp/100]
FROM NCDC_Raw_DataT
WHERE (((NCDC_Raw_DataT.[Hpcp/100])<999)
    AND ((NCDC_Raw_DataT.[Gauge #])=[Enter Gauge #])
    AND ((NCDC_Raw_DataT.NewDate)>=[Enter Start Date] And (NCDC_Raw_DataT.NewDate)<=[Enter End Date]));

Figure 11 – SQL statement

The DailyRainfallValue queries are useful for viewing the daily rainfall values from a specific gauge and date range. To create this query, all of the data taken on a single day must be grouped, the precipitation values added up, and the pivot command then applied. The sample SQL code is given below:

TRANSFORM Sum(NCDC_Raw_DataT.[Hpcp/100]) AS [SumOfHpcp/100]
SELECT NCDC_Raw_DataT.NewDate
FROM NCDC_Raw_DataT
WHERE (((NCDC_Raw_DataT.[Hpcp/100])<999)
    AND ((NCDC_Raw_DataT.[Gauge #])=425892)
    AND ((NCDC_Raw_DataT.NewDate)>=#1/1/2007# And (NCDC_Raw_DataT.NewDate)<=#1/1/2010#))
GROUP BY NCDC_Raw_DataT.NewDate, NCDC_Raw_DataT.[Hpcp/100]
ORDER BY NCDC_Raw_DataT.[Gauge #]
PIVOT NCDC_Raw_DataT.[Gauge #];

Figure 12 – SQL statement

VI. WEBSITE UTILIZING HYDROSERVER LITE

HydroServer Lite (HydroServer) is a standalone hydrological data publishing tool developed by CUAHSI that is fully compatible with CUAHSI HIS and HydroDesktop. HydroServer websites provide user-friendly platforms for public access to data and require no installation by the user. HIS Central, a public catalogue of public HydroServers, provides a wider opportunity for data sharing because the HydroServers are tagged and made searchable. Uploading data to a HydroServer provides an additional place to store the data, one that will remain available longer than data stored solely on local hardware. Installation of a HydroServer is relatively simple and requires MySQL and PHP support from the web hosting site. The web hosting site http://www.000webhost.com was used to host http://www.jpacheco.comuv.com/client, which contains the precipitation data for the NCDC and the County sites.
HydroServer installation must be successfully completed before the appearance of the page can be customized; this includes but is not limited to the page background, banner, and home page image. Notice the images above and the customization of the two logos. Before any data can be uploaded to the HydroServer, the site, variable, source, and method information must be defined. The images below show the different HydroServer menus that are used for customization and for adding data. The processes involving the sites, variables, sources, and methods can be found under the database management tab. Another feature of HydroServer is the ability to add other users, such as teachers and students. Different levels of authorization can be granted per user, but the basic user will have a username, a password, and the ability to add their own data to the HydroServer client. Although this can be useful for group work, only administrators are able to work with the database management tab.

Figure 13 - Hydroserver Map & Menu

Once data are uploaded into HydroServer, they show up in the map on the main home screen. In the image above, notice the five points that represent the different precipitation gauges. The data for each gauge were uploaded from a .csv file that Microsoft Access created. The queries in Microsoft Access provide automated retrieval of comma-separated value files for use in HydroServer. The .csv files must be formatted as LocalDateTime,DataValue, where LocalDateTime is yyyy-mm-dd hh:mm:ss. HydroServer is a good tool for sharing data; however, only post-1970 data are compatible. Any pre-1970 data will generate an “invalid year for date on row 1” error. The County sites contained data from 2007-2012 and were successfully uploaded to the HydroServer. Mountain Dell contained data from 1967-2011, and 93% of the data were uploaded. Salt Lake International Airport contained data from 1948-2011, and only 75% of the data were compatible with the HydroServer.
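The upload constraints described above can be sketched in Python: each timestamp is written as yyyy-mm-dd hh:mm:ss, and rows dated before 1970 are set aside instead of being uploaded. The function name to_hydroserver_csv and the sample rows are hypothetical, the exact cutoff of January 1, 1970 is an assumption based on the "post-1970" statement, and including a header row is likewise an assumption about the expected file layout.

```python
import csv
import io
from datetime import datetime

# Assumed boundary: the text only says that pre-1970 data are rejected.
CUTOFF = datetime(1970, 1, 1)


def to_hydroserver_csv(rows):
    """Build CSV text from (datetime, value) pairs and count skipped rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["LocalDateTime", "DataValue"])
    skipped = 0
    for stamp, value in rows:
        if stamp < CUTOFF:
            skipped += 1  # would trigger the "invalid year" error on upload
            continue
        writer.writerow([stamp.strftime("%Y-%m-%d %H:%M:%S"), value])
    return out.getvalue(), skipped


# Hypothetical sample rows: one pre-1970 reading and one recent reading.
rows = [
    (datetime(1967, 12, 5, 14, 0), 0.1),
    (datetime(2009, 3, 18, 9, 30), 0.02),
]
csv_text, skipped = to_hydroserver_csv(rows)
print(csv_text)
print("rows skipped:", skipped)
```

Counting the skipped rows gives a quick check against the upload percentages reported for the Mountain Dell and airport gauges.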
Data retrieval from the HydroServer is simple and only requires the user to click on a site location on the map shown in Figure 13. Once the site is selected, the site information, data plot, and data table are available. Different variables and methods can be chosen as well. The site information includes latitude, longitude, and the different measurements taken at that site. This information could be useful for stream gauges that collect data for many different variables, such as stream flow, temperature, pH, dissolved oxygen, and nutrients.

Figure 14 – Accessing Precipitation Data via Hydroserver Lite

The data plot and table are easily accessible and can be changed to show different date ranges. For example, the image below shows two ranges of data. The very bottom shows the entire dataset; changing the viewing window there shifts the plot to the chosen date range. Once the desired plot is shown, it can be exported as a JPEG, PNG, PDF, or SVG.

Figure 15 – Hydroserver Plot

With these tools in HydroServer, a researcher, student, professor, etc. is able to add and share data. It is also a good way to store or back up important data files that one may not want to lose. Groups or organizations with usernames and passwords can access and use this platform to import and share data. Other users are only able to view and download the data, which makes it more secure.

VII. CONCLUSION

Server-based databases are ideal for storing large amounts of data logger precipitation files. These files can become overwhelming and get lost if not properly managed. An advantage of storing data on servers is that the data remain accessible, unlike data stored on hardware that becomes obsolete. Knowledge of database server management and structured query language is required to successfully establish, manage, and use this type of database.
Students, instructors, and researchers with little to no knowledge of server databases can use Microsoft Access to create desktop databases. Microsoft Access databases are portable and can be easily accessed from any computer that has Microsoft Access installed. The database established to store the NCDC and County precipitation data files was used to format the raw data files for compatibility. Additionally, the data analysis queries in Microsoft Access enable someone with no programming knowledge to conduct simple analysis through the database. The User’s Manual for Salt Lake Precipitation Database, in the Appendix, provides instructions for users to recreate a similar database and update it by importing new data. R can be used to perform more complex statistical analysis once it is connected to the Microsoft Access database. Additional queries in the database export data into files that are supported by SWMM and HydroServer Lite, providing a wider range of data sharing. Overall, the simplified, automated retrieval of data through the Microsoft Access database enables users to manipulate and manage large amounts of data more easily and efficiently.

VIII. ACKNOWLEDGEMENTS

We would like to acknowledge Dr. Ames, Dr. Burian, and Dr. Horsburgh for their guidance in this project and the tools that they provided through the Hydroinformatics course. We would also like to acknowledge Shannon Reynolds, PhD candidate at the University of Utah, for providing the raw data for both sites and presenting the problem associated with these data sets that became the basis for the project.

IX. REFERENCES

[1] Conner, L., Ames, D.P., Gill, R.A. HydroServer Lite as an Open Source Solution for Archiving and Sharing Environmental Data for Independent University Labs. Ecological Informatics. Available online September 5, 2013. Pending volume and issue assignment. http://www.sciencedirect.com/science/article/pii/S1574954113000770
[2] Horsburgh, J. (2011). HydroServer Website: A Web application for presenting the capabilities of a HydroServer. 1-13. http://his.cuahsi.org/documents/HydroServerWebsiteSoftwareManual_8-26-2011.pdf
[3] Kadlec, J., Ames, D. (2012). HydroServer Lite Installation Guide. Hydroinformatics. Lecture materials on course Canvas site. Accessed November 7, 2013.
[4] Venables, W.N., D.M. Smith, and the R Core Team (2013). An Introduction to R, Version 3.0.2. http://cran.r-project.org/doc/manuals/R-intro.pdf

APPENDIX – USER’S MANUAL FOR SALT LAKE PRECIPITATION DATABASE

I. IMPORTING NEW DATA FROM SALT LAKE COUNTY PRECIPITATION GAUGES

The files that come from Salt Lake County rain gauges are text files with columns consisting of Date, Time, 24hr Time, Gauge #, Increment (inches), and Monthly Total (inches). To import these files into the database, first open the text file and delete the first six rows, which describe the date range and gauge number. The column headings are now in the first row of the text file. Save this file and remember its location.

Figure 16 - Raw Salt Lake County Precipitation Data

Figure 17 - Raw Precipitation Data with Headings Removed

Next, open the rainfall database. The third tab from the right in the ribbon is called “External Data”. Click “External Data”, then click “Text File” in the Import and Link category. A “Get External Data – Text File” box opens. Browse for the text file you wish to upload. Then click “Append a copy of the records to the table”, select “SLC_Raw_Data” from the drop-down menu, and click OK. The next window asks whether the data are delimited or fixed width. The text files that come from Salt Lake County are delimited, so select “Delimited” and click Next. Now mark the box that says “First Row Contains Field Names”. A pop-up message might appear saying that the first row has names that cannot be used; just click OK and the wizard will assign the correct names. Click Next again and then Finish. The data have now been added to the SLC_Raw_Data table.
Double click on the table to see all of the data currently in it. You will notice some blank fields. These result from the formatting of the text files and are not a problem, because they will be cleaned up later depending on the end goal of the data.

Figure 18 - Imported SLC Raw Data

II. IMPORTING DATA FROM NCDC PRECIPITATION GAUGES

Importing NCDC data into the database is slightly different from importing Salt Lake County data because of the different formats. Access has a hard time recognizing the data types and breaks in the NCDC data. The text files that come from NCDC rain gauges have column names of STATION, STATION_NAME, ELEVATION, LATITUDE, LONGITUDE, DATE, HPCP, and Measurement Flag Quality Flag. Open the text file and delete both the column names and the row. Save this file and remember its location.

Figure 19 - Raw NCDC Data File

Figure 20 - NCDC Precipitation Data with Column Names Deleted

Next, open the rainfall database. The third tab from the right in the ribbon is called “External Data”. Click “External Data”, then click “Text File” in the Import and Link category. A “Get External Data – Text File” box opens. Browse for the text file you wish to upload. In the next window choose “Fixed Width” and click Next. The wizard will try to guess where you want field breaks. Double click a line to remove it and single click to create a field break, so that you end up with 9 columns, then click Next.

Figure 21 - Importing NCDC Text File

Working from left to right, name the fields Station, Station_Name, Elevation, Latitude, Longitude, OldDate, Time, Hpcp, and Flags. Be sure the corresponding data types are text, text, long integer, double, double, long integer, Date/Time, long integer, and text. Then click “Next”, “Next” again, and “Finish” (or change the name). The table should appear on the left; double click to view it.
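The fields imported above are not yet in their final form. As the SQL examples in the body of the paper describe, OldDate is still a yyyymmdd integer, Hpcp is in hundredths of an inch, and Station still carries the "COOP:" prefix. The Python sketch below mirrors those three conversions for a single record. The function name convert_ncdc_record and the sample values are hypothetical; the six-character station suffix follows the Right([Station],6) expression used in the append query.

```python
from datetime import datetime


def convert_ncdc_record(station, old_date, hpcp):
    """Return (gauge number, mm/dd/yyyy date string, precipitation in inches)."""
    gauge = int(station[-6:])  # last six characters, e.g. "COOP:425892" -> 425892
    parsed = datetime.strptime(str(old_date), "%Y%m%d").date()
    inches = hpcp / 100        # hundredths of an inch -> inches
    return gauge, parsed.strftime("%m/%d/%Y"), inches


# Hypothetical record using the example station and date from the text.
gauge, new_date, inches = convert_ncdc_record("COOP:425892", 19671205, 12)
print(gauge, new_date, inches)
```

Parsing the date with strptime also rejects malformed values such as a month of 13, which the string slicing in the Access query would pass through unchanged.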
The key difference between the two imports is that Salt Lake County data are imported into an already existing table, whereas importing NCDC data creates a new table. This is because Access does not do a very good job of recognizing the data types and formats in the NCDC text files. Once the table is imported into the database, we must run a couple of queries to transform the date column into a usable format and to convert the Hpcp column into inches, as it currently reads in hundredths of inches. We must also add this table to the NCDC_Raw_DataT table. Right click on the query “AddDataFromNew_NCDC_TableToNCDC_Raw_DataT” and click “Design View”. In the table row, change the name of the table (in this example “SLC_IAP_1948_2011”) to the name of the table you imported. Be sure to do this in all fields that have a table specified. When the query is run, a message indicates how many rows are about to be added. If this looks correct, click “OK”.

Figure 22 - Adding New NCDC Table to NCDC_Raw_DataT

The data in this table have now been added to the table “NCDC_Raw_DataT”. Be sure to check that the data were added properly; if they were, you can delete the old table, since “NCDC_Raw_DataT” will contain all of the data plus the additional computations.

Figure 23 - Imported NCDC_Raw_DataT

III. CREATE A FILE FROM SLC_Raw_Data OR NCDC_Raw_DataT FOR USE IN SWMM

First, be sure there are no data in the “Export_To_SWMMT” table. If there are, delete them with the “Delete_Export_To_SWMMQ” query. Next, double click on the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” or “AddDataFromNCDC_Raw_DataTtoExport_To_SWMMT” query. It will ask if you want to run the query; click “Yes”. Next, enter the gauge # you wish to export to SWMM (additional gauges can be added later) and click “OK”.

Figure 24 - Creating SWMM Data File

Next, enter the start date in the form mm/dd/yyyy and click “OK”, then enter the end date and click “OK”.
You are then asked if you are sure you want to append row(s); click “Yes”. Double click on the “Export_To_SWMMT” table to see your data. If additional data (more gauges or a different time period) are wanted, you can run the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” query again and it will add the additional data to the “Export_To_SWMMT” table. To export the data to SWMM, click on the “External Data” tab in the ribbon. Then click on “Excel” or “Text File”, depending on your preference (the Excel route has fewer steps). If Excel is chosen, just specify the name and then click “OK”. Open the document in Excel and save it as a .csv file.

Figure 25 - Creating SWMM Data File using Microsoft Excel

If you choose to export to text format, choose a file name and click “OK”. Next, be sure “Delimited” is marked and click “Next”. Be sure “Comma” is marked and check “Include Field Names on First Row”, then click “Next” and then “Finish”. You now have a comma-separated text file usable in SWMM.

Figure 26 - Creating SWMM Data File with Text

IV. CREATE A FILE FROM SLC_Raw_Data OR NCDC_Raw_DataT SUITABLE FOR HYDROSERVER LITE

Follow the same steps as in Section III for creating a file for SWMM, except run the “AddDataFromSLC_Raw_DataToExport_To_HydroserverLite” or “AddDataFromNCDC_Raw_DataTtoExport_To_HydroserverLite” query instead of the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” query. All other steps are the same, except that once the Excel file is created, you must open it and select the entire “LocalDateTime” column. Right click and select “Format Cells”. Under “Custom”, type in “yyyy-mm-dd hh:mm:ss” as the type. Save as a .csv file.

V. SEE DAILY RAINFALL VALUES

Right click on the query “DailyRainfallValueSLC_Raw_Data_Crosstab” for Salt Lake County data or “DailyRainfallValueNCDC_Raw_Data_Crosstab” for NCDC data and click on “Design View”. Now change the Criteria under Gauge # and Date to include the gauges and date ranges that you are seeking. Click “Run”.
The query output shows the daily rainfall for the specified gauges and dates. It can be sorted any way you like, and it can be exported as well.

Figure 27 - Daily Precipitation Values

VI. FIND DUPLICATES

Double click on the “Find duplicates for SLC_Raw_Data” or “Find duplicates for NCDC_Raw_Data” query to see records that have a matching gauge #, date, and time. Note that records shown by this query are not necessarily duplicate data. For example, gauge #10 might show 0.01 inches of rainfall twice on 3/12/2007 at 14:30. These need not be the same recording: the gauge may have recorded rainfall at 14:29:31 and again at 14:30:29, and both times would have been rounded to 14:30. Nevertheless, this query is useful for spotting trends of possible duplicate data.

VII. DELETE SWMM OR HYDROSERVER LITE EXPORTABLE DATA

Run the “Delete_Export_To_Hydroserver lite” or “Delete_Export_To_SWMM” query to delete the data in these tables. Enter “1” when prompted (this prompt was added as a safety measure). Note that this does not delete the Raw_Data tables.

VIII. FIND DATA BASED ON GAUGE #, DATE RANGE, AND PRECIPITATION VALUES

Run the “EnterDateRangeAndGaugeIDNCDC_Raw_Data” or “EnterDateRangeAndGaugeIDSLC_Raw_Data” query and follow the prompts. This will produce results that can be sorted or exported.

IX. EXAMPLE R QUERY SCRIPT

library(RODBC)   # ODBC connection to the Access database
library(Hmisc)   # describe() and minor.tick()

# Connect to the Access database file
testdb <- file.path("C:\\Users\\stevo\\Dropbox\\HydroinformaticsGroup\\Salt Lake Rainfall Database.accdb")
channel1 <- odbcConnectAccess2007(testdb)

# Daily totals for the Salt Lake County gauges
Prec1 <- sqlFetch(channel1, "DailyRainfallValueSLC_Raw_Data_Crosstab")
dim(Prec1)
describe(Prec1)
newdata1 <- Prec1[1:600,]
newdata1

# Daily totals for the NCDC gauges
Prec2 <- sqlFetch(channel1, "DailyRainfallValueNCDC_Raw_Data_Crosstab")
dim(Prec2)
describe(Prec2)
newdata2 <- Prec2[1:300,]
newdata2

summary(newdata1)
summary(newdata2)

# Plot NCDC gauge 425892 (red) against Salt Lake County gauge 10 (black)
x2 <- newdata2$NewDate
x1 <- newdata1$Date
y2 <- newdata2$"425892"
y1 <- newdata1$"10"
xlim <- range(c(x1, x2), na.rm = TRUE)
ylim <- range(c(y1, y2), na.rm = TRUE)
plot(x2, y2, col = "red", xlim = xlim, ylim = ylim,
     main = "Daily Precipitation", xlab = "Date",
     ylab = "Precipitation Total (in.)")
# points() must be called after plot(), not passed to it as an argument
points(x1, y1, col = "black")

# Tick positions must be in the units of the date axis.
# Assumption: the date columns are fetched as POSIXct values.
ticks <- as.POSIXct(c("2008-01-01", "2009-01-01", "2010-01-01"))
axis(side = 1, at = ticks, labels = format(ticks, "%Y"))
minor.tick(nx = 2, ny = 4)
legend("topleft", inset = .05, c("NCDC", "SLCo"),
       text.col = c("red", "black"), pch = c(1, 1), col = c("red", "black"))

summary(Prec1)
summary(Prec2)

# Percentage of days with more than 0.5 in. at Salt Lake County gauge 10
totalRows1 <- nrow(Prec1)
exceedvalue <- Prec1[ which(Prec1$"10" > .5), ]
exceedRows1 <- nrow(exceedvalue)
exceedencePercent1 <- 100*exceedRows1/totalRows1
exceedencePercent1

# Percentage of days with more than 0.5 in. at NCDC gauge 425892
totalRows2 <- nrow(Prec2)
exceedvalue <- Prec2[ which(Prec2$"425892" > .5), ]
exceedRows2 <- nrow(exceedvalue)
exceedencePercent2 <- 100*exceedRows2/totalRows2
exceedencePercent2

Prec1
Prec2

# Close the database connection when finished
odbcClose(channel1)
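The exceedance percentages in the script divide the number of days above 0.5 in. by the total number of rows in the crosstab. If the crosstab contains empty cells for a gauge (an assumption; the query output is not shown here), those days still count in the denominator. The short Python sketch below, written only to illustrate the arithmetic, shows how the two choices of denominator differ. The function name `exceedance_percent`, the `skip_missing` option, and the sample values are hypothetical.

```python
def exceedance_percent(values, threshold=0.5, skip_missing=False):
    """Percentage of daily totals strictly greater than `threshold`.

    values: daily totals in inches; None marks a day with no value.
    skip_missing=False mirrors dividing by the full row count;
    skip_missing=True divides only by days that have a value.
    """
    if skip_missing:
        values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no values to evaluate")
    # Missing days never count as exceedances.
    exceed = sum(1 for v in values if v is not None and v > threshold)
    return 100 * exceed / len(values)


# Hypothetical daily totals: two of four days exceed 0.5 in., one is missing.
daily = [0.1, 0.6, None, 0.7]
print(exceedance_percent(daily))                     # all rows in the denominator
print(exceedance_percent(daily, skip_missing=True))  # only days with a value
```

Which denominator is appropriate depends on whether an empty cell means "no rain" or "no record", so it is worth checking how the crosstab handles days without readings before comparing gauges.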