Precipitation Data Analysis
Database Creation for Raw Precipitation Data, Analysis, and Ease of Use
Steven Arhart, Kelly Chappell, Jaqueline Pacheco, Dan Tracer
Department of Civil and Environmental Engineering
University of Utah, Salt Lake City, Utah
Abstract—Large amounts of data have created a need for more sophisticated data management software. The most common type of program for organizing large amounts of data is a database, which structures data into a format that is easier for computers to process. Datasets from precipitation gauges in and around the Salt Lake Valley were obtained and imported into a database in Microsoft Access. From there, erroneous data were removed and the remaining data were organized. Queries were created in Microsoft Access to produce different tables and results; these queries are designed to be simple, asking only for basic specifications such as dates and gauges. The software program R was connected to the Access database to explore the datasets, create daily precipitation plots, and perform statistical analysis of the precipitation data. A HydroServer Lite website was also created that allows a user to view the data and the specific location of each gauge.
I. INTRODUCTION
Figure 1 – Map of five precipitation gauges
The National Climatic Data Center (NCDC) and Salt Lake County (County) have installed and maintain numerous weather stations in the Salt Lake Valley and the surrounding mountains. These devices record data for numerous weather criteria, such as temperature, humidity, and precipitation. Using these weather stations, the following goals were pursued:
• Create a database using Microsoft Access for the storage and manipulation of precipitation data from the sites mentioned above.
• Demonstrate scripting abilities to connect the database to a statistical analysis program such as R.
• Write queries to manipulate the data into a file type that is supported by the storm water modeling software SWMM.
• Establish a set of data analysis queries in Microsoft Access that will enable a novice user to conduct data analysis with no prior computer programming knowledge.
• Populate a website that utilizes HydroServer Lite for easy retrieval of the precipitation data.
The map above shows the five weather stations that were selected for analysis of precipitation rates. Two stations are operated by the NCDC: one near Mountain Dell Reservoir (gage number 425892), located in Parley’s Canyon, Utah, and one at the Salt Lake International Airport (gage number 427598). The stations operated by Salt Lake County are located at the Rose Park Fire Station (gage number 10 at 285 N. 1000 W.), Red Butte Gardens (gage number 20 at 2200 E. Red Butte Canyon Rd.), and the Salt Lake County Complex (gage number 30 at 2001 S. State St.). Precipitation data from these stations were obtained in the “.txt” file format and then imported into a database using Microsoft Access. This is a good way to store large amounts of data and make them accessible for other programs to query, analyze, or model. As shown in Figure 2, the data collected at the different monitoring sites are written by the loggers as “.txt” files. These files are then imported into the Access database, which stores all of the data and makes them available to sort and analyze. Depending on the end goal, the data may then be exported as a file suitable for HydroServer Lite or SWMM, or be plotted and further analyzed using R.
Microsoft Excel is the most popular program for handling large amounts of data, but it was not created for that purpose and lacks many of the tools and the stability that databases possess. Most researchers have not transitioned to the use of databases and query languages because of the steep learning curve associated with them. This paper will enable a user with little to no experience in database management and querying to create a database with their raw data, perform simple querying and manipulation, and make the data available to the general public.
Figure 2 – Data Path
This paper will detail the creation of a database using Microsoft Access, the manipulation of raw precipitation data from several sites in and around Salt Lake City, instructions for manipulating and querying the data, and the creation of a website that utilizes HydroServer Lite to allow for data querying and manipulation through a user’s web browser. The HydroServer Lite program does not have as many analysis and querying features, but it provides a user with a simpler interface for interacting with and manipulating the data.
II. MANIPULATION OF RAW PRECIPITATION DATA
In order to make the data usable, they must be converted into a format that can be imported into Access, where querying and manipulation can be performed. The raw data files from County precipitation gauges were obtained from Shannon Reynolds, a PhD student at the University of Utah who was working with the data. The data were collected from weather stations and automatically compiled on the County’s computers for analysis. The data were column-separated text files with headings consisting of Date, Time, 24 hr Time, Gauge Number, Increment, and Monthly Total. These data were imported into the database with minimal manipulation; a full set of instructions describing how this was accomplished is located in Section I of the Appendix. The raw NCDC data had varying data types and break locations that required additional manipulation to get into a format that Microsoft Access would recognize. Again, a full set of instructions for a new user to accomplish this is located in Section II of the Appendix.
Using Access, the manipulated precipitation data can be formatted so that they can be recognized by the storm water modeling software Storm Water Management Model (SWMM). Detailed instructions to complete this task using SWMM and Excel are located in Section III of the Appendix.
III. ANALYSIS USING R
Another tool that is useful for analyzing these data is the software R. With basic knowledge of this program, R can connect to a variety of database management systems (DBMS) such as Microsoft Access. R can connect not only to the database itself but also to individual queries or tables within the database. For example, R can access the table containing just the NCDC precipitation data. Once R is connected to the database, whether it is in Microsoft Access, MySQL, SQLite, Oracle, etc., many different types of analyses can be conducted.
For the rainfall datasets created in Microsoft Access, R was used to carry out the following procedure for simple statistical purposes. Short examples of the script are shown in the report, and the full script appears in the appendix.
a) Installing the package RODBC allows R to create a pathway, or connection, to different DBMS. After loading the package with library(RODBC), a channel was created with the following script: odbcConnectAccess2007(testdb), where testdb is the path to the precipitation database. Other functions for connecting to a Microsoft Access database include odbcConnect and odbcDriverConnect. odbcConnect can be set up so that a username and password are required for access to the database when opening R. odbcDriverConnect allows the user to choose different drivers such as MySQL, Oracle, SQL Server, etc.
b) Using the sqlFetch function creates a connection to the following two queries from our database in Microsoft Access: DailyRainfallValueSLC_Raw_Data_Crosstab and DailyRainfallValueNCDC_Raw_Data_Crosstab.
c) A summary of these data was then constructed, showing the minimum, maximum, mean, etc. An example of these summaries of precipitation values is shown in Figure 3.
Figure 3 - Statistical Analysis of Precipitation Data
d) The next major command used was plot. This command creates a plot of the data and lets the user structure the text, size, color, ticks, etc. Figure 4 (Precipitation Comparison for 2009) shows a plot of the daily precipitation totals for the NCDC and County rainfall gauges. A number of observations can be made by examining the different plots shown below.
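Steps a) through c) above can be collected into a short R sketch. This is a minimal illustration, not the full script from the appendix; the database path shown is an assumption, and the query names are the two crosstab queries from the database:

```r
# Minimal sketch of the RODBC workflow described above.
# The .accdb path is an assumption; adjust it to the database location.
library(RODBC)

testdb  <- "C:/Data/PrecipitationDB.accdb"
channel <- odbcConnectAccess2007(testdb)

# Fetch the daily rainfall crosstab queries as data frames.
slc  <- sqlFetch(channel, "DailyRainfallValueSLC_Raw_Data_Crosstab")
ncdc <- sqlFetch(channel, "DailyRainfallValueNCDC_Raw_Data_Crosstab")

# Summary statistics (minimum, maximum, mean, quartiles) as in Figure 3.
summary(slc)
summary(ncdc)

odbcClose(channel)
```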
From the summary above, it can be seen that the data shown in Figure 4 range from 03/18/2008 to 11/20/2009. Although the plot shows more than 2009, that label appears only because it marks the only change of calendar year in these data. This range results from the array Prec1 being the basis of the plot and therefore the controlling dataset; Prec1 is a selection of the daily precipitation values from Salt Lake County gauge #10. In comparison, the range of data in Figure 5 (Precipitation Comparison for 2007 through 2010) extends from 01/01/2007 to 01/01/2010. The date range is much larger because the array Prec2 (daily precipitation from NCDC gauge 425892) is now the controlling dataset.
Also, notice in both plots the horizontal lines formed by the grouping of the red NCDC data points. These lines form because of the precision of the data: the NCDC daily values are reported only as 0, 0.1, 0.2, 0.3 … 0.9, whereas the Salt Lake County rain gauges are tipping buckets, which record the data in a manner that allows two significant figures.
Figure 5 - Precipitation Comparison for 2007 through 2010
Lastly, in Salt Lake City the monthly precipitation varies drastically from month to month. June, July, and August are dry months in the state of Utah; in other words, it is common to go weeks without any precipitation. This can be seen in Figure 5: for June, July, and August, the density of data points is much lower than in months like February through May.
e) The last part of the analysis in R was to calculate the percent exceedance for the datasets. Using the annual and monthly precipitation records from 1981-2010 provided by NOAA, observations were made to determine a value for comparing our datasets to the historical data. From Figure 6 it can be seen that Salt Lake City averages 0.6–2.0 inches of rain each month, varying between wet and dry months; June, July, and August are dry months and may receive only 0.5–1.0 inches of rain over the entire month. From these data, a precipitation value of 0.5 in/day was chosen for comparison.
Figure 6 – Monthly Precipitation from 1981-2010 for SLC
Running the lines of code shown below displays the percentage of data points above 0.5 in/day. The County gauge recorded daily rainfall greater than 0.5 inches 1% of the time, while the NCDC gauge did so 3.8% of the time.
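The exceedance computation itself reduces to counting the daily values above the threshold. A sketch in R, assuming the NCDC data frame fetched earlier; the column name `DailyTotal` is illustrative, not the actual field name:

```r
# Percent of days with precipitation above 0.5 in/day.
# `ncdc` is the data frame fetched earlier; the column name is an assumption.
daily <- ncdc$DailyTotal
pct_exceed <- 100 * sum(daily > 0.5, na.rm = TRUE) / sum(!is.na(daily))
pct_exceed
```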
Figure 4 - Precipitation Comparison for 2009
Figure 7 – Percent Exceedence of 0.5 in/day Precipitation
IV. SHORT EXPLANATION OF TABLES AND QUERIES
The database is made up of several tables and queries that sort, manipulate, and store the raw data. These queries and details are explained further in the “User’s Manual” in the appendix, but they are briefly described here.
Tables
• SLC_Raw_Data – Stores all of the raw data from Salt Lake County rain gages. Currently in the database are gages 10, 20, and 30, but additional gages can easily be added.
• NCDC_Raw_DataT – Stores all of the raw data from NCDC rain gages. Currently in the database are gages 425892 and 427598, but additional gages can easily be added.
• Export_To_SWMMT – This table is populated from the NCDC and SLC raw data tables and is in a format that is exportable to the SWMM software.
• Export_To_HydroserverLite – This table is populated from the NCDC and SLC raw data tables and is in a format that can be uploaded to HydroServer Lite.
Queries
The SLC queries perform the same way as the NCDC queries, with the exception of the AddDataFromNewNCDC_TableToNCDC_Raw_DataT query. To avoid repetitiveness, only the NCDC queries are described below.
• AddDataFromNCDC_Raw_DataToExport_ToHydroserverLite – Populates the Export_To_HydroserverLite table from the NCDC_Raw_DataT table by gage number and date range.
• AddDataFromNCDC_Raw_DataTtoExport_ToSWMMT – Populates the Export_To_SWMMT table from the NCDC_Raw_DataT table by gage number and date range.
• AddDataFromNewNCDC_TableToNCDC_Raw_DataT – Adds data from a newly imported NCDC table to the NCDC_Raw_DataT table.
• DailyRainfallValueNCDC_Raw_Data_Crosstab – Computes daily rainfall values by gage number and date range. Can be sorted or exported if desired.
• Delete_Export_To_HydroserverLite – Deletes the data in the Export_To_HydroserverLite table to allow for a different data set.
• Delete_Export_To_SWMM – Deletes the data in the Export_To_SWMMT table.
• EnterDateRangeAndGageNCDC_Raw_Data – Allows the user to input gage numbers and date ranges to sort data or weed out erroneous data. The result can be sorted and exported if desired.
• Find duplicates for NCDC_Raw_Data – Allows the user to find duplicate data in the NCDC raw data table. These data can then be deleted or kept for further analysis, and can be exported and sorted if desired.
Figure 8 - Tables and Queries
V. EXAMPLE OF SQL STATEMENTS
A significant amount of SQL script was required to create the different queries in the database. We will not attempt to go through all of the SQL statements in detail, but will briefly discuss a few examples to highlight the SQL script and the thought process.
Once an NCDC text file is imported as a table in the database, it is convenient to add its data to one table that includes all of the NCDC data (Raw_Data), because queries and additional tables can then be based on this Raw_Data table instead of many tables and unions. This helps protect the database as well as make it more user friendly. Several inherent problems come with the NCDC data: the date format, the Station format, and the hourly precipitation (HPCP) value. The date that comes from the text file is in the format yyyymmdd (i.e. 19671205); Access only recognizes this as an integer and not a date. The station that comes from NCDC is text (i.e. COOP:425892); in order for SWMM and other software to recognize this as a gauge number, the COOP: prefix needs to be dropped. Additionally, the HPCP values are in hundredths of inches and should be converted to inches for accurate analysis. To fix these problems, an append query was created to add the data from the newly imported table (in this example SLC_IAP_1948_2011) to the “NCDC_Raw_DataT” table while adding three new fields: Hpcp/100 (converts the original Hpcp to inches), NewDate (converts the OldDate to display as mm/dd/yyyy), and Gauge # (converts Station to an integer number). The sample SQL code is given below:
INSERT INTO NCDC_Raw_DataT ( Station,
Station_Name, Elevation, Latitude,
Longitude, OldDate, [Time], Hpcp, Flag,
[Hpcp/100], NewDate, [Gauge #] )
SELECT SLC_IAP_1948_2011.Station AS
Expr1, SLC_IAP_1948_2011.Station_Name AS
Expr2, SLC_IAP_1948_2011.Elevation AS
Expr3, SLC_IAP_1948_2011.Latitude AS
Expr4, SLC_IAP_1948_2011.Longitude AS
Expr5, SLC_IAP_1948_2011.OldDate AS
Expr6, SLC_IAP_1948_2011.Time AS Expr7,
SLC_IAP_1948_2011.Hpcp AS Expr8,
SLC_IAP_1948_2011.Flags AS Expr9,
[Hpcp]/100 AS Expr10, Mid([OldDate],5,2) &
"/" & Right([OldDate],2) & "/" &
Left([OldDate],4) AS Expr11,
Right([Station],6) AS Expr12
FROM SLC_IAP_1948_2011;
Figure 9 – SQL statement
To upload these data into SWMM, the needed fields are Gauge_ID, Year_, Month_, Day_, Hour_, Minute_, and Precip_Value. The following SQL statement shows how to break the date and time fields into multiple fields and add the data to the Export_To_SWMMT table. It also excludes any precipitation values of 999 or higher. The sample SQL code is given below:
INSERT INTO Export_To_SWMMT ( Gauge_ID,
Year_, Month_, Day_, Hour_, Minute_,
Precip_Value )
SELECT Right([Station],6) AS Gauge_ID,
Year([NewDate]) AS Year_,
Month([NewDate]) AS Month_,
Day([NewDate]) AS Day_, Hour([Time]) AS
Hour_, Minute([Time]) AS Minute_,
NCDC_Raw_DataT.[Hpcp/100] AS Precip_Value
FROM NCDC_Raw_DataT
WHERE (((Right([Station],6))=[Enter Gauge
#]) AND ((NCDC_Raw_DataT.[Hpcp/100])<999)
AND ((NCDC_Raw_DataT.NewDate)>=[Enter
Start Date] And
(NCDC_Raw_DataT.NewDate)<=[Enter End
Date]));
Figure 10 – SQL statement
HydroServer Lite also requires a specific format of data in order to work. The fields are LocalDateTime, in yyyy-mm-dd hh:mm:ss format, and DataValue. So a query had to be built to take the date and time fields in Raw_Data, convert them to one field in yyyy-mm-dd hh:mm:ss format, and add the data to the Export_To_HydroserverLite table. From this table a .csv file can be created to upload to HydroServer Lite. The sample SQL code is given below:
INSERT INTO Export_To_HydroserverLite (
LocalDateTime, DataValue )
SELECT [NewDate] & " " & [Time] AS Expr1,
NCDC_Raw_DataT.[Hpcp/100]
FROM NCDC_Raw_DataT
WHERE (((NCDC_Raw_DataT.[Hpcp/100])<999)
AND ((NCDC_Raw_DataT.[Gauge #])=[Enter
Gauge #]) AND
((NCDC_Raw_DataT.NewDate)>=[Enter Start
Date] And
(NCDC_Raw_DataT.NewDate)<=[Enter End
Date]));
Figure 11 – SQL statement
The DailyRainfallValue queries are useful for viewing the daily rainfall values for a specific gauge and date range. To create this query, all of the data taken on a single day are grouped, the precipitation values are summed, and then the pivot command is used. The sample SQL code is given below:
TRANSFORM Sum(NCDC_Raw_DataT.[Hpcp/100])
AS [SumOfHpcp/100]
SELECT NCDC_Raw_DataT.NewDate
FROM NCDC_Raw_DataT
WHERE (((NCDC_Raw_DataT.[Hpcp/100])<999)
AND ((NCDC_Raw_DataT.[Gauge #])=425892)
AND ((NCDC_Raw_DataT.NewDate)>=#1/1/2007#
And
(NCDC_Raw_DataT.NewDate)<=#1/1/2010#))
GROUP BY NCDC_Raw_DataT.NewDate,
NCDC_Raw_DataT.[Hpcp/100]
ORDER BY NCDC_Raw_DataT.[Gauge #]
PIVOT NCDC_Raw_DataT.[Gauge #];
Figure 12 – SQL statement
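The string manipulation in the append query can be checked by hand. A short R sketch mirroring Access’s Mid, Right, and Left with substr, applied to the example date 19671205:

```r
# Reproduce the Access date conversion: yyyymmdd text -> mm/dd/yyyy.
old_date <- "19671205"
new_date <- paste0(substr(old_date, 5, 6), "/",   # Mid([OldDate],5,2) -> month
                   substr(old_date, 7, 8), "/",   # Right([OldDate],2) -> day
                   substr(old_date, 1, 4))        # Left([OldDate],4)  -> year
new_date                 # "12/05/1967"

# And the HPCP conversion from hundredths of inches to inches:
hpcp <- 5
hpcp / 100               # 0.05 inches
```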
VI. WEBSITE UTILIZING HYDROSERVER LITE
HydroServer Lite (HydroServer) is a standalone hydrological data publishing tool developed by CUAHSI that is fully compatible with CUAHSI HIS and HydroDesktop. HydroServer websites provide user-friendly platforms for public access to data and require no installation by the user. HIS Central, a public catalogue of public HydroServers, provides a wider opportunity for data sharing, as the HydroServers are tagged and made searchable. Uploading data to a HydroServer also provides an additional source of storage that will remain available longer than data stored solely on local hardware.
Installation of a HydroServer is relatively simple and requires MySQL and PHP support from the web hosting site. The web hosting site http://www.000webhost.com was used to host http://www.jpacheco.comuv.com/client, which contains the precipitation data for the NCDC and County sites. HydroServer installation must be successfully completed before the appearance of the page can be customized; this includes, but is not limited to, the page background, banner, and home page image. Notice the images above and the customization of the two logos. Before any data can be uploaded to the HydroServer, the site, variable, source, and method information must be defined. The images below show the different HydroServer menus that are used for customization and for adding data. These processes involving the sites, variables, sources, and methods can be found under the database management tab.
Another feature of HydroServer is the ability to add other users, such as teachers and students. Different levels of authorization can be granted per user, but a basic user will have a username, a password, and the ability to add their own data to the HydroServer client. Although this can be useful for group work, only administrators are able to work with the database management tab.
Figure 13- Hydroserver Map & Menu
Once data are uploaded into HydroServer, they show up on the map on the main home screen. In the image above, notice the five points that represent the different precipitation gauges. The data for each gauge were uploaded from a .csv file created by Microsoft Access; the queries in Microsoft Access provide automated retrieval of comma-separated-value files for use in HydroServer. The .csv files must be formatted as LocalDateTime,DataValue, where the LocalDateTime is yyyy-mm-dd hh:mm:ss. HydroServer is a good tool for sharing data; however, only post-1970 data are compatible. Any pre-1970 data will generate an “invalid year for date on row 1” error. The County sites contained data from 2007-2012 and were successfully uploaded to the HydroServer. Mountain Dell contained data from 1967-2011, and 93% of the data was uploaded. Salt Lake International Airport contained data from 1948-2011, and only 75% of the data was compatible with the HydroServer.
Data retrieval from the HydroServer is simple and only requires the user to click on a site location on the map shown in Figure 13. Once the site is selected, the site information, data plot, and data table are available, and different variables and methods can be chosen as well. The site information includes latitude, longitude, and the different measurements taken at that site. This information could be useful for stream gauges that collect data for many different variables, such as stream flow, temperature, pH, dissolved oxygen, and nutrients.
Figure 14 – Accessing Precipitation Data via Hydroserver Lite
The data plot and table are easily accessible and can be changed to show different date ranges. For example, the image below shows two ranges of data: the very bottom shows the entire dataset, and by changing the viewing window on it, the plot shifts according to the chosen date range. Once the desired plot is shown, it can be exported as a JPEG, PNG, PDF, or SVG.
Figure 15 – Hydroserver Plot
With these tools in HydroServer, a researcher, student, or professor is able to add and share data. It is also a good way to store or back up important data files that one may not want to lose. Groups or organizations with usernames and passwords can access this platform to import and share data, while other users are only able to view and download the data, making it more secure.
VII. CONCLUSION
Server-based databases are ideal for storing large numbers of logged precipitation data files. These files can become overwhelming and get lost if not properly managed. An advantage of storing data on servers is that the data will remain accessible, unlike data stored on hardware that becomes obsolete. Knowledge of database server management and structured query language is required to successfully establish, manage, and use this type of database.
Students, instructors, and researchers with little to no knowledge of server databases can use Microsoft Access to create desktop databases. Microsoft Access databases are portable and can be easily accessed on any computer that has Microsoft Access installed. The database established to store the NCDC and County precipitation data files was used to format the raw data files for compatibility. Additionally, the data analysis queries in Microsoft Access will enable someone with no programming knowledge to conduct simple analyses through the database. The User’s Manual for the Salt Lake Precipitation Database, in the Appendix, provides instructions for users to recreate a similar database and update it by importing new data.
R can be used to perform more complex statistical analyses once connected to the Microsoft Access database. Additional queries in the database export data into files that are supported by SWMM and HydroServer Lite, providing a wider range of data sharing. Overall, the simplified, automated retrieval of data through the Microsoft Access database enables users to manipulate and manage large amounts of data more efficiently.
VIII. ACKNOWLEDGEMENTS
We would like to acknowledge Dr. Ames, Dr. Burian, and Dr. Horsburgh for their guidance in this project and the tools they provided through the Hydroinformatics course. We would also like to acknowledge Shannon Reynolds, PhD candidate at the University of Utah, for providing the raw data for both sites and for presenting the problems associated with these datasets that became the basis for this project.
IX. REFERENCES
[1] Conner, L., Ames, D.P., Gill, R.A., “HydroServer Lite as an Open Source Solution for Archiving and Sharing Environmental Data for Independent University Labs,” Ecological Informatics, available online September 5, 2013 (pending volume and issue assignment). http://www.sciencedirect.com/science/article/pii/S1574954113000770
[2] Horsburgh, J. (2011). HydroServer Website: A Web Application for Presenting the Capabilities of a HydroServer, 1-13. http://his.cuahsi.org/documents/HydroServerWebsiteSoftwareManual_8-26-2011.pdf
[3] Kadlec, J., Ames, D. (2012). HydroServer Lite Installation Guide. Hydroinformatics lecture materials on course Canvas site. Accessed November 7, 2013.
[4] Venables, W.N., Smith, D.M., and the R Core Team (2013). An Introduction to R, Version 3.0.2. http://cran.r-project.org/doc/manuals/R-intro.pdf
APPENDIX – USER’S MANUAL FOR SALT LAKE PRECIPITATION DATABASE
I. IMPORTING NEW DATA FROM SALT LAKE COUNTY PRECIPITATION GAUGES
The files that come from Salt Lake County rain gauges are text files with columns consisting of Date, Time, 24hr Time, Gauge #, Increment (inches), and Monthly Total (inches). To import these files into the database, you must first open the text file and delete the first six rows, which describe the date range and gauge number, so that the column headings are in the first row of the text file. Save this file and remember its location.
Figure 16 - Raw Salt Lake County Precipitation Data
Figure 17 - Raw Precipitation Data with Headings Removed
Next, open the rainfall database. The third tab from the right in the ribbon is called “External Data”. Click “External Data”, then hit “Text File” in the Import and Link category. A “Get External Data – Text File” box opens. Browse for the text file you wish to upload, click “Append a copy of the records to the table”, select “SLC_Raw_Data” from the drop-down menu, and hit OK. The next window asks if the data is delimited or fixed width; the text files that come from Salt Lake County are “Delimited”, so hit Next. Now mark the box that says “First Row Contains Field Names”. A pop-up message might appear saying the first row has names that can’t be used; just hit OK, and the wizard will assign the correct names. Hit Next again and then Finish. The data has now been added to the SLC_Raw_Data table. Double click on the table to see all of the data currently in it. You will notice some blank fields; this is because of the formatting of the text files, but that’s fine, as it will be cleaned up later depending on the end goal of the data.
Figure 18 - Imported SLC Raw Data
II. IMPORTING DATA FROM NCDC PRECIPITATION GAUGES
Importing NCDC data into the database is slightly different from importing Salt Lake County data because of the different formats; Access has a hard time recognizing the data types and breaks in the NCDC data. The text files that come from NCDC rain gauges have column names of STATION, STATION_NAME, ELEVATION, LATITUDE, LONGITUDE, DATE, HPCP, Measurement Flag, and Quality Flag. Open the text file and delete the row containing the column names. Save this file and remember its location.
Figure 19 - Raw NCDC Data File
Figure 20 - NCDC Precipitation Data with Column Names Deleted
Next, open the rainfall database. The third tab from the right in the ribbon is called “External Data”. Click “External Data”, then hit “Text File” in the Import and Link category. A “Get External Data – Text File” box opens. Browse for the text file you wish to upload. In the next window, choose fixed width and click Next. The wizard will try to guess where you want field breaks; double click lines to remove them and single click to create a field break, so that you end up with 9 columns, then hit Next.
Figure 21 - Importing NCDC Text File
Starting from left to right, name the columns Station, Station_Name, Elevation, Latitude, Longitude, OldDate, Time, Hpcp, and Flags. Accordingly, be sure the data types are text, text, long integer, double, double, long integer, Date/Time, long integer, and text. Then hit “Next”, “Next” again, and “Finish” (or change the name first). The table should appear on the left; double click it to view. One difference between importing NCDC and Salt Lake County data is that Salt Lake County data is imported into an already existing table, whereas for NCDC data we create a new table. This is because Access does not do a very good job of recognizing the data types and formats in the NCDC text files.
Once the table is imported into the database, we must perform a couple of queries to transform the date column into a usable format and to convert the Hpcp column into inches, as it currently reads in hundredths of inches. We must also append this table to the NCDC_Raw_DataT table. Right click on the query “AddDataFromNew_NCDC_TableToNCDC_Raw_DataT” and click “Design View”. In the table row, change the name of the table (in this example “SLC_IAP_1948_2011”) to the name of the table you imported; be sure to do this in all fields that have a table specified. After the query is run, a message appears indicating how many rows are about to be added. If this looks correct, hit “OK”.
Figure 22 - Adding New NCDC Table to NCDC_Raw_DataT
The data in this table has now been added to the table “NCDC_Raw_DataT”. Check that the data has been added properly; if it has, you can delete the old table, as “NCDC_Raw_DataT” will contain all of the data plus the additional computations.
Figure 23 - Imported NCDC_Raw_DataT
III. CREATE A FILE FROM SLC_Raw_Data OR NCDC_Raw_DataT FOR USE IN SWMM
First, be sure there is no data in the “Export_To_SWMMT” table. If there is data, delete it with the “Delete_Export_To_SWMMQ” query. Next, double click on the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” or “AddDataFromNCDC_Raw_DataTtoExport_To_SWMMT” query. It will ask if you want to run the query; click “Yes”. Next, enter the gauge # you wish to export to SWMM (additional gauges can be added later), then “OK”.
Figure 24 - Creating SWMM Data File
Next, enter the start date in the form mm/dd/yyyy and click “OK”, then enter the end date and click “OK”. You are then asked to
confirm that you want to append the row(s); click “Yes”. Double click on the “Export_To_SWMMT” table to see your data. If additional data
(more gauges or a different time period) is wanted, you can run the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” query
again and it will add the additional data to the “Export_To_SWMMT” table. To export the data to SWMM, click on the “External
Data” tab in the ribbon, then click on “Excel” or “Text File” depending on your preference (the Excel file has fewer steps). If Excel
is chosen, just specify the name and click “OK”. Open the document in Excel and save it as a .csv file.
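The end product of either export path is a comma-separated file with field names on the first row. A short sketch of that file layout in Python (the column names and rows here are invented for illustration; SWMM simply needs the comma-separated values):

```python
import csv
import io

# Rows as they might sit in Export_To_SWMMT (illustrative values only)
rows = [
    ("10", "03/12/2007 14:30", 0.01),
    ("10", "03/12/2007 14:45", 0.02),
]

buf = io.StringIO()
writer = csv.writer(buf)
# Field names on the first row, matching the "Include Field Names" option
writer.writerow(["Gauge", "DateTime", "Rainfall_in"])
writer.writerows(rows)
print(buf.getvalue())
```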
Figure 25 - Creating SWMM Data File using Microsoft Excel
If you choose to export in text format, choose a file name and click “OK”. Next, be sure “Delimited” is selected and click “Next”. Be
sure “Comma” is selected, check “Include Field Names on First Row”, then click “Next” and then “Finish”. You now have a
comma-separated text file usable in SWMM.
Figure 26 - Creating SWMM Data File with Text
IV.
CREATE A FILE FROM SLC_Raw_Data OR NCDC_Raw_DataT SUITABLE FOR
HYDROSERVER LITE
Follow the same steps as in Section III except run the
“AddDataFromSLC_Raw_DataToExport_To_Hydroserver lite” or “AddDataFromNCDC_Raw_DataTtoExport_To_Hydroserver
lite” query instead of the “AddDataFromSLC_Raw_DataToExport_To_SWMMT” query. All other steps are the same except that,
once the Excel file is created, you open it and select the entire “LocalDateTime” column. Right click and select “Format Cells”.
Under “Custom”, type in “yyyy-mm-dd hh:mm:ss” as the type. Save as a .csv file.
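The Excel custom format above corresponds to a standard timestamp pattern. A one-line Python equivalent, useful if you prefer to reformat the column in a script rather than in Excel (the helper name is ours, not part of the database):

```python
from datetime import datetime

def hydroserver_datetime(dt: datetime) -> str:
    """Format a timestamp the way HydroServer Lite expects,
    matching the Excel custom format 'yyyy-mm-dd hh:mm:ss'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

print(hydroserver_datetime(datetime(2007, 3, 12, 14, 30)))  # 2007-03-12 14:30:00
```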
V.
SEE DAILY RAINFALL VALUES
Right click on the query “DailyRainfallValueSLC_Raw_Data_Crosstab” for Salt Lake County data or
“DailyRainfallValueNCDC_Raw_Data_Crosstab” for NCDC data and click “Design View”. Change the Criteria under
Gauge # and Date to include the gauges and date ranges you are seeking, then click “Run”. The result shows the daily
rainfall for the specified gauges and dates; it can be sorted any way you like and exported as well.
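Each cell of the crosstab is the total rainfall recorded by one gauge on one day, summed over the sub-daily records. A minimal sketch of that aggregation (the records here are invented for illustration; the crosstab query does this in Access):

```python
from collections import defaultdict

# Sub-daily records as (gauge, day, inches) - illustrative values
records = [
    ("10", "2007-03-12", 0.01),
    ("10", "2007-03-12", 0.02),
    ("10", "2007-03-13", 0.05),
]

# Each crosstab cell is the daily total for one gauge
daily = defaultdict(float)
for gauge, day, inches in records:
    daily[(gauge, day)] += inches

print(round(daily[("10", "2007-03-12")], 2))  # 0.03
```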
Figure 27 - Daily Precipitation Values
VI.
FIND DUPLICATES
Double click on “Find duplicates for SLC_Raw_Data” or “Find duplicates for NCDC_Raw_Data” to see records that have matching
gauge #, date, and time. It is important to note that although duplicates may be shown by this query, they might not be duplicate data.
For example, gauge #10 on 3/12/2007 at 14:30 might have recorded .01 inches of rainfall twice. This does not necessarily
mean it is the same actual recording: the gauge may have recorded rainfall at 14:29:31 and 14:30:29, and both
would have been rounded to 14:30. However, this query is useful for spotting trends of possible duplicate data.
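The rounding collision described above can be reproduced in a few lines. This is an illustrative Python sketch (the `round_to_minute` helper and sample readings are ours), showing why two genuinely distinct readings can surface as a "duplicate":

```python
from collections import Counter
from datetime import datetime, timedelta

def round_to_minute(t: datetime) -> datetime:
    """Round (not truncate) a timestamp to the nearest minute,
    mimicking the gauge logger's recorded time."""
    return (t + timedelta(seconds=30)).replace(second=0, microsecond=0)

# Two distinct readings that collide once rounded to the minute
readings = [
    ("10", datetime(2007, 3, 12, 14, 29, 31), 0.01),
    ("10", datetime(2007, 3, 12, 14, 30, 29), 0.01),
]

counts = Counter((g, round_to_minute(t)) for g, t, _ in readings)
dupes = [key for key, n in counts.items() if n > 1]
print(dupes)  # both readings map to gauge 10 at 14:30
```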
VII.
DELETE SWMM OR HYDROSERVER LITE EXPORTABLE DATA
Run the “Delete_Export_To_Hydroserver lite” or “Delete_Export_To_SWMM” query to delete the data in these tables. Enter
“1” when prompted (added as a safety measure). Note that this does not delete the Raw_Data tables.
VIII.
FIND DATA BASED ON GAUGE #, DATE RANGE, AND PRECIPITATION VALUES
Run the “EnterDateRangeAndGaugeIDNCDC_Raw_Data” or “EnterDateRangeAndGaugeIDSLC_Raw_Data” queries and follow
the prompts. This will produce results that can be sorted or exported.
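The parameterized query amounts to a three-way filter. A sketch in Python (the records, the `find` helper, and its parameters are invented for illustration; the Access query collects the same criteria via prompts):

```python
from datetime import date

# (gauge, date, inches) - illustrative records
records = [
    ("10", date(2007, 3, 12), 0.01),
    ("10", date(2008, 6, 1), 0.75),
    ("12", date(2008, 6, 1), 0.40),
]

def find(records, gauge, start, end, min_inches=0.0):
    """Filter by gauge #, inclusive date range, and a minimum
    precipitation value, like the parameterized Access query."""
    return [r for r in records
            if r[0] == gauge and start <= r[1] <= end and r[2] >= min_inches]

hits = find(records, "10", date(2008, 1, 1), date(2008, 12, 31), 0.5)
print(hits)
```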
IX.
EXAMPLE R QUERY SCRIPT
# Connect to the Access database via ODBC
library(RODBC)
testdb <- file.path("C:\\Users\\stevo\\Dropbox\\HydroinformaticsGroup\\Salt Lake Rainfall Database.accdb")
channel1 <- odbcConnectAccess2007(testdb)
# Fetch the daily crosstab query results for the SLCo data
Prec1 <- sqlFetch(channel1, "DailyRainfallValueSLC_Raw_Data_Crosstab")
dim(Prec1)
library(Hmisc)
describe(Prec1)
newdata1 <- Prec1[1:600, ]
newdata1
# Fetch the daily crosstab query results for the NCDC data
Prec2 <- sqlFetch(channel1, "DailyRainfallValueNCDC_Raw_Data_Crosstab")
dim(Prec2)
describe(Prec2)
newdata2 <- Prec2[1:300, ]
newdata2
summary(newdata1)
summary(newdata2)
# Plot daily precipitation: NCDC gauge 425892 and SLCo gauge 10
x2 <- newdata2$NewDate
x1 <- newdata1$Date
y2 <- newdata2$"425892"
y1 <- newdata1$"10"
xlim <- range(c(x1, x2))
ylim <- range(c(y1, y2))
plot(x2, y2, col = "red", xlim = xlim, ylim = ylim,
     main = "Daily Precipitation", xlab = "Date",
     ylab = "Precipitation Total (in.)")
points(x1, y1, col = "black")  # overlay the second series on the same axes
ticks <- c(2008, 2009, 2010)
axis(side = 1, at = ticks)
minor.tick(nx = 2, ny = 4)
legend("topleft", inset = .05, c("NCDC", "SLCo"),
       text.col = c("red", "black"), pch = c(1, 1), col = c("red", "black"))
summary(Prec1)
summary(Prec2)
# Percent of days on which rainfall at gauge 10 exceeds 0.5 in.
totalRows1 <- nrow(Prec1)
exceedvalue <- Prec1[which(Prec1$"10" > .5), ]
exceedRows1 <- nrow(exceedvalue)
exceedencePercent1 <- 100 * exceedRows1 / totalRows1
exceedencePercent1
# Same exceedance percentage for NCDC gauge 425892
totalRows2 <- nrow(Prec2)
exceedvalue <- Prec2[which(Prec2$"425892" > .5), ]
exceedRows2 <- nrow(exceedvalue)
exceedencePercent2 <- 100 * exceedRows2 / totalRows2
exceedencePercent2
Prec1
Prec2