Statistics in WR: Lecture 1
• Key Themes
  – Knowledge discovery in hydrology
  – Introduction to probability and statistics
  – Definition of random variables
• Reading: Helsel and Hirsch, Chapter 1

How is new knowledge discovered?
After completing the Handbook of Hydrology in 1993, I asked myself the question: how is new knowledge discovered in hydrology? I concluded:
• By deduction from existing knowledge
• By experiment in a laboratory
• By observation of the natural environment

Deduction – Isaac Newton
• Deduction is the classical path of mathematical physics
  – Given a set of axioms
  – Then by a logical process
  – Derive a new principle or equation
• In hydrology, the St Venant equations for open channel flow and Richards' equation for unsaturated flow in soils were derived in this way.
Three laws of motion and law of gravitation (1687)
http://en.wikipedia.org/wiki/Isaac_Newton

Experiment – Louis Pasteur
• Experiment is the classical path of laboratory science – a simplified view of the natural world is replicated under controlled conditions
• In hydrology, Darcy's law for flow in a porous medium was found this way.
Pasteur showed that microorganisms cause disease and developed vaccines
Foundations of scientific medicine
http://en.wikipedia.org/wiki/Louis_Pasteur

Observation – Charles Darwin
• Observation – direct viewing and characterization of patterns and phenomena in the natural environment
• In hydrology, Horton discovered stream scaling laws by interpretation of stream maps
On the Origin of Species, published Nov 24, 1859
Most accessible book of great scientific imagination ever written

Conclusion for Hydrology
• Deduction and experiment are important, but hydrology is primarily an observational science
• Discharge, water quality, and groundwater measurement data are collected to support this.

Great Eras of Synthesis
• Scientific progress occurs continuously, but there are great eras of synthesis – many developments happening at once that fuse into knowledge and fundamentally change the science
• Timeline shown on the slide (1900 to 2020):
  – 1900–1940: Physics (relativity, structure of the atom, quantum mechanics)
  – 1960–1980: Geology (observations of seafloor magnetism lead to plate tectonics)
  – 2000–2020: Hydrology (synthesis of water observations leads to knowledge synthesis)

Hydrologic Science
It is as important to represent hydrologic environments precisely with data as it is to represent hydrologic processes with equations
• Physical laws and principles (Mass, momentum, energy, chemistry)
• Hydrologic Process Science (Equations, simulation models, prediction)
• Hydrologic conditions (Fluxes, flows, concentrations)
• Hydrologic Information Science (Observations, data models, visualization)
• Hydrologic environment (Physical earth)

A sea change in computing
• Massive Data Sets
• Federation, Integration, Collaboration
• Evolution of Many-core and Multicore
• Parallelism everywhere
• The power of the Client + Cloud
• Access Anywhere, Any Time
• There will be more scientific data generated in the next five years than in the history of humankind
• What will you do with 100 times more computing power?
• Distributed, loosely-coupled, applications at scale across all devices will be the norm
Slide from Jeff Dozier, UCSB

Emergence of a fourth research paradigm
1. Thousand years ago – Experimental Science
  – Description of natural phenomena
2. Last few hundred years – Theoretical Science
  – Newton's Laws, Maxwell's Equations…
3. Last few decades – Computational Science
  – Simulation of complex phenomena
4. Today – Data-Intensive Science
  – Scientists overwhelmed with data sets from many different sources
    • Data captured by instruments
    • Data generated by simulations
    • Data generated by sensor networks
  – eScience is the set of tools and technologies to support data federation and collaboration
    • For analysis and data mining
    • For data visualization and exploration
    • For scholarly communication and dissemination
(With thanks to Jim Gray)
Slide from Jeff Dozier, UCSB
Equation shown on the slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$

Data Cube – What, Where, When
• Time, T: "When"
• Space, L: "Where"
• Variable, V: "What"
• D: a data value

Continuous Space-Time Data Model – NetCDF
• Coordinate dimensions {X}: Time, T and Space, L
• Variable dimensions {Y}: Variables, V
• D: a data value

Discrete Space-Time Data Model
• Time: TSDateTime
• Space: FeatureID
• Variables: TSTypeID
• Data value: TSValue

Hydrologic Statistics
• Time Series Analysis
• Geostatistics
• Multivariate analysis
How do we understand space-time correlation fields of many variables?

288 USGS sites with flow and Nitrogen data
These are sites that were used for the SPARROW model and that continued to operate through 2008
http://water.usgs.gov/nawqa/sparrow/

Colorado River at Austin, Tx (08158000)
[Figure: Mean Annual Flow, Colorado River at Austin (1929-2008). Discharge (cfs) plotted against year.]

Is there a relation between flow and water quality?
[Figure: two panels for the Colorado River at Austin. Mean Annual Flow (1929-2008): Discharge (cfs) against year. Total Nitrogen in water: Total Nitrogen (mg/l) against date, Jun-68 to Oct-95.]

Are Annual Flows Correlated?
[Figure: Correlation of Annual Flows (Colorado River at Austin). Last Year's Discharge (cfs) plotted against This Year's Discharge (cfs).]
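
The question "Are Annual Flows Correlated?" can be explored numerically by pairing each year's flow with the previous year's flow, as in the scatter plot, and computing a correlation coefficient. The following is a minimal sketch, not the method used in the lecture. It stores values as records of the discrete space-time data model (FeatureID, TSTypeID, TSDateTime, TSValue) and uses the Pearson coefficient as one possible measure. The class and function names, the `flow_cfs` type label, and all flow values are hypothetical and invented for illustration; they are not USGS data.

```python
import math
from typing import NamedTuple


class TSRecord(NamedTuple):
    """One value in the discrete space-time data model."""
    feature_id: str    # Space, "where" (FeatureID)
    ts_type_id: str    # Variable, "what" (TSTypeID)
    ts_datetime: int   # Time, "when" (TSDateTime); a year, for simplicity
    ts_value: float    # The data value D (TSValue)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lag1_correlation(values):
    """Correlate each year's value with the previous year's value."""
    this_year = values[1:]
    last_year = values[:-1]
    return pearson_r(this_year, last_year)


def annual_series(records, feature_id, ts_type_id):
    """Select one site and one variable; return values ordered by time."""
    selected = [r for r in records
                if r.feature_id == feature_id and r.ts_type_id == ts_type_id]
    return [r.ts_value for r in sorted(selected, key=lambda r: r.ts_datetime)]


# Hypothetical annual mean flows in cfs, invented for illustration only.
# The site number is the one on the slide; the values are NOT real data.
records = [
    TSRecord("08158000", "flow_cfs", 2003, 1500.0),
    TSRecord("08158000", "flow_cfs", 2001, 2100.0),
    TSRecord("08158000", "flow_cfs", 2002, 3400.0),
    TSRecord("08158000", "flow_cfs", 2004, 2800.0),
    TSRecord("08158000", "flow_cfs", 2005, 1900.0),
    TSRecord("08158000", "flow_cfs", 2006, 1200.0),
    TSRecord("08158000", "flow_cfs", 2007, 4600.0),
    TSRecord("08158000", "flow_cfs", 2008, 1700.0),
]

flows = annual_series(records, "08158000", "flow_cfs")
r1 = lag1_correlation(flows)
print(f"lag-1 correlation of annual flows: {r1:.2f}")
```

A coefficient near zero would suggest that one year's flow says little about the next, while a value near +1 or -1 would indicate strong year-to-year dependence. With a real record, the same functions could be applied to the annual flows read from the gauge data.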