Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Theoretical computer science wikipedia , lookup
Geographic information system wikipedia , lookup
Predictive analytics wikipedia , lookup
Neuroinformatics wikipedia , lookup
Pattern recognition wikipedia , lookup
Regression analysis wikipedia , lookup
Corecursion wikipedia , lookup
Steps 3 & 4: Analysis & interpretation of evidence Hypothesis testing: a simple scenario SCENARIO 1 • DO measured upstream & downstream over 9 months SCENARIO 2 • Upstream = 9.3 mg/L — Downstream = 8.4 mg/L Upstream = 7.9 mg/L — Downstream = 4.2 mg/L — • Difference significant at P<0.05 DO measured upstream & downstream over 3 months — • Difference not significant at P<0.05 Which scenario presents a stronger case for DO causing impairment? 2 Using statistics responsibly Use caution in interpreting differences • Look at magnitude & consistency of differences, rather than statistical significance • Statistical significance detects differences exceeding natural variance — Does not detect stressor effects — Does not equal biological significance • Can use statistics, but also use your head — Consider relationship between minimum detectable difference (power) & biologically relevant difference 3 That said, graphs & statistics can help a lot… STEP 2 STEP 3 STEP 4 List candidate causes Evaluate data from case Evaluate data from elsewhere Regression analysis Predicting environmental condition from biological observations (PECBO) Scatterplots Boxplots Correlation analysis Conditional probability analysis Classification & regression tree Species sensitivity distributions (SSDs) Quantile regression 4 5 6 7 8 Sampling/data collection issues • Want to collect candidate cause & biological response variables at same sites, at same times • Want to time sampling to accurately assess exposures • — For continuous exposures, try to catch sensitive life stages & conditions that enhance exposure or effects — For episodic exposures, try to characterize episodes through continuous, event, and post-event sampling Want to collect good reference comparisons, in space & time — At unimpaired sites, at same times you sample impaired sites 9 10 Data analysis tools • Species sensitivity distribution (SSD) generator — Tool that calculates & plots proportion of species affected at different levels of exposure in laboratory toxicity tests 11 Data analysis tools • Species sensitivity distribution (SSD) generator — • Command-line R scripts — — • Tool that calculates & plots proportion of species affected at different levels of exposure in laboratory toxicity tests Powerful and free statistical package [http://www.r-project.org/] R scripts provided for PECBO (predicting environmental conditions from biological observations) CADStat — Menu-driven package of data visualization & statistical techniques, based on JGR (Java GUI for R) 12 R R is a powerful and free statistical package… [http://www.r-project.org/] …but it’s run at the command line, so there’s a steep learning curve 13 JGR/CADStat: a gentler introduction to R • • JGR provides point-and-click tools for basic R functions — Loading data – Scatterplots — Setting up a workspace – Boxplots — Managing data in your workspace Correlation analysis — Installing packages – – Linear regression – Quantile regression CADStat adds several tools to JGR — Point-and-click interfaces for tools useful in causal analysis – Conditional probability analysis — Rendering of analysis results in browser window – PECBO — Additional help files 14 Load your data into CADStat, or use example data… 15 16 Where are we? • Step 1 — Biological effects of interest defined — Impaired & reference sites identified • Step 2 — Map completed — Conceptual model developed — Candidate causes identified — Data identified & collected Now we’re ready to analyze available data & use results as evidence in Steps 3 & 4 17 Begin with data exploration • Want to examine relationships between different variables — Linear & more complex — Expected & unexpected • 1st step is visualizing data — Scatterplots & boxplots — No statistics, hypotheses, or p-values 18 Looking at your data… examples from exercise PC1 – reference PC2 – impaired Dissolved oxygen (mg/L) 7.3 7.9 Canopy cover moderate low EPT taxa richness over time 2004 2005 2006 2007 PC1 18 16 15 16 PC2 8 9 13 15 In Fall 2004, an unpermitted industrial discharge with high metal concentrations was discovered and removed, just upstream of PC2. • Which candidate cause do the data relate to? • Do the data support or weaken the case for the candidate cause? • What type of evidence do the data represent? 19 Boxplots • Depict distribution of observations – – Center box indicates spread of 50% of observations Whiskers indicate either max & min values, or 1.5X the interquartile range • Good data exploration & visualization tool • Can be used to evaluate data from case & data from elsewhere – – – Spatial/temporal co-occurrence Stressor-response relationships from the field or from other field studies Causal pathway 20 Scatterplots – • % Non-insects 12 H G D E C B F 200 400 600 800 A 1000 Conductivity (uS/cm) Good data exploration & visualization tool – • Dependent variable (y-axis) vs. independent variable (x-axis) Often dependent variable is biological response, independent variable is measure of candidate cause 10 – 8 Graphical displays of matched data 6 • EPT Richness 14 I Can view several at once in scatterplot matrix Can be used to evaluate data from case & from elsewhere – – Stressor-response relationships from the field, from other field studies, & from laboratory studies Causal pathway 21 Examples from exercise: scatterplots EPT taxa richness 20 15 PC1 10 5 PC2 0 15 20 25 30 Maximum summer water temperature (˚C) • Do the data support or weaken the case for temperature as a candidate cause? • What type(s) of evidence do these data provide? 22 EPT taxa richness Examples from exercise: scatterplots 20 20 15 15 10 10 5 5 0 0 0 1 2 Log [Zn], ug/L 3 0 1 2 Log [Cu], ug/L 3 • Do the data support or weaken the cases for copper & zinc as candidate causes? • What type(s) of evidence do these data provide? 23 Correlation analysis • Method for measuring degree of association between 2 variables in a matched data set – Correlation coefficient provides quantitative measure of degree to which 2 variables co-vary Elevation Water Temp Air Temp Log Conductivity 1 -0.75 -0.25 -0.02 Water Temp -0.75 1 0.64 0.07 Air Temp -0.25 0.64 1 0.39 Log Conductivity -0.02 0.07 0.39 1 Elevation • Good data exploration tool • Can be used to evaluate data from case – – Stressor-response relationships from the field Causal pathway 24 Regression analysis • Method for quantifying relationship between dependent & independent variable – • Can have ≥ 1 independent variables Models can be used to: – Predict value of response variable for new values of explanatory variable – Estimate value of explanatory variable needed to account for change in response variable – Model natural variation in stressors, to define reference expectations 90th percentile prediction limits 25 Quantile regression • Models relationship between specified quantile of response variable and explanatory variable (stressor) – – 50th quantile gives median line, under which 50% of observed responses occur 90th quantile gives line under which 90% of observed responses occur • Provides means of estimating location of upper boundary on scatterplot • Used to evaluate stressor-response relationships from other field studies (Step 4: Evaluating data from elsewhere) 26 Applying quantile regression SUPPORTS WEAKENS 27 Predicting environmental conditions from biological observations (PECBO) • • Uses taxa list & niche characteristics to predict environmental conditions at site Maximum likelihood inference based on taxonenvironment relationships Predicted temperature if both Ameletus & Diphetor are present at site Ameletus Diphetor • Useful when environmental data not available 28 Using PECBO to verify predictions • Biologically-based predictions can provide valuable supporting evidence, under “Verified predictions” — • If stressor elevated at site, can test hypothesis that biologically-based prediction should also be elevated Tools for calculating taxonenvironmental relationships & PECBO available on CADDIS R2 = 0.67 29 Species sensitivity distributions (SSDs) An SSD is a statistical distribution describing the variation of responses of a set of species to a stressor; it can be used to estimate the proportion of species adversely affected at a given stressor intensity. SSD plot showing LC50s for arthropods exposed to dissolved cadmium 30 Now where are we? • • Step 1 — Biological effects of interest defined — Impaired & reference sites identified Step 2 — Map completed — Conceptual model developed — Candidate causes identified — Data identified & collected • Steps 3 & 4 — Available data examined & analyzed — Available data organized into types of evidence Now we’re ready to evaluate & score the evidence across candidate causes 31