14 January 2009
2009 AMS Artificial Intelligence Conference
A Data Mining Approach to Soil Temperature and Moisture Prediction
Bill Myers, Seth Linden, Gerry Wiener

Project Overview and Goals
• Improve soil temperature and moisture prediction
• Integrate and evaluate NASA-MODIS data sets
  – Leaf Area Index (LAI)
  – Green Vegetation Fraction (i.e. FPAR)
  – Albedo
• Deliver tailored products to end users
  – Soil forecasts will drive agriculture-specific models (e.g. pest models)
  – RAL partnered with DTN/Meteorlogix
  – DTN DSS delivers Ag-specific forecasts to 80,000 users

Soil State Prediction
• Current soil state modified by atmospheric forcing conditions
• Heat and moisture are transferred between adjacent nodes
• Typically done with a physical model, called a Land-Surface Model (LSM)
[Diagram labels: Solar Energy, Weather, Subsurface Nodes, Fixed Node]

Physical Model
• This project uses the High Resolution Land Data Assimilation System (HRLDAS) and the Noah LSM
  – Used by NCEP as part of the NAM (WRF model)
• Many parameters are necessary to model soil type and land surface characteristics
  – Affect incident solar energy, heat transfer, etc.
  – Parameters must be generalized
    • “Sandy loam” will have same parameterization at all sites
    • Chemical compositions of “sandy loam” differ between sites
  – Heat and moisture transfer will not be exact at ANY site
• Goal of this study: Determine if a data mining approach can produce results comparable to those of the physical model

Data Mining System
• Regression Tree (Cubist)
  – Available from www.rulequest.com
  – Looks for patterns in data
  – Builds rule-based numerical models
  – Rules are developed based on training data
  – At each leaf node, a regression equation is developed that best fits that subset of the training data
  – Effectively, linear approximations are being made when certain conditions are met
  – Soil state forecasts are generated by applying rule set to forecast data
• Training Data
  – 29 Soil Climate Analysis Network (SCAN) sites
  – Two years of observational history at each site used to develop rules
  – NCAR scientists were consulted to determine the most important inputs to soil state evolution
  – These were extracted or derived from observed variable set

Regression Tree Model Generation
• 10 regression trees were developed for each site
  – One regression tree each for soil temperature and soil moisture at each depth (5, 10, 20, 50, 100 cm)
• Input variables:
  – Julian day
  – Air temperature
  – Delta air temperature (in current hr)
  – Downward shortwave radiation
  – Wind speed
  – Dew point temperature
  – Precip amt
  – Previous soil state:
    • Previous hour’s soil temperature and moisture at adjacent depths
• A target variable (e.g. current soil temp at 5 cm) was provided with each hour’s data

Example training data
    | Names file for 5cm temperature prediction
    ST5_curr                | Predictand in list of variables below

    siteID:     ignore      | SCAN site ID
    date:       ignore      | YYYYMMDDHH
    mon:        continuous  | fraction of Julian year
    AirT:       continuous  | 2m air temp (avg over last hr)
    deltaT:     continuous  | air temp change over last hour
    dsw:        continuous  | avg downward shortwave radiation over last hr
    wspd:       continuous  | avg wind speed over last hour
    TD:         continuous  | avg dew point temp over last hour
    qpf:        continuous  | precip amt over last hour
    ST5_prev:   continuous  | 5 cm soil temp at previous hour
    ST10_prev:  continuous  | 10 cm soil temp at previous hour
    SM5_prev:   continuous  | 5 cm soil moisture at previous hour
    SM10_prev:  continuous  | 10 cm soil moisture at previous hour
    ST5_curr:   continuous  | 5 cm soil temp at current hour (predictand)

Sample line of training data
    2001, 2007110211, 0.9167, 4.53, -0.89, 0.00, 2.81, -3.28, 0.00, 8.158, 9.847, 33.858, 39.616, 8.32

    Field annotations, in column order after the site ID and date:
    0.9167          Time of year
    4.53            Air temp
    -0.89           Air temp falling in this hour
    0.00            No downward radiation (night)
    2.81            Wind speed
    -3.28           Dewpoint temp
    0.00            No precip
    8.158, 9.847    Previous hour’s soil temperature at 5 cm and 10 cm
    33.858, 39.616  Previous hour’s soil moisture at 5 cm and 10 cm
    8.32            Current hour’s 5 cm soil T (predictand)

Rules Development and Application
• Regression trees generated for each predictand at each site
  – Separate tree for soil temperature and moisture at each depth
  – Two years of training data for most sites
  – Example rule and associated regression:
        if dsw <= 0.09 and ST5_prev > 12.05
        ST5_curr = -0.211 + 0.3165 dsw + 0.83 ST5_prev + 0.13 ST10_prev + 0.02 AirT + 0.02 TD
• 48 hour forecasts were generated iteratively
  – Starting with observed soil state and first hour’s weather predictions
  – Regression trees were applied for each predictand to generate forecast state at hour 1
  – Using the forecast soil state and weather predictions, each subsequent hour’s forecast was generated in turn
• Soil forecasts generated for 2007 growing season (April-June)
  – Data mining and HRLDAS forecasts were compared to observations

Results
• Statistically, data mining performed better than HRLDAS at nearly all of the 29 SCAN sites
• Median (and quartile) MAEs (mean absolute errors) significantly lower for data mining
• Data mining errors generally 30%+ lower than HRLDAS errors
[Figure: Soil Temperature Errors (degC) by quartile at 5, 10, 20 and 50 cm depths; data mining solid lines, HRLDAS dashed]

Summary
• Data mining with Cubist Regression Trees
  – Reduces soil temperature and moisture errors
  – Simple to develop rules
  – Rules/regressions can be displayed easily
  – Regression tree forecasts tuned to the site
  – HRLDAS forecast parameters are more generic
• Applicability to non-observing sites
  – Rules, as developed, are site specific
  – Not valid away from that location
  – HRLDAS can generate forecasts at any location
  – Observing sites do not begin to cover all land use and soil type combinations

Future Directions
• Add vegetation state (from NASA MODIS data) to data mining training sets to determine whether these results can be improved upon
• Train Cubist with all obs sites lumped together but include land use and soil type as input variables
• Investigate combining data mining approach and LSM to get best of both

Acknowledgements
This research effort has been supported by a NASA ROSES grant. We appreciate the help provided by personnel at the USDA Natural Resources Conservation Service, and various NASA labs.
Soil forecast web site: www.rap.ucar.edu/projects/nasa-ag/hrldas/display_hrldas_animation.html
Cubist is available at www.rulequest.com
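
Illustrative sketch
The example rule and the iterative 48 hour procedure from the Rules Development and Application slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code. Only the single rule quoted on the slide is used. The function names, the persistence fallback when the rule's conditions are not met, and holding the 10 cm temperature fixed are all assumptions made to keep the sketch self-contained; in the system described, every predictand would have its own Cubist rule set.

```python
from typing import List, Optional, Tuple

# One hour of weather forcing: (AirT, dsw, TD), named as in the names file.
Weather = Tuple[float, float, float]


def st5_rule(dsw: float, st5_prev: float, st10_prev: float,
             air_t: float, td: float) -> Optional[float]:
    """Apply the single example rule quoted on the slide.

    Returns the regression value when the rule's conditions hold,
    otherwise None (other rules in the tree would cover those cases).
    """
    if dsw <= 0.09 and st5_prev > 12.05:
        return (-0.211 + 0.3165 * dsw + 0.83 * st5_prev
                + 0.13 * st10_prev + 0.02 * air_t + 0.02 * td)
    return None


def forecast_st5(st5_obs: float, st10_obs: float,
                 weather: List[Weather]) -> List[float]:
    """Iterate hour by hour, feeding each forecast back in as the
    'previous hour' soil state for the next step.

    Assumptions for this sketch only:
      - ST10 is held at its observed value (it would normally come
        from its own regression tree);
      - if the example rule does not apply, the previous value is
        carried forward (persistence).
    """
    st5 = st5_obs
    out: List[float] = []
    for air_t, dsw, td in weather:
        pred = st5_rule(dsw, st5, st10_obs, air_t, td)
        st5 = pred if pred is not None else st5
        out.append(st5)
    return out


if __name__ == "__main__":
    # Hypothetical night-time forcing (no downward radiation) for 48 hours.
    hours = [(10.0, 0.0, 5.0)] * 48
    print(forecast_st5(13.0, 14.0, hours)[:3])
```

Because each hour's output becomes the next hour's input, any error in the rule set or in the weather predictions can accumulate over the 48 hours, which is why the forecasts were compared with observations rather than judged on single-step fit alone.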