Using Bayesian Networks to Predict Water Quality in Sydney Harbour
Final Presentation
Name: Shannon Watson
Supervisors: Ann Nicholson & Charles Twardy

Introduction
* The application domain
* Bayesian Networks
* Knowledge Engineering
* Weka
* Results
* Conclusion and Future Work

The Domain - Sydney Harbour
* Water quality for recreational use
* Beachwatch / Harbourwatch programs
* Bacteria samples used as pollution indicators
* Many variables influence bacterial levels: rainfall, tide, wind, sunlight, temperature, pH, etc.

Past Studies
* Hose et al. used a multidimensional scaling model of Sydney Harbour: low predictive accuracy, unable to handle the noisy bacteria samples
* Other models developed by the USEPA to model estuaries:
  * QUAL2E: steady-state receiving water model
  * WASP: time-varying dispersion model
  * EFDC: 3D hydrodynamic model
* The EPA in Sydney wants a model that applies the causal knowledge of the domain

Bayesian Networks
* Directed acyclic graphs
* Nodes = random variables / uncertain quantities
* Links/arcs = causal relationships between variables
* Strengths of the links are quantified by conditional probabilities
* Uses Bayes' theorem to resolve uncertainties
* Graphical structure allows non-technical people to understand a technical model

Bayesian Networks - An Example

Knowledge Engineering
* Used in the software engineering discipline to keep large projects structured
* Two major process models: the Spiral Model and the Waterfall Model
* The Spiral Model can be applied to Bayesian networks to keep the network under control as it grows from a small to a large network

Problems Encountered
* Getting the data into a usable format: solved by learning to use tools such as sed, awk and Perl
* Understanding the domain: what amount of rainfall would cause high levels of bacteria?
* Using Netica, was unable to determine any clear breaks in the rainfall data

Weka
* The Weka package is a Java implementation of a number of machine learners
* Implements C4.5, the industry standard
* It was hoped that running the machine learners would help in understanding the data
* Also ran other machine learners:
  * Naïve Bayes
  * AODE
  * KeoghTan

Results - Entc.

EPA's Current Model vs. One of My Models

Site        Naïve Bayes   C4.5      EPA Current Model   My Model
Davidson    76(2.5)       75(3.5)   73(3.4)             73(4.0)
Parsley     91(0.6)       93(0.7)   91(-0.2)            91(-0.2)
Woolwich    74(4)         73(5.1)   70(5.0)             70(4.9)
Woodford    72(2.2)       69(3.4)   69(3.4)             69(4)

Predicting Water Quality
* Knowledge of the domain has been greatly advanced through the use of Netica to visualise models
* Machine learners quickly confirm the experts' own view that the data is noisy
* The domain can't easily be modelled
* No easy way to state what levels of rainfall determine when there will be high bacterial readings

Conclusion
* Use Bayesian networks
* Create a model of Sydney Harbour
* Aim to predict the water quality for the EPA
* Use the Spiral Model
* Machine learners

Further Work
* Evaluate the models
* Add more variables to the models and re-evaluate
* Recommend models to the Sydney EPA

Questions
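
Supplementary sketch: the Bayesian Networks slide says that link strengths are conditional probabilities and that Bayes' theorem resolves the uncertainty. The Python below is a minimal illustration of that idea for a hypothetical two-node network, Rainfall -> Bacteria. The node states, the function name `posterior_rain` and every probability are made-up assumptions for illustration only. They are not Harbourwatch data, and this is not the model built in the project.

```python
# Hypothetical two-node network: Rainfall -> Bacteria.
# All probabilities are illustrative assumptions, not measured values.

# Prior distribution over the parent node, P(Rainfall).
p_rain = {"heavy": 0.2, "light": 0.8}

# Conditional probability table for the child node, P(Bacteria | Rainfall).
p_bacteria_given_rain = {
    "heavy": {"high": 0.7, "low": 0.3},
    "light": {"high": 0.1, "low": 0.9},
}


def posterior_rain(bacteria_level):
    """Return P(Rainfall | Bacteria = bacteria_level) via Bayes' theorem."""
    # Joint probability P(Rainfall = r, Bacteria = bacteria_level) for each r.
    joint = {
        r: p_rain[r] * p_bacteria_given_rain[r][bacteria_level]
        for r in p_rain
    }
    # Marginal P(Bacteria = bacteria_level), used as the normalising constant.
    evidence = sum(joint.values())
    return {r: joint[r] / evidence for r in joint}


if __name__ == "__main__":
    # Belief about rainfall after observing a high bacteria reading.
    print(posterior_rain("high"))
```

With these assumed numbers, observing a high bacteria reading raises the belief in heavy rainfall from the prior of 0.2 to 0.14 / 0.22, roughly 0.64. Tools such as Netica perform the same kind of update across many nodes at once.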