Ver 1.0
Presentation/Paper, April 9-12, 2006, Santa Fe, New Mexico, USA

Author(s): Matthew Waite
Affiliation: St. Petersburg Times, 490 First Avenue S., St. Petersburg, Florida 33701
E-mail: [email protected]

Title: "What can journalists and social scientists learn from image analysts about database verification? Toward a methodological framework for determining database accuracy"

Abstract: The last place a researcher seeking to determine how many acres of wetlands the U.S. Army Corps of Engineers allows to be wiped out by development would find useful data is in the corps' own permit database. Of the more than 150,000 permit records in the system in Florida, fewer than 2,500 have an entry in the permitted acreage field. Only three fields were ever reliably filled out, corps officials say: one is automatically generated by the permitting system, and the other two are dates, the day the permit came in and the day it left. To answer the question of how many acres of wetlands were lost in Florida, the St. Petersburg Times turned to satellite imagery analysis. Imagery analysis has a common framework very similar to the scientific method. The last step, which many analysts argue is the most important, is the accuracy assessment. Since no remote sensing analysis is ever going to be 100 percent accurate, given the remote nature of the analysis, the accuracy assessment is critical to determining how close to accurate an analysis is. The accuracy assessment framework transfers well to database research. Substitute a database for remote sensing data and it becomes clear: no database is ever 100 percent accurate, and an accuracy assessment is critical to knowing how accurate a database is. An accuracy assessment takes a sample of X, which in remote sensing would be image pixels or polygons that correspond to a place on the ground, and compares it to either ground reference data or other data for that place.
There are several accuracy assessment frameworks, two of which would scale well to databases: simple random sampling and stratified random sampling. In simple random sampling, a random sample of records is drawn and verified, and out of that comes an accuracy level: a percentage saying how accurate a dataset is at a given confidence level and margin of error. In stratified random sampling, a particular class of data is sampled more heavily because of its importance to the analysis; it could be a particular subset of the data that a story or study focuses on, or a field in the data that is critical to the analysis. Either method would produce an independent and publishable statement of how valid the assumptions and extrapolations drawn from a dataset are, which can only strengthen the credibility of the analysis. This paper will explore the pros and cons of both methods.
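The two sampling designs described above can be sketched in code. The following is a minimal illustration, not the paper's method: the function names, the 95 percent z-value of 1.96, and the standard sample-size formula for a proportion are assumptions supplied here, and the verification step (comparing a record to reference data) is represented by a caller-supplied function.

```python
import math
import random

def sample_size(z=1.96, margin=0.05, p=0.5):
    """Minimum simple random sample size to estimate a proportion
    (e.g. the share of accurate records) at the confidence level
    implied by z and the given margin of error. p=0.5 is the most
    conservative (largest-sample) assumption."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def simple_random_assessment(records, is_accurate, n=None, seed=0):
    """Draw a simple random sample of records, verify each one with
    is_accurate() (a stand-in for checking against reference data),
    and return the estimated accuracy and its margin of error at
    roughly 95 percent confidence."""
    if n is None:
        n = sample_size()
    random.seed(seed)
    sample = random.sample(records, min(n, len(records)))
    correct = sum(1 for r in sample if is_accurate(r))
    p_hat = correct / len(sample)
    moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(sample))
    return p_hat, moe

def stratified_assessment(strata, is_accurate, n_per_stratum=50, seed=0):
    """Sample each stratum separately so an important class of records
    (a key subset or field) gets verified more heavily than a single
    pooled sample would allow. Returns per-stratum accuracy estimates."""
    random.seed(seed)
    results = {}
    for name, records in strata.items():
        sample = random.sample(records, min(n_per_stratum, len(records)))
        correct = sum(1 for r in sample if is_accurate(r))
        results[name] = correct / len(sample)
    return results
```

For example, a researcher could pass the corps' permit records as `records` and a function that compares each sampled permit to field-verified acreage as `is_accurate`; the stratified version would let the permitted-acreage field, the one central to the wetlands question, be sampled more heavily than the rest.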