
Ver 1.0 Presentation/Paper
April 9-12, 2006
Santa Fe, New Mexico, USA
Author(s):
Matthew Waite
Affiliation:
St. Petersburg Times
490 First Avenue S.
St. Petersburg, Florida 33701
E-mail:
[email protected]
Title:
“What can journalists and social scientists learn from image
analysts about database verification? Toward a methodological
framework for determining database accuracy”
Abstract:
The last place a researcher seeking to determine how many acres of wetlands
the U.S. Army Corps of Engineers allows to be wiped out by development
would find useful data is the corps' own permit database. Of the more than
150,000 permit records in the system in Florida, fewer than 2,500 have an
entry in the permitted acreage field. Only three fields were ever reliably
filled out, corps officials say: one is automatically generated by the
permitting system, and the other two are dates, the day the permit came in
and the day it left.
To answer the question of how many acres of wetlands were lost in Florida,
the St. Petersburg Times turned to satellite imagery analysis. Imagery
analysis follows a common framework very similar to the scientific method.
The last step, and many analysts argue the most important one, is the
accuracy assessment. Because no remote sensing analysis is ever 100 percent
accurate, given the remote nature of the work, the accuracy assessment is
critical to determining how close to accurate an analysis is.
The accuracy assessment framework transfers well to database research.
Substitute a database for the remote sensing data and it becomes clear: no
database is ever 100 percent accurate, and an accuracy assessment is critical
to knowing how accurate a database is.
An accuracy assessment takes a sample of the data (in remote sensing, image
pixels or polygons that correspond to places on the ground) and compares it
either to ground reference data or to other data for those places.
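The comparison step can be sketched in code. This is a minimal illustration, not the Times' actual procedure: the record IDs, field values, and the `accuracy_assessment` helper are all hypothetical stand-ins for a database and an independently gathered reference source.

```python
import random

def accuracy_assessment(records, reference, sample_size, seed=42):
    """Draw a simple random sample of record IDs and report the fraction
    that agree with the reference data for the same records.

    `records` and `reference` are dicts mapping a record ID to a value
    (e.g. permitted acreage); both are hypothetical stand-ins here."""
    rng = random.Random(seed)
    sample_ids = rng.sample(sorted(records), sample_size)
    matches = sum(1 for rid in sample_ids
                  if records[rid] == reference.get(rid))
    return matches / sample_size

# Toy data: 1,000 records, 5 of which disagree with the reference.
db = {i: "ok" for i in range(1000)}
ref = dict(db)
for i in range(5):
    ref[i] = "different"

acc = accuracy_assessment(db, ref, sample_size=200)
print(acc)
```

In practice the `reference` lookup is the expensive part: it means pulling paper permits, visiting sites, or checking a second independent dataset for each sampled record.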
There are several accuracy assessment frameworks, two of which would scale
well to databases: simple random sampling and stratified random sampling.
In simple random sampling, a random sample of data is taken and verified,
and out of that comes an accuracy level – a percentage saying how accurate
a dataset is at a given confidence level and margin of error. In stratified
random sampling, a particular class of data is more heavily sampled because
of its importance to the analysis. It could be a particular subset of the data
that a story or study focuses on, or a field in the data that is critical to the
analysis.
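The two sampling designs can be sketched as follows. This is a hedged illustration using the standard normal approximation for a proportion at roughly 95 percent confidence (z = 1.96); the function names, the stratum layout, and the toy verification rule are assumptions, not anything from the paper.

```python
import math
import random

def sample_accuracy(population, verify, n, z=1.96, seed=0):
    """Simple random sampling: verify n randomly chosen records and
    return (accuracy, margin of error) at ~95% confidence, using the
    normal approximation for a proportion."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    p = sum(verify(rec) for rec in sample) / n
    moe = z * math.sqrt(p * (1 - p) / n)
    return p, moe

def stratified_accuracy(strata, verify, n_per_stratum, seed=0):
    """Stratified random sampling: sample each class (stratum)
    separately, so a subset critical to the story or study can be
    sampled more heavily. `strata` maps a name to a record list."""
    return {name: sample_accuracy(recs, verify,
                                  min(n_per_stratum, len(recs)), seed=seed)
            for name, recs in strata.items()}

# Hypothetical records: (id, acreage_field_filled) pairs, 90% "accurate".
records = [(i, i % 10 != 0) for i in range(1000)]
verify = lambda rec: rec[1]

p, moe = sample_accuracy(records, verify, n=300)
print(f"accuracy ~ {p:.1%} +/- {moe:.1%}")
```

The margin of error shrinks with the square root of the sample size, which is why a few hundred verified records can support a publishable accuracy statement about a database of 150,000.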
Either method would produce an independent and publishable statement of
how valid and accurate assumptions and extrapolations from a dataset are.
This can only strengthen the credibility of the analysis. This paper would
explore the pros and cons of both methods.