Download Slide - CIRSS

Exploring problems of data mobility,
sharing and reuse
Rob Procter
Mark Hartswood, Stuart Anderson, Paul
Taylor, Lilian Blot
1
Overview
• The eResearch vision.
• Background to this study.
• Earlier studies of data mobility, sharing and reuse.
• Fieldwork findings and implications.
• Conclusions.
2
The eResearch vision
• The eResearch vision promotes collaboration,
interdisciplinary work and ‘reduced time to
discovery’ as the keys to future scientific
advances.
• Increased data sharing and re-use is seen as
fundamental to the realisation of this vision.
3
Background to this study
• eDiaMoND was a UK e-Science programme
project to create a shared national archive of
digital mammograms from the UK breast
screening programme, and use it to support a
range of activities, including training.
• A follow-on project (LEMI) developed a
training tool in collaboration with clinicians.
• Its aim was to draw upon archive materials
and use them in ‘live’ training situations.
4
The UK National Breast Screening
Programme
• Breast cancer is the most common cancer in
the UK.
• Screening by mammography (breast X-Rays)
offered every three years to women between 50
and 70 years of age.
• Mammograms examined by trained readers for
signs of abnormality.
• Abnormal cases are recalled for further tests at an
assessment clinic.
– 3-6% are recalled and about 0.3-0.6% are malignant.
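To put these rates in concrete terms, the short sketch below converts the percentage ranges into expected counts. It assumes that both ranges are percentages of all women screened, and it uses a cohort of 10,000 women; both the cohort size and the helper name `expected_counts` are illustrative assumptions, not figures or names from the slides.

```python
def expected_counts(screened, rate_low_pct, rate_high_pct):
    """Return (low, high) expected counts for a rate range given in percent."""
    return (screened * rate_low_pct / 100, screened * rate_high_pct / 100)


# Hypothetical cohort size, chosen only to make the arithmetic easy to read.
screened = 10_000

# Rates taken from the slide: 3-6% recalled, about 0.3-0.6% malignant
# (assumed here to be percentages of all women screened).
recalled = expected_counts(screened, 3, 6)
malignant = expected_counts(screened, 0.3, 0.6)

print(f"Recalled:  {recalled[0]:.0f} to {recalled[1]:.0f} women")
print(f"Malignant: {malignant[0]:.0f} to {malignant[1]:.0f} women")
```

Under these assumptions, readers examine thousands of normal mammograms for every few dozen malignant cases.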
5
eDiaMoND
Digital mammogram archive
Research
• Epidemiology
• Image analysis
Practice
• Training
• Remote reading
eDiaMoND blueprint document, 2005
http://www.ediamond.ox.ac.uk/publications/blueprint-Final.pdf
LEMI
Training
Screening tool
Lesion Zoo
6
eDiaMoND data sharing and re-use
model
[Diagram: originating context, metadata, data archive, use context]
Earlier studies of eDiaMoND
• Jirotka, M. et al. (2005) Collaboration and Trust in Healthcare Innovation:
The eDiaMoND Case Study. JCSCW
– Problematised the idea of remote reading.
– Understanding the circumstances of mammogram production and use
important for trust in the data.
• Coopmans, C. (2006) Making Mammograms Mobile: Suggestions for a
Sociology of Data Mobility. Information, Communication and Society
– Problematised the idea of data mobility.
– “An understanding of mobility … does not only emphasize that transit
is an active achievement but also draws attention to the craft like
nature of that achievement: the artful connecting of time, space,
material and immaterial elements into a ‘mobility effect.’”
8
Questions motivating this study
• How should we understand the
relationship between data and its
originating context?
• What happens when people actually
engage with the data to do something
purposeful?
9
How should we understand the
relationship between data and context?
• Berg and Goorman (1999)
describe medical data as
‘entangled’ with the context of
its production.
• Words like ‘disentangled’ seem
to imply that data can somehow
be liberated from its context.
• Berg and Goorman argue that
the more contexts data has to
be usable in, the more work is
needed to disentangle it.
10
Patient records and data structures
Patient records:
• Rich
• Heterogeneous
• Redundant
• Documenting and guiding practice
• Implicit relations
Data structures:
• Partial
• Selected
• Explicit relations
11
Encounters with eDiaMoND data
• Problems emerging when encountering the data in
relation to:
– Application development.
– Set selection.
– Training.
• We will examine:
– How problems were recognised, diagnosed and fixed.
– Who was involved and what resources they needed.
12
Example 1: Data correction work
• Couldn’t be done
automatically:
– Data not of sufficient
quality
• But enough information was
embedded in the digital
artefacts that a skilled
person could correct it.
13
Example 2: Selecting cases to include
in training sets
14
Uncovering omissions
15
Example 3: Training
16
Mentoring the trainee
17
Findings: 1
• Use of the data led to different sorts of data
‘problem’ emerging, requiring different sorts
of resources to diagnose and repair.
• We had to go back to source and make
corrections, additions, sometimes change the
data model.
• Making sense of data depends on some
understanding of the context of production.
• It was difficult to predict a priori what
contextual information to preserve and what
to discard.
18
Findings: 2
• Studies of data mobility focus on the need for work to
‘disentangle’ or ‘decontextualise’ data, but making
interpretation and use of data less dependent on the
originating context is only a partial contributor to
mobility.
• While we carve out a ‘chunk of context’, we also
throw away significant detail, and no longer have
easy access to the full range of resources that we
would usually depend upon for making sense of its
contents.
19
Implications
• Moving on from the eDiaMoND data curation model:
– Tacit assumption that data abstracted from a working
context can be treated as self-sufficient.
• Better access to originating contexts:
– Interpretative practices attendant on data re-use involve
linking originating and use context by some other means
than that provided by metadata.
• Ease of correcting and amending data in-situ:
– Facilities need to be available at point of use, and not
separated out into different processes and activities.
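One way to picture the last two implications is a record that keeps a pointer back to its originating context and can be amended at the point of use, with every change logged. The sketch below is purely illustrative: the class names, the fields, and the `origin_ref` value are hypothetical and do not describe the eDiaMoND or LEMI data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Amendment:
    """One logged correction: what changed, who changed it, and why."""
    field_name: str
    old_value: object
    new_value: object
    author: str
    reason: str
    timestamp: datetime


@dataclass
class ArchiveRecord:
    """Hypothetical archive record that stays linked to its source."""
    record_id: str
    values: dict
    origin_ref: str  # assumed pointer back to the originating context
    history: list = field(default_factory=list)

    def amend(self, field_name, new_value, author, reason):
        """Correct a value in place and keep the old value in the history."""
        old_value = self.values.get(field_name)
        self.history.append(
            Amendment(field_name, old_value, new_value, author, reason,
                      datetime.now(timezone.utc))
        )
        self.values[field_name] = new_value


# Usage: a trainer spots an omission while assembling a training set
# and repairs it on the spot rather than in a separate process.
record = ArchiveRecord(
    record_id="case-001",
    values={"lesion_type": None},
    origin_ref="screening-centre-A/case-001",  # hypothetical identifier
)
record.amend("lesion_type", "mass", author="trainer-1",
             reason="omitted at data entry; confirmed from source notes")
```

Keeping the history alongside the values means a later user can see that a field was repaired, by whom, and on what grounds, instead of treating the record as self-sufficient.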
20
Conclusions: 1
• Achieving data mobility is less about making it independent of
the context of production, and more about appropriately
maintaining and carefully managing links to that context.
• We find that users continually (re)appraise data based on
their understandings of practices associated with its
production and abstraction.
• This is also shown in Zimmerman’s study of data reuse by
ecologists, whereby the appropriateness of using third party
datasets is gauged according to what ecologists know and
understand about the specific phenomena and data collection
practices.
21
Conclusions: 2
• Zimmerman asked ecologists to report
retrospectively how they selected data for reuse
whereas, in our study, we examined actual occasions
of data reuse.
• While agreeing that greater detail of data collection
practices should be made available, we take the
more radical step of recommending capture of richer
representations of the originating context.
22
Conclusions: 3
• We need to move away from ideas of linear processes and
static data sets towards thinking of data as more organic,
‘living’ artefacts in need of periodic amendment, repair,
renewal and retirement.
• If we shift our focus to accommodate non-linear aspects of
data collection and the dynamic character of ‘live’ data, then
this opens various opportunities for a radical reconfiguration
of a variety of data management practices.
• This reconfiguration of data management needs to be taken
seriously if the benefits of increased data re-use and sharing
envisaged by eResearch are going to be realised fully.
23