Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1 Overview • The eResearch vision. • Background to this study. • Earlier studies of data mobility, sharing and reuse. • Fieldwork findings and implications. • Conclusions. 2 The eResearch vision • The eResearch vision promotes collaboration, interdisciplinary work and ‘reduced time to discovery’ as the keys to future scientific advances. • Increased data sharing and re-use is seen as fundamental to the realisation of this vision. 3 Background to this study • eDiaMoND was a UK e-Science programme project to create a shared national archive of digital mammograms from the UK breast screening programme, and use it to support a range of activities, including training. • A follow-on project (LEMI) developed a training tool in collaboration with clinicians. • Its aim was to draw upon archive materials and use them in ‘live’ training situations. 4 The UK National Breast Screening Programme • Breast cancer is the most common cause of cancer in the UK. • Screening by mammography (breast X-Rays) offered every three years to women between 50 and 70 years of age. • Mammograms examined by trained readers for signs of abnormality. • Abnormal cases are recalled for further tests at an assessment clinic. – 3-6% are recalled and about 0.3-0.6% are malignant. 5 e-DiaMoND Digital mammogram archive Research • Epidemiology • Image analysis Practice • Training • Remote reading eDiaMoND blueprint document, 2005 http://www.ediamond.ox.ac.uk/publications/blueprint-Final.pdf LEMI Training Screening tool Lesion Zoo 6 eDiaMoND data sharing and re-use model Metadata Originating context Data archive Data archive Use context Earlier studies of eDiaMoND • Jirotka, M. et al (2005) Collaboration and Trust in Healthcare Innovation: The eDiaMoND Case Study. JCSCW – Problematised the idea of remote reading. – Understanding the circumstances of mammogram production and use important for trust in the data. • Coopmans, C. (2006) Making Mammograms Mobile: Suggestions for a Sociology of Data Mobility. Information, Communication and Society – Problematised the idea of data mobility. – “An understanding of mobility … does not only emphasize that transit is an active achievement but also draws attention to the craft like nature of that achievement: the artful connecting of time, space, material and immaterial elements into a ‘mobility effect.’” 8 Questions motivating this study • How should we understand the relationship between data and its originating context? • What happens when people actually engage with the data to do something purposeful? 9 How should we understand the relationship between data and context? • Berg and Goorman (1999) describe medical data as ‘entangled’ with the context of its production. • Words like ‘disentangled’ seem to imply that data can somehow liberated from its context. • Berg and Goorman argue that the more contexts data has to be usable in, the more work needed to disentangle it. 10 Patient records and data structures Rich Heterogeneous Redundant Documenting and guiding practice Implicit relations Partial Selected Explicit relations 11 Encounters with eDiaMoND data • Problems emerging when encountering the data in relation to: – Application development. – Set selection. – Training. • We will examine: – How problems were recognised, diagnosed and fixed. – Who was involved and what resources they needed. 12 Example 1: Data correction work • Couldn’t be done automatically: – Data not of sufficient quality • But enough data embedded in the digital artefacts that a skilled person could correct. 13 Example 2: Selecting cases to include in training sets 14 Uncovering omissions 15 Example 3: Training 16 Mentoring the trainee 17 Findings: 1 • Use of the data led to different sorts of data ‘problem’ emerging, requiring different sorts of resources to diagnose and repair. • We had to go back to source and make corrections, additions, sometimes change the data model. • Making sense of data depends on some understanding of the context of production. • It was difficult to predict a priori what contextual information to preserve and what to discard. 18 Findings: 2 • Studies of data mobility focus on need for work to ‘disentangle’ or ‘decontextualise’ data, but making interpretation and use of data less dependent on the originating context is only a part contributor to mobility. • While we carve out a ‘chunk of context’, we also throw away significant detail, and no longer have easy access to the full range of resources that we would usually depend upon for making sense of its contents. 19 Implications • Moving on from eDiaMoND data curation model: – Tacit assumption that data abstracted from a working context can be treated as self-sufficient. • Better access to originating contexts: – Interpretative practices attendant on data re-use involve linking originating and use context by some other means than that provided by metadata. • Ease of correcting and amending data in-situ: – Facilities need to be available at point of use, and not separated out into different processes and activities. 20 Conclusions: 1 • Achieving data mobility is less about making it independent of the context of production, and more about appropriately maintaining and carefully managing links to that context. • We find that users continually (re)appraise data based on their understandings of practices associated with its production and abstraction. • This is also shown in Zimmerman’s study of data reuse by ecologists, whereby the appropriateness of using third party datasets is gauged according to what ecologists know and understand about the specific phenomena and data collection practices. 21 Conclusions: 2 • Zimmerman asked ecologists to report retrospectively how they selected data for reuse whereas, in our study, we examined actual occasions of data reuse. • While agreeing that greater detail of data collection practices should be made available, we take the more radical step of recommending capture of richer representations of the originating context. 22 Conclusions: 3 • We need to move away from ideas of linear processes and static data sets towards thinking of data as more organic, ‘living’ artefacts in need of periodic amendment, repair, renewal and retirement. • If we shift our focus to accommodate non-linear aspects of data collection and the dynamic character of ‘live’ data, then this opens various opportunities for a radical reconfiguration of a variety of data management practices. • This reconfiguration of data management needs to be taken seriously if the benefits of increased data re-use and sharing envisaged by eResearch are going to be realised fully. 23