UK-HDAN Proposal Development Workshop: Well Sorted Materials
6th January 2017

Contents
Introduction
Original Question
Dendrogram
Tree Map
Heat Map
Raw Group Data

For an online, interactive version of the visualisations in this document, go here: www.well-sorted.org/output/January2017UKHDANProjectIdeas

Introduction

Dear participant,

Thank you for taking part in submitting and sorting your ideas. This document contains several visualisations of your ideas, grouped by the average of your online sorts. They are:

Dendrogram - This tree shows each submitted idea and its similarity to the others. The lower two ideas 'join', the more people grouped those two ideas together. For example, if two ideas join at the bottom, every person grouped those two together.

Tree Map - This visualisation presents an 'average' grouping. It is calculated by 'cutting' the Dendrogram at the dashed line so that any items which join lower than that line are placed in the same group. In addition, rectangles which share a side of the same length are more similar to each other than their peers.

Heat Map - This visualisation shows a similarity matrix where each idea is given a colour at the intersection with another idea, showing how similar the two are. This is useful to see how well formed a group is. The more red there is in a group (shown by the black lines), the more similar the ideas inside it were judged to be.

Raw Group Data - This table shows every submitted idea and its longer description. They are shown in the same order as the Dendrogram (so similar ideas are close to each other) and split into the coloured groups used in the Tree Map. In addition, each idea has been given a unique number so they are easier to find.

References

[1] Methven, T. S., Padilla, S., Corne, D. W., & Chantler, M. J. (2014, February). Research Strategy Generation: Avoiding Academic 'Animal Farm'. In Proceedings of the companion publication of the 17th ACM conference on Computer supported cooperative work & social computing (pp. 25-28). ACM. doi:10.1145/2556420.2556785

UK-HDAN Proposal Development Workshop - Materials Generated by Well-Sorted.Org

Original Question

On this page, we present the original question which was posed by the event organiser. This was originally presented above input boxes on the Well Sorted website itself, so we present it here as close to how it originally appeared as possible. The original question was as follows:

Please submit your Health Data Analytics Project Ideas below.

Project Title – this should be a short (max. 50 characters) informative summary of the idea in the form <methodology required> for [<problem area>], e.g. Time series analysis for continuous mental health data. If you are purely a methodologist, just omit the second part.

Project Description – this should detail the healthcare opportunity/data science challenge you are seeking to address (max. 255 characters).

Because we are trying to bring together methodologists & problem holders, we need you to express your ideas in a way that is accessible to both groups.

Dendrogram

[Figure: Dendrogram]

Tree Map

[Figure: Tree Map]

Heat Map

[Figure: Heat Map]

Page 6 removed

Raw Group Data: Blue Group

Colour | # | Title | Description
Blue | 5 | Improving early disease detection | Could understanding patterns of health care use and existing conditions help predict who will develop serious/chronic health conditions, or improve their early diagnosis?
Blue | 6 | Predictive Analytics for Emergency Departments | Using internal / external / historic / forecast data to create a set of predictive models for hospital EDs that allow resources to be better utilised and assets leveraged. Improve throughput / patient experience / budgets.
Blue | 7 | Prevalence and patterns of Multimorbidity | The ageing population has resulted in an increase in the prevalence of people living with more than one long term condition. To better manage and care for these patients we need to identify the highly prevalent combinations of these conditions.
Blue | 8 | Development of Multimorbidity over the life course | The ageing population has resulted in an increase in the prevalence of people living with more than one long term condition. To develop prevention strategies we need to understand the typical development of different conditions over a person's life.
Blue | 9 | Using cohort intelligence to improve svce redesign | Using algorithm-driven cohort analytics to provide transformation teams with a deep understanding of complex, multimorbid population needs. This will inform the exploration and evaluation of new care model responses that meet those needs.
Blue | 10 | ProjectDataSphere: Open Oncology & Models | ProjectDataSphere is an open-source data source which contains ~36,000 patients' worth of clinical trial time-series data across numerous cancer types. A combination of mathematical biology and biostatistics approaches could lead to new disease insights.

Raw Group Data: Green Group

Colour | # | Title | Description
Green | 11 | A platform for rapid learning in healthcare | OpenClinical.net is a toolset for creating and deploying executable models of clinical decision-making and workflow. The key challenge now is to "close the loop", to capture, aggregate and learn from data acquired when using the models for supporting care.
Green | 12 | Data tools driving collaborative transformation | Using robust epidemiology, clinical ontologies and financial structures to co-develop 'trusted' pre-validated service redesign tools. These will facilitate system-wide collaborative working and the development of sustainable transformation plans.
Green | 13 | Bridging the Gap: From Problems to Methods | A common obstacle to interdisciplinary research is the use of different terminologies or descriptions of research problems and methods in different areas of science. We need a new approach for mapping between problems and methods across disciplines.

Raw Group Data: Orange Group

Colour | # | Title | Description
Orange | 14 | Recognising topological patterns in data | Assumes data clusters have geometric meaning and studies them using topological theory. Problem: How can data be clustered geometrically?
Orange | 15 | GM method for continuous MH data for evaluation | We propose a genetic matching (GM) method to maximise the balance of observed covariates between treated and control groups for economic evaluation within observational studies. We estimate the average treatment effect on the treated by one-to-one matching.
Orange | 16 | Causal inference methods for big data | Championed benefits of big data rely on prediction; not a single claim involves causal inference. For health and wellbeing, interest lies in understanding putative causes, in order to intervene. We desperately need causal inference methods for big data.
Orange | 17 | Mathematical Biology meets Biostatistics | There is a growth in new algorithms to analyse healthcare data. These approaches come from a "top down" perspective and so lack any link to basic science. There is clearly an opportunity to marry basic science with "top down" approaches via biomathematics.
Orange | 18 | Algorithmic bias in health research | As researchers, it is important to learn from, and try not to reproduce, the inherent biases present in our data sets and among researchers. How will lack of diversity amongst data scientists and under-recorded users of healthcare exacerbate health inequalities?
Orange | 19 | Data science meets knowledge engineering | Data science & machine learning are strongest in "analysis tasks" (diagnosis, prediction...). Patient care also requires "synthesis tasks" (planning, design...) studied by knowledge engineers. This project will show how to unify the two approaches.
Orange | 20 | Analysing data by Machine learning algorithms | It is specifically applied to streaming data. Machine learning is used to optimise the patterns of data. Problem: what about errors and missing values? Machine learning accumulates errors iteratively.
Orange | 21 | Missing data methods for big data | Dealing with missingness is a challenge for big data, relying on multiple imputation or inverse probability weighting, but each requires careful specification of the missingness model, which becomes intractable for big data. We seek to address this.
Orange | 22 | Multiple imputation of outcome variables | We propose a multiple imputation method to address missing outcome data values for economic evaluation within RCTs. We first impute missing variables, then analyse the completed data set and integrate it into the final economic evaluation results.
Orange | 23 | Time series analysis of daily self-report | Large dataset of daily self-report data (up to 365 days per individual) and hourly weather data; how to do time series analysis / case crossover, correcting for seasonality and other confounders and handling missing data.

Raw Group Data: Purple Group

Colour | # | Title | Description
Purple | 24 | Analysing detailed event data to stratify people | Health records include tens of thousands of different event types (e.g., diagnoses). If we can stratify (group) people according to the events in their records, then health professionals will be able to make better clinical and planning decisions.
Purple | 25 | Top-down clustering to identify new cohorts | Use unsupervised machine learning techniques to discover the 'natural segments' of patient populations. Success can point towards new definitions of comorbidities and disease/treatment outcomes. This in turn can influence the way care is commissioned.
Purple | 26 | Feature selection to Identify children at risk | To define a minimum set of features (Read codes) that helps GPs identify children at risk of disease. Case study: Asthma. The set can later be used to assess completeness of data gathered during consultations.
Purple | 27 | Deep Learning for Phenotyping | Recent advances in deep learning include generative adversarial networks and variational autoencoders; these systems offer the capacity to discover latent structure in complex data, and could therefore be complementary to conventional biomedical analysis.
Purple | 28 | Hierarchical events in temporal data mining | Algorithms searching for patterns are limited due to the number of events. Composing them into a hierarchical structure (e.g. Read codes) and deciding what level of the event description should be used could lead to better discoveries in the health domain.
Purple | 29 | Classification using dynamic data | Patient data do not stay idle over time and the evolution of data includes a wealth of information. Identifying patterns in time (and possibly space) can enhance preventative medicine and early interventions.
Purple | 30 | Sensor Informatics via Bayesian Gaussian Processes | Bayesian methods for time-series analysis have been explored for various applications in health informatics (often ICU applications), but much remains to be done in the integration of hospital care (and home care) via reliable mobile monitoring.

Raw Group Data: Yellow Group

Colour | # | Title | Description
Yellow | 31 | Reliability Assessment of Reported Methods | The reliability of research methods reported in scientific papers often depends on the replicability of the original studies. We propose to develop a Knowledge Base of recent methodologies that can be used for reliability/validity assessment purposes.
Yellow | 32 | Dealing with temporality & vagueness in EHR text | Text analytics is successfully used to support health data analytics. However, there are open questions. For example, the vague and nuanced nature of some EHR text makes analysis difficult, and the linking of events to times is largely unsolved.
Yellow | 33 | Ambiguous diagnoses in primary care data | Primary care data is usually in the form of codes and free text. Users can add codes; the codes have varying levels of diagnostic specificity and their use is often based on personal preferences, resulting in ambiguity and inconsistency in diagnoses.
Yellow | 34 | Business rules to thoroughly profile data quality | Organisations such as NHS Digital have mature data quality procedures, but can only perform basic checks. If they can develop more comprehensive rules then the data quality improvements will have widespread benefit for secondary uses of health data.
Yellow | 35 | Data quality of longitudinal health data | New Visual Analytics tools to investigate the data quality of longitudinal EHR data. Completeness and Correctness are the most important features to first investigate since the others (concordance, consistency) depend on the quality of these.

Raw Group Data: Pink Group

Colour | # | Title | Description
Pink | 36 | Joint models - intensive longitudinal MH-SWB data | Joint models for intensive longitudinal health data will develop ways of combining intensive longitudinal data on mental health and well-being measures to explore whether trajectories are parallel, divergent or overlapping to some degree (GLLAMM/SEM model).
Pink | 37 | Attrition in longitudinal mHealth studies | mHealth / smartphones provide new opportunities for healthcare research. Engagement is, however, often lower than in traditional studies. New methods for increasing engagement and correcting for attrition are therefore essential.
Pink | 38 | "digital enhancing" of 3-D understanding of HRQOL | Digital enhancement of three-dimensional understanding of health-related quality of life (HRQOL) develops methods to identify if a social dimension captured digitally (e.g. social media) can clarify "health-related QOL" as primary/1st, 2nd and 3rd dimensions.

Raw Group Data: Silver Group

Colour | # | Title | Description
Silver | 39 | Agile safety assurance methodology for mHealth app | We are developing a new approach to assuring the safety of mHealth apps based on integrating safety assurance activities into agile practices in a way that can address the challenges resulting from mHealth apps.
Silver | 40 | Principled validation for smartphone applications | The market is a blizzard of health devices and apps, few of which are validated in any sense of the word, and fewer still which have any measurable health benefit. Behavioural science needs to be more central to this endeavour and linked to ultimate gain.
Silver | 41 | Mobile data for personal health forecasting | How can we make use of mobile and sensor data, in combination with digital models of disease, to predict and prevent negative health episodes, and to encourage positive behaviour? Issues related to the Vs of big data, but also integration and interface.
Silver | 42 | Analyse computer interaction for mental health | Opportunity: early diagnosis helps to mitigate, and possibly cure, many mental health conditions (e.g. dementia). Develop a computer-based monitoring system to non-intrusively track and data mine people's behaviour with computers.
Silver | 43 | Behaviour change from IT healthcare management | The challenge is to present the data mining results in health care (e.g. mental health / dementia onset) to users to effect behaviour change, either for self-referral or to modify their behaviour when their treatment needs to be changed.
Silver | 44 | Measurement of social behaviour in Parkinson's | Existing smartphone apps and wearables for Parkinson's disease tend to focus on motor symptoms. Social and communicative difficulties can be significant in Parkinson's; a tool to track social behaviour and communication could inform therapy development.

Raw Group Data: Brown Group

Colour | # | Title | Description
Brown | 45 | Linking commercial and clinical data | To move towards prediction of sickness, how and where do we capture data from our "social determinants of health"? What can we learn from the commercial marketing world and online behaviours it captures, in order to predict clinical episodes?
Brown | 46 | Distributed analysis: administrative & health data | Secure linkage & co-analysis of distributed health & administrative data that is vertically partitioned. We have approaches under development in DataSHIELD (www.datashield.ac.uk) based on secure multi-party computing. Seeking collaborators & applications.
Brown | 47 | Privacy-preserving data mining | Building systems that allow analysis on the lowest level of details for confidential/sensitive data from one or more sources without revealing sensitive information to end users.
Brown | 48 | Data Protection and Research Ethics Committees | A study of the issues RECs have with data protection, particularly focussing on any differences with the Confidentiality Advisory Group of the HSCIC.
Brown | 49 | What are users doing with the data? | How can we audit / track user analyses to determine if there is misuse of sensitive research data? Applications include identifying potential inferential disclosure and data misuse. Seeking existing solutions or collaborators to co-develop solutions.
Brown | 50 | A Pilot Study of an Anonymisation Matrix | We have developed an anonymisation framework to guide decision-making by RECs. We would like to trial this framework to assess its utility.
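
Illustrative sketch: how sorts could become groups

The Introduction describes how the Heat Map shows pairwise similarity and how the Tree Map is obtained by 'cutting' the Dendrogram. The short Python sketch below illustrates that general idea only; it is not Well Sorted's actual implementation. It assumes that similarity is the fraction of participants who placed two ideas in the same group, and that ideas are merged by average linkage, neither of which is stated in this document. The function names, the cut values, and the five-idea example data are hypothetical.

```python
from itertools import combinations


def similarity_matrix(sorts, n_ideas):
    """Fraction of participants who placed ideas i and j in the same group.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a set of idea indices (0-based, hypothetical data).
    """
    counts = [[0] * n_ideas for _ in range(n_ideas)]
    for groups in sorts:
        for group in groups:
            for i, j in combinations(sorted(group), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    sim = [[c / len(sorts) for c in row] for row in counts]
    for i in range(n_ideas):
        sim[i][i] = 1.0  # an idea is always grouped with itself
    return sim


def cut_groups(sim, cut):
    """Agglomerative clustering with average linkage (an assumption).

    Repeatedly merges the two most similar clusters and stops once the best
    remaining pair has average similarity below `cut`. This plays the role
    of the dashed cut line on the dendrogram.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        if avg(clusters[x], clusters[y]) < cut:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Three hypothetical participants sorting five ideas (indices 0 to 4).
sorts = [
    [{0, 1}, {2, 3, 4}],
    [{0, 1}, {2, 3}, {4}],
    [{0, 1, 2}, {3}, {4}],
]
sim = similarity_matrix(sorts, n_ideas=5)
print(sim[0][1])                 # 1.0: every participant grouped ideas 0 and 1
print(cut_groups(sim, cut=0.5))  # [[0, 1], [2, 3], [4]]
print(cut_groups(sim, cut=0.3))  # [[0, 1], [2, 3, 4]]
```

Lowering the cut value corresponds to moving the dashed line up the dendrogram: fewer, larger groups result. The matrix returned by similarity_matrix is the kind of data a heat map like the one described in the Introduction would colour.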