Matheson Ramsey
ramml003
Using an ontology in place of flat data for Sequential Pattern
Mining
A minor thesis for the degree of
Bachelor of Computer Science (Honours)
School of Computer and Information Science
University of South Australia
13/06/2010
Supervisor
Jan Stanek
Table Of Contents
Glossary
1 Introduction
1.1 Motivation
1.2 Potential Contributions
1.3 Field of thesis
1.4 Research Question
2 Literature Review
2.1 Data Mining in Health Informatics
2.2 Sequential Pattern Mining
2.3 Drug Ontologies
2.4 Electronic Health Records
3 Methodology
3.1 Raw data
3.2 Pre-processing
3.3 Sequential Pattern Mining
3.4 Results analysis
3.5 Expected Outcomes
4 Ethical Considerations
5 Bibliography
6 Project Plan
Appendix A – Ethics Approval Application
Table of Figures
Figure 1: a prescription pathway
Figure 2: a therapeutic pathway
Figure 3: example of ATC drug hierarchy for Propicillin
Figure 4: Program process flow
Figure 5: using the WHOCC online ATC index
Figure 6: re-coding dosage information
Figure 7: example of preparing the pathways for the sequential pattern mining
Glossary
Pathway: A sequence of drug prescriptions over time
Clinical Pathway (CP): A pathway that has been designed to be followed in order to treat a certain condition or disease
Prescription Pathway (PP): The pathway as seen by the prescriber. It is only concerned with showing what drugs were prescribed and when; it does not involve which drugs are being consumed at the same time
Therapeutic Pathway (TP): The pathway as seen by the recipient. It takes into account dosages, so that it is apparent when multiple drugs are being taken at the same time, and also when no drugs are being taken
Tuple: A single item in a dataset; one row of values
Flat data: All attributes in the data are numeric or categorical; no embedded objects, objects within objects, etc.
Node: A single item that links to others. If a node links to another node it is that node’s parent. If a node is linked to by another node it is that node’s child
Hierarchy: A data structure that has multiple levels of nodes
Ontology: A hierarchy with a strict “is-a” relationship between parent and child levels of nodes
Granularity: The level of the hierarchy that is being referred to. A higher granularity implies working further down the hierarchy (more granular, hence more specific), while a low granularity implies working higher up the hierarchy (less granular, hence more general)
Contamination: Using a level of the hierarchy that is too high, so there are too many unrelated subgroups included
Dilution: Using a level of the hierarchy that is too low, so information becomes overwhelming or too specific
Pattern: A series of prescriptions that occurs often in the dataset
Sequential Pattern Mining (SPM): The process of extracting patterns from the dataset
1 Introduction
The use of electronic support systems in health care is increasingly important (Hillestad et al. 2005).
Studies have shown that 90% of general practitioners (GPs) use a clinical software package, and 98%
of these GPs use the clinical packages for prescribing (McInnes, Saltman & Kidd 2006). This means
there is an abundance of rich health data available on general practitioners’ computers.
However, much of this data is stored as free text, and as a result it is not easily interpreted. Using
computers for writing prescriptions offers several benefits for GPs, so prescription data is one of the
most complete and structured types of data in general practice (Hassey, Gerrett & Wilson 2001).
The prescription data can be represented in a number of ways to aid GPs, such as tabular
summaries (Wroe et al. 2000), or as a series of connected nodes where the nodes represent
different prescriptions, forming a prescription pathway (Stanek et al. 2005). A prescription pathway
represents all the prescriptions for a particular patient over a certain period of time as prescribed by
the GP. We can also capture the therapeutic pathway, which is a combination of the drug
prescriptions and the amounts prescribed, to give an idea of what combination of drugs was being
taken at any given time. These concepts are visualised in Figures 1 and 2. Creating pathways can
make the data more interpretable and easier to follow, and additional relationships between
prescriptions may become apparent.
Figure 1: a prescription pathway
Figure 2: a therapeutic pathway
These concepts of pathways are based on the use of flat prescription data. However, drugs are by
nature hierarchical, and can be modelled in an ontology such as the Anatomical Therapeutic
Chemical (ATC) drug classification, where different levels of the ontology represent different groups
of drugs (WHO 2010). The different levels range from specific chemical substances to broad
anatomical main groups (see Figure 3). As a reference, Figures 1 and 2 can be seen as using the fifth
and most specific level of the ontology. If an ontology like this is used with the pathways, it is
possible to analyse the prescription pathway at different levels of granularity to obtain different
kinds of information about the prescriptions for different methods and applications.
Figure 3: example of ATC drug hierarchy for Propicillin
Different information exists at different levels of the hierarchy. A low granularity implies a more
general pathway (for example, level 1 or 2) and might give the observer a simpler understanding of
what the pathway is trying to achieve. Meanwhile, a high granularity gives us a more specific
pathway (for example, level 5), which can be used to copy the pathway at the most granular level.
However, choosing the wrong granularity can obscure patterns. For example, in Figure 3 we have a
model for Propicillin. At level three we can see it is part of the penicillins group. We know that some
people have allergies to penicillins, so if we want to data-mine for this pattern, then logically the
best level to operate at is level three. If we list all the drugs at more specific levels (e.g. J01CE03 –
Propicillin), the pattern will get weaker, as it will be spread across a large number of variables; hence
the pattern will be diluted. Conversely, if we use a lower granularity (e.g. J01 – Antibacterials for
systemic use), the pattern showing penicillin allergy may disappear, as the J01 group contains other
groups of antibacterial drugs without penicillin-type allergy; the pattern becomes contaminated by
unrelated drugs. The concepts of dilution and contamination will be used throughout this paper as
defined here.
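The effect of granularity on counts can be illustrated with a minimal sketch. It assumes that every drug is recorded as a full seven-character ATC code, so that the ancestor at each of the five levels is a prefix of the code. The function name roll_up and the sample list of prescriptions are hypothetical and serve only as illustration.

```python
from collections import Counter

# Number of characters that identify each of the five ATC levels.
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def roll_up(atc_code: str, level: int) -> str:
    """Return the ancestor of a full (level 5) ATC code at the given level."""
    return atc_code[:ATC_LEVEL_LENGTH[level]]


# Hypothetical prescriptions, used only to show how the counts change.
prescriptions = ["J01CE01", "J01CE02", "J01CE03", "J01CA04", "J01AA02"]

for level in (5, 3, 2):
    counts = Counter(roll_up(code, level) for code in prescriptions)
    print(level, dict(counts))
```

At level 5 every code in this sample occurs only once (dilution), at level 3 four of the five prescriptions fall under J01C, and at level 2 all five fall under J01, so the penicillin group can no longer be told apart from the rest (contamination).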
These notions highlight the potential importance of incorporating an ontology knowledge base for
the discovery of patterns. The basic approach to discovery of these pathways is sequential pattern
mining (SPM). Sequential pattern mining is the process of trying to find the relationships between
occurrences of sequential events, to find if there exists any specific order of the occurrences (Zhao &
Bhowmick 2003). By performing SPM on the prescription data and implementing the knowledge of a
drug ontology, we have the potential to find many more new and interesting patterns that are not
present in the mining of flat data. There are many algorithms to perform sequential pattern mining,
which we will explore in section 2.2.
1.1 Motivation
Traditional path-mining algorithms work on flat data without taking hierarchies into account. This
may not be sufficient in some cases: given the concepts of dilution and contamination, we expect
that some important patterns can be missed if the underlying data is not processed at the optimal
level of granularity. It is important to investigate this use of an ontology to explore what difference it
makes to the success of the data mining.
The prescription data we are using is by nature hierarchical, and so we have the opportunity to
explore the impact of an ontology on the sequential pattern mining. There is no current indication as
to what effect a change in granularity will have on the usefulness of the sequential pattern mining.
For the application of prescription pathways, if the exact chemical components used in each
prescription are always mined, the associations may be too weak. If a more general level of the
hierarchy is always selected, the pathway may be too ambiguous. This research will explore how changing the
granularity will affect the sequential pattern discovery process. This has wider implications in the field
of data mining, as this use of an ontology could benefit the approach to finding patterns in data.
For our domain, being able to ascertain which granularities are appropriate for which purposes will
make the prescription data far more useful for GPs. Simply omitting all other granularities in favour
of one will result in potential loss of important information. This research is necessary to identify the
effect of using an ontology on the pattern discovery.
1.2 Potential Contributions
The results of this research will show what impact the use of the ontology has on the
success of the sequential pattern mining, and how. If we are successful in showing that the ability to detect
patterns in the dataset depends on the selection of the correct granularity, this will be a motivation
for further research to find ways to assess the granularity of the datasets prior to the application of
standard data-mining algorithms. This is a necessary first step in exploring the influence of
ontologies on pattern mining, which could lead to improved methods of a priori manipulation of a
dataset to enhance the effects of the data mining.
This research has the potential to reveal how much detail is required for prescription information to
be meaningful and usable, and how much abstraction of the pathways is possible before the
patterns become too ambiguous to be significant. This could have potential applications to GPs
analysing prescription information in practice. Better delivery of prescription pathways to GPs could
help identify anomalous cases, identify practitioners’ trends in prescribing, and monitor adherence to
clinical pathways. This system could be integrated into the practice review process for GPs to reflect on
and adapt their methods. Overall this study has the potential to improve the quality of general
practice.
1.3 Field of thesis
Health informatics; Data Mining: Sequential Pattern Mining; Ontologies
1.4 Research Question
This project will focus on analysing the impact of applying an ontology when mining data that is
typically flat. We will answer the question ‘what impact will the use of an ontology in place of flat data
have on the success of sequential pattern mining?’
2 Literature Review
This section reviews previous research into several core aspects of this minor thesis. We
explore some similar work involving data mining in health informatics, as well as some
supplementary literature to support aspects of the project. We cover the data mining aspect
with some sequential pattern mining methods; the drug ontology with some conceptual work on
ontologies and some previous work with drug ontologies; and the nature of the raw data with
research regarding the use of electronic health records.
2.1 Data Mining in Health Informatics
There is a large amount of research in health informatics that uses data mining. Health informatics is
the science of applying information-age technology to serve the specialised needs of public health
(Friede, Blum & McDonald 1995). Data mining has become a key contributor to the progress of the
integration of health information systems into general practice.
This research will draw on concepts proposed in the work by Stanek et al. (2005). In their research, a
method is proposed that compares practice patterns to clinical pathways. Their research is focused
on patients with diabetes and hypertension. They also use the ATC drug coding system, but do not
fully utilise the hierarchical nature of the drug ontology for their data mining, and instead use a fixed
granularity for all experiments. These methods are tractable for smaller domains; however, we
intend to apply our methodology to a far broader area, where their methods quickly become
problematic. We intend to further the efforts of the data mining by fully incorporating the hierarchical
nature of the data into the data mining processes, and to operate in a domain beyond diabetes
and hypertension.
Some other work involves the adaptation of Bayesian Networks for discovering temporal-state
transition patterns, specifically in the hemodialysis process (Lin, Chiu & Wu 2002). Their research
focuses on learning clinical pathways, so that pathways for admitted patients can be predicted. They
use a rich set of data including more attributes such as test results, and create a set of states, events,
and actions. Their research proves very successful, but does not implement knowledge of an
ontology for the drugs. Medical data is by nature hierarchical, and in our research we intend to
discover if the concepts of contamination and dilution are important and if they need to be
recognised by researchers performing data mining on medical data. Also, their research involves a
different data set with attributes that we do not have access to for our research. It is also specific to
hemodialysis, which limits its applications. The research shows a similar approach to the adaptation of
data mining techniques to health data.
Bei et al. (2005) perform some related work with a system they call Portal. Their research is focused
on improving the quality of procedures by giving continuous support to physicians. They perform
some rule extraction for the selection of pacemaker systems for new patients, and implement a
simple business logic flowchart system to automatically classify new patients. This type of
implementation would not be suitable for this project, due to the immense size of the flowchart
required to model all possible prescription pathways. The researchers go on to identify the potential
for data mining for long processes (such as long prescription patterns). This research highlights the
need for optimised support systems to reduce costs and improve the quality of procedures.
Another associated piece of research is the use of the Hidden Markov Model (HMM) to learn clinical
pathways (Lin, Hsieh & Pan 2005). They model the process of spontaneous delivery of patients, and
develop a 4-state pathway that accurately encompasses normal spontaneous delivery. The model is
trained with the patient data, and visualised in a manner that simplifies the pathway for doctors.
They intend to learn the clinical pathway, so their outcomes are defined, whereas our research aims
to simply mine the data for patterns and see what emerges. There is also no use of a drug ontology,
and the data used is not strictly prescription data. The outcome of their research is a model that
accurately recreates clinical pathways that can be used to predict possible paths for an admitted
patient. This paper shows another interesting approach to the use of data mining in health
informatics.
Work by Riou, Pouliquen & Beeux (1999) aims to predict the best drug for a prescription based on
the clinical background of the patient. The methodology is not so much about mining for patterns,
and more about analysing patients’ disorders, pathophysiological conditions, age, and other factors
to determine the next step in the clinical process. This tool was developed from the premise that
junior residents and medical students have difficulties selecting the most appropriate drug for a
given scenario, which is an additional motivation for our research. They consider the use of the
ATC codes, but decide against it because each drug is limited to one use and the codes do not fit
indicators or other properties, and instead opt to develop their own drug knowledge base. Whilst
several of these factors do not impact our research, they do identify a key shortcoming of the ATC
classification: the fact that drugs can only exist in one place in the ATC ontology, whereas in reality
drugs can have multiple uses. Whilst this factor is important, at our level of research it is not worth
modifying the ATC classification to accommodate it, due to the increase in complexity of the
resulting knowledge structure as a result of cross-links, etc. Some further research could involve the
adaptation of our techniques to a methodology that does account for this flaw. Their research differs
from ours in that ours aims to search for arbitrary patterns in existing prescription data, whereas
theirs methodically analyses certain data values to determine the precise next step for a single
patient.
Another piece of related research concerns mining time dependency patterns in clinical pathways
(Lin et al. 2001). They intend to find patterns of process execution sequences that showcase the
dependent relations between activities. The researchers develop a method to discover the patterns
of clinical pathways using patient records and clinical log data. Their research covers the broader
domain of complete clinical pathways, so additional data is used in the process, whereas our
research focuses on the pathways relating to drug prescription only. Also, their work makes no use of
ontologies, which this research does; this is a further difference between the research
efforts.
2.2 Sequential Pattern Mining
The concept of mining for patterns in sequences of data has been implemented and improved in
many applications. It stems from the field of data mining, which is the process of extracting
interesting information or patterns from information repositories (Chen, Han & Yu 1996). Sequential
pattern mining (SPM) is the process of trying to find the relationships between occurrences of
sequential events, to find if there exists any specific order of the occurrences (Zhao & Bhowmick
2003). Many methods of SPM have been developed, some of which we will explore here and
evaluate for this project.
One of the earliest and possibly the simplest algorithms developed for SPM is AprioriAll (Agrawal &
Srikant 1995). It is based on the Apriori principle from data mining for association rules, and is a very
base-level method for finding sequential patterns. It finds single frequently occurring items in the
dataset and then attempts to find sequences of them. It is a very simple method, but is very
computationally expensive as it requires multiple database scans. The straightforward nature of the
algorithm may make it serviceable for our studies, as it is less likely to be disrupted by peculiar
outliers or trends in the data.
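The level-wise idea behind this family of algorithms can be shown with a minimal sketch. It is a simplification and not the published AprioriAll algorithm: it assumes each event is a single code rather than a set of items, it extends every frequent pattern by every frequent item, and it scans the whole database once per candidate. The function names and the example database of level 3 codes are hypothetical.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database: List[Sequence], min_support: int) -> Dict[Sequence, int]:
    """Level-wise search: grow frequent patterns of length k into candidates of length k + 1."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({item for seq in database for item in seq})
    frequent_items = [i for i in items if support((i,), database) >= min_support]
    result: Dict[Sequence, int] = {}
    current: List[Sequence] = [(i,) for i in frequent_items]
    while current:
        next_level: List[Sequence] = []
        for pattern in current:
            count = support(pattern, database)  # one full database scan per candidate
            if count >= min_support:
                result[pattern] = count
                next_level.extend(pattern + (i,) for i in frequent_items)
        current = next_level
    return result


# Hypothetical prescription pathways, one tuple of codes per patient.
db = [
    ("J01C", "N02B", "R05D"),
    ("J01C", "R05D"),
    ("N02B", "R05D"),
    ("J01C", "N02B"),
]
patterns = frequent_sequences(db, min_support=2)
for pattern, count in patterns.items():
    print(pattern, count)
```

With a minimum support of 2, this example yields the three single codes and three ordered pairs; the only three-code candidate that occurs at all appears in a single pathway and is discarded. The repeated scans in support are the source of the computational cost noted above.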
Lin & Lee (2002) propose another method called MEMISP (MEMory Indexing for Sequential Pattern
mining). It is a faster solution; however, their method increases in complexity as the database size
increases. As this project has the potential to scale to very large amounts of data, this is not ideal.
Whilst other methods such as AprioriAll are also likely to take a long time on large databases,
they will require less time to be invested in development, which is preferred.
SPIRIT (Sequential Pattern mining with Regular expression constraints) is an optimised algorithm
designed to mine user-specified patterns (Garofalakis, Rastogi & Shim 1999). This may be usable if
we decide to search for particular sequences of drug prescriptions, such as known clinical pathways.
Dowsey et al. (1999) show that the use of clinical pathways reduces the duration of admission for
patients. By searching for parts of clinical pathways, we could monitor general practitioners’
adherence to them. However, as this project is not directed at any particular set of drugs or clinical
pathways, the sheer volume of possible pathways to test against makes this impractical, but it could
provide an interesting extension to investigate.
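The idea of a user-specified constraint can be illustrated with a minimal sketch. It does not reproduce the SPIRIT algorithm itself; it only filters candidate patterns against a regular expression after the fact. It assumes patterns are written as space-separated ATC codes, and the constraint, the function name satisfies and the candidate patterns are all hypothetical.

```python
import re
from typing import List, Tuple

CodePattern = Tuple[str, ...]


def satisfies(pattern: CodePattern, constraint: str) -> bool:
    """Test a pattern, written as space-separated codes, against a regular expression."""
    return re.fullmatch(constraint, " ".join(pattern)) is not None


# Hypothetical constraint: begin with any code under J01C, end with N02BE01,
# with any number of other codes in between.
CONSTRAINT = r"J01C\S*( \S+)* N02BE01"

candidates: List[CodePattern] = [
    ("J01CE03", "N02BE01"),
    ("J01CE03", "A02BC01", "N02BE01"),
    ("N02BE01", "J01CE03"),
]
kept = [p for p in candidates if satisfies(p, CONSTRAINT)]
print(kept)
```

Only the first two candidates are kept; the third has the right codes in the wrong order. Because the constraint is written over ATC codes, a prefix such as J01C lets one expression cover a whole group of the ontology.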
There are also methods involving multiple attributes, called multi-dimensional sequential pattern
mining (Pinto et al. 2001). These are useful for adding new information such as age groups and
demographics to patterns. However, due to the nature of the data being used for this project and
the fact that no other data will be guaranteed to be usable, we are unlikely to use this approach in
the current phase.
Another extension of conventional pattern mining is incremental mining, introduced in
Parthasarathy et al.’s work (1999) and explored by Zhang et al. (2001). This is useful for datasets that
continue to change over time, which is likely to be the case for the patient prescription data if
the system were implemented in practice. However, in this project the dataset will not be
changing, and so this will not be implemented. Nonetheless, this could prove another interesting
extension to the sequential pattern mining in the future.
Periodic Pattern Analysis involves the limiting of pattern mining to certain periods, to specify when
to check for recurring patterns (Han, Dong & Yin 1999). This could be used to explicitly find monthly
or yearly patterns. Naturally this is most effective if the data is collected over long periods of time. If
time permits, this concept could prove an interesting addition to the mining.
Other research in the area includes optimising for linked objects in a distributed system (Chen, Park
& Yu 1998), and creating hybrid combinations of other methods (Lenič & Kokol 2002). Many
advanced and optimised methods have been developed for sequential pattern mining, but many are
specific to certain domains or types of data, which makes them unsuitable for this project.
2.3 Drug Ontologies
Gruber (1993) defines ontologies as explicit formal specifications of the terms in
the domain and the relations among them. Noy & McGuinness (2001) say ontologies are used to share
common understanding of the structure of information, to make domain assumptions explicit, to
separate domain knowledge from the operational knowledge and to analyse domain knowledge.
To relate these definitions to this project: the domain will be the set of
prescription drugs, and we will be formally defining the explicit “is-a” relationship between drugs
and their parent groups.
There are several implementations of ontologies for this domain. As discussed earlier, the
Anatomical Therapeutic Chemical (ATC) classification is one such drug ontology (WHO 2010). This
ontology provides a simple hierarchical breakdown of the field of prescribable drugs, and is easily
accessible from the organisation’s website. Whilst the ATC classification does have some flaws, which
we will address in section 3.2, it is a widely used standard and will prove ideal for our research.
In 1998 Rector et al. proposed some requirements for developing ontologies to be used in medicine.
They identify that an ontology should be treated as an "assembly language", and that it should be
viewed as a “pure tree in which the branches at each level are disjoint but nonexhaustive
subconcepts of the parent concept” (Rector et al. 1998). These elements provide the basis of some
further work into developing drug ontologies.
One such piece of research involves the development of Prodigy: a reusable and automatically
classified ontology to describe the chemical composition of the drugs, as well as a dictionary of
prescribable products, which includes more volatile information such as the pack sizes and
preparations (Solomon et al. 1999). Whilst this does create a more robust and descriptive system, it
complicates the knowledge base by incorporating data that is not useful or not available for this
research into the drug ontology, so this method is not ideal for this research.
There has also been work by Wroe et al. (2000). They use a description logic named Grail to implement an
ontology based on existing pathology and physiology ontologies to create formal descriptions of a
generic drug’s clinical properties. This is used to include indications, contraindications, side effects and
other properties in the definition of the drugs. This proves promising for sorting
and grouping drugs, and possibly finding multi-dimensional patterns; however, it will not be
necessary for this project, as we are not concerned with those additional attributes.
2.4 Electronic Health Records
The data we will be using for the program will consist of electronic forms of patient records. Storing
electronic health records (EHR) has become prevalent in general practice. Keeping digital copies of
health data presents many opportunities as well as legal issues and complications, which we will
explore here.
Hillestad et al. (2005) discuss the estimated savings, costs, safety benefits and other health
benefits in order to show the potential profit that the use of electronic medical records can produce
for the industry. They compare the use of I.T. in health to many other sectors such as
telecommunications, securities trading and retail to forecast the financial benefits of investing in
health informatics. This research shows the importance of dedicating resources to the development
of electronic health records.
There has been research involving extensions to EHR, such as the development of Virtual Medical
Records (vMR) (Johnson et al. 2001). These are an abstraction of conventional medical records;
stripped down to things necessary for modelling guidelines and protocols. This ongoing interest in
digital health information stresses how prevalent it is becoming, and how we must use it
appropriately.
Replicating health records on computer systems presents many legal issues regarding privacy and
ownership of information, as highlighted by Friedman (2006) and Hodge, Gostin & Jacobson (1999).
Debate continues to occur regarding the use of EHR for research, and this is the driving factor behind
the need to de-identify data before analysing it, to avoid any privacy issues.
3 Methodology
In this minor thesis we will propose a system to pre-process the data, extract prescription and
therapeutic pathways from patient data, and then execute the sequential pattern mining on the
prescription information. Finally we will reflect on and interpret the results. A process flow that
outlines the running of the program can be seen in Figure 4.
Figure 4: Program process flow
This section will explain the nature of the raw data, the methods used to pre-process the data, as
well as the application of the sequential pattern mining. We will also summarise how we will
evaluate the results, and give some expected outcomes.
3.1 Raw data
It is important to talk about the nature of the raw data we will be dealing with before explaining our
methods. Each tuple of the dataset will contain the information for one single drug prescription.
Cumulative prescriptions for single patients will be spread over multiple tuples. We expect each
tuple will have the following attributes:
Attribute            Meaning
patient_pkey         A code unique to each patient
filenumber           A number relating to the General Practice records
provider_pkey        A code unique to each prescription provider
date                 The date the prescription was given
script_number        Irrelevant
drug_name            The name of the drug prescribed in plain text
dosage               How often to take the doses
dose                 How much of the drug to take at one time
repeats              How many times to repeat the dosage
packsize             How many dosages in a pack
quantity             How many packs given to the patient
form                 The form of the prescribed drug
formulary            Irrelevant
druggen generic pbs  Irrelevant
use                  The condition the prescribed drug is treating
The only attributes relevant to this research are the date, drug_name, dosage, dose and repeats, as well
as the patient_pkey. The patient_pkey is needed to link prescriptions related to the same patient. This
information cannot in any way lead to re-identification of a person from the data we are using for this
research.
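The way tuples are linked through the patient_pkey can be shown with a minimal sketch. The field names follow the attribute table above, but the field types, the function name group_by_patient and the sample rows are assumptions made for illustration, since the format of the raw values is not specified here.

```python
import datetime as dt
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Prescription(NamedTuple):
    """One tuple of the dataset, restricted to the relevant attributes."""
    patient_pkey: str
    date: dt.date
    drug_name: str
    dosage: str
    dose: str
    repeats: str


def group_by_patient(rows: Iterable[Prescription]) -> Dict[str, List[Prescription]]:
    """Collect each patient's prescriptions and order them by date."""
    histories: Dict[str, List[Prescription]] = defaultdict(list)
    for row in rows:
        histories[row.patient_pkey].append(row)
    for history in histories.values():
        history.sort(key=lambda row: row.date)
    return dict(histories)


# Hypothetical rows; a patient's prescriptions may arrive in any order.
rows = [
    Prescription("p1", dt.date(2009, 3, 1), "drug B", "1 daily", "1", "2"),
    Prescription("p2", dt.date(2009, 1, 5), "drug A", "1tds", "1", "0"),
    Prescription("p1", dt.date(2009, 1, 9), "drug A", "1 daily", "1", "0"),
]
histories = group_by_patient(rows)
print([row.drug_name for row in histories["p1"]])  # ['drug A', 'drug B']
```

Each resulting list is one patient's prescriptions in date order, which is the raw material for a prescription pathway.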
3.2 Pre-processing
The data we will be working with requires some cleaning and processing before it is suitable for
sequential pattern mining. We intend to modify the data to make it suitable, without jeopardising its
integrity or value. The first issue we have to address is the possibility of missing values. Missing
values can occur in any dataset, even though prescription data is among the most reliable
available (Hassey, Gerrett & Wilson 2001). We will disregard any patient who has a missing value in any of the
key attributes identified above in any of their tuples.
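The rule for missing values can be sketched as follows. The sketch assumes that each tuple is held as a dictionary keyed by attribute name and that a missing value appears as None or an empty string; both are assumptions, as is the function name drop_incomplete_patients.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# The key attributes named in section 3.1.
KEY_ATTRIBUTES = ("patient_pkey", "date", "drug_name", "dosage", "dose", "repeats")


def drop_incomplete_patients(rows: List[Row]) -> List[Row]:
    """Remove every tuple of any patient who has a missing key attribute in any tuple."""
    incomplete = {
        row.get("patient_pkey")
        for row in rows
        if any(row.get(attr) in (None, "") for attr in KEY_ATTRIBUTES)
    }
    return [row for row in rows if row.get("patient_pkey") not in incomplete]


# Hypothetical data: patient p2 has one tuple with no dose recorded.
rows = [
    {"patient_pkey": "p1", "date": "2009-01-09", "drug_name": "drug A",
     "dosage": "1 daily", "dose": "1", "repeats": "0"},
    {"patient_pkey": "p2", "date": "2009-01-05", "drug_name": "drug A",
     "dosage": "1tds", "dose": "", "repeats": "0"},
    {"patient_pkey": "p2", "date": "2009-02-05", "drug_name": "drug B",
     "dosage": "1 daily", "dose": "1", "repeats": "1"},
]
clean = drop_incomplete_patients(rows)
print([row["patient_pkey"] for row in clean])  # ['p1']
```

Dropping the whole patient rather than the single tuple keeps every remaining pathway complete, at the cost of discarding some otherwise valid prescriptions.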
The next step will be to replace the drug names with their respective ATC codes. We can do this by
creating a global in Cache that maps every drug name to its code. This global can be generated
from the World Health Organisation Collaborating Centre (WHOCC) online ATC classification
index. The WHOCC offers a service where a drug name can be entered and the respective ATC code
is returned. We can use this to map the drug names to ATC codes automatically and so create our
knowledge base.
Figure 5: using the WHOCC online ATC index
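The name-to-code lookup described above can be sketched in a few lines. This is a minimal Python illustration, not the Cache global the project intends to use. The table name `ATC_BY_NAME`, the helper `to_atc` and the three sample entries are assumptions made for illustration. A real knowledge base would be populated from the WHOCC index.

```python
# Hypothetical in-memory stand-in for the drug-name -> ATC code global.
# The three entries are illustrative samples, not taken from the dataset.
ATC_BY_NAME = {
    "amoxicillin": "J01CA04",
    "paracetamol": "N02BE01",
    "atorvastatin": "C10AA05",
}


def to_atc(drug_name):
    """Return the ATC code for a plain-text drug name, or None if unknown."""
    key = " ".join(drug_name.lower().split())  # normalise case and whitespace
    return ATC_BY_NAME.get(key)


# Names that cannot be mapped are collected so they can be resolved by hand.
names = ["Amoxicillin", "  PARACETAMOL ", "unknown drug"]
unmapped = [name for name in names if to_atc(name) is None]
```

Returning None for unknown names, rather than raising an error, keeps a batch run going and leaves a list of names for manual resolution.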
At this point we will also address one of the shortcomings of the ATC classification for this minor
thesis – by re-coding combination drugs. The ATC system assigns unique codes to certain
combination drugs. For our research, we would prefer if these combination drugs were represented
by the codes for each of their components – as if multiple single drugs were prescribed. We can do
this by simply modifying our global to reflect these changes.
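As a sketch of this re-coding step, the Python fragment below expands a combination code into the codes of its components. The names `COMBINATION_COMPONENTS` and `expand_combinations` are hypothetical, and the codes beginning with "X" are placeholders, not real ATC codes. The actual list of combination products would have to be compiled from the ATC index.

```python
# Placeholder codes: "X00XX99" stands for a combination product and
# "X00XX01" / "X00XX02" for its components. None of these are real ATC codes.
COMBINATION_COMPONENTS = {
    "X00XX99": ["X00XX01", "X00XX02"],
}


def expand_combinations(codes):
    """Replace each combination code with the codes of its components."""
    expanded = []
    for code in codes:
        # Codes that are not combinations pass through unchanged.
        expanded.extend(COMBINATION_COMPONENTS.get(code, [code]))
    return expanded
```

For example, `expand_combinations(["J01CA04", "X00XX99"])` yields the single drug followed by the two placeholder components, as if three single drugs had been prescribed.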
The next step of the pre-processing is to unify the forms of the dosage, dose and repeats. Different
GPs tend to record dosage information differently. Entries such as ‘1 daily’, ‘1tds’, ‘1 n’, etc.
need to be recoded to ‘1’. We can do this in our code by checking for all the different forms of
dosage and replacing each with its simpler representation.
Figure 6: re-coding dosage information
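One possible way to implement this re-coding is to keep only the leading number of each free-text entry. The Python sketch below assumes that every usable entry starts with its quantity, as the three examples above do. The function name `normalise_dosage` is hypothetical, and entries without a leading number are returned as None so they can be reviewed manually.

```python
import re

# Matches a leading quantity such as "1", "2" or "0.5", ignoring what follows.
_LEADING_NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")


def normalise_dosage(raw):
    """Reduce a free-text dosage such as '1 daily' or '1tds' to its leading number.

    Returns None when no leading number is found, so that the record can be
    flagged for manual review.
    """
    match = _LEADING_NUMBER.match(raw)
    return match.group(1) if match else None
```

Enumerating every known spelling explicitly, as the text proposes, would be stricter. The pattern-based version trades that strictness for coverage of spellings that have not yet been seen.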
Finally we must generate a prescription pathway and a therapeutic pathway for each patient. The
prescription pathway can be constructed by linking all sequential patient prescriptions. The
therapeutic pathway requires knowledge of dosages and pack sizes, to accurately portray which
drugs were being taken in combination at any given time. This can be realised by creating hybrid
events for multiple drugs being taken at the one time, for example “J01CE03 + J01CE04”. The
prescription pathway and therapeutic pathway will be associated with the respective patient
records.
Any ambiguities and unresolved cases will need to be resolved manually. At this point the
pre-processing will be complete, and we will have a prescription pathway and a therapeutic pathway
for each patient to perform sequential pattern mining on.
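The two pathway constructions can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layouts, the function names and the `days_supplied` value are all hypothetical. In practice `days_supplied` would have to be derived from the dosage and pack information. A hybrid event is emitted whenever the set of drugs in use changes.

```python
from collections import defaultdict
from datetime import date, timedelta


def prescription_pathways(records):
    """Group (patient, date, code) records into one date-ordered pathway per patient."""
    by_patient = defaultdict(list)
    for patient, day, code in records:
        by_patient[patient].append((day, code))
    return {p: [code for _, code in sorted(items)] for p, items in by_patient.items()}


def therapeutic_pathway(prescriptions):
    """Build hybrid events from (start_date, days_supplied, code) tuples for one patient."""
    intervals = [(start, start + timedelta(days=days), code)
                 for start, days, code in prescriptions]
    # The set of drugs in use can only change on a start date or an end date.
    boundaries = sorted({d for start, end, _ in intervals for d in (start, end)})
    pathway, previous = [], None
    for day in boundaries:
        # End dates are exclusive: a drug is no longer active on its end date.
        active = sorted({code for start, end, code in intervals if start <= day < end})
        event = " + ".join(active)
        if event != previous:
            if event:  # an empty event is a gap with no drug in use
                pathway.append(event)
            previous = event
    return pathway


example = therapeutic_pathway([
    (date(2010, 3, 1), 10, "J01CE03"),
    (date(2010, 3, 5), 10, "J01CE04"),
])
```

In this example the two prescriptions overlap from 5 March, so the pathway contains the single drug, then the hybrid event, then the second drug alone.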
3.3 Sequential Pattern Mining
We will be performing the sequential pattern mining based on the ATC drug codes. We will run a
series of experiments to search for patterns in the given dataset. We will use different levels of
granularity of the underlying data (using the ATC ontology to generate the experiments). We start
with the level five flat data, then go up and generate experiments by recoding individual members of
the pathways into higher levels of the ontology.
Figure 7: example of preparing the pathways for the sequential pattern mining
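Because each ATC level is identified by a fixed-length prefix of the seven-character code, recoding a pathway to a higher level amounts to truncation. The Python sketch below illustrates this. The function names are hypothetical, and the handling of hybrid events (components that share a parent collapse into one code) is an assumption, not a decision stated in this proposal.

```python
# Number of leading characters that identify each ATC level
# (1 = anatomical main group, 5 = chemical substance).
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def generalise_code(code, level):
    """Truncate a level five ATC code to the requested level."""
    return code[:ATC_LEVEL_LENGTH[level]]


def generalise_event(event, level):
    """Recode an event, which may be a hybrid such as 'J01CE03 + J01CE04'."""
    parts = {generalise_code(c.strip(), level) for c in event.split("+")}
    return " + ".join(sorted(parts))


def generalise_pathway(pathway, level):
    """Recode every event of a pathway to the requested ATC level."""
    return [generalise_event(event, level) for event in pathway]
```

Running `generalise_pathway` once per level over every patient pathway would produce the input for each experiment in the series.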
The tests are likely to take a long time, so the program will be designed to run in batch mode. In all
these experiments we will observe the strength of patterns found, and analyse them at the end of
the process.
3.4 Results analysis
Once we have the results of the pattern mining, we will reflect on the strength of the rules, and also
on their interestingness. The strength will purely be a measure of how supported the patterns are in
the dataset. The interestingness will be a human judgement as to how usable and apparent the
information in the pattern is. For example, a pattern that shows a clear link between two seemingly
unrelated groups would be very interesting. A pattern that is contaminated by drugs unrelated to
the actual prescription will be uninteresting. Similarly a pattern that has been too diluted to be
interpreted efficiently will be uninteresting. Ideally we would conduct a user survey, and ask industry
professionals and general practitioners to reflect on the interestingness and usefulness of the
patterns. However, due to time constraints and ethical considerations, this will most likely not be
possible, and our own professional judgement will be used instead.
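The strength measure can be made concrete as support: the fraction of patient pathways that contain a pattern. The Python sketch below assumes the usual subsequence definition, in which the events of the pattern must appear in order but other events may occur in between. The function names `contains` and `support` are hypothetical, and the exact definition used in the experiments may differ.

```python
def contains(pathway, pattern):
    """True if the pattern's events appear in the pathway in order, gaps allowed."""
    remaining = iter(pathway)
    # Each membership test consumes the iterator up to the matching event,
    # so later events of the pattern must be found further along the pathway.
    return all(event in remaining for event in pattern)


def support(pathways, pattern):
    """Fraction of pathways that contain the pattern as a subsequence."""
    if not pathways:
        return 0.0
    return sum(contains(p, pattern) for p in pathways) / len(pathways)
```

Interestingness has no such formula here, since it is described above as a human judgement.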
3.5 Expected Outcomes
If the experiments are successful, we expect a number of outcomes. We expect that as we move up
the ontology, the strength of the patterns will at least remain consistent, and is likely to increase.
Logically, if there is a pattern at the fifth level, then it will also be present at the fourth level, because
the fourth level encapsulates the fifth level. For example, if we find the pattern “A00AA00 ->
B00BB00”, then “A00AA -> B00BB” will occur at least as frequently, so it will also be a pattern. The
patterns also have the potential to strengthen as we travel up the ontology, as the number of
distinct codes decreases. For example, if we have a prescription pathway of “A00AA00 -> B00BB00 ->
A00AA01 -> B00BB02”, the base-level pattern mining will not find any repeated pattern, but at the
fourth level the pattern “A00AA -> B00BB” occurs twice: “A00AA -> B00BB -> A00AA -> B00BB”. This increase in the
quantity and strength of patterns does not necessarily mean better patterns. We may have more
support for the patterns as we travel up the ontology, but they will be more generalised. This may be
useful, but it may also be detrimental to the interpretation of the results if the path is too
contaminated. This will be reflected in the interestingness.
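The effect of moving up one level can be checked on a small example. The Python sketch below counts consecutive pairs in a single pathway at the fifth and fourth levels. The seven-character codes are placeholders in the style of the example above, not real drugs, and the function name `adjacent_pairs` is hypothetical.

```python
from collections import Counter


def adjacent_pairs(pathway, prefix_length):
    """Count consecutive pairs after truncating each code to prefix_length characters."""
    codes = [code[:prefix_length] for code in pathway]
    return Counter(zip(codes, codes[1:]))


pathway = ["A00AA00", "B00BB00", "A00AA01", "B00BB02"]  # placeholder codes
level5 = adjacent_pairs(pathway, 7)  # full level five codes
level4 = adjacent_pairs(pathway, 5)  # level four prefixes
```

At level five every pair occurs once, while at level four the pair ("A00AA", "B00BB") occurs twice, which is the kind of strengthening the expectation above describes.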
From this research we hope to learn more about how the use of an ontology can impact the use of
sequential pattern mining. We expect that more patterns will be discoverable, and certainly more
useful patterns will emerge, but it is also likely that some quite trivial and contaminated patterns will
appear, which will need to be addressed in the interestingness of the patterns.
4 Ethical Considerations
This research is based on real prescription data from general practices. The data used for this
research was used in similar previous studies and was properly de-identified (i.e. no identifiers or
other data related to the identity of the patient are contained in the data). We will seek ethics
approval to re-use the dataset for the current study. The data that is collected will need to be stored
for seven years after the research is concluded, in line with standard university procedures.
An ethics approval form has been submitted to ensure the utmost adherence to ethical research
policies; it is included in Appendix A.
5 Bibliography
Agrawal, R & Srikant, R 1995, 'Mining Sequential Patterns', paper presented at the Eleventh
International Conference on Data Engineering.
Bei, A, Luca, SD, Ruscitti, G & Salamon, D 2005, 'Health-Mining: a Disease Management Support
Service based on Data Mining and Rule Extraction', paper presented at the Engineering in Medicine
and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the.
Chen, M, Han, J & Yu, P 1996, 'Data mining: An overview from a database perspective', IEEE
Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883.
Chen, M, Park, J & Yu, P 1998, 'Efficient data mining for path traversal patterns', IEEE Transactions on
Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221.
Dowsey, M, Kilgour, M, Santamaria, N & Choong, P 1999, 'Clinical pathways in hip and knee
arthroplasty: a prospective randomised controlled study', Medical Journal of Australia, vol. 170, pp.
59-61.
Friede, A, Blum, H & McDonald, M 1995, 'Public health informatics: how information-age technology
can strengthen public health', Annual Review of Public Health, vol. 16, no. 1, pp. 239-252.
Friedman, D 2006, 'HIPAA and research: how have the first two years gone?', American journal of
ophthalmology, vol. 141, no. 3, p. 543.
Garofalakis, M, Rastogi, R & Shim, K 1999, 'SPIRIT: Sequential pattern mining with regular expression
constraints', paper presented at the 25th International Conference on Very Large Data Bases.
Gruber, T 1993, 'A translation approach to portable ontology specifications', Knowledge Acquisition,
vol. 5, no. 2, pp. 199-220.
Han, J, Dong, G & Yin, Y 1999, 'Efficient mining of partial periodic patterns in time series database',
paper presented at the International Conference on Data Engineering 1999.
Hassey, A, Gerrett, D & Wilson, A 2001, 'A survey of validity and utility of electronic patient records
in a general practice', British Medical Journal, vol. 322, no. 7299, p. 1401.
Hillestad, R, Bigelow, J, Bower, A, Girosi, F, Meili, R, Scoville, R & Taylor, R 2005, 'Can electronic
medical record systems transform health care? Potential health benefits, savings, and costs', Health
Affairs, vol. 24, no. 5, p. 1103.
Hodge, J, Gostin, L & Jacobson, P 1999, 'Legal Issues Concerning Electronic Health Information:
Privacy, Quality, and Liability', JAMA, vol. 282, no. 15, pp. 1466-1471.
Johnson, P, Tu, S, Musen, M & Purves, I 2001, 'A virtual medical record for guideline-based decision
support', paper presented at the AMIA Annual Symposium 2001.
Lenič, M & Kokol, P 2002, 'Combining classifiers with multimethod approach', Soft computing
systems: design, management and applications, p. 374.
Lin, F, Chiu, C & Wu, S 2002, 'Using Bayesian networks for discovering temporal-state transition
patterns in Hemodialysis', paper presented at the 35th Annual Hawaii International Conference on
System Sciences.
Lin, F, Chou, S, Pan, S & Chen, Y 2001, 'Mining time dependency patterns in clinical pathways',
International Journal of Medical Informatics, vol. 62, pp. 11-25.
Lin, F, Hsieh, L & Pan, S 2005, 'Learning Clinical Pathway Patterns by Hidden Markov Model', paper
presented at the 38th Annual Hawaii International Conference on System Sciences
McInnes, D, Saltman, D & Kidd, M 2006, 'General practitioners' use of computers for prescribing and
electronic health records: results from a national survey', Medical Journal of Australia, vol. 185, no.
2, p. 88.
Noy, N & McGuinness, D 2001, Ontology development 101: A guide to creating your first ontology,
Citeseer.
Parthasarathy, S, Zaki, M, Ogihara, M & Dwarkadas, S 1999, 'Incremental and interactive sequence
mining', paper presented at the Eighth International Conference on Information and Knowledge
Management.
Pinto, H, Han, J, Pei, J, Wang, K, Chen, Q & Dayal, U 2001, 'Multi-dimensional sequential pattern
mining', paper presented at the Tenth International Conference on Information and Knowledge
Management.
Rector, A, Zanstra, P, Solomon, W, Rogers, J, Baud, R, Ceusters, W, Claassen, W, Kirby, J, Rodrigues, J
& Mori, A 1998, 'Reconciling users’ needs and formal requirements: issues in developing a reusable
ontology for medicine', IEEE Transactions on Information Technology in Biomedicine, vol. 2, no. 4, p.
229.
Riou, C, Pouliquen, B & Beeux, PL 1999, 'A computer-assisted drug prescription system: the model
and its implementation in the ATM knowledge base', Meth Inform Med, vol. 38, pp. 25-30.
Solomon, W, Wroe, C, Rector, A, Rogers, J, Fistein, J & Johnson, P 1999, 'A reference terminology for
drugs', paper presented at the AMIA Annual Symposium 1999.
Stanek, J, Iankov, I, Gadzhanova, S, Warren, J & Misan, G 2005, 'Guideline-based General Practice
Data Mining', HIC 2005 and HINZ 2005: Proceedings, p. 254.
WHO 2010, 'World Health Organisation Collaborating Centre for Drug Statistics Methodology',
http://www.whocc.no/.
Wroe, C, Solomon, W, Rector, A & Rogers, J 2000, 'DOPAMINE: a tool for visualizing clinical
properties of generic drugs', paper presented at the 14th European Conference on Artificial
Intelligence.
Zhang, M, Kao, B, Yip, C & Cheung, D 2001, 'A GSP-based efficient algorithm for mining frequent
sequences', paper presented at the International Conference on Artificial Intelligence 2001.
Zhao, Q & Bhowmick, S 2003, 'Sequential pattern mining: A survey', Technical Report, CAIS, Nanyang
Technological University, Singapore, pp. 1–26.
6 Project Plan
Date
5th March 2010
12th March 2010
13th March – 15th March 2010
16th March - 31st March 2010
1st April – 14th April 2010
15th April 2010
16th April – 16th May 2010
17th May – 1st June 2010
3rd June – 8th June
11th June 2010
13th June 2010
14th June - 25th July 2010
26th July – 1st August 2010
2nd August – 22nd August 2010
23rd August – 30th September 2010
10th October 2010
24th October 2010
22nd November 2010
29th November 2010
Task
Choose Supervisor
Decide on field of thesis
Develop Project Plan
Research topic
Write annotated bibliography
Submit Annotated Bibliography
Finalise Research Question
Write Minor Thesis introduction, literature review
Work on ethics proposal for obtaining data
Write Minor Thesis ethical considerations
Prepare presentation slides
Write Minor Thesis methodology
Finalise presentation slides
Minor Thesis Proposal presentation
Submit Minor Thesis Proposal
Submit Ethics Proposal Form
Familiarising with Cache programming suite
Program Pre-processing, pathway output
Program testing
Implementing Sequential Pattern Mining methods
Evaluating Sequential Pattern Mining results
Write Minor Thesis results and discussion
Minor Thesis draft to supervisor
Submit Minor Thesis
Comments for corrections received, adjust Minor Thesis
Submit Final bound copies
Appendix A – Ethics Approval Application
University of South Australia
Human Ethics Application
Protocol Number : 0000020574
Application Title : Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways
Date of Submission : N/A
Primary Investigator : Mr Matheson Lee Ramsey
Prior Assessment
Non-UniSA HREC
UniSA HREC
Project details
Research Ethics Advisor
Project type
1.1 Has another Human Research Ethics Committee (other than UniSA) reviewed this research project before and does
this clearance/approval accurately describe the project as it is to be conducted?*
Yes No
2.1 Is this application a resubmission of an application that was considered by UniSA HREC and the decision was 'Not
Approved: Resubmit', 'Not Approved' or "Approved subject to" and the status has expired (ie amendments were not
made within the 6 month timeframe)?
Please note if your application is Approved subject to and 6 months has not lapsed then you should use the application
you submitted to make the required changes. *
Yes: Not approved: resubmit
Yes: Not Approved
Yes: Approved subject to and the status has expired
No
3.1 Name of Research Ethics Adviser
This question is not answered.
3.2 Has the Research Ethics Adviser conducted an ethics workshop in the last 12 months?*
No
3.3 Have you attended human ethics training in the last 12 months?*
Yes No
4.1 Main type of research (e.g. staff, PhD). *
Honours
Course Approval
PhD
Masters by Course work
Masters by Research
Professional Doctorate
Undergraduate
Graduate Diploma / Graduate Certificate
Staff
Other
4.2.1 Please note that, if you are a student applicant, your application will be forwarded to your principal supervisor
once submitted for their approval. If they are satisfied with your application it will be forwarded to the relevant review
group. If your supervisor requires changes to be made then your application will be returned to you to make the
required changes.
4.3 Other type of research (e.g. staff, PhD). Please select all that apply*
None
Honours
Course Approval
PhD
Masters by Course work
Masters by Research
Professional Doctorate
Undergraduate
Graduate Diploma / Graduate Certificate
Staff
Other
Project details
Resources
Project funding
Ownership of Data
5.1 Title of research project*
Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways
5.2 Plain English title*
Testing the impact on path mining success of applying a hierarchical structure to data that is
normally not stored in a hierarchy, such as prescription data
5.3 What are the aims of your research*
-evaluate the impact and usefulness of using a hierarchy to store typically flat data
-evaluate what levels in the hierarchy produce the strongest and most useful paths
5.4 List your research questions or hypotheses. Your protocol should clearly identify the questions which you want your
research to answer.*
What is the impact on path mining of applying a hierarchy structure to typically flat data?
5.5 Explain the need for, and value of, your research. Place the aims in the context of existing research or practice.
(You must include a list of not more than 10 key references as an attachment to support your answer to this question.
These are to be attached to the Attachment tab available from the Application Overview screen).*
The research presents an opportunity to explore and understand how vague information can be before it is no
longer useful (the higher up a hierarchy the more vague the information is, as it encompasses more
elements).
It also presents an opportunity to discover the impact of using a hierarchy to elaborate typically flat data. If a
marked improvement is found, this could lead to the adoption of hierarchical structures for other applications,
which could lead to increased running speeds or more accurate path mining, depending on the domain.
There is a need to explore the impact of using a hierarchy as some flat data (such as prescription drug
information) is too vast to perform complete path mining in feasible time frames.
5.6 Proposed commencement date*
05/07/2010
5.7 Proposed completion date*
01/11/2010
6.1 Have you applied for funding for this project (other than divisional funds)*
Yes No
8.1 Detail who will own the data and the results of your research (student researchers normally own their own research
and data unless there is a written agreement between the student and the University / third party; staff research and
data is normally owned by UniSA).
Please select all that apply.*
UniSA
Student researcher
Other
8.2 Does the owner of the information or any other party have any right to impose limitations or conditions on the
publication of the results
of this project?*
Yes No
8.3 Please note that it is the researcher's responsibility to ensure that, where required, an appropriate agreement is in
place. If you are unsure whether this is needed, please consult the UniSA website . Do you require an agreement
regarding ownership or do you currently have an agreement in place?*
An agreement is required A signed agreement is in place An agreement is not required
Please note that you must inform UniSA HREC once the agreement has been signed. Final ethics approval cannot be
given until confirmation is received.
9.1 The information which will be stored at the completion of this project is of the following type(s). Please select all
that apply.*
Individually identifiable
Re-identifiable
Non-identifiable
9.2 Where will the data be stored (please be specific with the address e.g. If stored at UniSA please specify which
campus and the office/room location)*
the data will be stored at the Mawson Lakes campus of UniSA in D2-03
9.3 For how long will the information be stored after the completion of the project? Why has this period been chosen?*
5 years - to ensure any queries after the completion of the project can be answered, and to quell any later
accusations of copying.
9.4 In what formats will the information be stored during the research project? (eg. paper copy, computer file on floppy
disk or CD, audio tape, USB memory stick, videotape, film). *
computer file
9.5 How will information, in all forms, be disposed of after the retention time has lapsed? (Please refer to the Ownership
and Retention of Data Policy. The Head of School (or equivalent) must be aware of this process.)*
deletion of computer file and any backups (on the single same machine)
9.6 Will any other individual(s), organisation(s) or researcher(s) (other than those listed on the Investigators tab) have
authority to use or have access to the information? *
Yes No
9.7 Specify the measures to be taken to ensure the security of information from misuse, loss, or unauthorised access
while stored during the research project? (eg. will identifiers be removed and at what stage? Will the information be
physically stored in a locked cabinet?)*
the data will be immediately de-identified as this identifying information is of no use to the study. The data
will be stored on a laptop computer using a strong user password to protect from misuse.
9.8 What arrangements are in place with regard to the storage of the information collected for, used in, or generated
by this project in the event that the principal researcher / investigator ceases to be engaged at the current
organisation? (Please refer to the Ownership and Retention of Data Policy.)*
If the principal researcher ceases to be engaged in the study the data will become the responsibility of the
supervisor Jan Stanek
10.1 Please refer to the UniSA website: Do you require insurance cover for this project?*
Yes No
11.1 Is the activity archival research? A large proportion of activity involving the analysis of documents, publicly
available information, or previously collected data may be outside the scope of the University's human research ethics
arrangements.*
Yes No
11.2 Is the work being conducted only for UniSA administrative / service delivery purposes?*
Yes No
Scope
Research type and participants
Research type
Participant information
12.1 Should the work be characterised as quality assurance or an audit, rather than human research within the scope of
the University's human research ethics arrangements?*
Yes No
12.2 Is the work a practical exercise or test conducted for teaching purposes in a University administered facility? (
Please refer to Appendix 2 of Guidelines for Evaluation Activities Involving UniSA Students and Staff) *
Yes No
13.1 Is the work a routine experiment or procedure conducted for teaching purposes in a University administered
facility? *
Yes No
13.2 Is the work / data collection conducted by a student only for teaching / learning purposes? *
Yes No
13.2.1 Will the results be published / presented in any way other than a paper / product produced purely for
assessment purposes ?*
Yes No
14.1 This project involves: (Please select all that apply.)*
Research using qualitative methods
Research using quantitative methods, population level data or databanks, e.g survey research, epidemiological research
None of the above
14.2 What research methodologies will you use? (Please select all that apply.) *
Anonymous questionnaires
Internet questionnaires
Questionnaires requesting intimate personal, identifying, or sensitive information
Other questionnaires
Face to face interviews which do not request personal or sensitive information
Face to face interviews which request personal or sensitive information
Telephone survey which does not request personal or sensitive information
Telephone survey which requests personal or sensitive information
Focus groups
Action Research
Observation of participant's usual activities
Observation of an activity set up for the purposes of the study
Access to medical records (or records which contain intimate personal information, and are individually identifiable and
are not publicly available)
Experiment or testing of a procedure, drug or equipment
Use of biological hazards, GMOs or pathogenic organisms
Use of carcinogenic and/or toxic chemicals, including heavy metals
Use of Radiation (Ionising and/or Non-ionising, but not Ultrasound)
Other
14.2.1 Please describe what research methodology you will use.*
none of these methodologies apply. We are only interested in obtaining the de-identified data.
14.3 Will you be audio-taping, video-taping, or taking photographs of participants during the course of the study?
Please select all that apply.*
Audio-taping
Videotaping
Photographs
No
Selection of participants
Project start, end, location details
Irregular consent process
Limited disclosure / waive consent
Covert observations
15.1 How many participant groups are involved in this research project? *
0
15.3 What is the expected total number of participants in this project at all sites?*
0
16.1 What process(es) will be used to identify potential participants?*
there are no participants
16.2 Will potential participants be 'screened' or given a test/questionnaire to assess their suitability as a participant for
the study?*
Yes No
16.3 Describe how initial contact will be made with potential participants.*
No contact
16.4 Is an advertisement, e-mail, website, letter or telephone call proposed as the form of initial contact with potential
participants?*
Yes No
16.5 List the selection and, if appropriate to your study, the exclusion criteria for participants.*
there are no participants
16.6 If it became known that a person or participant group was recruited to, participated in, or was excluded from the
research, would that knowledge expose the person to any disadvantage or risk?*
Yes No Not Applicable
17.1 Will the research be undertaken in Australia?*
Yes No
17.1.1 In which town(s)/city(ies)/State(s) of Australia will the research be undertaken in? *
Adelaide, South Australia
17.1.2 In how many Australian organisations will the research be conducted? *
0
17.2 Will the research be undertaken overseas?*
Yes No
17.3 Are there any time-critical aspects of the research project of which the review committee should be aware?*
Yes No
18.1 Does the research involve limited disclosure to participants. Please refer to the National Statement. *
Yes No
18.2 Are you asking the HREC / review body to waive the requirement of consent? Refer to the National Statement*
Yes No
19.1 Does the research involve covert observation? Refer to the National Statement*
Yes No
Deception
Project type
Participants
Recruitment
Risk to Participants
Risk to participants
Right to Privacy
20.1 Does the research involve deception. Refer to the National Statement*
Yes No
21.1 Does the research involve any of the following? Please select all that apply.*
Drugs, narcotics, poisons, placebo will be ingested / injected, or an invasive procedure will be administered
Clinical trials
Cellular therapy
The collection and / or use of human samples. This includes tissue, blood or other body fluid collection / extraction
Genetic testing and/or genetic research
Human gametes or use or creation of human embryos
A practice or intervention which is an alternative to a standard practice or intervention
Investigating workplace practices which could possibly impact on workplace relationships
Conducting the research overseas and recruiting participants
None of the above
38.1 Who will you be recruiting as participants for this study? (If there is a high chance that you will be recruiting one
of these groups, you
should also select that participant group).*
General public (over 18 years of age)
Members of a collectivity
People whose first language is not English
People who are illiterate
Pregnant women/human foetus
Children
People who are in a dependent or unequal relationship
People who are highly dependent on medical care
People with a cognitive impairment
Aboriginal and/or Torres Strait Islander peoples
People who may be involved in illegal activity
Not recruiting participants
Other
38.2 Does the research involve issues likely to be considered significant to Indigenous peoples?*
Yes No Not Applicable
51.1 Please select all that apply. This research project:*
Has the potential to expose participants to potential civil, criminal or other proceedings
Makes it possible for third parties to identify participants
Involves a risk of physical injury
Involves human exposure to ionising and/or non-ionising radiation (including X-ray)
Involves exposure to disease or infection
Involves pain or significant discomfort
Involves psychological or emotional stress
Involves sensitive personal information
Could expose participants to potential loss of professional reputation, market standing, or employability
Could result in significant negative impact upon personal relations
Offers an inducement which could be considered coercive
Involves the participation of people who legally cannot provide voluntary and informed consent for their participation in
research
None of the above
Collection method
Participants Relationships
Consent
Consent process
66.1 Does IS42 or the Commonwealth Privacy Act apply to the research (eg access to identified personal data held by
third parties subject to privacy regimes)? Refer to the Privacy law*
Yes No
67.1 Will the source of the information about participants used in this research project be collected directly from the
participant? (e.g. asking participants directly about their medical history)*
Yes No
67.2 Will the source of the information about participants used in this research project be collected from another person
about the participant? (e.g. asking participants' doctors about their patients medical history)*
Yes No
67.3 Will the source of the information about participants involve the use or disclosure of information by an agency,
authority or organization (other than UniSA)? (e.g. accessing participants' medical records)*
Yes No
67.4 Will the source of the information about participants involve the use of information which you or your organisation
collected previously for a purpose other than this research project?*
Yes No
67.5 Describe how information collected about participants will be used in this project.*
data will be de-identified immediately as the identifying information poses no use for the study.
67.6 Indicate whichever of the following applies to this project: Please select all that apply.*
Information collected for, used in, or generated by, this project will not be used for any other purpose.
Information collected for, used in, or generated by, this project will/may be used for another purpose by
the researcher for which ethical approval will be sought.
Information collected for, used in, or generated by, this project is intended to be used for establishing a database/data
collection/register for future use by the researcher for which ethical approval will be sought.
Information collected for, used in, or generated by, this project will/may be made available to a third party for a
subsequent use for which ethical approval will be sought
Other
68.1 Is there an existing relationship or one likely to arise during the research, between the potential participants and
any member of the research team or an organisation involved in the research?*
Yes No
68.2 Does the researcher / investigator have another role in relation to the participant?*
Yes No
68.3 Will the research impact upon, or change, an existing relationship between participants and researcher /
investigator or organisations.?*
Yes No
69.1 Will consent for participation in this research be sought from all participants? Refer to the National Statement*
Yes No
69.1.1 Explain why consent will not be sought from all participants.*
there are no participants.
70.1 Describe the consent process, ie how participants or those deciding for them will be informed about, and choose
whether or not to
participate in, the project.*
no participants
Risks and benefits
Risks and benefits cont.
Researcher training
70.2 If a participant or person on behalf of a participant chooses not to participate, are there specific consequences of
which they should be made aware, prior to making this decision?*
Yes No
70.3 If a participant or person on behalf of a participant chooses to withdraw from the research, are there specific
consequences of which they should be made aware, prior to giving consent?*
Yes No
70.4 Can individual participants be identifiable by other members of their group? (e.g. co-workers, focus group
members etc.)*
Yes No
70.7 Will consent be specific or extended or unspecified? Refer to section 2.2.14-2.2.18 of the National Statement*
Specific Extended Unspecific
Please note that when answering the following questions, only risks beyond those encountered in
everyday life are relevant. Refer to the National Statement
71.1 Are there any risks to participants as a result of participation in this research project (eg physical, psychological,
spiritual, emotional,
legal, social, financial well-being, employability or professional relationships)?*
Yes No
71.2 What expected benefits (if any) will this research have for the wider community?*
- Provide insight into how specific information must be to remain useful, i.e. in a hierarchical structure,
how far up the hierarchy can we go before the mined paths become too vague?
- Allow for future work that can autonomously detect the most useful level of information for a specific
purpose. This could result in more accurate prescriptions and more successful treatments.
71.3 What expected benefits (if any) will this research have for participants?*
Because the data is de-identified, there will be no personal benefit to participants.
71.4 Are there any other risks involved in this research? eg. to the research team, the organisation, others (eg physical,
psychological, spiritual, emotional, legal, social, financial well-being, employability or professional relationships)*
Yes No
72.1 Is it anticipated that the research will lead to commercial benefit for the investigator(s) and/or the research
sponsor(s)?*
Yes No
72.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether
their physical, psychological, spiritual, emotional, legal, social or financial well-being, or to their employability or
professional relationships - or to their communities?*
Yes No
72.3 Describe how the researchers / investigators intend to monitor the conduct and progress of the research project?*
- Data will be de-identified at the first opportunity. There will be no opportunities for misconduct as long as the
data is successfully de-identified.
72.4 It is mandatory for researchers to report suspected cases of child abuse/neglect, domestic violence,
bullying, illegal activities, use of illicit substances, abuse of elderly persons, professional negligence etc.
72.4.1 Is it likely that this will be disclosed during the course? *
Yes No
73.1 List the relevant qualifications, experiences and /or skills of the research team which equip them to conduct this
research*
3 years study at UniSA learning ethical conduct
Minor experience in health informatics with research placement and ongoing work with minor thesis
Reporting of results
Peer review
73.2 Do the researchers involved in this research project require any additional training in order to undertake this
research?*
Yes No
74.1 Is it intended that results of the research that relate to a specific participant be reported to that participant?*
Yes No Not Applicable
74.2 Is the research likely to produce information of personal significance to individual participants?*
Yes No
74.3 Will individual participants' results be recorded with their personal records?*
Yes No Not Applicable
74.4 Is it intended that all or some of the results that relate to a specific participant be reported to anyone other than
that participant?*
Yes No
74.5 Will research participants have the opportunity to receive a copy of your final report or summary of the findings if
they wish?*
Yes No
74.5.2 Why will participants not be provided with a copy of the final report or summary of the findings?*
there are no participants
75.1 Is the research likely to reveal a significant risk to the health or well being of persons other than the participant
(eg family members, colleagues)?*
Yes No
75.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether
their physical, psychological, spiritual, emotional, social or financial well-being, or to their employability or professional
relationships - or to their communities?*
Yes No
75.3 How is it intended to disseminate the results of the research? Please select all that apply.*
Thesis/dissertation
Journal article/s
Research paper
Conference presentation
Commissioned report
Other
75.4 Will the confidentiality of participants and their data be protected in the dissemination of research results?*
Yes No Not Applicable
75.4.1 Explain how confidentiality of participants and their data will be protected in the dissemination of research
results*
De-identified data will be used, so no confidential information will be revealed in the dissemination.
76.1 Provide details of the anticipated duration of the data collection / human research phase of the project.*
Simply obtain some data from a previous researcher and/or database; collection should take no longer than 1-2
days.
76.2 Has the research proposal, including design, methodology and evaluation undergone, or will it undergo, a peer
review process?*
Yes No
Declaration
The Primary Contact for this project is responsible for the application that is submitted and must be the
one to agree to the following statement.
"On behalf of the research team for this project, I confirm that all members of the research team have read the current
NHMRC National Statement on Ethical Conduct in Human Research. The research team accepts responsibility for the
ethical and appropriate conduct of the procedures detailed in this application, confirm that the research team will
conduct this project in accordance with the principles described in the National Statement, and confirm that the
research team will comply with any other condition laid down by the University of South Australia's Human Research
Ethics Committee."*
I agree