Matheson Ramsey
ramml003

Using an ontology in place of flat data for Sequential Pattern Mining

A minor thesis for the degree of Bachelor of Computer Science (Honours)
School of Computer and Information Science
University of South Australia
13/06/2010
Supervisor: Jan Stanek

Table of Contents

Glossary
1 Introduction
  1.1 Motivation
  1.2 Potential Contributions
  1.3 Field of thesis
  1.4 Research Question
2 Literature Review
  2.1 Data Mining in Health Informatics
  2.2 Sequential Pattern Mining
  2.3 Drug Ontologies
  2.4 Electronic Health Records
3 Methodology
  3.1 Raw data
  3.2 Pre-processing
  3.3 Sequential Pattern Mining
  3.4 Results analysis
  3.5 Expected Outcomes
4 Ethical Considerations
5 Bibliography
6 Project Plan
Appendix A – Ethics Approval Application

Table of Figures

Figure 1: a prescription pathway
Figure 2: a therapeutic pathway
Figure 3: example of ATC drug hierarchy for Propicillin
Figure 4: Program process flow
Figure 5: using the WHOCC online ATC index
Figure 6: re-coding dosage information
Figure 7: example of preparing the pathways for the sequential pattern mining

Glossary

Pathway: A sequence of drug prescriptions over time.
Clinical Pathway (CP): A pathway that has been designed to be followed in order to treat a certain condition or disease.
Prescription Pathway (PP): The pathway as seen by the prescriber. It is only concerned with showing what drugs were prescribed and when; it does not involve which drugs are being consumed at the same time.
Therapeutic Pathway (TP): The pathway as seen by the recipient. It takes into account dosages, so that it is apparent when multiple drugs are being taken at the same time, and also when no drugs are being taken.
Tuple: A single item in a dataset; one row of values.
Flat data: All attributes in the data are numeric or categorical; no embedded objects, objects within objects, etc.
Node: A single item that links to others. If a node links to another node, it is that node's parent. If a node is linked to by another node, it is that node's child.
Hierarchy: A data structure that has multiple levels of nodes.
Ontology: A hierarchy with a strict "is-a" relationship between parent and child levels of nodes.
Granularity: The level of the hierarchy that is being referred to. A higher granularity implies working further down the hierarchy (more granular, hence more specific), while a low granularity implies working higher up the hierarchy (less granular, hence more general).
Contamination: Using a level of the hierarchy that is too high, so there are too many unrelated subgroups included.
Dilution: Using a level of the hierarchy that is too low, so information becomes overwhelming or too specific.
Pattern: A series of prescriptions that occurs often in the dataset.
Sequential Pattern Mining (SPM): The process of extracting patterns from the dataset.

1 Introduction

The use of electronic support systems in health care is increasingly important (Hillestad et al. 2005). Studies have shown that 90% of general practitioners (GPs) use a clinical software package, and 98% of these GPs use the package for prescribing (McInnes, Saltman & Kidd 2006). This means there is an abundance of rich health data available on general practitioners' computers. However, much of this data is stored as free text, and as a result it is not easily interpreted. Because writing prescriptions on a computer offers several benefits for GPs, prescription data is one of the most complete and structured types of data in general practice (Hassey, Gerrett & Wilson 2001).

The prescription data can be represented in a number of ways to aid GPs, such as tabular summaries (Wroe et al. 2000) or a series of connected nodes in which each node represents a prescription, forming a prescription pathway (Stanek et al. 2005). A prescription pathway represents all the prescriptions for a particular patient over a certain period of time, as prescribed by the GP. We can also capture the therapeutic pathway, which combines the drug prescriptions with the amounts prescribed to give an idea of which combination of drugs was being taken at any given time. These concepts are visualised in Figures 1 and 2.
Creating pathways can make the data more interpretable and easier to follow, and additional relationships between prescriptions may become apparent.

Figure 1: a prescription pathway

Figure 2: a therapeutic pathway

These concepts of pathways are based on the use of flat prescription data. However, drugs are by nature hierarchical and can be modelled in an ontology such as the Anatomical Therapeutic Chemical (ATC) drug classification, where different levels of the ontology represent different groups of drugs (WHO 2010). The levels range from specific chemical substances to broad anatomical main groups (see Figure 3). As a reference, Figures 1 and 2 can be seen as using the fifth and most specific level of the ontology. If an ontology like this is used with the pathways, it is possible to analyse the prescription pathway at different levels of granularity to obtain different kinds of information about the prescriptions for different methods and applications.

Figure 3: example of ATC drug hierarchy for Propicillin

Different information exists at different levels of the hierarchy. A low granularity implies a more general pathway (for example, level 1 or 2) and might give the observer a simpler understanding of what the pathway is trying to achieve. A high granularity gives a more specific pathway (for example, level 5), which can be used to copy the pathway at the most granular level.

However, changing the granularity can have undesirable effects. For example, Figure 3 shows a model for Propicillin. At level three we can see that it is part of the penicillins group. We know that some people have allergies to penicillins, so if we want to data-mine for this pattern, then logically the best level to operate at is level three. If we list all the drugs at more specific levels (e.g. J01CE03 – Propicillin), the pattern will get weaker, as it will be spread across a large number of variables; the pattern will be diluted.
Conversely, if we use a lower granularity (e.g. J01 – Antibacterials for systemic use), the pattern showing penicillin allergy may disappear, as the J01 group contains other groups of antibacterial drugs without penicillin-type allergy; the pattern becomes contaminated by unrelated drugs. The concepts of dilution and contamination will be used throughout this paper as defined here. These notions highlight the potential importance of incorporating an ontology knowledge base for the discovery of patterns.

The basic approach to the discovery of these pathways is sequential pattern mining (SPM). Sequential pattern mining is the process of trying to find the relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences (Zhao & Bhowmick 2003). By performing SPM on the prescription data and applying the knowledge of a drug ontology, we have the potential to find many new and interesting patterns that are not present in the mining of flat data. There are many algorithms to perform sequential pattern mining, which we will explore in section 2.2.

1.1 Motivation

Traditional path-mining algorithms work on flat data without taking hierarchies into account. This may not be sufficient in some cases: given the concepts of dilution and contamination, we expect that some important patterns can be missed if the underlying data is not processed at the optimal level of granularity. It is important to investigate this use of an ontology to explore what difference it makes to the success of the data mining.

The prescription data we are using is by nature hierarchical, and so we have the opportunity to explore the impact of an ontology on the sequential pattern mining. There is currently no indication as to what effect a change in granularity will have on the usefulness of the sequential pattern mining.
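To make the dilution and contamination effects introduced above concrete, the following is a minimal sketch, assuming only the standard ATC code layout (one character for level 1, three for level 2, four for level 3, five for level 4, seven for level 5). The function names and the toy list of codes are illustrative assumptions and are not taken from the thesis data.

```python
from collections import Counter

# Number of leading characters that identify each ATC level
# (1 = anatomical main group, 5 = chemical substance).
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_level(code, level):
    """Return the ancestor of a level-5 ATC code at the requested level."""
    return code[:ATC_PREFIX_LENGTH[level]]


def counts_at_level(codes, level):
    """Count how many prescriptions fall into each group at a given level."""
    return Counter(atc_level(code, level) for code in codes)


# Toy prescriptions (illustrative only): three penicillins (J01C...)
# and two non-penicillin antibacterials (J01F..., J01A...).
prescriptions = ["J01CE03", "J01CE01", "J01CA04", "J01FA01", "J01AA02"]

for level in (5, 3, 2):
    print(level, dict(counts_at_level(prescriptions, level)))
```

At level 5 the three penicillin prescriptions are split across three separate items (dilution), at level 3 they merge into one group, and at level 2 they become indistinguishable from the other antibacterials (contamination).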
For the application of prescription pathways: if the exact chemical components used in each prescription are always mined, the associations may be too weak. If a more general level of the hierarchy is always selected, the pathway may be too ambiguous. This research will explore how changing the granularity affects the sequential pattern discovery process. This has wider implications in the field of data mining, as this use of an ontology could benefit the approach to finding patterns in data. For our domain, being able to ascertain which granularities are appropriate for which purposes will make the prescription data far more useful for GPs. Simply omitting all other granularities in favour of one will result in a potential loss of important information. This research is necessary to identify the effect of using an ontology on pattern discovery.

1.2 Potential Contributions

The results of this research will show what impact the use of the ontology has on the success of the sequential pattern mining, and how that impact arises. If we are successful in showing that the ability to detect patterns in the dataset depends on the selection of the correct granularity, this will be a motivation for further research to find ways to assess the granularity of datasets prior to the application of standard data-mining algorithms. This is a necessary first step in exploring the influence of ontologies on pattern mining, and it could lead to improved methods of manipulating a dataset beforehand to enhance the effects of the data mining.

This research has the potential to reveal how much detail is required for prescription information to be meaningful and usable, and how much abstraction of the pathways is possible before the patterns become too ambiguous to be significant. This could have potential applications for GPs analysing prescription information in practice.
Better delivery of prescription pathways to GPs could help identify anomalous cases, identify practitioners' trends in prescribing, and monitor adherence to clinical pathways. This system could be integrated into the practice review process for GPs to reflect on and adapt their methods. Overall, this study has the potential to improve the quality of general practice.

1.3 Field of thesis

Health informatics; Data Mining: Sequential Pattern Mining; Ontologies

1.4 Research Question

This project will focus on analysing the impact of applying an ontology when mining data that is typically treated as flat. We will answer the question "What impact will the use of an ontology in place of flat data have on the success of sequential pattern mining?"

2 Literature Review

This section focuses on previous research into several core aspects of this minor thesis. We explore similar work involving data mining in health informatics, as well as supplementary literature that supports aspects of the project. We cover the data mining aspect with some sequential pattern mining methods; the drug ontology with some conceptual work on ontologies and some previous work with drug ontologies; and the nature of the raw data with research regarding the use of electronic health records.

2.1 Data Mining in Health Informatics

There is a large amount of research in health informatics that uses data mining. Health informatics is the science of applying information-age technology to serve the specialised needs of public health (Friede, Blum & McDonald 1995). Data mining has become a key contributor to the integration of health information systems into general practice.

This research will draw on concepts proposed in the work by Stanek et al. (2005). In their research, a method is proposed that compares practice patterns to clinical pathways. Their research is focused on patients with diabetes and hypertension.
They also use the ATC drug coding system, but do not fully utilise the hierarchical nature of the drug ontology for their data mining, and instead use a set granularity for all experiments. These methods are tractable for smaller domains; however, we intend to apply our methodology to a far broader area, where their methods quickly become problematic. We intend to further these data mining efforts by fully incorporating the hierarchical nature of the data into the data mining processes, and to operate in a domain beyond diabetes and hypertension.

Some other work involves the adaptation of Bayesian Networks for discovering temporal-state transition patterns, specifically in the hemodialysis process (Lin, Chiu & Wu 2002). Their research focuses on learning clinical pathways, so that pathways for admitted patients can be predicted. They use a rich set of data including additional attributes such as test results, and create a set of states, events, and actions. Their research proves very successful, but does not implement knowledge of an ontology for the drugs. Medical data is by nature hierarchical, and in our research we intend to discover whether the concepts of contamination and dilution are important and whether they need to be recognised by researchers performing data mining on medical data. Their research also involves a different data set, with attributes that we do not have access to for our research, and it is specific to hemodialysis, which limits its applications. The research shows a similar approach to the adaptation of data mining techniques to health data.

Bei et al. (2005) perform some related work with a system they call Portal. Their research is focused on improving the quality of procedures by giving continuous support to physicians. They perform some rule extraction for the selection of pacemaker systems for new patients, and implement a simple business logic flowchart system to automatically classify new patients.
This type of implementation would not be suitable for this project, due to the immense size of the flowchart required to model all possible prescription pathways. The researchers go on to identify the potential for data mining of long processes (such as long prescription patterns). This research highlights the need for optimised support systems to reduce costs and improve the quality of procedures.

Another associated piece of research is the use of the Hidden Markov Model (HMM) to learn clinical pathways (Lin, Hsieh & Pan 2005). They model the process of spontaneous delivery of patients, and develop a 4-state pathway that accurately encompasses normal spontaneous delivery. The model is trained with the patient data, and visualised in a manner that simplifies the pathway for doctors. They intend to learn the clinical pathway, so their outcomes are defined, whereas our research aims simply to mine the data for patterns and see what emerges. There is also no use of a drug ontology, and the data used is not strictly prescription data. The outcome of their research is a model that accurately recreates clinical pathways and can be used to predict possible paths for an admitted patient. This paper shows another interesting approach to the use of data mining in health informatics.

Work by Riou, Pouliquen & Beeux (1999) aims to predict the best drug for a prescription based on the clinical background of the patient. The methodology is less about mining for patterns and more about analysing patients' disorders, pathophysiological conditions, age, and other factors to determine the next step in the clinical process. This tool was developed from the premise that junior residents and medical students have difficulties selecting the most appropriate drug for a given scenario, which is an additional motivation for our research.
They consider the use of the ATC codes, but decide against it because each drug is recorded with only one use and the classification does not accommodate indications or other properties; they instead opt to develop their own drug knowledge base. Whilst several of these factors do not impact our research, they do identify a key shortcoming of the ATC classification: drugs can only exist in one place in the ATC ontology, whereas in reality drugs can have multiple uses. Whilst this factor is important, at our level of research it is not worth modifying the ATC classification to accommodate it, because cross-links and similar additions would increase the complexity of the resulting knowledge structure. Further research could involve adapting our techniques to a methodology that does account for this flaw. Their research differs from ours in that ours aims to search for arbitrary patterns in existing prescription data, whereas theirs methodically analyses certain data values to determine the precise next step for a single patient.

Another piece of related research concerns mining time dependency patterns in clinical pathways (Lin et al. 2001). They intend to find patterns of process execution sequences that show the dependent relation between activities. The researchers develop a method to discover the patterns of clinical pathways using patient records and clinical log data. Their research covers the broader domain of complete clinical pathways, so additional data is used in the process, whereas our research focuses on the pathways relating to drug prescription only. Their work also makes no use of ontologies, which is the main contribution of this research.

2.2 Sequential Pattern Mining

The concept of mining for patterns in sequences of data has been implemented and improved in many applications.
It stems from the field of data mining, which is the process of extracting interesting information or patterns from information repositories (Chen, Han & Yu 1996). Sequential pattern mining (SPM) is the process of trying to find the relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences (Zhao & Bhowmick 2003). Many methods of SPM have been developed, some of which we will explore here and evaluate for this project.

One of the earliest and possibly the simplest algorithms developed for SPM is AprioriAll (Agrawal & Srikant 1995). It is based on the Apriori principle from association rule mining, and is a very basic method for finding sequential patterns. It finds single frequently occurring items in the dataset and then attempts to find sequences of them. It is a very simple method, but it is computationally expensive as it requires multiple database scans. The straightforward nature of the algorithm may make it serviceable for our studies, as it is less likely to be disrupted by peculiar outliers or trends in the data.

Lin & Lee (2002) propose another method called MEMISP (MEMory Indexing for Sequential Pattern mining). It is a faster solution; however, their method increases in complexity as the database size increases. As this project has the potential to scale to very large amounts of data, this is not ideal. Whilst other methods such as AprioriAll are also likely to take a long time on large databases, they will require less time to be invested in development, which is preferred.

SPIRIT (Sequential Pattern mining with Regular expression constraints) is an optimised algorithm designed to mine user-specified patterns (Garofalakis, Rastogi & Shim 1999). This may be usable if we decide to search for particular sequences of drug prescriptions, such as known clinical pathways.
Dowsey et al. (1999) show that the use of clinical pathways reduces the duration of admission for patients. By searching for parts of clinical pathways, we could monitor general practitioners' adherence to them. However, as this project is not directed at any particular set of drugs or clinical pathways, the sheer volume of possible pathways to test against makes this impractical, although it could provide an interesting extension to investigate.

There are also methods involving multiple attributes, called multi-dimensional sequential pattern mining (Pinto et al. 2001). These are useful for adding new information such as age groups and demographics to patterns. However, due to the nature of the data being used for this project and the fact that no other data is guaranteed to be usable, we are unlikely to use this approach in the current phase.

Another extension to conventional pattern mining is incremental mining, introduced by Parthasarathy et al. (1999) and explored by Zhang et al. (2001). This is useful for datasets that continue to change over time, which is likely to be the case for patient prescription data if the system were implemented in practice. However, in this project the dataset will not be changing, and so this will not be implemented. It could nonetheless prove another interesting extension to the sequential pattern mining in the future.

Periodic Pattern Analysis limits pattern mining to certain periods, to specify when to check for recurring patterns (Han, Dong & Yin 1999). This could be used to explicitly find monthly or yearly patterns. Naturally, this is most effective if the data is collected over long periods of time. If time permits, this concept could prove an interesting addition to the mining.

Other research in the area includes optimising for linked objects in a distributed system (Chen, Park & Yu 1998), and creating hybrid combinations of other methods (Lenič & Kokol 2002).
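The operation shared by the methods surveyed in this section is counting how many sequences contain a candidate pattern as an ordered subsequence, and growing longer candidates only from those that are already frequent. The following is a deliberately simplified, level-wise sketch of that idea in the spirit of the Apriori principle; it is not the AprioriAll algorithm as published, and the function names, the toy sequences and the `min_support` threshold are all illustrative assumptions.

```python
def contains(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in sequences) / len(sequences)


def frequent_patterns(sequences, min_support, max_length=3):
    """Level-wise search: extend only patterns that are already frequent."""
    items = sorted({item for seq in sequences for item in seq})
    frequent = {}
    candidates = [(item,) for item in items]
    for _ in range(max_length):
        survivors = [c for c in candidates
                     if support(sequences, c) >= min_support]
        if not survivors:
            break
        frequent.update({c: support(sequences, c) for c in survivors})
        frequent_items = [c[0] for c in frequent if len(c) == 1]
        # A frequent pattern of length k+1 must have a frequent prefix of length k.
        candidates = [p + (i,) for p in survivors for i in frequent_items]
    return frequent


# Toy sequences; the letters stand in for drug codes.
toy = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["A", "B"]]
print(frequent_patterns(toy, min_support=0.75))
```

Each pass over the candidates rescans every sequence, which is the repeated database scanning that makes this family of methods expensive on large datasets.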
Many advanced and optimised methods have been developed for sequential pattern mining, but many are specific to certain domains or types of data, which makes them unsuitable for this project.

2.3 Drug Ontologies

Gruber (1993) defines ontologies as explicit formal specifications of the terms in a domain and the relations among them. Noy & McGuinness (2001) say ontologies are used to share a common understanding of the structure of information, to make domain assumptions explicit, to separate domain knowledge from operational knowledge, and to analyse domain knowledge. Applying these definitions to this project, the domain will be the set of prescription drugs, and we will be formally defining the explicit "is-a" relationship between drugs and their parent groups.

There are several implementations of ontologies for this domain. As discussed earlier, the Anatomical Therapeutic Chemical (ATC) classification is one such drug ontology (WHO 2010). This ontology provides a simple hierarchical breakdown of the field of prescribable drugs, and is easily accessible from the organisation's website. Whilst the ATC classification does have some flaws, which we will address in section 3.2, it is a widely used standard and will prove ideal for our research.

In 1998 Rector et al. proposed some requirements for developing ontologies to be used in medicine. They identify that an ontology should be treated as an "assembly language", and that it should be viewed as a "pure tree in which the branches at each level are disjoint but nonexhaustive subconcepts of the parent concept" (Rector et al. 1998). These elements provide the basis of some further work on developing drug ontologies.
One such piece of research involves the development of Prodigy: a reusable and automatically classified ontology to describe the chemical composition of drugs, as well as a dictionary of prescribable products, which includes more volatile information such as pack sizes and preparations (Solomon et al. 1999). Whilst this does create a more robust and descriptive system, it complicates the knowledge base by incorporating data that is not useful or not available for this research into the drug ontology, so this method is not ideal for this research.

There has also been work by Wroe et al. (2000). They use a description logic named Grail to implement an ontology, based on existing pathology and physiology ontologies, that creates formal descriptions of a generic drug's clinical properties. This is used to include indications, contraindications, side effects and other properties in the definition of the drugs. This proves promising for sorting and grouping drugs, and possibly for finding multi-dimensional patterns; however, it will not be necessary for this project, as we are not concerned with those additional attributes.

2.4 Electronic Health Records

The data we will be using for the program will consist of electronic forms of patient records. Storing electronic health records (EHR) has become prevalent in general practice. Keeping digital copies of health data presents many opportunities as well as legal issues and complications, which we will explore here.

Hillestad et al. (2005) discuss the estimated savings, costs, safety benefits and other health benefits in order to show the potential profit that the use of electronic medical records can produce for the industry. They compare the use of I.T. in health to many other sectors such as telecommunications, securities trading and retail to forecast the financial benefits of investing in health informatics. This research shows the importance of dedicating resources to the development of electronic health records.
There has been research involving extensions to EHR, such as the development of Virtual Medical Records (vMR) (Johnson et al. 2001). These are an abstraction of conventional medical records, stripped down to what is necessary for modelling guidelines and protocols. This ongoing interest in digital health information stresses how prevalent it is becoming, and how important it is to use it appropriately.

Replicating health records on computer systems presents many legal issues regarding privacy and ownership of information, as highlighted by Friedman (2006) and Hodge, Gostin & Jacobson (1999). Debate continues regarding the use of EHR for research, and this is the driving factor behind the need to de-identify data before analysing it, to avoid any privacy issues.

3 Methodology

In this minor thesis we will propose a system to pre-process the data, extract prescription and therapeutic pathways from patient data, and then execute the sequential pattern mining on the prescription information. Finally, we will reflect on and interpret the results. A process flow that outlines the running of the program can be seen in Figure 4.

Figure 4: Program process flow

This section will explain the nature of the raw data, the methods used to pre-process the data, and the application of the sequential pattern mining. We will also summarise how we will evaluate the results, and give some expected outcomes.

3.1 Raw data

It is important to describe the nature of the raw data we will be dealing with before explaining our methods. Each tuple of the dataset will contain the information for a single drug prescription. Cumulative prescriptions for a single patient will be spread over multiple tuples.
We expect each tuple will have the following attributes:

Attribute: Meaning
patient_pkey: A code unique to each patient
filenumber: A number relating to the General Practice records
provider_pkey: A code unique to each prescription provider
date: The date the prescription was given
script_number: Irrelevant
drug_name: The name of the drug prescribed in plain text
dosage: How often to take the doses
dose: How much of the drug to take at one time
repeats: How many times to repeat the dosage
packsize: How many dosages in a pack
quantity: How many packs given to the patient
form: The form of the prescribed drug
formulary, druggen, generic, pbs: Irrelevant
use: The condition the prescribed drug is treating

The only attributes relevant to this research are the date, drug_name, dosage, dose and repeats, as well as the patient_pkey. The patient_pkey is needed to link prescriptions related to the same patient. This information cannot in any way lead to re-identification of a person from the data we are using for this research.

3.2 Pre-processing

The data we will be working with requires some cleaning and processing before it is suitable for sequential pattern mining. We intend to modify the data to make it suitable without jeopardising its integrity or value.

The first issue we have to address is the possibility of missing values. Missing values can exist in any dataset, even though prescription data is the most reliable available (Hassey, Gerrett & Wilson 2001). We will disregard any patient who has a missing value in any of the key attributes identified above in any of their tuples.

The next step will be to replace the drug names with their respective ATC codes. We can do this by creating a global in Cache that maps all drug names to their respective codes. This can be generated by utilising the World Health Organisation Collaborating Centre (WHOCC) online ATC classification index.
The WHOCC offers a service where a drug name can be entered and the respective ATC code is returned. We can use this to automatically map the drug names to ATC codes to create our knowledge base.

Figure 5: using the WHOCC online ATC index

At this point we will also address one of the shortcomings of the ATC classification for this minor thesis, by re-coding combination drugs. The ATC system assigns unique codes to certain combination drugs. For our research, we would prefer these combination drugs to be represented by the codes of each of their components, as if multiple single drugs had been prescribed. We can do this by simply modifying our global to reflect these changes.

The next step of the pre-processing is to unify the forms of the dosage, dose and repeats. Different GPs tend to record dosage information differently. Variants such as ‘1 daily’, ‘1tds’ and ‘1 n’ all need to be recoded to ‘1’. We can do this in our code by checking for all the different forms of dosage and replacing each with its simpler representation.

Figure 6: re-coding dosage information

Finally we must generate a prescription pathway and a therapeutic pathway for each patient. The prescription pathway can be constructed by linking all of a patient's prescriptions in sequence. The therapeutic pathway requires knowledge of dosages and pack sizes, to accurately portray which drugs were being taken in combination at any given time. This can be realised by creating hybrid events for multiple drugs being taken at the one time, for example “J01CE03 + J01CE04”. The prescription pathway and therapeutic pathway will be associated with the respective patient records. Any ambiguities and unresolved cases will need to be resolved manually.

At this point the pre-processing has been completed, and we will have a prescription pathway and a therapeutic pathway for each patient on which to perform sequential pattern mining.
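The dosage re-coding and the two pathways described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Cache implementation: the function names and the `start`/`end` fields are hypothetical, and each prescription is assumed to already carry an exclusive end date derived from its dosage and pack size.

```python
import re
from datetime import date


def normalise_dosage(text):
    """Reduce free-text dosage such as '1 daily' or '1tds' to its leading number."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None  # None marks a case for manual review


def prescription_pathway(scripts):
    """Order one patient's prescriptions by start date and return their ATC codes."""
    return [s["atc"] for s in sorted(scripts, key=lambda s: s["start"])]


def therapeutic_pathway(scripts):
    """Return the sequence of drugs in use, merging overlaps into hybrid events."""
    # Every start or end date is a point where the set of active drugs may change.
    change_points = sorted({s["start"] for s in scripts} | {s["end"] for s in scripts})
    pathway, previous = [], None
    for day in change_points:
        active = sorted({s["atc"] for s in scripts if s["start"] <= day < s["end"]})
        event = " + ".join(active)
        # Record an event only when the active set changes; gaps are not recorded.
        if event != previous:
            if event:
                pathway.append(event)
            previous = event
    return pathway


# Illustrative prescriptions for one patient (assumed, already ATC-coded).
scripts = [
    {"atc": "J01CE03", "start": date(2009, 3, 1), "end": date(2009, 3, 11)},
    {"atc": "J01CE04", "start": date(2009, 3, 6), "end": date(2009, 3, 16)},
    {"atc": "N02BE01", "start": date(2009, 3, 20), "end": date(2009, 3, 25)},
]

print(prescription_pathway(scripts))
print(therapeutic_pathway(scripts))
```

For this patient the prescription pathway has three events, while the therapeutic pathway has four, because the overlap between the first two prescriptions produces the hybrid event “J01CE03 + J01CE04”.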
3.3 Sequential Pattern Mining

We will perform the sequential pattern mining on the ATC drug codes. We will run a series of experiments to search for patterns in the given dataset, using different levels of granularity of the underlying data (the ATC ontology is used to generate the experiments). We start with the level-five flat data, then move up the ontology, generating further experiments by recoding individual members of the pathways to higher levels.

Figure 7: example of preparing the pathways for the sequential pattern mining

The tests are likely to take a long time, so the program will be designed to run in batch mode. In all these experiments we will record the strength of the patterns found, and analyse them at the end of the process.

3.4 Results analysis

Once we have the results of the pattern mining, we will reflect on the strength of the rules, and also on their interestingness. The strength will purely be a measure of how well supported the patterns are in the dataset. The interestingness will be a human judgement of how usable and apparent the information in the pattern is. For example, a pattern that shows a clear link between two seemingly unrelated groups would be very interesting. A pattern that is contaminated by drugs unrelated to the actual prescription will be uninteresting. Similarly, a pattern that has been too diluted to be interpreted efficiently will be uninteresting.

Ideally we would conduct a user survey, and ask industry professionals and general practitioners to reflect on the interestingness and usefulness of the patterns. However, due to time constraints and ethical considerations, this will most likely not be possible, and our own professional judgement will be used instead.

3.5 Expected Outcomes

If the experiments are successful, we expect a number of outcomes. We expect that as we move up the ontology, the strength of the patterns will at least remain the same, and is likely to increase.
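The recoding step of Section 3.3 and the expectation just stated can be illustrated with a short Python sketch. It relies on the published structure of ATC codes, in which the five levels are identified by the first 1, 3, 4, 5 and 7 characters of a code. The function names and the three sample pathways are assumptions made for the example, and support is measured simply as the fraction of patient pathways that contain the pattern as an ordered subsequence.

```python
# Number of leading characters that identify each ATC level
# (level 1 = anatomical main group, level 5 = chemical substance).
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def recode(pathway, level):
    """Replace every code in a pathway by its ancestor at the given ATC level."""
    length = ATC_LEVEL_LENGTH[level]
    return [code[:length] for code in pathway]


def contains(pathway, pattern):
    """True if the pattern occurs in the pathway as an ordered subsequence."""
    remaining = iter(pathway)
    # 'item in remaining' consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pathways, pattern):
    """Fraction of patient pathways that contain the pattern."""
    return sum(contains(p, pattern) for p in pathways) / len(pathways)


# Illustrative prescription pathways for three patients (assumed data).
pathways = [
    ["J01CE03", "N02BE01"],
    ["J01CE04", "N02BE01"],
    ["J01CA04", "N02BE01"],
]
pattern = ["J01CE03", "N02BE01"]

supports = {
    level: support([recode(p, level) for p in pathways], recode(pattern, level))
    for level in (5, 4, 3)
}
for level, value in supports.items():
    print(f"level {level}: support {value:.2f}")
```

In this toy dataset the support of the pattern rises from one patient in three at level five to all three patients at level three, at the cost of a less specific pattern. The sketch applies to prescription pathways only; hybrid events in a therapeutic pathway would need each component code recoded separately.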
Logically, if there is a pattern at the fifth level, then it will also be present at the fourth level, because the fourth level encapsulates the fifth level. For example, if we find the pattern “A00AA00 -> B00BB00”, then “A00AA -> B00BB” will occur at least as frequently, so it will also be a pattern. The patterns also have the potential to strengthen as we travel up the ontology, as the number of distinct codes decreases. For example, if we have a prescription pathway of “A00AA00 -> B00BB00 -> A00AA01 -> B00BB02”, the base-level pattern mining will not find any repeated pattern, but at the fourth level we see a pattern emerge: “A00AA -> B00BB -> A00AA -> B00BB”.

This increase in the quantity and strength of patterns does not necessarily mean better patterns. We may have more support for the patterns as we travel up the ontology, but they will be more generalised. This may be useful, but it may also be detrimental to the interpretation of the results if the path is too contaminated. This will be reflected in the interestingness.

From this research we hope to learn more about how the use of an ontology can affect sequential pattern mining. We expect that more patterns will be discoverable and that more useful patterns will emerge, but it is also likely that some quite trivial and contaminated patterns will appear, which will need to be addressed when judging the interestingness of the patterns.

4 Ethical Considerations

This research is based on real prescription data from general practices. The data used for this research was used in similar previous studies and was properly de-identified (i.e. no identifiers or other data related to the identity of the patient are contained in the data). We will seek ethics approval to re-use the dataset for the current study. The data that is collected will need to be stored for seven years after the research is concluded, in line with standard university procedures.
An ethics approval form has been submitted to ensure the utmost adherence to ethical research policies; it is included in Appendix A.

5 Bibliography

Agrawal, R & Srikant, R 1995, 'Mining Sequential Patterns', paper presented at the Eleventh International Conference on Data Engineering.
Bei, A, Luca, SD, Ruscitti, G & Salamon, D 2005, 'Health-Mining: a Disease Management Support Service based on Data Mining and Rule Extraction', paper presented at the 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005.
Chen, M, Han, J & Yu, P 1996, 'Data mining: An overview from a database perspective', IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883.
Chen, M, Park, J & Yu, P 1998, 'Efficient data mining for path traversal patterns', IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221.
Dowsey, M, Kilgour, M, Santamaria, N & Choong, P 1999, 'Clinical pathways in hip and knee arthroplasty: a prospective randomised controlled study', Medical Journal of Australia, vol. 170, pp. 59-61.
Friede, A, Blum, H & McDonald, M 1995, 'Public health informatics: how information-age technology can strengthen public health', Annual Review of Public Health, vol. 16, no. 1, pp. 239-252.
Friedman, D 2006, 'HIPAA and research: how have the first two years gone?', American Journal of Ophthalmology, vol. 141, no. 3, p. 543.
Garofalakis, M, Rastogi, R & Shim, K 1999, 'SPIRIT: Sequential pattern mining with regular expression constraints', paper presented at the 25th International Conference on Very Large Data Bases.
Gruber, T 1993, 'A translation approach to portable ontology specifications', Knowledge Acquisition, vol. 5, pp. 199-199.
Han, J, Dong, G & Yin, Y 1999, 'Efficient mining of partial periodic patterns in time series database', paper presented at the International Conference on Data Engineering 1999.
Hassey, A, Gerrett, D & Wilson, A 2001, 'A survey of validity and utility of electronic patient records in a general practice', British Medical Journal, vol. 322, no. 7299, p. 1401.
Hillestad, R, Bigelow, J, Bower, A, Girosi, F, Meili, R, Scoville, R & Taylor, R 2005, 'Can electronic medical record systems transform health care? Potential health benefits, savings, and costs', Health Affairs, vol. 24, no. 5, p. 1103.
Hodge, J, Gostin, L & Jacobson, P 1999, 'Legal Issues Concerning Electronic Health Information: Privacy, Quality, and Liability', JAMA, vol. 282, no. 15, pp. 1466-1471.
Johnson, P, Tu, S, Musen, M & Purves, I 2001, 'A virtual medical record for guideline-based decision support', paper presented at the AMIA Annual Symposium 2001.
Lenic, M & Kokol, P 2002, 'Combining classifiers with multimethod approach', Soft computing systems: design, management and applications, p. 374.
Lin, F, Chiu, C & Wu, S 2002, 'Using Bayesian networks for discovering temporal-state transition patterns in Hemodialysis', paper presented at the 35th Annual Hawaii International Conference on System Sciences.
Lin, F, Chou, S, Pan, S & Chen, Y 2001, 'Mining time dependency patterns in clinical pathways', International Journal of Medical Informatics, vol. 62, pp. 11-25.
Lin, F, Hsieh, L & Pan, S 2005, 'Learning Clinical Pathway Patterns by Hidden Markov Model', paper presented at the 38th Annual Hawaii International Conference on System Sciences.
McInnes, D, Saltman, D & Kidd, M 2006, 'General practitioners' use of computers for prescribing and electronic health records: results from a national survey', Medical Journal of Australia, vol. 185, no. 2, p. 88.
Noy, N & McGuinness, D 2001, Ontology development 101: A guide to creating your first ontology, Citeseer.
Parthasarathy, S, Zaki, M, Ogihara, M & Dwarkadas, S 1999, 'Incremental and interactive sequence mining', paper presented at the eighth international conference on Information and knowledge management.
Pinto, H, Han, J, Pei, J, Wang, K, Chen, Q & Dayal, U 2001, 'Multi-dimensional sequential pattern mining', paper presented at the tenth international conference on Information and knowledge management.
Rector, A, Zanstra, P, Solomon, W, Rogers, J, Baud, R, Ceusters, W, Claassen, W, Kirby, J, Rodrigues, J & Mori, A 1998, 'Reconciling users’ needs and formal requirements: issues in developing a reusable ontology for medicine', IEEE Transactions on Information Technology in Biomedicine, vol. 2, no. 4, p. 229.
Riou, C, Pouliquen, B & Beeux, PL 1999, 'A computer-assisted drug prescription system: the model and its implementation in the ATM knowledge base', Meth Inform Med, vol. 38, pp. 25-30.
Solomon, W, Wroe, C, Rector, A, Rogers, J, Fistein, J & Johnson, P 1999, 'A reference terminology for drugs', paper presented at the AMIA Annual Symposium 1999.
Stanek, J, Iankov, I, Gadzhanova, S, Warren, J & Misan, G 2005, 'Guideline-based General Practice Data Mining', HIC 2005 and HINZ 2005: Proceedings, p. 254.
WHO 2010, 'World Health Organisation Collaborating Centre for Drug Statistics Methodology', http://www.whocc.no/.
Wroe, C, Solomon, W, Rector, A & Rogers, J 2000, 'DOPAMINE: a tool for visualizing clinical properties of generic drugs', paper presented at the 14th European Conference on Artificial Intelligence.
Zhang, M, Kao, B, Yip, C & Cheung, D 2001, 'A GSP-based efficient algorithm for mining frequent sequences', paper presented at the International Conference on Artificial Intelligence 2001.
Zhao, Q & Bhowmick, S 2003, 'Sequential pattern mining: A survey', Technical Report, CAIS, Nanyang Technological University, Singapore, pp. 1–26.
6 Project Plan

Date: 5th March 2010; 12th March 2010; 13th March – 15th March 2010; 16th March – 31st March 2010; 1st April – 14th April 2010; 15th April 2010; 16th April – 16th May 2010; 17th May – 1st June 2010; 3rd June – 8th June; 11th June 2010; 13th June 2010; 14th June – 25th July 2010; 26th July – 1st August 2010; 2nd August – 22nd August 2010; 23rd August – 30th September 2010; 10th October 2010; 24th October 2010; 22nd November 2010; 29th November 2010

Task: Choose Supervisor; Decide on field of thesis; Develop Project Plan; Research topic; Write annotated bibliography; Submit Annotated Bibliography; Finalise Research Question; Write Minor Thesis introduction, literature review; Work on ethics proposal for obtaining data; Write Minor Thesis ethical considerations; Prepare presentation slides; Write Minor Thesis methodology; Finalise presentation slides; Minor Thesis Proposal presentation; Submit Minor Thesis Proposal; Submit Ethics Proposal Form; Familiarising with Cache programming suite; Program Pre-processing, pathway output; Program testing; Implementing Sequential Pattern Mining methods; Evaluating Sequential Pattern Mining results; Write Minor Thesis results and discussion; Minor Thesis draft to supervisor; Submit Minor Thesis; Comments for corrections received, adjust Minor Thesis; Submit Final bound copies

Appendix A – Ethics Approval Application

University of South Australia Human Ethics Application Protocol Number : 0000020574 Application Title : Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways Date of Submission : N/A Primary Investigator : Mr Matheson Lee Ramsey Prior Assessment Non-UniSA HREC UniSA HREC Project details Research Ethics Advisor Project type 1.1 Has another Human Research Ethics Committee (other than UniSA) reviewed this research project before and does this clearance/approval accurately describe the project as it is to be conducted?* Yes No 2.1 Is this application a resubmission of an application that was
considered by UniSA HREC and the decision was 'Not Approved: Resubmit', 'Not Approved' or "Approved subject to" and the status has expired (ie amendments were not made within the 6 month timeframe. Please note if your application is Approved subject to and 6 months has not lapsed then you should use the application you submitted to make the required changes. * Yes: Not approved: resubmit Yes: Not Approved Yes: Approved subject to and the status has expired No 3.1 Name of Research Ethics Adviser This question is not answered. 3.2 Has the Research Ethics Adviser conducted an ethics workshop in the last 12 months?* No 3.3 Have you attended human ethics training in the last 12 months?* Yes No 4.1 Main type of research (e.g. staff, PhD). * Honours Course Approval PhD Masters by Course work Masters by Research Professional Doctorate Undergraduate Graduate Diploma / Graduate Certificate Staff Other 4.2.1 Please note that, if you are a student applicant, your application will be forwarded to your principal supervisor once submitted for their approval. If they are satisfied with your application it will be forwarded to the relevant review group. If your supervisor requires changes to be made then your application will be returned to you to make the required changes. 4.3 Other type of research (e.g. staff, PhD). 
Please select all that apply* None Honours Course Approval PhD Masters by Course work Masters by Research Professional Doctorate Undergraduate Graduate Diploma / Graduate Certificate Staff Other Project details Resources Project funding Ownership of Data 5.1 Title of research project* Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways 5.2 Plain English title* Testing the impact on path mining success of applying a hierarchical structure to data that is normally not stored in a hierarchy, such as prescription data 5.3 What are the aims of your research* -evaluate the impact and usefulness of using a hierarchy to store typically flat data -evaluate what levels in the hierarchy produce the strongest and most useful paths 5.4 List your research questions or hypotheses. Your protocol should clearly identify the questions which you want your research to answer.* What is the impact on path mining of applying a hierarchical structure to typically flat data? 5.5 Explain the need for, and value of, your research. Place the aims in the context of existing research or practice. (You must include a list of not more than 10 key references as an attachment to support your answer to this question. These are to be attached to the Attachment tab available from the Application Overview screen).* The research presents an opportunity to explore and understand how vague information can be before it is no longer useful (the higher up a hierarchy, the more vague the information is, as it encompasses more elements). It also presents an opportunity to discover the impact of using a hierarchy to elaborate typically flat data. If a marked improvement is found, this could lead to the adoption of hierarchical structures for other applications, which could lead to increased running speeds or more accurate path mining, depending on the domain.
There is a need to explore the impact of using a hierarchy as some flat data (such as prescription drug information) is too vast to perform complete path mining in feasible time frames. 5.6 Proposed commencement date* 05/07/2010 5.7 Proposed completion date* 01/11/2010 6.1 Have you applied for funding for this project (other than divisional funds)* Yes No 8.1 Detail who will own the data and the results of your research (student researchers normally own their own research and data unless there is a written agreement between the student and the University / third party; staff research and data is normally owned by UniSA). Please select all that apply.* UniSA Student researcher Other 8.2 Does the owner of the information or any other party have any right to impose limitations or conditions on the publication of the results of this project?* Yes No 8.3 Please note that it is the researcher's responsibility to ensure that, where required, an appropriate agreement is in place. If you are unsure whether this is needed, please consult the UniSA website . Do you require an agreement regarding ownership or do you currently have an agreement in place?* An agreement is required A signed agreement is in place An agreement is not required Please note that you must inform UniSA HREC once the agreement has been signed. Final ethics approval cannot be given until confirmation is received. 9.1 The information which will be stored at the completion of this project is of the following type(s). Please select all that apply.* Individually identifiable Re-identifiable Non-identifiable 9.2 Where will the data be stored (please be specific with the address e.g. If stored at UniSA please specify which campus and the office/room location)* the data will be stored at the Mawson Lakes campus of UniSA in D2-03 9.3 For how long will the information be stored after the completion of the project? 
Why has this period been chosen?* 5 years - to ensure any queries after the completion of the project can be answered, and to quell any later accusations of copying. 9.4 In what formats will the information be stored during the research project? (eg. paper copy, computer file on floppy disk or CD, audio tape, USB memory stick, videotape, film). * computer file 17 9.5 How will information, in all forms, be disposed after the retention time has lapsed? (Please refer to the Ownership and Retention of Data Policy. The Head of School (or equivalent) must be aware of this process.* deletion of computer file and any backups (on the single same machine) 9.6 Will any other individual(s), organisation(s) or researcher(s) (other than those listed on the Investigators tab) have authority to use or have access to the information? * Yes No 9.7 Specify the measures to be taken to ensure the security of information from misuse, loss, or unauthorised access while stored during the research project? (eg. will identifiers be removed and at what stage? Will the information be physically stored in a locked cabinet?)* the data will be immediately de-identified as this identifying information is of no use to the study. The data will be stored on a laptop computer using a strong user password to protect from misuse. 9.8 What arrangements are in place with regard to the storage of the information collected for, used in, or generated by this project in the event that the principal researcher / investigator ceases to be engaged at the current organisation? (Please refer to the Ownership and Retention of Data Policy.* If the principal researcher ceases to be engaged in the study the data will become the responsibility of the supervisor Jan Stanek 10.1 Please refer to the UniSA website : Do you require insurance cover for this project"* Yes No 11.1 Is the activity archival research? 
A large proportion of activity involving the analysis of documents, publicly available information, or previously collected data may be outside the scope of the University's human research ethics arrangements.* Yes No 11.2 Is the work being conducted only for UniSA administrative / service delivery purposes?* Yes No Scope Scope Research type and participants Research type Participant information 12.1 Should the work be characterised as quality assurance or an audit, rather than human research within the scope of the University's human research ethics arrangements?* Yes No 12.2 Is the work a practical exercise or test conducted for teaching purposes in a University administered facility? ( Please refer to Appendix 2 of Guidelines for Evaluation Activities Involving UniSA Students and Staff) * Yes No 13.1 Is the work a routine experiment or procedure conducted for teaching purposes in a University administered facility? * Yes No 13.2 Is the work / data collection conducted by a student only for teaching / learning purposes? * Yes No 13.2.1 Will the results be published / presented in any way other than a paper / product produced purely for assessment purposes ?* Yes No 14.1 This project involves: (Please select all that apply.)* Research using qualitative methods Research using quantitative methods, population level data or databanks, e.g survey research, epidemiological research None of the above 14.2 What research methodologies will you use? (Please select all that apply.) 
* Anonymous questionnaires Internet questionnaires Questionnaires requesting intimate personal, identifying, or sensitive information Other questionnaires Face to face interviews which do not request personal or sensitive information Face to face interviews which request personal or sensitive information Telephone survey which does not request personal or sensitive information Telephone survey which requests personal or sensitive information Focus groups Action Research Observation of participant's usual activities Observation of an activity set up for the purposes of the study Access to medical records (or records which contain intimate personal information, and are individually identifiable and are not publicly available) Experiment or testing of a procedure, drug or equipment Use of biological hazards, GMOs or pathogenic organisms Use of carcinogenic and/or toxic chemicals, including heavy metals Use of Radiation (Ionising and/or Non-ionising, but not Ultrasound) Other 18 14.2.1 Please describe what research methodology you will use.* none of these methodologies apply. We are only interested in obtaining the de-identified data. 14.3 Will you be audio-taping, video-taping, or taking photographs of participants during the course of the study? Please select all that apply.* Audio-taping Videotaping Photographs No Selection of participants Project start, end, location details Irregular consent process Limited disclosure / waive consent Covert observations 15.1 How many participant groups are involved in this research project? 
* 0 15.3 What is the expected total number of participants in this project at all sites?* 0 16.1 What process(es) will be used to identify potential participants?* there are no participants 16.2 Will potential participants be 'screened' or given a test/questionnaire to assess their suitability as a participant for the study?* Yes No 16.3 Describe how initial contact will be made with potential participants.* No contact 16.4 Is an advertisement, e-mail, website, letter or telephone call proposed as the form of initial contact with potential participants?* Yes No 16.5 List the selection and, if appropriate to your study, the exclusion criteria for participants.* there are no participants 16.6 If it became known that a person or participant group was recruited to, participated in, or was excluded from the research, would that knowledge expose the person to any disadvantage or risk?* Yes No Not Applicable 17.1 Will the research be undertaken in Australia?* Yes No 17.1.1 In which town(s)/city(ies)/State(s) of Australia will the research be undertaken in? * Adelaide, South Australia 17.1.2 In how many Australian organisations will the research be conducted? * 0 17.2 Will the research be undertaken overseas?* Yes No 17.3 Are there any time-critical aspects of the research project of which the review committee should be aware?* Yes No 18.1 Does the research involve limited disclosure to participants. Please refer to the National Statement. * Yes No 18.2 Are you asking the HREC / review body to waive the requirement of consent? Refer to the National Statement* Yes No 19.1 Does the research involve covert observation? Refer to the National Statement* Yes No Deception Project type Project type Participants Recruitment Risk to Participants Risk to participants Right to Privacy 20.1 Does the research involve deception. Refer to the National Statement* Yes No 21.1 Does the research involve any of the following? 
Please select all that apply.* Drugs, narcotics, poisons, placebo will be ingested / injected, or an invasive procedure will be administered Clinical trials Cellular therapy The collection and / or use of human samples. This includes tissue, blood or other body fluid collection / extraction Genetic testing and/or genetic research Human gametes or use or creation of human embryos A practice or intervention which is an alternative to a standard practice or intervention Investigating workplace practices which could possibly impact on workplace relationships 19 Conducting the research overseas and recruiting participants None of the above 38.1 Who will you be recruiting as participants for this study? (If there is a high chance that you will be recruiting one of these groups, you should also select that participant group).* General public (over 18 years of age) Members of a collectivity People whose first language is not English People who are illiterate Pregnant women/human foetus Children People who are in a dependent or unequal relationship People who are highly dependent on medical care People with a cognitive impairment Aboriginal and/or Torres Strait Islander peoples People who may be involved in illegal activity Not recruiting participants Other 38.2 Does the research involve issues likely to be considered significant to Indigenous peoples?* Yes No Not Applicable 51.1 Please select all that apply. 
This research project:* Has the potential to expose participants to potential civil, criminal or other proceedings Makes it possible for third parties to identify participants Involves a risk of physical injury Involves human exposure to ionising and/or non-ionising radiation (including X-ray) Involves exposure to disease or infection Involves pain or significant discomfort Involves psychological or emotional stress Involves sensitive personal information Could expose participants to potential loss of professional reputation, market standing, or employability Could result in significant negative impact upon personal relations Offers an inducement which could be considered coercive Involves the participation of people who legally cannot provide voluntary and informed consent for their participation in research None of the above Collection method Collection method Participants Relationships Consent Consent process 66.1 Does IS42 or the Commonwealth Privacy Act apply to the research (eg access to identified personal data held by third parties subject to privacy regimes)? Refer to the Privacy law* Yes No 67.1 Will the source of the information about participants used in this research project be collected directly from the participant? (e.g. asking participants directly about their medical history)* Yes No 67.2 Will the source of the information about participants used in this research project be collected from another person about the participant? (e.g. asking participants' doctors about their patients medical history)* Yes No 67.3 Will the source of the information about participants involve the use or disclosure of information by an agency, authority or organization (other than UniSA)? (e.g. 
accessing participants' medical records)* Yes No 67.4 Will the source of the information about participants involve the use of information which you or your organisation Collected previously for a purpose other than this research project?* Yes No 67.5 Describe how information collected about participants will be used in this project.* data will be de-identified immediately as the identifying information poses no use for the study. 67.6 Indicate whichever of the following applies to this project: Please select all that apply.* Information collected for, used in, or generated by, this project will not be used for any other purpose. Information collected for, used in, or generated by, this project will/may be used for another purpose by the researcher for which ethical approval will be sought. Information collected for, used in, or generated by, this project is intended to be used for establishing a database/data collection/register for future use by the researcher for which ethical approval will be sought. Information collected for, used in, or generated by, this project will/may be made available to a third party for a subsequent use or which ethical approval will be sought Other 20 68.1 Is there an existing relationship or one likely to arise during the research, between the potential participants and any member of the research team or an organisation involved in the research?* Yes No 68.2 Does the researcher / investigator have another role in relation to the participant?* Yes No 68.3 Will the research impact upon, or change, an existing relationship between participants and researcher / investigator or organisations.?* Yes No 69.1 Will consent for participation in this research be sought from all participants? Refer to the National Statement* Yes No 69.1.1 Explain why consent will not be sought from all participants.* there are no participants. 
70.1 Describe the consent process, ie how participants or those deciding for them will be informed about, and choose whether or not to participate in, the project.* no participants Risks and benefits Risks and benefits Risks and benefits cont. Researcher training 70.2 If a participant or person on behalf of a participant chooses not to participate, are there specific consequences of which they should be made aware, prior to making this decision?* Yes No 70.3 If a participant or person on behalf of a participant chooses to withdraw from the research, are there specific consequences of which they should be made aware, prior to giving consent?* Yes No 70.4 Can individual participants be identifiable by other members of their group? (e.g. co-workers, focus group members etc.)* Yes No 70.7 Will consent be specific or extended or unspecified? Refer to section 2.2.14-2.2.18 of the National Statement* Specific Extended Unspecific Please note that when answering the following questions, only risks beyond those encountered in everyday life are relevant. Refer to the National Statement 71.1 Are there any risks to participants as a result of participation in this research project (eg physical, psychological, spiritual, emotional, legal, social, financial well-being, employability or professional relationships)?* Yes No 71.2 What expected benefits (if any) will this research have for the wider community?* -provide an insight into how specific information must be in order to be useful, i.e. in a hierarchical structure, how far up the hierarchy can we go before the paths that are mined are too vague? -allow for future work which can autonomously detect the most useful level of information for a specific purpose. This could result in more accurate prescriptions, and more successful treatments.
71.3 What expected benefits (if any) will this research have for participants?* Because the data is de-identified, there will be no personal benefit.
71.4 Are there any other risks involved in this research, e.g. to the research team, the organisation, others (e.g. physical, psychological, spiritual, emotional, legal, social, financial well-being, employability or professional relationships)?* Yes No
72.1 Is it anticipated that the research will lead to commercial benefit for the investigator(s) and/or the research sponsor(s)?* Yes No
72.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether their physical, psychological, spiritual, emotional, legal, social or financial well-being, or to their employability or professional relationships - or to their communities?* Yes No
72.3 Describe how the researchers / investigators intend to monitor the conduct and progress of the research project.* Data will be de-identified at the first opportunity. There will be no opportunity for misconduct as long as the data is successfully de-identified.
72.4 It is mandatory for researchers to report suspected cases of child abuse/neglect, domestic violence, bullying, illegal activities, use of illicit substances, abuse of elderly persons, professional negligence etc.
72.4.1 Is it likely that this will be disclosed during the course?* Yes No
Researcher training
73.1 List the relevant qualifications, experiences and/or skills of the research team which equip them to conduct this research* 3 years of study at UniSA, learning ethical conduct
Minor experience in health informatics through a research placement and ongoing work on the minor thesis
73.2 Do the researchers involved in this research project require any additional training in order to undertake this research?* Yes No
Reporting of results
74.1 Is it intended that results of the research that relate to a specific participant be reported to that participant?* Yes No Not Applicable
74.2 Is the research likely to produce information of personal significance to individual participants?* Yes No
74.3 Will individual participants' results be recorded with their personal records?* Yes No Not Applicable
74.4 Is it intended that all or some of the results that relate to a specific participant be reported to anyone other than that participant?* Yes No
74.5 Will research participants have the opportunity to receive a copy of your final report or summary of the findings if they wish?* Yes No
74.5.2 Why will participants not be provided with a copy of the final report or summary of the findings?* There are no participants.
75.1 Is the research likely to reveal a significant risk to the health or well-being of persons other than the participant (e.g. family members, colleagues)?* Yes No
75.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether their physical, psychological, spiritual, emotional, social or financial well-being, or to their employability or professional relationships - or to their communities?* Yes No
75.3 How is it intended to disseminate the results of the research?
Please select all that apply.*
Thesis/dissertation
Journal article/s
Research paper
Conference presentation
Commissioned report
Other
75.4 Will the confidentiality of participants and their data be protected in the dissemination of research results?* Yes No Not Applicable
75.4.1 Explain how confidentiality of participants and their data will be protected in the dissemination of research results* De-identified data will be used, so no confidential information will be revealed in the dissemination.
76.1 Provide details of the anticipated duration of the data collection / human research phase of the project.* Simply obtain some data from a previous researcher and/or database; collection should take no longer than 1-2 days.
Peer review
76.2 Has the research proposal, including design, methodology and evaluation undergone, or will it undergo, a peer review process?* Yes No
Declaration
The Primary Contact for this project is responsible for the application that is submitted and must be the one to agree to the following statement.
"On behalf of the research team for this project, I confirm that all members of the research have read the current NHMRC National Statement on Ethical Conduct in Human Research. The research team accepts responsibility for the ethical and appropriate conduct of the procedures detailed in this application, confirm that the research team will conduct this project in accordance with the principles described in the National Statement, and confirm that the research team will comply with any other condition laid down by the University of South Australia's Human Research Ethics Committee."* I agree