Using Online Mining Techniques to Inform Formative Evaluations: An Analysis of YouTube Video Comments about Chronic Pain

Corresponding author: Aude Bicquelet, Research Director, NatCen Social Research, 35 Northampton Square, London EC1V 0AX. Tel: 0207 549 7023. E-mail: [email protected]

Biography: Dr. Aude Bicquelet is a Research Director at NatCen Social Research. Her research interests lie at the intersection of mixed methods, evidence-based policy and public health policy. An important part of her work revolves around the use and application of text mining techniques to analyse large corpora, such as public consultations, open-ended responses and social media/Web data.

Abstract

Despite the growing body of research analysing information posted on social media, very few studies have focused on how ‘naturally occurring data’ could inform formative evaluations in health research. This paper argues that exploratory data mining techniques such as descending hierarchical classification, cluster analysis and correspondence analysis could usefully be employed, either as stand-alone or as mixed methods, in the design of needs assessments on health-related issues. To this end, the paper reports on the application of text mining techniques to analyse YouTube video comments on chronic pain.

Key words: Text-Mining; Social Media; Formative Evaluation; Health; Needs assessment.

1. Introduction

Formative evaluations (FE) are a key component of health research (Bauman and Nutbeam 2013; Patton 2008, 2012). Among the various types of formative evaluation, needs assessments are particularly important because they help identify who might need a particular program, how great the need is and the extent to which this need might be met (Pawson and Tilley 1997).
In order to understand the needs and the procedures required to improve a particular treatment and/or to innovate in an area where medical and psychological support has been unsuccessful, input from patients, their families and carers is critical (see Harrison et al. 1997; Weale 2007; Rowe and Shepherd 2002). This chimes with the National Health Service’s stance and with the guidelines of regulatory agencies such as the National Institute for Health and Care Excellence and the Care Quality Commission, which stress the importance of fostering public engagement in health care (see, e.g., the 1992 White Paper Local Voices and the NHS and Community Care Act of 1990).

Traditional techniques for involving stakeholders in FE, needs assessments and/or decision-making on health-related issues include (a) elicitation techniques (typically used in small-n studies) and (b) surveys and consultations (typically used in large-n studies). Elicitation techniques such as interviews and focus groups have the advantage of obtaining ‘thick’ or ‘rich’ data, offering a window on patients’ and/or carers’ day-to-day lives and experiences. Yet they usually include only a relatively small sample of a population. Consequently, they risk excluding important perspectives from the analysis. Moreover, they can generate long and diverse answers that are difficult to analyse with traditional approaches to qualitative data analysis, such as thematic and/or discourse analysis (see Pope et al. 2000). By contrast, surveys and large-scale consultations offer the possibility of analysing a larger sample and, when conducted remotely, can reduce or prevent geographical dependence. But given the number of answers requiring analysis, inputs must be guided or restricted by the questionnaire. Hence, they are unlikely to capture ‘thick data’ or precise accounts.
Other disadvantages traditionally associated with surveys and consultations include (a) low response rates and (b) the difficulty of capturing information on sensitive issues (see Bishop and Davis 2002; Presser et al. 2004).

Text mining techniques (TMTs) offer a third way of gathering and examining data about patients’ experiences and needs. Although established TMTs such as descending hierarchical classification, cluster analysis and correspondence analysis have been usefully employed in the analysis of large-scale consultations on health-related issues (see Bicquelet and Weale 2011), they have rarely been applied to the analysis of online information. This paper argues that TMTs could prove valuable in assisting the development of formative evaluations by summarising the vast amount of ‘naturally occurring’ data contained within Internet health communities. To this end, the paper reports on the application of TMTs to analyse YouTube video comments on chronic pain.

2. Using Text Mining Techniques to Analyse YouTube Video Comments on Chronic Pain

Due to the rapidly growing availability of online documents and Internet content (and the emerging need for their quick and effective interpretation), the automated analysis of large textual corpora has received increasing attention in recent years (see in particular Cooley et al. 1997; Feldman & Sanger 2007; Bauer & Gaskell 2009). Descended from the older, established tradition of data mining, Web mining and text mining have mainly been employed in two separate strands of research. The first has sought to discover and process data across thousands, and sometimes millions, of pages on the Web (see Lagus et al. 1999; Tanabe et al. 1999). The second has been used for more ‘local’ analyses of Internet sources, such as newsgroups, message boards or electronic brainstorming sessions (see for instance Tong and Yager 2006).
Although definitions vary, text mining can be understood as the process of extracting information from large corpora to automatically identify patterns and relationships in textual data (Feldman and Sanger 2007: 17). Text mining is similar to reading in that both activities involve extracting meaning from strings of letters. However, the computational and statistical analysis of text differs from reading in two important respects. First, computer-enabled approaches can process and summarise far more text than any person has time to read. Second, such approaches can extract meaning that human readers miss when they overlook patterns that do not conform to their prior beliefs or expectations (see Bicquelet & Weale 2011). Text mining is a highly inductive, exploratory approach to data analysis in which a key element is linking the extracted information together to form new facts (or new hypotheses) to be explored further by more conventional means of experimentation (see Hearst 2003). Although widely applied in political science (Bailey 2008; Bara et al. 2007; Weale et al. 2012), marketing (Netzer et al. 2012; Linoff and Berry 2011; Glance et al. 2005) and the analysis of consumer behaviour (Archak et al. 2011), text mining has been used less often to extract meaningful information from naturally occurring data on health-related issues.

Case selection and sampling

To demonstrate how TMTs can usefully be applied to the analysis of online data and how they can feed back into the design of formative evaluations, we analyse comments about ‘chronic pain’ posted under YouTube videos. Individuals can access and exchange health information online through a variety of channels, including websites, support groups, chat rooms and instant messaging.
We focus on comments posted on YouTube specifically because, unlike on other social media websites (such as Facebook or Twitter), little information is associated with the identity of online commentators, who are also informed that their comments will be visible to a wide community of viewers. Several important studies have looked at the content of online comments on physical disabilities (Braithwaite et al. 1999), Huntington’s disease (Coulson et al. 2007), childhood cancer (Coulson & Greenwood 2012) and Alzheimer’s patients (Preece 2000). However, very little research has focused on comments spontaneously posted on social media by chronic pain patients (with the notable exception of Smedley et al. 2015). This is a significant gap, especially because ‘talking’ about pain is often perceived as a challenge by patients, doctors, health providers and social care services seeking to enhance patients’ experiences and wellbeing.

Communicating Chronic Pain

Chronic pain is a complex biopsychosocial phenomenon that is not easily communicated and is difficult to measure objectively (see Jackson et al. 2005). Although there have been dramatic advances in our understanding and management of chronic pain over the last 20 years (see McCaffrey et al. 1997; Ward & Gordon 1996; McCracken & Eccleston 2003), it is often poorly recognised and inadequately assessed (Brennan & Cousins 2004; Foley 1997; Hill 1995; Rich 1997). For instance, comparisons between patients’ self-reports and healthcare professionals’ estimates of their pain have revealed systematic underestimation, leading at times to a failure to deliver needed care (Kappesser et al. 2006; Prkachin et al. 2007).
Standardised diagnostic tools such as the McGill Pain Questionnaire (Melzack 1975) attempt to measure pain by translating it into words that describe its sensory, spatial and evaluative properties, yet qualitative studies have shown that this terminology does not correspond to patients’ spontaneous descriptions (McCracken & Eccleston 2003). The current focus on intrapersonal features of pain, namely its biology on the one hand and suffering as a psychological experience on the other, is often considered inadequate to address the complex social nature of the phenomenon (see Blyth et al. 2007; Sullivan 2008). Hence the need to turn to patients’ experiences, expressed in their own words, to better understand their needs and inform programs.

In order to select comments for analysis, we entered the key words ‘Chronic Pain Patient’ in the YouTube search engine. We included the word ‘patient’ to avoid generic medical videos providing explanations and/or descriptions of the causes and symptoms of chronic pain. The key term search returned 217,000 results, from which we selected only the most commented-upon videos. Table 1 below provides an overview of our units of sampling. At the top of the list, the most commented video had 145 comments and 45,863 views. Videos with fewer than two comments were not included in the analysis. Overall, we gathered 763 comments posted under 14 videos. The total number of words in our corpus was 59,322.

[Table 1 about here]

The Alceste Software

To assist in the analysis of online comments on chronic pain, we use a computer-assisted content analysis package named Alceste1. This software was originally developed for, and applied to, the study of the humanities (Reinert 1993).
Its use has more recently spread to the social sciences (Lahlou 1996; Allum 1998) and has attracted researchers seeking to analyse political speeches (Schonhardt-Bailey 2005), parliamentary debates (Schonhardt-Bailey 2008; Bara, Weale & Bicquelet 2007), opinion polls (Brugidou 2003) and large public consultations (Bicquelet & Weale 2011). Alceste combines textual and statistical analyses. It relies upon co-occurrence analysis, which is the statistical analysis of frequent word pairs in a text or corpus (in this case, YouTube comments). Within this corpus, homogeneous subsets of words are identified on the basis of their lexical profile (Brugidou 2003: 418). As defined by Max Reinert, this ‘method can be used to determine the main word distribution pattern within a text or a discourse (a corpus). The technical procedure leads to selecting classes, each determined by a pool of words mathematically linked together and having the highest significant frequency of occurrence, i.e. those which the speakers tended to use most repeatedly. These classes with their content and function words subscribe to different types of discourse with their specific vocabulary and syntax’ (cited in Brugidou 2003: 418). In other words, the software is designed to quantify the text in order to extract its most significant structures. Two major assumptions lie at the core of this method: first, that the most significant structures are closely linked to the distribution of words in a text and, second, that repetition is the main factor in the stabilization (identification) of a discursive activity.

1 Alceste stands for Analyse des Lexèmes Cooccurrents dans les Énoncés Simples d’un Texte (Analysis of the co-occurring lexemes within the simple statements of a text). Its algorithm, based on Benzécri’s important contributions to textual statistics, was created by Max Reinert at the CNRS.
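To make the idea of co-occurrence analysis more concrete, the Python sketch below counts, for every pair of content words, the number of text segments in which both appear. It is a hypothetical illustration written for this discussion, not Alceste’s implementation: the short function-word list, the tokenisation rule and the example segments are all assumptions, and no lemmatisation is attempted.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny list standing in for Alceste's "function words".
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "for", "so",
    "i", "you", "my", "your", "it", "is", "am", "are", "was", "be",
    "have", "has", "do", "not", "very", "this", "that",
}


def content_words(segment):
    """Lower-case a segment and return its tokens minus the function words."""
    tokens = re.findall(r"[a-z']+", segment.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]


def cooccurrence_counts(segments):
    """Count, for each unordered pair of words, how many segments contain both.

    Each word is counted at most once per segment (presence/absence), and
    pairs are stored in alphabetical order, so each pair has a single key.
    """
    counts = Counter()
    for segment in segments:
        words = sorted(set(content_words(segment)))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Invented example segments, loosely modelled on the comments described in the text.
segments = [
    "Thank you for sharing your story",
    "Thank you for sharing, so inspiring",
    "My doctor changed my prescription again",
]
counts = cooccurrence_counts(segments)
print(counts[("sharing", "thank")])  # 2
```

A real analysis would work on lemmatised segments and a far larger corpus; the point of the sketch is only that the raw material of the method is a record of which words appear together, not of what they mean.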
What Alceste therefore carries out is a particular reading of the text that does not take into account the meaning of the words but focuses instead on their recurrence and co-occurrence in sentence segments.

Steps of Analysis

In its initial phase, the software breaks down the corpus into two kinds of context unit: initial context units (ICUs) and elemental context units (ECUs). ICUs are sampling units corresponding to the divisions of the text specified by the user, to which one or several variables can be assigned. In the analyses below, each comment constitutes an ICU, and each ICU has been coded with the label of its video (V1, V2, …, Vn). At this stage, the software runs three successive processes: word recognition, lemmatization and syntax categorization. This means that different forms of the same word (plurals, suffixed forms, etc.) are reduced to their root form and that irregular verbs are transformed to the indicative. The corpus is subdivided into ‘function words’ (articles, prepositions, conjunctions, pronouns and auxiliary verbs) and ‘content words’ (nouns, verbs, adjectives and adverbs). The corpus is then fragmented into ECUs. These are ‘gauged sentences that the program automatically constructs based on word length and punctuation in the text’ (Schonhardt-Bailey 2005: 705). ECUs are classified according to the distribution of their vocabulary. A data matrix is created that allows an analysis of the statistical similarities and dissimilarities between words in order to identify repetitive language patterns. This classification proceeds by successively splitting the ECUs into classes on the basis of vocabulary oppositions: ‘the procedure searches for maximally separate patterns of co-occurrence between the words classes’ (Schonhardt-Bailey 2005: 710). Classes are constructed according to the lexical content of each ECU by grouping similar forms (or words) on the basis of χ² (chi-square) discriminating criteria.
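Two of these steps can be illustrated, in a much simplified form, with the Python sketch below. It is an assumption-laden toy rather than Alceste’s algorithm. `build_ecus` cuts a comment at sentence punctuation and merges sentences until a hypothetical minimum length is reached, standing in for the ‘gauged sentences’ quoted above. `key_term_scores` ranks words by the Pearson χ² statistic of the 2×2 table ‘word present/absent’ × ‘ECU inside/outside a class’, keeping only words that are over-represented in the class. The minimum length, function names and example data are invented for illustration, and the actual splitting procedure, which searches for the partition that maximises vocabulary oppositions, is not reproduced.

```python
import re

# Hypothetical threshold: Alceste's real rules for sizing ECUs are not reproduced here.
MIN_WORDS_PER_ECU = 6


def build_ecus(comment, min_words=MIN_WORDS_PER_ECU):
    """Split one comment (an ICU) into ECU-like segments.

    Sentences are cut at . ! or ? and merged until a segment holds at least
    `min_words` words; a short remainder is attached to the previous segment.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", comment) if s.strip()]
    ecus, current = [], []
    for sentence in sentences:
        current.extend(sentence.split())
        if len(current) >= min_words:
            ecus.append(" ".join(current))
            current = []
    if current:
        if ecus:
            ecus[-1] += " " + " ".join(current)
        else:
            ecus.append(" ".join(current))
    return ecus


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                     word present   word absent
        in class          a              b
        not in class      c              d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def key_term_scores(ecu_words, in_class):
    """Score each word by the chi-square of its presence against class membership.

    ecu_words: one set of content words per ECU.
    in_class:  one boolean per ECU, True if the ECU belongs to the class.
    Only words seen in the class more often than expected are kept, because
    chi-square alone cannot tell over-representation from under-representation.
    """
    n = len(ecu_words)
    scores = {}
    for word in set().union(*ecu_words):
        a = b = c = d = 0
        for words, member in zip(ecu_words, in_class):
            present = word in words
            if member and present:
                a += 1
            elif member:
                b += 1
            elif present:
                c += 1
            else:
                d += 1
        if a * n > (a + b) * (a + c):  # observed count exceeds expected count
            scores[word] = chi_square_2x2(a, b, c, d)
    return scores


print(build_ecus("I can't sleep. My back hurts every single night. Thanks!"))
# ["I can't sleep My back hurts every single night Thanks"]

# Invented example: two ECUs in a "medication" class, two outside it.
ecu_words = [{"opiate", "overdose"}, {"opiate", "death"},
             {"thank", "sharing"}, {"thank", "inspiring"}]
in_class = [True, True, False, False]
scores = key_term_scores(ecu_words, in_class)
print(max(scores, key=scores.get))  # opiate
```

In this toy example ‘opiate’ appears in every ECU of the class and in none outside it, so it receives the highest score, which is the intuition behind the class key terms reported below.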
What is obtained through this descending hierarchical classification is a number of classes of words that should be representative of the main themes of the corpus analysed.

3. Results

Figure 1 below shows that five classes of arguments have been automatically selected by the software. These classes represent the main themes, sometimes also called ‘dimensions’, raised in the comments (see Schonhardt-Bailey 2005). Figure 1 reveals that the largest class is class 2 (292 ECUs were grouped under this class because they contain similar key terms). With 105 ECUs, the smallest class is class 5.

[Figure 1 about here]

Figure 2 below is a cluster analysis displaying the most frequent key terms used by YouTube commentators, grouped under the different classes. These key terms have not been selected by the analyst; they have been automatically selected by the software on the basis of their occurrence and co-occurrence. To interpret and give meaning to the classes identified by the algorithm, we look at the key terms and at the sentence segments (ECUs) from which they were extracted.

[Figure 2 about here]

In class 1, YouTube commentators thank each other for sharing their experiences in the videos posted on the website. The emphasis is on tolerance and empathy for chronic pain sufferers. The ‘honest’ depiction of experiences is found particularly valuable, while some stories are praised as truly ‘inspiring’. The most common key terms associated with this class include: thank, sharing, chronic, bless, inspiring and honest. One of the most representative sentence segments under class 1 is: ‘This is so very inspiring. Concrete proof that self-care and living through honesty and responsibility is the true way. Thank you Brendan for sharing your wisdom. I enjoyed your video Brendan. You share some very honest, simple truths that I feel so many people overlook2.’ (ECU no. 160, χ² = 0.05).
2 Key terms in bold are associated with the class discussed; e.g. ‘inspiring’, ‘self’, ‘living’, ‘honesty’, ‘responsibility’, ‘true way’ (etc.) are all associated with Class 1.

Class 2 illustrates how YouTube and other social media provide new avenues for communicating pain outside clinical contexts. This is where chronic pain sufferers express their frustration in their own words, with key terms such as: go, say, head, f*, thing, wonder. Typical sentence segments include: ‘It’s so fucking hard. I don’t know how to deal with the situation any more. I got my medical marijuana card. I had to try something. For two straight weeks for the first time in a decade I didn't suffer from peripheral neuropathy. Today is a bad day, but I'm not even mad.’ (ECU no. 15, χ² = 0.03).

Class 3 deals with the symptoms and consequences of chronic pain, and with the daily practices used to cope with it. It includes key words such as sleep, school, bed, wake, sore and headache, and ECUs such as: ‘I am 16 right now and I had glandular fever when I was 15 which in the last year has led to chronic fatigue syndrome. I constantly feel sluggish, sick, massive headaches and body aches and just lack energy all round.’ (ECU no. 648, χ² = 0.05).

Class 4 deals with alternative medicines and the use of illegal drugs to cope with chronic pain syndromes. This is also the class where chronic pain sufferers discuss their encounters with clinicians and their sometimes conflicting relationships with them. The class comprises key terms such as: contract, drug, pharmacy, prescription, law and illegal. Typical sentence segments include: ‘I used to be 110% against marijuana. But after dealing with "medical professionals" who made it obvious they didn't want to help and a doctor who loved to under-medicate to keep the FDA off his back, I became a 110% supporter of medical marijuana. In fact I now also support legalizing MJ all together.’ (ECU no. 1128, χ² = 0.05).
Finally, in class 5, chronic pain sufferers discuss the risks associated with different types of medication. Particular concerns such as addiction and overdose are frequently mentioned, along with increased risks of depression associated with some pain treatments. Key terms in this class include: opiate, medication, overdose, narcotic, death and toxic. Typical ECUs include: ‘Short term they work great but you rapidly become tolerant and they stop working. You do not become tolerant to the side effects however. They suppress the immune system, increase depression, worsen sleep quality, and therefore pain, and even in the small group of patients who get long term pain relief they do not improve function.’ (ECU no. 888, χ² = 0.04).

The correspondence analysis result, depicted in Figure 3, is simply a projection of the key terms and classes in two-dimensional space. The clustering of almost all the classes around the centre of the axes indicates a large degree of convergence between the topics discussed in different classes. In other words, even if the themes identified are different, there are shared patterns of argumentation between the different classes.

[Figure 3 about here]

4. Discussion

From a purely substantive perspective, our results are very much in line with studies outlining a comprehensive formulation of pain that is inclusive of multiple biological, psychological and social features (see, for example, Craig 2009). From a more methodological perspective, our results are in line with studies that have focused on new forms of social communication of pain. Gonzalez-Polledo and Tarr (2014), for instance, have shown how traditional illness narratives are remade on social media in less traditional narrative structures. They argue that new forms of mediation and social media dynamics transform pain narratives, which, in turn, has implications for our understanding of the formats of pain communication.
The remaking of narrative structures is evident in our class 1, where empathy, understanding and honesty stand out as cardinal values in the process of communicating chronic pain. People who might be quite cautious about sharing their experiences indicate on YouTube that there is a particular way in which they feel comfortable sharing their stories. This finding could inform the design of a formative evaluation on chronic pain, emphasising the need to be particularly careful and thoughtful about the way participants’ narratives are elicited. Doing so is all the more important because Morley et al. (2008) and Herbette and Rimé (2004) have shown that chronic pain patients may feel inhibited from talking about their pain and frustration, either because they want to be seen as functioning well or because they anticipate misunderstanding and stigmatization.

Our class 2 highlights another important dimension: chronic pain sufferers’ need to express their experiences outside a clinical context. A formative evaluation wishing to take this need into account might consider, for instance, how non-traditional interventions (e.g. art-based interventions) could be used to channel frustration and give chronic pain patients an outlet to express themselves. Studies of pain expression through photography, such as Padfield (2003, 2011), and of disability in the digital age (Ginsburg 2012) attest to how the creative process involved in communicating about pain through multiple media has the potential to transform the experience of pain by shifting its locus from within to outside the body. Other studies have also demonstrated that, in the process of sharing pain experiences and meanings and of engaging in new dynamics that produce new caring and support relations, new forms of patient expertise emerge through communicating about chronic illness online (Ziebland 2004).
Medical and humanities approaches from literature and philosophy have often overemphasised pain as an individual problem rather than situating it in a social context (see Craig 2009). However, there are now new possibilities for communicating pain beyond clinical contexts, which could be taken into account in a programmatic context.

Class 3 constitutes a particularly useful vein of data, offering essentially descriptive findings on the most common symptoms experienced by chronic pain sufferers. As remarked elsewhere by Ernst and Parikka (2013), chronic pain expressions in social media are becoming a growing archive that can be accessed, analysed and re-analysed for different purposes. This archive conveys multiple experiences of what it means to live with pain and could be used as triangulation data in a formative evaluation to confirm (or disconfirm) results produced by elicitation techniques, survey answers and/or case studies. Robustness checks are often difficult to perform in a programmatic context because access to primary data might be difficult or because the timeframe of the study might not allow repeated trials. However, the use of TMTs to analyse naturally occurring data potentially offers new avenues to test the validity and reliability of results obtained via more traditional qualitative or quantitative techniques.

Class 4 provides important information that is difficult to obtain via traditional elicitation techniques, where social desirability, fear of judgement or stigma may bias responses. This is where chronic pain sufferers admit turning to alternative medicines, illegal drugs and alcohol to cope with chronic pain syndromes. Issues expressed here pertaining to pain management via illegal or dangerous means could inform not only the results of a formative evaluation but also the topic guides or coding strategies of additional methods of data collection/analysis employed in the evaluation. They could also feed back into new recruitment strategies for case studies.
The other important theme recurring in this class (i.e. the often conflicting relationships between doctors and chronic pain sufferers) could be explored further via, for instance, open-ended questionnaires or semi-structured interviews. Abundant research on the chasm between patients and doctors in the clinical management of chronic pain (see Eccleston et al. 1997; Kugelmann 1999; Sullivan et al. 2006; McCrystal et al. 2011) highlights that pain communication in clinical contexts is often fraught with differences in expectations and outcomes. A formative evaluation dedicated to understanding this chasm could usefully employ TMTs to cross-compare a variety of experiences and identify recurring issues.

Finally, in class 5, concerns raised by chronic pain sufferers regarding the risks associated with certain kinds of medication could inform interview or focus group topic guides with patients. More importantly, perhaps, these results could be used in a formative evaluation to warn health professionals about the common fears experienced by patients about to commence a particular treatment. An important challenge identified in the literature on how health information is gathered and shared online is how health professionals themselves can respond to the more ‘Internet-informed’ patient (see McMullan 2006). Although acquiring and sharing information from online health communities can improve patients’ understanding of their condition and self-care, these communities have often been implicated as culprits in the dissemination of misleading information, because much of the guidance offered by support-group members is based on personal experience and often lacks professional expertise (Cotten & Gupta 2004; Cline & Haynes 2001). Commentators have also pointed out that patients who visit their doctors with inappropriate or misinterpreted information from the Internet will do little to enhance doctor–patient communication (see Ziebland 2004).
But communication could be improved simply if health professionals themselves were better informed about the common fears, and sometimes the common ‘myths’, disseminated in online health communities. This is an issue that a needs assessment using TMTs could help address by providing a quick, yet systematic, analysis of the voluminous data readily available online.

5. Conclusion

The Internet is now a heavily relied-upon source of reference material for the public that transcends existing geographical and regulatory boundaries and where the distinction between ‘experience’ and ‘expertise’ is blurred. In many ways, it is an ideal medium for gathering and sharing health information. It affords individuals privacy, immediacy, convenience, anonymity and a variety of perspectives on the same topic. In addition, the cloak of confidentiality afforded by the anonymous nature of the Internet is advantageous in that it allows users to ask awkward, sensitive or detailed questions without the risk of facing judgement, scrutiny or stigma. Evaluators and researchers would do well to tap into the resources offered by the Internet, but the sheer amount of information posted on the Web requires appropriate methods of analysis. This is a methodological challenge that can be met with the use of text mining techniques.

In comparison to traditional elicitation techniques and participant observation (i.e. small-n studies), TMTs offer the possibility of analysing a large sample of the population. They have the advantage of not invading patients’ privacy. In certain cases, they offer greater access to a population, and they considerably reduce the practical issues associated with organising interviews and focus groups. In comparison to surveys/consultations (i.e. large-n studies), TMTs are able to capture a richer vein of data.
They avoid traditional issues associated with the design of survey and consultation questions (that is, the necessity to avoid ‘closed’, ‘leading’ and ‘vague’ questions) and typical pitfalls such as non-response. However, there is a great deal of uncertainty around how to harness the opportunities of analysing the wealth of information posted online in a representative, robust and ethical way.

Despite their usefulness and efficiency, analyses of online comments with text mining techniques raise a host of concerns. First, although this is an issue associated with online data analysis generally rather than with text mining techniques specifically, it is difficult to establish whether comments posted on Internet websites have truly been drafted by patients (and/or their carers). Internet discussion forums also often lack monitoring; hence the data (or comments) analysed have to be carefully chosen and, most of the time, purposively sampled by the analyst. Second, the use of TMTs to analyse online comments may raise issues of representativeness, where the views of one cohort in a population (those having the access, technical skills and inclination to post comments on Internet websites) are over-represented while the views of others are excluded, i.e. the so-called ‘digital divide’ (see Brodie et al. 2000; Norris 2001; Cotten and Gupta 2004; Wyatt et al. 2005). Third, online commentators may not expect to be research subjects. For example, Sharf (1999: 247) has described various negative experiences of following up Web posts and email lists for further research. One woman, on being contacted by a researcher seeking consent to gain insights from breast cancer patients about their personal experiences, became hostile, accusing the researcher of behaving voyeuristically and ‘taking advantage of people in distress’3.

3 See Eysenbach and Till (2001) for related ethical issues in qualitative research on Internet health communities.
To conclude, the analysis of online comments using TMTs may be used as a stand-alone method in formative evaluation in certain circumstances: (a) when the research is entirely exploratory and no other source of data is available to complement the study, (b) when the whole population has been able to comment on a particular issue (in which case researchers will be analysing a census), or (c) when the aim of the FE is to capture repetitive patterns only, that is, when marginal points of view do not matter. The analysis of online comments using TMTs may be used in combination with traditional methods in FE when text mining analyses of online data serve as an initial exploratory phase to which researchers/analysts wish to add an inductive or deductive phase. TMTs can be used to inform topic guides designed to elicit further answers on a particular topic of interest, as well as the strategy for coding data already collected via elicitation techniques or surveys/consultations (see Hsieh and Shannon 2005). They can be employed when researchers or analysts wish to triangulate results obtained via elicitation/observation techniques and/or surveys and consultations. Text mining analyses of online data can be used to check whether answers obtained via interviews or survey responses are plausible. Alternatively, the results of TMTs can provide a ‘springboard’ for large-n studies, helping to identify variables of interest or new hypotheses to be tested (Table 2 below provides a summary of our recommendations).

[Table 2 about here]

Overall, while the promise of text mining for marketing and advertising has largely been established, it remains to be confirmed for the design of FE in health research. Nevertheless, the explosion of Big Data and the popularity of online communities might precipitate the need to integrate TMTs into a variety of evaluation processes in the near future.
This paper suggests that TMTs should be used as stand-alone methods only in a limited set of circumstances and that, in most applications, they are best used to complement other methods. Despite this caveat, they can add considerable value to the development and implementation of formative evaluations, even when used only as robustness checks or reliability tests.

Bibliography

Allum, N. C. (1998). A social representations approach to the comparison of three textual corpora using ALCESTE (Doctoral dissertation, London School of Economics and Political Science).
Anand, K. J. S., & Craig, K. D. (1996). New perspectives on the definition of pain. Pain, 67(1), 3-6.
Archak, N., Ghose, A., & Ipeirotis, P. G. (2011). Deriving the pricing power of product features by mining consumer reviews. Management Science, 57(8), 1485-1509.
Bailey, A., & Schonhardt-Bailey, C. (2008). Does deliberation matter in FOMC monetary policymaking? The Volcker Revolution of 1979. Political Analysis, 16(4), 404-427.
Baker, T. A., & Wang, C. C. (2006). Photovoice: Use of a participatory action research method to explore the chronic pain experience in older adults. Qualitative Health Research, 16(10), 1405-1413.
Bara, J., Weale, A., & Bicquelet, A. (2007). Analysing parliamentary debate with computer assistance. Swiss Political Science Review, 13(4), 577.
Bauer, M., & Gaskell, G. (2009). Computer assistance. In M. Bauer & G. Gaskell (Eds.), Qualitative Researching with Text, Image and Sound. London: Sage.
Bauman, A., & Nutbeam, D. (2013). Evaluation in a Nutshell: A Practical Guide to the Evaluation of Health Promotion Programs. Sydney: McGraw Hill.
Bicquelet, A., & Weale, A. (2011). Coping with the cornucopia: Can text mining help handle the data deluge in public policy analysis? Policy & Internet, 3(4), 150-171.
Bishop, P., & Davis, G. (2002). Mapping public participation in policy choices. Australian Journal of Public Administration, 61(1), 14-29.
Blyth, F. M., Macfarlane, G. J., & Nicholas, M. K. (2007). The contribution of psychosocial factors to the development of chronic pain: The key to better outcomes for patients? Pain, 129, 8-11.
Braithwaite, D. O., Waldron, V. R., & Finn, J. (1999). Communication of social support in computer-mediated groups for people with disabilities. Health Communication, 11(2), 123-151.
Breivik, H., Collett, B., Ventafridda, V., Cohen, R., & Gallacher, D. (2006). Survey of chronic pain in Europe: Prevalence, impact on daily life, and treatment. European Journal of Pain, 10(4), 287-333.
Brodie, M., Flournoy, R. E., Altman, D. E., Blendon, R. J., Benson, J. M., & Rosenbaum, M. D. (2000). Health information, the Internet, and the digital divide. Health Affairs, 19(6), 255-265.
Brugidou, M. (2003). Argumentation and values: An analysis of ordinary political competence via an open-ended question. International Journal of Public Opinion Research, 15(4), 413-430.
Cline, R. J., & Haynes, K. M. (2001). Consumer health information seeking on the Internet: The state of the art. Health Education Research, 16(6), 671-692.
Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and pattern discovery on the World Wide Web. Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '97).
Cotten, S. R., & Gupta, S. S. (2004). Characteristics of online and offline health information seekers and factors that discriminate between them. Social Science & Medicine, 59(9), 1795-1806.
Coulson, N. S., Buchanan, H., & Aubeeluck, A. (2007). Social support in cyberspace: A content analysis of communication within a Huntington's disease online support group. Patient Education and Counseling, 68(2), 173-178.
Coulson, N. S., & Greenwood, N. (2012). Families affected by childhood cancer: An analysis of the provision of social support within online support groups. Child: Care, Health and Development, 38(6), 870-877.
Craig, K. D. (2009). The social communication model of pain. Canadian Psychology/Psychologie canadienne, 50(1), 22.
Eccleston, C., Crombez, G., Aldrich, S., & Stannard, C. (1997). Attention and somatic awareness in chronic pain. Pain, 72(1), 209-215.
Elliott, A. M., Smith, B. H., Penny, K. I., Smith, W. C., & Chambers, W. A. (1999). The epidemiology of chronic pain in the community. The Lancet, 354(9186), 1248-1252.
Ernst, W. (2013). Digital Memory and the Archive (J. Parikka, Ed.). University of Minnesota Press.
Eysenbach, G., & Till, J. E. (2001). Ethical issues in qualitative research on Internet communities. British Medical Journal, 323, 1103-1105.
Feldman, R., & Sanger, J. (2007). The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data. Cambridge: Cambridge University Press.
Ginsburg, F. (2012). Disability in the digital age. In H. A. Horst & D. Miller (Eds.), Digital Anthropology. Oxford: Berg.
Glance, N., Hurst, M., Nigam, K., Siegler, M., Stockton, R., & Tomokiyo, T. (2005, August). Deriving marketing intelligence from online discussion. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 419-428). ACM.
Gonzalez-Polledo, E., & Tarr, J. (2014). The thing about pain: The remaking of illness narratives in chronic pain expressions on social media. New Media & Society, 1461444814560126.
Goubert, L., Craig, K. D., Vervoort, T., Morley, S., Sullivan, M. J. L., Williams, A. C. de C., ... & Crombez, G. (2005). Facing others in pain: The effects of empathy. Pain, 118(3), 285-288.
Guérin-Pace, F. (1998). Textual statistics: An exploratory tool for the social sciences. Population: An English Selection, 10(1), 73-95.
Harrison, S., Barnes, M., & Mort, M. (1997). Praise and damnation: Mental health user groups and the construction of organizational legitimacy. Public Policy and Administration, 12(2), 4-6.
Hearst, M. (2003). What is text mining? SIMS, UC Berkeley.
Herbette, G., & Rimé, B. (2004). Verbalization of emotion in chronic pain patients and their psychological adjustment. Journal of Health Psychology, 9(5), 661-676.
Højsted, J., & Sjøgren, P. (2007). Addiction to opioids in chronic pain patients: A literature review. European Journal of Pain, 11(5), 490-518.
Hsieh, H.-F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277-1288.
Jackson, P. L., Meltzoff, A. N., & Decety, J. (2005). How do we perceive the pain of others? A window into the neural processes involved in empathy. NeuroImage, 24(3), 771-779.
Kappesser, J., Williams, A. C. de C., & Prkachin, K. M. (2006). Testing two accounts of pain underestimation. Pain, 124(1), 109-116.
Kugelmann, R. (1999). Complaining about chronic pain. Social Science & Medicine, 49(12), 1663-1676.
Lagus, K., Honkela, T., Kaski, S., & Kohonen, T. (1999). WEBSOM for textual data mining. Artificial Intelligence Review, 13(5), 345-364.
Lahlou, S. (1996). A method to extract social representations from linguistic corpora. Japanese Journal of Experimental Social Psychology, 35(3), 278-291.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic Inquiry. Beverly Hills, CA: Sage.
Linoff, G. S., & Berry, M. J. (2011). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. John Wiley & Sons.
Martín-Sánchez, E., Furukawa, T. A., Taylor, J., & Martin, J. L. R. (2009). Systematic review and meta-analysis of cannabis treatment for chronic pain. Pain Medicine, 10(8), 1353-1368.
McCaffery, M., & Ferrell, B. R. (1997). Nurses' knowledge of pain assessment and management: How much progress have we made? Journal of Pain and Symptom Management, 14(3), 175-188.
McCracken, L. M., & Eccleston, C. (2003). Coping or acceptance: What to do about chronic pain? Pain, 105(1), 197-204.
McCrystal, K. N., Craig, K. D., Versloot, J., Fashler, S. R., & Jones, D. N. (2011). Perceiving pain in others: Validation of a dual processing model. Pain, 152(5), 1083-1089.
McMullan, M. (2006). Patients using the Internet to obtain health information: How this affects the patient-health professional relationship. Patient Education and Counseling, 63(1), 24-28.
Melzack, R. (1975). The McGill Pain Questionnaire: Major properties and scoring methods. Pain, 1(3), 277-299.
Morley, S., Williams, A. C., & Hussain, S. (2008). Estimating the clinical effectiveness of cognitive behavioural therapy in the clinic: Evaluation of a CBT informed pain management programme. Pain, 137, 670-680.
Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine your own business: Market-structure surveillance through text mining. Marketing Science, 31(3), 521-543.
NHS Community Care Act (NHSCCA). (1990). London: HMSO.
NHS Management Executive (NHSME). (1992). Local Voices. Leeds: NHSME.
Norris, P. (2001). Digital Divide: Civic Engagement, Information Poverty, and the Internet Worldwide. Cambridge University Press.
Padfield, D. (2003). Perceptions of Pain. Stockport: Dewi Lewis.
Padfield, D. (2011). ‘Representing’ the pain of others. Health, 15(3), 241-257.
Patton, M. Q. (2008). Utilization-Focused Evaluation (4th ed.). Thousand Oaks: Sage.
Patton, M. Q. (2012). Essentials of Utilization-Focused Evaluation. Thousand Oaks: Sage.
Pawson, R., & Tilley, N. (1997). Realistic Evaluation. Sage.
Pope, C., Ziebland, S., & Mays, N. (2000). Qualitative research in health care: Analysing qualitative data. BMJ: British Medical Journal, 320(7227), 114.
Preece, J. (2000). Online Communities: Designing Usability and Supporting Sociability. John Wiley & Sons.
Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., & Singer, E. (2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly, 68(1), 109-130.
Prkachin, K. M., Solomon, P. E., & Ross, J. (2007). Underestimation of pain by health-care providers: Towards a model of the process of inferring pain in others. CJNR (Canadian Journal of Nursing Research), 39(2), 88-106.
Reinert, M. (1993). Les «mondes lexicaux» et leur «logique» à travers l'analyse statistique d'un corpus de récits de cauchemars. Langage et société, 66, 5-39.
Rowe, R., & Shepherd, M. (2002). Public participation in the new NHS: No closer to citizen control? Social Policy and Administration, 36(3), 275-290.
Schonhardt-Bailey, C. (2005). Measuring ideas more effectively: An analysis of Bush and Kerry's national security speeches. PS: Political Science and Politics, 38(3), 701-711.
Sharf, B. (1999). Beyond netiquette: The ethics of doing naturalistic discourse research on the Internet. In S. Jones (Ed.), Doing Internet Research. London: Sage.
Smedley, R., Coulson, N., Gavin, J., Rodham, K., & Watts, L. (2015). Online social support for Complex Regional Pain Syndrome: A content analysis of support exchanges within a newly launched discussion forum. Computers in Human Behavior, 51, 53-63.
Sullivan, M. J. L. (2008). Toward a biopsychomotor conceptualization of pain: Implications for research and intervention. Clinical Journal of Pain, 24, 281-290.
Sullivan, M. J. L., Martel, M. O., Tripp, D., Savard, A., & Crombez, G. (2006). The relation between catastrophizing and the communication of pain experience. Pain, 122(3), 282-288.
Tanabe, L., Scherf, U., Smith, L., Lee, J., Hunter, L., & Weinstein, J. (1999). MedMiner: An Internet text-mining tool for biomedical information, with application to gene expression profiling. Biotechniques, 27(6), 1210-1217.
Tong, R., & Yager, R. (2006). Characterizing buzz and sentiment in internet sources: Linguistic summaries and predictive behaviors. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing Attitude and Affect in Text: Theory and Applications. Dordrecht: Springer.
Ward, S. E., & Gordon, D. B. (1996). Patient satisfaction and pain severity as outcomes in pain management: A longitudinal view of one setting's experience. Journal of Pain and Symptom Management, 11(4), 242-251.
Weale, A. (2007). What is so good about citizens' involvement in healthcare? In E. Andersson, J. Tritter, & R. Wilson (Eds.), Health Democracy: The Future of Involvement in Health and Social Care (pp. 37-43). London: Involve and NHS National Centre for Involvement.
Weale, A., Bicquelet, A., & Bara, J. (2012). Debating abortion, deliberative reciprocity and parliamentary advocacy. Political Studies, 60(3), 643-667.
Wyatt, S., Henwood, F., Hart, A., & Smith, J. (2005). The digital divide, health information and everyday life. New Media & Society, 7(2), 199-218.
Ziebland, S. (2004). The importance of being expert: The quest for cancer information on the Internet. Social Science & Medicine, 59, 1783-1793.

Appendix

Table 1: Units of Sampling

Code | Unit of sampling (video) | Number of views | Units of coding
V1 | Chronic Pain - Is it All in Their Head? - Daniel J. Clauw M.D (https://www.youtube.com/watch?v=pgCfkA9RLrM) | 45,863 | 145
V2 | Honest Vlog - Living with Chronic Fatigue Syndrome (https://www.youtube.com/watch?v=4nb0xhqzi1Q) | 10,363 | 130
V3 | My Chronic Pain Condition (https://www.youtube.com/watch?v=03LgQrWWLxs) | 4,216 | 103
V4 | Chronic Pain Patient, first visit with Pain Management Doctor (https://www.youtube.com/watch?v=T8qT7hlz6P0) | 16,415 | 98
V5 | Chronic Pain: Changing My Story and Loving My Body (https://www.youtube.com/watch?v=zCJBGBv-1nE) | 2,480 | 64
V6 | Doctors and Chronic Pain (https://www.youtube.com/watch?v=H8yYSajFql8) | 2,947 | 61
V7 | Healing Chronic Pain - Brendan Mooney's Testimonial (https://www.youtube.com/watch?v=R8MZZkd7NZo) | 8,691 | 45
V8 | Why do chronic pain patients kill themselves? (https://www.youtube.com/watch?v=QpRr-A3JKyo) | 4,911 | 37
V9 | I Live In Chronic Pain (https://www.youtube.com/watch?v=f5LBmlwjiPc) | 1,381 | 24
V10 | Struggling to be me with chronic pain (https://www.youtube.com/watch?v=FPpu7dXJFRI) | 12,801 | 18
V11 | Chronic Pain (https://www.youtube.com/watch?v=5txx4MQ77xM) | 7,181 | 15
V12 | The distress of chronic pain (https://www.youtube.com/watch?v=U-Ndp8mSsIg) | 5,169 | 13
V13 | Chronic Pain Explained (https://www.youtube.com/watch?v=B2SI-gmpDUU) | 8,998 | 8
V14 | Chronic Pain: The Invisible Disease (https://www.youtube.com/watch?v=9g4XOn-c52M) | 1,257 | 2
Total (14 codes) | | 132,673 | 763

Figure 1: Distribution of ECUs per class and number of words analysed by class

Figure 2: Descending Hierarchical Classification

Figure 3: Correspondence Analysis

Table 2: Recommendations

TMTs may be used as stand-alone techniques in FE when:
- The research is exploratory.
- A whole population is being studied (census).
- The aim of the FE is to capture repetitive patterns only.

TMTs may be used in combination with traditional methods in FE to:
- Inform interview/focus group topic guides.
- Inform the coding strategy (and/or codebook) of data already collected.
- Provide springboards to large-n studies.
- Triangulate results obtained via elicitation/observation techniques, surveys and consultations.
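The totals reported in Table 1 can be re-derived with a few lines of Python. The sketch below re-adds the views and units of coding transcribed from the table; the variable names are hypothetical, and the per-video ratio of coding units per 1,000 views is an illustrative quantity not reported in the paper.

```python
# Rows transcribed from Table 1: (code, number of views, units of coding).
TABLE_1 = [
    ("V1", 45_863, 145), ("V2", 10_363, 130), ("V3", 4_216, 103),
    ("V4", 16_415, 98), ("V5", 2_480, 64), ("V6", 2_947, 61),
    ("V7", 8_691, 45), ("V8", 4_911, 37), ("V9", 1_381, 24),
    ("V10", 12_801, 18), ("V11", 7_181, 15), ("V12", 5_169, 13),
    ("V13", 8_998, 8), ("V14", 1_257, 2),
]

total_views = sum(views for _, views, _ in TABLE_1)
total_units = sum(units for _, _, units in TABLE_1)
print(f"videos={len(TABLE_1)} views={total_views:,} units={total_units}")
# prints: videos=14 views=132,673 units=763

# Illustrative only: coding units per 1,000 views for each video.
for code, views, units in TABLE_1:
    print(f"{code:<4} {1000 * units / views:6.2f}")
```

The recomputed sums match the reported totals (132,673 views and 763 units of coding across 14 videos). The per-view ratio simply shows that the most-viewed videos were not always the ones contributing the most coding units.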