Using Online Mining Techniques to inform Formative Evaluations:
An Analysis of YouTube Video Comments about Chronic Pain
Corresponding author:
Aude Bicquelet
Research Director
NatCen Social Research
35 Northampton Square
London EC1V 0AX
Tel: 0207 549 7023
e-mail: [email protected]
Biography:
Dr. Aude Bicquelet is a Research Director at NatCen Social Research. Her research interests lie
at the intersection of mixed methods, evidence-based policy and public health policy. An important
part of her work revolves around the use and application of text-mining techniques to
analyse large corpora, such as public consultations, open-ended responses and social media/web
data.
Abstract
Despite the growing body of research analysing information posted on social media, very few
studies have focused on how ‘naturally occurring data’ could inform formative evaluations in
health research. This paper argues that exploratory data mining techniques such as descending
hierarchical classification, cluster analysis and correspondence analysis could usefully be employed, either
on their own or alongside other methods, in the design of needs assessments on health-related issues. To
this end, the paper reports on the application of text mining techniques to analyse YouTube video
comments on chronic pain.
Key words: Text-Mining; Social Media; Formative Evaluation; Health; Needs assessment.
1. Introduction
Formative evaluations (FE) are a key component of health research (Bauman
and Nutbeam 2013; Patton 2008, 2012). Among the various types of formative evaluations, needs
assessments are particularly important because they help identify who might need a particular
program, how great the need is and the extent to which this need might be met (Pawson and
Tilley 1997). In order to understand the needs and the procedures required to improve a particular
treatment and/or to innovate in an area where medical and psychological supports have been
unsuccessful, inputs from patients, their families and carers are critical (see Harrison et al. 1997;
Weale 2007; Rowe and Shepherd 2002). This chimes with the National Health Service’s stance
and the guidelines for regulatory agencies such as the National Institute for Health and Care
Excellence and the Care Quality Commission, which stress the importance of fostering public
engagement in health care (see, e.g., the 1992 White Paper Local Voices and the NHS and Community
Care Act of 1990).
Traditional techniques for involving stakeholders in FE, needs assessments and/or
decision-making on health-related issues include (a) elicitation techniques (typically used in
small-n studies) and (b) surveys and consultations (typically used in large-n studies). Elicitation
techniques such as interviews and focus groups have the advantage of obtaining ‘thick’ or ‘rich’
data, offering a window on patients’ and/or carers’ day-to-day lives and experiences. Yet, they
usually include a relatively small sample of a population. Consequently, they risk excluding
important perspectives from the analysis. Moreover, they can generate long and diverse
answers that are difficult to analyse with traditional approaches to qualitative data analysis, such as
thematic and/or discourse analysis (see Pope et al. 2000).
By contrast, surveys and large-scale consultations offer the possibility of analysing a
larger sample and can potentially reduce or prevent geographical dependence (when conducted
remotely). But given the number of answers requiring analysis, inputs must be guided or
restricted in questionnaires. Hence, they are unlikely to capture ‘thick data’ or precise accounts.
Other disadvantages traditionally associated with surveys and consultations include (a) low
response rates and (b) the difficulty of capturing information on sensitive issues (see Bishop and
Davis 2002; Presser et al. 2004).
Text mining techniques (TMTs) offer a third way of gathering and examining
data about patients’ experiences and their needs. Although established TMTs such as descending
hierarchical classification, cluster analysis and correspondence analysis have been usefully
employed in the analysis of large-scale consultations on health-related issues (see Bicquelet and
Weale 2011), they have rarely been applied to the analysis of online information. This paper
argues that TMTs could prove valuable in assisting the development of formative evaluations by
summarising the vast amount of ‘naturally occurring’ data contained within Internet health
communities. To this end, the paper reports on the application of TMTs to analyse YouTube
video comments on chronic pain.
2. Using Text Mining Techniques to analyse YouTube Video Comments on Chronic Pain
Due to the rapidly growing availability of online documents and Internet content (and the
emerging need for their quick and effective interpretation), the automated analysis of large textual
corpora has received increasing attention in recent years (see in particular Cooley et al. 1997;
Feldman & Sanger 2007; Bauer & Gaskell 2009). Descended from the older and established
tradition of data mining, Web mining and text mining have mainly been employed in two
separate strands of research. The first has been to discover and process data across thousands, and
sometimes millions, of pages on the Web (see Lagus et al. 1999; Tanabe et al. 1999). The second
has been used for more ‘local’ analyses of Internet sources, such as newsgroups, message boards
or electronic brainstorming sessions (see for instance Tong and Yager 2006). Although
definitions vary, text mining can be understood as the process of extracting information from large
corpora to automatically identify patterns and relationships in textual data (Feldman and Sanger
2007: 17).
Text mining is similar to reading in that both activities involve extracting meaning from
strings of letters. However, the computational and statistical analysis of text differs from reading
in two important respects. First, computer-enabled approaches can process and summarise far
more text than any person has time to read. Second, such approaches are able to extract meaning
from text that is missed by human readers who may overlook certain patterns because they do not
conform to prior beliefs or expectations (see Bicquelet & Weale 2011). Text mining is a highly
inductive, exploratory approach to data analysis where a key element is the linking together of the
extracted information to form new facts (or new hypotheses) to be explored further by more
conventional means of experimentation (see Hearst 2003). Although widely applied in political
science (Bailey and Schonhardt-Bailey 2008; Bara et al. 2007; Weale et al. 2012), marketing (Netzer et al. 2012; Linoff
and Berry 2011; Glance et al. 2005) and the analysis of consumer behaviour (Archak et al. 2011),
text mining has been less utilised to extract meaningful information from naturally occurring data
on health-related issues.
Case selection and sampling
To demonstrate how TMTs can usefully be applied to the analysis of online data and how
they can feed back into the design of formative evaluations, we analyse comments about ‘chronic
pain’ posted under YouTube videos. A variety of media exist through which individuals can
access and exchange health information online, including websites, support groups, chat rooms
and instant messaging. We focus on comments posted on YouTube specifically because, unlike
other social media websites (such as Facebook or Twitter), little information is associated with
the identity of online commentators, who are also informed that their comments will be made
visible to a wide community of viewers.
Several important studies have looked at the content of online comments on physical
disabilities (Braithwaite et al. 1999), Huntington’s disease (Coulson et al. 2007), childhood
cancer (Coulson & Greenwood 2012) and Alzheimer’s patients (Preece 2000). However, very
little research has focused on comments spontaneously posted on social media by chronic pain
patients (with the notable exception of Smedley et al. 2015). This is a significant gap,
especially because ‘talking’ about pain is often perceived as a challenge by patients, doctors and
health providers or social care services seeking to enhance patients’ experiences and their wellbeing.
Communicating Chronic Pain
Chronic pain is a complex biopsychosocial phenomenon that is not easily communicated
and difficult to measure objectively (see Jackson et al. 2005). Although there have been dramatic
advances in our understanding and management of chronic pain over the last 20 years (see
McCaffrey et al. 1997; Ward & Gordon 1996; McCracken & Eccleston 2003), it is often poorly
recognized and inadequately assessed (Brennan & Cousins 2004; Foley 1997; Hill 1995; Rich
1997). For instance, comparisons of patients’ self-reports with healthcare professionals’ estimates of their pain have revealed systematic underestimation by professionals, leading, at times, to failure to deliver
needed care (Kappesser et al. 2006; Prkachin et al. 2007). Standardised diagnostic tools such as
the McGill Pain Questionnaire (Melzack 1975) attempt to measure pain by translating it into
words that describe its sensory, spatial and evaluative properties, yet qualitative studies have
shown that this terminology does not correspond to patients’ spontaneous descriptions
(McCracken & Eccleston 2003). The current focus on intrapersonal features of pain – its biology
(on the one hand) and suffering as a psychological experience (on the other hand) – is often
considered inadequate to address the complex social nature of the phenomenon (see Blyth et al.
2007; Sullivan 2008). This underlines the need to turn to patients’ experiences – expressed in their
own words – to better understand their needs and inform programs.
In order to select comments for analysis, we entered the key words ‘Chronic Pain Patient’
in the YouTube search engine. We used the word ‘patient’ to avoid generic medical videos
providing explanations and/or descriptions of the causes and symptoms of chronic pain. The key
term search returned 217,000 results. We selected only the most commented-upon videos. Table
1 below provides an overview of our units of sampling. At the top of the list, the most
commented video included 145 comments for 45,863 views. Videos with fewer than two
comments were excluded from the analysis. Overall, we gathered 763 comments posted under 14
videos. The total number of words in our corpus of data was 59,322.
[Table 1 about here]
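As an illustration of the sampling rule just described, the Python sketch below keeps only videos with at least two comments, orders them from most to least commented, and pairs each comment with its video label. The `Video` record, its fields and the sample data are hypothetical. The sketch assumes the comments have already been collected and does not access YouTube.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    """Hypothetical record for one video returned by the search."""
    label: str                                     # e.g. "V1"
    views: int
    comments: list = field(default_factory=list)   # raw comment texts


def build_corpus(videos, min_comments=2):
    """Return (label, comment) units, the number of videos kept, and a word count.

    Videos with fewer than `min_comments` comments are excluded, and the
    remaining videos are ordered from most to least commented.
    """
    kept = sorted(
        (v for v in videos if len(v.comments) >= min_comments),
        key=lambda v: len(v.comments),
        reverse=True,
    )
    units = [(v.label, comment) for v in kept for comment in v.comments]
    n_words = sum(len(comment.split()) for _, comment in units)
    return units, len(kept), n_words


# Made-up example data, not the study's corpus.
videos = [
    Video("V1", 500, ["Thank you for sharing", "So inspiring"]),
    Video("V2", 300, ["Only one comment here"]),
    Video("V3", 200, ["Bad day today", "Same here", "Hang in there"]),
]
units, n_videos, n_words = build_corpus(videos)
print(n_videos, len(units), n_words)  # 2 5 14
```

Keeping the video label next to each comment means every unit can later be coded with the video it came from.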
The Alceste Software
To assist in the analysis of online comments on chronic pain, we use a computer-assisted
content analysis package named Alceste1. This software was originally developed and applied to
the study of the humanities (Reinert 1993). Its use has more recently spread to the social sciences
(Lahlou 1996; Allum 1998) and has attracted researchers seeking to analyse political speeches
(Schonhardt-Bailey 2005), parliamentary debates (Schonhardt-Bailey 2008; Bara, Weale &
Bicquelet 2007), opinion polls (Brugidou 2003) and large public consultations (Bicquelet &
Weale 2011).
Alceste combines textual and statistical analyses. It relies upon co-occurrence analysis,
which is the statistical analysis of frequent word pairs in a text or corpus (in this case, YouTube
comments). Within this corpus, homogeneous subsets of words are identified on the basis of their
lexical profile (Brugidou 2003: 418). As defined by Max Reinert, this ‘method can be used to
determine the main word distribution pattern within a text or a discourse (a corpus). The technical
procedure leads to selecting classes, each determined by a pool of words mathematically linked
together and having the highest significant frequency of occurrence, i.e. those which the speakers
tended to use most repeatedly. These classes with their content and function words subscribe to
different types of discourse with their specific vocabulary and syntax’ (cited in Brugidou 2003:
418).
1 Alceste stands for Analyse des Lexèmes Co-occurrents dans les Énoncés Simples d’un Texte (Analysis of the
co-occurring lexemes within the simple statements of a text). Its algorithm, based on Benzécri’s important
contributions in textual statistics, was created by Max Reinert at the CNRS.
In other words, the software is designed to quantify the text in order to extract its most
significant structures. Two major assumptions at the core of this method are, first, that the most
significant structures are closely linked to the distribution of words in a text and, second, that
repetition is the main factor for the stabilization (identification) of a discursive activity. What
Alceste therefore carries out is a particular reading of the text that does not take into account the
meaning of the words but focuses instead on their recurrence and co-occurrence in sentence
segments.
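The sketch below makes concrete the idea that Alceste reads recurrence and co-occurrence rather than meaning. It counts how often two words appear in the same segment. This is a simplified illustration of co-occurrence counting, not Alceste's own procedure: the segments are assumed to be already split, words are separated on whitespace, and each pair is counted at most once per segment.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(segments):
    """Count unordered word pairs that appear together within a segment."""
    counts = Counter()
    for segment in segments:
        # Deduplicate and sort so each pair is counted once per segment and
        # always stored in the same (alphabetical) order.
        words = sorted(set(segment.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical segments, loosely echoing the vocabulary of the comments.
segments = [
    "cannot sleep wake up sore",
    "wake up with headache cannot sleep",
    "opiate overdose risk",
]
counts = cooccurrence_counts(segments)
print(counts[("sleep", "wake")])  # 2
```

Pairs that recur across many segments, such as ("sleep", "wake") here, are the kind of regularity a co-occurrence method builds on. A fuller analysis would count lemmatized content words rather than raw tokens.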
Steps of Analysis
In its initial phase, the software breaks down the corpus into two kinds of context unit:
initial context unit (ICU) and elemental context unit (ECU). ICUs are sampling units
corresponding to the divisions of the text specified by the user, to which one or several variables
can be assigned. In the analyses below, each comment constitutes an ICU. Each ICU has been
coded with the label of the video (V1, V2, …, Vn).
At this stage, the software runs three successive processes: Word Recognition,
Lemmatization, Syntax Categorization. This means that different forms of the same word (in the
form of plurals, suffixes, etc.) are reduced to the root form and that irregular verbs are
transformed to the indicative. The corpus is subdivided into ‘function words’ (articles,
prepositions, conjunctions, pronouns and auxiliary verbs) and ‘content words’ (nouns, verbs,
adjectives and adverbs).
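The sketch below roughly approximates this pre-processing. Alceste uses its own dictionaries. Here, by assumption, two stand-ins replace them: a tiny hand-made `FUNCTION_WORDS` list and a crude suffix-stripping rule (`crude_lemma`, which is closer to stemming than to true lemmatization). Both are hypothetical and serve only to show how word forms are reduced and how function words are set aside from content words.

```python
import re

# Hypothetical, deliberately tiny list of function words (articles,
# prepositions, conjunctions, pronouns and auxiliary verbs).
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "but", "or",
    "i", "you", "it", "my", "is", "am", "are", "was", "have", "had",
}


def crude_lemma(word):
    """Rough stand-in for lemmatization: strip one common suffix."""
    for suffix in ("ing", "s"):
        # Only strip when at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_words(text):
    """Return (content_lemmas, function_words) for a piece of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    function = [t for t in tokens if t in FUNCTION_WORDS]
    content = [crude_lemma(t) for t in tokens if t not in FUNCTION_WORDS]
    return content, function


content, function = split_words("I had headaches and my legs are sore")
print(content)   # ['headache', 'leg', 'sore']
print(function)  # ['i', 'had', 'and', 'my', 'are']
```

A real lemmatizer would also map irregular forms such as "had" to "have". The crude rule above only strips endings and can produce non-words.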
The corpus is then fragmented into ECUs. These are ‘gauged sentences that the program
automatically constructs based on word length and punctuation in the text’ (Schonhardt-Bailey
2005: 705). ECUs are classified according to the distribution of their vocabulary. A data matrix is
created allowing an analysis of statistical similarities and dissimilarities of words in order to
identify repetitive language patterns. This classification proceeds by successively splitting the
ECUs into classes on the basis of vocabulary oppositions: ‘the procedure searches for maximally
separate patterns of co-occurrence between the words classes’ (Schonhardt-Bailey 2005: 710).
Classes are constructed according to the lexical content of each ECU by grouping similar forms
(or words) on the basis of χ² (chi-square) discriminating criteria. What is obtained through this descending
hierarchical classification is a number of classes of words that should be representative of the
main themes of the corpus analysed.
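The toy implementation below illustrates the classification logic under several simplifying assumptions of our own:

- ECUs are cut at sentence punctuation and capped at a hypothetical `max_words`.
- Each ECU is represented by the set of words it contains.
- Each split is found by brute force over every bipartition, keeping the one that maximises the χ² of the class-by-word table.
- The largest remaining class is split next, which makes the classification descending.

Reinert's method is generally described as locating splits along the first axis of a correspondence analysis. The brute-force search here is a swapped-in simplification that only scales to a handful of ECUs.

```python
import re
from itertools import combinations


def to_ecus(text, max_words=12):
    """Cut text at sentence punctuation, then into chunks of at most max_words."""
    ecus = []
    for sentence in re.split(r"[.!?;]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        for start in range(0, len(words), max_words):
            # Each ECU is represented by the set of words it contains.
            ecus.append(set(words[start:start + max_words]))
    return ecus


def chi_square(groups, ecus):
    """Chi-square of the class-by-word table of word-presence counts."""
    members = [i for group in groups for i in group]
    vocab = sorted(set().union(*(ecus[i] for i in members)))
    table = [[sum(w in ecus[i] for i in group) for w in vocab] for group in groups]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    if n == 0:
        return 0.0
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_tot[r] * col_tot[c] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def best_split(indices, ecus):
    """Brute-force bipartition of `indices` with the largest chi-square."""
    first, rest = indices[0], indices[1:]
    best, best_score = None, -1.0
    # Fixing `first` in group A avoids testing each split twice; k stops at
    # len(rest) - 1 so group B is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            a = [first, *extra]
            b = [i for i in indices if i not in a]
            score = chi_square([a, b], ecus)
            if score > best_score:
                best, best_score = (a, b), score
    return best


def descending_classification(ecus, n_classes):
    """Repeatedly split the largest class until n_classes classes exist."""
    classes = [list(range(len(ecus)))]
    while len(classes) < n_classes:
        largest = max(classes, key=len)
        if len(largest) < 2:
            break  # nothing left that can be split
        classes.remove(largest)
        classes.extend(best_split(largest, ecus))
    return classes


ecus = to_ecus("Opiate overdose medication. Medication overdose death! "
               "Sleep bed wake. Wake headache sleep.")
print(descending_classification(ecus, 2))  # [[0, 1], [2, 3]]
```

On real data this search is infeasible, because the number of bipartitions of n ECUs is 2^(n-1) - 1. That is why an efficient method of locating each split is needed.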
3. Results
Figure 1 below shows that five classes of arguments have been automatically selected by
the software. These classes represent the main themes, sometimes also called ‘dimensions’, raised
in the comments (see Schonhardt-Bailey 2005). Figure 1 reveals that the largest class is class 2
(292 ECUs were grouped under this class because they contain similar key terms). With 105
ECUs, the smallest class is class 5.
[Figure 1 about here]
Figure 2 below is a cluster analysis displaying the most frequent key terms used by
YouTube commentators gathered together under different classes. Those key terms have not been
selected by the analyst; they have been automatically selected by the software based on their
occurrence and co-occurrence. To interpret and give meaning to the classes identified by the
algorithm, we look at the key terms and the sentence segments (ECUs) from which they were
extracted.
[Figure 2 about here]
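Alceste reports a χ² value for the words of each class. The sketch below shows a score of that kind, assuming it is computed from a 2 × 2 table. The table crosses membership of the class with presence of the word in an ECU (1 degree of freedom, no continuity correction). Only words over-represented in the class are kept. The toy ECUs and class membership are hypothetical.

```python
def chi2_association(ecus, in_class, word):
    """2x2 chi-square between class membership and presence of `word`.

    ecus: list of word sets; in_class: one boolean per ECU.
    Returns (chi2, over_represented).
    """
    pairs = list(zip(ecus, in_class))
    a = sum(1 for e, m in pairs if m and word in e)          # in class, has word
    b = sum(1 for e, m in pairs if m and word not in e)      # in class, lacks word
    c = sum(1 for e, m in pairs if not m and word in e)      # outside, has word
    d = sum(1 for e, m in pairs if not m and word not in e)  # outside, lacks word
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, False
    chi2 = n * (a * d - b * c) ** 2 / denom
    # ad > bc is equivalent to a/(a+b) > c/(c+d): the word is more frequent
    # inside the class than outside it.
    return chi2, a * d > b * c


def key_terms(ecus, in_class, top=5):
    """Words over-represented in the class, strongest association first."""
    scored = []
    for word in set().union(*ecus):
        chi2, over = chi2_association(ecus, in_class, word)
        if over:
            scored.append((chi2, word))
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken alphabetically
    return [word for _, word in scored[:top]]


ecus = [
    {"opiate", "overdose", "medication"},
    {"medication", "overdose", "death"},
    {"sleep", "bed", "wake"},
    {"wake", "headache", "sleep"},
]
in_class = [True, True, False, False]
print(key_terms(ecus, in_class))  # ['medication', 'overdose', 'death', 'opiate']
```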
In class 1, YouTube commentators thank each other for sharing their experiences in the
videos posted on the website. The emphasis is on tolerance and empathy for chronic pain
sufferers. The ‘honest’ depiction of the accounts is found particularly valuable, while some
stories are praised as being really ‘inspiring’. The most common key terms associated with this
class include: thank, sharing, chronic, bless, inspiring and honest. One of the most representative
sentence segments under class 1 is:
‘This is so very inspiring. Concrete proof that self-care and living through honesty and
responsibility is the true way. Thank you Brendan for sharing your wisdom. I enjoyed your
video Brendan. You share some very honest, simple truths that I feel so many people
overlook.’2 (ECU no. 160, χ² = 0.05).
2 Key terms in bold are associated with the class discussed, e.g. ‘inspiring’, ‘self’, ‘living’, ‘honesty’, ‘responsibility’ and ‘true way’ are all associated with Class 1.
Class 2 illustrates how YouTube and other social media provide new avenues for communicating
pain outside clinical contexts. This is where chronic pain sufferers express their frustration in
their own words, with key terms such as: go, say, head, f*, thing, wonder. Typical sentence segments include:
‘It’s so fucking hard. I don’t know how to deal with the situation any more. I got my medical
marijuana card. I had to try something. For two straight weeks for the first time in a decade I
didn't suffer from peripheral neuropathy. Today is a bad day, but I'm not even mad.’ (ECU no. 15, χ² = 0.03).
Class 3 deals with the symptoms and consequences of chronic pain and with the daily practices used
to cope with it. Key terms include sleep, school, bed, wake, sore and headache, and typical ECUs include:
‘I am 16 right now and I had glandular fever when I was 15 which in the last year has led to
chronic fatigue syndrome. I constantly feel sluggish, sick, massive headaches and body aches and
just lack energy all round.’ (ECU no. 648, χ² = 0.05).
Class 4 deals with alternative medicines and the use of illegal drugs to cope with chronic pain
syndromes. This is also the class where chronic pain sufferers discuss their
encounters with clinicians and their sometimes conflicting relationships. The class comprises key
terms such as: contract, drug, pharmacy, prescription, law and illegal. Typical sentence segments
include:
‘I used to be 110% against marijuana. But after dealing with "medical professionals" who made it
obvious they didn't want to help and a doctor who loved to under-medicate to keep the FDA off
his back, I became a 110% supporter of medical marijuana. In fact I now also support legalizing
MJ all together.’ (ECU no. 1128, χ² = 0.05).
Finally in class 5, chronic pain sufferers discuss the risks associated with different types of
medication. Particular concerns such as addiction and overdose are frequently mentioned along
with increased risks of depression associated with some treatments against pain. Key terms in this
class include: opiate, medication, overdose, narcotic, death and toxic. Typical ECUs include:
‘Short term they work great but you rapidly become tolerant and they stop working. You do not
become tolerant to the side effects however. They suppress the immune system, increase
depression, worsen sleep quality, and therefore pain, and even in the small group of patients who
get long term pain relief they do not improve function.’ (ECU no. 888, χ² = 0.04).
The correspondence analysis result, depicted in Figure 3, is simply a projection of the key terms
and classes in two-dimensional space. The clustering of nearly all the classes around the
centre of the axes indicates a large degree of convergence between the different topics discussed
in different classes. In other words, even if the themes identified are different, there are shared
patterns of argumentation between the different classes.
[Figure 3 about here]
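The projection itself follows the standard construction of correspondence analysis, summarised below as a worked formula. This is the textbook formulation, given for orientation; we do not claim it reproduces every detail of Alceste's implementation. The symbols are as follows:

- N is a hypothetical class-by-term contingency table with grand total n.
- P = N/n.
- r and c are the vectors of row and column masses (the row and column sums of P).
- D_r and D_c are the diagonal matrices built from r and c.

```latex
\[
S = D_r^{-1/2}\,\bigl(P - r\,c^{\top}\bigr)\,D_c^{-1/2} = U\,\Sigma\,V^{\top},
\qquad
F = D_r^{-1/2}\,U\,\Sigma,
\qquad
G = D_c^{-1/2}\,V\,\Sigma,
\qquad
\sum_{k} \sigma_k^{2} = \frac{\chi^{2}}{n}.
\]
```

The first two columns of F (classes) and G (key terms) are the coordinates plotted in a two-dimensional map. Across all dimensions, the squared distance of a row point from the origin equals the χ² distance between that row's profile and the average profile. So, to the extent that the two plotted axes capture most of the inertia, classes clustered near the centre are those whose vocabulary departs least from the corpus average.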
4. Discussion
From a purely substantive perspective, our results are very much in line with studies
outlining a comprehensive formulation of pain that is inclusive of multiple biological,
psychological and social features (see, for example, Craig 2009). From a more methodological
perspective, our results are in line with studies that have focused on the new forms of social
communication of pain. Gonzalez-Polledo and Tarr (2014), for instance, have shown how
traditional illness narratives are remade on social media in less traditional narrative structures.
They argue that new forms of mediation and social media dynamics transform pain narratives,
which, in turn, has implications for our understandings of the formats of pain communication.
The remaking of narrative structures is evident in our class 1, where empathy,
understanding and honesty stand out as cardinal values in the process of communicating chronic
pain. People who might be quite cautious about sharing their experiences indicate on YouTube
that there is a particular way in which they feel comfortable about sharing their stories. This
finding could inform the design of a formative evaluation on chronic pain, emphasizing the need
to be particularly careful and thoughtful about the way participants’ narratives are elicited. Doing
so is all the more important as Morley et al. (2008) and Herbette and Rimé (2004) have shown
that chronic pain patients may feel inhibited from talking about their pain and frustration, because
they want to be seen as functioning well or they anticipate misunderstanding and stigmatization.
Our class 2 highlights another important dimension. It captures chronic pain sufferers’
need to express their experiences outside a clinical context. A formative evaluation wishing to
take this need into account might consider, for instance, how non-traditional interventions (such as
art-based interventions) could be used to channel frustration and give an outlet to chronic pain
patients to express themselves. Studies of pain expression through photography, such as Padfield
(2003, 2011), and of disability in the digital age (Ginsburg 2012) attest to how the creative
process involved in communicating about pain through multiple media has the potential to
transform the experience of pain by shifting its locus from within to outside the body. Other
studies have also demonstrated that in the process of sharing pain experiences and meanings and
of engaging in new dynamics that produce new caring and support relations, new forms of patient
expertise emerge through communicating about chronic illness online (Ziebland 2004). Medical
and humanities approaches from literature and philosophy have often overemphasised pain as an
individual problem rather than situating it in a social context (see Craig 2009). However, there are
now new possibilities for communicating pain beyond clinical contexts, which could be taken
into account in a programmatic context.
Class 3 also constitutes a particularly useful vein of data offering essentially descriptive
findings of the most common symptoms experienced by chronic pain sufferers. As remarked
elsewhere by Ernst and Parikka (2013), chronic pain expressions in social media are becoming a
growing archive that can be accessed, analysed and re-analysed for different purposes. This
archive conveys multiple experiences of what it means to live with pain and could be used as
triangulation data in a formative evaluation to confirm (or not) results produced by elicitation
techniques, survey answers and/or case studies. Robustness checks are often difficult to perform
in a programmatic context because access to primary data might be difficult or because the
timeframe of the study might not allow repeated trials. However, the use of TMTs to analyse
naturally occurring data potentially offers new avenues to test the validity and reliability of
results obtained via more traditional qualitative or quantitative techniques.
Class 4 provides important information difficult to obtain via traditional elicitation
techniques where social desirability, fear of judgement or stigma may bias responses. This is
where chronic pain sufferers admit turning to alternative medicines, illegal drugs and alcohol to
cope with chronic pain syndromes. Issues expressed here pertaining to pain management via
illegal or dangerous means could inform not only the results of a formative evaluation but also
topic guides or coding strategies of additional methods of data collection/analysis employed in
the evaluation. They could also feed back into new recruitment strategies for case studies.
The other important theme recurrent in this class (i.e. the often conflicting relationships
between doctors and chronic pain sufferers) could be explored further via, for instance, open-ended questionnaires or semi-structured interviews. Abundant research on the chasm between
patients and doctors in the clinical management of chronic pain (see Eccleston et al. 1997;
Kugelmann 1999; Sullivan et al. 2006; McCrystal et al. 2011) highlights that pain
communication in clinical contexts is often fraught with differences in expectations and outcomes.
A formative evaluation dedicated to understanding this chasm could usefully employ TMTs to cross-compare a variety of experiences and identify recurring issues.
Finally in class 5, concerns raised by chronic pain sufferers regarding the risks associated
with certain kinds of medication could inform interviews or focus group topic guides with
patients. More importantly, perhaps, these results could be used in a formative evaluation to warn
health professionals about the common fears experienced by patients about to commence a
particular treatment. An important challenge identified in the literature on how health information
is gathered and shared online is how health professionals themselves can respond to the more
‘Internet-informed’ patient (see McMullan 2006). Although acquiring and sharing information
from online health communities can improve patients’ understanding of their condition and self-care, these communities have often been implicated as culprits in the dissemination of misleading information
because much of the guidance offered by support-group members is based on personal experience
and often lacks professional expertise (Cotten & Gupta 2004; Cline & Haynes 2001).
Commentators have also pointed out that patients who visit their doctors with inappropriate or
misinterpreted information from the Internet will do little to enhance doctor–patient
communication (see Ziebland 2004). However, communication could be improved simply if health
professionals themselves were better informed about the common fears and sometimes the
common ‘myths’ disseminated on online health communities. This is an issue that a needs
assessment using TMTs could certainly help address by providing a quick, yet systematic,
analysis of the voluminous data readily available online.
5. Conclusion
The Internet is now a heavily relied upon source of reference material for the public that
transcends existing geographical and regulatory boundaries and where the distinction between
‘experience’ and ‘expertise’ is blurred. In many ways, it is an optimal way to gather and share
health information. It affords individuals privacy, immediacy, convenience, anonymity and a
variety of perspectives on the same topic. In addition, the cloak of confidentiality afforded by the
anonymous nature of the Internet is advantageous in that it allows users to ask awkward, sensitive
or detailed questions without the risk of facing judgment, scrutiny or stigma.
Evaluators and researchers would do well to tap into the resources offered by the Internet,
but of course the sheer amount of information posted on the Web requires appropriate methods of
analysis. This is a methodological challenge that can be met with the use of text mining
techniques.
In comparison to traditional elicitation techniques and participant observation (i.e. small-n
studies), TMTs offer the possibility to analyse a large sample of the population. They offer the
advantage of not invading patients’ privacy. In certain cases, they offer greater access to a
population, and they considerably reduce practical issues associated with the organisation of
interviews and focus groups. In comparison to surveys/consultations (i.e. large-n studies), TMTs
are able to capture a richer vein of data. They avoid traditional issues associated with the design
of survey and consultation questions – that is, the necessity to avoid ‘closed’, ‘leading’ and
‘vague’ questions – and typical pitfalls such as non-response.
However, there is a great deal of uncertainty around how to harness the opportunities of
analysing the wealth of information posted online in a representative, robust and ethical way.
Despite their usefulness and efficiency, analyses of online comments with text mining techniques
do raise a host of concerns.
First, although this is an issue associated with online data analysis generally rather than
with text mining techniques specifically, it is difficult to identify whether comments posted on
Internet websites have been truly drafted by patients (and/or their carers). Internet discussion
forums often also lack monitoring, hence data (or comments) selected will have to be carefully
chosen and, most of the time, purposively sampled by the analyst.
Second, the use of TMTs to analyse online comments may raise issues of
representativeness, where the views of one cohort in a population (having access, technical skills
and inclination to post comments on Internet websites) are over-represented while the views of
others are excluded – i.e. the so-called ‘digital divide’ (see Brodie et al. 2000; Norris 2001;
Cotten and Gupta 2004; Wyatt et al. 2005).
Third, online commentators may not expect to be research subjects. For example, Sharf
(1999: 247) has described various negative experiences of following up Web posts and email lists
for further research. One woman, on being contacted by a researcher seeking consent to gain
insights from breast cancer patients about their personal experiences, became hostile, accusing
the researcher of behaving voyeuristically and ‘taking advantage of people in distress’3.
3 See Eysenbach and Till (2001) for related ethical issues in qualitative research on Internet health communities.
To conclude, the analysis of online comments using TMTs may be used as a stand-alone
method in formative evaluation in certain circumstances: (a) when the research is entirely
exploratory and no other source of data is available to complement the study, (b) when the whole
population has been able to express comments on a particular issue (in which case, researchers
will be analysing a census), or (c) when the aim of the FE is to capture repetitive patterns only;
that is, when marginal points of view do not matter.
The analysis of online comments using TMTs may be used in combination with traditional
methods in FE when text mining analyses of online data are used as an initial exploratory phase
to which researchers/analysts wish to add an inductive or deductive phase. TMTs can be used as a
means of informing topic guides to elicit further answers on a particular topic of interest and the
strategy of coding data already collected via elicitation techniques or surveys/consultations (see
Hsieh and Shannon 2005). They can be employed when researchers or analysts wish to
triangulate results obtained via elicitation/observation techniques and/or surveys and
consultations. Text mining analyses of online data can be used to check whether answers obtained
via interviews or survey responses are plausible or not. Or, the results of TMTs can provide a
‘springboard’ for large-n studies, to help identify variables of interest or new hypotheses to be
tested (Table 2 below provides a summary of our recommendations).
[Table 2 about here]
Overall, while the promise of text mining for marketing and advertising has largely been
established, it remains to be confirmed for the design of FE in health research. Nevertheless, the
explosion of Big Data and the popularity of online communities may soon make it necessary to
integrate TMTs into a variety of evaluation processes. This paper suggests that TMTs should be
used as stand-alone methods only in limited circumstances and that, in most applications, they
are best used to complement other methods. Despite this caveat, these methods offer
considerable added value to the development and implementation of formative evaluations,
even if they are used only as robustness checks or reliability tests.
Bibliography
Allum, N. C. (1998). A social representations approach to the comparison of three textual
corpora using ALCESTE (Doctoral dissertation, London School of Economics and
Political Science).
Anand, K. J. S., & Craig, K. D. (1996). New perspectives on the definition of pain. Pain, 67(1),
3-6.
Archak, N., Ghose, A., & Ipeirotis, P. G. (2011). Deriving the pricing power of product features
by mining consumer reviews. Management Science, 57(8), 1485-1509.
Bailey, A., & Schonhardt-Bailey, C. (2008). Does deliberation matter in FOMC monetary
policymaking? The Volcker Revolution of 1979. Political Analysis, 16(4), 404-427.
Baker, T. A., & Wang, C. C. (2006). Photovoice: Use of a participatory action research method to
explore the chronic pain experience in older adults. Qualitative Health Research, 16(10),
1405-1413.
Bara, J., Weale, A., & Bicquelet, A. (2007). Analysing parliamentary debate with computer
assistance. Swiss Political Science Review, 13(4), 577.
Bauer, M., & Gaskell, G. (2009). Computer assistance. In M. Bauer & G. Gaskell (Eds.),
Qualitative researching with text, image and sound. London: Sage.
Bauman, A., & Nutbeam, D. (2013). Evaluation in a nutshell: A practical guide to the evaluation
of health promotion programs. Sydney, Australia: McGraw-Hill.
Bicquelet, A., & Weale, A. (2011). Coping with the cornucopia: Can text mining help handle the
data deluge in public policy analysis? Policy & Internet, 3(4), 150-171.
Bishop, P., & Davis, G. (2002). Mapping public participation in policy choices. Australian
Journal of Public Administration, 61(1), 14-29.
Blyth, F. M., Macfarlane, G. J., & Nicolas, M. K. (2007). The contribution of psychosocial
factors to the development of chronic pain: The key to better outcomes for patients. Pain,
129, 8-11.
Braithwaite, D. O., Waldron, V. R., & Finn, J. (1999). Communication of social support in
computer-mediated groups for people with disabilities. Health Communication, 11(2), 123-151.
Breivik, H., Collett, B., Ventafridda, V., Cohen, R., & Gallacher, D. (2006). Survey of chronic
pain in Europe: Prevalence, impact on daily life, and treatment. European Journal of Pain,
10(4), 287-333.
Brodie, M., Flournoy, R. E., Altman, D. E., Blendon, R. J., Benson, J. M., & Rosenbaum, M. D.
(2000). Health information, the Internet, and the digital divide. Health Affairs, 19(6), 255-265.
Brugidou, M. (2003). Argumentation and values: An analysis of ordinary political competence
via an open-ended question. International Journal of Public Opinion Research, 15(4),
413-430.
Cline, R. J., & Haynes, K. M. (2001). Consumer health information seeking on the Internet: The
state of the art. Health Education Research, 16(6), 671-692.
Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and pattern
discovery on the World Wide Web. In Proceedings of the 9th IEEE International
Conference on Tools with Artificial Intelligence (ICTAI '97).
Cotten, S. R., & Gupta, S. S. (2004). Characteristics of online and offline health information
seekers and factors that discriminate between them. Social Science & Medicine, 59(9),
1795-1806.
Coulson, N. S., Buchanan, H., & Aubeeluck, A. (2007). Social support in cyberspace: A content
analysis of communication within a Huntington's disease online support group. Patient
Education and Counseling, 68(2), 173-178.
Coulson, N. S., & Greenwood, N. (2012). Families affected by childhood cancer: An analysis of
the provision of social support within online support groups. Child: Care, Health and
Development, 38(6), 870-877.
Craig, K. D. (2009). The social communication model of pain. Canadian Psychology/Psychologie
canadienne, 50(1), 22.
Eccleston, C., Crombez, G., Aldrich, S., & Stannard, C. (1997). Attention and somatic awareness
in chronic pain. Pain, 72(1), 209-215.
Elliott, A. M., Smith, B. H., Penny, K. I., Smith, W. C., & Chambers, W. A. (1999). The
epidemiology of chronic pain in the community. The Lancet, 354(9186), 1248-1252.
Ernst, W. (2013). Digital memory and the archive (J. Parikka, Ed.). University of Minnesota
Press.
Eysenbach, G., & Till, J. E. (2001). Ethical issues in qualitative research on Internet
communities. British Medical Journal, 323, 1103-1105.
Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches to analyzing
unstructured data. Cambridge: Cambridge University Press.
Ginsburg, F. (2012). Disability in the digital age. In H. A. Horst & D. Miller (Eds.), Digital
anthropology. Oxford: Berg.
Glance, N., Hurst, M., Nigam, K., Siegler, M., Stockton, R., & Tomokiyo, T. (2005). Deriving
marketing intelligence from online discussion. In Proceedings of the Eleventh ACM
SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 419-428).
ACM.
Gonzalez-Polledo, E., & Tarr, J. (2014). The thing about pain: The remaking of illness narratives
in chronic pain expressions on social media. New Media & Society, 1461444814560126.
Goubert, L., Craig, K. D., Vervoort, T., Morley, S., Sullivan, M. J. L., Williams, A. C. de C., ... &
Crombez, G. (2005). Facing others in pain: The effects of empathy. Pain, 118(3), 285-288.
Guérin-Pace, F. (1998). Textual statistics: An exploratory tool for the social sciences. Population:
An English Selection, 10(1), 73-95.
Harrison, S., Barnes, M., & Mort, M. (1997). Praise and damnation: Mental health user groups
and the construction of organizational legitimacy. Public Policy and Administration, 12(2),
4-6.
Hearst, M. (2003). What is text mining? SIMS, UC Berkeley.
Herbette, G., & Rimé, B. (2004). Verbalization of emotion in chronic pain patients and their
psychological adjustment. Journal of Health Psychology, 9(5), 661-676.
Højsted, J., & Sjøgren, P. (2007). Addiction to opioids in chronic pain patients: A literature
review. European Journal of Pain, 11(5), 490-518.
Hsieh, H.-F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis.
Qualitative Health Research, 15(9), 1277-1288.
Jackson, P. L., Meltzoff, A. N., & Decety, J. (2005). How do we perceive the pain of others? A
window into the neural processes involved in empathy. NeuroImage, 24(3), 771-779.
Kappesser, J., Williams, A. C. de C., & Prkachin, K. M. (2006). Testing two accounts of pain
underestimation. Pain, 124(1), 109-116.
Kugelmann, R. (1999). Complaining about chronic pain. Social Science & Medicine, 49(12),
1663-1676.
Lagus, K., Honkela, T., Kaski, S., & Kohonen, T. (1999). WEBSOM for textual data mining.
Artificial Intelligence Review, 13(5), 345-364.
Lahlou, S. (1996). A method to extract social representations from linguistic corpora. Japanese
Journal of Experimental Social Psychology, 35(3), 278-291.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
Linoff, G. S., & Berry, M. J. (2011). Data mining techniques: For marketing, sales, and customer
relationship management. John Wiley & Sons.
Martín-Sánchez, E., Furukawa, T. A., Taylor, J., & Martin, J. L. R. (2009). Systematic review
and meta-analysis of cannabis treatment for chronic pain. Pain Medicine, 10(8), 1353-1368.
McCaffery, M., & Ferrell, B. R. (1997). Nurses' knowledge of pain assessment and management:
How much progress have we made? Journal of Pain and Symptom Management, 14(3),
175-188.
McCracken, L. M., & Eccleston, C. (2003). Coping or acceptance: what to do about chronic pain?
Pain, 105(1), 197-204.
McCrystal, K. N., Craig, K. D., Versloot, J., Fashler, S. R., & Jones, D. N. (2011). Perceiving
pain in others: validation of a dual processing model. Pain, 152(5), 1083-1089.
McMullan, M. (2006). Patients using the Internet to obtain health information: How this affects
the patient–health professional relationship. Patient Education and Counseling, 63(1), 24-28.
Melzack, R. (1975). The McGill Pain Questionnaire: major properties and scoring methods. Pain,
1(3), 277-299.
Morley, S., Williams, A. C., & Hussain, S. (2008). Estimating the clinical effectiveness of
cognitive behavioural therapy in the clinic: Evaluation of a CBT informed pain
management programme. Pain, 137, 670–680.
Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine your own business:
Market-structure surveillance through text mining. Marketing Science, 31(3), 521-543.
NHS Community Care Act (NHSCCA). (1990). London: HMSO.
NHS Management Executive (NHSME). (1992). Local voices. Leeds: NHSME.
Norris, P. (2001). Digital divide: Civic engagement, information poverty, and the Internet
worldwide. Cambridge University Press.
Padfield, D. (2003). Perceptions of pain. Stockport: Dewi Lewis.
Padfield, D. (2011). “Representing” the pain of others. Health, 15(3), 241-257.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Patton, M. Q. (2012). Essentials of utilization-focused evaluation. Thousand Oaks, CA: Sage.
Pawson, R., & Tilley, N. (1997). Realistic evaluation. Sage.
Pope, C., Ziebland, S., & Mays, N. (2000). Qualitative research in health care: Analysing
qualitative data. BMJ: British Medical Journal, 320(7227), 114.
Preece, J. (2000). Online communities: Designing usability and supporting sociability. John
Wiley & Sons.
Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., & Singer, E.
(2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly,
68(1), 109-130.
Prkachin, K. M., Solomon, P. E., & Ross, J. (2007). Underestimation of pain by health-care
providers: towards a model of the process of inferring pain in others. CJNR (Canadian
Journal of Nursing Research), 39(2), 88-106.
Reinert, M. (1993). Les «mondes lexicaux» et leur «logique» à travers l'analyse statistique d'un
corpus de récits de cauchemars. Langage et société, 66, 5-39.
Rowe, R., & Shepherd, M. (2002). Public participation in the new NHS: No closer to citizen
control? Social Policy and Administration, 36(3), 275-290.
Schonhardt-Bailey, C. (2005). Measuring ideas more effectively: An analysis of Bush and
Kerry's national security speeches. PS: Political Science and Politics, 38(3), 701-711.
Sharf, B. (1999). Beyond netiquette: The ethics of doing naturalistic discourse research on the
Internet. In S. Jones (Ed.), Doing Internet research. London: Sage.
Smedley, R., Coulson, N., Gavin, J., Rodham, K., & Watts, L. (2015). Online social support for
Complex Regional Pain Syndrome: A content analysis of support exchanges within a
newly launched discussion forum. Computers in Human Behavior, 51, 53-63.
Sullivan, M. J. L. (2008). Toward a biopsychomotor conceptualization of pain: Implications for
research and intervention. Clinical Journal of Pain, 24, 281-290.
Sullivan, M. J. L., Martel, M. O., Tripp, D., Savard, A., & Crombez, G. (2006). The relation
between catastrophizing and the communication of pain experience. Pain, 122(3), 282-288.
Tanabe, L., Scherf, U., Smith, L., Lee, J., Hunter, L., & Weinstein, J. (1999). MedMiner: An
Internet text-mining tool for biomedical information, with application to gene expression
profiling. BioTechniques, 27(6), 1210-1217.
Tong, R., & Yager, R. (2006). Characterizing buzz and sentiment in Internet sources: Linguistic
summaries and predictive behaviors. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing
attitude and affect in text: Theory and applications. Dordrecht: Springer.
Ward, S. E., & Gordon, D. B. (1996). Patient satisfaction and pain severity as outcomes in pain
management: A longitudinal view of one setting's experience. Journal of Pain and
Symptom Management, 11(4), 242-251.
Weale, A. (2007). What is so good about citizens’ involvement in healthcare? In E. Andersson,
J. Tritter, & R. Wilson (Eds.), Health democracy: The future of involvement in health and
social care (pp. 37-43). London: Involve and NHS National Centre for Involvement.
Weale, A., Bicquelet, A., & Bara, J. (2012). Debating abortion, deliberative reciprocity and
parliamentary advocacy. Political Studies, 60(3), 643-667.
Wyatt, S., Henwood, F., Hart, A., & Smith, J. (2005). The digital divide, health information and
everyday life. New Media & Society, 7(2), 199-218.
Ziebland, S. (2004). The importance of being expert: The quest for cancer information on the
Internet. Social Science & Medicine, 59, 1783-1793.
Appendix:
Table 1: Units of Sampling
Units of Sampling (Videos) | URL | Number of views | Units of Coding | Codes
Chronic Pain - Is it All in Their Head? - Daniel J. Clauw M.D | https://www.youtube.com/watch?v=pgCfkA9RLrM | 45,863 | 145 | V1
Honest Vlog - Living with Chronic Fatigue Syndrome | https://www.youtube.com/watch?v=4nb0xhqzi1Q | 10,363 | 130 | V2
My Chronic Pain Condition | https://www.youtube.com/watch?v=03LgQrWWLxs | 4,216 | 103 | V3
Chronic Pain Patient, first visit with Pain Management Doctor | https://www.youtube.com/watch?v=T8qT7hlz6P0 | 16,415 | 98 | V4
Chronic Pain: Changing My Story and Loving My Body | https://www.youtube.com/watch?v=zCJBGBv-1nE | 2,480 | 64 | V5
Doctors and Chronic Pain | https://www.youtube.com/watch?v=H8yYSajFql8 | 2,947 | 61 | V6
Healing Chronic Pain - Brendan Mooney's Testimonial | https://www.youtube.com/watch?v=R8MZZkd7NZo | 8,691 | 45 | V7
Why do chronic pain patients kill themselves? | https://www.youtube.com/watch?v=QpRr-A3JKyo | 4,911 | 37 | V8
I Live In Chronic Pain | https://www.youtube.com/watch?v=f5LBmlwjiPc | 1,381 | 24 | V9
Struggling to be me with chronic pain | https://www.youtube.com/watch?v=FPpu7dXJFRI | 12,801 | 18 | V10
Chronic Pain | https://www.youtube.com/watch?v=5txx4MQ77xM | 7,181 | 15 | V11
The distress of chronic pain | https://www.youtube.com/watch?v=U-Ndp8mSsIg | 5,169 | 13 | V12
Chronic Pain Explained | https://www.youtube.com/watch?v=B2SI-gmpDUU | 8,998 | 8 | V13
Chronic Pain: The Invisible Disease | https://www.youtube.com/watch?v=9g4XOn-c52M | 1,257 | 2 | V14
Total | | 132,673 | 763 | 14
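As a quick arithmetic check on Table 1, the per-video figures can be summed in Python. The dictionary below simply transcribes the table, keyed by the codes V1 to V14; the variable names are illustrative and not part of the study.

```python
# Views and units of coding per video, transcribed from Table 1.
table_1 = {
    "V1": (45_863, 145),
    "V2": (10_363, 130),
    "V3": (4_216, 103),
    "V4": (16_415, 98),
    "V5": (2_480, 64),
    "V6": (2_947, 61),
    "V7": (8_691, 45),
    "V8": (4_911, 37),
    "V9": (1_381, 24),
    "V10": (12_801, 18),
    "V11": (7_181, 15),
    "V12": (5_169, 13),
    "V13": (8_998, 8),
    "V14": (1_257, 2),
}

total_views = sum(views for views, _ in table_1.values())
total_units = sum(units for _, units in table_1.values())

# Share of all units of coding contributed by each video, largest first.
shares = sorted(
    ((code, units / total_units) for code, (_, units) in table_1.items()),
    key=lambda item: item[1],
    reverse=True,
)

if __name__ == "__main__":
    print(f"{len(table_1)} videos, {total_views:,} views, {total_units} units of coding")
    for code, share in shares[:3]:
        print(f"{code}: {share:.1%}")
```

The sums reproduce the 132,673 views and 763 units of coding reported in the table. They also show how concentrated the material is: the three most-commented videos (V1 to V3) supply 378 of the 763 units, just under half of the coded material.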
Figure 1: Distribution of ECUs per class and number of words analysed by class
Figure 2: Descending Hierarchical Classification
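The logic of one splitting step in a descending hierarchical classification can be sketched in Python with NumPy. This sketch assumes the approach associated with Reinert (1993): the ECU-by-word presence/absence table is ordered along the first axis of a correspondence analysis, and the cut along that ordering that maximises the chi-square between the two resulting groups is kept. It is a simplified illustration, not the software used in the study. The reallocation passes and the recursive splitting of the largest remaining class are omitted, and the toy table and the function names (`first_axis_rows`, `between_chi2`, `first_split`) are hypothetical.

```python
import numpy as np


def first_axis_rows(table):
    """Row (ECU) coordinates on the first correspondence-analysis axis.

    Assumes every row and every column of `table` has a non-zero total.
    """
    p = table / table.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    # Standardised residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sv, _ = np.linalg.svd(s, full_matrices=False)
    return u[:, 0] * sv[0] / np.sqrt(r)  # principal coordinates on axis 1


def between_chi2(table, in_first):
    """Chi-square of the 2 x words table obtained by pooling each group's rows."""
    pooled = np.vstack([table[in_first].sum(axis=0), table[~in_first].sum(axis=0)])
    pooled = pooled[:, pooled.sum(axis=0) > 0]  # ignore words absent from both groups
    expected = np.outer(pooled.sum(axis=1), pooled.sum(axis=0)) / pooled.sum()
    return float(((pooled - expected) ** 2 / expected).sum())


def first_split(table):
    """Try every cut along the first-axis ordering and keep the best chi-square."""
    order = np.argsort(first_axis_rows(table))
    best_score, best_mask = -1.0, None
    for k in range(1, len(order)):
        mask = np.zeros(len(order), dtype=bool)
        mask[order[:k]] = True
        score = between_chi2(table, mask)
        if score > best_score:
            best_score, best_mask = score, mask
    return best_score, best_mask


# Hypothetical toy data: 6 ECUs x 4 words, 1 = word present in the ECU.
ecus = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

score, mask = first_split(ecus)
print(round(score, 3), np.flatnonzero(mask))
```

On this deliberately clear-cut toy table the best cut separates the first three ECUs from the last three. Which group is labelled first depends on the arbitrary sign of the singular vector.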
Figure 3: Correspondence Analysis
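The computation behind a correspondence analysis map can be illustrated with a short Python/NumPy sketch. It performs a simple correspondence analysis via the singular value decomposition of the standardised residuals of a small class-by-word table of counts. The counts and the function name `correspondence_analysis` are hypothetical, and the software used in the study may scale, rotate or label its output differently.

```python
import numpy as np


def correspondence_analysis(counts):
    """Simple correspondence analysis of a two-way table of counts.

    Returns row and column principal coordinates, the share of total
    inertia carried by each axis, and the total inertia. Assumes no
    all-zero rows or columns and a table that is not perfectly independent.
    """
    n = counts.sum()
    p = counts / n
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    inertia = sv ** 2                           # principal inertia per axis
    rows = (u * sv) / np.sqrt(r)[:, None]       # row principal coordinates
    cols = (vt.T * sv) / np.sqrt(c)[:, None]    # column principal coordinates
    return rows, cols, inertia / inertia.sum(), inertia.sum()


# Hypothetical counts: 3 classes (rows) x 4 words (columns).
counts = np.array([
    [30, 5, 2, 1],
    [4, 25, 6, 3],
    [2, 4, 20, 18],
], dtype=float)

rows, cols, shares, total = correspondence_analysis(counts)
print("share of inertia on axes 1-2:", np.round(shares[:2], 3))
print("class coordinates (axes 1-2):")
print(np.round(rows[:, :2], 3))
```

The total inertia equals the table's chi-square statistic divided by its grand total, which makes a convenient sanity check. With three classes, at most two axes carry any inertia, so the third share is numerically zero.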
Table 2: Recommendations
TMTs may be used as stand-alone techniques in FE when:
• The research is exploratory.
• A whole population is being studied (census).
• The aim of the FE is to capture repetitive patterns only.
TMTs may be used in combination with traditional methods in FE to:
• Inform interview/focus group topic guides.
• Inform the coding strategy (and/or codebook) for data already collected.
• Provide springboards to large-n studies.
• Triangulate results obtained via elicitation/observation techniques, surveys and consultations.