Norsk konferanse for organisasjoners bruk av informasjonsteknologi
NOKOBIT 2011
Universitetet i Tromsø, 21–23 November 2011
NOKOBIT board and editorial committee:
Terje Fallmyr, Universitetet i Nordland (editor, board chair)
Bendik Bygstad, Norges Informasjonsteknologiske Høgskole
Jørgen Fog, Departementenes servicesenter
Laurence Habib, Høgskolen i Oslo
Jon Iden, Norges Handelshøyskole
John Krogstie, Norges teknisk-naturvitenskapelige universitet
Laila J. Matberg, Høgskolen i Nesna
© NOKOBIT-stiftelsen and Tapir Akademisk Forlag, 2011
ISSN 1892-0748
ISBN 978-82-519-2845-8
No part of this book may be copied beyond what is permitted under the provisions of the Norwegian Copyright Act («Lov om opphavsrett til åndsverk») and copying agreements entered into with Kopinor.
Editor: Terje Fallmyr
Digital printing and binding: AIT Oslo AS
Tapir Akademisk Forlag aims to contribute to the development of good teaching materials and all types of academic literature. We represent a broad range of disciplines and publish around 100 new titles a year. We collaborate with authors and academic communities throughout the country, and our most important product areas are:
Teaching materials for higher education
Books for the professional market
Scholarly publishing
Publishing editor for this title:
[email protected]
Tapir Akademisk Forlag
7005 TRONDHEIM
Tel.: 73 59 32 10
Fax: 73 59 32 04
E-mail: [email protected]
www.tapirforlag.no
PREFACE
Welcome to NOKOBIT 2011!
NOKOBIT 2011 is hosted by Universitetet i Tromsø, while the process around the scientific programme was led from Universitetet i Nordland. This is the 18th NOKOBIT since the start in 1993, and the 12th time that NOKOBIT is organised together with NIK – and, since 2008, also together with NISK.
This year we received 27 submissions, of which 20 will be presented. All submissions have been through a thorough peer review (blind review) by three independent reviewers. In good NOKOBIT tradition, each presentation will have a well-prepared discussant, and the contributors must also explain how they have responded to the reviewers' comments.
I would like to thank all the reviewers for their constructive feedback. Without their effort there would have been no conference. I would also like to thank the NOKOBIT board for an excellent collaboration.
Finally, I would like to thank the local organising committee, and especially Lars Ailo Bongo. Collaborating at a distance has worked very well.
We look forward to a good conference!
Terje Fallmyr
Handelshøgskolen i Bodø, Universitetet i Nordland
Editor and board chair, NOKOBIT 2011
Reviewers:
Lasse Berntzen
Solveig Bjørnestad
Bendik Bygstad
Monica Divitini
Kjell Ellingsen
Asle Fagerstrøm
Terje Fallmyr
Anna-Mette Fuglseth
Arne Kristian Groven
Laurence Habib
Hallstein Hegerholm
Jon Iden
Grete Jamissen
Arild Jansen
Lill Kristiansen
Jens Kaasbøll
John Krogstie
Wolfgang Leister
Eystein Mathisen
Carl Erik Moe
Judith Molka-Danielsen
Eric Monteiro
Anders Morch
Bjørn Erik Munkvold
Hugo Nordseth
Dag H. Olsen
Andreas Opdahl
Tero Päivärinta
Ragnvald Sannes
Guttorm Sindre
Abbas Strømmen-Bakhtiar
Bjørnar Tessem
Pieter Toussaint
Leikny Øgrim
CONTENTS
Extending Use and Misuse Cases to Capture Mobile Applications
  Sundar Gopalakrishnan, John Krogstie and Guttorm Sindre
On Choosing User Participants in Local Systems Development: Preliminary Results
  Sturla Bakke
Using the Personalized System of Instruction in an Introductory Programming Course
  Hallgeir Nilsen and Even Åby Larsen
The Alignment of IS Development and IT Operations in System Development Projects: a Multi-method Research
  Jon Iden, Bjørnar Tessem and Tero Päivärinta
Non Governmental Organisations as Change Agents in Implementation of new Software in the Health Information System in Tanzanian Regions - Ways of Handling Conflicts
  Ingeborg M. F. Klungland and Jens Kaasbøll
Towards Integration-Oriented Complex System Development
  Liping Mu, Andreas Prinz and Carl Erik Moe
The Community Case Study: A Research Methodology for Social Media Use in Eparticipation
  Marius Rohde Johannessen
Design of a Social Communicative Framework for Collaborative Writing Using Blended ICT
  Judith Molka-Danielsen and Ole David Brask
Initial Experience with Virtual Worlds for People with Lifelong Disability: Preliminary Findings
  Karen Stendal, Judith Molka-Danielsen, Bjørn Erik Munkvold and Susan Balandin
IT Governance in Norwegian Public Sector – Business as Usual?
  Arild Jansen and Tommy Tranvik
Publishing Academic Articles: The Diffusion of Intellectual Contribution from Small Local Events to the Larger International Professional Community
  Tor J. Larsen and Ragnvald Sannes
Conducting Research with Business Intelligence
  Wanda Presthus and Bendik Bygstad
Searching for the Meaning of Multitasking
  Vedrana Jez
Decision Making and Information. Conjoined Twins?
  Kjell Ellingsen and Eystein Mathisen
Critical Success Factors for ERP System Implementation Revisited
  Heidi Buverud, Anna Mette Fuglseth and Kjell Grønhaug
ERP-implementering i en kunnskapsintensiv bedrift: en casestudie av et forlag
  Christian Hoff, Eli Hustad and Dag Håkon Olsen
Enterprise Architecture to Enhance Organizational Agility? An Exploratory Study
  Terje Fallmyr and Bendik Bygstad
Augmenting Online Learning with Real-Time Conferencing: Experiences from an International Course
  Bjørn Erik Munkvold, Deepak Khazanchi and Ilze Zigurs
Sharing Practice in the Distributed Organization
  Inge Hermanrud
Næringsrettet IKT-utdanning – i praksis og forskning
  Tor Lønnestad and Carl Erik Moe
CONDUCTING RESEARCH WITH BUSINESS INTELLIGENCE
Wanda Presthus, Norwegian School of IT, [email protected]
Bendik Bygstad, Norwegian School of IT, [email protected]
Abstract
Business Intelligence (BI) is commonly seen as a decision making process with associated tools. In this
paper, we explore how research can be conducted using BI techniques. Our main argument is that BI can
be modified to offer a full stepwise process, going from the research question through data collection,
data qualification, and data analysis, to findings and conclusion.
We believe that the Internet offers innovative and vast opportunities for BI analysis. As an example we
discuss an investigation of customer communication on Facebook. Further, we identify nine different BI
research designs. These designs show a considerable breadth of possible investigations, ranging from
simple blog analysis to surveillance research. We illustrate that the basic BI steps constitute a sound
research basis for all these designs. We find that BI is useful in an exploratory setting with no clear
hypotheses, because it allows for creative queries and mining of large amounts of data. Overall, we argue
that BI offers some new and exciting opportunities for research designs in an information-rich world.
1. INTRODUCTION
The main aim of Business Intelligence (BI) is to support decisions for an organisation, by providing
access to existing data (Davenport 2006; Li 2005). BI has existed as a term since 1958 (Luhn 1958) and
was introduced as a discipline in the early 1990s (Watson and Wixom 2007). Its earlier history, however, can be seen as an evolution of decision support systems (Turban et al. 2011) and of the scientific study of decision making (Simon 1977).
This paper describes how research can be conducted using BI techniques. The background is simply that
the Internet, and in particular the World Wide Web, now provides the world’s largest information source,
with a diversity of information that is astonishing. The Internet offers a wide range of information sources
interesting for the IS researcher, from simple web pages, to social media such as Facebook and Twitter, to
e-business applications and cloud services. It is important to bear in mind that the Internet is not only an
information resource; it is a global socio-technical network with a lot of action (Castells 2009). As the
Internet matures, it looks less like a library and much more like a full social and economic community.
It is our view that this resource is underused in current IS research and particularly in student
dissertations. Although there are several accepted methods, IS research tends to fall into two main categories: case studies and surveys (Chen and Hirschheim 2004). The rise of interpretive IS research (Walsham 2006) has led to new insights on IS development and use, but also to a reliance on small qualitative data sets. This is not necessarily wrong, but it is still somewhat paradoxical that a discipline,
initially built on mass data processing, should prefer to work with ethnographic methods in an age of
globalisation (Kallinikos 2004). We believe that one of our strengths as a research community is our
knowledge and confidence with information technology, and the Internet offers an unprecedented
resource for IT-based analysis.
Which research approach is appropriate for exploiting these data resources? Arguably, there are several.
The journal Internet Research, for example, has in the period 1991 to 2011 (as identified from abstracts) published 365 survey papers, 40 case studies, and 21 papers based on content analysis. But as we will
argue in this paper, BI presents some more exciting and comprehensive opportunities. The strength of BI
is that it is both a process and a tool (Turban et al. 2011). BI is a fun and powerful approach, and offers
the creative researcher a range of new research opportunities which we will discuss.
We proceed as follows. First in section 2 we make a systematic assessment of the steps of BI in order to
investigate to what degree BI can serve as a research approach. From this analysis we suggest a step-wise
framework for BI research, in section 3. To illustrate the process in detail we analyse in section 4 an
empirical study of using BI as research approach using Facebook data. Then we broaden our perspective,
and suggest in section 5 nine different generic BI research designs, which we discuss and illustrate with
research examples. We conclude in section 6.
2. LITERATURE REVIEW
In this section we conduct a systematic comparison of the (generic) research process and the BI process.
Our main points are shown in table 1 and discussed in detail in section 2.3.
Steps in research                                   Steps in BI
1. Problem formulation                              1. Business need for decision
2. Gather information and resources                 2. Identify possible data sources
3. Formulate hypothesis or research question        3. Formulate queries
4. Collect data                                     4. Extraction, Transformation, Load (ETL)
5. Analyse data                                     5. Perform queries, make OLAP reports, data mine
6. Discuss results/findings                         6. Make decision based on information
7. Draw conclusion                                  7. Act accordingly to decision
Table 1. Traditional research process versus BI steps
2.1 The structure of the research process
The research process consists of certain steps in order to answer a research question (Bryman 2008; Sayer
1992). Although a number of different research methods exist, the basic steps are shared and relatively
straightforward for most of them. As shown in the left column of table 1, the researcher starts with a
problem formulation, and then conducts a systematic review on the existing knowledge in the field. From
this a research hypothesis or question is formulated.
Then the empirical researcher makes a research design (chooses unit of investigation, sample and
instruments), and goes into the field, or the lab, in order to collect the data. Collected data are analysed
(qualitatively or quantitatively) and findings are identified and documented. The findings are discussed in
relation to existing knowledge, and finally conclusions are drawn and implications are assessed.
2.2 The structure of BI
There are multiple definitions of BI. Some researchers see it as a data warehouse with tools for accessing the data (Gang et al. 2008); others define it as a decision making process (Davenport 2006).
Based on Turban et al., who claim that the process of BI is “...based on the transformation of data to
information, then to decisions, and finally to actions” (Turban et al. 2011, p. 19), we illustrate the process
of BI by four steps:
Data → Information → Decision → Action
Figure 1: The process of Business Intelligence
A recent publication by Shollo and Kautz (2010) concludes that BI, over the past twenty years, has been
defined either as a process, a product, or a set of technologies, or a combination of the three. Shollo and
Kautz reviewed over one hundred publications related to BI and revealed that the majority of research has
focused on turning data into information, as well as technology, but less on the role of the decision maker
(Shollo and Kautz 2010). Finally, a few publications indicate what BI is not, namely having a data
warehouse without access tools to the data (Howson 2008).
BI can play a crucial role in almost every function in a retail organisation, such as Customer Relationship
Management (CRM) (segmentation, campaign effectiveness analysis), alternative Sales Channels
(Internet, interactive TV), enterprise management (dashboard reporting) as well as human resources and
finance (Li 2005). Turban et al. (2011) add to the list fraud detection for insurance companies, tracking of goods for transportation companies, better medical care in the health industry, and banks serving their customers better while following trends in the market.
How is all the above carried out? With end-user access techniques. Turban et al (2007) provide a
framework for the major technological components of BI, illustrated in figure 2 below. The components
on the left side of the dotted line are elements belonging to a data warehouse, and the elements on the
right side are end-user access tools. The components of “External Web documents” and “ETL” will be
described later in the paper.
Figure 2: The major components of Business Intelligence (Turban et al, 2007, p. 201)
The BI tools on the far right of figure 2 all provide access to data, but at different levels of sophistication. Simple and convenient applications such as queries in Excel link the business elements, OLAP analyses incoming data, and the most sophisticated, data mining, reveals hidden patterns (Ryals and Knox 2001). Data mining is an analysis which looks for hidden patterns in large amounts of data. It does not only present data in a new way, but actually discovers relationships among the data (Turban et al. 2007). The literature abounds with entertaining examples of such hidden patterns. Perhaps the best known is this: the large American warehouse store Walmart mined its sales data and found that baby food and beer were frequently purchased together, especially when physically placed side by side in the stores (Ryals and Knox 2001).
We draw mainly on three data mining techniques in this paper (Moss and Atre 2003). The first, association, identifies co-occurrences within one record by means of statistics: if the customer purchases airline tickets for the whole family, there is an X% chance of a car rental as well. Association is also called market basket analysis. The second, classification, is considered the most common. It looks at behaviour and attributes of predefined groups, for example which groups of customers are likely to purchase a product. Algorithms for classification include decision trees and simple if-then statements. The third and final, clustering, is similar to classification, but the groups are defined after the data mining. For example, clustering is used to detect manufacturing defects or market segmentations, and typical algorithms are neural networks or statistics (Turban et al. 2007).
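As a minimal illustration of the first two techniques, the following Python sketch (with invented purchase records and thresholds) computes association as a simple co-occurrence share and expresses a classification rule as the kind of if-then statement a decision tree algorithm might produce; clustering is illustrated in the sketch in section 5.1.

```python
# Illustrative sketch on invented toy data: association as a co-occurrence count,
# classification as a hand-written if-then rule.

baskets = [
    {"airline_tickets_family", "car_rental"},
    {"airline_tickets_family"},
    {"airline_tickets_family", "car_rental", "hotel"},
    {"hotel"},
]
with_tickets = [b for b in baskets if "airline_tickets_family" in b]
share = sum("car_rental" in b for b in with_tickets) / len(with_tickets)
print(f"P(car rental | family tickets) = {share:.0%}")   # the "X% chance" from the text

def likely_buyer(customer):
    # classification rule of the if-then kind; the thresholds are invented
    return customer["age"] < 35 and customer["visits_per_month"] > 4

print(likely_buyer({"age": 28, "visits_per_month": 6}))
```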
We will now compare the structure of the research process and the BI process, going through each step of
table 1.
2.3 Comparative analysis
From table 1 we clearly see that the generic research structure on the left side and the BI process on the
right side have some important differences, but also share many features. The most important difference
is the basic aims; while the aim of research is to produce new scientific knowledge, the aim of BI is to
apply knowledge to take a business decision. This is seen in step 1, where the BI process is started with a
business need, while the research process starts with a problem formulation or research question. It is also
seen in the last steps (step 6 and 7) where the BI process focuses on decisions and action, while the
research outcomes are findings and knowledge claims. Thus, a BI query usually differs from a research
question. For example, the question “How many clicks does a customer do on our web site before
purchasing a product, and how can we sell more by making it easier?” is a BI query. A research question
using this information could be: “How can the clickstream of a commercial site be used to improve
usability?”
However, except for the start and the end, the process steps are rather similar. Step 2 is about identifying
relevant existing knowledge and sources of information. The review of previous research is absent from the BI process, but the identification of possible sources of information is the same. Step 3 is basically the same: to formulate an assumption or hypothesis.
Step 4 concerns data collection. The ETL process is admittedly a special case of data collection, since it
usually involves the use of data warehouse technologies or other tools. But it is nevertheless about data
collection; to select the sources, collect the sample, ensure that data are trustworthy, and to arrange it in a
way that allows for systematic data analysis in the next step.
In step 5 (data analysis) BI provides a number of techniques such as queries, OLAP reports, and various
forms of data mining. Usually, this will be a quantitative analysis, but there are exceptions (for example
the researcher might be looking for certain textual expressions or pictures). Today, mining comprises data, text, and web mining, as well as the more recent reality mining (Turban et al. 2011). Compared to
traditional statistical analysis, BI offers some other opportunities (Moss and Atre 2003). While statistical
analysis usually requires a hypothesis, BI is designed to handle open questions. Moreover, BI can handle
various types of data (text, pictures, sound) in addition to numerical data.
Summing-up this analysis we find that except for the start and end, the research process and BI process
are relatively similar.
3. CONDUCTING RESEARCH WITH BI
In this section we draw on the analysis from the previous section, and suggest a step-wise framework
(table 2) that incorporates the key features of BI into the generic research structure in the left side of table
1.
Suggested research steps and comments:
1. Problem formulation. Comment: It should be assessed whether the problem is suited for the BI approach.
2. Review previous research, and identify possible data sources. Comment: In addition to the normal research review, this step also includes the identification of Internet data resources.
3. Formulate research question(s) and BI queries. Comment: The research question is often an open question, while the BI queries are always precise and data oriented.
4. Extraction, Transformation, Load (ETL). Comment: Usually involves the use of data warehouse technology or other tools.
5. Perform queries, make OLAP reports, data mine. Comment: This is usually an iterative and creative process.
6. Discuss findings. Comment: The results of the BI query will make a foundation for the answer to the research question.
7. Draw conclusion. Comment: To what extent has the research question been answered?
Table 2: Framework for conducting research with BI.
Step 1 is problem formulation. Many research problems obviously cannot be solved with the BI approach,
so the researcher has to assess whether the problem is suited for BI techniques. A key question the
researcher should ask herself is: will an analysis of mass data possibly give new insights to this problem?
If the answer is no, maybe another approach (than BI) should be chosen.
Step 2 includes the usual research review. In addition the researcher also should identify the necessary
data resources, often found on the Internet. This could be ordinary web pages or weblogs, or more
complex structures such as social networks.
Step 3 is formulating the research question. In contrast to some streams of research where a testable
hypothesis is formulated, the BI analyst usually has a more open question. However, the open question is
often found also in explorative case study research (Gerring 2007) and in grounded research (Miles and
Huberman 1994). From the research question, we need to derive the BI queries. Doing this requires a
thoughtful operationalisation, going from the open question to the specific queries, which will affect the
validity of the findings.
Steps 4 and 5 are the core BI steps. The ETL process usually involves the use of data warehouse technology, which might be a sophisticated dedicated tool (Turban et al. 2007), but often rather simple tools, such as Excel, are used. Data analysis is often an iterative (and creative) process where
the researcher is looking for patterns in a large data set, using the tools to ask new queries. Conducting the
ETL process in the correct way ensures the reliability of the data.
In step 6 the researcher will often have to derive the findings from the patterns revealed in the data. For
example, let us assume that a researcher wants to assess the usability of a complex web site by analysing
the click-streams of a weblog. The documented click-stream might offer various answers, which might
require going back to step 5 to ask more queries. We proceed by discussing an example which is built on
these steps.
4. ANALYSIS
In this section, we will present a BI analysis of the “ash crisis” taking place in 2010 (Presthus and
Bygstad 2010). The presentation is based on the suggested framework in table 2. The ash crisis started
with the eruption of large volumes of ash from the volcano Eyjafjallajökull on Iceland in mid-April 2010, which grounded most of the North European air traffic. Not only were airline passengers prevented from starting their outward journeys; passengers scheduled to return to Northern Europe also found themselves stuck at airports around the whole world.
4.1 Problem formulation
At the beginning of the ash crisis of April 2010, the call centres of the airline companies SAS and
Norwegian quickly collapsed. The stranded passengers from around the world started to use Facebook to
communicate with the airline companies. Our research interest was to explore how social media could be
used to improve customer communication for companies.
4.2 Review previous research, and identify possible data sources
First, we reviewed various technical solutions for customer communication, focusing particularly on
CRM. Then we identified that both SAS and Norwegian were present on Facebook:
• Norwegian on Facebook (http://www.facebook.com/flynorwegian)
• SAS on Facebook (http://www.facebook.com/SAS)
These sources are examples of “External Web documents” from figure 2.
4.3 Formulate research question(s) and BI queries
The research question was: “To what extent can the most popular social media, Facebook, be used to
improve customer relationships?” From this, we derived three BI queries: “How many passengers chose
to use Facebook for communication?”, “How long did it take for SAS and Norwegian to answer a
question?”, and “How was the ambiance on Facebook during the ash crisis?”
4.4 ETL
ETL stands for Extraction, Transformation and Load; it is a time-consuming and crucial part of Business Intelligence and can take up to 70% of the process (Turban et al. 2011). In our case, we extracted data from the two Facebook pages, selecting specified elements such as the questions from the passengers, the answers from SAS and Norwegian, and the time from when a question was posted until an answer was provided. We thus disregarded advertising, photos and information about the airline companies on the pages (see figure 3 below).
Figure 3: Two Facebook pages to be extracted (Facebook, 2010, Facebook, 2011)
The extraction was made using Ruby code, exploiting the Application Programming Interfaces of
Facebook. The results of the extraction are shown in figure 4 below (the users' names have been replaced with “A Facebook user”).
Figure 4: The selected data from the two Facebook pages from figure 3 are extracted
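As an illustration of the extraction step: the study used Ruby, but a similar idea is sketched below in Python against the Facebook Graph API. The page identifier, the requested fields and the token handling are assumptions for the example, and the API has changed considerably since 2010.

```python
# Illustrative sketch of extracting wall posts and comments from a public page via the
# Graph API. Endpoint, fields and token handling are assumptions; current API versions
# require a valid access token and may expose different fields.
import json
import urllib.request

ACCESS_TOKEN = "YOUR_TOKEN_HERE"   # hypothetical placeholder
PAGE = "flynorwegian"              # or "SAS"

def fetch(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)

url = (f"https://graph.facebook.com/{PAGE}/feed"
       f"?fields=message,created_time,comments{{message,created_time,from}}"
       f"&access_token={ACCESS_TOKEN}")

posts = []
while url:
    page = fetch(url)
    posts.extend(page.get("data", []))
    url = page.get("paging", {}).get("next")   # follow pagination until exhausted

# Keep only the elements we need: question text, timestamp, and any replies.
extracted = [
    {"question": p.get("message"),
     "asked_at": p.get("created_time"),
     "comments": p.get("comments", {}).get("data", [])}
    for p in posts
]
print(f"Extracted {len(extracted)} posts")
```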
Having extracted the data from the two sources, the next step was to transform them. There are several
possibilities of transformation, and in our case, we wanted to:
• Accumulate the number of requests to SAS and Norwegian during the ash crisis
• Measure the number of minutes from when a question was posted until an answer was provided
• Accumulate the instances of each word
Then all selected data were loaded into a new spreadsheet. Our data had now gone through the ETL process (cf. figure 2) and were ready for further analysis.
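A minimal sketch of the transformation and load steps, assuming posts with the hypothetical fields from the extraction sketch above (replaced here by a small invented stand-in): requests are counted per day, response times are measured in minutes, and the result is loaded into a CSV file that a spreadsheet can open.

```python
# Transformation and load sketch; field names and data are invented for illustration.
import csv
from collections import Counter
from datetime import datetime

extracted = [   # toy stand-in for the posts produced by the extraction step
    {"asked_at": "2010-04-16T09:30:00+0000",
     "comments": [{"created_time": "2010-04-16T09:42:00+0000"}]},
    {"asked_at": "2010-04-17T11:05:00+0000", "comments": []},
]

def parse(ts):
    # timestamps of the form "2010-04-16T09:30:00+0000"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")

requests_per_day = Counter()
response_minutes = []

for post in extracted:
    asked = parse(post["asked_at"])
    requests_per_day[asked.date()] += 1
    if post["comments"]:                      # first comment taken as the airline's answer
        answered = parse(post["comments"][0]["created_time"])
        response_minutes.append((answered - asked).total_seconds() / 60)

with open("ash_crisis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "requests"])
    for day in sorted(requests_per_day):
        writer.writerow([day.isoformat(), requests_per_day[day]])
```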
4.5. Perform queries/make OLAP reports/data mine
We now conducted the three queries described above. First, we counted the number of requests per day to
SAS and Norwegian during the ash crisis. Then, we measured the number of minutes from when a request was posted until a response was provided. The final results were two graphs showing the distribution over the
whole ash crisis period.
Finally, we conducted a sentiment analysis (Turban et al. 2011), measuring the emotional temperature. In
our case, we performed text mining manually. We wanted to categorise and then accumulate the instances
of positive and negative words on the web pages for each airline. The results are shown in figure 5 below.
Figure 5: Positive and negative words for each airline company; a sentiment analysis
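The manual word counting behind figure 5 can be mimicked in a few lines; the word lists and sample posts below are invented for illustration and are not those used in the study.

```python
# Sentiment sketch: count positive and negative word instances per airline (data invented).
POSITIVE = {"thanks", "great", "helpful", "good"}
NEGATIVE = {"stranded", "terrible", "angry", "bad"}

posts_per_airline = {
    "SAS": ["Thanks for the quick reply, very helpful!", "Still stranded, this is terrible."],
    "Norwegian": ["Great service on Facebook", "Good info, thanks"],
}

for airline, posts in posts_per_airline.items():
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    print(f"{airline}: {pos} positive, {neg} negative word instances")
```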
In the next step, we discussed these results.
4.6 Discuss findings
The graphs provided answers to our BI queries: we saw how many passengers chose to communicate via Facebook during the ash crisis; nearly 600 at SAS, and over 1400 at Norwegian. We also noted that it took only minutes from a question being posted to an answer being provided. Moreover, our sentiment analysis in figure 5 revealed a clear preponderance of positive words posted on Facebook.
Finally, we could also revisit the extracted data and read the questions and answers, as shown in figure 4
above. Building on the answers provided by BI techniques, we found that SAS and Norwegian indeed
managed to solve problems, and that the passengers even made the effort to thank the airlines for the help.
4.7 Draw conclusion
By using BI as a research approach, we found that Facebook enabled the companies to communicate with their customers as well as to solve problems. We also found that the passengers showed a positive and
civilised behaviour on the Facebook pages.
4.8 Summing up
Our example shows that by using BI as a research approach, it is possible to extract and analyse large
amounts of data from a semistructured Internet source. Had we, alternatively, gathered this information using qualitative sources, we would have had to interview a limited number of ash-stranded passengers and service staff at SAS and Norwegian. This would have been time-consuming, and the
collected data could have been less reliable, as people tend to forget. Also, it would probably have been
difficult to identify and reach these passengers after the event.
5. NINE POSSIBLE RESEARCH DESIGNS
Our example in section 4 is only one of many possible research designs. We used semistructured data
collected subsequent to an event, but there are several other combinations of data types and events.
Loosely based on the classic framework of decision support by Gorry and Scott-Morton (Gorry and Scott
Morton 1971) and the more recent framework of data warehouse evolution by Brobst and Rarey (Brobst
and Rarey 2003), we propose a matrix of nine research designs, shown in table 3 below.
One dimension is the degree of data structure. This dimension is loosely based on Gorry and Scott-Morton, who proposed a framework for decision support. Our other dimension is temporal, meaning at what time data are collected in relation to an event. First, past event: data are collected after the event, and we are interested in finding out what happened and why. Second, present event: data are real-time or close to real-time, as we want to investigate the situation right now. A common example is airline companies, which need to know how many seats are empty prior to take-off in order to sell them at a discount. Third and final, future event: existing data are used to predict what will happen. This is interesting in order to prevent an event from happening, or to be prepared when it happens.
                      Past event                  Present event                 Future event
Unstructured data     1. Blog analysis            4. Real-time surveillance     7. Infectious disease prediction
Semistructured data   2. Social media analysis    5. "Reality mining"           8. Sales prediction
Structured data       3. Clickstream analysis     6. Product recommendations    9. Statistical prediction
Table 3: Nine research designs with BI techniques
This matrix presents nine different research designs based on BI. There are certainly other possible
designs, but we believe that this categorisation is useful for our purpose. In addition to the normal
researcher, we have in mind the Information Systems Master student with limited time and resources
available. Further, we argue that the approaches allow for creative designs and thinking “outside the box”.
By basing her study on one or more of the nine research designs from table 3, the Master student can
spend less time collecting a large amount of data, and more time on analysis and innovative discussion.
We will now discuss each design, starting with a description of each cell. Then we give an example of a research topic and suggest one research question. Finally, we describe in more detail which BI queries could contribute to a research answer, and suggest an approach for ETL and data analysis.
5.1 Blog analysis
A blog typically consists of unstructured data, as a blog-writer has no predefined categories or fields to
enter data. Other sources of unstructured data are e-mails and text documents, as well as photos, sound,
and even colour (Inmon and Nesavich 2008).
A possible research topic could be gender differences in Internet publishing. On this topic, an interesting
research question could read “Are there systematic differences in the way men and women express
themselves through a blog”?
Typical BI queries would be: “Do women write longer blogs than men? Which topics are typically
associated with men and women?” Having identified the BI queries, the ETL process would proceed as
follows. For extraction, one would need to identify for example one hundred blogs written by each
gender, and extract the data by means of some sort of programming code or an application like Mozenda.
Then, these data would have to be transformed, by for example translating all blogs into one language,
stemming (reducing the variants of a word to a common stem), and counting each instance of each word. Finally, data could be loaded into a spreadsheet or database for further data analysis. The applied text mining technique could be clustering, as we would like to explore what men and women write about (we do not know the topics beforehand).
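As a sketch of the clustering step, and assuming the blog texts have already been collected and translated, the snippet below clusters word-count vectors with k-means; scikit-learn is one possible toolbox, and the texts are invented.

```python
# Clustering sketch for blog texts (illustrative; texts are invented).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

blogs = [
    "training marathon running shoes nutrition",
    "makeup skincare routine fashion week outfits",
    "interval training running race results",
    "fashion haul outfits shoes accessories",
]

vectorizer = CountVectorizer()               # word counts; stemming could be added here
X = vectorizer.fit_transform(blogs)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # which topic cluster each blog falls into
```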
5.2 Social media analysis
Data extracted from social media, as in our Facebook case, are considered semistructured because meta
data exist (Blumberg and Atre 2003). Social media are large structures with millions of users, which
represent a huge resource for interesting research. One example is from LinkedIn, the professional
network, where Scott Nicholson investigated another aspect of gender differences (Nicholson 2011). The
research question was: “Are there systematic differences between men and women regarding networking
in different industries?”
In order to investigate this, a network savviness index was defined, as 1) the ratio of one-way connections
that men have to connections that women have, and 2) the ratio of male members on LinkedIn to female
members.
The operationalised BI queries were: “How many men/women are registered in each industry? How many
links are there from men to these women’s connections?” In the ETL steps, first the companies were
classified into industries, and all male and female members were extracted for each industry. Then the number of
“male” links was extracted for each industry. In the data analysis the gender ratio and the gender link ratio
were computed, which resulted in the industry network savviness index. As might be expected, Nicholson
found that “Law Enforcement” and “Capital Markets” were male savvy, but somewhat surprisingly that
“Ranching” and “Tobacco” were female savvy.
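The index itself is simple arithmetic once the counts per industry have been extracted; the sketch below uses invented counts and one plausible way of combining the two ratios described above.

```python
# Network savviness sketch: invented member and connection counts per industry.
industries = {
    # industry: (male members, female members, male connections, female connections)
    "Capital Markets": (9000, 3000, 180000, 45000),
    "Ranching":        (1200, 1400, 9000, 14000),
}

for name, (men, women, male_links, female_links) in industries.items():
    membership_ratio = men / women
    connection_ratio = male_links / female_links
    savviness = connection_ratio / membership_ratio   # >1 male-savvy, <1 female-savvy (illustrative)
    print(f"{name}: savviness index {savviness:.2f}")
```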
5.3 Clickstream analysis
Clickstream analysis, as offered by tools such as Google Analytics, reveals a user’s behaviour on websites (Turban et al. 2011). For example, a company can follow how long a user stays on its web pages and which words the user searches for. Clickstream analysis requires access to company logs.
As an example of a possible research topic, we could investigate the requirements of the user group of a telecom company’s website. Instead of asking users about their preferences, we could analyse their actual
behaviour, in order to understand their needs. The research question could be: “Which services does a
telecom customer wish to buy using his mobile phone, in contrast to the services he wants to buy on the
web?”
To be able to investigate this, we reformulate it into BI queries, such as: “Which products are mobile
users searching on? How many clicks do they spend on various products?” In the ETL step we might
identify weblogs at various company sites (and maybe other companies or countries), and load the data
into a file. Then data analysis would be conducted as clickstream analysis, accumulating user patterns for
various products and services.
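A sketch of the data analysis step, assuming access to a web server log; the log lines, the product URL pattern and the way mobile users are recognised are all invented for illustration.

```python
# Clickstream sketch: accumulate clicks per product, split by mobile vs. desktop user agents.
import re
from collections import Counter

log_lines = [   # invented log lines in a simplified common-log-like format
    '10.0.0.1 "GET /products/prepaid-data HTTP/1.1" 200 "Mozilla/5.0 (iPhone)"',
    '10.0.0.2 "GET /products/broadband HTTP/1.1" 200 "Mozilla/5.0 (Windows NT)"',
    '10.0.0.1 "GET /products/prepaid-data HTTP/1.1" 200 "Mozilla/5.0 (Android)"',
]

product_re = re.compile(r'GET /products/([\w-]+)')
clicks = {"mobile": Counter(), "desktop": Counter()}

for line in log_lines:
    match = product_re.search(line)
    if not match:
        continue
    device = "mobile" if ("iPhone" in line or "Android" in line) else "desktop"
    clicks[device][match.group(1)] += 1

print(clicks)
```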
5.4 Real-time surveillance
Examples of unstructured data include video from surveillance cameras in stores and homes, or on the roads; such systems can be programmed to give notice if an unusual pattern occurs, such as trespassing or traffic congestion. Turban et al. (2007) describe how hospitals can use various information about a patient to automatically generate alerts from a web interface if a critical situation arises.
Based on this case, we have a possible research topic within real-time surveillance in health care, with the
following research question: “How can real-time surveillance technology assist in allocating hospital
nursing resources?”
One would need several BI queries, such as: “What are the normal heart beat rhythms of this patient?
What is the blood temperature of this patient? How are the heart beat rhythms right now?” One would
need several sources for collecting data, such as medical records and heart rate signals, which would need to be transformed to enable various kinds of data mining and analysis. One would aim to create alerts when
unusual patterns occur. The contributions of such research would be to create knowledge on how BI can
improve effectiveness and quality in hospital care.
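A minimal alerting sketch with an invented baseline, threshold and simulated readings; a real system would read from monitoring equipment and use patient-specific models.

```python
# Real-time surveillance sketch: raise an alert when a reading deviates from the
# patient's baseline. Thresholds and readings are invented for illustration.
BASELINE_BPM = 72
ALERT_DEVIATION = 25          # beats per minute away from baseline that triggers an alert

def check_reading(bpm, baseline=BASELINE_BPM, deviation=ALERT_DEVIATION):
    if abs(bpm - baseline) > deviation:
        return f"ALERT: heart rate {bpm} bpm deviates from baseline {baseline} bpm"
    return None

stream = [70, 75, 73, 104, 68]          # simulated incoming readings
for reading in stream:
    alert = check_reading(reading)
    if alert:
        print(alert)                    # in practice: notify nursing staff or a dashboard
```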
5.5 “Reality mining”
Technologies in the family of automatic identification, such as RFID (Radio Frequency Identification), barcodes, and magnetic strips, generate huge amounts of data, and also require technology to make sense of the generated data. Making sense of such data is called reality mining (Turban et al. 2011). A possible
research topic would be mobile tourist information, with the derived research question “Which are the
real-time patterns of tourists in our city”?
From this, we can make several BI queries, such as “How many tourists are currently in our city? Where
are our tourists right now? Where were the same tourists fifteen minutes ago?” The ETL would start with
collecting data by tracking mobile phone signals, calculate and load into a database or data warehouse for
analysis. Useful data analysis techniques would include clustering. From this research, a company or city
government could better understand visitors and provide better promotions and service. (There are
obviously ethical issues when tracking a person’s movements via a cell phone. We briefly discuss ethics
in section 6).
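A sketch of clustering tracked positions into hotspots; the coordinates are invented, and a real analysis would have to respect the ethical constraints mentioned above.

```python
# Reality mining sketch: cluster invented tourist positions to find current hotspots.
from sklearn.cluster import KMeans

positions = [            # (latitude, longitude) pairs, invented
    (69.649, 18.955), (69.650, 18.957), (69.648, 18.956),   # around one attraction
    (69.680, 18.990), (69.681, 18.992),                     # around another
]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(positions)
print("Hotspot centres:", kmeans.cluster_centers_)
print("Cluster of each tourist:", list(kmeans.labels_))
```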
5.6 Product recommendations
Perhaps the best-known example of product recommendations is taken from Amazon: “You have chosen
product X. Others who have purchased product X also bought product Y. Do you want to purchase
product Y?” Already in 2001, Lam and Tan believed that Amazon.com was attractive not due to the good
deals or the selection of products but because of its personal attention to users (Lam and Tan 2001).
Today, many Internet stores offer similar recommendations. Consequently, a relevant research topic
would be purchasing behaviour on the Internet. A related research question reads: “How do we use
information about customer purchases to enhance additional sales?”
In order to answer the research question, BI queries would include: “Which product has the customer
placed in the shopping basket right now? Which products are often sold together with this product?
Which customer is similar to this customer and what do the former usually purchase?”
Data collection would come from cookies from the Internet customer’s PC, as well as the Internet store’s
data about customers and sales. Data analysis would rely on the data mining technique called association.
According to Lam and Tan, use of data mining techniques can contribute to understanding consumer
preferences and creating profiles. This would be of particular interest for online retailers, but also for the
customer, who faces a vast range of products on the Internet.
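A sketch of the association step on invented order data: count how often other products co-occur with the product currently in the basket.

```python
# Association sketch: "customers who bought X also bought ..." from invented orders.
from collections import Counter

orders = [
    {"camera", "memory_card"},
    {"camera", "memory_card", "tripod"},
    {"camera", "bag"},
    {"book"},
]

def recommend(product, orders, top=2):
    co_bought = Counter()
    for order in orders:
        if product in order:
            co_bought.update(order - {product})   # count the other items in the basket
    return co_bought.most_common(top)

print(recommend("camera", orders))   # e.g. [('memory_card', 2), ('tripod', 1)]
```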
5.7 Bio-surveillance
By text mining, governments may try to predict how infectious diseases spread, by mapping for example
the instances of some words as they appear in social media and blogs. One example of bio-surveillance
was described by Corley et al. (2010), where the research question was how blog posts can be used to predict the spread of influenza. The resulting pattern could then be compared to the official influenza statistics from
the health authorities.
The BI queries were operationalised as: “How many blog posts mention the word ‘flu’? How many such posts are registered each week over a six-month period?” The ETL process included a service that
conducts real-time indexing of all blogs, and then creating a file of all occurrences each week. Data
analysis was conducted by programming the BI queries, and constructing a graph that covered the six
month period. This graph was finally compared with government statistics, and found to be reasonably
consistent (Corley et al. 2010).
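A sketch of the weekly counting; Corley et al. used a real-time blog index, which is simply simulated here as a short list of dated, invented posts.

```python
# Bio-surveillance sketch: weekly counts of blog posts mentioning "flu" (posts invented).
from collections import Counter
from datetime import date

posts = [
    (date(2010, 1, 4), "got the flu, staying home"),
    (date(2010, 1, 6), "flu season again"),
    (date(2010, 1, 14), "recipe for chicken soup"),
    (date(2010, 1, 15), "half the office has the flu"),
]

weekly_flu = Counter()
for day, text in posts:
    if "flu" in text.lower():
        year, week, _ = day.isocalendar()
        weekly_flu[(year, week)] += 1

for (year, week), count in sorted(weekly_flu.items()):
    print(f"{year} week {week}: {count} flu mentions")
# The resulting series could then be compared with official influenza statistics.
```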
5.8 Sales prediction
By using semistructured data, such as customers’ comments and feedback, a retailer can predict a product’s popularity (Archak et al. 2007). By mapping numeric sales data against textual data, the authors revealed that consumers typically use adjectives such as “Great”, “Bad” or “Amazing” to evaluate a product. With data from Amazon.com, they studied the impact of customer reviews on sales. They found that adjectives influenced customer behaviour: “Great” and “Good” increased sales, while “Bad” led to the product disappearing. Surprisingly, “Decent” and “Nice” actually diminished sales. “Best product” also hurt sales, because people did not believe it.
A possible research topic is how consumers’ behaviour is influenced by other consumers’ feedback and comments. A research question could be how customers’ comments can be used to predict the future sales of a product. Examples of BI queries are “Which customer comments are associated with which
products? Identify the product with most positive and negative words. What are the sales numbers of a
given product after a given time of comments?” The ETL process would start by extracting data from
sales, date and comments from customers, calculate and accumulate in the transformation stage, and
finally load into a chosen application. As example of data analysis, we would recommend clustering
because we would not know what customers would say about the product. This research would help a
company predict the success of a product, thus enabling the right level of stock.
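A sketch relating adjective counts in reviews to later sales, on invented data; a real study would use far larger samples and, for example, regression or clustering.

```python
# Sales prediction sketch: count evaluative adjectives per product and compare with
# later sales figures. Reviews, adjective list and sales numbers are invented.
ADJECTIVES = {"great", "good", "bad", "decent", "nice", "amazing"}

reviews = {
    "camera_x": ["Great camera, good battery", "Amazing pictures"],
    "camera_y": ["Decent camera", "Bad autofocus"],
}
sales_next_month = {"camera_x": 420, "camera_y": 95}

for product, texts in reviews.items():
    words = [w.strip(".,!").lower() for t in texts for w in t.split()]
    counts = {adj: words.count(adj) for adj in ADJECTIVES if adj in words}
    print(product, counts, "-> sales:", sales_next_month[product])
```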
5.9 Statistical prediction
Finally, by using highly structured data, such as statistics, prediction can be made by a technique called
decision tree. A decision tree is a machine learning technique for classification of different patterns, and
is similar to the game “Twenty Questions” (Turban et al. 2011). Examples are: “Will this bank customer
be granted a loan?” “How much will I receive in retirement pension?”
We believe that an interesting research topic would be predicting school failure for young students. The attention-grabbing research question would be: “Which students are likely to fail in the next semester?” We
would need several BI queries in this case, such as: “What are the student’s grades this semester, and past
semesters? What was the outcome of previous students with similar results up to this level?” ETL would
include extraction of data about current and previous students, subjects, and grades. After transformation and loading, these data would be suitable for a decision tree analysis. This piece of research would make
valuable contributions for both the student and the school. By detecting which students are in danger of
failing one or several subjects, a school can mobilise extra resources to these students.
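A sketch of the decision tree analysis using scikit-learn on invented student records; grades are encoded as numbers and the label indicates whether the student failed a subject the following semester.

```python
# Decision tree sketch: predict failure risk from previous grades (data invented).
from sklearn.tree import DecisionTreeClassifier

# Features: [average grade last semester, number of failed subjects so far]
X = [[4.5, 0], [2.1, 2], [3.0, 1], [5.2, 0], [1.8, 3], [3.8, 0]]
y = [0, 1, 0, 0, 1, 0]        # 1 = failed at least one subject the next semester

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

new_students = [[2.4, 2], [4.9, 0]]
print(tree.predict(new_students))   # flag students who may need extra support
```

The fitted tree can also be inspected as explicit if-then rules, which is what makes this technique readable for the school staff who would act on the predictions.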
6. CONCLUDING REMARKS
In this paper we explored the potential of conducting research with Business Intelligence techniques. Our
main argument is that BI offers a full stepwise process, going from the research question through data
collection, data qualification, and data analysis, to findings and conclusion. We find that BI is particularly
useful in an exploratory setting with no clear hypotheses, because it allows for creative queries and
mining of large amounts of data.
Further, we identified nine different BI research designs. These designs show a considerable breadth of
possible investigations, ranging from simple blog analysis to surveillance research. We argue that the
basic BI steps constitute a sound research basis for all these designs.
We believe that BI is useful in a number of research settings, and particularly suited for Master
dissertations. The power of the BI approach lies in the fact that it can provide the researcher and the
student with a large and interesting data set, within a limited time frame. It also allows for both relatively
simple descriptive analysis, as well as more sophisticated investigations.
There are certainly also limitations. BI as a research approach is probably most useful for facts-oriented
(positivist) research, and less for interpretive or critical investigations, although there might be
exceptions. Further, the approach may require access to expensive tools, such as data warehouse
technologies, or advanced programming skills may be needed. Also, ethical issues easily arise, such as
privacy concerns with data mining in social media and obviously in various surveillance designs.
Overall, we argue that BI offers some new and exciting opportunities for research designs in an
information-rich world. Further research should exploit these opportunities, both as regular research, and
in Master dissertations.
7. REFERENCES
Archak, N., Ghose, A., and Ipeirotis, P. G. (2007). "Show me the money! Deriving the pricing power of product
	features by mining consumer reviews." In Proceedings of the 13th ACM SIGKDD International
	Conference on Knowledge Discovery and Data Mining (KDD '07).
Blumberg, R., and Atre, S. (2003). "The Problem with Unstructured Data." DM Review (February 2003),
pp. 42-46.
Brobst, S., and Rarey, J. (2003). "Five Stages of Data Warehouse Decision Support Evolution".
DSSResources.COM, 01/06/2003.
Bryman, A. (2008). Social Research Methods: Oxford University Press.
Castells, M. (2009). The Rise of the Network Society. The Information Age: Economy, Society, and
Culture, Oxford: Blackwell Publishers.
Chen, W., and Hirschheim, R. (2004). "A paradigmatic and methodological examination of information
systems research from 1991 to 2001." Information Systems Journal, 14(3), pp. 197-235.
Corley, C. D., Cook, D. J., Mikler, A. R., and Singh, K. P. (2010). "Text and Structural Data Mining of
Influenza Mentions in Web and Social Media." International Journal of Environmental Research
and Public Health, 7, pp. 596-615.
Davenport, T. H. (2006). "Competing on Analytics." Harvard Business Review (January 2006).
Gang, T., Kai, C., and Bei, S. (2008). "The Research & Application of Business System in Retail
Industry." IEEE Xplore, pp. 87-91.
Gerring, J. (2007). The Case Study Method: Principles and Practices, New York: Cambridge University
Press.
Gorry, G. A., and Scott Morton, M. S. (1971). "A Framework for Management Information Systems."
Sloan Management Review, 13(1), pp. 55-70.
Howson, C. (2008). Successful Business Intelligence. Secrets to Making BI a Killer App: The McGraw-Hill Companies.
Inmon, W. H., and Nesavich, A. (2008). Tapping Into Unstructured Data: Prentice Hall.
Kallinikos, J. (2004). "Farewell to Constructivism: Technology and Context-Embedded Action", in C.
Avgerou, C. Ciborra, and L. Land, (eds.), The Social Study of Information and Communication
Technology. Oxford: Oxford University Press, pp. 140-161.
Lam, C. K. M., and Tan, B. C. Y. (2001). "The Internet is changing the music industry." Communications
of the ACM, 44(8), pp. 62-68.
Li, H. (2005). "Applications of Data Warehousing and Data Mining in the Retail Industry." IEEE Xplore,
pp. 1047-1050.
Luhn, H. P. (1958). "A Business Intelligence System." IBM Journal of Research and Development, 2(4),
	pp. 314-319.
Miles, M. B., and Huberman, A. M. (1994). Qualitative Data Analysis: Thousand Oaks: Sage
Publications.
Moss, L. T., and Atre, S. (2003). Business Intelligence Roadmap. The Complete Project Lifecycle for
Decision-Support Applications: Addison-Wesley.
Nicholson, S. (2011). "The Gender Divide: Are Men better than Women at Social Networking? [Online]
Available at: http://blog.linkedin.com/2011/06/22/men-vs-women/#_ftnref1 [Accessed 8. July
2011]".
Presthus, W., and Bygstad, B. (2010). "Facebook as agile CRM? A business intelligence analysis of the
	airline ash crisis." NOKOBIT 2010, Gjøvik. Tapir Akademisk Forlag.
Ryals, L., and Knox, S. (2001). "Cross-Functional Issues in the Implementation of Relationship
Marketing Through Customer Relationship Management." European Management Journal, 9(5),
pp. 534-542.
Sayer, A. (1992). Method in Social Science. A Realist Approach, New York: Routledge.
Shollo, A., and Kautz, K. (2010). "Towards an Understanding of Business Intelligence." ACIS 2010
	Proceedings, Paper 86.
Simon, H. A. (1977). The new science of management decision (Revised): Prentice-Hall, Inc.
Turban, E., Aronson, J. E., Liang, T.-P., and Sharda, R. (2007). Decision Support and Business
Intelligence Systems: Pearson Prentice-Hall.
Turban, E., Sharda, R., and Delen, D. (2011). Decision Support and Business Intelligence Systems:
Prentice Hall.
Walsham, G. (2006). "Doing interpretive research." European Journal of Information Systems, 15, pp.
320-330.
Watson, H. J., and Wixom, B. H. (2007). "The Current State of Business Intelligence." IEEE Computer
Society (September 2007), pp. 96-99.