Norsk konferanse for organisasjoners bruk av informasjonsteknologi
NOKOBIT 2011
Universitetet i Tromsø, 21–23 November 2011
NOKOBIT board and editorial committee:
Terje Fallmyr, Universitetet i Nordland (editor, board chair)
Bendik Bygstad, Norges Informasjonsteknologiske Høgskole
Jørgen Fog, Departementenes servicesenter
Laurence Habib, Høgskolen i Oslo
Jon Iden, Norges Handelshøyskole
John Krogstie, Norges teknisk-naturvitenskapelige universitet
Laila J. Matberg, Høgskolen i Nesna
© NOKOBIT-stiftelsen and Tapir Akademisk Forlag, 2011
ISSN 1892-0748
ISBN 978-82-519-2845-8
No part of this book may be copied beyond what is permitted under the provisions of the Norwegian Copyright Act («Lov om opphavsrett til åndsverk») and copying agreements entered into with Kopinor.
Editor: Terje Fallmyr
Digital printing and binding: AIT Oslo AS
Tapir Akademisk Forlag aims to contribute to the development of good teaching materials and all types of academic literature. We represent a broad range of disciplines and publish around 100 new titles a year. We collaborate with authors and academic communities throughout the country, and our most important product areas are:
Teaching materials for higher education
Books for the professional market
Scholarly publishing
Publishing editor for this title:
[email protected]
Tapir Akademisk Forlag
7005 TRONDHEIM
Tel.: 73 59 32 10
Fax: 73 59 32 04
E-mail: [email protected]
www.tapirforlag.no
PREFACE
Welcome to NOKOBIT 2011!
NOKOBIT 2011 is hosted by Universitetet i Tromsø, while the process around the scientific programme was led from Universitetet i Nordland. This is the 18th NOKOBIT since the start in 1993, and the 12th time that NOKOBIT is organised together with NIK – and, since 2008, also together with NISK.
This year we received 27 submissions, of which 20 will be presented. All submissions have been through a thorough peer review (blind review) by three independent reviewers. In good NOKOBIT tradition, each presentation will have a well-prepared discussant, and the contributors must also explain how they have responded to the reviewers' comments.
I would like to thank all the reviewers for their constructive feedback. Without their effort there would have been no conference. I would also like to thank the NOKOBIT board for an excellent collaboration.
Finally, I would like to thank the local organising committee, and especially Lars Ailo Bongo. Collaborating at a distance has worked very well.
We look forward to a good conference!
Terje Fallmyr
Handelshøgskolen i Bodø, Universitetet i Nordland
Editor and board chair, NOKOBIT 2011
Reviewers:
Lasse Berntzen
Solveig Bjørnestad
Bendik Bygstad
Monica Divitini
Kjell Ellingsen
Asle Fagerstrøm
Terje Fallmyr
Anna-Mette Fuglseth
Arne Kristian Groven
Laurence Habib
Hallstein Hegerholm
Jon Iden
Grete Jamissen
Arild Jansen
Lill Kristiansen
Jens Kaasbøll
John Krogstie
Wolfgang Leister
Eystein Mathisen
Carl Erik Moe
Judith Molka-Danielsen
Eric Monteiro
Anders Morch
Bjørn Erik Munkvold
Hugo Nordseth
Dag H. Olsen
Andreas Opdahl
Tero Päivärinta
Ragnvald Sannes
Guttorm Sindre
Abbas Strømmen-Bakhtiar
Bjørnar Tessem
Pieter Toussaint
Leikny Øgrim
CONTENTS
Extending Use and Misuse Cases to Capture Mobile Applications
  Sundar Gopalakrishnan, John Krogstie and Guttorm Sindre
On Choosing User Participants in Local Systems Development: Preliminary Results
  Sturla Bakke
Using the Personalized System of Instruction in an Introductory Programming Course
  Hallgeir Nilsen and Even Åby Larsen
The Alignment of IS Development and IT Operations in System Development Projects: a Multi-method Research
  Jon Iden, Bjørnar Tessem and Tero Päivärinta
Non Governmental Organisations as Change Agents in Implementation of new Software in the Health Information System in Tanzanian Regions - Ways of Handling Conflicts
  Ingeborg M. F. Klungland and Jens Kaasbøll
Towards Integration-Oriented Complex System Development
  Liping Mu, Andreas Prinz and Carl Erik Moe
The Community Case Study: A Research Methodology for Social Media Use in Eparticipation
  Marius Rohde Johannessen
Design of a Social Communicative Framework for Collaborative Writing Using Blended ICT
  Judith Molka-Danielsen and Ole David Brask
Initial Experience with Virtual Worlds for People with Lifelong Disability: Preliminary Findings
  Karen Stendal, Judith Molka-Danielsen, Bjørn Erik Munkvold and Susan Balandin
IT Governance in Norwegian Public Sector – Business as Usual?
  Arild Jansen and Tommy Tranvik
Publishing Academic Articles: The Diffusion of Intellectual Contribution from Small Local Events to the Larger International Professional Community
  Tor J. Larsen and Ragnvald Sannes
Conducting Research with Business Intelligence
  Wanda Presthus and Bendik Bygstad
Searching for the Meaning of Multitasking
  Vedrana Jez
Decision Making and Information. Conjoined Twins?
  Kjell Ellingsen and Eystein Mathisen
Critical Success Factors for ERP System Implementation Revisited
  Heidi Buverud, Anna Mette Fuglseth and Kjell Grønhaug
ERP-implementering i en kunnskapsintensiv bedrift: en casestudie av et forlag
  Christian Hoff, Eli Hustad and Dag Håkon Olsen
Enterprise Architecture to Enhance Organizational Agility? An Exploratory Study
  Terje Fallmyr and Bendik Bygstad
Augmenting Online Learning with Real-Time Conferencing: Experiences from an International Course
  Bjørn Erik Munkvold, Deepak Khazanchi and Ilze Zigurs
Sharing Practice in the Distributed Organization
  Inge Hermanrud
Næringsrettet IKT-utdanning – i praksis og forskning
  Tor Lønnestad and Carl Erik Moe
CONDUCTING RESEARCH WITH BUSINESS INTELLIGENCE
Wanda Presthus, Norwegian School of IT, [email protected]
Bendik Bygstad, Norwegian School of IT, [email protected]
Abstract
Business Intelligence (BI) is commonly seen as a decision making process with associated tools. In this
paper, we explore how research can be conducted using BI techniques. Our main argument is that BI can
be modified to offer a full stepwise process, going from the research question through data collection,
data qualification, and data analysis, to findings and conclusion.
We believe that the Internet offers innovative and vast opportunities for BI analysis. As an example we
discuss an investigation of customer communication on Facebook. Further, we identify nine different BI
research designs. These designs show a considerable breadth of possible investigations, ranging from
simple blog analysis to surveillance research. We illustrate that the basic BI steps constitute a sound
research basis for all these designs. We find that BI is useful in an exploratory setting with no clear
hypotheses, because it allows for creative queries and mining of large amounts of data. Overall, we argue
that BI offers some new and exciting opportunities for research designs in an information-rich world.
1. INTRODUCTION
The main aim of Business Intelligence (BI) is to support decisions for an organisation, by providing
access to existing data (Davenport 2006; Li 2005). BI has existed as a term since 1958 (Luhn 1958) and
was introduced as a discipline in the early 1990s (Watson and Wixom 2007). Its earlier history, however, can be seen as an evolution of decision support systems (Turban et al. 2011) and of the scientific study of decision making (Simon 1977).
This paper describes how research can be conducted using BI techniques. The background is simply that
the Internet, and in particular the World Wide Web, now provides the world’s largest information source,
with a diversity of information that is astonishing. The Internet offers a wide range of information sources
interesting for the IS researcher, from simple web pages, to social media such as Facebook and Twitter, to
e-business applications and cloud services. It is important to bear in mind that the Internet is not only an
information resource; it is a global socio-technical network with a lot of action (Castells 2009). As the
Internet matures, it looks less like a library and much more like a full social and economic community.
It is our view that this resource is underused in current IS research and particularly in student
dissertations. Although there are several accepted methods, IS research tends to fall into two main categories: case studies and surveys (Chen and Hirschheim 2004). The rise of interpretive IS research (Walsham 2006) has led to new insights on IS development and use, but also to a reliance on small qualitative data sets. This is not necessarily wrong, but it is still somewhat paradoxical that a discipline,
initially built on mass data processing, should prefer to work with ethnographic methods in an age of
globalisation (Kallinikos 2004). We believe that one of our strengths as a research community is our
knowledge and confidence with information technology, and the Internet offers an unprecedented
resource for IT-based analysis.
Which research approach is appropriate for exploiting these data resources? Arguably, there are several.
The journal Internet Research, for example, has in the period 1991 to 2011 (as identified from abstracts) published 365 survey papers, 40 case studies, and 21 papers based on content analysis. But as we will
argue in this paper, BI presents some more exciting and comprehensive opportunities. The strength of BI
is that it is both a process and a tool (Turban et al. 2011). BI is a fun and powerful approach, and offers
the creative researcher a range of new research opportunities which we will discuss.
We proceed as follows. First in section 2 we make a systematic assessment of the steps of BI in order to
investigate to what degree BI can serve as a research approach. From this analysis we suggest a step-wise
framework for BI research, in section 3. To illustrate the process in detail we analyse in section 4 an
empirical study of using BI as research approach using Facebook data. Then we broaden our perspective,
and suggest in section 5 nine different generic BI research designs, which we discuss and illustrate with
research examples. We conclude in section 6.
2. LITERATURE REVIEW
In this section we conduct a systematic comparison of the (generic) research process and the BI process.
Our main points are shown in table 1 and discussed in detail in section 2.3.
Steps in research                                   Steps in BI
1. Problem formulation                              1. Business need for decision
2. Gather information and resources                 2. Identify possible data sources
3. Formulate hypothesis or research question        3. Formulate queries
4. Collect data                                     4. Extraction, Transformation, Load (ETL)
5. Analyse data                                     5. Perform queries, make OLAP reports, data mine
6. Discuss results/findings                         6. Make decision based on information
7. Draw conclusion                                  7. Act accordingly to decision
Table 1. Traditional research process versus BI steps
2.1 The structure of the research process
The research process consists of certain steps in order to answer a research question (Bryman 2008; Sayer
1992). Although a number of different research methods exist, the basic steps are shared and relatively
straightforward for most of them. As shown in the left column of table 1, the researcher starts with a
problem formulation, and then conducts a systematic review on the existing knowledge in the field. From
this a research hypothesis or question is formulated.
Then the empirical researcher makes a research design (chooses unit of investigation, sample and
instruments), and goes into the field, or the lab, in order to collect the data. Collected data are analysed
(qualitatively or quantitatively) and findings are identified and documented. The findings are discussed in
relation to existing knowledge, and finally conclusions are drawn and implications are assessed.
2.2 The structure of BI
There are multiple definitions of BI. Some researchers see it as a data warehouse with tools for accessing the data (Gang et al. 2008); others define it as a decision making process (Davenport 2006).
Based on Turban et al., who claim that the process of BI is “...based on the transformation of data to
information, then to decisions, and finally to actions” (Turban et al. 2011, p. 19), we illustrate the process
of BI by four steps:
Data → Information → Decision → Action
Figure 1: The process of Business Intelligence
A recent publication by Shollo and Kautz (2010) concludes that BI, over the past twenty years, has been
defined either as a process, a product, or a set of technologies, or a combination of the three. Shollo and
Kautz reviewed over one hundred publications related to BI and revealed that the majority of research has
focused on turning data into information, as well as technology, but less on the role of the decision maker
(Shollo and Kautz 2010). Finally, a few publications indicate what BI is not, namely having a data
warehouse without access tools to the data (Howson 2008).
BI can play a crucial role in almost every function in a retail organisation, such as Customer Relationship
Management (CRM) (segmentation, campaign effectiveness analysis), alternative Sales Channels
(Internet, interactive TV), enterprise management (dashboard reporting) as well as human resources and
finance (Li 2005). Turban et al. (2011) add to the list fraud detection for insurance companies, tracking of goods for transportation companies, better medical care in the health industry, and banks serving their customers better while following trends in the market.
How is all the above carried out? With end-user access techniques. Turban et al (2007) provide a
framework for the major technological components of BI, illustrated in figure 2 below. The components
on the left side of the dotted line are elements belonging to a data warehouse, and the elements on the
right side are end-user access tools. The components of “External Web documents” and “ETL” will be
described later in the paper.
Figure 2: The major components of Business Intelligence (Turban et al, 2007, p. 201)
The BI tools on the far right of figure 2 all provide access to data, but at different levels of sophistication. Simple and convenient applications such as queries in Excel link the business elements, OLAP analyses incoming data, and the most sophisticated, data mining, reveals hidden patterns (Ryals and Knox 2001). Data mining is an analysis which looks for hidden patterns in large amounts of data. It does not only present data in a new way, but actually discovers relationships among the data (Turban et al. 2007). The literature abounds with entertaining examples of such hidden patterns. Perhaps the best known is this: the large American warehouse store Walmart mined its sales data and found that baby food and beer were frequently purchased together, especially when physically placed side by side in the stores (Ryals and Knox 2001).
We draw mainly on three data mining techniques in this paper (Moss and Atre 2003). The first, association, identifies co-occurrences within one record by means of statistics: if the customer purchases airline tickets for the whole family, there is an X% chance of a car rental as well. Association is also called market basket analysis. The second, classification, is considered the most common. It looks at behaviour and attributes of predefined groups, for example which groups of customers are likely to purchase a product. Algorithms for classification include decision trees and simple if-then statements. The third and final, clustering, is similar to classification, but the groups are defined after the data mining. For example, clustering is used to detect manufacturing defects or market segmentations, and typical algorithms are neural networks or statistics (Turban et al. 2007).
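As a minimal illustration of the first two techniques, the following Python sketch (with invented purchase records and thresholds) computes association as a simple co-occurrence share and expresses a classification rule as the kind of if-then statement a decision tree algorithm might produce; clustering is illustrated in the sketch in section 5.1.

```python
# Illustrative sketch on invented toy data: association as a co-occurrence count,
# classification as a hand-written if-then rule.

baskets = [
    {"airline_tickets_family", "car_rental"},
    {"airline_tickets_family"},
    {"airline_tickets_family", "car_rental", "hotel"},
    {"hotel"},
]
with_tickets = [b for b in baskets if "airline_tickets_family" in b]
share = sum("car_rental" in b for b in with_tickets) / len(with_tickets)
print(f"P(car rental | family tickets) = {share:.0%}")   # the "X% chance" from the text

def likely_buyer(customer):
    # classification rule of the if-then kind; the thresholds are invented
    return customer["age"] < 35 and customer["visits_per_month"] > 4

print(likely_buyer({"age": 28, "visits_per_month": 6}))
```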
We will now compare the structure of the research process and the BI process, going through each step of
table 1.
2.3 Comparative analysis
From table 1 we clearly see that the generic research structure on the left side and the BI process on the
right side have some important differences, but also share many features. The most important difference
is the basic aims; while the aim of research is to produce new scientific knowledge, the aim of BI is to
apply knowledge to take a business decision. This is seen in step 1, where the BI process is started with a
business need, while the research process starts with a problem formulation or research question. It is also
seen in the last steps (step 6 and 7) where the BI process focuses on decisions and action, while the
research outcomes are findings and knowledge claims. Thus, a BI query usually differs from a research
question. For example, the question “How many clicks does a customer do on our web site before
purchasing a product, and how can we sell more by making it easier?” is a BI query. A research question
using this information could be: “How can the clickstream of a commercial site be used to improve
usability?”
However, except for the start and the end, the process steps are rather similar. Step 2 is about identifying
relevant existing knowledge and sources of information. The review of previous research is absent from the BI process, but the identification of possible sources of information is the same. Step 3 is basically the same: to formulate an assumption or hypothesis.
Step 4 concerns data collection. The ETL process is admittedly a special case of data collection, since it
usually involves the use of data warehouse technologies or other tools. But it is nevertheless about data
collection; to select the sources, collect the sample, ensure that data are trustworthy, and to arrange it in a
way that allows for systematic data analysis in the next step.
In step 5 (data analysis) BI provides a number of techniques such as queries, OLAP reports, and various
forms of data mining. Usually, this will be a quantitative analysis, but there are exceptions (for example
the researcher might be looking for certain textual expressions or pictures). Today, mining comprises data, text, and web mining, as well as the more recent reality mining (Turban et al. 2011). Compared to
traditional statistical analysis, BI offers some other opportunities (Moss and Atre 2003). While statistical
analysis usually requires a hypothesis, BI is designed to handle open questions. Moreover, BI can handle
various types of data (text, pictures, sound) in addition to numerical data.
Summing-up this analysis we find that except for the start and end, the research process and BI process
are relatively similar.
3. CONDUCTING RESEARCH WITH BI
In this section we draw on the analysis from the previous section, and suggest a step-wise framework
(table 2) that incorporates the key features of BI into the generic research structure in the left side of table
1.
Suggested research steps and comments:
1. Problem formulation. Comment: It should be assessed whether the problem is suited for the BI approach.
2. Review previous research, and identify possible data sources. Comment: In addition to the normal research review, this step also includes the identification of Internet data resources.
3. Formulate research question(s) and BI queries. Comment: The research question is often an open question, while the BI queries are always precise and data oriented.
4. Extraction, Transformation, Load (ETL). Comment: Usually involves the use of data warehouse technology or other tools.
5. Perform queries, make OLAP reports, data mine. Comment: This is usually an iterative and creative process.
6. Discuss findings. Comment: The results of the BI query will make a foundation for the answer to the research question.
7. Draw conclusion. Comment: To what extent has the research question been answered?
Table 2: Framework for conducting research with BI.
Step 1 is problem formulation. Many research problems obviously cannot be solved with the BI approach,
so the researcher has to assess whether the problem is suited for BI techniques. A key question the
researcher should ask herself is: will an analysis of mass data possibly give new insights to this problem?
If the answer is no, maybe another approach (than BI) should be chosen.
Step 2 includes the usual research review. In addition the researcher also should identify the necessary
data resources, often found on the Internet. This could be ordinary web pages or weblogs, or more
complex structures such as social networks.
Step 3 is formulating the research question. In contrast to some streams of research where a testable
hypothesis is formulated, the BI analyst usually has a more open question. However, the open question is
often found also in explorative case study research (Gerring 2007) and in grounded research (Miles and
Huberman 1994). From the research question, we need to derive the BI queries. Doing this requires a
thoughtful operationalisation, going from the open question to the specific queries, which will affect the
validity of the findings.
Steps 4 and 5 are the core BI steps. The ETL process usually involves the use of data warehouse technology, which might be a sophisticated dedicated tool (Turban et al. 2007), but often rather simple tools, such as Excel, are used. Data analysis is often an iterative (and creative) process where
the researcher is looking for patterns in a large data set, using the tools to ask new queries. Conducting the
ETL process in the correct way ensures the reliability of the data.
In step 6 the researcher will often have to derive the findings from the patterns revealed in the data. For
example, let us assume that a researcher wants to assess the usability of a complex web site by analysing
the click-streams of a weblog. The documented click-stream might offer various answers, which might
require going back to step 5 to ask more queries. We proceed by discussing an example which is built on
these steps.
4. ANALYSIS
In this section, we will present a BI analysis of the “ash crisis” taking place in 2010 (Presthus and
Bygstad 2010). The presentation is based on the suggested framework in table 2. The ash crisis started
with the eruption of large volumes of ash from the volcano Eyjafjallajökull on Iceland in mid-April 2010, which grounded most of the North European air traffic. Not only were airline passengers prevented from starting their outward journeys; passengers scheduled to return to Northern Europe also found themselves stuck at airports around the whole world.
4.1 Problem formulation
At the beginning of the ash crisis of April 2010, the call centres of the airline companies SAS and
Norwegian quickly collapsed. The stranded passengers from around the world started to use Facebook to
communicate with the airline companies. Our research interest was to explore how social media could be
used to improve customer communication for companies.
4.2 Review previous research, and identify possible data sources
First, we reviewed various technical solutions for customer communication, focusing particularly on
CRM. Then we identified that both SAS and Norwegian were present on Facebook:
• Norwegian on Facebook (http://www.facebook.com/flynorwegian)
• SAS on Facebook (http://www.facebook.com/SAS)
These sources are examples of “External Web documents” from figure 2.
4.3 Formulate research question(s) and BI queries
The research question was: “To what extent can the most popular social media, Facebook, be used to
improve customer relationships?” From this, we derived three BI queries: “How many passengers chose
to use Facebook for communication?”, “How long did it take for SAS and Norwegian to answer a
question?”, and “How was the ambiance on Facebook during the ash crisis?”
4.4 ETL
ETL stands for Extraction, Transformation and Load; it is a time-consuming and crucial part of Business Intelligence and can take up to 70% of the process (Turban et al. 2011). In our case, we extracted data from the two Facebook pages, selecting specified elements such as the questions from the passengers, the answers from SAS and Norwegian, and the time from when a question was posted until an answer was provided. We thus disregarded advertising, photos and information about the airline companies on the pages (see figure 3 below).
Figure 3: Two Facebook pages to be extracted (Facebook, 2010, Facebook, 2011)
The extraction was made using Ruby code, exploiting the Application Programming Interfaces of
Facebook. The results of the extraction are shown in figure 4 below (the users' names have been replaced with “A Facebook user”).
Figure 4: The selected data from the two Facebook pages from figure 3 are extracted
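As an illustration of the extraction step: the study used Ruby, but a similar idea is sketched below in Python against the Facebook Graph API. The page identifier, the requested fields and the token handling are assumptions for the example, and the API has changed considerably since 2010.

```python
# Illustrative sketch of extracting wall posts and comments from a public page via the
# Graph API. Endpoint, fields and token handling are assumptions; current API versions
# require a valid access token and may expose different fields.
import json
import urllib.request

ACCESS_TOKEN = "YOUR_TOKEN_HERE"   # hypothetical placeholder
PAGE = "flynorwegian"              # or "SAS"

def fetch(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)

url = (f"https://graph.facebook.com/{PAGE}/feed"
       f"?fields=message,created_time,comments{{message,created_time,from}}"
       f"&access_token={ACCESS_TOKEN}")

posts = []
while url:
    page = fetch(url)
    posts.extend(page.get("data", []))
    url = page.get("paging", {}).get("next")   # follow pagination until exhausted

# Keep only the elements we need: question text, timestamp, and any replies.
extracted = [
    {"question": p.get("message"),
     "asked_at": p.get("created_time"),
     "comments": p.get("comments", {}).get("data", [])}
    for p in posts
]
print(f"Extracted {len(extracted)} posts")
```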
Having extracted the data from the two sources, the next step was to transform them. There are several
possibilities of transformation, and in our case, we wanted to:
• Accumulate the number of requests to SAS and Norwegian during the ash crisis
• Measure the number of minutes from when a question was posted until an answer was provided
• Accumulate the instances of each word
Then all selected data were loaded into a new spreadsheet. Our data had now gone through the ETL process (cf. figure 2) and were ready for further analysis.
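A minimal sketch of the transformation and load steps, assuming posts with the hypothetical fields from the extraction sketch above (replaced here by a small invented stand-in): requests are counted per day, response times are measured in minutes, and the result is loaded into a CSV file that a spreadsheet can open.

```python
# Transformation and load sketch; field names and data are invented for illustration.
import csv
from collections import Counter
from datetime import datetime

extracted = [   # toy stand-in for the posts produced by the extraction step
    {"asked_at": "2010-04-16T09:30:00+0000",
     "comments": [{"created_time": "2010-04-16T09:42:00+0000"}]},
    {"asked_at": "2010-04-17T11:05:00+0000", "comments": []},
]

def parse(ts):
    # timestamps of the form "2010-04-16T09:30:00+0000"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")

requests_per_day = Counter()
response_minutes = []

for post in extracted:
    asked = parse(post["asked_at"])
    requests_per_day[asked.date()] += 1
    if post["comments"]:                      # first comment taken as the airline's answer
        answered = parse(post["comments"][0]["created_time"])
        response_minutes.append((answered - asked).total_seconds() / 60)

with open("ash_crisis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "requests"])
    for day in sorted(requests_per_day):
        writer.writerow([day.isoformat(), requests_per_day[day]])
```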
4.5. Perform queries/make OLAP reports/data mine
We now conducted the three queries described above. First, we counted the number of requests per day to
SAS and Norwegian during the ash crisis. Then, we measured the number of minutes from when a request was posted until a response was provided. The final results were two graphs showing the distribution over the
whole ash crisis period.
Finally, we conducted a sentiment analysis (Turban et al. 2011), measuring the emotional temperature. In
our case, we performed text mining manually. We wanted to categorise and then accumulate the instances
of positive and negative words on the web pages for each airline. The results are shown in figure 5 below.
Figure 5: Positive and negative words for each airline company; a sentiment analysis
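The manual word counting behind figure 5 can be mimicked in a few lines; the word lists and sample posts below are invented for illustration and are not those used in the study.

```python
# Sentiment sketch: count positive and negative word instances per airline (data invented).
POSITIVE = {"thanks", "great", "helpful", "good"}
NEGATIVE = {"stranded", "terrible", "angry", "bad"}

posts_per_airline = {
    "SAS": ["Thanks for the quick reply, very helpful!", "Still stranded, this is terrible."],
    "Norwegian": ["Great service on Facebook", "Good info, thanks"],
}

for airline, posts in posts_per_airline.items():
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    print(f"{airline}: {pos} positive, {neg} negative word instances")
```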
In the next step, we discussed these results.
4.6 Discuss findings
The graphs provided answers to our BI queries: we saw how many passengers chose to communicate via Facebook during the ash crisis; nearly 600 at SAS, and over 1400 at Norwegian. We also noted that it took only minutes from a question being posted to an answer being provided. Moreover, our sentiment analysis in figure 5 revealed a clear preponderance of positive words posted on Facebook.
Finally, we could also revisit the extracted data and read the questions and answers, as shown in figure 4
above. Building on the answers provided by BI techniques, we found that SAS and Norwegian indeed
managed to solve problems, and that the passengers even made the effort to thank the airlines for the help.
4.7 Draw conclusion
By using BI as a research approach, we found that Facebook enabled the companies to communicate with their customers as well as to solve problems. We also found that the passengers showed a positive and
civilised behaviour on the Facebook pages.
4.8 Summing up
Our example shows that by using BI as a research approach, it is possible to extract and analyse large
amounts of data from a semistructured Internet source. Had we, alternatively, gathered this information using qualitative sources, we would have had to interview a limited number of ash-stranded passengers and service staff at SAS and Norwegian. This would have been time-consuming, and the
collected data could have been less reliable, as people tend to forget. Also, it would probably have been
difficult to identify and reach these passengers after the event.
5. NINE POSSIBLE RESEARCH DESIGNS
Our example in section 4 is only one of many possible research designs. We used semistructured data
collected subsequent to an event, but there are several other combinations of data types and events.
Loosely based on the classic framework of decision support by Gorry and Scott-Morton (Gorry and Scott
Morton 1971) and the more recent framework of data warehouse evolution by Brobst and Rarey (Brobst
and Rarey 2003), we propose a matrix of nine research designs, shown in table 3 below.
One dimension is the degree of data structure. This dimension is loosely based on Gorry and Scott-Morton, who proposed a framework for decision support. Our other dimension is temporal, meaning at what time data are collected in relation to an event. First, past event: data are collected after the event, and we are interested in finding out what happened and why. Second, present event: data are real-time or close to real-time, as we want to investigate the situation right now. A common example is airline companies, which need to know how many seats are empty prior to take-off in order to sell them at a discount. Third and final, future event: existing data are used to predict what will happen. This is interesting in order to prevent an event from happening, or to be prepared when it happens.
                      Past event                  Present event                 Future event
Unstructured data     1. Blog analysis            4. Real-time surveillance     7. Infectious disease prediction
Semistructured data   2. Social media analysis    5. "Reality mining"           8. Sales prediction
Structured data       3. Clickstream analysis     6. Product recommendations    9. Statistical prediction
Table 3: Nine research designs with BI techniques
This matrix presents nine different research designs based on BI. There are certainly other possible
designs, but we believe that this categorisation is useful for our purpose. In addition to the normal
researcher, we have in mind the Information Systems Master student with limited time and resources
available. Further, we argue that the approaches allow for creative designs and thinking “outside the box”.
By basing her study on one or more of the nine research designs from table 3, the Master student can
spend less time collecting a large amount of data, and more time on analysis and innovative discussion.
We will now discuss each design, starting with a description of each cell. Then we give an example of a research topic and suggest one research question. Finally, we describe in more detail which BI queries could contribute to a research answer, and suggest an approach for ETL and data analysis.
5.1 Blog analysis
A blog typically consists of unstructured data, as a blog-writer has no predefined categories or fields to
enter data. Other sources of unstructured data are e-mails and text documents, as well as photos, sound,
and even colour (Inmon and Nesavich 2008).
A possible research topic could be gender differences in Internet publishing. On this topic, an interesting
research question could read “Are there systematic differences in the way men and women express
themselves through a blog”?
Typical BI queries would be: “Do women write longer blogs than men? Which topics are typically
associated with men and women?” Having identified the BI queries, the ETL process would proceed as
follows. For extraction, one would need to identify for example one hundred blogs written by each
gender, and extract the data by means of some sort of programming code or an application like Mozenda.
Then, these data would have to be transformed, by for example translating all blogs into one language,
stemming (reducing the variants of a word to a common stem), and counting each instance of each word. Finally, data could be loaded into a spreadsheet or database for further data analysis. The applied text mining technique could be clustering, as we would like to explore what men and women write about (we do not know the topics beforehand).
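As a sketch of the clustering step, and assuming the blog texts have already been collected and translated, the snippet below clusters word-count vectors with k-means; scikit-learn is one possible toolbox, and the texts are invented.

```python
# Clustering sketch for blog texts (illustrative; texts are invented).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

blogs = [
    "training marathon running shoes nutrition",
    "makeup skincare routine fashion week outfits",
    "interval training running race results",
    "fashion haul outfits shoes accessories",
]

vectorizer = CountVectorizer()               # word counts; stemming could be added here
X = vectorizer.fit_transform(blogs)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # which topic cluster each blog falls into
```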
5.2 Social media analysis
Data extracted from social media, as in our Facebook case, are considered semistructured because meta
data exist (Blumberg and Atre 2003). Social media are large structures with millions of users, which
represent a huge resource for interesting research. One example is from LinkedIn, the professional
network, where Scott Nicholson investigated another aspect of gender differences (Nicholson 2011). The
research question was: “Are there systematic differences between men and women regarding networking
in different industries?”
In order to investigate this, a network savviness index was defined, as 1) the ratio of one-way connections
that men have to connections that women have, and 2) the ratio of male members on LinkedIn to female
members.
The operationalised BI queries were: “How many men/women are registered in each industry? How many
links are there from men to these women’s connections?” In the ETL steps, first the companies were
classified into industries, and all male and female members were extracted for each industry. Then the number of
“male” links was extracted for each industry. In the data analysis the gender ratio and the gender link ratio
were computed, which resulted in the industry network savviness index. As might be expected, Nicholson
found that “Law Enforcement” and “Capital Markets” were male savvy, but somewhat surprisingly that
“Ranching” and “Tobacco” were female savvy.
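The index itself is simple arithmetic once the counts per industry have been extracted; the sketch below uses invented counts and one plausible way of combining the two ratios described above.

```python
# Network savviness sketch: invented member and connection counts per industry.
industries = {
    # industry: (male members, female members, male connections, female connections)
    "Capital Markets": (9000, 3000, 180000, 45000),
    "Ranching":        (1200, 1400, 9000, 14000),
}

for name, (men, women, male_links, female_links) in industries.items():
    membership_ratio = men / women
    connection_ratio = male_links / female_links
    savviness = connection_ratio / membership_ratio   # >1 male-savvy, <1 female-savvy (illustrative)
    print(f"{name}: savviness index {savviness:.2f}")
```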
5.3 Clickstream analysis
Clickstream analysis, as offered by tools such as Google Analytics, reveals a user’s behaviour on websites (Turban et al. 2011). For example, a company can follow how long a user stays on its web pages and which words the user searches for. Clickstream analysis requires access to company logs.
As an example of a possible research topic, we could investigate the requirements of the user group of a telecom company’s website. Instead of asking users about their preferences, we could analyse their actual
behaviour, in order to understand their needs. The research question could be: “Which services does a
telecom customer wish to buy using his mobile phone, in contrast to the services he wants to buy on the
web?”
To be able to investigate this, we reformulate it into BI queries, such as: “Which products are mobile
users searching on? How many clicks do they spend on various products?” In the ETL step we might
identify weblogs at various company sites (and maybe other companies or countries), and load the data
into a file. Then data analysis would be conducted as clickstream analysis, accumulating user patterns for
various products and services.
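A sketch of the data analysis step, assuming access to a web server log; the log lines, the product URL pattern and the way mobile users are recognised are all invented for illustration.

```python
# Clickstream sketch: accumulate clicks per product, split by mobile vs. desktop user agents.
import re
from collections import Counter

log_lines = [   # invented log lines in a simplified common-log-like format
    '10.0.0.1 "GET /products/prepaid-data HTTP/1.1" 200 "Mozilla/5.0 (iPhone)"',
    '10.0.0.2 "GET /products/broadband HTTP/1.1" 200 "Mozilla/5.0 (Windows NT)"',
    '10.0.0.1 "GET /products/prepaid-data HTTP/1.1" 200 "Mozilla/5.0 (Android)"',
]

product_re = re.compile(r'GET /products/([\w-]+)')
clicks = {"mobile": Counter(), "desktop": Counter()}

for line in log_lines:
    match = product_re.search(line)
    if not match:
        continue
    device = "mobile" if ("iPhone" in line or "Android" in line) else "desktop"
    clicks[device][match.group(1)] += 1

print(clicks)
```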
5.4 Real-time surveillance
Examples of unstructured data include video from surveillance cameras in stores and homes, or on the roads; such systems can be programmed to give notice if an unusual pattern occurs, such as trespassing or traffic congestion. Turban et al. (2007) describe how hospitals can use various information about a patient to automatically generate alerts from a web interface if a critical situation arises.
Based on this case, we have a possible research topic within real-time surveillance in health care, with the
following research question: “How can real-time surveillance technology assist in allocating hospital
nursing resources?”
One would need several BI queries, such as: “What are the normal heart beat rhythms of this patient?
What is the blood temperature of this patient? How are the heart beat rhythms right now?” One would
need several sources for collecting data, such as medical records and heart rate signals, which would need to be transformed to enable various kinds of data mining and analysis. One would aim to create alerts when
unusual patterns occur. The contributions of such research would be to create knowledge on how BI can
improve effectiveness and quality in hospital care.
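A minimal alerting sketch with an invented baseline, threshold and simulated readings; a real system would read from monitoring equipment and use patient-specific models.

```python
# Real-time surveillance sketch: raise an alert when a reading deviates from the
# patient's baseline. Thresholds and readings are invented for illustration.
BASELINE_BPM = 72
ALERT_DEVIATION = 25          # beats per minute away from baseline that triggers an alert

def check_reading(bpm, baseline=BASELINE_BPM, deviation=ALERT_DEVIATION):
    if abs(bpm - baseline) > deviation:
        return f"ALERT: heart rate {bpm} bpm deviates from baseline {baseline} bpm"
    return None

stream = [70, 75, 73, 104, 68]          # simulated incoming readings
for reading in stream:
    alert = check_reading(reading)
    if alert:
        print(alert)                    # in practice: notify nursing staff or a dashboard
```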
5.5 “Reality mining”
Technologies in the family of automatic identification, such as RFID (Radio Frequency Identification), barcodes, and magnetic strips, generate huge amounts of data, and also require technology to make sense of the generated data. Making sense of such data is called reality mining (Turban et al. 2011). A possible
research topic would be mobile tourist information, with the derived research question “Which are the
real-time patterns of tourists in our city”?
From this, we can make several BI queries, such as “How many tourists are currently in our city? Where
are our tourists right now? Where were the same tourists fifteen minutes ago?” The ETL would start with
collecting data by tracking mobile phone signals, calculate and load into a database or data warehouse for
analysis. Useful data analysis techniques would include clustering. From this research, a company or city
government could better understand visitors and provide better promotions and service. (There are
obviously ethical issues when tracking a person’s movements via a cell phone. We briefly discuss ethics
in section 6).
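A sketch of clustering tracked positions into hotspots; the coordinates are invented, and a real analysis would have to respect the ethical constraints mentioned above.

```python
# Reality mining sketch: cluster invented tourist positions to find current hotspots.
from sklearn.cluster import KMeans

positions = [            # (latitude, longitude) pairs, invented
    (69.649, 18.955), (69.650, 18.957), (69.648, 18.956),   # around one attraction
    (69.680, 18.990), (69.681, 18.992),                     # around another
]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(positions)
print("Hotspot centres:", kmeans.cluster_centers_)
print("Cluster of each tourist:", list(kmeans.labels_))
```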
5.6 Product recommendations
Perhaps the best-known example of product recommendations is taken from Amazon: “You have chosen
product X. Others who have purchased product X also bought product Y. Do you want to purchase
product Y?” Already in 2001, Lam and Tan believed that Amazon.com was attractive not due to the good
deals or the selection of products but because of its personal attention to users (Lam and Tan 2001).
Today, many Internet stores offer similar recommendations. Consequently, a relevant research topic
would be purchasing behaviour on the Internet. A related research question reads: “How do we use
information about customer purchases to enhance additional sales?”
In order to answer the research question, BI queries would include: “Which product has the customer
placed in the shopping basket right now? Which products are often sold together with this product?
Which customer is similar to this customer and what do the former usually purchase?”
Data collection would come from cookies from the Internet customer’s PC, as well as the Internet store’s
data about customers and sales. Data analysis would rely on the data mining technique called association.
According to Lam and Tan, use of data mining techniques can contribute to understanding consumer
preferences and creating profiles. This would be of particular interest for online retailers, but also for the
customer, who faces a vast range of products on the Internet.
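A sketch of the association step on invented order data: count how often other products co-occur with the product currently in the basket.

```python
# Association sketch: "customers who bought X also bought ..." from invented orders.
from collections import Counter

orders = [
    {"camera", "memory_card"},
    {"camera", "memory_card", "tripod"},
    {"camera", "bag"},
    {"book"},
]

def recommend(product, orders, top=2):
    co_bought = Counter()
    for order in orders:
        if product in order:
            co_bought.update(order - {product})   # count the other items in the basket
    return co_bought.most_common(top)

print(recommend("camera", orders))   # e.g. [('memory_card', 2), ('tripod', 1)]
```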
5.7 Bio-surveillance
By text mining, governments may try to predict how infectious diseases spread, by mapping for example
the instances of some words as they appear in social media and blogs. One example of bio-surveillance
was described by Corley et al. (2010), where the research question was how blog posts can be used to predict the spread of influenza. The resulting pattern could then be compared to the official influenza statistics from
the health authorities.
The BI queries were operationalised as: “How many blog posts mention the word ‘flu’? How many such posts are registered each week over a six-month period?” The ETL process included a service that
conducts real-time indexing of all blogs, and then creating a file of all occurrences each week. Data
analysis was conducted by programming the BI queries, and constructing a graph that covered the six
month period. This graph was finally compared with government statistics, and found to be reasonably
consistent (Corley et al. 2010).
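A sketch of the weekly counting; Corley et al. used a real-time blog index, which is simply simulated here as a short list of dated, invented posts.

```python
# Bio-surveillance sketch: weekly counts of blog posts mentioning "flu" (posts invented).
from collections import Counter
from datetime import date

posts = [
    (date(2010, 1, 4), "got the flu, staying home"),
    (date(2010, 1, 6), "flu season again"),
    (date(2010, 1, 14), "recipe for chicken soup"),
    (date(2010, 1, 15), "half the office has the flu"),
]

weekly_flu = Counter()
for day, text in posts:
    if "flu" in text.lower():
        year, week, _ = day.isocalendar()
        weekly_flu[(year, week)] += 1

for (year, week), count in sorted(weekly_flu.items()):
    print(f"{year} week {week}: {count} flu mentions")
# The resulting series could then be compared with official influenza statistics.
```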
5.8 Sales prediction
By using semistructured data, such as customers’ comments and feedback, a retailer can predict a product’s popularity (Archak et al. 2007). By mapping numeric sales data against textual data, the authors revealed that consumers typically use adjectives such as “Great”, “Bad” or “Amazing” to evaluate a product. With data from Amazon.com, they studied the impact of customer reviews on sales. They found that adjectives influenced customer behaviour: “Great” and “Good” increased sales, while “Bad” led to the product disappearing. Surprisingly, “Decent” and “Nice” actually diminished sales. “Best product” also hurt sales, because people did not believe it.
A possible research topic is how consumers’ behaviour is influenced by other consumers’ feedback and comments. A research question could be how customers’ comments can be used to predict the future sales of a product. Examples of BI queries are “Which customer comments are associated with which
products? Identify the product with most positive and negative words. What are the sales numbers of a
given product after a given time of comments?” The ETL process would start by extracting data from
sales, date and comments from customers, calculate and accumulate in the transformation stage, and
finally load into a chosen application. As example of data analysis, we would recommend clustering
because we would not know what customers would say about the product. This research would help a
company predict the success of a product, thus enabling the right level of stock.
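A sketch relating adjective counts in reviews to later sales, on invented data; a real study would use far larger samples and, for example, regression or clustering.

```python
# Sales prediction sketch: count evaluative adjectives per product and compare with
# later sales figures. Reviews, adjective list and sales numbers are invented.
ADJECTIVES = {"great", "good", "bad", "decent", "nice", "amazing"}

reviews = {
    "camera_x": ["Great camera, good battery", "Amazing pictures"],
    "camera_y": ["Decent camera", "Bad autofocus"],
}
sales_next_month = {"camera_x": 420, "camera_y": 95}

for product, texts in reviews.items():
    words = [w.strip(".,!").lower() for t in texts for w in t.split()]
    counts = {adj: words.count(adj) for adj in ADJECTIVES if adj in words}
    print(product, counts, "-> sales:", sales_next_month[product])
```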
5.9 Statistical prediction
Finally, by using highly structured data, such as statistics, prediction can be made by a technique called
decision tree. A decision tree is a machine learning technique for classification of different patterns, and
is similar to the game “Twenty Questions” (Turban et al. 2011). Examples are: “Will this bank customer
be granted a loan?” “How much will I receive in retirement pension?”
We believe that an interesting research topic would be predicting school failure for young students. The attention-grabbing research question would be: “Which students are likely to fail in the next semester?” We
would need several BI queries in this case, such as: “What are the student’s grades this semester, and past
semesters? What was the outcome of previous students with similar results up to this level?” ETL would
include extraction of data about current and previous students, subjects, and grades. After transformation and loading, these data would be suitable for a decision tree analysis. This piece of research would make
valuable contributions for both the student and the school. By detecting which students are in danger of
failing one or several subjects, a school can mobilise extra resources to these students.
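A sketch of the decision tree analysis using scikit-learn on invented student records; grades are encoded as numbers and the label indicates whether the student failed a subject the following semester.

```python
# Decision tree sketch: predict failure risk from previous grades (data invented).
from sklearn.tree import DecisionTreeClassifier

# Features: [average grade last semester, number of failed subjects so far]
X = [[4.5, 0], [2.1, 2], [3.0, 1], [5.2, 0], [1.8, 3], [3.8, 0]]
y = [0, 1, 0, 0, 1, 0]        # 1 = failed at least one subject the next semester

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

new_students = [[2.4, 2], [4.9, 0]]
print(tree.predict(new_students))   # flag students who may need extra support
```

The fitted tree can also be inspected as explicit if-then rules, which is what makes this technique readable for the school staff who would act on the predictions.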
6. CONCLUDING REMARKS
In this paper we explored the potential of conducting research with Business Intelligence techniques. Our
main argument is that BI offers a full stepwise process, going from the research question through data
collection, data qualification, and data analysis, to findings and conclusion. We find that BI is particularly
useful in an exploratory setting with no clear hypotheses, because it allows for creative queries and
mining of large amounts of data.
Further, we identified nine different BI research designs. These designs show a considerable breadth of
possible investigations, ranging from simple blog analysis to surveillance research. We argue that the
basic BI steps constitute a sound research basis for all these designs.
We believe that BI is useful in a number of research settings, and particularly suited for Master
dissertations. The power of the BI approach lies in the fact that it can provide the researcher and the
student with a large and interesting data set, within a limited time frame. It also allows for both relatively
simple descriptive analysis, as well as more sophisticated investigations.
There are certainly also limitations. BI as a research approach is probably most useful for facts-oriented
(positivist) research, and less for interpretive or critical investigations, although there might be
exceptions. Further, the approach may require access to expensive tools, such as data warehouse
technologies, or advanced programming skills may be needed. Also, ethical issues easily arise, such as
privacy concerns with data mining in social media and obviously in various surveillance designs.
Overall, we argue that BI offers some new and exciting opportunities for research designs in an
information-rich world. Further research should exploit these opportunities, both as regular research, and
in Master dissertations.
7. REFERENCES
Archak, N., Ghose, A., and Ipeirotis, P. G. (2007). "Show me the money! Deriving the pricing power of product
	features by mining consumer reviews." In Proceedings of the 13th ACM SIGKDD International
	Conference on Knowledge Discovery and Data Mining (KDD '07).
Blumberg, R., and Atre, S. (2003). "The Problem with Unstructured Data." DM Review (February 2003),
pp. 42-46.
Brobst, S., and Rarey, J. (2003). "Five Stages of Data Warehouse Decision Support Evolution".
DSSResources.COM, 01/06/2003.
Bryman, A. (2008). Social Research Methods: Oxford University Press.
Castells, M. (2009). The Rise of the Network Society. The Information Age: Economy, Society, and
Culture, Oxford: Blackwell Publishers.
Chen, W., and Hirschheim, R. (2004). "A paradigmatic and methodological examination of information
systems research from 1991 to 2001." Information Systems Journal, 14(3), pp. 197-235.
Corley, C. D., Cook, D. J., Mikler, A. R., and Singh, K. P. (2010). "Text and Structural Data Mining of
Influenza Mentions in Web and Social Media." International Journal of Environmental Research
and Public Health, 7, pp. 596-615.
Davenport, T. H. (2006). "Competing on Analytics." Harvard Business Review (January 2006).
Gang, T., Kai, C., and Bei, S. (2008). "The Research & Application of Business System in Retail
Industry." IEEE Xplore, pp. 87-91.
Gerring, J. (2007). The Case Study Method: Principles and Practices, New York: Cambridge University
Press.
Gorry, G. A., and Scott Morton, M. S. (1971). "A Framework for Management Information Systems."
Sloan Management Review, 13(1), pp. 55-70.
Howson, C. (2008). Successful Business Intelligence. Secrets to Making BI a Killer App: The McGraw-Hill Companies.
Inmon, W. H., and Nesavich, A. (2008). Tapping Into Unstructured Data: Prentice Hall.
Kallinikos, J. (2004). "Farewell to Constructivism: Technology and Context-Embedded Action", in C.
Avgerou, C. Ciborra, and L. Land, (eds.), The Social Study of Information and Communication
Technology. Oxford: Oxford University Press, pp. 140-161.
Lam, C. K. M., and Tan, B. C. Y. (2001). "The Internet is changing the music industry." Communications
of the ACM, 44(8), pp. 62-68.
Li, H. (2005). "Applications of Data Warehousing and Data Mining in the Retail Industry." IEEE Xplore,
pp. 1047-1050.
Luhn, H. P. (1958). "A Business Intelligence System." IBM Journal of Research and Development, 2(4),
	pp. 314-319.
Miles, M. B., and Huberman, A. M. (1994). Qualitative Data Analysis: Thousand Oaks: Sage
Publications.
Moss, L. T., and Atre, S. (2003). Business Intelligence Roadmap. The Complete Project Lifecycle for
Decision-Support Applications: Addison-Wesley.
Nicholson, S. (2011). "The Gender Divide: Are Men better than Women at Social Networking? [Online]
Available at: http://blog.linkedin.com/2011/06/22/men-vs-women/#_ftnref1 [Accessed 8. July
2011]".
Presthus, W., and Bygstad, B. (2010). "Facebook as agile CRM? A business intelligence analysis of the
	airline ash crisis." NOKOBIT 2010, Gjøvik. Tapir Akademisk Forlag.
Ryals, L., and Knox, S. (2001). "Cross-Functional Issues in the Implementation of Relationship
Marketing Through Customer Relationship Management." European Management Journal, 9(5),
pp. 534-542.
Sayer, A. (1992). Method in Social Science. A Realist Approach, New York: Routledge.
Shollo, A., and Kautz, K. (2010). "Towards an Understanding of Business Intelligence." ACIS 2010
	Proceedings, Paper 86.
Simon, H. A. (1977). The new science of management decision (Revised): Prentice-Hall, Inc.
Turban, E., Aronson, J. E., Liang, T.-P., and Sharda, R. (2007). Decision Support and Business
Intelligence Systems: Pearson Prentice-Hall.
Turban, E., Sharda, R., and Delen, D. (2011). Decision Support and Business Intelligence Systems:
Prentice Hall.
Walsham, G. (2006). "Doing interpretive research." European Journal of Information Systems, 15, pp.
320-330.
Watson, H. J., and Wixom, B. H. (2007). "The Current State of Business Intelligence." IEEE Computer
Society (September 2007), pp. 96-99.