CPRsouth 2016 Paper: Rathor

PIZZA TO POLICY: COMPARING PRE AND POST LAUNCH
TWITTER DATA OF A PRODUCT
1. PROBLEM STATEMENT/POLICY RELEVANCE
While many businesses use social media for outbound marketing and communication efforts,
leveraging social media for product innovation is a new concept for most. Recognizing the
actual needs of customers is a very difficult process. User conversations available on social
media can be useful for new product development and subsequent market success. Insights
drawn from user-generated content through various analyses give businesses an opportunity to
understand the customer experience, which can lead to new products that emerge from real
customer experiences. On social media, customers share their experiences based on their own
judgement and preferences, and that judgement is shaped by their emotions. It is therefore
essential to consider customers’ emotions when developing a product, because emotions directly
affect their purchasing decisions. Social media platforms allow customers to share their
emotions and feelings freely. Hence user-generated data can play a significant role in
developing new products that incorporate customers’ decision-making processes. Such processes
can help reduce the failure rate of new products and benefit end users. The process adopted
by businesses in new product design can be replicated by governments in policy making.
2. PRINCIPAL RESEARCH QUESTION
How can social media data help businesses evaluate instant market reactions by monitoring
users' reactions before and after the launch of a new product?
2.1 Sub Research Questions:
1. How does the use of social media provide businesses with valuable insights for developing
new products?
2. How can the patterns and trends emerging in Twitter data for a new product be identified?
3. How can network analysis help businesses in identifying lead users and their influence?
3. RESEARCH
3.1 Literature Review
Today social media provide businesses with a space for better engagement with their customers
and for monitoring customers’ activities (He et al., 2014). Social media has become an easy way
to extend customer reach with an effective strategy. It is not limited to Facebook and Twitter;
many other platforms serve various purposes, such as LinkedIn (mainly for professionals), blogs,
Second Life (gaming), and Flickr and YouTube (content sharing) (Tuten & Solomon, 2013). Apart
from customer engagement, social media also opens up further areas for businesses, such as
recruitment, web-based training, updating plans, promoting new offers, and customer involvement
in product development. Businesses need to choose their social media platforms carefully, based
on each platform’s characteristics. Twitter is significantly popular: it reached 18% of internet
users and generates more than 500M tweets per day (Forbes, 2013). This vast amount of
user-generated data gives businesses a valuable opportunity in the various business functions
where customer involvement is required (Kaplan & Haenlein, 2010; Verhoef et al., 2013).
Recognizing the actual needs of customers is a very difficult process. Social media helps to
overcome the difficulty of capturing users’ voices in the form of user-generated data. More
customers are willing to share their experiences with others via social media (He et al., 2014;
Sam & Cai, 2015). This user-generated information is considered unbiased because the space
allows users to express their real feelings and opinions. Further, social media offers a place
for discussing various topics in various communities (Papadopoulos et al., 2012), and some
extend this to knowledge-sharing engagement (Du Plessis, 2007). Insights drawn from
user-generated content through various analyses give businesses an opportunity to understand
the customer experience (Moe & Schweidel, 2011; Nambisan, 2013). This supports the development
of policies for introducing new products that draw on real customer experience. On social
media, customers share their experiences based on their own judgement and preferences, and that
judgement is shaped by their emotions. It is therefore essential to consider customers’ emotions
when developing a product, because emotions directly affect their purchasing decisions
(Noble et al., 2008). Social media platforms allow customers to share their emotions and
feelings freely. Hence user-generated data can play a significant role in developing product
design policies that take customers’ decision-making processes into account. These policies can
help reduce the failure rate of new products and benefit end users.
In core marketing and promotional activities, monitoring users’ activities during the launch of
a new product can help businesses capture current reactions. It helps in capturing ideas for
marketing strategies and in modifying launch policies (Banerjee et al., 2012). It is also
helpful for gaining new insights into new product design and improvements (Marcus et al.,
2011). Such analysis requires depth, along with data reflecting the collective judgment about
the product; sentiment, for example, only measures the score of users’ feelings, not their
preferences. Our contribution is therefore a deeper understanding of how to capture insights
from the data through an analytical approach, which can support policy development for the
early market success of a new product. To this end, we present a methodology to analyze tweets
around the launch of a new product that had been announced beforehand.
The use of social media in product development is very limited in the literature. Researchers
have used different platforms, such as Facebook comments, to incorporate user conversations in
product design (Carr et al., 2015). The use of tweets in developing a new product is considered
a particular case because the launch acts as a triggering event (Lipizzi et al., 2015). In the
literature, user-generated data is used for many other purposes to monitor and capture users’
reactions, such as TV audience reactions (Harrington et al., 2012) and stock market prediction
(Evangelopoulos et al., 2012). These studies involve mining and analysing data, i.e., feedback
and reviews, to understand user behaviour and preferences. The analytical approaches used in all
these studies are more or less similar, such as text mining and sentiment analysis based on
semantic methods (Brown, 2012). Semantic analysis includes the identification of insights in a
network. Moreover, these studies rest on a common assumption, i.e., one-way communication by
users. Various text mining and sentiment methods are used to extract emerging themes and to
associate users’ reactions with them.
Text mining techniques such as word frequency involve data extraction, indexing, and
classification to identify important terms based on their relevance. These approaches focus on
recognizing and evaluating the emotions and preferences that users share about a specific
product. Sentiment classification (i.e., positive, negative, and neutral) can be applied at
different semantic levels based on polarity classification. In the case of Twitter, it can be
applied to each single tweet or to a collection of tweets (Thelwall et al., 2011). The available
information visualization tools for tweets are not able to show the structure of users’
preferences and judgments. Tools for monitoring user activity on social media platforms offer
good data visualization but fail to dig deeper into user conversations. Specifically, most tools
are not aimed at identifying conversational patterns in user-generated data. For businesses,
recognizing patterns in user conversations can help evaluate the customer experience of a new
product. It also builds a better understanding of what customers say and how they talk to others
about a new product. In the next section we provide a theoretical framework for analyzing
Twitter data based on conversational analysis.
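To make the difference between tweet-level and collection-level polarity concrete, the following is a minimal Python sketch of lexicon-based polarity classification. The word lists `POSITIVE` and `NEGATIVE` and the functions `tweet_polarity` and `collection_polarity` are hypothetical illustrations; the study does not state which lexicon it used, and a real analysis would rely on a published sentiment lexicon.

```python
import re
from collections import Counter

# Hypothetical toy word lists; a real analysis would use a published lexicon.
POSITIVE = {"love", "great", "yummy", "easy", "free", "amazing", "happy"}
NEGATIVE = {"late", "cold", "bad", "horrible", "wrong", "never"}


def tweet_polarity(text: str) -> str:
    """Label a single tweet positive, negative or neutral from lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def collection_polarity(tweets: list[str]) -> dict[str, float]:
    """Aggregate tweet-level labels into the share of each label in a collection."""
    counts = Counter(tweet_polarity(t) for t in tweets)
    total = len(tweets) or 1  # avoid division by zero for an empty collection
    return {label: counts[label] / total for label in ("positive", "negative", "neutral")}


tweets = [
    "Love the new pizza, so yummy!",
    "Delivery was late and cold",
    "Ordered via tweet today",
]
print([tweet_polarity(t) for t in tweets])  # ['positive', 'negative', 'neutral']
print(collection_polarity(tweets))
```

The same tweet-level labels feed the collection-level shares, so comparing, say, the negative share of a pre-launch collection with that of a post-launch collection needs no extra machinery.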
3.2 Analytical Framework
An analytical framework (Figure 1) for data mining techniques in product development is
proposed based on a review of the literature on data mining techniques. Essentially, the
literature on data mining in product development identifies co-creation dimensions and the data
mining techniques applied to them.
Businesses are now familiar with the significance of user-generated content from social network
sites for modifying their products. An organization needs the capability to access all useful
information about its products in the form of comments, opinions and reviews. Such information
identifies what has happened and estimates what will happen in the immediate future. Many
businesses are not using social media applications and analysis because of a lack of awareness,
yet it has become necessary to use data mining and analysis techniques to gain useful insights
from many textual documents quickly (Liu, 2012). Some main applications of data mining include
opinion analysis, clustering, opinion extraction (opinion summarization), and pattern analysis
(Adedoyin-Olowe et al., 2013; Ngai et al., 2011).
Figure 1. Classification framework for data mining techniques in product design
3.3 Method & Data Sources
We selected the launch of a new product by the famous fast food company ‘Dominos’ as the
triggering event. The product was selected based on the company’s popularity on Twitter and the
availability of data. Data plays a very critical role in the proposed approach, especially with
a product launch as the triggering event. In this paper, the case company was planning to
introduce a new pizza product in mid July 2015, so we decided to collect Twitter data from
February 2015 to October 2015. Tweets can be extracted from Twitter in two ways, using either
‘stream’ or ‘search’. The ‘stream’ option provides real-time traffic, while the ‘search’ option
allows downloading tweets from up to a few days back, with geographical information, because of
API limits. We opted for the ‘search’ approach, using R, because of the longer time span and the
decent number of tweets it yields over time, and saved the data in an Excel sheet. The output
contains various metadata, including the tweet, date, user name with profile information, and
geographical details. Not all of these details were required for this study, so we again used R
with a cleaning script. Further, data preparation such as tokenization, filtering and stemming
was done using natural language toolkit libraries in R.
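The preparation steps described above can be sketched as follows. This is a Python illustration, not the authors' R script: the stopword list, the `crude_stem` suffix-stripping rule (a rough stand-in for a proper stemmer such as Porter's, chosen so that, for example, easy becomes easi as in the word clouds) and the placeholder `LAUNCH_DATE` used to split tweets into pre- and post-launch sets are all assumptions.

```python
import re
from datetime import date

# Assumed stopword list for illustration; real pipelines use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "for", "in", "on",
             "my", "it", "was", "with", "rt"}
# Placeholder cut-off: the paper only says the launch was planned for mid July 2015.
LAUNCH_DATE = date(2015, 7, 15)


def crude_stem(word: str) -> str:
    """Very rough suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Turn a final y into i when the rest of the word has a vowel (easy -> easi)
    if word.endswith("y") and any(v in word[:-1] for v in "aeiou"):
        word = word[:-1] + "i"
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, filter and stem one tweet."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())  # drop URLs and @mentions
    tokens = re.findall(r"[a-z]+", text)  # letters only, so '#' and digits are dropped
    return [crude_stem(t) for t in tokens if t not in STOPWORDS and len(t) > 2]


def split_by_launch(rows):
    """Split (date, text) rows into pre- and post-launch lists of token lists."""
    pre, post = [], []
    for created, text in rows:
        (pre if created < LAUNCH_DATE else post).append(preprocess(text))
    return pre, post


print(preprocess("RT @user Ordering is so easy! https://t.co/x #Dominos"))
# ['order', 'easi', 'domino']
```

Note that the crude rule also shortens brand names (Dominos becomes domino); a production pipeline would protect such terms or use a proper stemmer.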
The methodology can be broadly divided into three major steps (Figure 2):
(1) Data Pre-processing
(2) Data analysis
(3) Data Visualization
Figure 2. Set-wise structure of analysis
3.4 Results, Analysis & Discussion
3.4.1. Basic Information
We performed content analysis using text and network analysis. First, we created word clouds
for both datasets, pre-launch and post-launch (Figure 3). In the pre-launch word cloud, the most
prominent words are order, easi, place, effect, tweet, job, hire, delivery, gift, card, sugar,
sweet, etc. On the basis of these words, we can get an idea of the major topics emerging in the
pre-launch period. Users are talking about pizza order time, ease of ordering, specific places,
welcome notes from the company, gift cards, jobs at Dominos, and the sweetness (sugar) of the
pizza. These are the basic themes for the Dominos Italian pizza.
In the post-launch word cloud, the most discussed words are similar, with a few new words such
as tweet, order, job, welcome, free, delivery, custom, service, want, place, edit, return, get,
hire, like, new, look, etc. Some major themes are free delivery, start tweet, get free service,
job hire, look rep, and return delivery. In summary, users are concerned about order delivery
and services, the new pizza’s appearance, its taste, engagement by tweeting, and job
opportunities.
Figure 3. Word clouds (pre-launch and post-launch panels)
Figure 4. Frequency distributions for pre launch data
After the word cloud, we counted word frequencies, i.e., how many times each word is mentioned
in each of the two time spans. In the pre-launch period, apart from Dominos, Italianexot and
pizza, order was used more than 200 times and tweet more than 150 times (Figure 4). Other words
such as now, new, deliver, welcome, gift, card and get were used around 100 times. The word
frequencies show the association of words with the product keyword, i.e., Dominos Italian
pizza, in the pre-launch period.
Figure 5. Frequency distributions for post launch data
For the post-launch period, we removed the words Italian and exotic from the dataset because
these were the basic search terms. They were present in every tweet, which produced a very tall
frequency column in the chart. Apart from word counts similar to those in the previous figure,
here start, want, like, free, get, custom, find, job, just and look are mentioned around 100
times (Figure 5).
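The frequency count, including the removal of search terms that appear in every tweet, can be sketched in Python as below. The function `word_frequencies` and its `exclude` and `top_n` parameters are hypothetical; the input is assumed to be tokenized, stemmed tweets.

```python
from collections import Counter


def word_frequencies(token_lists, exclude=frozenset(), top_n=10):
    """Count how often each token occurs across all tweets, skipping excluded terms."""
    counts = Counter(
        token
        for tokens in token_lists
        for token in tokens
        if token not in exclude  # e.g. the search terms present in every tweet
    )
    # most_common orders by count; equal counts keep first-encountered order
    return counts.most_common(top_n)


# Hypothetical post-launch tokens
post_launch = [
    ["italian", "exot", "pizza", "order"],
    ["order", "tweet", "exot"],
    ["italian", "free", "deliveri", "order"],
]
print(word_frequencies(post_launch, exclude={"italian", "exot"}, top_n=2))
# [('order', 3), ('pizza', 1)]
```

Excluding the search terms keeps them from dominating the chart, so the remaining bars show which other words actually vary between the two periods.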
From the word cloud and word frequency analysis, we can identify a partial change in the topics
discussed about the product. The pre-launch data shows users’ expectations of the product, and
the post-launch conversation shows the fulfilment of those expectations. Some promotional
activities, such as job hiring and gift cards, also appear in the tweets.
In addition to the word frequency analysis, we built a hierarchy of words based on the
importance of each word, with five levels (Figure 6). The first level shows the domain of the
product; in our analysis this level has one element, i.e., pizza. The hierarchy also contains
relationship edges, and the thickness of an edge indicates how strong the connection between two
words is. Pizza is connected to Italian with a very thick edge. It is also connected to the
third-level words free and hut, but not as strongly as to Italian. On the second level, tweet is
linked with order and Dominos, and order is linked with can. This means that a Dominos order can
be made by tweet.
This hierarchy thus shows the importance level of each word and its links with words on other
levels. It helps the company focus on the highly connected words and the themes that emerge from
their connections.
Figure 6. Word hierarchy
In addition, we performed domain clustering, which shows the most prominent domains in each
dataset (Figure 7). In the pre-launch dataset, there were two clusters, with red and black
elements, meaning the discussion covered two basic domains. In our dataset, these were Dominos
Italian pizza and exotic Italian pizza. The two clusters overlap because the discussion was
similar for both domains. In the post-launch dataset, on the other hand, only one domain
appeared, indicating that the whole discussion was about the exotic Italian pizza.
Figure 7. Domain clustering (pre-launch and post-launch panels)
In summary, this clustering shows that in the pre-launch data users discussed the earlier,
similar existing product. They discussed the drawbacks of the existing product and expected more
from the upcoming one. Based on this comparison of expectations, businesses can gain better
insights for overcoming the weak areas of the existing product.
3.4.2 Ideation
After identifying the most frequent words, we checked how words were connected to each other
based on degree counts. Figure 8 shows the complex network of words in both datasets. To make
the graph simpler and easier to understand, we added a directional component to the network
(Figure 9).
Figure 8. Word degree network (pre-launch and post-launch panels)
Figure 9. Word directional degree network (pre-launch and post-launch panels)
This network shows all the high-degree words in both cases. In the pre-launch dataset, the
high-degree words are Dominos, welcome, easyord and edit, and all other words are connected to
these words. In the post-launch dataset, the high-degree words are Dominos, easi and edit. The
high-degree words were therefore largely the same in both cases, which means these words are
very common, with high frequency, in the Dominos Italian exotic pizza conversation.
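One way to obtain degree counts with a directional component is sketched below. The paper does not state how edge direction was defined; this Python sketch assumes, hypothetically, that an edge runs from each word to the word that follows it in the same tweet, and it counts distinct neighbours as degree. The function names are illustrative.

```python
from collections import defaultdict


def directed_degrees(token_lists):
    """Return {word: (in_degree, out_degree)} for word -> next-word edges."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for tokens in token_lists:
        # Assumed direction: each word points to the word that follows it
        for src, dst in zip(tokens, tokens[1:]):
            if src != dst:
                out_nbrs[src].add(dst)
                in_nbrs[dst].add(src)
    words = set(out_nbrs) | set(in_nbrs)
    # Degree counts distinct neighbours, not repeated edges
    return {w: (len(in_nbrs[w]), len(out_nbrs[w])) for w in words}


def high_degree_words(token_lists, top_n=3):
    """Words with the largest total (in + out) degree, ties broken alphabetically."""
    degrees = directed_degrees(token_lists)
    return sorted(degrees, key=lambda w: (-sum(degrees[w]), w))[:top_n]


tweets = [["domino", "welcom", "order"], ["domino", "easi", "order"], ["domino", "edit"]]
print(high_degree_words(tweets))  # ['domino', 'easi', 'order']
```

Splitting degree into in- and out-degree shows whether a hub word mostly starts phrases (like domino here) or ends them (like order).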
To identify themes from words, we also built word networks based on co-occurrence
(Figure 10 and Figure 11).
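A co-occurrence network of this kind can be built by counting, for every pair of words, the number of tweets in which both appear; that count can then serve as edge thickness, and the strongest neighbours of a central word outline its cluster. The function names and the `min_weight` threshold below are illustrative assumptions, not the study's implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(token_lists, min_weight=1):
    """Undirected edges weighted by the number of tweets containing both words."""
    weights = Counter()
    for tokens in token_lists:
        # set(): count each pair at most once per tweet; sorted(): canonical (a, b) order
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def neighbours(edges, centre):
    """Words linked to a central word, strongest links first (ties alphabetical)."""
    linked = [(b if a == centre else a, w) for (a, b), w in edges.items() if centre in (a, b)]
    return sorted(linked, key=lambda item: (-item[1], item[0]))


tweets = [["pizza", "italian", "free"], ["pizza", "italian"], ["pizza", "hut"]]
edges = cooccurrence_edges(tweets)
print(neighbours(edges, "pizza"))  # [('italian', 2), ('free', 1), ('hut', 1)]
```

Raising `min_weight` prunes rare pairings, which is one simple way to keep such a network readable.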
Figure 10. Word network with clusters for pre launch data
In Figure 10, we divided the words into clusters and drew edges according to their
co-occurrence. There are eight major clusters, each with a central word. In the first cluster,
for example, Dominos is the central word, and the connections show themes such as Dominos love,
Dominos up, and now apply right job. There is also a comparison between the Dominos pizza fare
and train fare. In another cluster, Italian is the central word, with themes such as a user
liking a YouTube video and the large handmade pan. This means users like the YouTube videos of
the exotic Italian pizza and are happy with the handmade pan used for this pizza. The central
word Italian is also connected to the other central words, exotic and pizza. In the ‘pizza’
cluster, users were discussing gift cards and the KFC pizza. Another discussion is related to
the hiring of drivers for delivery.
In the next cluster, with ‘tweet’ as the central word, the discussion was about easy ordering by
tweet: Dominos started to take orders by tweet using emoji. The ‘experience’ cluster showed that
users were expecting a yummy taste. There are more small clusters, such as customer rep service
and ‘thinks real better Dominos cover Taylor Swift’s album’. These clusters help businesses make
policies according to the customer-centric themes emerging from their conversations.
Figure 11. Word network with clusters for post launch data
The post-launch word network (Figure 11) contains the same central words, such as Dominos, pizza
and order, along with a few new ones, exotic and delivery. In the ‘Dominos’ cluster, love to eat
and order tracking emerge. Another cluster, ‘pizza’, shows themes of winning a trip voucher,
comparison with Pizza Hut as a competitor, and looking for the best Italian pizza party. These
two central words are connected to another one, ‘exotic’, whose cluster contains themes such as
order delivery, tracking, chill, garlic creamery and engagement experience. This cluster leads to
ordering and payment themes, i.e., easy ordering through the PayPal payment gateway. The
clusters are connected to each other, showing the connections among themes.
In summary, these word networks show the themes associated with common origin words. We can see
how themes change from the same origin after the event, i.e., the launch of the new product.
After identifying the themes, we need to understand users’ sentiment towards all themes in both
time periods. For this, we performed sentiment analysis based on polarities and emotions
(Figure 12 and Figure 13). The polarity sentiment shows fewer negative sentiments in the
pre-launch period than in the post-launch period. This indicates that Dominos needs to focus on
these negative sentiments.
Figure 12. Sentiment based on polarity (pre-launch and post-launch panels)
Figure 13. Sentiment based on emotions (pre-launch and post-launch panels)
We then categorized sentiments into emotions: anger, anticipation, disgust, fear, joy, negative,
positive, sadness, surprise and trust. In both datasets, only anticipation and trust show high
sentiment. Users expected less in the pre-launch period than in the post-launch period, which
indicates that this pizza was not able to fulfil their expectations and that they are now
looking for another product with higher expectations. However, the comparison of these two time
periods’ sentiments is not very convincing, so we also performed a time series analysis of these
sentiments over the year, as shown in Figure 14.
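The ten categories listed above are those of the NRC Emotion Lexicon (eight emotions plus negative and positive). A lexicon-based count can be sketched in Python as follows; `TOY_LEXICON` is a hypothetical, tiny stand-in for a full emotion lexicon, and the paper does not state which lexicon or tool it used.

```python
CATEGORIES = ("anger", "anticipation", "disgust", "fear", "joy",
              "negative", "positive", "sadness", "surprise", "trust")

# Hypothetical entries for illustration; real assignments come from a published lexicon.
TOY_LEXICON = {
    "wait": {"anticipation"},
    "new": {"anticipation"},
    "love": {"joy", "positive"},
    "yummy": {"joy", "positive"},
    "smoke": {"fear", "negative"},
    "amazing": {"surprise", "joy", "positive"},
    "reliable": {"trust", "positive"},
}


def emotion_counts(tokens):
    """Count lexicon hits per emotion category for one tweet's tokens."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for token in tokens:
        for category in TOY_LEXICON.get(token, ()):
            counts[category] += 1
    return counts


def dataset_emotions(token_lists):
    """Sum per-tweet emotion counts over a whole dataset (e.g. pre-launch)."""
    totals = dict.fromkeys(CATEGORIES, 0)
    for tokens in token_lists:
        for category, n in emotion_counts(tokens).items():
            totals[category] += n
    return totals


print(emotion_counts(["wait", "new", "pizza", "love"]))
```

Because one word can belong to several categories (amazing counts towards surprise, joy and positive here), category totals are not mutually exclusive.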
Figure 14. Time series sentiment analysis
The average sentiment scores are highest for anticipation, followed by trust and joy, so users
were expecting more throughout the whole period. At the launch time, i.e., the end of July,
anticipation is high and trust is low. Surprise is also slightly high because users are looking
for a better product. This time series analysis provides a clear view of sentiment change but
not of its reason. For that, we need to look at the themes within each sentiment, so we created
word clouds based on sentiment categories (Figure 15).
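The time series in Figure 14 tracks average sentiment scores over the year. Assuming each tweet already carries per-category emotion scores (for example, from a lexicon count), a monthly average can be sketched in Python as below; the monthly granularity and the function `monthly_average` are assumptions, not details given in the paper.

```python
from collections import defaultdict
from datetime import date


def monthly_average(scored_tweets, category):
    """Mean per-tweet score for one emotion category, grouped by (year, month)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for created, scores in scored_tweets:
        key = (created.year, created.month)
        sums[key] += scores.get(category, 0)  # a missing category counts as 0
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical scored tweets around the launch
scored = [
    (date(2015, 7, 20), {"anticipation": 2, "trust": 0}),
    (date(2015, 7, 28), {"anticipation": 1, "trust": 1}),
    (date(2015, 8, 3), {"anticipation": 0, "trust": 1}),
]
print(monthly_average(scored, "anticipation"))  # {(2015, 7): 1.5, (2015, 8): 0.0}
```

Averaging per tweet rather than summing keeps months with very different tweet volumes comparable.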
In the pre-launch data, the joy category shows good delivery, gift cards and the low cost of the
previous product; these topics were making users joyful. In the sadness category, users complain
about not getting the free garlic crust. Owing to a limitation of the algorithm, no themes emerge
from the fear and anger categories, since it is difficult to develop a theme from words that
have no connection. The handmade pan for this specific Italian pizza was a real surprise for
users. In the post-launch data, garlic dip taste, with happy and great liking, relates to the joy
category. The fear category contains the words smoke, drinking and chill, which indicate a scary
and horrible environment. Again, users find the handmade pan amazing in the surprise category.
Easy ordering through tweets is another theme that receives positive sentiment from users for
this particular product.
Figure 15. Word categorization based on emotions (pre-launch and post-launch panels)
3.4.3 Community Detection
After the sentiment analysis, we performed network analysis to identify the major communities in
the Dominos Italian exotic pizza dataset.
Figure 16. Communities identification in network (pre-launch and post-launch panels)
As shown in Figure 16, many communities exist in the whole network. These communities developed
from the users’ discussions about the product. After the product launch, some communities merged
into larger communities, and some grew bigger as more individual users joined. The connections
between communities also increased. This shows that before the product launch users had various
concerns to discuss, but after experiencing the product they focused on a limited number of
major concerns. In the post-launch period, they are more willing to share their experience with
many users. To understand this, we need to go deeper, as shown in Figure 17.
Figure 17. Community development
Figure 17 shows a major community from the whole network. Within this community, many topics
are shared, shown in different colours using clustering. There are more clusters in the
pre-launch community than in the post-launch community, while each cluster in the post-launch
community is bigger than those in the pre-launch community.
Community detection over time is critical for identifying trends in product discussion. It also
reveals influential users and their individual networks within the whole network. These specific
users and communities play an essential role in developing new ideas for product exploration.
Community discussion helps to validate first impressions and derive concrete insights.
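The paper does not name its community detection algorithm. As one possible illustration, the Python sketch below applies synchronous label propagation, a standard and simple technique swapped in here, to an undirected graph of user interactions (treating replies or mentions as edges is an assumption). Every user starts with its own label and repeatedly adopts the most common label among its neighbours; on a tie it keeps its current label if possible and otherwise takes the smallest tied label, so runs are reproducible.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=20):
    """Synchronous label propagation on an undirected user graph.

    Each round, every user adopts the most frequent label among its
    neighbours, computed from the previous round's labels. On a tie a user
    keeps its current label if it is among the tied ones, otherwise it takes
    the smallest tied label.
    """
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    labels = {node: node for node in graph}
    for _ in range(max_iter):  # cap rounds: synchronous updates can oscillate
        new_labels = {}
        for node, nbrs in graph.items():
            freq = Counter(labels[n] for n in nbrs)
            best = max(freq.values())
            tied = {lab for lab, c in freq.items() if c == best}
            new_labels[node] = labels[node] if labels[node] in tied else min(tied)
        if new_labels == labels:
            break
        labels = new_labels
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    # Largest communities first
    return sorted(communities.values(), key=lambda c: (-len(c), min(c)))


# Hypothetical interactions: two groups of four users joined by one link (d-e)
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
         ("d", "e")]
print(label_propagation(edges))
```

On this example the two tightly knit groups come out as separate communities. Synchronous updates can oscillate on some graphs (bipartite ones, for instance), which is why `max_iter` caps the loop; library implementations such as asynchronous or semi-synchronous variants address this more robustly.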
4. CONCLUSIONS AND RECOMMENDATIONS
The approach proposed in this paper can provide valuable insights from customers’ experience.
Data availability on Twitter is quite good, and customers express their ideas there without
being conditioned. This research proposes a longitudinal approach that compares user-generated
content before and after the launch of a new product. Analysing both pre- and post-launch data
reveals comparative differences in customers’ reactions, and in this way helps to close the gap
between customers’ real requirements and actual product features. The approach aims to develop a
mechanism that can mine and analyze users' emotions regarding specific features. A closer
analysis of the new product shall show whether businesses have incorporated the ideas generated
by customers. Data mining techniques such as clustering, topic modelling, sentiment analysis,
and community detection are used to identify the themes (e.g., features) of the product, because
patterns around the product and user segmentation change continuously.
Moreover, this approach to collecting data is relatively low cost, and customers’ preferences
can be captured systematically. The effect of a promotional activity or campaign can be measured
using the same approach, with visualization of peak trends and sentiments. The results show
noticeable differences in customers’ sharing and discussion behaviour between the pre- and
post-launch periods. We consider that these differences can be used to support early market
success. The approach covers the diversity and dynamic preferences of customers. Identifying
early adopters in various communities can help businesses understand their target audience and
its collective judgement. Conversation analysis based on theoretical perspectives strengthens
the understanding of noisy Twitter data and enhances the capability to extract useful
information from it. Using classifiers to analyze customer conversations would help predict
real reactions through effective visualization. In addition, businesses can shape and refine the
policies involved in their product development process based on customers’ conversations, and
policies for customer segmentation based on communities can be developed using these results.
The above case is a social media analysis of Twitter data for a new product, a pizza. It
compared the pre- and post-launch data of the product and offered insights into the product’s
reception from content created by customers on the social media platform. Similar experiments
can be conducted for public policy as well. In transport, this approach may provide valuable
insights for developing transport policy from travellers’ views and experiences, and it has the
potential to deliver and improve transport policy goals. Furthermore, in disaster situations,
the adopted methodology can offer the opportunity to harvest information for situational
awareness and action. It can consistently show the collective behaviour of users based on
actionable warning information when there is very limited time to respond. In such cases,
policy makers can use the output to develop policies for emergencies. Moreover, politicians can
also formulate their election policies using a similar approach based on social media posts
generated by users.
REFERENCES
Adedoyin-Olowe, M., Gaber, M. M. & Stahl, F. (2013). A survey of data mining techniques for social network
analysis. Retrieved October 7, 2014, from http://jdmdh.episciences.org/18/pdf
Banerjee, N., Chakraborty, D., Joshi, A., Mittal, S., Rai, A., & Ravindran, B. (2012, May). Towards Analyzing
Micro-Blogs for Detection and Classification of Real-Time Intentions. In ICWSM.
Brown, E. D. (2012). Will twitter make you a better investor? a look at sentiment, user reputation and their effect
on the stock market. Proc. of SAIS.
Carr, J., Decreton, L., Qin, W., Rojas, B., Rossochacki, T., & wen Yang, Y. (2015). Social media in product
development. Food Quality and Preference, 40, 354-364.
Du Plessis, M. (2007). The role of knowledge management in innovation. Journal of knowledge
management, 11(4), 20-29.
Evangelopoulos, N., Magro, M. J., & Sidorova, A. (2012). The dual micro/macro informing role of social network
sites: can Twitter macro messages help predict stock prices? Informing Science: the International
Journal of an Emerging Transdiscipline, 15(1), 247-268.
Forbes. (2013). Can Twitter save TV? (and can TV save Twitter?). Retrieved March 13, 2016, from
http://www.forbes.com/sites/jeffbercovici/2013/10/07/can-twitter-save-tv-and-can-tv-savetwitter/#1fbf89f86419
He, W., & Yan, G. (2015). Mining blogs and forums to understand the use of social media in customer cocreation. The Computer Journal, 58(9), 1909-1920.
Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social
Media. Business horizons, 53(1), 59-68.
Lipizzi, C., Iandoli, L., & Marquez, J. E. R. (2015). Extracting and evaluating conversational patterns in social
media: A socio-semantic analysis of customers’ reactions to the launch of new products using Twitter
streams. International Journal of Information Management, 35(4), 490-503.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1),
1-167.
Marcus, A., Bernstein, M. S., Badar, O., Karger, D. R., Madden, S., & Miller, R. C. (2011, May). Twitinfo:
aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI conference
on Human factors in computing systems (pp. 227-236). ACM.
Moe, W. W., & Schweidel, D. A. (2012). Online product opinions: Incidence, evaluation, and evolution. Marketing
Science, 31(3), 372-386.
Nambisan, S. (2013). Information technology and product/service innovation: A brief assessment and some
suggestions for future research. Journal of the Association for Information Systems, 14(4), 215.
Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in
financial fraud detection: A classification framework and an academic review of literature. Decision
Support Systems, 50(3), 559-569.
Noble, C. H., & Kumar, M. (2008). Using product design strategically to create deeper consumer
connections. Business Horizons, 51(5), 441-450.
Papadopoulos, S., Kompatsiaris, Y., Vakali, A., & Spyridonos, P. (2012). Community detection in social
media. Data Mining and Knowledge Discovery, 24(3), 515-554.
Sam, Y., & Cai, Y. (2015). A Study on the Use of Social Media to Understand Consumer Preference: The Case of
Starbucks. International Journal of Management and Business Research, 5(3), 207-214.
Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society
for Information Science and Technology, 62(2), 406-418.
Tuten, T. L., & Solomon, M. R. (2013). Social Media Marketing. Boston: Pearson.
Verhoef, P. C., Beckers, S. F., & van Doorn, J. (2013). Understand the perils of co-creation. Harvard Business
Review, 91(9), 28.