E6895 Advanced Big Data Analytics
Task Milestone 1: Market Intelligence Analysis
Jia Ji, Tianrui Peng
February 2nd, 2017
E6895 Advanced Big Data Analytics © 2017 CY Lin, Columbia University

Task Introduction
• Question: If you have never invested in stocks before, how do you decide which stocks to invest in?
• Motivation:
  • Designed for small investors
  • Provide information about stock investment
  • Provide data for future stock prediction research or projects
• Goal: By extracting sentiment from social media data, we aim to provide statistical information that can help users decide which stocks to invest in. Users can check our application for daily updates on sentiment information related to stock prediction. We also build a StockTwits database that can be used for projects related to stock prediction, such as building machine learning algorithms.

State of the Art
1. Based on news: evaluating a company's performance to help assess the validity of a stock
2. Based on internet data sources
  • Google Trends: research by Tobias Preis et al. [5], published in Scientific Reports, shows that "increases in search volume for financially related terms tends to predict large losses in stock market."
  • Sentiment analysis of Twitter posts [1]: sentiment can affect sales and financial gains. The paper focuses on one-day-ahead stock prediction based on sentiment analysis of Twitter.
3. Based on economic models (futures trading model, hedging model)

Dataset
StockTwits data:
• Posted messages, similar to tweets, about stocks or exchange records
• Show useful information for stock prediction and general trends
• We will design and build the database for this task.
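The kind of record such a database might hold, and the per-symbol sentiment counts it makes possible, can be sketched as follows. This is a minimal illustration under assumptions, not the project's actual schema: the input field names (`body`, `created_at`, `symbols`, `entities.sentiment.basic`, `user.followers`, `likes.total`) are assumed from the general shape of StockTwits message JSON, `flatten_message` and `tally_sentiment` are hypothetical helper names, and the sample messages are invented for illustration only.

```python
from collections import Counter, defaultdict


def flatten_message(msg):
    """Reduce one raw message dict to a flat record (field names are assumed)."""
    sentiment = ((msg.get("entities") or {}).get("sentiment") or {}).get("basic")
    return {
        "id": msg.get("id"),
        "body": msg.get("body", ""),
        "created_at": msg.get("created_at"),
        "symbols": [s["symbol"] for s in msg.get("symbols", [])],
        "sentiment": sentiment,  # "Bullish", "Bearish", or None when untagged
        "followers": (msg.get("user") or {}).get("followers", 0),
        "likes": (msg.get("likes") or {}).get("total", 0),
    }


def tally_sentiment(records):
    """Count messages per symbol, split by sentiment label."""
    tallies = defaultdict(Counter)
    for rec in records:
        label = rec["sentiment"] or "Untagged"
        for symbol in rec["symbols"]:
            tallies[symbol][label] += 1
    return tallies


# Invented sample messages, for illustration only.
raw_messages = [
    {"id": 1, "body": "$AAPL looking strong", "created_at": "2017-02-01T14:00:00Z",
     "symbols": [{"symbol": "AAPL"}],
     "entities": {"sentiment": {"basic": "Bullish"}},
     "user": {"followers": 120}, "likes": {"total": 3}},
    {"id": 2, "body": "$AAPL $TSLA both overpriced", "created_at": "2017-02-01T15:30:00Z",
     "symbols": [{"symbol": "AAPL"}, {"symbol": "TSLA"}],
     "entities": {"sentiment": {"basic": "Bearish"}},
     "user": {"followers": 45}, "likes": {"total": 0}},
    {"id": 3, "body": "$AAPL earnings tomorrow", "created_at": "2017-02-01T16:10:00Z",
     "symbols": [{"symbol": "AAPL"}],
     "entities": {"sentiment": None},
     "user": {"followers": 8}},
]

records = [flatten_message(m) for m in raw_messages]
tallies = tally_sentiment(records)
for symbol, counts in sorted(tallies.items()):
    print(symbol, dict(counts))
```

Flat records of this shape could be stored one per document in a document database, and sorting the tallies by any one label gives a simple ranking of symbols. Messages without a sentiment tag are counted separately here so that they still contribute to overall message volume.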

Technologies
• MongoDB
• Python
• NLTK
• Flask
• Linode

[Figure: system architecture diagram. Labeled components: StockTwits; Collect Data (Python Script); MongoDB; Data Query Interface (Python); Data Analysis (Statistical Methods); Public API for query data (Python); Result Visualization (Web); Machine Learning (Stock Prediction). Region labels: PROJECT, TASK.]

Analysis
• Analysis results:
  • Top N most popular stocks
  • Top N most bullish stocks
  • Top N most bearish stocks
• We also provide a database that future research can use to analyze StockTwits data.
• Potential uses:
  • Compare user predictions with real stock prices (accuracy of each user)
  • Followers
  • Number of likes
People can use this information to build various machine learning applications for predicting the stock market.

Task Milestones Plan
1. Design database structure and collect information
  • Text analysis
  • Query data through the StockTwits API
2. Data analysis based on the information in the database
  • Calculate popularity for each stock based on various information
  • Sentiment analysis based on our data
  • Calculate sentiment score for each stock
3. Create standard API
  • Provide daily updates to users via our API
  • Provide the StockTwits database with analytical results for future research

Reference
1. Si, Jianfeng, Arjun Mukherjee, Bing Liu, Qing Li, Huayi Li, and Xiaotie Deng. "Exploiting Topic based Twitter Sentiment for Stock Prediction." ACL (2) 2013 (2013): 24-29. http://www.aclweb.org/old_anthology/P/P13/P13-2.pdf#page=72
2. Schumaker, Robert P., and Hsinchun Chen. "Textual analysis of stock market prediction using breaking financial news: The AZFin text system." ACM Transactions on Information Systems (TOIS) 27, no. 2 (2009): 12. http://dl.acm.org/citation.cfm?id=1462204
3. Yoo, Paul D., Maria H. Kim, and Tony Jan. "Machine learning techniques and use of event information for stock market prediction: A survey and evaluation." In Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, International Conference on, vol. 2, pp. 835-841. IEEE, 2005. http://ieeexplore.ieee.org/abstract/document/1631572/
4. "Stock Market Prediction Using Twitter Sentiment Analysis." (2012). http://tomx.inf.elte.hu/twiki/pub/Tudas_Labor/2012Summer/GoelMittalStockMarketPredictionUsingTwitterSentimentAnalysis.pdf
5. Preis, Tobias, Helen Susannah Moat, and H. Eugene Stanley. "Quantifying trading behavior in financial markets using Google Trends." (2013). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3635219/