E6895 Advanced Big Data Analytics Task Milestone 1:
Market Intelligence Analysis
<Jia Ji, Tianrui Peng>
February 2nd, 2017
E6895 Advanced Big Data Analytics
© CY Lin, 2017 Columbia University
Task Introduction
• Question: If you have never invested in stocks before, how do you decide
which stocks to invest in?
• Motivation:
• Designed for small investors
• Provide information about stock investment
• Provide data for future stock prediction research or projects
• Goal:
By extracting sentiment from social media data, we aim to provide
statistical information that helps users decide which stocks to invest
in. Users can check our application for daily updates on sentiment
information related to stock prediction.
We also build a StockTwits database that can be used for projects related
to stock prediction, such as building machine learning algorithms.
State of the Art
1. Based on news
Evaluating a company’s performance to help assess the validity of a stock
2. Based on internet data sources
• Google Trends [5]
Preis, Moat, and Stanley, writing in Scientific Reports, report that
increases in search volume for financially related terms tend to precede large
losses in the stock market.
• Sentiment analysis of Twitter posts [1]
Sentiment can affect sales and financial gains.
This paper focuses on one-day-ahead stock prediction
based on sentiment analysis of Twitter.
3. Based on economic models (futures trading model, hedging model)
Dataset
StockTwits data:
• Posts are short messages, similar to tweets, about stocks or exchange records
• They carry useful information for stock prediction and general trends
• We will design and build the database for this task.
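As a rough illustration of what one record in such a database might look like, here is a minimal sketch. The slides do not specify a schema, so the class `StockMessage`, the helper `to_document`, and every field name are assumptions made for this example. The resulting dict is the kind of document a store such as MongoDB could accept.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StockMessage:
    """One StockTwits-style post; every field name here is hypothetical."""
    message_id: int
    symbol: str                      # ticker the post is about, e.g. "AAPL"
    user: str
    body: str
    created_at: datetime
    sentiment: Optional[str] = None  # "Bullish", "Bearish", or None if untagged
    likes: int = 0
    followers: int = 0               # author's follower count at collection time


def to_document(msg: StockMessage) -> dict:
    """Convert a message to a plain dict, the shape a document store expects."""
    doc = asdict(msg)
    doc["symbol"] = doc["symbol"].upper()   # normalize tickers to upper case
    doc["_id"] = doc.pop("message_id")      # reuse the message id as primary key
    return doc


example = StockMessage(
    message_id=1,
    symbol="aapl",
    user="alice",
    body="Earnings look strong",
    created_at=datetime(2017, 2, 2, tzinfo=timezone.utc),
    sentiment="Bullish",
    likes=3,
)
document = to_document(example)
```

Using the message id as the primary key is one possible design choice: it would make repeated collection runs idempotent, since the same post cannot be stored twice.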
Technologies
• MongoDB
• Python
• NLTK
• Flask
• Linode
[Figure: system architecture diagram. Components: StockTwits; Collect Data (Python Script); MongoDB; Data Analysis (Statistical Methods); Public API for query data (Python); Data Query Interface (Python); Result Visualization (Web); Machine Learning (Stock Prediction). Regions labeled PROJECT and TASK.]
Analysis
• Analysis results:
• Top N most popular stocks
• Top N most bullish stocks
• Top N most bearish stocks
• We also provide a database that future research
can use to analyze StockTwits data:
• Potential uses:
• Compare user predictions with
real stock prices (accuracy of each user)
• Followers
• Number of likes
People can use this information to build machine learning
applications that predict the stock market.
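The three "Top N" results listed above could be computed roughly as follows. This is only a sketch under stated assumptions: the sample data, the function name `top_n`, and the idea that each message carries an optional "Bullish" or "Bearish" label are illustrative, and the slides do not say how popularity is actually measured. Here popularity is simply the message count.

```python
from collections import Counter

# Hypothetical sample: (symbol, user-declared sentiment or None) per message.
messages = [
    ("AAPL", "Bullish"), ("AAPL", "Bullish"), ("AAPL", "Bearish"),
    ("TSLA", "Bearish"), ("TSLA", "Bearish"), ("TSLA", None),
    ("MSFT", "Bullish"), ("AAPL", None),
]


def top_n(messages, n, sentiment=None):
    """Rank symbols by message count; if sentiment is given, count only that label."""
    counts = Counter(
        symbol for symbol, label in messages
        if sentiment is None or label == sentiment
    )
    # Sort by count descending, then by symbol so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


most_popular = top_n(messages, 2)
most_bullish = top_n(messages, 2, sentiment="Bullish")
most_bearish = top_n(messages, 2, sentiment="Bearish")
```

A weighted variant (for example, counting likes or follower numbers) would fit the same shape by replacing the `Counter` increments with weights.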
Task Milestones Plan
1. Design database structure and collect information
• Text analysis
• Query data through StockTwits API
2. Data analysis based on the information in the
database
• Calculate popularity for each stock based on various information
• Sentiment analysis based on our data
• Calculate sentiment score for each stock
3. Create standard API
• Provide daily updates to users via our API
• Provide StockTwits database with analytical results for future
research
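To make the "sentiment score for each stock" step concrete, here is a minimal sketch of one way it could work. It is a stand-in, not the authors' method: the slides name NLTK but do not describe the scoring approach, so the tiny word lists, the functions `message_score` and `stock_scores`, and the averaging rule are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative lexicons; a real system would use a much larger word list.
POSITIVE = {"buy", "bullish", "strong", "up", "gain", "beat"}
NEGATIVE = {"sell", "bearish", "weak", "down", "loss", "miss"}


def message_score(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def stock_scores(messages):
    """Average the message scores for each symbol."""
    per_symbol = defaultdict(list)
    for symbol, text in messages:
        per_symbol[symbol].append(message_score(text))
    return {symbol: sum(s) / len(s) for symbol, s in per_symbol.items()}


sample = [
    ("AAPL", "Strong quarter, time to buy"),
    ("AAPL", "Guidance looks weak"),
    ("TSLA", "Sell now, this is going down"),
]
scores = stock_scores(sample)
```

A score near 1 would indicate mostly positive wording, near -1 mostly negative, and 0 either balanced or no matched words. Messages with no lexicon hits count as neutral in the average, which is a simplification.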
References
1. Si, Jianfeng, Arjun Mukherjee, Bing Liu, Qing Li, Huayi Li, and Xiaotie Deng. "Exploiting
Topic based Twitter Sentiment for Stock Prediction." ACL (2) 2013 (2013): 24-29.
http://www.aclweb.org/old_anthology/P/P13/P13-2.pdf#page=72
2. Schumaker, Robert P., and Hsinchun Chen. "Textual analysis of stock market prediction
using breaking financial news: The AZFin text system." ACM Transactions on Information
Systems (TOIS) 27, no. 2 (2009): 12. http://dl.acm.org/citation.cfm?id=1462204
3. Yoo, Paul D., Maria H. Kim, and Tony Jan. "Machine learning techniques and use of event
information for stock market prediction: A survey and evaluation." In Computational
Intelligence for Modelling, Control and Automation, 2005 and International Conference on
Intelligent Agents, Web Technologies and Internet Commerce, International Conference
on, vol. 2, pp. 835-841. IEEE, 2005.
http://ieeexplore.ieee.org/abstract/document/1631572/
4. "Stock Market Prediction Using Twitter Sentiment Analysis." (2012).
http://tomx.inf.elte.hu/twiki/pub/Tudas_Labor/2012Summer/GoelMittalStockMarketPredictionUsingTwitterSentimentAnalysis.pdf
5. Preis, Tobias, Helen Susannah Moat, and H. Eugene Stanley. "Quantifying trading
behavior in financial markets using Google Trends." (2013).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3635219/