How to Be an Intelligent TA Expert

The Case for a Data Mining Approach to Technical Analysis
If I’m so smart, how come I’m not rich yet??
The Case for Data Mining
[Figure: “You Before Finance 9790” and “You After Finance 9790”]
1. TA Is a Multivariate Recurrent Prediction Problem
2. The Five Tasks of a Recurrent Prediction Problem
1) Define Target (Y), 2) Propose List of Candidate Predictors (X’s)
3) Build Data Base of Solved Examples
4) Select X’s, 5) Determine the Prediction Function
3. Humans & Computers
Complementary Information Processing Abilities
Humans Uniquely Able to Handle Tasks 1, 2 & 3
But Poor at Tasks 4 & 5
Data Mining Algorithms Optimal for Tasks 4 & 5
4. TA Practitioners Should Partner Up With
Data Mining Algorithms
5. TA Practitioners Should Abandon Outdated Methods
& Focus On
Their Proper Role in a Human / Machine Partnership
[Figure: Data Bases, Data Mining Practitioner, and Data Mining Software]
There Are Two Kinds of Prediction
Problems
1. Regression: predicting the FUTURE
value of a continuous variable
2. Classification: predicting the class of
an object (situation)
In Both Regression &
Classification
The target variable concerns
something that is not yet known!!
In Both Regression &
Classification
We use information that is
known
To make the prediction
Two Kinds of Prediction Problems
1. Regression: we wish to predict the
FUTURE value of a continuous variable
• This variable is referred to as: the dependent variable, the target variable, Y
• The target variable in a regression problem is a continuous variable:
– can assume any value within a range
– Example: the % change in the S&P500 from now (t0) to a point in time 90 days into the future (t+90)
Two Kinds of Prediction Problems
2. Classification: we wish to predict the
class of an object whose class is not yet
known
• The target variable in a classification problem is a discrete variable
– Assumes a limited number of discrete values or names: (0, 1), (+1, 0, -1), (benign / malignant)
– Example 1: the future class of a company with respect to solvency (bankrupt / non-bankrupt)
– Example 2: the future trend of the market over the next 90 days (up / down)
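The two kinds of target can be sketched in a few lines. This is a minimal illustration, assuming daily closing prices held in a plain list; the function names and the price values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def forward_pct_change(prices, t0, horizon):
    """Regression target: % change from prices[t0] to prices[t0 + horizon]."""
    return 100.0 * (prices[t0 + horizon] - prices[t0]) / prices[t0]


def up_down_class(prices, t0, horizon):
    """Classification target: 'up' if the forward change is positive, else 'down'."""
    return "up" if forward_pct_change(prices, t0, horizon) > 0 else "down"


# Hypothetical closing prices, one per day (illustrative numbers only).
closes = [100.0, 102.0, 101.0, 105.0, 110.0, 99.0]

y_regression = forward_pct_change(closes, t0=0, horizon=4)  # continuous Y
y_class = up_down_class(closes, t0=1, horizon=4)            # discrete Y
```

The same price series yields either kind of target; the choice depends on how the question is posed, not on the data.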
What Is A Recurrent Multivariate
Prediction Problem?
1. The same type of prediction is required
over and over again.
2. The same set of information is available
each time a prediction is required
• The information is a set of values for each of a multitude of variables
• These variables are referred to by names such as “independent variables,” “predictors,” “candidate predictors,” “indicators,” etc.
Examples of Recurrent Decision Problems: Classification
Does the Object Belong to Class 1 or Class 2?
1. The same type of prediction is required over
and over again.
– Medicine: Is a given tumor malignant or benign
– Oil Exploration: At a given location: Is there Oil or
No Oil (Drill / Don’t Drill)
– Marketing: Is a given consumer a likely buyer or
non-buyer for our product or service
– Credit Approval: Is a given loan applicant likely to
Repay or Default ( Lend / Don’t Lend)
– Technical Analysis: Is the market more likely to
advance or decline ( Buy / Sell)
Examples of Recurrent Decision Problems: Regression
The Future Value of a Continuous Y Variable
1. The same type of prediction is required over
and over again.
– Medicine: survival time for someone with disease X
– Oil Exploration: amount of oil a new well is likely to
produce
– Marketing: What are the likely sales of a product
– Technical Analysis:
• How much will the S&P500 appreciate over the next month
• By how much will stock A beat the market over the next month
Recurrent Decision Problem
2. The same set of information is available
each time a decision is required
• Information is a set of values for a multitude
of variables
Multivariate Information Set
measured values for a multitude of variables

• Medicine: set of results on medical tests
– Blood pressure, cholesterol level, blood sugar, etc.
• Oil Exploration: set of values for various geological parameters
• Marketing: set of demographic factors describing the person
– zip code, owns car yes/no, etc.
• Credit Approval: set of credit factors describing the loan applicant
– # years at current address, number of credit cards, payment history
Technical Analysis Information Set
multitude of Indicator Readings at a given point in time
1. close / moving average = $ 1.075
2. 10 day ma / 50 day ma = 1.067
3. RSI Indicator = 74
4. 5 day ma volume / 25 day ma volume
5. VIX (Implied Volatility on Stock Options)
6. Ratio of Insider Sales / Purchases
7. Ratio of Upside / Downside Volume
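A few of the ratio-style readings in this list can be computed as follows. This is only a sketch: the helper names are hypothetical, and the windows are shortened to 2 and 4 days (instead of 10/50 or 5/25) so that the toy data fits on a few lines.

```python
def moving_average(values, t0, window):
    """Mean of the `window` most recent values ending at index t0 (inclusive)."""
    if t0 - window + 1 < 0:
        raise ValueError("not enough history for this window")
    recent = values[t0 - window + 1 : t0 + 1]
    return sum(recent) / window


def indicator_readings(closes, volumes, t0):
    """Ratio-style indicator readings as of t0 (short windows, toy data)."""
    return {
        "close_over_ma": closes[t0] / moving_average(closes, t0, 4),
        "short_ma_over_long_ma": (
            moving_average(closes, t0, 2) / moving_average(closes, t0, 4)
        ),
        "short_vol_over_long_vol": (
            moving_average(volumes, t0, 2) / moving_average(volumes, t0, 4)
        ),
    }


# Hypothetical closes and volumes (illustrative numbers only).
closes = [10.0, 10.0, 12.0, 12.0]
volumes = [100.0, 100.0, 300.0, 300.0]
readings = indicator_readings(closes, volumes, t0=3)
```

Each reading uses only data up to and including t0, so the whole set describes a single point in time.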
[Figure: chart with three indicators. One point in time is characterized by the indicator values 75.5, -2.1, -.55; another by 62.1, +0.1, -.02.]
In Other Words: There Are 3 Candidate Predictor Variables.
We can treat this as a Classification Problem
Class 1: Market Return over the next 20 days is > 0
Class 2: Market Return over the next 20 days is < 0
The Target Variable: The Thing We Wish To Predict
Is a Discrete Variable that Can Assume 2 Values:
> 0 or < 0 (we can call these Class 1 and Class 2)
[Figure: the point in time t0 is characterized by its indicator values, e.g. 75.5, -2.1, -.55 or 62.1, +0.1, -.02.]
Do These predictors (indicators )
Enable Us to classify (discriminate)
Future Up-Moves from Future Down Moves?
Class 1 from Class 2
[Figure: indicator values at t0 (75.5, -2.1, -.55 and 62.1, +0.1, -.02), with the interval from t0 to t+20 marked.]
Getting Matters of Time Straight
t0 and t+20
• t0 refers to the date on which the
prediction or classification is made
– This is the date of the most recent values of the
predictor variables
• t+20 or t+n refers to a time in the future
that the target variable (Y) refers to
– In the bankruptcy prediction problem it is any
time over the following two years.
– So the future looking horizon of the target
need not be a fixed date.
Value of Y is based on Future Information
Values of X’s based on past and current information
[Figure: timeline with t0 at the center. Values of the predictors (X) are based on what happens in the past, from t-n until t0. The value of the target (Y) is based on what happens in the future, from t0 until t+n.]
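The time alignment can be made concrete with a small sketch. It assumes a list of closing prices and uses a hypothetical helper, `make_case`, whose indicator (close / moving average) and horizon are illustrative choices rather than anything prescribed by the slides.

```python
def make_case(prices, t0, lookback, horizon):
    """Build one (X, Y) pair without look-ahead.

    X uses only prices from t0 - lookback through t0 (past and present).
    Y uses prices from t0 through t0 + horizon (the future relative to t0).
    """
    if t0 - lookback < 0:
        raise ValueError("not enough history before t0")
    if t0 + horizon >= len(prices):
        raise ValueError("Y is not yet known: the horizon runs past the data")
    past = prices[t0 - lookback : t0 + 1]
    x = prices[t0] / (sum(past) / len(past))     # close / moving average
    y = prices[t0 + horizon] / prices[t0] - 1.0  # forward return
    return x, y


# Hypothetical prices (illustrative numbers only).
prices = [50.0, 50.0, 50.0, 60.0, 66.0]
x, y = make_case(prices, t0=3, lookback=3, horizon=1)
```

Calling `make_case(prices, t0=4, lookback=3, horizon=1)` raises an error, because on the last day of data the future value of Y has not happened yet.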
Task 1: Define The Target Variable
(Y) The Single Variable We Wish to Predict
1. Define the type of the problem: Classification or Regression
A. Classification (Discrimination): Y is defined as one of 2 or more distinct classes
• Benign / malignant
• Lend / Don’t Lend
• Buy / Sell
• Strong Buy / Weak Buy / Weak Sell / Strong Sell
B. Regression: Y is a continuous quantity (e.g. linear regression)
• Future % increase in the market
• Predicted amount of future purchases
Task 2: Propose Candidate
Predictors (X’s)

• These are merely candidates because we don’t know yet if any will be useful for predicting the target Y
• Predictors must be based on data known at the time the prediction is made:
– look back in time from present
– Tomorrow’s closing price: No
– Today’s closing price or prior closing prices: Yes
• Not all indicators need to be useful, but some must be.
– Success in predictive modeling requires that some candidate predictors have useful information about the quantity or class to be predicted (Y)
Task 2 is crucial!!!!!
If not done well…..all is lost
1. The TASK of the domain expert……(YOU)
2. Expert must know which raw data series may
contain relevant information
– Price
– Volume
– Open interest
– Interest rates, etc.
3. Expert proposes useful ways to transform raw series into indicators
– That expose the information in the raw data series to the data mining algorithm
– For example, in our problem the X’s must be stationary.
Skipping Task 3 For A moment
Building the Data Base
Of Solved Examples
From Which DM Algorithm
Learns the Model
Tasks 4 & 5
4. Selecting Indicators from the Candidate List
that warrant a place in the prediction model
– Determining which candidates contain relevant, non-redundant information about (Y)
– The set of indicators that work synergistically
5. Determining the prediction function
– What is the mathematical or logical formula for combining the values of the X’s to best estimate the value of Y?
– A complex configural reasoning problem
What Is A Prediction Function
• A mathematical or logical formula for combining
the selected indicators to produce a best
estimate of the target variable.
• Simplest:
– 1 predictor model
– linear shape: y = ax1 + b
a is the slope of the line
b is the value of the Y intercept of the line
[Figure: straight line plotted with X1 on the horizontal axis and Y on the vertical axis]
Simplest Prediction Model
1 predictor & flat (no hills or valleys) in model’s surface
[Figure: line with Y intercept = b; for a given value of X1, the model predicts the corresponding value of Y]
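As a minimal sketch of the one-predictor model y = ax1 + b, the slope and intercept can be found by ordinary least squares. The function name `fit_line` and the data are hypothetical; the points were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope of the line
    b = mean_y - a * mean_x  # Y intercept of the line
    return a, b


# Illustrative data lying exactly on y = 2*x1 + 1.
x1 = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x1, y)
prediction = a * 5.0 + b  # the model's estimate of Y when X1 = 5
```

With real market data the points would scatter around the line, and the fitted a and b would be the values that minimize the squared prediction errors.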
Multiple Linear Regression
Combines Two or More X’s in a linear way to predict the
value of Y
• In multiple linear regression the combining
function is assumed to be linear (weighted sum)
• Y = a1X1 + a2X2 + a3X3 + ... + anXn + c
Regression coefficients (weights) are found
By the method of Least-Squares
Modern Data-Miners Need Not Assume A Linear Form
They Allow the data mining algorithm to discover it.
It May Be
Non-Linear & Arbitrarily Complex
Linear Model : Flat Response (Y) Surface
Y Is Linear Function of Two Features X1, X2
Y = A X1 + B X2 + C
[Figure: plane over the X1 and X2 axes, with “A” slope along X1, “B” slope along X2, and “C” intercept on the Y axis]
Linear Model Is Best Fitting
Tilted Flat Surface to the Data
Y = A X1 + B X2 + C
[Figure: plane over the X1 and X2 axes, with “A” slope along X1, “B” slope along X2, and “C” intercept on the Y axis]
The Model’s Prediction is The Altitude of the Y Surface
Corresponding to values of X1 and X2
[Figure: for given values of X1 and X2, the model predicts the value of Y at the corresponding height of the surface]
Thinking of A Prediction
Model’s Output As
A Super Indicator
A new indicator that condenses &
combines the information
In two or more indicators (variables)
Into a new or super indicator
Model Output As a
“Super Indicator”
• The output of a prediction model is a new
variable, produced by a function found by
regression analysis
• The function is a weighted sum of the indicators
serving as inputs to the model (X1, X2, etc.)
• The function’s weights have been optimized to
transform values of inputs into a best estimate of
the target (Y).
– method of least-squares is used to find optimal
weights
– Weights cause the line or plane to fit the historical
data
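The weighted-sum idea can be sketched with a small least-squares fit written in plain Python. This is an illustration under stated assumptions: the weights are found by solving the normal equations with Gaussian elimination, the helper names (`solve`, `fit_weights`, `super_indicator`) are hypothetical, and the data were generated from Y = 2·X1 − 3·X2 + 0.5 so the recovered weights can be checked.

```python
def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_weights(rows, ys):
    """Least-squares weights for Y = a1*X1 + ... + an*Xn + c (c is last)."""
    design = [row + [1.0] for row in rows]  # constant column for intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def super_indicator(weights, row):
    """Weighted sum of the indicator values plus the intercept."""
    return sum(w * x for w, x in zip(weights, row)) + weights[-1]


# Illustrative solved examples generated from Y = 2*X1 - 3*X2 + 0.5.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
ys = [2.5, -2.5, -0.5, 1.5, 0.5]
weights = fit_weights(rows, ys)
new_reading = super_indicator(weights, [3.0, 1.0])
```

The output of `super_indicator` is itself a single number per point in time, which is why the model's output can be read as a new indicator condensed from the inputs.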
Multiple Linear Regression
Combines Two or More X’s in a linear way to predict the
value of Y
• In multiple linear regression the combining
function is assumed to be linear (additive)
• Y = a1X1 + a2X2 + a3X3 + ... + anXn + c
But What If the True Shape of the Relationship
Between the Indicators (X1 ... Xn) and Y Is Not a Tilted
Flat Surface, but Something More Complex?
Multiple Linear Regression
Combines Two or More X’s in a linear way to predict the
value of Y
• In multiple linear regression the combining
function is assumed to be linear (additive)
• Y = a1X1 + a2X2 + a3X3 + ... + anXn + c
Modern Data-Miners Do Not Assume the Model
Surface Is Linear (free of hills and valleys)
They Allow the data mining algorithm to discover its
Shape, Which May Be Non-Linear
Suppose the authentic relationship
Between X1 & X2 and Y Looks Like This
[Figure: non-linear response surface Y = f(X1, X2)]
Forcing a Linear Model to Describe a Non-Linear Phenomenon Misses the Boat!
The Model Fails to Capture the Authentic Patterns in the Data
[Figure: Y (future trend) plotted against TA indicators X1 and X2; the linear model’s predictions are too low in some regions and too high in others]
Financial Markets Are Most Likely to Be
Complex Non-Linear Systems
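A toy case shows how a linear model can miss a non-linear pattern entirely. The sketch below assumes a target that is the product of two indicators, Y = X1 · X2, evaluated on a small symmetric grid; this relationship is an invented example, not a claim about any market.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs (covariance divided by variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


grid = [-1.0, 0.0, 1.0]
x1 = [a for a in grid for b in grid]
x2 = [b for a in grid for b in grid]
y = [a * b for a, b in zip(x1, x2)]  # non-linear target: Y = X1 * X2

# X1 and X2 are uncorrelated on this grid, so the slopes of the best-fitting
# plane equal the single-predictor slopes computed here.
slope_x1 = slope(x1, y)
slope_x2 = slope(x2, y)

# Offering the product X1 * X2 as a predictor captures the pattern exactly.
interaction = [a * b for a, b in zip(x1, x2)]
slope_interaction = slope(interaction, y)
```

Both linear slopes come out as zero, so the best tilted flat surface is simply Y = 0: too low wherever the product is positive and too high wherever it is negative, even though Y is completely determined by X1 and X2 taken together.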
Tasks 4 & 5
Must Be Performed by Data Mining Software
Candidate Predictors: A Set of Indicators (X1, X2, X3, X4, X5, ... Xn) Proposed By Human Expert
Task 4
Which, if any, of the candidate predictors
contain information relevant to Y?
Task 5
What is the shape of the mathematical function
that best combines the indicators
into a Predicted Value of Y?
[Diagram: the candidate predictors feed a combining function Y = f(x) of unknown form, standing in for the complex system whose outcome Y we wish to predict]
Tasks 4 & 5
Must Be Performed by Data Mining Software
Note!!
When the DM Method Used Is Multiple Linear Regression,
the Prediction Function Is Assumed to Be Linear
Human Experts
&
Data Mining Algorithms
Have Different But Complementary
Information Processing Abilities
They Synergize
Where Humans Are Strong, DM Algorithms Are Weak
Where Human Experts Are Weak, DM Algorithms Are Strong
Definition: Configural Thinking
a multitude of variables (indicators) must be
considered simultaneously as an
inseparable configuration (pattern).
Considering each variable individually
will not provide the correct conclusion.
Human Intelligence
Strengths
• Creative
– Posing Problems (Y)
– Proposing candidate indicators (X’s)
Weaknesses
• Weak Configural Reasoning
– Distinguishing relevant from irrelevant X’s
– Combining multiple variables
Machine Intelligence (Data Mining)
Weaknesses
• Lack Creativity
– Unable to pose questions (define Y)
– Unable to propose candidate indicators (define X’s)
Strengths
• Excellent ability to handle numerous variables simultaneously (configural reasoning)
– Can identify relevant non-redundant indicators.
– Can formulate multivariate prediction functions.
Who or What Should Handle
the 5 Tasks?
1. Define Y
2. Propose Candidate Indicators X’s
3. Build Data Base of Solved Cases
4. Indicator Selection: which Candidate X’s
Are relevant and non-redundant
5. Determining optimal combining function:
a mathematical model that combines
useful X’s into a prediction or
classification decision
A Task for Automated
Data Mining Algorithms
The Evidence
Studies of Human Experts Solving
Multivariate Recurrent Prediction Problems
Show:
1. Experts realize the necessity for configural reasoning (combining variables in a complex non-linear fashion)
2. Experts are under the impression that they are combining information in a complex configural manner, but studies show…
3. Experts rely primarily on simple linear rules for combining information
4. Their performance is poor
– Inconsistent: the same set of information elicits different decisions on different occasions (correlation .6)
– Correlation among experts is also low
Technical Analyst Faced With Large
Set Of Conflicting Indicators
Let each bullish factor = +1 & each bearish factor = -1
5 bullish factors: Sum bullish factors = +5
3 bearish factors: Sum bearish factors = -3
Human Experts (Technical Analysts) Rely
on Intuitive Linear Combining
+5 – 3 = +2
I’m bullish
Sum bullish factors = +5 &
Sum bearish factors = -3
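The intuitive tally amounts to a linear model in which every factor has the same unit weight. A minimal sketch, with hypothetical function names and the slide's 5-bullish / 3-bearish example as input:

```python
def net_score(signals):
    """Unit-weighted linear combination: +1 per bullish, -1 per bearish."""
    return sum(+1 if s == "bullish" else -1 for s in signals)


def verdict(score):
    """Map the net score to a stance; zero is treated as neutral (assumption)."""
    if score > 0:
        return "bullish"
    if score < 0:
        return "bearish"
    return "neutral"


signals = ["bullish"] * 5 + ["bearish"] * 3
score = net_score(signals)  # +5 - 3
stance = verdict(score)
```

Nothing in this tally asks whether a factor is relevant, redundant with another factor, or meaningful only in combination, which is exactly the work left for Tasks 4 and 5.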
Comparing the Subjective
Predictions of Experts
With
Multiple Linear Regression
Models
Studies Began in 1954
The Question
How accurate are the predictions of humans
compared to multiple linear regression
models given the same set of indicators ?
Expert’s Subjective Predictions vs.
Multiple Regression Models
[Figure: bar chart of r2 (predicted vs. actual) for expert and model in numbered studies 1 to 9; the vertical axis runs from -0.1 to 0.9. Model mean r2 = 0.38; expert mean r2 = 0.11. Domains labeled: Sales Effective., Academic, Stocks, Cancer survival, Student Att., Mental ill., Teach. effective, Business Failure]
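The r2 measure used in such comparisons can be computed as the squared correlation between predicted and actual values. The sketch below uses made-up numbers purely to show the calculation; they are not taken from any of the studies.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    n = len(actual)
    mp = sum(predicted) / n
    ma = sum(actual) / n
    spa = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    spp = sum((p - mp) ** 2 for p in predicted)
    saa = sum((a - ma) ** 2 for a in actual)
    return (spa * spa) / (spp * saa)


# Invented outcomes and predictions (illustrative numbers only).
actual = [2.0, 1.0, 4.0, 3.0]
expert_predictions = [1.0, 2.0, 3.0, 4.0]
model_predictions = [2.0, 2.0, 4.0, 4.0]

r2_expert = r_squared(expert_predictions, actual)
r2_model = r_squared(model_predictions, actual)
```

Scoring both sets of predictions against the same actual outcomes, as here, is what makes the expert and the model directly comparable.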
Meta-Analysis
of 135 Similar Studies
Draws A Conclusion
From Multiple Independent Studies
[Diagram: Study 1, Study 2, Study 3, ... Study n]
Swets, Monahan & Dawes (2000)
• meta-analysis of >135 studies comparing 3 decision making methods.
1. Expert / intuitive (subjective) judgment
based on anecdotal experience &
informal reasoning.
2. Statistical models.
3. Combination of methods #1 & #2.
Wide Variety of Disciplines Were
Examined in the 135 Studies.
• Fields
– Medical diagnosis
– Penology (parole recidivism, violence)
– Psychology (diagnosis and treatment selection)
– Education ( predicting success in academics)
– Predicting football game outcomes.
• Results were quite consistent across fields
Results of Meta Analysis
135 Studies
• In 96% of the studies, regression models
beat or were equal to expert judgment.
• In medical diagnosis expert judgment was
always worse than regression model.
• Experts beat statistical models in only 6
studies.
The Question:
With All This Evidence, Why Do Experts Insist on Making
Subjective / Intuitive Predictions & Decisions?
Bottom Line For
Technical Analysis
Aronson’s Editorial Opinion
When Making Predictions
Rely On Objective Statistical
Models Not
Subjective Judgment
Task #3
Build Data Base
Of Solved Examples
The Data (Experience) Base Is Used By the
Data Mining Algorithms to Learn How to
Build The Prediction Model
This task often takes 90-95% of the time when developing
A Data Mined Model
Data Base of Solved Examples Known
Values of “Y”
• What is a “solved example”? A case (situation, example, etc.) for which the value of the target variable is known, as well as the values of the X’s (candidate predictors)
– Value of Y is known because the case happened in the past
– Even though Y is forward looking, the case occurred long enough ago that the value of Y is known.
• Each case in the data base is described by 2 kinds of information
1. The value of the target variable Y.
2. The values of the candidate predictors
Examples of A Solved Case
A. 1 day of market history for the S&P500
1. Y value: % change over the month following the date of
the case (regression)
2. X values: values of the indicators on the date of the
case
B. An oil drilling site
1. Y value: did the site produce oil or not (class)
2. X values: values of 10 geophysical parameters
characterizing the site
C. 1 company
1. Y value: company failed or did not fail within next 2
years
2. X values: values of various financial ratios taken from
the most recent balance sheet and income statement
Data Base of Solved Examples
• Contains many cases: (typically thousands)
– Why so many? - data density.
• From the many cases the DM algorithm tries to
discover
– Which, if any, of the candidate predictors can solve
the regression or classification problem
• Task #4
– How the selected predictors should be combined
mathematically or logically to give the most accurate
estimate possible of the value of the target (Y)
• Task #5
[Figure: matrix of examples with known values of both X’s and Y. Rows are the solved examples (Examp. 1, Examp. 2, Examp. 3, Examp. 4, ... Case N); columns are the candidate indicators X1, X2, ... X11, ... XN, followed by the target Y (0 or 1).]
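One very simple way to begin Task 4 on such a matrix is to rank each candidate column by the strength of its correlation with Y. This is only a univariate screen, offered as an assumption-laden sketch: it cannot detect indicators that are useful only in combination, and it is not presented as the method any particular data mining package uses. All names and numbers are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(columns, y):
    """Candidate names sorted by |correlation with Y|, strongest first."""
    return sorted(
        columns,
        key=lambda name: abs(correlation(columns[name], y)),
        reverse=True,
    )


# Toy data base of 6 solved examples: 3 candidate indicators and a 0/1 target.
columns = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "X2": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
    "X3": [2.0, 1.0, 3.0, 3.0, 5.0, 4.0],
}
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ranked = rank_candidates(columns, y)
```

In this toy data X2 carries no linear information about Y and falls to the bottom of the ranking, while X3 ranks second partly because it overlaps with X1; deciding whether X3 is redundant is the harder half of Task 4.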
Human Intelligence: Unchanging
Computer Power & Machine Intelligence
Growing Exponentially
[Figure: power (arithmetic scale) plotted against time]
Moore’s Law:
An Increasing Competitive Advantage to the Data Miners
The A,B,C’s of Being An
Intelligent Technical Analyst
A. Know How to Use Data Mining Tools
B. Know How to Define Data Mining Problems (Define Y)
C. Know How to Define a List of Information-Rich Candidate Predictors (X’s)