What Every Math Professor
Needs to Know About Statistics
Rick Cleary
Babson College
For AMATYC Denver, 11/19/16
Thanks!
• To AMATYC and President Jane
Tanner for the invitation!
• To all of the excellent teachers who
helped me think about these topics.
• To all of you for the really important
work you do. I am glad I got to
experience it!
Three friends of mine…
• Prof. Ken Mann… Ken's Website
-Mohawk Valley Community College
• Prof. Jim Hobert … Jim's Website
-SUNY Ulster
• Prof. Marty Wells … Marty's website
-Shasta College
A bit about Babson…
• Located in Wellesley, Massachusetts
• 2200 undergraduates, all business
majors with different concentrations.
• All students take at least two semesters
of a quant methods sequence with
statistics, a little calculus, math of
finance and operations research.
• Quite a few two-year college transfers!
A fun puzzle!
• Find the next number in each sequence
A.) 91, 72, 45, 9, 81 …
B.) 13, 19, 60, 51, 9 …
(This is the one previewed for the
conference.)
Sequence A
• The next number is …
63
-Sequence was 91, 72, 45, 9, 81 …
Consider each as a two-digit number (so
9 is 09). Reverse the digits, then take the
absolute value of the difference between
the number and its reversal.
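The rule for Sequence A can be checked with a few lines of Python. This is only a sketch: the function name `reverse_difference` is made up for illustration and is not from the slides.

```python
def reverse_difference(n: int) -> int:
    """Treat n as a two-digit number (9 is read as 09), reverse its
    digits, and return the absolute difference with the original."""
    tens, ones = divmod(n, 10)
    reversed_n = 10 * ones + tens
    return abs(n - reversed_n)


# Start from 91 and apply the rule five times.
sequence = [91]
for _ in range(5):
    sequence.append(reverse_difference(sequence[-1]))

print(sequence)  # [91, 72, 45, 9, 81, 63]
```

The first five terms match the slide, and the sixth is the announced answer, 63.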
Sequence B
• 13, 19, 60, 51, 9, …
• The next number is …
Last call!
B.) 13, 19, 60, 51, 9, …
But before we answer, let’s list some
numbers that we’re pretty sure are NOT
next!
Last call!
B.) 13, 19, 60, 51, 9, …
How about π ? Probably not?
1,346,700?
-215?
What’s NOT next?
• Thinking about what the answer is NOT
gives us quite a bit of information ...
• OK … on to the solution:
Solution
• I don’t know!
…B is a list of five numbers randomly
generated using Minitab on a uniform
distribution on {1, 2, …, 100}.
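A list like Sequence B is easy to generate yourself. The slide used Minitab. The sketch below substitutes Python's standard `random` module, and the helper name `draw_uniform_integers` is hypothetical.

```python
import random


def draw_uniform_integers(count, low=1, high=100, seed=None):
    """Draw `count` integers, each uniform on {low, ..., high}."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    return [rng.randint(low, high) for _ in range(count)]


draws = draw_uniform_integers(5)
print(draws)  # five numbers with no rule connecting them
```

Each run gives a different "sequence", and none of them has a next term that can be deduced.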
LESSON NUMBER 1
• Some things are inherently random.
Somebody might correctly guess the next
number in the sequence, but they would
have to be lucky.
• Or to put it another way:
The world is more random than
most people think!
Other examples:
• Stars in the night sky… pretty random
but people attached stories to them!
• Stock market results … analysts on the
news offer reasons every day for an
inherently random process!
Another fun experiment
• Step 1: Draw a square
• Step 2: Place five points (x’s) at
random in the square
• Step 3: Divide your original square into
nine equal sub-squares, i.e. draw a
tic-tac-toe board on it.
A guess…
• Your five points are in five different
sub-squares, aren’t they? (And if yours
aren’t, your students will be!)
• THAT’S NOT RANDOM!!! That’s a
pattern. Real randomness is ‘clumpier’
than people expect.
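How clumpy is real randomness here? If the five points really are independent and uniform on the square, each lands in one of the nine sub-squares with equal probability. The chance that all five land in different sub-squares is then (9·8·7·6·5)/9^5, roughly 26%. The sketch below computes that value and checks it by simulation. The function names are illustrative only.

```python
import random
from math import perm


def exact_all_different(points=5, cells=9):
    """Exact chance that `points` uniform points occupy distinct cells."""
    return perm(cells, points) / cells ** points


def simulate_all_different(points=5, grid=3, trials=100_000, seed=0):
    """Monte Carlo estimate of the same chance on a grid x grid board."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        occupied = set()
        for _ in range(points):
            x, y = rng.random(), rng.random()  # uniform point in the unit square
            occupied.add((int(x * grid), int(y * grid)))  # its sub-square
        if len(occupied) == points:
            hits += 1
    return hits / trials


exact = exact_all_different()
estimate = simulate_all_different()
print(round(exact, 4), round(estimate, 4))
```

So about three times out of four, truly random points put at least two x’s in the same sub-square.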
A quick representation of Lesson #1
• MATH: y = f(x)
• STAT: y = f(x) + error
Lesson #2 - Models
• Quick quiz …
1.) True/False: Mathematics is a useful
tool for modeling real world problems.
2.) Name one of the problems.
Lesson #2 - Models
Let’s see what the crowd likes for math
models:
What about stat models?
They are everywhere … like in any
comparative sentence!
“This is a pretty good sandwich for a fast
food place!”
Lesson #2 in language…
“This is a pretty good sandwich for a fast
food place!”
Quality = f(venue) + error
“Pedroia has a lot of power for such a
small guy.”
Power = f(size) + error
Power = f(size) + error
• A mathematician modeling this relationship
might develop an equation or system that
has inputs like swing plane, bat speed,
pitch speed and location, and an output
of a trajectory…
• A statistician would be interested in the
strength of the association between the
inputs and the outputs. How much of the
variation in power can be explained by size?
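One common way to quantify "how much can be explained" is R², the squared correlation between input and output. The sketch below uses entirely made-up data: the height range, the linear form of f, and the noise level are assumptions for illustration, not baseball measurements.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation: the share of the variation in ys
    accounted for by a straight-line relationship with xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


rng = random.Random(1)
# Hypothetical player heights in inches (assumed range).
size = [rng.uniform(65, 78) for _ in range(200)]
# Power = f(size) + error, with an assumed f(size) = 2 * size and Gaussian error.
power = [2.0 * s + rng.gauss(0, 10) for s in size]

print(round(r_squared(size, power), 2))
```

Shrinking the error term pushes R² toward 1 (the MATH line, y = f(x)). Enlarging it pushes R² toward 0.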
To Recap:
• Lesson 1: The world is more random
than people think.
• Lesson 2: Statements that can be
interpreted as statistical models are
common.
Models to decisions:
• Statistical models are easy to find, but
studying them and applying them takes
a careful approach. This is where
statistical expertise comes in!
-What model?
-What population?
-What sample?
-What measures?
-Outliers?
-Presentation?
A timely aside…
• Statisticians’ reputations took a hit with
the recent election. But a sophisticated
look at the question paints a much less
negative picture.
• On election eve, www.fivethirtyeight.com
put the probability of Trump winning the
election at about 29%. Events with that
probability happen all the time!
A timely aside:
• By and large, the national election
results were well within the polls’
margins of error.
• Media outlets and their consumers want
clear-cut results like “Clinton will win”
when a statement like “Clinton appears
likely to win, but Trump has a
reasonable chance” tells the story!
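For reference, the textbook 95% margin of error for a proportion from a simple random sample is 1.96·sqrt(p(1−p)/n). The sketch below applies it to a hypothetical poll of 1,000 respondents. Real polling firms weight and adjust their samples, so treat this as the classroom version only.

```python
from math import sqrt


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion,
    assuming a simple random sample of size n."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 1,000 respondents, candidate near 50%.
moe = margin_of_error(0.5, 1000)
print(round(100 * moe, 1))  # 3.1 (percentage points)
```

A candidate polling two or three points ahead in such a poll is therefore "likely to win", not "certain to win".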
A timely aside:
• That said, the fact that most polling data
had a small but consistent bias is
something that statisticians within those
polling firms will need to address.
• Opinion: The most likely causes are
non-response (rather than false response)
and incorrect predictions of who is likely
to vote.
Probability and Statistics
• Many math departments have a course
with this title.
Inside joke: Why are most Prob and Stat
courses like BWI, the Baltimore-Washington
International Airport?
Answer: BWI is a lot closer to Baltimore.
Lesson 3…
• Statisticians teaching probability should
try to back up the theoretical probability
models with real data and statistical
thinking.
Speaking of sports…
Sports media and sports fans love streaks
…
Famous examples?
HOWEVER, sports media and sports fans
tend to underestimate the probability of
streaks by taking them out of context.
More on streaks…
• Google the phrase “Sports Probability
Streaks” and the resulting link is:
http://www.mathaware.org/mam/2010/essays/ClearyStreaks.pdf
Streak Example:
• In a 2007 game, the Boston Red Sox hit
four home runs in a row.
• The next day, the Boston Globe quoted
a local math professor as saying that the
chance of this was about one in two million.
• That figure comes from raising the MLB
home run rate (.027) to the fourth power.
Surprising streaks
• Seems reasonable, right?
• EXCEPT this was the fourth time it had
happened in about 140,000 MLB
games.
• So how was the expert opinion off by
two orders of magnitude???
Two reasons…
• Small but significant reason… the four
players (Manny Ramirez, Mike Lowell,
JD Drew, Jason Varitek) all had much
higher than average home run rates.
• Big reason: Context! Four home runs
in a row during the game vs. four home
runs in a row RIGHT NOW!
Context explained…
• About 80 hitters come up during a major
league baseball game so the Red Sox
and opponent (Yankees) had MANY
CHANCES to start such a streak!
• This is why streak probabilities are
underestimated; people forget there are
many chances to start a streak.
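The gap between "right now" and "sometime during the game" can be sketched numerically. The code below makes strong simplifying assumptions that are not from the slides: every hitter homers independently at the league rate of .027, the 80 hitters split evenly into 40 per team, and each team's hitters are treated as one unbroken sequence. The function name is illustrative.

```python
def prob_run_at_least(n, run, p):
    """Chance of at least one run of `run` consecutive successes
    in n independent trials, each succeeding with probability p."""
    # state[k] = probability of k trailing successes with no full run so far
    state = [0.0] * run
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * run
        for k, prob in enumerate(state):
            new[0] += prob * (1 - p)      # a failure resets the streak
            if k + 1 < run:
                new[k + 1] += prob * p    # a success extends the streak
            # otherwise the run is completed and leaves the "no run yet" states
        state = new
    return 1.0 - sum(state)


HR_RATE = 0.027                                # league rate quoted on the slide
right_now = HR_RATE ** 4                       # next four hitters all homer
per_team = prob_run_at_least(40, 4, HR_RATE)   # assumed 40 hitters per team
per_game = 1 - (1 - per_team) ** 2             # either team manages it
expected = 140_000 * per_game                  # expected count over 140,000 games

print(f"right now: 1 in {1 / right_now:,.0f}")
print(f"per game:  1 in {1 / per_game:,.0f}")
print(f"expected occurrences: {expected:.1f}")
```

Under these assumptions the "right now" figure reproduces the quoted one in about two million, while the per-game chance is dozens of times larger. The expected count over 140,000 games comes out as a handful, the same order of magnitude as the four occurrences mentioned above.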
Classroom fun…
1.) Have some students actually flip a
coin about 80 times, and tell others to just
simulate by writing H’s/T’s. The ones with
streaks of five or more H’s or T’s really did
the flips. (You can look like a genius! …
OR not!)
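Before trying this in class, you can let a computer do the flipping. The sketch below assumes a fair coin and independent flips, and it estimates how often a sequence of 80 real flips contains a streak of five or more identical outcomes. The function names and the trial count are arbitrary choices.

```python
import random


def longest_run(flips):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 1
    for previous, this in zip(flips, flips[1:]):
        current = current + 1 if this == previous else 1
        best = max(best, current)
    return best


def share_with_streak(n_flips=80, streak=5, trials=20_000, seed=0):
    """Estimated fraction of fair-coin sequences with a run >= streak."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        if longest_run(flips) >= streak:
            hits += 1
    return hits / trials


share = share_with_streak()
print(round(share, 3))
```

In this model a large majority of genuine sequences contain such a streak, which is why the trick usually works. The remaining sequences are the "OR not!" cases.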
Classroom fun…
2.) Sports-interested students: Find some
NBA teams with records of about .500 at
www.basketball-reference.com
See if they had a winning streak or losing
streak of at least five in a row. (About
85% chance that yes, they did…)
Does ANY fan, writer, or blogger think a
five-game winning streak is random?
Lesson 3 → Lesson 1
• Recapping again:
1.) The world is more random than you
think.
2.) Stat models are everywhere and take
that randomness into account.
3.) Check claims with real data … and
back to 1!
What About the Other Way?
• What can statisticians learn from
mathematicians?
• Sometimes statisticians need to
evaluate integrals in high dimensional
spaces.
• Examples: Variance calculations, joint
probabilities, Bayesian analysis…
Stat learns from Math…
• These high dimensional integrals are
sometimes easiest to evaluate in a
particular order ...
• A key question: WHEN CAN WE
CHANGE THE ORDER OF
INTEGRATION?
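Math to The Rescue!
Fubini’s Theorem! Suppose A and B are
complete measure spaces. Suppose f(x,y)
is A × B measurable. If
\int_{A \times B} |f(x,y)| \, d(x,y) < \infty,
where the integral is taken with respect to a
product measure on the space over A × B,
then we can change the order of
integration… i.e.
\int_A \left( \int_B f(x,y) \, dy \right) dx = \int_B \left( \int_A f(x,y) \, dx \right) dy = \int_{A \times B} f(x,y) \, d(x,y)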
Fubini continued…
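A standard textbook illustration of why the integrability condition matters (added here as a sketch, not taken from the original slides) is the function f(x,y) = (x² − y²)/(x² + y²)² on the unit square. Its absolute value is not integrable there, and the two iterated integrals disagree:

```latex
% Inner integral in y first, using d/dy [ y / (x^2 + y^2) ] = (x^2 - y^2) / (x^2 + y^2)^2
\int_0^1 \left( \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dy \right) dx
  = \int_0^1 \frac{1}{1 + x^2} \, dx
  = \frac{\pi}{4}

% Inner integral in x first: the integrand is antisymmetric in (x, y), so the sign flips
\int_0^1 \left( \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dx \right) dy
  = -\int_0^1 \frac{1}{1 + y^2} \, dy
  = -\frac{\pi}{4}
```

So the order of integration cannot always be changed. The hypothesis in Fubini’s theorem is what rules out cases like this one.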
A slight modification to Fubini
• A statistician has a slightly different
answer to the question, “When can we
change the order of integration?”
Whenever we want!
Neat Streak #2…
• Barton College basketball…
1/21/13 … Barton 76, Pfeiffer 68
1/24/13 … Barton 76, Queens 68
1/26/13 … Barton 76, Erskine 68
What’s the next score in THIS
sequence??
(Alas, they lost to Mount Olive, 80-75.)
What are the chances?
• Of a team winning three straight games
by the same score?
• For YOUR favorite or hometown team,
this season, the probability is very
small!
• For SOME team in somebody’s town
during the next 20 years? There’s a
pretty good chance it will happen again!
Neat Streak #3…
• The Chicago Blackhawks opened the
2012-13 National Hockey League season
with 24 straight games without a regulation
time loss.
• Again, what are the chances? And again,
the media was interested:
http://www.usatoday.com/story/sports/nhl/blackhawks/2013/03/04/blackhawks-streak-odds/1963659/
Getting quoted is (usually) fun…
• Later in the year, this appeared in
USA Today:
• http://www.usatoday.com/story/sports/2013/05/23/home-ice-advantage/2354541/
The lesson here is…
A business implication: Mathematicians
rarely get consulting jobs solving
problems from Calculus I. But
statisticians routinely get paid (and
praised) for solving problems from Stat
101.
• THANKS for your attention and for all
the good work you do!