Statistics: Making Sense of Data
History of Math, Fall 2006
Fred Stenger, Larry L. Harman

Where did the term statistics come from?

The term was derived in the 18th century from the New Latin statisticum collegium (“council of state”), because statistics represented the scientific study of state affairs. Those affairs included herd sizes, grain supplies, army strength, and so on: information the government would use to predict and prepare for military action, famine, plague, and the like. Some scholars say these needs led to the invention of numbers themselves.

Early Developments

John Graunt (London, 1662) studied the Bills of Mortality from 1604 to 1661. He observed that more males than females are born, that women live longer, and that the annual death rate is roughly constant barring epidemics. This was an early example of data tracking.

Edmond Halley (1693), the famous astronomer for whom Halley’s comet is named, began compiling actuarial tables. This was the start of actuarial science and of data tracking for insurance companies.

Statistics vs. Probability

In the 1700s, probability and statistics developed together as two related fields of the mathematics of uncertainty.

Probability explores what can be said about an unknown sample of a known collection. For example: knowing all possible combinations of a pair of dice, what is the likelihood of rolling a seven?

Statistics explores what can be said about an unknown collection by investigating a small sample. For example: knowing the life spans of 100 Americans, we can estimate how long Americans are likely to live.

Astronomy drove some of the later developments, when astronomers used means (averages) to predict minute changes in planet and star positions and looked for a way to handle outliers and unexplained differences.

More Developments

Jakob Bernoulli (1654-1705), “Ars conjectandi” (The Art of Conjecture): the “Law of Large Numbers” says that the larger the sample, the better it represents the actual data or circumstances. This seems intuitive to us now, but at the time the required sample size was unknown.
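The dice example and the Law of Large Numbers can be tried out together. The following is a minimal sketch, not part of the original presentation: it assumes fair six-sided dice, and the function name proportion_of_sevens is made up for illustration. It counts the exact chance of rolling a seven, then shows how the observed share of sevens behaves as the number of rolls grows.

```python
import random

def proportion_of_sevens(num_rolls, rng):
    """Roll two fair dice num_rolls times; return the fraction of rolls that sum to seven."""
    sevens = 0
    for _ in range(num_rolls):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            sevens += 1
    return sevens / num_rolls

# Probability side: count favourable outcomes among the 36 equally likely pairs.
exact = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7) / 36

# Statistics side: estimate the same quantity from samples of increasing size.
rng = random.Random(1713)  # fixed seed (an arbitrary choice) so runs are repeatable
print("exact probability of a seven:", round(exact, 4))
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, "rolls ->", round(proportion_of_sevens(n, rng), 4))
```

Small samples can land far from the exact value; the larger samples should sit close to it, which is the behaviour Bernoulli's law describes.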
Abraham De Moivre (1733) showed that the binomial distribution (with probability 0.5) is approximated by a bell-shaped curve, which we now call the normal curve. De Moivre used this idea (later rediscovered by Gauss and Laplace) to improve on Bernoulli’s estimates. His ultimate goal was to apply probability and statistics to society’s practical questions. That process took shape with Pierre Simon Laplace’s 1812 publication, “Analytical Theory of Probabilities”.

More Developments (cont.)

Adrien Marie Legendre (1752-1833) published “la méthode des moindres carrés”, the method of least squares, in 1805. His method rivaled that of Laplace in that he used the errors themselves to help with overall predictions: “By this method, a kind of equilibrium is established among the errors which, since it prevents the extremes from dominating, is appropriate for revealing the state of the system which most nearly approaches the truth.”

Statistics in Social Sciences

Statistics then made inroads into the social sciences (more or less where it started) in 1835, when Lambert Quetelet of Belgium published a book called “Social Physics”. In this book he attempted to apply the laws of probability to the study of human characteristics. Unlike other social sciences, psychology embraced this method of statistical analysis.

Statistics Emerges

With many advances in the 19th century, statistics emerged from the shadow of probability to become a mathematical discipline in its own right. The advances centered on data collection and processing. The major contributors were:

Sir Francis Galton (1860s), a first cousin of Charles Darwin, who wanted to use statistics to improve the human race through selective breeding (the eugenics movement). He is credited with two methods of data analysis, regression and correlation, which he used to predict hereditary traits in humans.
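Least squares, regression, and correlation can be illustrated with a few lines of arithmetic. This is a minimal sketch only: the parent and child heights are invented numbers, not Galton's data, and the function names are hypothetical. It fits a straight line by minimizing the sum of squared errors and computes the usual correlation coefficient.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def correlation(xs, ys):
    """Return the correlation coefficient, a number between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical heights in inches (made up for illustration, NOT historical data).
parent = [64, 66, 68, 70, 72]
child = [65, 68, 67, 70, 70]

slope, intercept = least_squares_line(parent, child)
r = correlation(parent, child)
print("fitted line: child =", round(slope, 3), "* parent +", round(intercept, 3))
print("correlation:", round(r, 3))
```

The fitted line gives a prediction for any parent height, and the correlation coefficient summarizes how closely the points follow that line.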
Karl Pearson and his student G. Udny Yule refined Galton’s work into an effective methodology of regression analysis, using a subtle variant of Legendre’s method of least squares. This paved the way for widespread use of statistics throughout the biological and social sciences.

Modern Developments

William S. Gosset, working as a statistician for the Guinness Brewery, studied sample size and how to derive reliable conclusions from small samples.

Ronald A. Fisher (1890-1962), widely considered the most important statistician of the early 20th century, wrote “Statistical Methods for Research Workers” and “The Design of Experiments”.

With computers and significantly larger data sets, statisticians can provide more accurate predictions. John Tukey of Bell Labs and Princeton University introduced (1960s) what he called “Exploratory Data Analysis”, a collection of methods for dealing with today’s large data sets. He also coined the words “software” and “bit”.

Where are statistics now?

Stephen Stigler sums up the ascent of statistics in modern society: “modern statistics… is a logic and methodology for the measurement of uncertainty and for an examination of the consequences of that uncertainty in the planning and interpretation of experimentation and observation”. The work of the people named above illustrates the evolution of statistics and data analysis from the social sciences (life expectancy, actuarial tables, politics, etc.) to company-related tools such as quality assurance, design of experiments, and product failure charts.

Penny Flip

Question: what is the probability that the first head arrives on an odd toss?

Flip a penny.
First flip: if it is a head, the first head came on an odd throw; if it is a tail, flip again.
Second flip: if it is a head, the first head came on an even throw; if it is a tail, flip again.
Third flip: if it is a head, the first head came on an odd throw; if it is a tail, flip again.
And so on. Do this 5 times and keep track of the results.

Penny Flip (cont.)
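The classroom exercise can also be run many more than 5 times by simulation. The sketch below is an illustration under stated assumptions, not part of the original slides: it assumes a fair coin, all function names are hypothetical, and the z statistic is the standard one-proportion formula, which is only a guess at the kind of statistical check the presentation has in mind.

```python
import random
from math import sqrt

def first_head_toss(rng):
    """Flip a fair coin until the first head; return the toss number (1, 2, 3, ...)."""
    toss = 1
    while rng.random() >= 0.5:  # tail: flip again
        toss += 1
    return toss

def odd_fraction(num_trials, rng):
    """Fraction of trials in which the first head arrived on an odd-numbered toss."""
    odd = sum(1 for _ in range(num_trials) if first_head_toss(rng) % 2 == 1)
    return odd / num_trials

def z_statistic(p_hat, p0, n):
    """One-proportion z statistic: observed proportion p_hat versus hypothesized p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

rng = random.Random(2006)  # fixed seed (an arbitrary choice) so runs are repeatable
for n in (5, 100, 10_000):
    p_hat = odd_fraction(n, rng)
    z = z_statistic(p_hat, 2 / 3, n)
    print(n, "trials: odd fraction", round(p_hat, 3), "z", round(z, 2))
```

With only 5 trials the observed fraction can be far from .67; with thousands of trials it should settle close to it, and a z statistic of modest size means the data are consistent with that value.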
For a fair coin:
Sum of probabilities (success on an odd throw) = 1/2 + 1/8 + 1/32 + ... = 2/3 ≈ .67
Sum of probabilities (success on an even throw) = 1/4 + 1/16 + 1/64 + ... = 1/3 ≈ .33

Check statistically: 1-proportion z-test.

Design of Experiments

Widely used in industry by medical manufacturers, software developers, food manufacturers, electronics manufacturers, etc. (Six Sigma, IPC, ISO, etc.).
Not widely used in education, but could it be?
Planning and set-up are the most important part of DOE (determining independent variables, dependent variables, control variables, random control variables, etc.).
Examples: drag car racing DOE; math teaching DOE exercise.

Timeline

1660s - John Graunt and William Petty founded the field of “Political Arithmetic”.
1662 - John Graunt published a pamphlet entitled Natural and Political Observations Made upon the Bills of Mortality.
1693 - Edmond Halley founded actuarial science.
1713 - Jakob Bernoulli, “Ars conjectandi” (The Art of Conjecture) and the “Law of Large Numbers”.
1733 - Abraham De Moivre approximated the binomial distribution with what we now call the normal curve.
1805 - Adrien Marie Legendre published “la méthode des moindres carrés”, the method of least squares.
1812 - Pierre Simon Laplace’s “Analytical Theory of Probabilities” was published.
1835 - Lambert Quetelet published a book called “Social Physics”.
1860s - Sir Francis Galton developed regression and correlation.
1890s - Karl Pearson and G. Udny Yule refined Galton’s work into an effective methodology of regression analysis.
Early 1900s - William S. Gosset studied sample size and deriving reliable conclusions from small samples.
1925 - R.A. Fisher published “Statistical Methods for Research Workers”; “The Design of Experiments” followed in 1935.
Mid 1960s - John Tukey introduced what he called “Exploratory Data Analysis”.

References

Berlinghoff, William P. and Gouvêa, Fernando Q. Math Through the Ages: A Gentle History for Teachers and Others. Oxton House Publishers, 2002.
Katz, Victor J. A History of Mathematics. Pearson Education, 2004.
http://en.wikipedia.org/wiki/statistics
http://cm.bell-labs.com/cm/ms/departments/sia/tukey/index.html
www.statease.com/pubs/dragracing.pdf