Higher Level Module H1
Module H1
Using Probability Ideas in Dealing with Data
This module will provide students with an understanding of basic ideas about probabilities
and their manipulation, and of elementary probability distributions and their uses. This
should equip them to discuss these ideas intelligently in the context of a National Statistical
System, and provide a basis for further training in this programme or thereafter.
The demographic section of the syllabus concerns some ways of making effective use of
routinely collected population statistics, of the role of probability in providing a rational
basis for calculations and of the need to combine that methodology with careful
consideration of practicalities, and a critical approach to interpretation.
Successful students will be able to:
Explain basic concepts of probability and probability distribution needed to
underpin later learning of general statistical work.
Distinguish between discrete and continuous measurements and how this plays out
in probability contexts.
Recognise the usefulness of probability distribution models for statistical inference,
while realising that such models are based on assumptions that may or may not be
acceptable in practice.
Use probability concepts, and realistic interpretation of figures derived from data,
in the construction and application of life tables, including basic formulations of
population projection.
SADC Course in Statistics
Expected Outcomes
In respect of probability ideas, the module is preparatory/foundational.
The demographic strand, in the second half of the module, applies probability ideas but
combines these with meticulous arithmetical manipulation of survival data to build up the
concept of life expectancy. Students will learn to appreciate its uses. Further areas are
included where life table formats find application. First, competing risks are discussed to
illustrate how lives (e.g. working lives in a given organisation) can be terminated in several
ways, and how this feeds into ideas about multiple decrement tables. As well as mortality,
participants will learn how those concerned with population projections need also to
consider figures for, and assumptions about, fertility and migration.
Students need a good mathematical foundation, such as can be achieved by completing the Arithmetic, Algebra
and parts (linear, polynomial, exponential, logarithm function and sigma notation) of the
Functions, Graphs and Sequences modules of an excellent set of training materials
available at
Students attending this module should also be familiar with the use of Excel and with the
use of the Excel add-in named SSC-Stat. This would be equivalent to skills gained by
attending the Basic Level module B2 and the Intermediate Level module I2.
It is recommended that students taking this module should be familiar with Intermediate-level
module I5 on Basic Demographic and Epidemiological Ideas, or courses at least
Session 01. Introduction to Probability and Life-Table ideas
Meaning of probability, probability based on relative frequencies/ proportions. Probability
concepts in a life-table.
Session 02. Laws of Probability:
Fundamental laws of probability. Results emerging from these laws. Venn diagrams,
events, union and intersection of events, complement of an event, mutually exclusive events.
Session 03. Conditional Probabilities and Independence:
Definitions, notation. Independence of events. Conditional probability and independence.
Law of Total Probability. Bayes’ Theorem. Addition rule for probabilities. Multiplication
rule for independent events. Tree diagrams.
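As an illustration of the Law of Total Probability and Bayes' Theorem listed for this session, the following Python sketch works through a screening-test example. The prevalence, sensitivity and false-positive figures are invented for illustration and are not taken from the course materials.

```python
# Hypothetical screening-test figures, chosen only for illustration.
p_disease = 0.02             # P(D): proportion of the population with the disease
p_pos_given_disease = 0.95   # P(+ | D): test is positive when disease is present
p_pos_given_healthy = 0.10   # P(+ | not D): test is positive when disease is absent

# Law of Total Probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: P(D | +) = P(+|D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(+)     = {p_pos:.4f}")
print(f"P(D | +) = {p_disease_given_pos:.4f}")
```

With a rare condition, most positive results come from the much larger healthy group, so P(D | +) is far smaller than P(+ | D). A tree diagram of the same figures makes the two branches of the total-probability sum visible.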
Session 04. Probability Distributions:
Discrete and continuous random variables. Meaning of a probability distribution.
Examples. Expected value, moments and variance of a random variable. Skewness and
Kurtosis. Cumulative distribution function.
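The expected value, variance and cumulative distribution function of a discrete random variable can be computed directly from their definitions. The sketch below assumes a small made-up distribution (the values and probabilities are hypothetical, not course data).

```python
from itertools import accumulate

# Hypothetical distribution of X = number of children under five in a household.
values = [0, 1, 2, 3]
probs = [0.4, 0.3, 0.2, 0.1]

# Expected value: E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in zip(values, probs))

# Variance: Var(X) = E[(X - E(X))^2]
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Cumulative distribution function F(x) = P(X <= x) at each listed value
cdf = list(accumulate(probs))

print(f"E(X) = {mean:.2f}, Var(X) = {variance:.2f}")
for x, big_f in zip(values, cdf):
    print(f"P(X <= {x}) = {big_f:.2f}")
```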
Session 05. Joint Distributions:
Definitions. Joint and Marginal distributions. Conditional distributions. Independence.
Using two-way tables of frequencies to determine joint, marginal and conditional
Session 06. The binomial distribution:
Introducing a discrete distribution, i.e. the binomial distribution. Importance of
recognising the underlying data-generating process. Its depiction and interpretation of
probabilities. Examples with varying values of p. Mean and variance of the binomial distribution.
Session 07. The Poisson distribution:
Definition of the Poisson distribution. Worked examples. Mean and variance of the
Poisson distribution. Graphs for varying values of the Poisson parameter. Cumulative
Poisson probabilities.
Session 08. The normal distribution:
Normal distribution introduced in its own right. Continuous variables as opposed to discrete.
Symmetry. Mean and variance – diagrams. Probability as area under curve. Tables of the
standard normal distribution. Cumulative curve.
Session 09. Importance of the normal distribution:
Underlying notion of normal variation as resultant of many minor influences up and down.
The Central Limit Theorem and consequences thereof. Normal approximation to the
binomial and Poisson distributions. Checking for normality using normal probability plots.
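The normal approximation to the binomial can be checked numerically. The sketch below compares an exact cumulative binomial probability with its normal approximation, using a continuity correction. The values of n, p and k are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 100, 0.3, 35               # illustrative values only
mu = n * p                           # binomial mean
sigma = math.sqrt(n * p * (1 - p))   # binomial standard deviation

exact = binom_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mu, sigma)   # k + 0.5 is the continuity correction

print(f"Exact binomial P(X <= {k}):  {exact:.4f}")
print(f"Normal approximation:       {approx:.4f}")
```

The approximation is generally considered reasonable when both np and n(1 - p) are fairly large. Repeating the comparison with a small n or an extreme p shows where it breaks down.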
Session 10. Review and further practice:
Review of the three main distributions. Poisson approximation to the binomial.
Identifying variables as following a binomial, Poisson or normal distribution. Practice with
further examples.
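The Poisson approximation to the binomial can be seen by tabulating both sets of probabilities side by side. This sketch assumes a large n and a small p, chosen only for illustration. The function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003   # many trials, small success probability (illustrative)
lam = n * p          # matching Poisson mean

print(" k   binomial    Poisson")
for k in range(8):
    print(f"{k:2d}   {binom_pmf(k, n, p):.5f}    {poisson_pmf(k, lam):.5f}")
```

Note also that the binomial variance np(1 - p) is close to the Poisson variance np when p is small, which is one informal way of recognising when the approximation is plausible.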
Session 11. Using Probability Ideas in Life Tables:
The widespread use of Life Tables. The data input: a sequence of conditional probabilities
of death in a single-year {qx} or n-year period {nqx}. The abridged Life Table.
Computations of numbers surviving. Demographic “algebra” or “shorthand”.
Session 12. Basic Life Table Computations – I:
The calculations of numbers dying by age(group) i.e. {ndx}, and of years lived i.e. {nLx}.
Interpretations. Graphs and their usefulness.
Session 13. Basic Life Table Computations – II:
Completing the Life Table. The calculations of residual life i.e. cumulative total years lived
beyond exact age x i.e. {Tx}, and of life expectancy i.e. {ex}. Interpretations. Graphs and
their usefulness.
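The chain of Life Table calculations described in Sessions 11 to 13 can be sketched in a few lines of Python. This is a minimal single-year sketch under stated assumptions: the {qx} values are invented, the radix is arbitrary, deaths are assumed to be spread evenly over each year of age (so Lx = lx - dx/2), and the final qx is set to 1 to close the table. Real tables usually treat infancy and the oldest ages more carefully.

```python
def life_table(qx, radix=100_000):
    """Build single-year life table columns from death probabilities qx.

    Assumes deaths are spread evenly within each year of age, and that
    the last qx equals 1 so that nobody survives beyond the final age.
    """
    # lx: number surviving to exact age x
    lx = [float(radix)]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))

    # dx: number dying between exact ages x and x + 1
    dx = [l * q for l, q in zip(lx, qx)]

    # Lx: years lived between x and x + 1 (survivors plus half a year per death)
    big_lx = [l - 0.5 * d for l, d in zip(lx, dx)]

    # Tx: total years lived beyond exact age x (cumulate Lx from the oldest age)
    tx = []
    running = 0.0
    for years in reversed(big_lx):
        running += years
        tx.append(running)
    tx.reverse()

    # ex: life expectancy at exact age x
    ex = [t / l for t, l in zip(tx, lx)]
    return {"lx": lx, "dx": dx, "Lx": big_lx, "Tx": tx, "ex": ex}

# Hypothetical qx for a very short-lived population, ages 0 to 3.
table = life_table([0.1, 0.2, 0.5, 1.0], radix=1000)
for age in range(4):
    print(age, *(round(table[col][age], 2) for col in ("lx", "dx", "Lx", "Tx", "ex")))
```

A useful check is that the {dx} column sums to the radix, since everyone in the table eventually dies.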
Session 14. A Life Table Discussion Topic:
An extension of the “normal” basic calculations of the Life Table to {tpx} and {t|kqx},
motivated by a published example, and discussed through a series of classroom questions,
to practise and develop fluency in using Life Table ideas.
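For reference, the two extended quantities can be written in terms of the survivorship column {lx}. The expressions below use standard actuarial notation, which the course is assumed (not confirmed) to follow: the first is the probability that a person aged exactly x survives t more years, and the second is the probability that such a person survives t years and then dies within the following k years.

```latex
{}_{t}p_{x} = \frac{l_{x+t}}{l_{x}},
\qquad
{}_{t|k}q_{x} = \frac{l_{x+t} - l_{x+t+k}}{l_{x}}
             = {}_{t}p_{x} \cdot {}_{k}q_{x+t}
```

The second form shows the deferred probability as a product of a survival probability and a conditional probability of death, which ties it back to the multiplication rule for conditional probabilities.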
Session 15. Putting the Life Table in Context:
Discussion of the real interpretation of cohort and cross-sectional Life Tables, and the
stationary population. Discussion of data sources and the derivation of single-year {qx} ~
revising coverage in module I5 ~ and n-year period {nqx} ~ an extension of previous
discussion. Brief description of role of Model Life Tables, of the need for large data
samples, of the importance to insurers of allowing for life-style factors, and of statisticians’
responsibilities to work with other users.
Session 16. Applications of Stationary Population Ideas:
Elementary examples of how Life Table ideas find application in manpower planning,
illustrated by two very simple scenarios and discussed through a series of classroom
questions, to practise and further develop fluency in using Life Table ideas.
Session 17. Competing Risks & Multiple Decrement Tables:
The idea of having more than one exit from the Life Table population. Example
developed of “competing risks” ~ medical statistical terminology for several disease groups
as multiple possible causes of death. Dependent observable rates and the computation of
“independent” rates. Illustration of how “what-if” calculations can be based on the
independent rates and translated back into dependent rates.
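One way of moving between dependent and independent rates is sketched below. It assumes constant forces of decrement within the age interval, which is only one of several conventions and may not be the approximation used in the course. The cause names, rates and function names are all hypothetical.

```python
import math

def independent_rates(dependent):
    """Dependent (observed) rates -> independent rates for one age interval.

    Assumes constant forces of decrement and a positive total rate below 1.
    """
    q_total = sum(dependent.values())
    p_total = 1.0 - q_total
    return {cause: 1.0 - p_total ** (q / q_total) for cause, q in dependent.items()}

def dependent_rates(independent):
    """Independent rates -> dependent rates, under the same assumption."""
    forces = {cause: -math.log(1.0 - q) for cause, q in independent.items()}
    total_force = sum(forces.values())
    q_total = 1.0 - math.exp(-total_force)
    return {cause: q_total * mu / total_force for cause, mu in forces.items()}

# Hypothetical dependent death rates by cause for one age group.
observed = {"heart disease": 0.010, "cancer": 0.008, "other": 0.012}
indep = independent_rates(observed)

# "What-if": halve the independent rate for heart disease, leave the others alone.
scenario = dict(indep)
scenario["heart disease"] = indep["heart disease"] / 2
what_if = dependent_rates(scenario)

for cause in observed:
    print(f"{cause:14s} observed {observed[cause]:.5f}  what-if {what_if[cause]:.5f}")
```

In the what-if results the dependent rates for the other causes rise slightly, because people spared from one cause remain exposed to the rest. This is the essence of the competing-risks argument.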
Session 18. Fertility Ideas:
Simple cross-sectional rates: crude birth rate, general and age-specific fertility rates. Effect
of changes of age at child-bearing. Child-woman ratio. Generation-based rates and age-period-cohort effects. Average completed family size, total fertility rate, gross and net
reproduction rates. Critique.
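The cross-sectional rates listed above follow from simple arithmetic on counts of women and births. The sketch below uses invented figures for five-year age groups of mothers. The proportion of births that are female (taken here as 100/205) is an assumption for illustration. The net reproduction rate is omitted because it also needs female survival data.

```python
# Hypothetical one-year data by five-year age group of mother.
age_groups = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
women = [5000, 4500, 4000, 3500, 3000, 2500, 2000]   # mid-year female population
births = [250, 900, 800, 525, 300, 100, 20]          # live births in the year
total_population = 100_000                           # mid-year, both sexes, all ages
share_female_births = 100 / 205                      # assumed sex ratio at birth

total_births = sum(births)

# Crude birth rate: births per 1000 total population
cbr = 1000 * total_births / total_population

# General fertility rate: births per 1000 women aged 15-49
gfr = 1000 * total_births / sum(women)

# Age-specific fertility rates: births per woman in each age group
asfr = [b / w for b, w in zip(births, women)]

# Total fertility rate: each rate applies for the 5 years a woman spends in the group
tfr = 5 * sum(asfr)

# Gross reproduction rate: daughters per woman, ignoring mortality
grr = tfr * share_female_births

print(f"CBR = {cbr:.2f} per 1000, GFR = {gfr:.2f} per 1000")
for group, rate in zip(age_groups, asfr):
    print(f"ASFR {group}: {1000 * rate:.1f} per 1000 women")
print(f"TFR = {tfr:.2f} children per woman, GRR = {grr:.2f} daughters per woman")
```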
Session 19. Population Projections - I:
Projection, not prediction. “Mathematical” models of total population. Component
methods. Projecting age forward and accounting for deaths. Migration ~ discussion based on
UK material. Age-specific fertility and births. Infant deaths.
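A single step of a component projection can be sketched as follows. This is a deliberately simplified, female-only toy with four age groups, a projection step equal to the group width, and a closed final group. All figures and names are hypothetical. It is meant only to show how survival, migration and births enter the calculation, not to reproduce any official method.

```python
def project_one_step(pop, survival, fertility, birth_survival, net_migrants):
    """Project a female population forward by one age-group width.

    pop[i]          women in age group i at the start of the step
    survival[i]     proportion of group i surviving into group i + 1
                    (the last group is assumed to die out in this toy model)
    fertility[i]    female births per woman in group i during the step
    birth_survival  proportion of births surviving to the end of the step
    net_migrants[i] net migrants added to group i at the end of the step
    """
    # Births during the step, then allow for infant and child deaths.
    births = sum(f * women for f, women in zip(fertility, pop))
    new_pop = [births * birth_survival + net_migrants[0]]

    # Age each group forward, accounting for deaths and then migration.
    for i in range(len(pop) - 1):
        new_pop.append(pop[i] * survival[i] + net_migrants[i + 1])
    return new_pop

start = [1000, 900, 800, 700]
projected = project_one_step(
    pop=start,
    survival=[0.98, 0.97, 0.95],
    fertility=[0.0, 0.4, 0.3, 0.0],
    birth_survival=0.95,
    net_migrants=[0, 10, -5, 0],
)
print([round(n, 1) for n in projected])
```

Re-running the step with different fertility, survival or migration inputs shows how sensitive the result is to each assumption, which is why such output is described as a projection rather than a prediction.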
Session 20. Population Projections - II:
Reasons for using projections: education planning examples. Politics of migration ~ UK
Smoother projections based on information about generation effects. General
use of varied assumptions in “steering” projections.