Download Glossary - FRST 231

20 Intro Prob Glossary
25/4/08
11:37
Page 387
Glossary
acceptance region: the range of values for a sample statistic where the null hypothesis
is not rejected.
addition rule: a probability rule based on the union of events. For two events A and
B, the addition rule is denoted by: P(A∪B) = P(A) + P(B) – P(A∩B).
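As a minimal sketch of the addition rule, assume a single roll of a fair six-sided die. The events A and B below are illustrative choices, not taken from the glossary, and `prob` is a hypothetical helper name.

```python
from fractions import Fraction

# Illustrative assumption: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event A: an even number
B = {4, 5, 6}  # event B: a number greater than 3

def prob(event):
    """Classical probability: favourable outcomes / possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Addition rule: P(A∪B) = P(A) + P(B) – P(A∩B)
p_union = prob(A) + prob(B) - prob(A & B)

# The same probability obtained directly from the union of the two sets.
p_union_direct = prob(A | B)
print(p_union, p_union_direct)
```

Subtracting P(A∩B) removes the outcomes 4 and 6, which would otherwise be counted twice.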
alternative hypothesis: a statement which is contradictory to the null hypothesis,
denoted by H1.
arithmetic average: see mean.
attribute charts: statistical process control charts used for monitoring attribute data,
including p charts.
attribute data: in statistical process control, production-related data that require an
operational definition of acceptable and defective products.
average: see mean.
bar graphs: graphical tools used to present information summarized in categorical
frequency distributions or ungrouped frequency distributions created for discrete
variables. Since the horizontal axis is not a continuous random variable, the bars
do not touch each other.
Bayes’ Theorem: a logical proposition used to solve conditional probability problems
that generally occur in reverse order of time. Bayes’ Theorem gives the
conditional probability of the random variable A given B in terms of the
marginal probability distribution of A alone and the conditional probability
distribution of variable B given A.
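The relationship described above can be sketched for two events as follows. The function name `bayes` and the numbers in the example are hypothetical; the formula used is the usual two-event form, P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)].

```python
from fractions import Fraction

def bayes(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    numerator = p_b_given_a * prior_a
    # Marginal probability of B, summed over A and its complement.
    p_b = numerator + p_b_given_not_a * (1 - prior_a)
    return numerator / p_b

# Hypothetical values: P(A) = 1/100, P(B|A) = 9/10, P(B|not A) = 5/100.
posterior = bayes(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(posterior, float(posterior))
```

Even with a high P(B|A), the posterior stays small here because the prior P(A) is small.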
bias: the amount by which a sample estimate systematically under/over-estimates the
true value of a parameter. Bias can occur, for example, when equipment used for
recording measurements is not calibrated properly.
bimodal: a population or sample with two modes.
bivariate distribution: see joint probability distribution.
bivariate frequency distribution: the joint, simultaneous distribution of two variables.
bivariate normal distribution: a joint statistical distribution of two random variables
which may or may not be correlated, and where each has a normal marginal
distribution.
blocks: groups of smaller, more uniform experimental units used in experimental
designs if the experimental units, area, time or material are not homogeneous.
blocking: see blocks.
categorical frequency distributions: frequency distributions used to place qualitative,
ordinal or nominal level variables into specific categories.
categorical variables: see qualitative variables.
Central Limit Theorem: one of the most important theorems in statistics, formalizing
the relationship between a specific parameter of a population and its estimate
(statistic). This theorem posits that when the sample size (n) is sufficiently large
(n ≥ 30), the sampling distribution of sample means approaches a normal
distribution with a mean equaling the population mean and the standard
deviation equaling the standard error of the mean.
Chebyshev’s Theorem: a theorem which can be applied to samples or populations of
any kind, and states that at least the fraction (1 − 1/k²) of the observations must
lie within k standard deviations of the mean, regardless of the shape of the
distribution of the data (where k is any constant greater than one).
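A short sketch of the bound, assuming a small made-up data set with one extreme value. The function names and the data are illustrative only.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than one")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data within k population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

data = [1, 2, 3, 4, 100]  # deliberately skewed, illustrative data
print(chebyshev_lower_bound(2), fraction_within(data, 2))
```

The observed fraction can be much larger than the bound; the theorem only guarantees the minimum.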
chi-square (χ²) distribution: a positively skewed, positive-valued distribution that
describes the sampling distribution of the variances. It has a mean of n – 1 and
approaches the normal distribution at larger sample sizes.
circular permutation: the number of permutations of n distinct subjects positioned in
a circle, denoted by Pc.
class boundaries: the values occurring halfway between the upper class limit of one
interval and the lower class limit of the next interval in a frequency distribution.
class frequency: the number of observations that fall in a particular class in a
frequency distribution.
class intervals: see classes.
class limits: the smallest and largest possible values that can fall into a given class in a
frequency distribution.
class mark: see class midpoint.
class midpoint: the average of the upper and lower class limits, or upper and lower
class boundaries, of a class in a frequency distribution.
class width: the difference between the upper and lower class boundaries of a given
class in a frequency distribution.
classes: the various bounded groupings (generally with similar intervals) defined for a
frequency distribution within which data observations are placed.
classical probability: probability calculated as the ratio of the number of outcomes
favourable to a particular event versus the number of possible outcomes in a
sample space.
coefficient of variation: the standard deviation expressed as a percentage of the mean.
collectively exhaustive: a quality of events where the sum of the probabilities for all
possible events in the sample space equals unity.
combination: the number of possible outcomes when order is not important. Commonly
denoted by $\binom{n}{r}$ or nCr and often stated as ‘n choose r’.
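For illustration, the number of combinations can be computed with the standard formula n! / (r!(n − r)!), which the glossary does not spell out. The helper name `combinations_count` is hypothetical; `math.comb` is the Python standard-library equivalent.

```python
import math

def combinations_count(n, r):
    """Number of ways to choose r items from n when order is not important."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Choosing 2 items from 5, by the formula and by the standard library.
print(combinations_count(5, 2), math.comb(5, 2))
```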
complement: the event containing all the elements of the sample space that are not
contained in the event. The complement of B is denoted by B̄.
completely randomized design: the simplest of the experimental designs wherein
treatments are randomly assigned to each experimental unit (in time or space).
compound event: an event that consists of two or more simple events.
conditional distribution: the distribution of a random variable given that other
variables have certain specified values.
conditional probability: the probability of an event, A, occurring within a redefined
sample space in which a given event, B, has occurred; it captures the effect of this
information on the probability of A. The conditional probability of event A given
that event B has occurred is denoted by P(A|B).
confidence interval: for a given confidence level (or degree of confidence), the interval
between the lower confidence limit (LCL) and upper confidence limit (UCL). See
confidence limits.
Introductory Probability and Statistics
confidence level: the quantity, (1 – α)100%, which describes the degree of statistical
certainty that can be attached to an observed statistic. The most frequently used
values of α are 0.10, 0.05 and 0.01, resulting in 90%, 95% and 99% confidence
intervals, respectively.
confidence limits: upper (UCL) and lower (LCL) bounds of the interval where the
probability of finding the true parameter, θ, is set at a confidence value, 1 – α. The
probability that we will find the true population parameter between LCL and
UCL is 1 – α: P(LCL < θ < UCL) = 1 – α.
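As a rough sketch, confidence limits for a population mean can be computed as below. This assumes a normally distributed sample mean and a known population standard deviation, neither of which the glossary entry specifies; the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_limits(sample_mean, sigma, n, alpha=0.05):
    """Return (LCL, UCL) for a mean, assuming known sigma and normality."""
    # Two-tailed critical value, leaving alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical sample: mean 50, sigma 10, n = 100, 95% confidence level.
lcl, ucl = confidence_limits(50, 10, 100, alpha=0.05)
print(round(lcl, 2), round(ucl, 2))
```

A smaller α widens the interval, since a higher confidence level requires a larger critical value.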
consistent: a quality of an estimator such that as the sample size, n, approaches infinity,
the value of the estimator approaches the value of the population parameter. An
unbiased estimator is consistent if, as n → ∞, var(θ̂) → 0 and θ̂ → θ.
continuity correction: a constant (usually equal to half of the unit of measurement)
applied to the values of a discrete random variable when its distribution is
approximated by a continuous distribution.
continuous random variable: a random variable defined over a continuous sample
space, where the probability of any exact value is always zero.
continuous sample space: a sample space that contains an infinite and uncountable
number of outcomes.
continuous variable: a quantitative variable that can take on all possible values over a
specific interval.
control chart: a graphical device used in statistical process control to determine
whether a production process is in or out of control based on sampled data.
control chart constants: conversion and correction factors used in the production of
statistical process control charts.
corrected sum of squares: a measure of spread equal to the sum of squared deviations
of each observation from the mean, so named because each observation is
‘corrected for’ the mean before it is squared.
covariance: the measure of joint variation between two random variables. Covariance
may be zero (when two random variables are independent), positive (when the
values of the variables increase together), or negative (when the value of one
variable increases as the value of the other variable decreases).
critical region: the range of values for a sample statistic where the null hypothesis is
rejected.
critical value: a selected arbitrary value along a statistical distribution, below or above
which the null hypothesis is rejected.
cumulative frequency: the frequency of all observations less than a particular value of
a random variable (for a frequency distribution, the upper class boundary of a
given class). Often referred to as the ‘less than frequency’.
data: pieces of information collected on subjects or items from a population that form
the building blocks of statistics.
deciles: divisions of the frequency distribution into ten equal groups that correspond
to the 10th, 20th, …, and 90th percentiles.
degree of confidence: see confidence level.
degrees of freedom: the number of unrestricted observations used to calculate a
statistic.
dependent populations: random variables that occur in pairs and where the response
value of one variable is at least partly a function of the response of the other.
dependent samples: sampled observations that occur in pairs and where the response
value of one sample is at least partly a function of the response of the other.
descriptive statistics: a branch of statistics dealing with the collection, organization
and presentation of information, and the calculation of some measures (statistics)
which describe the information.
discrete random variable: a random variable defined over discrete sample space.
discrete sample space: a sample space that contains a finite number of elements, or
an unending but countable number of elements.
discrete variables: quantitative variables which take on whole numbers only and
usually result from counting (tallying) items.
disjoint: see mutually exclusive.
distribution-free tests: see non-parametric tests.
efficient: the quality of the unbiased estimator of a given parameter, θ, having the
smallest variance.
element: a single outcome of an experiment within a given sample space.
empirical probability: the likelihood of an event happening based on experiments for
which all possible outcomes and the number of outcomes favouring the event are
not known exactly, but have generally been observed.
Empirical Rule: a rule which states that approximately 68%, 95% and 99.7% of the
observations from a normal distribution will lie within one, two or three
standard deviations of the mean, respectively.
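The three percentages can be checked numerically against the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The exact proportions are close to, but not exactly, the rounded figures quoted in the rule.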
estimate: see point estimate.
estimation: the process of estimating the values of parameters based on measured or
empirical data.
estimator: a function used to estimate an unknown parameter from observed data.
event: a subset or portion of the elements in a sample space.
expected value: the theoretical mean of a probability distribution, denoted by E(X),
interpreted as the long-term average that is ‘expected’ if an experiment is
conducted repeatedly.
experimental design: a means of collecting data in which one or more of the factors
affecting the variable(s) of interest are controlled, with the purpose of
investigating how these controlled factors affect the variable(s) of interest.
experimental error: the pooled variation among experimental units receiving the same
treatment in an experimental design.
experimental study: see experimental design.
exponential distribution: the continuous counterpart to the Poisson distribution. The
exponential distribution describes the elapsed times between occurrences of
consecutive events as a function of the mean elapsed time.
F distribution: a distribution which describes the ratio of two independent χ²-values,
where each is divided by its degrees of freedom. There exist many such curves,
but each is positively skewed and positive-valued.
finite population: a population consisting of a fixed, countable number of elements,
which can be, if necessary, listed.
finite population correction factor: a multiplicative adjustment used in the calculation
of the standard error of the mean when the sample size is large relative to the
population size, specifically when n > 0.05N.
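The glossary does not state the factor itself. A minimal sketch, assuming the commonly used form √((N − n)/(N − 1)) and hypothetical function names and values, is:

```python
import math

def finite_population_correction(n, N):
    """Commonly used correction factor sqrt((N - n) / (N - 1)); assumed form."""
    if not 0 < n <= N or N < 2:
        raise ValueError("require 0 < n <= N and N >= 2")
    return math.sqrt((N - n) / (N - 1))

def corrected_standard_error(sigma, n, N):
    """Standard error of the mean multiplied by the correction factor."""
    return sigma / math.sqrt(n) * finite_population_correction(n, N)

# Hypothetical case: a sample of 100 from a population of 1000, sigma = 10.
print(round(finite_population_correction(100, 1000), 4))
print(round(corrected_standard_error(10, 100, 1000), 4))
```

The factor is always at most one, so applying it reduces the standard error; it approaches one as the sample becomes a negligible share of the population.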
frequency distribution: a systematic arrangement of data to describe a variable, where
observations (raw data) are ordered or grouped into classes, and the frequency of
observations is tallied and presented in tabular form. Frequency distributions can
be categorical, ungrouped, or grouped.
frequency polygon: a graphical display of a frequency distribution constructed by
plotting frequency (or relative frequency) against class mark (or value of the
random variable in the case of ungrouped data), and then joining each point by a
sequence of line segments. To close the polygon, an ‘imaginary’ class midpoint
with zero frequency is added to both ends of the distribution.
geometric distribution: a discrete probability function which possesses all the properties
of a binomial experiment except that trials are repeated until the first success occurs.
The geometric random variable, X, represents the number of repeated independent
trials required to produce the first success, the probability of which is p.
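As an illustration, the probability that the first success occurs on trial x can be written as (1 − p)^(x − 1) · p, the standard form for this definition of X (the glossary does not give the formula). The function name and the die example are hypothetical.

```python
from fractions import Fraction

def geometric_pmf(x, p):
    """P(X = x): x - 1 failures followed by the first success on trial x."""
    if x < 1:
        raise ValueError("x counts trials, so it must be at least 1")
    return (1 - p) ** (x - 1) * p

# Hypothetical example: first six on the third roll of a fair die.
p_third_roll = geometric_pmf(3, Fraction(1, 6))
print(p_third_roll)
```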
geometric experiment: see geometric distribution.
geometric mean: a special form of the mean that is used for ratio data like population
growth, rates of change, economic indicators, etc. The geometric mean of n
observations is the nth root of the product of the n observations.
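The definition above translates directly into code. This sketch compares a hand-written version (the name `geometric_mean_manual` is hypothetical) with `statistics.geometric_mean` from the Python standard library, using made-up values.

```python
import math
import statistics

def geometric_mean_manual(values):
    """nth root of the product of the n observations (all must be positive)."""
    if any(v <= 0 for v in values):
        raise ValueError("observations must be positive")
    return math.prod(values) ** (1 / len(values))

values = [2, 8]  # illustrative data
print(geometric_mean_manual(values), statistics.geometric_mean(values))
```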
grand mean: a special application of the weighting procedure used to find the overall
combined mean of several groups of data when the mean of each individual
group is known.
grouped frequency distribution: a frequency distribution usually used to summarize
continuous (interval or ratio scale) variables.
H-test: see Kruskal-Wallis test.
harmonic mean: a special form of mean used for data where one element remains
constant but another changes. The harmonic mean is calculated as the reciprocal
of the mean of the reciprocals of the individual values.
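A brief sketch of the calculation, using a hypothetical trip driven at 40 km/h one way and 60 km/h back over the same distance (distance constant, speed changing). The name `harmonic_mean_manual` is illustrative; `statistics.harmonic_mean` is the standard-library equivalent.

```python
import statistics

def harmonic_mean_manual(values):
    """Reciprocal of the mean of the reciprocals of the values."""
    mean_of_reciprocals = sum(1 / v for v in values) / len(values)
    return 1 / mean_of_reciprocals

speeds = [40, 60]  # illustrative speeds over equal distances
print(harmonic_mean_manual(speeds), statistics.harmonic_mean(speeds))
```

The result is lower than the arithmetic mean of 50 because more time is spent travelling at the slower speed.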
histogram: a graphical tool for presenting the grouped frequency distribution of a
continuous variable. Like a bar graph, the middle of each bar is the class
midpoint; however, histograms do not contain spaces between bars so that bars
touch at class boundaries.
hypergeometric distribution: a discrete probability distribution that has two possible
outcomes, but where the probabilities of subsequent events are dependent upon
previous outcomes. In other words, the probability of success from trial to trial is
not constant and the successive trials (made without replacement from a finite
population) are not independent.
hypothesis: a statement or claim made about a parameter or a certain characteristic of
a population.
hypothesis testing: a procedure in applied statistics for determining whether a statement
or claim made about a parameter or a certain characteristic of a population is
plausible, based on some sample data collected from the population.
independence: two events are statistically independent if the probability of one event
is not affected by the occurrence or nonoccurrence of the other event.
independent populations: two populations are statistically independent if the
distribution of values in one population is not affected by the values in the other
population.
inferential statistics: a branch of statistics dealing with the generalization of information
obtained in a sample to an entire population. Common procedures include
estimation, hypothesis testing, determining relationships and prediction.
infinite population: a population where (in theory) there is no limit to the number of
possible observations (or measurements). In sampling, the word ‘infinite’ is used
rather loosely and is used to refer to a population with a large number of possible
measurements.
intersection: for two events, A and B, the event that contains all the elements common
to both A and B. The intersection is denoted by A∩B.
interval estimate: see confidence interval.
interval estimation: the process of determining a confidence interval; that is, an
interval within which we expect to find the unknown population parameter.
interval scale: a scale of measurement with the same properties as the ordinal scale,
but where the data are always quantitative and the differences between data
values are meaningful.
inverse cumulative frequency: the frequency of all values greater than a particular
value of a random variable (for a frequency distribution, the lower class
boundary of a given class). Often referred to as the ‘more than frequency’.
inverse relative cumulative frequencies: inverse cumulative frequencies expressed as
percentages (or proportions) of the total frequencies.
joint probability distribution: for two random variables, X and Y, the probability
distribution of X and Y together.
joint probability function: joint probability expressed as a function of the random
variables X and Y. The function is denoted by f(x,y), which represents the
probability that X assumes the value x at the same time Y assumes the value y.
Kruskal-Wallis test: a non-parametric test used to compare three or more unknown
population means.
Latin square design: an experimental design used when the natural variation between
experimental units cannot be reduced by simple blocking alone and the variation
of the experimental units is removed in two directions.
layout: the placement of treatments on experimental units in an experimental design.
level of significance: the size of type I error (α). The value is arbitrary in that it is
selected by the person carrying out the statistical test, but 0.1, 0.05 or 0.01 are
generally used.
lower class limit: the smallest possible value that can fall into a given class in a
grouped frequency distribution.
lower confidence limit (LCL): see confidence limits.
lower control limit: the lower limit on a statistical process control chart, beyond
which production processes are said to be out of control.
lower warning limit: a lower limit on a statistical process control chart which is used
to draw attention to potential production-related problems.
Mann-Whitney U-test: see Wilcoxon rank sum test.
marginal probability: the probability of some event, regardless of the outcome of
other events. For a joint probability distribution f(x,y), the marginal probability
f(x) results from constructing a probability distribution for X over all possible
values of Y.
mathematical expectation: see expected value.
mathematical expectation of a random variable: see population mean of a random
variable.
mean: a measure of central tendency that is calculated by dividing the sum of the
observations by the number of observations.
mean deviation: a measure of variation, calculated as the average of the absolute
values of the deviations of each of the observations from the sample or
population mean.
mean of a random variable: the weighted average of all possible outcomes of a random
variable, where the weights are the probabilities of the respective outcomes.
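For a discrete random variable, the weighted average can be sketched as below. The fair-die distribution and the function name are illustrative assumptions.

```python
from fractions import Fraction

def mean_of_random_variable(distribution):
    """Sum of each outcome multiplied by its probability."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to one")
    return sum(x * p for x, p in distribution.items())

# Illustrative distribution: one roll of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(mean_of_random_variable(die))
```

The mean of 7/2 is not itself a possible outcome; it is the long-run average over many rolls.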
mean square: see variance.
median: the middle value when a set of observations is arranged in increasing or
decreasing order of magnitude, dividing the frequency distribution into two
equal groups and corresponding to the 50th percentile. The median is the
preferred measure of central location when extreme values are present.
midrange: a measure of central tendency defined as the average of the minimum and
maximum values.
mode: a measure of central tendency defined as the most frequently occurring value in
a sample or a population. Some data sets may have more than one mode (e.g.
when several values occur with the greatest frequency) and others may have no
mode at all.
multimodal: a population or sample with more than two modes.
multinomial distribution: a discrete probability distribution having all the properties
of a binomial distribution, except that more than two outcomes are possible from
each trial.
multiplication rule: a counting rule used to calculate the total number of outcomes for
a sample space or event. The rule states that if a random experiment has a
sequence of two steps, in which there are n₁ possible outcomes for the first step
and n₂ for the second, the total number of outcomes is the product of the two
numbers (n₁ × n₂).
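As a small illustration, assume a two-step experiment of tossing a coin and then rolling a die (an example not taken from the glossary). Enumerating the sample space confirms the count given by the rule.

```python
from itertools import product

coin = ["H", "T"]          # first step: 2 possible outcomes
die = [1, 2, 3, 4, 5, 6]   # second step: 6 possible outcomes

# Every pairing of a first-step outcome with a second-step outcome.
outcomes = list(product(coin, die))

print(len(outcomes), len(coin) * len(die))
```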
multivariate hypergeometric distribution: a probability distribution having all the
properties of a hypergeometric distribution, except there are more than two
possible outcomes.
mutually exclusive: a quality ascribed to two or more events which have no common
intersecting elements (i.e., when one event occurs the others cannot). For two
mutually exclusive events, A and B, A∩B = ∅.
negative binomial distribution: a discrete probability distribution which is an
extension of the binomial and geometric distribution, describing the situation
where trials are repeated until a fixed number of successes, k, occurs.
nominal scale: a scale of measurement where numbers or categories are used to
classify, name or label an individual or attribute, but the numbers or categories
have no specific order or importance.
non-critical region: see acceptance region.
non-parametric test: a statistical test that makes no assumptions about the distribution
or the parameters of the distribution from which observations are drawn.
non-sampling error: errors arising during the course of data collection that are not
due to sampling. This includes errors from non-responses, improper coding,
instrument miscalibration, etc.
normal distribution: a continuous, symmetrical, bell-shaped distribution whose shape
and position are determined by the mean and standard deviation. Many of the most
important theories in statistical inference are based on the normal distribution, also
often referred to as the Gaussian distribution or the Laplacian distribution.
null hypothesis: a statement about a characteristic of the population assumed to be
true, denoted by H0.
null space: an event containing no elements in a given sample space.
observational study: a study where investigators observe without altering or
influencing the variable under study.
odds: a term used in subjective probability, often seen in gambling, sporting events,
and horse racing, which refers to the ratio of the probability of an event
occurring versus the probability of the event not occurring.
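The ratio can be sketched as follows. The function name `odds` and the example probability are hypothetical.

```python
from fractions import Fraction

def odds(p):
    """Odds in favour of an event: P(event) / P(not event)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

# Hypothetical event with probability 1/4: odds of 1 to 3 in favour.
print(odds(Fraction(1, 4)))
```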
ogive: a graphical tool representing cumulative or inverse cumulative frequencies,
plotted in a similar manner to a frequency polygon. The cumulative frequencies
are plotted against the upper (cumulative) or lower (inverse cumulative) class
boundaries and joined by line segments. Also known as a cumulative frequency
or inverse cumulative frequency graph.
one-tailed tests: a hypothesis test which can be refuted in only one direction, i.e., the
inequality in the alternative hypothesis is generally ‘less than’ or ‘greater than’
some value.
open class: in a grouped frequency distribution, when the first (or last) class has no
lower (or upper) limit, to accommodate a very few (one or two) extreme
observations in the data set.
operating characteristic (OC) curve: a curve describing how the values of β (the
probability of ‘accepting’ the null hypothesis when it is false) change over a range
of values of µ, n and/or α.
ordinal scale: a scale of measurement similar to the nominal scale, but where the order
or rank of the categories is meaningful.
outcome: the result of an experiment.
outliers: extreme values in a data set.
p chart: an attribute control chart used in statistical process control for monitoring
the sample proportion of defective products.
p-value: the smallest level of significance at which H0 will be rejected. Depending on
the direction of the test, the p-value indicates the probability of obtaining a value
in the sampling distribution of the test statistic less than or greater than the
calculated test statistic.
parameters: the characteristics of a population, usually denoted with Greek letters
(e.g. µ, σ).
parametric tests: statistical testing methods that use values which uniquely define a
probability distribution and involve testing estimates of parameter values.
percentile: a measure indicating the position of an observation within a data set (not
the same as a percentage). In general, the pth percentile is the value such that p
per cent of the items in the data set fall at or below that value.
permutation: the number of possible outcomes when order is important. Commonly
denoted by nPr.
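For illustration, the count can be computed with the standard formula n! / (n − r)!, which the glossary does not spell out. The helper name `permutations_count` is hypothetical; `math.perm` is the Python standard-library equivalent.

```python
import math

def permutations_count(n, r):
    """Number of ordered arrangements of r items taken from n distinct items."""
    return math.factorial(n) // math.factorial(n - r)

# Arranging 2 items out of 5, by the formula and by the standard library.
print(permutations_count(5, 2), math.perm(5, 2))
```

Each unordered selection of r items can be arranged in r! orders, so the count is r! times the number of selections made without regard to order.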
permutation of similar objects: a special kind of permutation used when some of the
objects, among the n objects, are not distinguishable.
pie chart: a graphical presentation of a variable relative to a totality using a circle
divided into sectors representing each category’s frequency proportional in size to
the total.
point estimate: a single numeric estimate of a population parameter calculated from
the information in a sample.
point estimation: see point estimate.
Poisson distribution: a discrete probability distribution describing independent events
that occur in a fixed time (or space) with a known average rate.
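A minimal sketch of the probability function, assuming the standard form P(X = x) = e^(−λ) λ^x / x!, where λ is the known average rate (the glossary does not give the formula). The function name and the rate used are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events when the average rate is lam."""
    if x < 0 or lam <= 0:
        raise ValueError("require x >= 0 and lam > 0")
    return math.exp(-lam) * lam**x / math.factorial(x)

# Hypothetical rate: an average of 2 events per interval.
for x in range(4):
    print(x, round(poisson_pmf(x, 2.0), 4))
```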
Poisson experiments: a series of trials or tests where the variable of interest follows a
Poisson distribution.
population: the entire collection of items/subjects possessing certain common
characteristics about which information is being sought.
population mean: the mean of all elements in a population.
posterior probabilities: reversed conditional probabilities used in Bayes’ Theorem.
power of a test: the probability that a test will reject the null hypothesis when it is in
fact false.
prediction: the value of the dependent variable obtained from a regression equation
using a particular value of the independent variable.
prior probability: a conditional probability based on previously observed frequencies
in a sample space or event.
probability: (i) the branch of mathematics incorporating the most important set of
concepts used in statistics; (ii) the measure of likelihood of the occurrence or
nonoccurrence of an event. The probability of an event, A, is denoted by P(A)
and can be classical, empirical, or subjective.
probability density: a function associated with a probability distribution that specifies
how the values of a random variable are distributed over its possible range.
probability distribution: for a given random variable, the list of all possible outcomes
and their associated probabilities.
probability function: a formula (or mathematical expression) expressing probabilities
associated with given values of a random variable.
properties of probability: (i) for any given event A, the probability of A must be
between zero and one; (ii) the sum of the probabilities of all possible events in a
sample space must equal one; and (iii) the sum of the probabilities of A and its
complement, Ā, must equal one.
qualitative survey methods: behavioural survey methods which are exploratory in
nature and are generally used to gain insight into a research problem or for
theory development.
qualitative variables: variables which can be placed into distinct categories according
to some characteristic.
quality control: see statistical process control.
quantitative survey methods: behavioural survey methods which employ rigorous
sampling methods and make it possible to draw inferences about populations.
quantitative variables: variables which are numerical in nature and indicate ‘how
many’ or ‘how much’ or ‘how big’ on a numeric scale.
quartiles: percentiles which divide a frequency distribution into four equal groups
corresponding to the 25th, 50th, and 75th percentiles.
R chart: a variable control chart used in statistical process control for measuring and
monitoring sample ranges.
random number: a number that is determined entirely by chance from some specified
distribution, without bias and without correlations between successive numbers.
random variable: a variable whose value is determined by the outcome of a random
experiment, denoted by capital letters, such as X, Y or Z.
randomized complete block design: an experimental design wherein each treatment is
applied to one experimental unit within each block, and treatments are randomly
allotted to the experimental units independently within each block.
range: the simplest measure of variation, calculated as the difference between the
highest and lowest values in a data set.
ratio scale: a scale of measurement similar to the interval scale, but where zero means
‘none’, and therefore, the ratio of two variables becomes meaningful.
rejection region: see critical region.
relative cumulative frequencies: cumulative frequencies expressed as percentages (or
proportions) of the total frequencies.
replication: applying the same treatment to more than one experimental unit within
an experimental design.
response variable: the variable of interest in an experimental design.
runs rule: a systematic procedure used in statistical process control to determine whether
a process is out of control based on a pattern of consecutive measurements.
runs test: a non-parametric method for testing if observations are drawn in random
order.
S chart: a variable control chart used in statistical process control for measuring and
monitoring sample standard deviations.
sample: a portion or subset of the population.
sample mean: the mean of all elements measured in a sample.
sample point: see element.
sample space: an event containing all possible outcomes of an experiment, denoted by (S).
sample survey: collection of information from a population through interviews or the
application of questionnaires to a sample from the group.
sampling: the collection of data from a subset of the population leading to prediction,
or inferences about the entire population. There is no attempt to control the
variable(s) of interest, rather a given situation is merely observed.
sampling distribution: the probability distribution of a statistic, e.g., a sample mean,
the difference between two means, a sample proportion, the difference between
two proportions, a single variance, or the ratio of two variances.
sampling distribution of the differences between two means: the probability
distribution for the random variable describing the differences between two
independent sample means.
sampling distribution of the mean: the probability distribution for the random
variable describing sample means.
sampling distribution of the statistic: see sampling distribution.
sampling error: uncertainty which occurs because observations arising from samples
tend to deviate from one sample to another (a natural consequence of taking
samples).
sampling with replacement: selection from a population such that each element can
appear in the sample as often as it is selected (the element is replaced every time it is
sampled). If a sample is selected with replacement, there are Nⁿ possible samples.
sampling without replacement: selection from a population such that each element of
a population can only be selected once (the element is not replaced when it is
sampled). If a sample is selected without replacement, there are NCn possible
samples.
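The two counts can be checked by enumeration. This sketch assumes a made-up population of N = 4 elements and samples of size n = 2; ordered samples are counted with replacement and unordered samples without replacement, matching the two formulas.

```python
from itertools import combinations, product
from math import comb

population = ["a", "b", "c", "d"]  # illustrative population, N = 4
n = 2                              # sample size
N = len(population)

# With replacement: each draw can be any element, so order is tracked.
with_replacement = list(product(population, repeat=n))

# Without replacement: each element appears at most once, order ignored.
without_replacement = list(combinations(population, n))

print(len(with_replacement), N**n)
print(len(without_replacement), comb(N, n))
```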
scale of measurement: a classification that refers to the nature of information
contained within a random variable and indicates what types of statistical
analyses are appropriate, e.g., nominal, ordinal, interval, or ratio scales.
shape: a quality of a distribution described by its frequency histogram or bar graph.
In the case of a Normal distribution, shape is defined by the variance, σ² (or
standard deviation, σ).
sign test: a non-parametric test of the median value of a single population that uses
plus and minus signs to identify differences between observations and their
median.
significance level: see level of significance.
simple event: an event which contains only one element of a sample space.
simple random sample: see simple random sampling.
simple random sampling: a sample selection method in which observations are drawn
randomly from a population and each sampling unit (or group of sampling units)
has the same probability of being chosen.
skewed: a quality of a frequency distribution that lacks symmetry with respect to a
central vertical axis through the distribution. Frequency distributions may be
skewed positively (i.e., have a long right tail) or negatively (i.e., have a long left
tail).
Spearman’s rank correlation test: a non-parametric test used to test the significance of
a sample correlation coefficient based on ranks known as Spearman’s rank
correlation coefficient.
standard deviation: a measure of variation in the same units as the original
observations (and the mean) which is the square root of the variance, denoted by
σ or σₓ for a population and s or sₓ for a sample.
standard error of the mean: the standard deviation of the sample means for a given
sample size.
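In practice the standard error of the mean is usually estimated from a single sample as s/√n, where s is the sample standard deviation and n is the sample size. A minimal Python sketch of that estimate follows; the function name and the data are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))

measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
se = standard_error_of_mean(measurements)
```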
standard error of the statistic: the standard deviation of a statistic for a given sample
size. It measures the spread of all possible values of a statistic.
standard normal distribution: a normal distribution with a mean of zero and variance
of one. A random variable, X, is transformed into a standard normal random
variable, Z, in order to use standard normal probability tables.
standard score: the relative position of an observation within a particular data set
expressed in terms of the mean and standard deviation.
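A standard score is commonly computed as z = (x − mean)/standard deviation. The sketch below assumes a made-up Normal population with mean 100 and standard deviation 15, and uses Python's `statistics.NormalDist` in place of a printed standard normal table; the function name `standard_score` is illustrative.

```python
from statistics import NormalDist

def standard_score(x, mean, sd):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / sd

z = standard_score(130.0, 100.0, 15.0)  # 2.0
# Area under the standard normal curve to the left of z.
p_below = NormalDist().cdf(z)
```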
statistical estimation: see estimation.
statistical hypothesis: see hypothesis.
statistical inference: see inferential statistics.
statistical process control: statistical procedures for measuring production-related
metrics and monitoring them on control charts.
statistical quality control: see statistical process control.
statistics: (i) the science of collecting, organizing, analysing and interpreting
information; (ii) numbers that describe characteristics of a sample from a
population. Statistics are usually denoted with Roman letters (e.g., x, p).
stratified random sampling: a sampling method in which the sampling units
(individual measurements) in a population are grouped together to form a
stratum on the basis of similarity of some characteristic or characteristics and
each group or stratum is treated as an individual population.
Student’s t distribution: see t distribution.
Sturges’ Rule: a formula used to determine the number of classes in a grouped
frequency distribution.
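Sturges' Rule is commonly stated as k = 1 + log2(n), equivalently k ≈ 1 + 3.322 log10(n), where n is the number of observations and k is the number of classes. The Python sketch below assumes that form and rounds up to a whole number of classes; the rounding convention is an assumption and may differ from the one used in the main text.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' Rule, rounded up."""
    return math.ceil(1 + math.log2(n))

k_100 = sturges_classes(100)  # 1 + 6.64... rounds up to 8
k_64 = sturges_classes(64)    # 1 + 6 = 7
```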
subjective probabilities: probabilities based solely on an individual’s experiences, or
‘educated guesses’, and not substantiated by exact scientific evidence.
subset: a group of elements, C, that are also elements of another (larger) event, A.
When C is a subset of A, it is denoted by (C ⊂ A).
sum of squares of the deviations from the mean: see corrected sum of squares.
symmetric: a quality of a distribution where a central vertical axis separates the
distribution into two identical (mirror image) or near-identical parts.
systematic sampling: a sampling method in which the sampling units are numbered
from 1 to N, and n units are selected using a regular interval.
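One common way to carry out systematic sampling is to set the interval to k = N // n, pick a random starting unit between 1 and k, and then take every k-th unit. The Python sketch below assumes that procedure; the function name and the optional `rng` parameter are illustrative.

```python
import random

def systematic_sample(N, n, rng=random):
    """Select n unit numbers from 1..N at a regular interval k = N // n."""
    k = N // n                 # sampling interval
    start = rng.randint(1, k)  # random start within the first interval
    return [start + i * k for i in range(n)]

units = systematic_sample(100, 10, random.Random(0))
```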
t distribution: the probability distribution of Student’s t statistic. The t distribution is
a symmetrical (about zero), bell-shaped curve. Its standard deviation depends on
the sample size, and will always be somewhat higher than one.
test of hypothesis: see hypothesis testing.
test statistic: a statistic computed from sample data which is compared to a critical
value to determine the outcome of a hypothesis test.
treatments: factors that are controlled or kept at fixed levels in order to estimate their
effect in experimental designs.
tree diagram: a systematic procedure for graphically listing all possible outcomes in a
sample space or an event.
trimmed mean: a special form of the mean, calculated after removing the upper and lower
5% of the ranked data, used in cases when very small or large values are apparent.
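A minimal Python sketch of a 5% trimmed mean follows. It assumes the number of values removed from each end is the sample size times 5%, rounded down; the data set, with one extreme value, is made up for illustration.

```python
def trimmed_mean(data, percent=5):
    """Mean after dropping the lowest and highest `percent` % of ranked values."""
    ranked = sorted(data)
    cut = len(ranked) * percent // 100  # values removed from each end
    kept = ranked[cut:len(ranked) - cut]
    return sum(kept) / len(kept)

data = list(range(1, 20)) + [1000]  # 20 hypothetical values, one extreme
ordinary = sum(data) / len(data)    # 59.5, pulled up by the extreme value
trimmed = trimmed_mean(data)        # 10.5, after dropping 1 and 1000
```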
two-stage sampling: sample selection which takes place in two distinct phases. First,
primary units are selected which are divisible into multiple secondary units, then
samples are selected from these secondary units.
two-tailed tests: hypothesis tests in which the null hypothesis can be refuted in two directions, i.e.,
the inequality in the alternative hypothesis is generally ‘not equal to’ some value.
type I error: the probability of rejecting H0 when it is true, denoted by α. The value of
α is decided on by the person conducting the test and is equal to the area under
the curve in the rejection region.
type II error: the probability of not rejecting (‘accepting’) H0 when it is false, denoted
by β. The value of β is rarely known to us because its value depends on
knowledge that we generally do not possess, namely the true value of the
population parameter, as well as on the sample size and the size of α (level of significance).
unbiased: the quality of a sample estimator when the mean of its sampling distribution
is equal to the population parameter. An unbiased estimate of the true population
parameter occurs when E(θ̂) = θ.
ungrouped frequency distributions: frequency distributions used to summarize
discrete quantitative variables using each unique value of the random variable.
uniform distribution: a discrete or continuous probability distribution whereby the
probability of every outcome is the same.
uniform probability distribution: see uniform distribution.
uniform random variable: a random variable which follows a uniform distribution.
union: for two given events, A and B, the event that contains all of the elements in A
or in B, including elements common to both. The union is denoted by A∪B.
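Python sets give a direct way to illustrate a union and to check the addition rule P(A∪B) = P(A) + P(B) – P(A∩B). The sketch below assumes a made-up experiment, one roll of a fair die with equally likely outcomes; the events A and B are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (hypothetical)
A = {2, 4, 6}                       # event: an even number
B = {5, 6}                          # event: a number greater than 4

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

union = A | B                       # {2, 4, 5, 6}
p_union = probability(union)        # 2/3
p_addition_rule = probability(A) + probability(B) - probability(A & B)
```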
upper class limit: the largest possible value that can fall into a given class in a grouped
frequency distribution.
upper confidence limit (UCL): see confidence limit.
upper control limit: the upper limit on a statistical process control chart, beyond
which production processes are said to be out of control.
upper warning limit: an upper limit on a statistical process control chart which is used
to draw attention to potential production-related problems.
variable charts: statistical process control charts used for monitoring variable data,
including X̄ charts, R charts and S charts.
variable data: in statistical process control, measured quantitative production-related
data.
variance: a measure of variation equal to the corrected sum of squares divided by its
degrees of freedom.
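For a sample, the degrees of freedom are n − 1, so the definition above can be sketched in Python as follows. The function name and the data are made up for illustration.

```python
def sample_variance(xs):
    """Corrected sum of squares divided by its degrees of freedom (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    corrected_ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return corrected_ss / (n - 1)

observations = [3.0, 5.0, 7.0, 9.0]        # hypothetical sample, mean 6
variance = sample_variance(observations)   # 20 / 3
```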
variance ratio test: a statistical test that determines if the ratio of two variances is
significantly different from a constant, usually one.
Venn diagram: a picture of events as they relate to each other within a sample space,
especially useful where compound (multiple) events are concerned. The sample
space is shown as the interior of a rectangle and the events are identified (often as
circles) as specified regions inside the rectangle.
weighted mean: a special formulation of the arithmetic mean used to find the average
of a number of values, attaching more importance to some values than to others
by assigning different weights to the n observations (representing their relative
contribution to the overall average).
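The weighted mean is commonly computed as the sum of weight × value divided by the sum of the weights. A minimal Python sketch follows, using two made-up marks where the second counts three times as much as the first.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

marks = [80.0, 90.0]   # hypothetical observations
weights = [1, 3]       # relative importance of each observation
average = weighted_mean(marks, weights)  # (80 + 270) / 4 = 87.5
```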
Wilcoxon rank sum test: a non-parametric test for comparing two unknown
population means.
Wilcoxon signed rank test: a non-parametric test of the median value of a single
population that uses the ranks of the absolute differences between observations and the
hypothesized median, together with the signs of those differences.
X̄ chart: a variable control chart used in statistical process control for measuring and
monitoring sample means.
Z distribution: a standard normal distribution.
Z-transformation: see standard normal distribution.