Glossary - FRST 231

20 Intro Prob Glossary 25/4/08 11:37 Page 387
Glossary
acceptance region: the range of values for a sample statistic where the null hypothesis is not rejected.
addition rule: a probability rule based on the union of events. For two events A and B, the addition rule is denoted by: P(A∪B) = P(A) + P(B) – P(A∩B).
alternative hypothesis: a statement which is contradictory to the null hypothesis, denoted by H1.
arithmetic average: see mean.
attribute charts: statistical process control charts used for monitoring attribute data, including p charts.
attribute data: in statistical process control, production-related data that require an operational definition of acceptable and defective products.
average: see mean.
bar graphs: graphical tools used to present information summarized in categorical frequency distributions or ungrouped frequency distributions created for discrete variables. Since the horizontal axis is not a continuous random variable, the bars do not touch each other.
Bayes’ Theorem: a logical proposition used to solve conditional probability problems that generally occur in reverse order of time. Bayes’ Theorem gives the conditional probability of the random variable A given B in terms of the marginal probability distribution of A alone and the conditional probability distribution of variable B given A.
bias: the amount by which a sample estimate systematically under- or over-estimates the true value of a parameter. Bias can occur, for example, when equipment used for recording measurements is not calibrated properly.
bimodal: a population or sample with two modes.
bivariate distribution: see joint probability distribution.
bivariate frequency distribution: the joint, simultaneous distribution of two variables.
bivariate normal distribution: a joint statistical distribution of two random variables which may or may not be correlated, and where each has a normal marginal distribution.
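As an illustration of the addition rule and Bayes’ Theorem entries above, the following Python sketch works through one roll of a fair six-sided die. The die, the events A and B, and the helper name prob are assumptions chosen for the example; none of them comes from the glossary.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is greater than 3


def prob(event):
    """Classical probability: favourable outcomes over possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


# Addition rule: P(A∪B) = P(A) + P(B) – P(A∩B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability of B given A, straight from the definition.
p_b_given_a = prob(A & B) / prob(A)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * prob(A) / prob(B)

print(p_union, p_a_given_b)
```

Exact fractions are used instead of floats so that the two sides of each rule can be compared for equality.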
blocks: groups of smaller, more uniform experimental units used in experimental designs if the experimental units, area, time or material are not homogeneous.
blocking: see blocks.
categorical frequency distributions: frequency distributions used to place qualitative, ordinal or nominal level variables into specific categories.
categorical variables: see qualitative variables.
Central Limit Theorem: one of the most important theorems in statistics, formalizing the relationship between a specific parameter of a population and its estimate (statistic). This theorem posits that when the sample size (n) is sufficiently large (n ≥ 30), the sampling distribution of sample means approaches a normal distribution with a mean equaling the population mean and the standard deviation equaling the standard error of the mean.
Chebyshev’s Theorem: a theorem which can be applied to samples or populations of any kind, and states that at least the fraction (1 − 1/k²) of the observations must lie within k standard deviations of the mean, regardless of the shape of the distribution of the data (where k is any constant greater than one).
chi-square (χ²) distribution: a positively skewed, positive-valued distribution that describes the sampling distribution of the variances. It has a mean of n – 1 and approaches the normal distribution at larger sample sizes.
circular permutation: the number of permutations of n distinct subjects positioned in a circle, denoted by Pc.
class boundaries: the values occurring halfway between the upper class limit of one interval and the lower class limit of the next interval in a frequency distribution.
class frequency: the number of observations that fall in a particular class in a frequency distribution.
class intervals: see classes.
class limits: the smallest and largest possible values that can fall into a given class in a frequency distribution.
class mark: see class midpoint.
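To make the Chebyshev’s Theorem entry above concrete, the sketch below compares the guaranteed lower bound with the fraction actually observed in a small, deliberately skewed data set. The data values and the function names are invented for illustration, and the data are treated as a whole population (population standard deviation).

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than one")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of values within k population standard deviations."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


# Hypothetical, strongly right-skewed data: the bound must still hold.
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 40]

for k in (1.5, 2, 3):
    print(k, chebyshev_lower_bound(k), fraction_within(data, k))
```

The observed fraction is usually well above the bound; the theorem only promises a floor that holds for any distribution shape.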
class midpoint: the average of the upper and lower class limits, or upper and lower class boundaries, of a class in a frequency distribution.
class width: the difference between the upper and lower class boundaries of a given class in a frequency distribution.
classes: the various bounded groupings (generally with similar intervals) defined for a frequency distribution within which data observations are placed.
classical probability: probability calculated as the ratio of the number of outcomes favourable to a particular event to the number of possible outcomes in a sample space.
coefficient of variation: the standard deviation expressed as a percentage of the mean.
collectively exhaustive: a quality of events where the sum of the probabilities for all possible events in the sample space equals unity.
combination: the number of possible outcomes when order is not important. Commonly denoted by the binomial coefficient (n written over r, in parentheses) or nCr and often stated as ‘n choose r’.
complement: the event containing all the elements of the sample space that are not contained in the event. The complement of B is denoted by B̄.
completely randomized design: the simplest of the experimental designs wherein treatments are randomly assigned to each experimental unit (in time or space).
compound event: an event that consists of two or more simple events.
conditional distribution: the distribution of a random variable given that other variables have certain specified values.
conditional probability: a redefined sample space, where a given event, B, has occurred, and we are interested in understanding the effect of this information on the probability of event A occurring. The conditional probability of event A given that event B has occurred is denoted by P(A|B).
confidence interval: for a given confidence level (or degree of confidence), the interval between the lower confidence limit (LCL) and upper confidence limit (UCL). See confidence limits.
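The combination and coefficient of variation entries above can be sketched in a few lines of Python. The sample values and the variable names are assumptions made for the example, and the sample (n – 1) standard deviation is used.

```python
import math
import statistics

# Combination: number of ways to choose r items from n when order is not important.
n, r = 5, 2
n_choose_r = math.comb(n, r)

# Coefficient of variation: standard deviation as a percentage of the mean.
# Hypothetical sample measurements.
sample = [4, 8, 6, 5, 7]
cv_percent = statistics.stdev(sample) / statistics.fmean(sample) * 100

print(n_choose_r, round(cv_percent, 2))
```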
388 Introductory Probability and Statistics
confidence level: the quantity, (1 – α)100%, which describes the degree of statistical certainty that can be attached to an observed statistic. The most frequently used values of α are 0.10, 0.05 and 0.01, resulting in 90%, 95% and 99% confidence intervals, respectively.
confidence limits: upper (UCL) and lower (LCL) bounds of the interval where the probability of finding the true parameter, θ, is set at a confidence value, 1 – α. The probability that we will find the true population parameter between LCL and UCL is 1 – α: P(LCL < θ < UCL) = 1 – α.
consistent: a quality of an estimator such that as the sample size, n, approaches infinity, the value of the estimator approaches the value of the population parameter. An unbiased estimator is consistent if, as n → ∞, var(θ̂) → 0 and θ̂ → θ.
continuity correction: a constant applied to a random variable (usually equal to half of the unit of measurement for continuous variables).
continuous random variable: a random variable defined over a continuous sample space, where the probability of any exact value is always zero.
continuous sample space: a sample space that contains an infinite and uncountable number of outcomes.
continuous variable: a quantitative variable that can take on all possible values over a specific interval.
control chart: a graphical device used in statistical process control to determine whether a production process is in or out of control based on sampled data.
control chart constants: conversion and correction factors used in the production of statistical process control charts.
corrected sum of squares: a measure of spread equal to the sum of squared deviations of each observation from the mean, so named because each observation is ‘corrected for’ the mean before it is squared.
covariance: the measure of joint variation between two random variables. Covariance may be zero (when two random variables are independent), positive (when the values of the variables increase together), or negative (when the value of one variable increases as the value of the other decreases).
critical region: the range of values for a sample statistic where the null hypothesis is rejected.
critical value: a selected arbitrary value along a statistical distribution, below or above which the null hypothesis is rejected.
cumulative frequency: the frequency of all observations less than a particular value of a random variable (for a frequency distribution, the upper class boundary of a given class). Often referred to as the ‘less than frequency’.
data: pieces of information collected on subjects or items from a population that form the building blocks of statistics.
deciles: divisions of the frequency distribution into ten equal groups that correspond to the 10th, 20th, ..., and 90th percentiles.
degree of confidence: see confidence level.
degrees of freedom: the number of unrestricted observations used to calculate a statistic.
dependent populations: random variables that occur in pairs and where the response value of one variable is at least partly a function of the response of the other.
dependent samples: sampled observations that occur in pairs and where the response value of one sample is at least partly a function of the response of the other.
descriptive statistics: a branch of statistics dealing with the collection, organization and presentation of information, and the calculation of some measures (statistics) which describe the information.
discrete random variable: a random variable defined over a discrete sample space.
discrete sample space: a sample space that contains a finite number of elements, or an unending but countable number of elements.
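The confidence level and confidence limits entries above do not give a formula, so the sketch below assumes the simplest textbook case: a confidence interval for a population mean when the population standard deviation is known, using the standard normal distribution. The function name and the numbers in the example call are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_limits(sample_mean, sigma, n, alpha=0.05):
    """Return (LCL, UCL) so that P(LCL < mu < UCL) = 1 - alpha.

    Assumes a normally distributed sample mean and a known population
    standard deviation sigma (an assumption, not stated in the glossary).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical sample: mean 50, sigma 10, n = 100, 95% confidence level.
lcl, ucl = mean_confidence_limits(50, 10, 100, alpha=0.05)
print(round(lcl, 2), round(ucl, 2))
```

Lowering α (for example to 0.01) widens the interval, which matches the glossary’s link between α and the 90%, 95% and 99% confidence levels.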
discrete variables: quantitative variables which take on whole numbers only and usually result from counting (tallying) items.
disjoint: see mutually exclusive.
distribution-free tests: see non-parametric tests.
efficient: the quality of the unbiased estimator of a given parameter, θ, having the smallest variance.
element: a single outcome of an experiment within a given sample space.
empirical probability: the likelihood of an event happening based on experiments for which all possible outcomes and the number of outcomes favouring the event are not known exactly, but have generally been observed.
Empirical Rule: a rule which states that approximately 68%, 95% and 99.7% of the observations from a normal distribution will lie within one, two or three standard deviations of the mean, respectively.
estimate: see point estimate.
estimation: the process of estimating the values of parameters based on measured or empirical data.
estimator: a function used to estimate an unknown parameter from observed data.
event: a subset or portion of the elements in a sample space.
expected value: the theoretical mean of a probability distribution, denoted by E(X), interpreted as the long-term average that is ‘expected’ if an experiment is conducted repeatedly.
experimental design: a means of collecting data in which one or more of the factors affecting the variable(s) of interest are controlled, with the purpose of investigating how these controlled factors affect the variable(s) of interest.
experimental error: the pooled variation among experimental units receiving the same treatment in an experimental design.
experimental study: see experimental design.
exponential distribution: the continuous counterpart to the Poisson distribution. The exponential distribution describes the elapsed times between occurrences of consecutive events as a function of the mean elapsed time.
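The percentages quoted in the Empirical Rule entry above can be recovered from the standard normal distribution. The short sketch below does this with Python’s standard library; the function name coverage_within is an assumption made for the example.

```python
from statistics import NormalDist


def coverage_within(k):
    """Probability that a normal observation lies within k standard deviations of the mean."""
    z = NormalDist()              # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


coverage = {k: coverage_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(k, f"{p:.4f}")
```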
F distribution: a distribution which describes the ratio of two independent χ²-values, where each is divided by its degrees of freedom. There exist many such curves, but each is positively skewed and positive-valued.
finite population: a population consisting of a fixed, countable number of elements, which can be, if necessary, listed.
finite population correction factor: a multiplicative adjustment used in the calculation of the standard error of the mean when the sample size is large relative to the population size, specifically when n > 0.05N.
frequency distribution: a systematic arrangement of data to describe a variable, where observations (raw data) are ordered or grouped into classes, and the frequency of observations is tallied and presented in tabular form. Frequency distributions can be categorical, ungrouped, or grouped.
frequency polygon: a graphical display of a frequency distribution constructed by plotting frequency (or relative frequency) against class mark (or value of the random variable in the case of ungrouped data), and then joining each point by a sequence of line segments. To close the polygon, an ‘imaginary’ class midpoint with zero frequency is added to both ends of the distribution.
geometric distribution: a discrete probability function which possesses all the properties of a binomial experiment except that trials are repeated until the first success occurs. The geometric random variable, X, represents the number of repeated independent trials required to produce the first success, the probability of which is p.
geometric experiment: see geometric distribution.
geometric mean: a special form of the mean that is used for ratio data like population growth, rates of change, economic indicators, etc. The geometric mean of n observations is the nth root of the product of the n observations.
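The finite population correction factor entry above describes the adjustment without stating it. A minimal sketch follows, assuming the commonly used form sqrt((N – n)/(N – 1)) multiplied into the usual standard error of the mean; the formula, function names and example numbers are assumptions rather than something given in the glossary.

```python
import math


def finite_population_correction(N, n):
    """Commonly used correction factor sqrt((N - n) / (N - 1)) (assumed form)."""
    return math.sqrt((N - n) / (N - 1))


def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the mean, with the correction applied when N is supplied."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= finite_population_correction(N, n)
    return se


# Hypothetical case: population of 1000, sample of 100, sigma = 10.
se_uncorrected = standard_error_of_mean(10, 100)
se_corrected = standard_error_of_mean(10, 100, N=1000)
print(se_uncorrected, round(se_corrected, 4))
```

The factor is always below one for n > 1, so the corrected standard error is smaller than the uncorrected one.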
grand mean: a special application of the weighting procedure used to find the overall combined mean of several groups of data when the mean of each individual group is known.
grouped frequency distribution: a frequency distribution usually used to summarize continuous (interval or ratio scale) variables.
H-test: see Kruskal-Wallis test.
harmonic mean: a special form of mean used for data where one element remains constant but another changes. The harmonic mean is calculated as the reciprocal of the mean of the reciprocals of the individual values.
histogram: a graphical tool for presenting the grouped frequency distribution of a continuous variable. As in a bar graph, the middle of each bar is the class midpoint; however, histograms contain no spaces between bars, so the bars touch at class boundaries.
hypergeometric distribution: a discrete probability distribution that has two possible outcomes, but where the probabilities of subsequent events are dependent upon previous outcomes. In other words, the probability of success from trial to trial is not constant and the successive trials (made without replacement from a finite population) are not independent.
hypothesis: a statement or claim made about a parameter or a certain characteristic of a population.
hypothesis testing: a procedure in applied statistics for determining whether a statement or claim made about a parameter or a certain characteristic of a population is plausible, based on some sample data collected from the population.
independence: two events are statistically independent if the probability of one event is not affected by the occurrence or nonoccurrence of the other event.
independent populations: two populations are statistically independent if the distribution of values in one population is not affected by the values in the other population.
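The geometric mean, grand mean and harmonic mean entries above each describe a calculation in words. The sketch below writes the three out in Python; the function names and the example figures (growth factors, travel speeds, group sizes) are hypothetical.

```python
import math


def geometric_mean(values):
    """nth root of the product of the n observations."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


def grand_mean(group_means, group_sizes):
    """Overall mean of several groups, weighting each group mean by its size."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)


# Hypothetical growth factors over two periods.
g = geometric_mean([2, 8])
# Hypothetical speeds over the same distance (distance constant, time changes).
h = harmonic_mean([40, 60])
# Hypothetical groups: mean 10 from 30 observations, mean 20 from 10 observations.
gm = grand_mean([10, 20], [30, 10])
print(g, h, gm)
```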
inferential statistics: a branch of statistics dealing with the generalization of information obtained in a sample to an entire population. Common procedures include estimation, hypothesis testing, determining relationships and prediction.
infinite population: a population where (in theory) there is no limit to the number of possible observations (or measurements). In sampling, the word ‘infinite’ is used rather loosely to refer to a population with a large number of possible measurements.
intersection: for two events, A and B, the event that contains all the elements common to both A and B. The intersection is denoted by A∩B.
interval estimate: see confidence interval.
interval estimation: the process of determining a confidence interval; that is, an interval within which we expect to find the unknown population parameter.
interval scale: a scale of measurement with the same properties as the ordinal scale, but where the data are always quantitative and the differences between data values are meaningful.
inverse cumulative frequency: the frequency of all values greater than a particular value of a random variable (for a frequency distribution, the lower class boundary of a given class). Often referred to as the ‘more than frequency’.
inverse relative cumulative frequencies: inverse cumulative frequencies expressed as percentages (or proportions) of the total frequencies.
joint probability distribution: for two random variables, X and Y, the probability distribution of X and Y together.
joint probability function: joint probability expressed as a function of the random variables X and Y. The function is denoted by f(x,y), which represents the probability that X assumes the value x at the same time Y assumes the value y.
Kruskal-Wallis test: a non-parametric test used to compare three or more unknown population means.
Latin square design: an experimental design used when the natural variation between experimental units cannot be reduced by simple blocking alone and the variation of the experimental units is removed in two directions.
layout: the placement of treatments on experimental units in an experimental design.
level of significance: the size of type I error (α). The value is arbitrary in that it is selected by the person carrying out the statistical test, but 0.1, 0.05 or 0.01 are generally used.
lower class limit: the smallest possible value that can fall into a given class in a grouped frequency distribution.
lower confidence limit (LCL): see confidence limits.
lower control limit: the lower limit on a statistical process control chart, beyond which production processes are said to be out of control.
lower warning limit: a lower limit on a statistical process control chart which is used to draw attention to potential production-related problems.
Mann-Whitney U-test: see Wilcoxon rank sum test.
marginal probability: the probability of some event, regardless of the outcome of other events. For a joint probability distribution f(x,y), the marginal probability f(x) results from constructing a probability distribution for X over all possible values of Y.
mathematical expectation: see expected value.
mathematical expectation of a random variable: see mean of a random variable.
mean: a measure of central tendency that is calculated by dividing the sum of the observations by the number of observations.
mean deviation: a measure of variation, calculated as the average of the absolute values of the deviations of each of the observations from the sample or population mean.
mean of a random variable: the weighted average of all possible outcomes of a random variable, where the weights are the probabilities of the respective outcomes.
mean square: see variance.
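The mean deviation and mean of a random variable entries above can be illustrated as follows. The fair die and the four data values are assumptions chosen for the example, as are the function names.

```python
from fractions import Fraction


def mean_of_random_variable(distribution):
    """Weighted average of outcomes, weighted by their probabilities."""
    return sum(x * p for x, p in distribution.items())


def mean_deviation(data):
    """Average of the absolute deviations of the observations from their mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)


# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_roll = mean_of_random_variable(die)

# Hypothetical observations.
md = mean_deviation([2, 4, 6, 8])
print(expected_roll, md)
```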
median: the middle value when a set of observations is arranged in increasing or decreasing order of magnitude, dividing the frequency distribution into two equal groups and corresponding to the 50th percentile. The median is the preferred measure of central location when extreme values are present.
midrange: a measure of central tendency defined as the average of the minimum and maximum values.
mode: a measure of central tendency defined as the most frequently occurring value in a sample or a population. Some data sets may have more than one mode (e.g. when several values occur with the greatest frequency) and others may have no mode at all.
multimodal: a population or sample with more than two modes.
multinomial distribution: a discrete probability distribution having all the properties of a binomial distribution, except that more than two outcomes are possible from each trial.
multiplication rule: a counting rule used to calculate the total number of outcomes for a sample space or event. The rule states that if a random experiment has a sequence of two steps, in which there are n1 possible outcomes for the first step and n2 for the second, the total number of outcomes is the product of the two numbers (n1 × n2).
multivariate hypergeometric distribution: a probability distribution having all the properties of a hypergeometric distribution, except there are more than two possible outcomes.
mutually exclusive: a quality ascribed to two or more events which have no common intersecting elements (i.e., when one event occurs the others cannot). For two mutually exclusive events, A and B, A∩B = ∅.
negative binomial distribution: a discrete probability distribution which is an extension of the binomial and geometric distribution, describing the situation where trials are repeated until a fixed number of successes, k, occurs.
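The multiplication rule entry above can be checked by listing a small sample space. In this sketch the two steps are a coin toss followed by a die roll, an example assumed for illustration.

```python
from itertools import product

# Step 1: toss a coin (n1 = 2 outcomes). Step 2: roll a die (n2 = 6 outcomes).
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Enumerate every (coin, die) pair in the sample space.
outcomes = list(product(coin, die))

# Multiplication rule: total number of outcomes is n1 × n2.
n1, n2 = len(coin), len(die)
print(len(outcomes), n1 * n2)
```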
nominal scale: a scale of measurement where numbers or categories are used to classify, name or label an individual or attribute, but the numbers or categories have no specific order or importance.
non-critical region: see acceptance region.
non-parametric test: a statistical test that makes no assumptions about the distribution or the parameters of the distribution from which observations are drawn.
non-sampling error: errors arising during the course of data collection that are not due to sampling. This includes errors from non-responses, improper coding, instrument miscalibration, etc.
normal distribution: a continuous, symmetrical, bell-shaped distribution whose shape and position are determined by the mean and standard deviation. Many of the most important theories in statistical inference are based on the normal distribution, also often referred to as the Gaussian distribution or the Laplacian distribution.
null hypothesis: a statement about a characteristic of the population assumed to be true, denoted by H0.
null space: an event containing no elements in a given sample space.
observational study: a study where investigators observe without altering or influencing the variable under study.
odds: a term used in subjective probability, often seen in gambling, sporting events, and horse racing, which refers to the ratio of the probability of an event occurring versus the probability of the event not occurring.
ogive: a graphical tool representing cumulative or inverse cumulative frequencies, plotted in a similar manner to a frequency polygon. The cumulative frequencies are plotted against the upper (cumulative) or lower (inverse cumulative) class boundaries and joined by line segments. Also known as a cumulative frequency or inverse cumulative frequency graph.
one-tailed tests: hypothesis tests in which the null hypothesis can be refuted in only one direction, i.e., the inequality in the alternative hypothesis is generally ‘less than’ or ‘greater than’ some value.
open class: in a grouped frequency distribution, when the first (or last) class has no lower (or upper) limit, to accommodate a very few (one or two) extreme observations in the data set.
operating characteristic (OC) curve: a curve describing how the values of β (the probability of ‘accepting’ the null hypothesis when it is false) change over a range of values of µ, n and/or α.
ordinal scale: a scale of measurement similar to the nominal scale, but where the order or rank of the categories is meaningful.
outcome: the result of an experiment.
outliers: extreme values in a data set.
p chart: an attribute control chart used in statistical process control for monitoring the sample proportion of defective products.
p-value: the smallest level of significance at which H0 will be rejected. Depending on the direction of the test, the p-value indicates the probability of obtaining a value in the sampling distribution of the test statistic less than or greater than the calculated test statistic.
parameters: the characteristics of a population, usually denoted with Greek letters (e.g. µ, σ).
parametric tests: statistical testing methods that use values which uniquely define a probability distribution and involve testing estimates of parameter values.
percentile: a measure indicating the position of an observation within a data set (not the same as a percentage). In general, the pth percentile is the value such that p per cent of the items in the data set fall at or below that value.
permutation: the number of possible outcomes when order is important. Commonly denoted by nPr.
permutation of similar objects: a special kind of permutation used when some of the objects, among the n objects, are not distinguishable.
pie chart: a graphical presentation of a variable relative to a totality, using a circle divided into sectors, each sized in proportion to its category’s share of the total frequency.
point estimate: a single numeric estimate of a population parameter calculated from the information in a sample.
point estimation: see point estimate.
Poisson distribution: a discrete probability distribution describing independent events that occur in a fixed time (or space) with a known average rate.
Poisson experiments: a series of trials or tests where the variable of interest follows a Poisson distribution.
population: the entire collection of items/subjects possessing certain common characteristics about which information is being sought.
population mean: the mean of all elements in a population.
posterior probabilities: reversed conditional probabilities used in Bayes’ Theorem.
power of a test: the probability that a test will reject the null hypothesis when it is in fact false.
prediction: the value of the dependent variable obtained from a regression equation using a particular value of the independent variable.
prior probability: a conditional probability based on previously observed frequencies in a sample space or event.
probability: (i) the branch of mathematics incorporating the most important set of concepts used in statistics; (ii) the measure of likelihood of the occurrence or nonoccurrence of an event. The probability of an event, A, is denoted by P(A) and can be classical, empirical, or subjective.
probability density: a function associated with a probability distribution that specifies how the values of a random variable are distributed over its possible range.
probability distribution: for a given random variable, the list of all possible outcomes and their associated probabilities.
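The permutation and permutation of similar objects entries above can be illustrated in Python. The glossary does not state the formula for similar objects; the sketch assumes the usual one, n! divided by the product of the factorials of the counts of each indistinguishable kind. The function name and the example word are hypothetical.

```python
import math
from collections import Counter

# Permutation (nPr): ordered arrangements of r objects chosen from n.
n_p_r = math.perm(5, 2)


def permutations_of_similar_objects(items):
    """n! / (n1! × n2! × ...), where n1, n2, ... count each indistinguishable kind."""
    total = math.factorial(len(items))
    for count in Counter(items).values():
        total //= math.factorial(count)   # exact at every step
    return total


# Distinct arrangements of the letters in an example word with repeated letters.
arrangements = permutations_of_similar_objects("MISSISSIPPI")
print(n_p_r, arrangements)
```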
probability function: a formula (or mathematical expression) expressing probabilities associated with given values of a random variable.
properties of probability: (i) for any given event A, the probability of A must be between zero and one; (ii) the sum of the probabilities of all possible events in a sample space must equal one; and (iii) the sum of the probabilities of A and its complement, Ā, must equal one.
qualitative survey methods: behavioural survey methods which are exploratory in nature and are generally used to gain insight into a research problem or for theory development.
qualitative variables: variables which can be placed into distinct categories according to some characteristic.
quality control: see statistical process control.
quantitative survey methods: behavioural survey methods which employ rigorous sampling methods and make it possible to draw inferences about populations.
quantitative variables: variables which are numerical in nature and indicate ‘how many’ or ‘how much’ or ‘how big’ on a numeric scale.
quartiles: percentiles which divide a frequency distribution into four equal groups corresponding to the 25th, 50th, and 75th percentiles.
R chart: a variable control chart used in statistical process control for measuring and monitoring sample ranges.
random number: a number that is determined entirely by chance from some specified distribution, without bias and without correlations between successive numbers.
random variable: a variable whose value is determined by the outcome of a random experiment, denoted by capital letters, such as X, Y or Z.
randomized complete block design: an experimental design wherein each treatment is applied to one experimental unit within each block, and treatments are randomly allotted to the experimental units independently within each block.
range: the simplest measure of variation, calculated as the difference between the highest and lowest values in a data set.
ratio scale: a scale of measurement similar to the interval scale, but where zero means ‘none’, and therefore, the ratio of two variables becomes meaningful.
rejection region: see critical region.
relative cumulative frequencies: cumulative frequencies expressed as percentages (or proportions) of the total frequencies.
replication: applying the same treatment to more than one experimental unit within an experimental design.
response variable: the variable of interest in an experimental design.
runs rule: a systematic procedure used in statistical process control to determine whether a process is out of control based on a pattern of consecutive measurements.
runs test: a non-parametric method for testing if observations are drawn in random order.
S chart: a variable control chart used in statistical process control for measuring and monitoring sample standard deviations.
sample: a portion or subset of the population.
sample mean: the mean of all elements measured in a sample.
sample point: see element.
sample space: an event containing all possible outcomes of an experiment, denoted by S.
sample survey: collection of information from a population through interviews or the application of questionnaires to a sample from the group.
sampling: the collection of data from a subset of the population leading to prediction, or inferences about the entire population. There is no attempt to control the variable(s) of interest, rather a given situation is merely observed.
sampling distribution: the probability distribution of a statistic, e.g., a sample mean, the difference between two means, a sample proportion, the difference between two proportions, a single variance, or the ratio of two variances.
sampling distribution of the differences between two means: the probability distribution for the random variable describing the differences between two independent sample means.
sampling distribution of the mean: the probability distribution for the random variable describing sample means.
sampling distribution of the statistic: see sampling distribution.
sampling error: uncertainty which occurs because observations arising from samples tend to deviate from one sample to another (a natural consequence of taking samples).
sampling with replacement: selection from a population such that each element can appear in the sample as often as it is selected (the element is replaced every time it is sampled). If a sample is selected with replacement, there are Nⁿ possible samples.
sampling without replacement: selection from a population such that each element of a population can only be selected once (the element is not replaced when it is sampled). If a sample is selected without replacement, there are NCn possible samples.
scale of measurement: a classification that refers to the nature of information contained within a random variable and indicates what types of statistical analyses are appropriate, e.g., nominal, ordinal, interval, or ratio scales.
shape: a quality of a distribution described by its frequency histogram or bar graph. In the case of a Normal distribution, shape is defined by the variance, σ² (or standard deviation, σ).
sign test: a non-parametric test of the median value of a single population that uses plus and minus signs to identify differences between observations and their median.
significance level: see level of significance.
simple event: an event which contains only one element of a sample space.
simple random sample: see simple random sampling.
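The sample counts in the sampling with replacement and sampling without replacement entries above (N to the power n, and N choose n) can be verified by enumeration for a small population. The four-element population and sample size of two are assumptions made for the example; samples with replacement are counted as ordered, and samples without replacement as unordered.

```python
import math
from itertools import combinations, product

# Hypothetical population of N = 4 elements, sample size n = 2.
population = ["a", "b", "c", "d"]
n = 2
N = len(population)

# With replacement: every ordered selection, elements may repeat.
with_replacement = list(product(population, repeat=n))

# Without replacement: every unordered selection of distinct elements.
without_replacement = list(combinations(population, n))

print(len(with_replacement), N**n)
print(len(without_replacement), math.comb(N, n))
```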
simple random sampling: a sample selection method in which observations are drawn randomly from a population and each sampling unit (or group of sampling units) has the same probability of being chosen. skewed: a quality of a frequency distribution that lacks symmetry with respect to a central vertical axis through the distribution. Frequency distributions may be skewed positively (i.e., have a long right tail) or negatively (i.e., have a long left tail). Spearman’s rank correlation test: a non-parametric test used to test the significance of a sample correlation coefficient based on ranks known as Spearman’s rank correlation coefficient. standard deviation: a measure of variation in the same units as the original observations (and the mean) which is the square root of the variance, denoted by σ or σₓ from a population and s or sₓ from a sample. standard error of the mean: the standard deviation of the sample means for a given sample size. standard error of the statistic: the standard deviation of a statistic for a given sample size. It measures the spread of all possible values of a statistic. standard normal distribution: a normal distribution with a mean of zero and variance of one. A random variable, X, is transformed into a standard normal random variable, Z, in order to use standard normal probability tables. standard score: the relative position of an observation within a particular data set expressed in terms of the mean and standard deviation. statistical estimation: see estimation. statistical hypothesis: see hypothesis. statistical inference: see inferential statistics. statistical process control: statistical procedures for measuring production-related metrics and monitoring them on control charts. statistical quality control: see statistical process control.
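As a rough illustration of the standard score and standard error of the mean entries, the sketch below computes both for a small made-up data set treated as a whole population. The function names and data are assumptions for illustration. The formulas used are the usual ones: z = (x − μ)/σ for the standard score, and σ/√n for the standard error of the mean when the observations are sampled independently.

```python
from math import sqrt
from statistics import mean, pstdev

# Made-up data, treated here as a complete population (an assumption).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(data)       # population mean
sigma = pstdev(data)  # population standard deviation


def standard_score(x, mu, sigma):
    """Position of x relative to the mean, in standard deviation units."""
    return (x - mu) / sigma


def standard_error_of_mean(sigma, n):
    """Standard deviation of sample means for independent samples of size n."""
    return sigma / sqrt(n)


z = standard_score(9.0, mu, sigma)
se = standard_error_of_mean(sigma, 4)
print(mu, sigma, z, se)
```

For these data the mean is 5 and the standard deviation is 2, so the observation 9 lies two standard deviations above the mean, and sample means based on n = 4 have a standard error of 1.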
statistics: (i) the science of collecting, organizing, analysing and interpreting information; (ii) numbers that describe characteristics of a sample from a population. Statistics are usually denoted with Roman letters (e.g., x, p). stratified random sampling: a sampling method in which the sampling units (individual measurements) in a population are grouped together to form a stratum on the basis of similarity of some characteristic or characteristics and each group or stratum is treated as an individual population. Student’s t distribution: see t distribution. Sturges’ Rule: a formula used to determine the number of classes in a grouped frequency distribution. subjective probabilities: probabilities based solely on an individual’s experiences, or ‘educated guesses’, and not substantiated by exact scientific evidence. subset: a group of elements, C, that are also elements of another (larger) event, A. When C is a subset of A, it is denoted by (C ⊂ A). sum of squares of the deviations from the mean: see corrected sum of squares. symmetric: a quality of a distribution where a central vertical axis separates the distribution into two identical (mirror image) or near-identical parts. systematic sampling: a sampling method in which the sampling units are numbered from 1 to N, and n units are selected using a regular interval. t distribution: the probability distribution of Student’s t statistic. The t distribution is a symmetrical (about zero), bell-shaped curve. Its standard deviation depends on the sample size, and will always be somewhat higher than one. test of hypothesis: see hypothesis testing. test statistic: a statistic computed from sample data which is compared to a critical value to determine the outcome of a hypothesis test. treatments: factors that are controlled or kept at fixed levels in order to estimate their effect in experimental designs. tree diagram: a systematic procedure for graphically listing all possible outcomes in a sample space or an event. 
trimmed mean: a special form of the mean, calculated after removing the upper and lower 5% of the ranked data, used in cases when very small or large values are apparent. two-stage sampling: sample selection which takes place in two distinct phases. First, primary units are selected which are divisible into multiple secondary units, then samples are selected from these secondary units. two-tailed tests: a hypothesis test which can be refuted in two directions, i.e., the inequality in the alternative hypothesis is generally ‘not equal to’ some value. type I error: the probability of rejecting H0 when it is true, denoted by α. The value of α is decided on by the person conducting the test and is equal to the area under the curve in the rejection region. type II error: the probability of not rejecting (‘accepting’) H0 when it is false, denoted by β. The value of β is rarely known to us because its value depends on knowledge that we generally do not possess, namely the true value of the population parameter, sample size and the size of α (level of significance). unbiased: the quality of a sample estimator when the mean of its sampling distribution is equal to the population parameter. An unbiased estimate of the true population parameter occurs when E(θ̂) = θ. ungrouped frequency distributions: frequency distributions used to summarize discrete quantitative variables using each unique value of the random variable. uniform distribution: a discrete or continuous probability distribution whereby the probability of every outcome is the same. uniform probability distribution: see uniform distribution. uniform random variable: a random variable which follows a uniform distribution. union: for two given events, A and B, the event that contains all of the elements in A or in B, including elements common to both. The union is denoted by A∪B.
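The trimmed mean entry can be illustrated with a short sketch. It assumes that "removing the upper and lower 5%" means dropping the integer part of 5% of the observations from each end of the ranked data; other rounding conventions exist, and the function name and data below are hypothetical.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the ranked data from each tail."""
    ranked = sorted(values)
    k = int(len(ranked) * proportion)  # observations dropped per tail
    return mean(ranked[k:len(ranked) - k])


# Twenty made-up observations with one extreme value.
data = list(range(1, 20)) + [1000]

plain = mean(data)
trimmed = trimmed_mean(data)
print(plain, trimmed)
```

With 20 observations, one value is removed from each tail, so the extreme value 1000 no longer dominates the result.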
upper class limit: the largest possible value that can fall into a given class in a grouped frequency distribution. upper confidence limit (UCL): see confidence limit. upper control limit: the upper limit on a statistical process control chart, beyond which production processes are said to be out of control. upper warning limit: an upper limit on a statistical process control chart which is used to draw attention to potential production-related problems. variable charts: statistical process control charts used for monitoring variable data, including X̄ charts, R charts and S charts. variable data: in statistical process control, measured quantitative production-related data. variance: a measure of variation equal to the corrected sum of squares divided by its degrees of freedom. variance ratio test: a statistical test that determines if the ratio of two variances is significantly different from a constant, usually one. Venn diagram: a picture of events as they relate to each other within a sample space, especially useful where compound (multiple) events are concerned. The sample space is shown as the interior of a rectangle and the events are identified (often as circles) as specified regions inside the rectangle. weighted mean: a special formulation of the arithmetic mean used to find the average of a number of values, attaching more importance to some values than to others by assigning different weights to the n observations (representing their relative contribution to the overall average). Wilcoxon rank sum test: a non-parametric test for comparing two unknown population means. Wilcoxon signed rank test: a non-parametric test of the median value of a single population that uses plus and minus signs to identify differences between observations and their median. X̄ chart: a variable control chart used in statistical process control for measuring and monitoring sample means. Z distribution: a standard normal distribution.
Z-transformation: see standard normal distribution.
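Two of the computational entries above, weighted mean and variance, can be sketched in a few lines. The function names and data are made up for illustration, and the variance function assumes sample data, so the degrees of freedom are taken to be n − 1.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def sample_variance(values):
    """Corrected sum of squares divided by n - 1 degrees of freedom."""
    n = len(values)
    m = sum(values) / n
    corrected_ss = sum((x - m) ** 2 for x in values)
    return corrected_ss / (n - 1)


# Made-up scores, with weights reflecting their relative importance.
scores = [80.0, 90.0, 70.0]
weights = [2, 3, 5]

wm = weighted_mean(scores, weights)
var = sample_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(wm, var)
```

Here the weighted mean is (2·80 + 3·90 + 5·70)/10 = 78, and the corrected sum of squares is 32, giving a sample variance of 32/7.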
