FBK Study Guide - OMIS 600 (Business Statistics)
Ch. 1: Data and Statistics
1. Statistics:
a. The term statistics can refer to numerical facts such as averages, medians, percents, and index numbers
that help us understand a variety of business and economic situations.
b. Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.
2. Data and Data Sets:
a. Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
b. All the data collected in a particular study are referred to as the data set for the study.
3. Elements, Variables, and Observations
a. Elements are the entities on which data are collected.
b. A variable is a characteristic of interest for the elements.
c. The set of measurements obtained for a particular element is called an observation.
d. A data set with n elements contains n observations.
e. The total number of data values in a complete data set is the number of elements multiplied by the
number of variables.
4. Scales of Measurement:
a. The scale determines the amount of information contained in the data.
b. The scale indicates the data summarization and statistical analyses that are most appropriate.
c. 4 Scales:
i. Nominal - Data are labels or names used to identify an attribute of the element.
ii. Ordinal - The data have the properties of nominal data and the order or rank of the data is
meaningful.
iii. Interval - The data have the properties of ordinal data, and the interval between observations is
expressed in terms of a fixed unit of measure. Interval data are always numeric.
iv. Ratio - The data have all the properties of interval data and the ratio of two values is
meaningful. Variables such as distance, height, weight, and time use the ratio scale. This scale
must contain a zero value that indicates that nothing exists for the variable at the zero point.
5. Qualitative vs. Quantitative Data
a. Qualitative
i. Data are categorical.
ii. Labels or names are used to identify an attribute of each element.
iii. Qualitative data use either the nominal or ordinal scale of measurement and can be either numeric or
nonnumeric.
b. Quantitative
i. Quantitative data are always numeric.
ii. Quantitative data indicate how many or how much:
1. discrete, if measuring how many
2. continuous, if measuring how much
6. Cross-sectional Data
a. Cross-sectional data are collected at the same or approximately the same point in time.
7. Time-series Data
a. Time series data are collected over several time periods.
8. Data Acquisition Considerations:
a. Time requirements
b. Cost of acquisition
c. Data errors
9. Types of Statistics:
a. Descriptive Statistics
i. Most of the statistical information in newspapers, magazines, company reports, and other
publications consists of data that are summarized and presented in a form that is easy to
understand.
ii. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as
descriptive statistics.
b. Inferential Statistics
i. Statistical inference - The process of using data obtained from a sample to make estimates and
test hypotheses about the characteristics of a population
1. Population - the set of all elements of interest in a particular study
2. Sample - a subset of the population
3. Census - collecting data for the entire population
4. Sample survey - collecting data for a sample
Ch. 2: Descriptive Statistics
1) Summarizing Qualitative (Categorical) Data
a) Frequency Distribution - A frequency distribution is a tabular summary of data showing the number (frequency)
of observations in each of several non-overlapping categories or classes.
b) Relative Frequency Distribution
i) The relative frequency of a class is the fraction or proportion of the total number of data items belonging to
the class.
ii) A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for
each class.
c) Percent Frequency Distribution
i) The percent frequency of a class is the relative frequency multiplied by 100.
ii) A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for
each class.
d) Bar Chart - A graphical display for depicting qualitative data.
i) On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
ii) A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the
vertical axis).
iii) A bar of fixed width is drawn above each class label, with its height set to the class's frequency, relative frequency, or percent frequency.
iv) The bars are separated to emphasize the fact that each class is a separate category.
v) Pareto Diagram - When the bars are arranged in descending order of height from left to right (with the most
frequently occurring cause appearing first) the bar chart is called a Pareto diagram.
e) Pie Chart
i) The pie chart is a commonly used graphical display for presenting relative frequency and percent frequency
distributions for categorical data.
ii) Use relative frequencies to subdivide a circle into sectors that correspond to the relative frequency for each
class.
2) Summarizing Quantitative Data
a) Frequency Distribution
b) Relative Frequency and Percent Frequency Distributions
c) Dot Plot
i) A horizontal axis shows the range of data values.
ii) Each data value is represented by a dot placed above the axis.
d) Histogram
i) The variable of interest is placed on the horizontal axis.
ii) A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency,
relative frequency, or percent frequency.
iii) Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
e) Cumulative Distributions
i) Cumulative frequency distribution - shows the number of items with values less than or equal to the upper
limit of each class.
ii) Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to
the upper limit of each class.
iii) Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to
the upper limit of each class.
f) Stem-and-Leaf Displays
i) A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
ii) It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
iii) The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we
record the last digit for each item in rank order.
iv) Each line in the display is referred to as a stem.
v) Each digit on a stem is a leaf.
3) Summarizing Data for Two Variables
a) Crosstabulation
i) A crosstabulation is a tabular summary of data for two variables.
ii) Crosstabulation can be used when:
(1) one variable is qualitative and the other is quantitative,
(2) both variables are qualitative, or
(3) both variables are quantitative.
b) Scatter Diagram and Trendline
i) A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
ii) One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
iii) The general pattern of the plotted points suggests the overall relationship between the variables.
iv) A trendline provides an approximation of the relationship.
c) Side-by-Side Bar Chart
d) Stacked Bar Chart
Ch. 3: Descriptive Statistics: Numerical Measures
1) Measures of Location
a) Three primary measures are the mean, median, and mode.
i) Mean - The mean of a data set is the average of all the data values and provides a measure of central
location.
ii) Median - The median of a data set is the value in the middle when the data items are arranged in ascending
order (for an even number of items, it is the average of the two middle values). Whenever a data set has
extreme values, the median is the preferred measure of central location.
iii) Mode - The mode of a data set is the value that occurs with greatest frequency.
2) Measures of Variability
a) It is often desirable to consider measures of variability (dispersion), as well as measures of location.
b) Basic measures:
i) Range - The range of a data set is the difference between the largest and smallest data values.
ii) Variance
(1) The variance is a measure of variability that utilizes all the data.
(2) It is based on the difference between the value of each observation (xi) and the mean (𝑥̅ for a sample, µ
for a population).
(3) The variance is the average of the squared differences between each data value and the mean (for a
sample, the sum of squared differences is divided by n - 1 rather than n).
iii) Standard Deviation - The standard deviation of a data set is the positive square root of the variance. It is
measured in the same units as the data, making it more easily interpreted than the variance.
iv) Coefficient of Variation - The coefficient of variation indicates how large the standard deviation is in relation
to the mean.
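As a quick illustration, here is a minimal sketch of these measures using Python's standard statistics module; the data values are made up:

```python
import statistics

# Hypothetical sample of five data values
data = [46, 54, 42, 46, 32]

data_range = max(data) - min(data)    # range: largest minus smallest value
mean = statistics.mean(data)          # sample mean
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # positive square root of the variance
cv = (std_dev / mean) * 100           # coefficient of variation, as a percent

print(data_range, mean, variance, std_dev, cv)
```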
3) Measures of Distribution Shape
a) Skewness - measure of the shape of a distribution
i) Symmetric (not skewed)
(1) Skewness is zero. The mean and median are equal.
ii) Moderately Skewed:
(1) Moderately Skewed Left - Skewness is negative. The mean will usually be less than the median.
(2) Moderately Skewed Right - Skewness is positive. The mean will usually be greater than the median.
iii) Highly Skewed:
(1) Skewness is positive (often above 1.0). The mean will usually be greater than the median.
b) z-Scores
i) An observation's z-score is a measure of the relative location of the observation in a data set (see the formula after this list).
c) Empirical Rule
i) The empirical rule can be used to determine the percentage of data values that must be within a specified
number of standard deviations of the mean. The empirical rule is based on the normal distribution.
(1) 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
(2) 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
(3) 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
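For reference, the z-score mentioned in item b) above is computed as follows (a standard formula, stated here since the guide does not spell it out):

$$z_i = \frac{x_i - \bar{x}}{s}$$

An observation with a z-score of +1.2, for example, lies 1.2 standard deviations above the sample mean; negative z-scores indicate values below the mean.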
4) Measures of Association Between Two Variables
a) Covariance
i) The covariance is a measure of the linear association between two variables.
b) Correlation (See the Overview of Select Statistical Methods Covered section for more details)
i) Correlation is a measure of linear association and not necessarily causation.
ii) The coefficient can take on values between -1 and +1.
(1) Values near -1 indicate a strong negative linear relationship.
(2) Values near +1 indicate a strong positive linear relationship.
(3) The closer the correlation is to zero, the weaker the relationship.
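A minimal sketch, using NumPy with made-up data, of how covariance and correlation relate:

```python
import numpy as np

# Hypothetical paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample covariance of x and y (np.cov divides by n - 1 by default)
cov_xy = np.cov(x, y)[0, 1]

# Pearson correlation: covariance divided by the product of the standard deviations
r = np.corrcoef(x, y)[0, 1]

print(cov_xy, r)  # r near +1 indicates a strong positive linear relationship
```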
Ch. 4: Intro to Probabilities
1) Probability
a) Probability is a numerical measure of the likelihood that an event will occur.
b) Probability values are always assigned on a scale from 0 to 1.
c) A probability near zero indicates an event is quite unlikely to occur.
d) A probability near one indicates an event is almost certain to occur.
2) Experiments and Sample Space
a) An experiment is any process that generates well-defined outcomes.
b) The sample space for an experiment is the set of all experimental outcomes.
c) An experimental outcome is also called a sample point.
3) Assigning Probabilities
a) Classical Method - Assigning probabilities based on the assumption of equally likely outcomes
b) Relative Frequency Method - Assigning probabilities based on experimentation or historical data
c) Subjective Method - Assigning probabilities based on judgment
4) Basic Relationships of Probability
a) Complement of an Event - The complement of event A is defined to be the event consisting of all sample points
that are not in A.
b) Union of Two Events - The union of events A and B is the event containing all sample points that are in A or B or
both.
c) Intersection of Two Events - The intersection of events A and B is the set of all sample points that are in both A
and B.
i) Addition Law - The addition law provides a way to compute the probability of event A, or B, or both A and B
occurring. (These relationships are summarized in formulas at the end of this chapter.)
d) Mutually Exclusive Events - Two events are said to be mutually exclusive if the events have no sample points in
common. Two events are mutually exclusive if, when one event occurs, the other cannot occur.
i) Conditional Probability - The probability of an event given that another event has occurred is called a
conditional probability.
ii) Multiplication Law - The multiplication law provides a way to compute the probability of the intersection of
two events.
e) Independent Events - If the probability of event A is not changed by the existence of event B, we would say that
events A and B are independent.
f) Venn Diagrams of Relationships
i) (Figures omitted: Venn diagrams illustrating the complement of an event, the union of two events, the intersection of two events, and mutually exclusive events.)
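The relationships above can be written compactly as follows (standard formulas consistent with the definitions in this chapter):

$$P(A^c) = 1 - P(A) \qquad \text{(complement)}$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad \text{(addition law)}$$
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{(conditional probability)}$$
$$P(A \cap B) = P(B)\,P(A \mid B) \qquad \text{(multiplication law)}$$
$$P(A \cap B) = P(A)\,P(B) \qquad \text{(A and B independent)}$$

For mutually exclusive events, P(A ∩ B) = 0, so the addition law reduces to P(A ∪ B) = P(A) + P(B).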
Ch. 5: Discrete Probability Distributions
1) Random Variables
a) A random variable is a numerical description of the outcome of an experiment.
b) A discrete random variable may assume either a finite number of values or an infinite sequence of values.
c) A continuous random variable may assume any numerical value in an interval or collection of intervals.
2) Discrete Probability Distribution
a) The probability distribution for a random variable describes how probabilities are distributed over the values of
the random variable.
b) The probability distribution is defined by a probability function, denoted by f(x), that provides the probability for
each value of the random variable.
c) Most common discrete probability distributions (a code sketch follows this list):
i) Binomial Distribution
(1) Four Properties of a Binomial Experiment:
(a) The experiment consists of a sequence of n identical trials.
(b) Two outcomes, success and failure, are possible on each trial.
(c) The probability of a success, denoted by p, does not change from trial to trial.
(d) The trials are independent.
ii) Poisson Distribution
(1) A Poisson distributed random variable is often useful in estimating the number of occurrences over a
specified interval of time or space.
(2) It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, . . . ).
(3) Two Properties of a Poisson Experiment
(a) The probability of an occurrence is the same for any two intervals of equal length.
(b) The occurrence or nonoccurrence in any interval is independent of the occurrence or
nonoccurrence in any other interval.
iii) Hypergeometric Distribution
(1) The hypergeometric distribution is closely related to the binomial distribution.
(2) However, for the hypergeometric distribution:
(a) the trials are not independent, and
(b) the probability of success changes from trial to trial.
(3) When the population size is large, a hypergeometric distribution can be approximated by a binomial
distribution.
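A minimal sketch of evaluating these three probability functions with SciPy (assumed available); the parameter values are made up:

```python
from scipy import stats  # assumes SciPy is installed

# Binomial: probability of x successes in n independent trials, success probability p
n, p = 10, 0.3
print(stats.binom.pmf(3, n, p))  # P(exactly 3 successes)

# Poisson: probability of x occurrences in an interval with mean mu occurrences
mu = 4.0
print(stats.poisson.pmf(2, mu))  # P(exactly 2 occurrences)

# Hypergeometric: population of `total` items, `successes` of them are successes,
# draw `draws` items without replacement
total, successes, draws = 20, 7, 5
print(stats.hypergeom.pmf(2, total, successes, draws))  # P(exactly 2 successes drawn)
```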
3) Bivariate Distributions- A probability distribution involving two random variables is called a bivariate probability
distribution. Each outcome of a bivariate experiment consists of two values, one for each random variable. Example:
rolling a pair of dice
Ch. 6: Continuous Probability Distributions
1) Continuous Probability Distributions
a) A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
b) It is not possible to talk about the probability of the random variable assuming a particular value. Instead, we
talk about the probability of the random variable assuming a value within a given interval.
c) The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be
the area under the graph of the probability density function between x1 and x2.
2) 3 continuous distributions:
a) Uniform Distribution
i) A random variable is uniformly distributed whenever the probability is proportional to the interval’s length.
ii) The area under the graph of f(x) and probability are identical.
b) Normal Distribution
i) The normal probability distribution is the most important distribution for describing a continuous random
variable and is widely used in statistical inference.
ii) Characteristics:
(1) The distribution is symmetric; its skewness measure is zero.
(2) The entire family of normal probability distributions is defined by its mean µ and its standard deviation
σ.
(3) The highest point on the normal curve is at the mean, which is also the median and mode.
(4) The standard deviation determines the width of the curve: larger values result in wider, flatter curves.
(5) Probabilities for the normal random variable are given by areas under the curve. The total area under
the curve is 1 (.5 to the left of the mean and .5 to the right).
iii) Empirical Rule
(1) The empirical rule can be used to determine the percentage of data values that must be within a
specified number of standard deviations of the mean. The empirical rule is based on the normal
distribution.
(a) 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
(b) 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
(c) 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
c) Standard Normal Probability Distribution
i) A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to
have a standard normal probability distribution.
ii) The letter z is used to designate the standard normal random variable.
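A short sketch (made-up parameters, assuming SciPy) showing the standard conversion z = (x − µ)/σ and how interval probabilities come from areas under the curve:

```python
from scipy import stats  # assumes SciPy is installed

mu, sigma = 500.0, 100.0  # hypothetical mean and standard deviation

# Convert x to a standard normal value: z = (x - mu) / sigma
z = (650.0 - mu) / sigma

# P(x <= 650) via the standard normal cdf, or directly with the given mean/sd
print(stats.norm.cdf(z))                 # standard normal
print(stats.norm.cdf(650.0, mu, sigma))  # same probability, no manual conversion

# The probability of falling in an interval is a difference of two cdf values
print(stats.norm.cdf(650.0, mu, sigma) - stats.norm.cdf(400.0, mu, sigma))
```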
3) Normal Approximation of Binomial Probabilities
a) When the number of trials, n, becomes large, evaluating the binomial probability function by hand or with a
calculator is difficult.
b) The normal probability distribution provides an easy-to-use approximation of binomial probabilities where np >
5 and n(1 - p) > 5.
4) Exponential Distribution
a) The exponential probability distribution is useful in describing the time it takes to complete a task.
b) The exponential random variables can be used to describe:
i) Time between vehicle arrivals at a toll booth
ii) Time required to complete a questionnaire
iii) Distance between major defects in a highway
c) A property of the exponential distribution is that the mean and standard deviation are equal.
d) The exponential distribution is skewed to the right. Its skewness measure is 2.
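For reference, the exponential density and cumulative probability for a mean of µ are (standard formulas, not spelled out in the original guide):

$$f(x) = \frac{1}{\mu}\,e^{-x/\mu} \quad (x \ge 0), \qquad P(x \le x_0) = 1 - e^{-x_0/\mu}$$

For example, if the mean time between arrivals is µ = 3 minutes, then P(x ≤ 2) = 1 − e^(−2/3) ≈ .4866.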
5) Relationship between the Poisson and Exponential Distributions
a) The Poisson distribution provides an appropriate description of the number of occurrences per interval.
b) The exponential distribution provides an appropriate description of the length of the interval between
occurrences.
Ch. 7: Sampling and Sampling Distributions
1) Introduction
a) Terms
i) An element is the entity on which data are collected.
ii) A population is a collection of all the elements of interest.
iii) A sample is a subset of the population.
iv) The sampled population is the population from which the sample is drawn.
v) A frame is a list of the elements from which the sample is selected.
b) Rationale for Sampling
i) The reason we select a sample is to collect data to answer a research question about a population.
ii) The sample results provide only estimates of the values of the population characteristics.
iii) The reason is simply that the sample contains only a portion of the population.
iv) With proper sampling methods, the sample results can provide “good” estimates of the population
characteristics.
2) Sampling
a) Sampling from a Finite Population
i) Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
ii) Sampling without replacement is the procedure used most often.
b) Sampling from an Infinite Population
i) Populations are often generated by an ongoing process where there is no upper limit on the number of units
that can be generated.
ii) Some examples of ongoing processes, with infinite populations, are:
(1) parts being manufactured on a production line
(2) transactions occurring at a bank
(3) telephone calls arriving at a technical help desk
(4) customers entering a store
3) Random Sampling:
a) In the case of an infinite population, we must select a random sample in order to make valid statistical
inferences about the population from which the sample is taken.
b) A random sample from an infinite population is a sample selected such that the following conditions are
satisfied.
i) Each element selected comes from the population of interest.
ii) Each element is selected independently.
4) Point Estimation
a) Point estimation is a form of statistical inference.
b) In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an
estimate of a population parameter.
i) 𝑥̅ is the point estimator of the population mean µ.
ii) s is the point estimator of the population standard deviation σ.
iii) 𝑝̅ is the point estimator of the population proportion p.
c) Practical Advice
i) The target population is the population we want to make inferences about.
ii) The sampled population is the population from which the sample is actually taken.
iii) Whenever a sample is used to make inferences about a population, we should make sure that the targeted
population and the sampled population are in close agreement.
d) Process of Statistical Inference (figure omitted)
5) Sampling Distribution of 𝑥̅
a) The sampling distribution of 𝑥̅ is the probability distribution of all possible values of the sample mean 𝑥̅ .
b) Notations:
i) 𝜎𝑥̅ = the standard deviation of 𝑥̅ . This is referred to as the standard error of the mean.
ii)  = the standard deviation of the population
iii) n = the sample size
iv) N = the population size
c) When the population has a normal distribution, the sampling distribution of 𝑥̅ is normally distributed for any
sample size.
d) In most applications, the sampling distribution of 𝑥̅ can be approximated by a normal distribution whenever the
sample size is 30 or more. In cases where the population is highly skewed or outliers are present, samples of size
50 may be needed.
e) Central Limit Theorem
i) In selecting random samples of size n from a population, the sampling distribution of the sample mean can
be approximated by a normal distribution as the sample size becomes large.
6) Relationship Between the Sample Size and the Sampling Distribution of 𝑥̅
a) Whenever the sample size is increased, the standard error of the mean 𝜎𝑥̅ is decreased.
b) Smaller standard errors mean that values of 𝑥̅ have less variability and tend to be closer to the actual population
mean.
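The standard error of the mean referenced above is computed as follows (standard formulas using the notation defined in 5b):

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(infinite population)}, \qquad \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}}\,\frac{\sigma}{\sqrt{n}} \quad \text{(finite population)}$$

Because n appears under a square root, quadrupling the sample size cuts the standard error in half.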
7) Other Sampling Methods
a) Stratified Random Sampling
i) Process:
(1) The population is first divided into groups of elements called strata.
(2) Each element in the population belongs to one and only one stratum.
(3) Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a
homogeneous group).
(4) A simple random sample is taken from each stratum.
ii) Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a
smaller total sample size.
b) Cluster Sampling
i) Process:
(1) The population is first divided into separate groups of elements called clusters.
(a) Ideally, each cluster is a representative small-scale version of the population (i.e. heterogeneous
group).
(2) A simple random sample of the clusters is then taken.
(3) All elements within each sampled (chosen) cluster form the sample.
ii) Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be
obtained in a short time).
iii) Disadvantage: This method generally requires a larger total sample size than simple or stratified random
sampling.
c) Systematic Sampling
i) This method has the properties of a simple random sample, especially if the list of the population elements
is a random ordering.
ii) Advantage: The sample usually will be easier to identify than it would be if simple random sampling were
used.
iii) Example: Selecting every 100th listing in a telephone book after the first randomly selected listing
d) Convenience Sampling
i) It is a nonprobability sampling technique. Items are included in the sample without known probabilities of
being selected.
ii) The sample is identified primarily by convenience.
iii) Example: A professor conducting research might use student volunteers to constitute a sample.
iv) Advantage: Sample selection and data collection are relatively easy.
v) Disadvantage: It is impossible to determine how representative of the population the sample is.
e) Judgment Sampling
i) The person most knowledgeable on the subject of the study selects elements of the population that he or
she feels are most representative of the population.
ii) It is a nonprobability sampling technique.
iii) Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of
the senate.
iv) Advantage: It is a relatively easy way of selecting a sample.
v) Disadvantage: The quality of the sample results depends on the judgment of the person selecting the
sample.
8) Sampling Recommendations
a) It is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used.
b) For these methods, formulas are available for evaluating the “goodness” of the sample results in terms of the
closeness of the results to the population parameters being estimated.
c) An evaluation of the goodness cannot be made with non-probability (convenience or judgment) sampling
methods.
Ch. 8: Interval Estimation
1) Introduction
a) A point estimator cannot be expected to provide the exact value of the population parameter.
b) An interval estimate can be computed by adding and subtracting a margin of error to the point estimate.
i) Point Estimate +/- Margin of Error
c) The purpose of an interval estimate is to provide information about how close the point estimate is to the value
of the parameter.
d) In order to develop an interval estimate of a population mean, the margin of error must be computed using
either:
i) the population standard deviation σ (σ known), or
ii) the sample standard deviation s (σ unknown)
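These two cases lead to the following interval estimates of µ (standard formulas; z_α/2 and t_α/2 are the values cutting off an area of α/2 in the upper tail of the standard normal and t distributions, respectively):

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known}), \qquad \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \quad (\sigma \text{ unknown, } n-1 \text{ degrees of freedom})$$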
2) Confidence Level
a) In survey sampling, different samples can be randomly selected from the same population; and each sample can
often produce a different confidence interval. Some confidence intervals include the true population parameter;
others do not.
b) A confidence level refers to the percentage of all possible samples that can be expected to include the true
population parameter.
i) For example, suppose all possible samples were selected from the same population, and a confidence
interval were computed for each sample. A 95% confidence level implies that 95% of the confidence
intervals would include the true population parameter.
ii) The value .95 is referred to as the confidence coefficient.
c) In order to have a higher degree of confidence, the margin of error and thus the width of the confidence interval
must be larger.
3) Population Mean: σ Known
a) σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other
information.
b) Adequate Sample Size:
i) In most applications, a sample size of n = 30 is adequate.
ii) If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is
recommended.
iii) If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will
suffice.
iv) If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.
4) Population Mean: σ Unknown
a) If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample
standard deviation s to estimate σ.
b) In this case, the interval estimate for µ is based on the t distribution.
i) The t distribution is a family of similar probability distributions.
ii) A specific t distribution depends on a parameter known as the degrees of freedom.
iii) Degrees of freedom refer to the number of independent pieces of information that go into the
computation of s.
(1) A t distribution with more degrees of freedom has less dispersion.
(2) As the degrees of freedom increases, the difference between the t distribution and the standard
normal probability distribution becomes smaller and smaller.
(3) For more than 100 degrees of freedom, the standard normal z value provides a good approximation to
the t value.
c) Adequate Sample Size:
i) In most applications, a sample size of n = 30 is adequate to develop an interval estimate of a population
mean.
ii) If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is
recommended.
iii) If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will
suffice.
iv) If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.
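A minimal sketch (made-up sample statistics, assuming SciPy) computing both forms of interval; the t-based interval is slightly wider, reflecting the t distribution's extra dispersion:

```python
import math
from scipy import stats  # assumes SciPy is installed

# Hypothetical sample statistics
n, x_bar, s = 25, 32.0, 6.0

# t-based 95% interval (sigma unknown): x_bar +/- t * s / sqrt(n)
t_val = stats.t.ppf(0.975, df=n - 1)  # upper-tail value for alpha/2 = .025
margin_t = t_val * s / math.sqrt(n)
print(x_bar - margin_t, x_bar + margin_t)

# z-based interval (sigma known), with sigma assumed equal to 6.0 for comparison
z_val = stats.norm.ppf(0.975)
margin_z = z_val * 6.0 / math.sqrt(n)
print(x_bar - margin_z, x_bar + margin_z)
```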
Ch. 9: Hypothesis Testing
1) Hypothesis Testing
a) Hypothesis testing can be used to determine whether a statement about the value of a population parameter
should or should not be rejected.
b) The null hypothesis, denoted by H0 , is a tentative assumption about a population parameter.
c) The alternative hypothesis, denoted by Ha, is the opposite of what is stated in the null hypothesis.
d) The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by H0
and Ha.
e) Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. A
hypothesis test consists of four steps.
i) Formulate the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are
stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice
versa.
ii) Identify the test statistic. This involves specifying the statistic (e.g., a mean score, proportion) that will be
used to assess the validity of the null hypothesis.
iii) Formulate a decision rule. A decision rule is a procedure that the researcher uses to decide whether to
reject the null hypothesis.
iv) Test the null hypothesis. Use the decision rule to evaluate the test statistic. If the statistic is consistent with
the null hypothesis, you cannot reject the null hypothesis; otherwise, reject the null hypothesis.
2) Developing Null and Alternative Hypotheses
a) Forms for Null and Alternative Hypotheses:
i) The equality part of the hypotheses always appears in the null hypothesis.
ii) In general, a hypothesis test about the value of a population mean µ must take one of the following
three forms (where µ0 is the hypothesized value of the population mean).
One-tailed (lower-tail): 𝐻0: 𝜇 ≥ 𝜇0, 𝐻𝑎: 𝜇 < 𝜇0
One-tailed (upper-tail): 𝐻0: 𝜇 ≤ 𝜇0, 𝐻𝑎: 𝜇 > 𝜇0
Two-tailed: 𝐻0: 𝜇 = 𝜇0, 𝐻𝑎: 𝜇 ≠ 𝜇0
iii) In general, a hypothesis test about the value of a population proportion p must take one of the following
three forms (where p0 is the hypothesized value of the population proportion).
One-tailed (lower-tail): 𝐻0: 𝑝 ≥ 𝑝0, 𝐻𝑎: 𝑝 < 𝑝0
One-tailed (upper-tail): 𝐻0: 𝑝 ≤ 𝑝0, 𝐻𝑎: 𝑝 > 𝑝0
Two-tailed: 𝐻0: 𝑝 = 𝑝0, 𝐻𝑎: 𝑝 ≠ 𝑝0
3) Type I and Type II Errors
a) Because hypothesis tests are based on sample data, we must allow for the possibility of errors.
b) Type I Error
i) A Type I error is rejecting H0 when it is true.
ii) The probability of making a Type I error when the null hypothesis is true as an equality is called the level of
significance.
iii) Applications of hypothesis testing that only control the Type I error are often called significance tests.
c) Type II Error
i) A Type II error is accepting H0 when it is false.
ii) It is difficult to control for the probability of making a Type II error.
iii) Statisticians avoid the risk of making a Type II error by using “do not reject H0” and not “accept H0”.
4) Hypothesis Testing and Decision Making
a) Evaluating the Null Hypothesis
i) p-value approach:
(1) The p-value is the probability, computed using the test statistic, that measures the support (or lack of
support) provided by the sample for the null hypothesis.
(2) If the p-value is less than or equal to the level of significance α, the value of the test statistic is in the
rejection region.
(3) Reject H0 if the p-value ≤ α.
b) Critical Value Approach
i) The test statistic z has a standard normal probability distribution.
ii) We can use the standard normal probability distribution table to find the z-value with an area of α in the
lower (or upper) tail of the distribution.
iii) The value of the test statistic that establishes the boundary of the rejection region is called the critical value
for the test.
iv) The rejection rule is:
(1) Lower tail: Reject H0 if z < -zα
(2) Upper tail: Reject H0 if z > zα
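A minimal sketch of the p-value approach for a two-tailed test of a population mean, using made-up data and SciPy's one-sample t test:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical sample; test H0: mu = 12 against Ha: mu != 12 (two-tailed)
sample = [11.2, 12.8, 13.1, 10.9, 12.4, 13.5, 11.8, 12.9]
alpha = 0.05

t_stat, p_value = stats.ttest_1samp(sample, popmean=12.0)

# p-value approach: reject H0 if p-value <= alpha
if p_value <= alpha:
    print("Reject H0")
else:
    print("Do not reject H0")  # note: "do not reject", not "accept"
```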
Overview of Select Statistical Methods Covered:
1) Correlation
a) Overview:
i) Correlation is a measure of the relation between two or more variables.
ii) The measurement scales used should be at least interval scales, but other correlation coefficients are
available to handle other types of data.
iii) Correlation coefficients can range from -1.00 to +1.00.
(1) The value of -1.00 represents a perfect negative correlation.
(2) A value of +1.00 represents a perfect positive correlation.
(3) A value of 0.00 represents a lack of correlation.
b) Simple Linear Correlation (Pearson r)
i) Pearson correlation assumes that the two variables are measured on at least interval scales, and it
determines the extent to which values of the two variables are "proportional" to each other.
ii) The value of correlation (i.e., correlation coefficient) does not depend on the specific measurement units
used.
(1) For example, the correlation between height and weight will be identical regardless of whether inches
and pounds, or centimeters and kilograms are used as measurement units.
iii) Proportional means linearly related; that is, the correlation is high if it can be "summarized" by a straight line
(sloped upwards or downwards).
(1) This line is called the regression line or least squares line, because it is determined such that the sum of
the squared distances of all the data points from the line is the lowest possible.
iv) How to interpret the values of correlations:
(1) As mentioned before, the correlation coefficient (r) represents the linear relationship between two
variables.
(2) If the correlation coefficient is squared, then the resulting value (r2, the coefficient of determination) will
represent the proportion of common variation in the two variables (i.e., the "strength" or "magnitude"
of the relationship). In order to evaluate the correlation between variables, it is important to know this
"magnitude" or "strength" as well as the significance of the correlation.
2) ANOVA
a) Overview:
i) Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means.
ii) In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means.
(1) If we are only comparing two means, ANOVA will produce the same results as the t test for independent
samples (if we are comparing two different groups of cases or observations) or the t test for dependent
samples (if we are comparing two variables in one set of cases or observations).
iii) Why the name analysis of variance?
(1) It may seem odd that a procedure that compares means is called analysis of variance. However, this
name is derived from the fact that in order to test for statistical significance between means, we are
actually comparing (i.e., analyzing) variances.
iv) We want to use the sample results to test the following hypotheses:
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
v) If H0 is rejected, we cannot conclude that all population means are different.
vi) Rejecting H0 means that at least two population means have different values.
b) Assumptions:
i) For each population, the response (dependent) variable is normally distributed.
ii) The variance of the response variable, denoted σ², is the same for all of the populations.
iii) The observations must be independent.
c) Interpreting ANOVA
i) If the analysis of variance provides statistical evidence to reject the null hypothesis of equal population
means, Fisher’s least significant difference (LSD) procedure can be used to determine where the differences
occur.
d) For more information on ANOVA, visit: http://www.statsoft.com/Textbook/ANOVA-MANOVA#basic
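A minimal sketch of a one-way ANOVA with made-up data, assuming SciPy is available:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical samples from three populations (e.g., three plant locations)
group_a = [85, 75, 82, 76, 71]
group_b = [71, 75, 73, 74, 69]
group_c = [59, 64, 62, 69, 75]

# One-way ANOVA: H0 is that all population means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)

# Reject H0 if p_value <= alpha; rejection says at least two means differ,
# not that all means differ (a follow-up such as Fisher's LSD locates the differences)
print(f_stat, p_value)
```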
3) Regression
a) Overview:
i) Regression analysis can be used to develop an equation showing how the variables are related.
ii) The variable being predicted is called the dependent variable and is denoted by y.
iii) The variables being used to predict the value of the dependent variable are called the independent
variables and are denoted by x.
b) Simple Linear Regression
i) Simple linear regression involves one independent variable and one dependent variable.
(1) The relationship between the two variables is approximated by a straight line.
(2) The equation that describes how y is related to x and an error term is called the regression model.
(3) The simple linear regression model is: y = β0 + β1x + ε
where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
ii) Testing for Significance
(1) To test for a significant regression relationship, we must conduct a hypothesis test to determine
whether the value of β1 is zero.
(2) Two tests are commonly used: t Test and F Test
(3) Both the t test and F test require an estimate of σ², the variance of ε in the regression model. The
mean square error (MSE) provides the estimate of σ², and the notation s² is also used. By taking the
square root of s², we get s, which is called the standard error of the estimate.
iii) Assumptions:
(1) Linearity
(2) Normality
iv) Limitations:
(1) The major conceptual limitation of all regression techniques is that you can only ascertain relationships,
but can never be sure about the underlying causal mechanism.
(2) For example, you would find a strong positive relationship (correlation) between the damage that a fire
does and the number of firemen involved in fighting the blaze. Do we conclude that the firemen cause
the damage? Of course, the most likely explanation of this correlation is that the size of the fire (an
external variable that we forgot to include in our study) caused the damage as well as the involvement
of a certain number of firemen (i.e., the bigger the fire, the more firemen are called to fight the blaze).
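A minimal sketch, with made-up data and SciPy, tying together the least squares line, r², and the significance test for the slope:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical (x, y) data
x = [1, 2, 3, 4, 5, 6]
y = [2.9, 5.1, 6.8, 9.2, 10.9, 13.1]

# Least squares line: minimizes the sum of squared vertical distances to the line
result = stats.linregress(x, y)

print(result.slope, result.intercept)  # estimates b1 and b0
print(result.rvalue ** 2)              # r^2, the coefficient of determination
print(result.pvalue)                   # p-value for the t test of H0: slope = 0
```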
c) Multiple Regression
i) Regression analysis involving two or more independent variables is called multiple regression.
(1) The equation that describes how the dependent variable y is related to the independent variables x1, x2,
. . . xp and an error term is: y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term
ii) Interpreting the Coefficients
(1) In multiple regression analysis, we interpret each regression coefficient as follows:
bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other
independent variables are held constant.
iii) Testing for Significance
(1) In simple linear regression, the F and t tests provide the same conclusion.
(2) In multiple regression, the F and t tests have different purposes.
(a) F test:
(i) The F test is used to determine whether a significant relationship exists between the dependent
variable and the set of all the independent variables.
(ii) The F test is referred to as the test for overall significance.
(b) t test:
(i) If the F test shows an overall significance, the t test is used to determine whether each of the
individual independent variables is significant.
(ii) A separate t test is conducted for each of the independent variables in the model.
(iii) We refer to each of these t tests as a test for individual significance.
iv) Multicollinearity
(1) The term multicollinearity refers to the correlation among the independent variables.
(2) When the independent variables are highly correlated (say, |r | > .7), it is not possible to determine the
separate effect of any particular independent variable on the dependent variable.
(3) If the estimated regression equation is to be used only for predictive purposes, multicollinearity is
usually not a serious problem.
(4) Every attempt should be made to avoid including independent variables that are highly correlated.
v) Using the Estimated Regression Equation for Estimation and Prediction
(1) The procedures for estimating the mean value of y and predicting an individual value of y in multiple
regression are similar to those in simple regression.
(2) We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the
corresponding value of ŷ as the point estimate.
vi) Choice of the number of variables:
(1) Multiple regression is a seductive technique: "plug in" as many predictor variables as you can think of
and usually at least a few of them will come out significant. This is because you are capitalizing on
chance when simply including as many variables as you can think of as predictors of some other variable
of interest. This problem is compounded when, in addition, the number of observations is relatively low.
Intuitively, it is clear that you can hardly draw conclusions from an analysis of 100 questionnaire items
based on 10 respondents.
(2) A general recommendation is to have at least 10 to 20 times as many observations (cases, respondents)
as you have variables; otherwise the estimates of the regression line are probably very unstable and
unlikely to replicate if you were to conduct the study again.
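A minimal multiple regression sketch using NumPy least squares with made-up data (a real analysis would want far more observations per variable, per the rule of thumb above):

```python
import numpy as np

# Hypothetical data: predict y from two independent variables x1 and x2
X = np.array([[1.0, 2.0, 50.0],
              [1.0, 4.0, 40.0],
              [1.0, 5.0, 60.0],
              [1.0, 7.0, 55.0],
              [1.0, 8.0, 70.0]])  # leading column of 1s estimates the intercept b0
y = np.array([9.0, 12.0, 15.0, 18.0, 22.0])

# Least squares estimates b0, b1, b2 of the model parameters
b, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(b)

# Point estimate of y for x1 = 6, x2 = 58: substitute into the estimated equation
print(np.array([1.0, 6.0, 58.0]) @ b)
```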
Draft Created - Fall 2015 (JW)