Benedictine University
MGT 150 Business Statistics I, Sec. B
Spring, 2017
Class Location: GN-312
Class Meeting Times: TuTh-8:00
Office Hours: MW–10:00-1:00, TuTh–9:30-11:00; GN-166
Instructor: Jeffrey M. Madura
B.A. University of Notre Dame
M.B.A. Northwestern University
C.P.A. State of Illinois
Contact Information: 630-829-6467 / [email protected]
Website: http://www.ben.edu/faculty/jmadura/home.htm
Course Description: (from the Catalog) Basic course in statistical technique, includes
measures of central tendency, variability, probability theory, sampling, estimation and
hypothesis testing. Computational, Mathematical and Analytical Mode of Inquiry (QCM).
Three semester hours.
(Instructor's description) This is a course in introductory statistics. The
orientation is toward applications and problem-solving, not mathematical theory.
The instructor intends that students gain an appreciation for the usefulness of
statistical methods in analyzing data commonly encountered in business and the
social and natural sciences. The course is a framework within which students may
learn the subject matter. This framework consists of a program of study,
opportunity for questions/discussion, explanation, and evaluative activities
(quizzes). The major topics are:
o Data and Statistics
o Descriptive Statistics: Tabular and Graphical Presentations
o Descriptive Statistics: Numerical Measures
o Introduction to Probability
o Discrete Probability Distributions
o Continuous Probability Distributions
o Sampling and Sampling Distributions
o Interval Estimation, Means and Proportions
o Hypothesis Tests, Means and Proportions
Learning Objectives: below
Course Expectations: The instructor expects students to learn the terminology,
understand the concepts, and apply the computational procedures described at the end
of each of the five parts of the Course Outline that follows this syllabus.
College of Business Learning Objectives:
The course addresses the following College of Business Program Objectives:
Students in this program will receive a thorough grounding in Mathematics and
Statistics.
IDEA objectives: This course emphasizes the following IDEA objectives:
Learning fundamental principles, generalizations, or theories.
Learning to apply course material to improve thinking, problem-solving, and
decision-making.
Developing specific skills, competencies and points of view needed by professionals
in the fields most closely related to this course.
Prerequisites: MATH 105 or MATH 110.
Software: Familiarity with Microsoft Excel is expected.
Required Text and Materials:
Textbook: Modern Business Statistics with Microsoft Office Excel, 5th edition;
Anderson, Sweeney & Williams, South-Western/Cengage, 2015;
ISBN: 978-1-285-43330-1 (hard cover)
Other: Aplia interactive learning/assignment system. Aplia includes the textbook as
an e-book.
TI-83 or TI-84 calculator.
Course Schedule: The course is divided into five three-week parts, with a quiz at the
end of each part. Dates are subject to change.
Weeks 1-4     starts 1/17    Introduction; Descriptive Statistics
Weeks 5-7     starts 2/13    Probability; Tests for Dependence
Weeks 8-10    starts 3/6     Permutations and Combinations; Binomial and Normal Distributions
Weeks 11-13   starts 4/3     Estimation and Hypothesis Testing--Means
Weeks 14-15   starts 4/24    Estimation and Hypothesis Testing--Proportions
Quizzes 1-4 will be on the Thursday of weeks 4, 7, 10, and 13.
Quiz 5 will be on the date and time scheduled for the final exam.
Your average on the quizzes will constitute 2/3 of the course grade.
Grade requirements: A–90%, B–80%, C–60%, D–50%. There may also be other
assignments requiring analysis of data using Excel. There will be a term project on
Critical Thinking, with weight equal to one quiz. It is the responsibility of any student
who is unsure of the grading scale, course requirements, or anything else in this course
outline to ask the instructor for clarification.
Homework Assignments: There will be 10-15 Aplia homework assignments. Due
dates are listed in the Aplia system. The assignments will constitute 1/3 of the course
grade. To accommodate the occasional instance when you cannot meet an Aplia
deadline, the lowest assignment will be dropped. Grading will be handled by Aplia. You
must access the Aplia website, which means you must register for an account at:
http://www.aplia.com. Please register within 24 hours of the first class meeting.
The computer is unforgiving about accepting late assignments. Time is kept at Aplia, and
not by the computer you are working on. You may appeal grading decisions made by the
computer, if you can demonstrate that an error has been made.
Non-Aplia assignments must be turned in during class on the day they are due.
Assignments turned in after this time but before the assignment is handed back may
receive one-half credit. Assignments turned in after the hand-back can no longer be
accepted for credit.
The worst thing some students do in a course is not think about course material a little
every day. They sometimes let weeks go by and then try to learn all the material in one
or two days. This usually does not work. Assignments will require keeping up-to-date.
"Repetitio est mater studiorum." (Repetition is the mother of learning.)
Course Management Policies
Students are expected to be partners with the instructor in their educational
experience. Frequent communication with the instructor is encouraged.
Attendance: You are expected to attend every class session. Attendance is not taken
every day, but frequent absences will be noticed. Attendance is mandatory on days
when quizzes are returned. Two absences on those days will reduce your letter grade.
Cheating: The search for truth and the dissemination of knowledge are the central
missions of a university. Benedictine University pursues these missions in an
environment guided by our Roman Catholic tradition and our Benedictine heritage.
Integrity and honesty are therefore expected of all BU students. Actions such as
cheating, plagiarism, collusion, solicitation, and misrepresentation are violations of
these expectations and constitute unacceptable behavior in the University community.
To access the complete Academic Honesty Policy, which includes student
responsibilities, responsibilities and authority of faculty, violations, reporting and
communicating, responsibilities of the Provost, appeals, the academic appeals board,
and records, please visit www.ben.edu/ahp. Penalties for cheating can range from a
private verbal warning, all the way to expulsion from the University.
Incomplete Grade: A grade of “I” may be requested by a student for a course in
which he or she is doing satisfactory work but, for illness or other circumstances
beyond the student’s control, as determined by the instructor, the required work
cannot be completed by the end of the semester. To qualify for the grade, a student
must have satisfactory academic standing, be doing at least “C” work in the class, and
submit a written request with a plan for completion approved by the instructor stating
the reason for the delay in completing the work. Arrangements for the “I” grade must
be made prior to the final examination. One may not receive an “I” in a semester in
which he or she is already on academic probation. An “I” is a temporary grade. Failure
to complete the course work and obtain a final grade within 180 days from the end of
the term in which the “I” was received will result in the “I” immediately becoming an
“F.”
Recommended Exercises: Students should work as many as possible of the
even-numbered exercises in the text. Proficiency gained from practice on these will help
when similar problems appear on quizzes. Answers to even-numbered exercises are
at the back of the book.
Missed Quizzes: Make-up quizzes will be given only if a quiz was missed for a good
and documented reason. If a make-up is given, the quiz score may be reduced 20%
in an effort to maintain some degree of fairness to those who took the quiz at the
proper time.
Student Responsibilities
• Students who are not enrolled in class cannot attend the class and cannot receive
credit.
• Students cannot submit additional work after grades have been submitted (except
in cases of temporary grades such as “I,” “X,” or “IP”).
• Students on academic probation are not eligible for a grade of “I.”
Students are responsible for planning their academic programs and progress, and for
evidencing academic performance with honesty and integrity (see “cheating” above).
However, the University encourages students to assist one another (e.g. tutoring and
group projects) and this course explicitly promotes such behavior.
Electronic Devices: One aspect of being a member of a community of scholars is to
show respect for others by creating and maintaining an environment conducive to
learning. To minimize distractions, electronic devices may be used only in connection
with currently-discussed course material. Electronic devices used during a quiz, other
than the approved TI calculator, will result in a zero grade for that quiz.
University Closings: A variety of conditions may disrupt scheduled classes—
weather, building issues, health-related issues, etc. For severe weather, contact the
BU emergency information line at
(630) 829-6622 or check www.emergencyclosings.com or www.cancellations.com.
Radio stations WBBM 780 AM and WGN 720 AM announce closings.
Faculty are required to provide students with alternate activities so that the learning
process continues and the course objectives are met. Additional procedures may be
implemented by the University in the event of an extended closing.
Technology Requirement: Students are expected to have basic skills in word
processing and spreadsheet development, and effectively use technology to support
oral presentations.
Access to the University computer network and to the University email system is
gained through the use of login IDs. Each person’s Login ID is unique and access is
controlled by a password of your choosing. For instructions on obtaining login IDs and
email addresses, see http://www.ben.edu/ithome/faqs.asp.
Recording (audio) Lectures: Audio recording is permitted with the instructor’s
approval. University policy strictly prohibits video recording.
Special Needs and Americans with Disabilities Act (ADA): If you have a
documented learning, psychological, or physical disability, you may be eligible for
reasonable academic accommodations or services. To request these, contact the
Student Success Center. All students are expected to fulfill essential course or degree
requirements.
Religious Accommodations: Students whose religious obligations conflict with a
course requirement may request an accommodation from the instructor. Such
requests must be made in writing by the end of the first week of class.
FERPA: The Family Educational Rights and Privacy Act, also known as the Buckley
Amendment, addresses the issue of student privacy. Enacted in 1974, it established
guidelines prohibiting institutions from releasing information to anyone without
express written permission from the student. This includes discussing student
schedules, grades, or other specific information with spouses, family members, or
friends.
A student may provide for release of identifiable, non-directory information to a third
party by signing a Confidential Release Authorization form. For more information
please see http://www.ben.edu/ferpa/index.cfm.
Mission Statement: Benedictine University dedicates itself to the education of
undergraduate and graduate students from diverse ethnic, racial, and religious
backgrounds. As an academic community committed to liberal arts and professional
education, distinguished and guided by its Roman Catholic tradition and Benedictine
heritage, the University prepares its students for a lifetime as active, informed, and
responsible citizens and leaders in the world community.
Assignment Feedback Policy: The instructor will provide feedback on each graded
assignment (quizzes, papers, homework, exams, etc.) no later than 10 calendar days
after submission. Students are encouraged to review their individual course grades
and to request clarification as needed. Quiz and homework scores, and class statistics,
will be reviewed after each quiz. Final grades are issued only by the University
Registrar.
Final comments: Feel free to see me if there is anything else of concern to you.
Your comments about this course or any course are always welcome and appreciated.
You are responsible for the information in the syllabus and should ask for clarification
for anything in the syllabus about which you are unsure.
The remaining pages are (1) a detailed outline of each of the five parts of the course,
including terminology, concepts, skills, and procedures, and (2) a statement of Course
Philosophy.
Essential Ideas, Terminology, Skills/Procedures, and Concepts for Each Part of
the Course
Part I
Two Types of Statistics: Descriptive and Inferential
Descriptive Statistics--purpose: to communicate characteristics of a set of data
Characteristics: Mean, median, mode, variance, standard deviation, skewness,
etc.
Charts, graphs
Inferential Statistics--purpose: to make statements about population parameters
based on sample statistics
Population--group of interest being studied; often too large to sample every
member
Sample--subset of the population; must be representative of the population
Random sampling is a popular way of obtaining a representative sample.
Parameter--a characteristic of a population, usually unknown, often can be
estimated: Population mean, population variance, population proportion, etc.
Statistic--a characteristic of a sample: Sample mean, sample variance, sample
proportion, etc.
Two ways of conducting inferential statistics
Estimation
Point estimate--single number estimate of a population parameter, no
recognition of uncertainty, such as: "40" to estimate the average age of the
voting population
Interval estimate--point estimate with an error factor, as in: "40 ± 5"
The error factor provides formal and quantitative recognition of uncertainty.
Confidence level (confidence coefficient)--the probability that the parameter
being estimated actually is in the stated range
Hypothesis testing
Null hypothesis--an idea about an unknown population parameter, such as: "In
the population, there is no correlation between smoking and lung cancer."
Alternate hypothesis--the opposite idea about the unknown population
parameter, such as: "In the population, there is correlation between
smoking and lung cancer."
Data are gathered to see which hypothesis is supported. The result is either
rejection or non-rejection (acceptance) of the null hypothesis.
Four types of data
Nominal
Names, labels, categories (e.g. cat, dog, bird, rabbit, ferret, gerbil)
Ordinal
Suggests order, but computations on the data are impossible or meaningless
(e.g. Pets can be listed in order of popularity--1-cat, 2-dog, 3-bird, etc.--but
the difference between cat and dog is not related to the difference between
dog and bird.)
Interval
Differences are meaningful, but they are not ratios. There is no natural zero
point (e.g. clock time--the difference between noon and 1 p.m. is the same
amount of time as the difference between 1 p.m. and 2 p.m. But 2 p.m. is not
twice as late as 1 p.m. unless you define the starting point of time as noon,
thereby creating a ratio scale)
Ratio
Differences and ratios are both meaningful; there is a natural zero point.
(e.g. Length--8 feet is twice as long as 4 feet, and 0 feet actually does mean
no length at all.)
Two types of statistical studies
Observational study (naturalistic observation)
Researcher cannot control the variables under study; they must be taken as
they are found (e.g. most research in astronomy).
Experiment
Researcher can manipulate the variables under study (e.g. drug dosage).
Characteristics of Data
Central tendency--attempt to find a "representative" or "typical" value
Mean--the sum of the data items divided by the number of items, or Σx / n
More sensitive to outliers than the median
Outlier--data item far from the typical data item
Median--the middle item when the items are ordered high-to-low or low-to-high
Also called the 50th percentile
Less sensitive to outliers than the mean
Mode--most-frequently-occurring item in a data set
Dispersion (variation or variability)--the opposite of consistency
Variance--the Mean of the Squared Deviations (MSD), or Σ(x - xbar)² / n
Deviation--difference between a data item and the mean
The sum of the deviations in any data set is always equal to zero.
Standard Deviation--square root of the variance
Range--difference between the highest and lowest value in a data set
Coefficient of Variation—measures relative dispersion
CV = standard deviation / mean
Skewness--the opposite of symmetry
Positive skewness--mean exceeds median, high outliers
Negative skewness--mean less than median, low outliers
Symmetry--mean, median, mode, and midrange about the same
Kurtosis--degree of relative concentration or peakedness
Leptokurtic--distribution strongly peaked
Mesokurtic--distribution moderately peaked
Platykurtic--distribution weakly peaked
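The measures of central tendency, dispersion, and skewness listed above can be computed with Python's standard library. This is only a minimal sketch: the data set is hypothetical (chosen to include one high outlier), and the variance is the MSD (divide by n), as defined in this outline.

```python
from statistics import mean, median, mode, pvariance, pstdev

# Hypothetical data set with one high outlier (illustration only)
data = [2, 3, 3, 4, 5, 6, 19]

xbar = mean(data)                       # sum of the items divided by n
med = median(data)                      # middle item when ordered
most_common = mode(data)                # most frequently occurring item
deviations = [x - xbar for x in data]   # each item minus the mean
msd = pvariance(data)                   # mean of the squared deviations
sd = pstdev(data)                       # square root of the MSD
data_range = max(data) - min(data)      # highest minus lowest
cv = sd / xbar                          # coefficient of variation

print("mean", xbar, "median", med, "mode", most_common, "range", data_range)
print("sum of deviations", sum(deviations))
print("variance", msd, "standard deviation", sd, "CV", cv)
print("positive skewness" if xbar > med else "no positive skewness")
```

The single outlier (19) pulls the mean above the median, which is the pattern described under positive skewness.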
Symbols & "Formula Sheet No. 1"
Descriptive statistics
Sample Mean--"xbar" (x with a bar above it)
Sample Variance--"svar" (the same as MSD for the sample)
Also, the "mean of the squares less the square of the mean"
Sample Standard Deviation--"ssd"--square root of svar
Population parameters (usually unknown, but can be estimated)
Population Mean--"μ" (mu)
Population Variance--"σ²" (sigma squared) (MSD for the population)
Population Standard Deviation--"σ" (sigma)--square root of σ²
Inferential statistics--estimating of population parameters based on sample statistics
Estimated Population Mean--"μ^" (mu hat)
The sample mean is an unbiased estimator of the population mean.
Unbiased estimator--on average, over all possible samples, equal to the
parameter being estimated; no systematic tendency to overestimate or
underestimate
If every possible sample of size n is selected from a population, the mean of
all the sample means will equal the population mean.
Estimated Population Variance--"σ^²" (sigma hat squared)
The sample variance is a biased estimator of the population variance.
Biased estimator--on average, over all possible samples, not equal to the
parameter being estimated
If every possible sample of size n is selected from a population, the mean of
all the sample variances will be below the population variance.
The reason for this bias is the probable absence of outliers in the sample.
The variance is greatly affected by outliers.
The smaller a sample is, the less likely it is to contain outliers, and hence
the lower its variance is likely to be.
The estimate is obtained by multiplying svar by the correction factor n / (n-1).
Note how the correction factor's impact increases as the sample size decreases.
This quantity is also widely referred to as "s²" and as the
"sample variance."
In this context "sample variance" does not mean variance of the sample; it
is, rather, a shortening of the cumbersome phrase "estimate of
population variance computed from a sample."
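The behavior of the two estimators over "every possible sample of size n" can be checked by brute force on a very small population. The sketch below is illustrative only: the population values are hypothetical, and sampling with replacement is an assumption (the outline does not say which kind of sampling is meant). Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def msd(values):
    """Mean of the squared deviations: sum of (x - mean)^2, divided by n."""
    n = len(values)
    m = Fraction(sum(values), n)
    return sum((x - m) ** 2 for x in values) / n

population = [1, 2, 3, 4, 10]   # hypothetical population
n = 2                           # sample size

pop_mean = Fraction(sum(population), len(population))
pop_var = msd(population)

# Every possible ordered sample of size n, drawn with replacement (assumption)
samples = list(product(population, repeat=n))

avg_sample_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
avg_svar = sum(msd(s) for s in samples) / len(samples)
avg_corrected = avg_svar * Fraction(n, n - 1)   # apply the n / (n-1) correction

print("population mean", pop_mean, "average sample mean", avg_sample_mean)
print("population variance", pop_var, "average svar", avg_svar)
print("average corrected estimate", avg_corrected)
```

Under these assumptions the sample means average out to the population mean, the uncorrected sample variances average out too low, and the n / (n-1) correction removes the shortfall.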
Estimated Population Standard Deviation--"σ^" (sigma hat)--square root of σ^²
The bias considerations that apply to the estimated population variance also apply
to the estimated population standard deviation.
This quantity is also widely referred to as "s", and is widely referred to as the
"sample standard deviation."
In this context "sample standard deviation" does not mean standard
deviation of the sample; it is, rather, a shortening of the cumbersome
phrase "estimate of population standard deviation computed from a
sample."
Calculator note--some calculators, notably TI's, compute two standard deviations
The smaller of the two is the one we call "ssd"
TI calculator manuals call this the "population standard deviation."
This refers to the special case in which the entire population is included in
the sample; then the sample standard deviation (ssd) and the population
standard deviation are the same. (This also applies to means and
variances.) There is no need for inferential statistics in such cases.
The larger of the two is the one we call σ^ (sigma-hat) (estimated population
standard deviation).
TI calculator manuals call this the "sample standard deviation."
This refers to the more common case in which "sample standard deviation"
really means estimated population standard deviation, computed from a
sample.
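The same two standard deviations are available in Python's standard library, which may help in checking calculator results. A minimal sketch with a hypothetical five-item sample: `pstdev` divides by n (the smaller value, "ssd" in this outline) and `stdev` divides by n - 1 (the larger value, sigma-hat).

```python
from math import sqrt
from statistics import pstdev, stdev

data = [4, 8, 6, 5, 7]    # hypothetical sample
n = len(data)

ssd = pstdev(data)        # divides by n: the smaller of the two
sigma_hat = stdev(data)   # divides by n - 1: the larger of the two

# The two differ by the square root of the correction factor n / (n-1)
ratio_check = abs(sigma_hat - ssd * sqrt(n / (n - 1))) < 1e-12

print("ssd", ssd)
print("sigma-hat", sigma_hat)
print("related by sqrt(n / (n-1)):", ratio_check)
```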
Significance of the Standard Deviation
Normal distribution (empirical rule)--empirical: derived from experience
Two major characteristics: symmetry and center concentration
Two parameters: mean and standard deviation
"Parameter," in this context, means a defining characteristic of a distribution.
Mean and median are identical (due to symmetry) and are at the high point.
Standard deviation--distance from mean to inflection point
Inflection point--the point where the second derivative of the normal curve
is equal to zero, or, the point where the curvature changes from "right" to
"left" (or vice versa), as when you momentarily travel straight on an S-curve
on the highway
z-value--distance from mean, measured in standard deviations
Areas under the normal curve can be computed using integral calculus.
Total area under the curve is taken to be 1.000 or 100%
Tables enable easy determination of these areas.
about 68-1/4%, 95-1/2%, and 99-3/4% of the area under a normal curve
lie within one, two, and three standard deviations from the mean,
respectively
Many natural and economic phenomena are normally distributed.
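The areas quoted above can be reproduced without a table. This sketch relies on a standard identity, not on anything specific to this course: the area within z standard deviations of the mean of a normal curve equals erf(z / √2). The function name `area_within` is illustrative.

```python
from math import erf, sqrt

def area_within(z):
    """Area under a normal curve within z standard deviations of the mean."""
    return erf(z / sqrt(2))

for z in (1, 2, 3):
    print(z, "standard deviation(s):", round(100 * area_within(z), 2), "%")
```

The results (about 68.27%, 95.45%, and 99.73%) agree with the rounded figures of 68-1/4%, 95-1/2%, and 99-3/4%.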
Tchebysheff's Theorem (also spelled Chebyshev; P. L. Chebyshev, 1821-1894)
What if a distribution is not normal? Can any statements be made as to what
percentage of the area lies within various distances (z-values) of the mean?
Tchebysheff proved that certain minimum percentages of the area must lie within
various z-values of the mean.
The minimum percentage for a given z-value greater than 1, stated as a fraction,
is [ (z² - 1) / z² ]
Tchebysheff's Theorem is valid for all distributions.
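As a rough check of the theorem, the sketch below compares the guaranteed minimum with the actual fraction of items within z standard deviations of the mean. The data set is hypothetical and deliberately skewed (clearly not normal); the function names are illustrative.

```python
from statistics import mean, pstdev

def chebyshev_minimum(z):
    """Minimum fraction of any data set within z standard deviations (z > 1)."""
    return (z ** 2 - 1) / z ** 2

def fraction_within(data, z):
    """Actual fraction of items within z standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(abs(x - m) <= z * s for x in data) / len(data)

data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 21]   # hypothetical, strongly skewed
for z in (1.5, 2, 3):
    print(z, "minimum:", chebyshev_minimum(z), "actual:", fraction_within(data, z))
```

The actual fraction is never below the minimum, but it can be well above it; the theorem gives a floor, not an estimate.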
Other measures of relative standing
Percentiles--A percentile is the percentage of a data set that is below a specified
value.
Percentile values divide a data set into 100 parts, each with the same number of
items.
The median is the 50th percentile value.
Z-values can be converted into percentiles and vice-versa.
A z-value of +1.00, for example, corresponds to the 84.13th percentile.
The 95th percentile, for example, corresponds to a z-value of +1.645.
A z-value of 0.00 is the 50th percentile, the median.
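These conversions can be reproduced with `statistics.NormalDist` (Python 3.8 or later) in place of a table. A minimal sketch, assuming the data are normally distributed; the two helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def z_to_percentile(z):
    """Percentage of a normal distribution lying below the given z-value."""
    return 100 * standard_normal.cdf(z)

def percentile_to_z(percentile):
    """z-value below which the given percentage of the distribution lies."""
    return standard_normal.inv_cdf(percentile / 100)

print(round(z_to_percentile(1.00), 2))   # 84.13
print(round(percentile_to_z(95), 3))     # 1.645
print(round(z_to_percentile(0.00), 2))   # 50.0
```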
Deciles
Decile values divide a data set into 10 parts, each with the same number of items.
The median is the 5th decile value.
The 9th decile value, for example, separates the upper 10% of the data set from
the lower 90%. (Some would call this the 1st decile value.)
Quartiles
Quartile values divide a data set into 4 parts, each with the same number of items.
The median is the 2nd quartile value.
The 3rd quartile value (Q3), for example, separates the upper 25% of the data set
from the lower 75%.
Q3 is the median of the upper half; Q1 (lower quartile) is the median of the
lower half
Other possibilities: quintiles (5 parts), stanines (9 parts)
Some ambiguity in usage exists, especially regarding quartiles--For example, the
phrase "first quartile" could mean one of two things: (1) It could refer to the value
that separates the lower 25% of the data set from the upper 75%, or (2) It could
refer to the members, as a group, of the lower 25% of the data.
Example (1): "The first quartile score on this test was 60."
Example (2): "Your score was 55, putting you in the first quartile."
Also the phrase "first quartile" is used by some to mean the 25th percentile
value, and by others to mean the 75th percentile value. To avoid this
ambiguity, the phrases "lower quartile," "middle quartile," and "upper
quartile" may be used.
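The "median of the lower half / median of the upper half" rule for quartiles can be sketched as follows. The scores are hypothetical, and one choice here is an assumption: when the number of items is odd, the middle item is left out of both halves. Textbooks and software differ on this point, so results may not match every calculator or spreadsheet.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (lower quartile, median, upper quartile) by the halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # items below the middle position
    upper = ordered[len(ordered) - half:]    # items above the middle position
    return median(lower), median(ordered), median(upper)

scores = [55, 60, 62, 70, 75, 81, 88, 93]   # hypothetical test scores
q1, q2, q3 = quartiles_by_halves(scores)
print("lower quartile", q1, "middle quartile", q2, "upper quartile", q3)
```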
Terminology
Statistics, population, sample, parameter, statistic, qualitative data, quantitative data,
discrete data, continuous data, nominal measurements, ordinal measurements,
interval measurements, ratio measurements, observational study (naturalistic
observation), experiment, precision, accuracy, sampling, random sampling, stratified
sampling, systematic sampling, cluster sampling, convenience sampling,
representativeness, inferential statistics, descriptive statistics, estimation, point
estimation, interval estimation, hypothesis testing, dependency, central tendency,
dispersion, skewness, kurtosis, leptokurtic, mesokurtic, platykurtic, frequency table,
mutually exclusive, collectively exhaustive, relative frequencies, cumulative
frequency, histogram, Pareto chart, bell-shaped distribution, uniform distribution,
skewed distribution, pie chart, pictogram, mean, median, mode, bimodal, midrange,
reliability, symmetry, skewness, positive skewness, negative skewness, range, MSD,
variance, deviation, standard deviation, z-value, Chebyshev's theorem, empirical rule,
normal distribution, quartiles, quintiles, deciles, percentiles, interquartile range,
stem-and-leaf plot, boxplot, biased, unbiased.
Skills/Procedures--given appropriate data, compute or identify the
Sample mean, median, mode, variance, standard deviation, and range
Estimated population mean, variance, and standard deviation
Kind of skewness, if any, present in the data set
z-value of any data item
Upper, middle, and lower quartiles
Percentile of any data item
Percentile of any integer z-value from -3 to +3
Concepts
Identify circumstances under which the median is a more suitable measure of central
tendency than the mean
Explain when the normal distribution (empirical rule) may be used
Explain when Chebyshev's Theorem may be used; when it should be used
Give an example (create a data set) in which the mode fails as a measure of central
tendency
Give an example (create a data set) in which the mean fails as a measure of central
tendency
Explain why the sum of the deviations fails as a measure of dispersion, and describe
how this failure is overcome
Distinguish between unbiased and biased estimators of population parameters
Describe how percentile scores are determined on standardized tests like the SAT or
the ACT
Explain why the variance and standard deviation of a sample are likely to be lower
than the variance and standard deviation of the population from which the
sample was taken
Identify when the sample mean, variance, and standard deviation are identical to the
population mean, variance, and standard deviation
Part II
Basic Probability Concepts
Probability--the likelihood of an event
Probability is expressed as a decimal or fraction between zero and one, inclusive.
An event that is certain has a probability of 1.
An event that is impossible has a probability of 0.
If the probability of rain today (R) is 30%, it can be written P(R) = 0.3.
Objective probabilities--calculated from data according to generally-accepted methods
Relative frequency method--example: In a class of 25 college students there are
14 seniors.
If a student is selected at random from the class, the probability of selecting a
senior is 14/25 or 0.56. Relative to the number in the class, 25, the number
of seniors (frequency), 14, is 56% or 0.56.
Subjective probabilities--arrived at through judgment, experience, estimation,
educated guessing, intuition, etc. There may be as many different results as
there are people making the estimate.
(With objective probability, all should get the same answer.)
Boolean operations--Boolean algebra--(George Boole, 1815-1864)
Used to express various logical relationships; taught as "symbolic logic" in college
philosophy and mathematics departments; important in computer design
Complementation--translated by the word "not"--symbol: A¯ or A-bar
Complementary events are commonly known as "opposites."
Examples: Heads/Tails on a coin-flip; Rain/No Rain on a particular day; On
Time/Late for work
Complementary events have two properties
Mutually exclusive--they cannot occur together; each excludes the other
Collectively exhaustive--there are no other outcomes; the two events are a
complete or exhaustive list of the possibilities
Partition--a set of more than two events that are mutually exclusive and
collectively exhaustive
Examples: A, B, C, D, F, W, I--grades received at the end of a course;
Freshman, Sophomore, Junior, Senior--traditional college student categories
The sum of the probabilities of complementary events, or of the probabilities of all
the events in a partition, is 1.
Intersection--translated by the words "and," "with," or "but"--symbol: ∩ or, for typing
convenience, n
A day that is cool (C) and rainy (R) can be designated (CnR).
If there is a 25% chance that today will be cool (C) and rainy (R), it can be
written P(CnR) = 0.25.
Intersections are often expressed without using the word "and."
Examples: "Today might be cool with rain." or "It may be a cool, rainy day."
Two formulas for intersections:
For any two events A and B: P(AnB) = P(A|B)*P(B) ("|" is defined below.)
For independent events A and B: P(AnB) = P(A)*P(B)
This can be used as a test for independence.
This formula may be extended to any number of independent events
P(AnBnCn . . . nZ) = P(A)*P(B)*P(C)* . . . P(Z)
The intersection operation has the commutative property
P(AnB) = P(BnA)
"Commutative" is related to the word "commute" which means "to switch."
The events can be switched without changing anything.
In our familiar algebra, addition and multiplication are commutative, but
subtraction and division are not.
Intersections are also called "joint (together) probabilities."
Union--translated by the word "or"--symbol: ∪ or, for typing convenience, u
A day that is cool (C) or rainy (R) can be designated (CuR).
If there is a 25% chance that today will be cool (C) or rainy (R), it can be written
P(CuR) = 0.25.
Unions always use the word "or."
Addition rule to compute unions: P(AuB) = P(A) + P(B) – P(AnB)
The deduction of P(AnB) eliminates the double counting of the intersection that
occurs when P(A) is added to P(B).
The union operation is commutative: P(AuB) = P(BuA)
Condition--translated by the word "given"--symbol: |
A day that is cool (C) given that it is rainy (R) can be designated (C|R).
The event R is called the condition.
If there is a 25% chance that today will be cool (C) given that it is rainy (R),
it can be written P(C|R) = 0.25.
Conditions are often expressed without using the word "given."
Examples: "The probability that it will be cool when it is rainy is 0.25."
P(C|R) = 0.25.
"The probability that it will be cool if it is rainy is 0.25." P(C|R) = 0.25.
"25% of the rainy days are cool." [P(C|R) = 0.25.]
All three of the above statements are the same, but the next one is different:
"25% of the cool days are rainy." This one is P(R|C) = 0.25.
The condition operation is not commutative: in general, P(A|B) ≠ P(B|A)
For example, it is easy to see that P(rain|clouds) is not the same as
P(clouds|rain).
Conditional probability formula: P(A|B) = P(AnB) / P(B)
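The addition rule, the conditional probability formula, and the product-rule test for independence can be tried on one consistent set of numbers. In this sketch P(CnR) = 0.25 follows the example above, while P(C) = 0.40 and P(R) = 0.50 are hypothetical values chosen only for illustration.

```python
from math import isclose

# Probabilities for a cool (C) and/or rainy (R) day
p_c = 0.40          # P(C), hypothetical
p_r = 0.50          # P(R), hypothetical
p_c_and_r = 0.25    # P(CnR)

p_c_or_r = p_c + p_r - p_c_and_r    # addition rule: P(CuR)
p_c_given_r = p_c_and_r / p_r       # P(C|R) = P(CnR) / P(R)
p_r_given_c = p_c_and_r / p_c       # P(R|C) = P(RnC) / P(C)

# Product-rule test: independent only if P(CnR) equals P(C)*P(R)
independent = isclose(p_c_and_r, p_c * p_r)

print("P(CuR) =", p_c_or_r)
print("P(C|R) =", p_c_given_r, " P(R|C) =", p_r_given_c)
print("independent:", independent)
```

With these numbers P(C|R) and P(R|C) differ, which illustrates why the condition operation is not commutative.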
Occurrence Tables and Probability Tables
Occurrence table--table that shows the number of items in each category and
in the intersections of categories
Can be used to help compute probabilities of single events,
intersections, unions, and conditional probabilities
Probability table--created by dividing every entry in an occurrence table
by the total number of occurrences.
Probability tables contain marginal probabilities and joint probabilities.
Marginal probabilities--probabilities of single events, found in the right and bottom
margins of the table
Joint probabilities--probabilities of intersections, found in the interior part of the
table where the rows and columns intersect
Unions and conditional probabilities are not found directly in a probability table,
but they can be computed easily
from values in the table.
Two conditional probabilities are complementary if they have the same condition
and the events before the "bar" (|) are complementary. For example, if warm
(W) is the opposite of cool, then (W|R) is the complement of (C|R),
and P(W|R) + P(C|R) = 1.
In a 2 x 2 probability table, there are eight conditional probabilities, forming four
pairs of complementary conditional probabilities.
It is also possible for a set of conditional probabilities to constitute a partition
(if they all have the same condition, and the events before the "bar" are a
partition).
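The table ideas above can be sketched in a few lines of Python. The counts below are hypothetical (100 days classified as cool/warm and rainy/dry), and the helper names are illustrative only; the point is to show how marginal, joint, union, and conditional probabilities all come from one occurrence table.

```python
# Hypothetical occurrence table for 100 days (counts are made up for illustration).
# Rows: cool (C) or warm (W); columns: rainy (R) or dry (D).
counts = {
    ("C", "R"): 5,  ("C", "D"): 35,
    ("W", "R"): 15, ("W", "D"): 45,
}
total = sum(counts.values())

# Probability table: each count divided by the total gives a joint probability.
joint = {cell: n / total for cell, n in counts.items()}


def marginal(event):
    """P(event): add the joint probabilities in that event's row or column."""
    return sum(p for cell, p in joint.items() if event in cell)


def intersection(a, b):
    """P(a n b) for one row event and one column event, in either order."""
    return joint[(a, b)] if (a, b) in joint else joint[(b, a)]


def union(a, b):
    """Addition rule: P(a u b) = P(a) + P(b) - P(a n b)."""
    return marginal(a) + marginal(b) - intersection(a, b)


def conditional(a, b):
    """P(a | b) = P(a n b) / P(b): a joint divided by a marginal."""
    return intersection(a, b) / marginal(b)


print(round(marginal("C"), 3), round(marginal("R"), 3))
print(round(union("C", "R"), 3))
print(round(conditional("C", "R"), 3), round(conditional("R", "C"), 3))
print(round(conditional("C", "R") + conditional("W", "R"), 3))
```

With these counts P(C|R) and P(R|C) differ, which illustrates that the condition operation is not commutative, and P(C|R) and P(W|R) add to 1 because they are complementary conditional probabilities.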
Testing for Dependence/Independence
Statistical dependence
Events are statistically dependent if the occurrence of one event
affects the probability of the other event.
Identifying dependencies is one of the most important tasks of statistical analysis.
Tests for independence/dependence
Conditional probability test--posterior/prior test
Prior and posterior are the Latin words for "before" and "after."
A prior probability is one that is computed or estimated before additional
information is obtained.
A posterior probability is one that is computed or estimated after additional
information is obtained.
Prior probabilities are probabilities of single events, such as P(A).
Posterior probabilities are conditional probabilities, such as P(A|B).
Independence exists between any two events A and B if P(A|B) = P(A)
If P(A|B) = P(A), the occurrence of B has no effect on P(A)
If P(A|B) ≠ P(A), the occurrence of B does have an effect on P(A)
Positive dependence if P(A|B) > P(A) -- posterior greater than prior
Negative dependence if P(A|B) < P(A) -- posterior less than prior
Multiplicative test--joint/marginal test
Independence exists between any two events A and B if P(AnB) = P(A)*P(B)
Positive dependence if P(AnB) > P(A)*P(B) -- intersection greater than
the product
Negative dependence if P(AnB) < P(A)*P(B) -- intersection less than
the product
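Both tests can be written as one short function. This is a sketch under stated assumptions: the function and helper names are hypothetical, the input probabilities are invented, and a small tolerance is used so that rounding in floating-point arithmetic is not mistaken for dependence.

```python
def compare(observed, benchmark, tol=1e-9):
    """Label the relationship implied by comparing two probabilities."""
    if abs(observed - benchmark) <= tol:
        return "independent"
    return "positive dependence" if observed > benchmark else "negative dependence"


def dependence_tests(p_a, p_b, p_a_and_b):
    """Run both tests for events A and B; the two labels should always agree."""
    # Conditional (posterior/prior) test: compare P(A|B) with P(A).
    posterior = p_a_and_b / p_b
    conditional_test = compare(posterior, p_a)
    # Multiplicative (joint/marginal) test: compare P(AnB) with P(A)*P(B).
    multiplicative_test = compare(p_a_and_b, p_a * p_b)
    return conditional_test, multiplicative_test


print(dependence_tests(0.50, 0.40, 0.20))   # joint equals the product
print(dependence_tests(0.50, 0.40, 0.30))   # joint greater than the product
print(dependence_tests(0.40, 0.20, 0.05))   # joint less than the product
```

The two tests are algebraically equivalent, since dividing P(AnB) = P(A)*P(B) by P(B) gives P(A|B) = P(A), so either one may be used.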
Bayesian Inference--Thomas Bayes (1702-1761)
Bayes developed a technique to compute a conditional probability,
given the reverse conditional probability
Computations are simplified, and complex formulas can often be avoided, if a
probability table is used.
Basic computation is: P(A|B) = P(AnB) / P(B), an intersection probability divided by
a single-event probability. That is, a joint probability divided by a marginal
probability.
Bayesian analysis is very important because most of the probabilities upon which we
base decisions are conditional probabilities.
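As a rough illustration of the basic computation, the sketch below reverses a conditional probability by first rebuilding the joint probability. The function name and the input values are assumptions for illustration, not part of the course materials.

```python
def reverse_conditional(p_a_given_b, p_b, p_a):
    """Return P(B|A), given P(A|B), P(B), and P(A)."""
    p_a_and_b = p_a_given_b * p_b    # joint probability: P(AnB) = P(A|B) * P(B)
    return p_a_and_b / p_a           # joint divided by marginal: P(B|A)


# Hypothetical values: P(C|R) = 0.25, P(R) = 0.20, P(C) = 0.40.
p_rainy_given_cool = reverse_conditional(0.25, 0.20, 0.40)
print(round(p_rainy_given_cool, 3))
```

The intermediate joint probability is exactly the interior cell of a probability table, which is why a table often makes the longer formula unnecessary.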
Other Probability Topics:
Matching-birthday problem
Example of a "sequential" intersection probability computation, where each
probability is revised slightly and complementary thinking is used
Complementary thinking--strategy of computing the complement (because it is
easier) of what is really needed, then subtracting from 1
Redundancy
Strategy of using back-ups to increase the probability of success
Usually employs complementary thinking and the extended multiplicative rule for
independent events to compute the probability of failure. P(Success) is
then equal to 1 – P(Failure).
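Complementary thinking is easy to see in code. The sketch below assumes 365 equally likely birthdays (no leap years) for the matching-birthday problem, and independent components with the same failure probability for redundancy; both function names are hypothetical.

```python
def p_shared_birthday(people, days=365):
    """P(at least two people share a birthday), via the complement."""
    p_all_different = 1.0
    for k in range(people):
        # Sequential intersection: each new person must miss the k birthdays
        # already taken, so the probability is revised slightly each time.
        p_all_different *= (days - k) / days
    return 1 - p_all_different


def p_success_with_backups(p_fail_each, units):
    """P(at least one of `units` independent components works)."""
    p_all_fail = p_fail_each ** units   # extended multiplicative rule
    return 1 - p_all_fail


print(round(p_shared_birthday(23), 4))
print(round(p_success_with_backups(0.10, 3), 4))
```

In both cases the event actually wanted ("at least one") is awkward to compute directly, while its complement ("none") is a single chain of multiplications.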
Permutations and Combinations
Permutation--a set of items in which the order is important
Without replacement--duplicate items are not permitted
With replacement--duplicate items are permitted
Combination--a set of items in which the order is not important
Without replacement--duplicate items are not permitted
With replacement--duplicate items are permitted
In the formulas, "n" designates the number of items available, from which "r" is the
number that will be chosen. (Can r ever exceed n?)
To apply the correct formula when confronting a problem, two decisions must be
made: Is order important or not? Are duplicates permitted or not?
Permutations, both with and without replacement, can be computed by using the
"sequential" method instead of the formula. This provides a way of verifying the
formula result.
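The four cases can be sketched with Python's standard math module. The formulas used here are the standard counting formulas and are assumed, not confirmed, to match the ones on the course formula sheet; the function names are illustrative.

```python
import math


def permutations_without_replacement(n, r):
    return math.perm(n, r)            # n! / (n-r)!


def permutations_with_replacement(n, r):
    return n ** r                     # n choices at each of r positions


def combinations_without_replacement(n, r):
    return math.comb(n, r)            # n! / (r!(n-r)!)


def combinations_with_replacement(n, r):
    return math.comb(n + r - 1, r)    # (n+r-1)! / (r!(n-1)!)


def sequential_permutations(n, r, replacement=False):
    """Sequential method: multiply the number of choices at each position."""
    count = 1
    for k in range(r):
        count *= n if replacement else n - k
    return count


n, r = 10, 3
print(permutations_without_replacement(n, r), sequential_permutations(n, r))
print(permutations_with_replacement(n, r), sequential_permutations(n, r, True))
print(combinations_without_replacement(n, r), combinations_with_replacement(n, r))
```

The sequential results on the first two lines should match the formula results beside them, which is the verification the outline describes.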
Lotteries
Usually combination ("Lotto") or permutation ("Pick 3 or 4") problems
Lotto games are usually without replacement--duplicate numbers are not possible
Pick 3 or 4 games are usually with replacement--duplicate numbers are possible
Poker hands
Can be computed using combinations and the relative frequency method
Can also be computed sequentially
Terminology
PROBABILITY:
probability, experiment, event, simple event, compound event, sample space,
relative frequency method, classical approach, law of large numbers, random
sample, impossible event probability, certain event probability, complement,
partition, subjective probability, occurrence table, probability table, addition rule for
unions, mutually exclusive, collectively exhaustive, redundancy, multiplicative rule
for intersections, tree diagram, statistical independence/dependence, conditional
probability, Bayes' theorem, acceptance sampling, simulation, risk assessment,
redundancy, Boolean algebra, complementation, intersection, union, condition,
marginal probabilities, joint probabilities, prior probabilities, posterior probabilities,
two tests for independence, triad, complementary thinking, commutative.
PERMUTATIONS AND COMBINATIONS:
permutations, permutations with replacement, sequential method, combinations,
combinations with replacement.
Skills/Procedures--given appropriate data,
PROBABILITY
prepare an occurrence table
prepare a probability table
compute the following 20 probabilities
4 marginal probabilities (single simple events)
4 joint probabilities (intersections)
4 unions
8 conditional probabilities--identify the 4 pairs of conditional complementary
events
identify triads (one unconditional and two conditional probabilities in each triad)
conduct the conditional (prior/posterior) probability test for independence /
dependence
conduct the multiplication (multiplicative) (joint/marginal) test for independence /
dependence
identify positive / negative dependency
identify Bayesian questions
use the extended multiplicative rule to compute probabilities
use complementary thinking to compute probabilities
compute the probability of "success" when redundancy is used
compute permutations and combinations with and without replacement
Concepts
PROBABILITY
give an example of two or more events that are not mutually exclusive
give an example of two or more events that are not collectively exhaustive
give an example of a partition--a set of three or more events that are mutually
exclusive and collectively exhaustive
express the following in symbolic form using F for females and V for voters in a
retirement community
60% of the residents are females
30% of the residents are female voters
50% of the females are voters
75% of the voters are female
70% of the residents are female or voters
30% of the residents are male non-voters
25% of the voters are male
40% of the residents are male
identify which two of the items above are a pair of complementary probabilities
identify which two of the items above are a pair of complementary conditional
probabilities
from the items above, comment on the dependency relationship between F and V
if there are 100 residents, determine how many female voters there would be if
gender and voting were independent
explain why joint probabilities are called "intersections"
identify which two of our familiar arithmetic operations and which two Boolean
operations are commutative
tell what Thomas Bayes is known for (not English muffins)
PERMUTATIONS AND COMBINATIONS:
give an example of a set of items that is a permutation
give an example of a set of items that is a combination
tell if, in combinations/permutations, "r" can ever exceed "n," and give an example
Part III
Permutations and Combinations (outline, etc. Repeated from Part II)
Permutation--a set of items in which the order is important
Without replacement--duplicate items are not permitted
With replacement--duplicate items are permitted
Combination--a set of items in which the order is not important
Without replacement--duplicate items are not permitted
With replacement--duplicate items are permitted
In the formulas, "n" designates the number of items available, from which "r" is the
number that will be chosen. (Can r ever exceed n?)
To apply the correct formula when confronting a problem, two decisions must be
made: Is order important or not? Are duplicates permitted or not?
Permutations, both with and without replacement, can be computed by using the
"sequential" method instead of the formula. This provides a way of verifying the
formula result.
Lotteries
Usually combination ("Lotto") or permutation ("Pick 3 or 4") problems
Lotto games are usually without replacement--duplicate numbers are not possible
Pick 3 or 4 games are usually with replacement--duplicate numbers are possible
Poker hands
Can be computed using combinations and the relative frequency method
Can also be computed sequentially
Terminology
PERMUTATIONS AND COMBINATIONS:
permutations, permutations with replacement, sequential method, combinations,
combinations with replacement.
Skills/Procedures--given appropriate data,
PERMUTATIONS AND COMBINATIONS:
decide when order is and is not important
decide when selection is done with replacement and without replacement
compute permutations with and without replacement using the permutation formula
compute combinations with and without replacement using the combination
formula
use the sequential method to compute permutations with and without replacement
solve various applications problems involving permutations and combinations
give an example of a set of items that is a permutation
give an example of a set of items that is a combination
tell if, in combinations/permutations, "r" can ever exceed "n"
Mathematical Expectation
Discrete variable--one that can assume only certain values (often the whole numbers)
There is only a finite countable number of values between any two specified
values.
Examples: the number of people in a room, your score on a quiz in this course,
shoe sizes (certain fractions permitted), hat sizes (certain fractions
permitted)
Continuous variable--one that can take on any value--there is an infinite number of
values between any two specified values
Examples: your weight (can be any value, and changes as you breathe), the
length of an object, the amount of time that passes between two events, the
amount of water in a container (but if you look at the water closely enough,
you find that it is made up of very tiny chunks--molecules--so this last
example is really discrete at the submicroscopic level, but in ordinary
everyday terms we would call it continuous)
Mean (expected value) of a discrete probability distribution
Probability distribution--a set of outcomes and their likelihoods
Mean is the probability-weighted average of the outcomes
Each outcome is multiplied by its probability, and these are added.
The result is not an estimate. It is the actual population value, because the
probability distribution specifies an entire population of outcomes. ("μ" may be
used, without the estimation caret above it.)
The mean need not be a possible outcome, and for this reason
the term "expected value" can be misleading.
Variance of a discrete probability distribution
Variance is the probability-weighted average of the squared deviations
similar to MSD, except it's a weighted average
Each squared deviation is multiplied by its probability, and these are added.
The result is not an estimate. It is the actual population value, because the
probability distribution specifies an entire population of outcomes. ("σ²" may
be used, without the estimation caret above it.)
Standard deviation of a discrete probability distribution--the square root of the
variance
("σ" may be used, without the estimation caret ^ above it.)
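The probability-weighted averages described above can be sketched as follows. The distribution used (one roll of a fair six-sided die) is an assumed example, chosen because its mean is not itself a possible outcome.

```python
import math

# Hypothetical probability distribution: one roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Mean: each outcome multiplied by its probability, then added.
mean = sum(x * p for x, p in zip(outcomes, probabilities))

# Variance: each squared deviation multiplied by its probability, then added.
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probabilities))

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The mean comes out to a value that can never be rolled, which is why the term "expected value" can mislead.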
The Binomial Distribution
Binomial experiment requirements
Two possible outcomes on each trial
The two outcomes are (often inappropriately) referred to as "success" and
"failure."
n identical trials
Independence from trial to trial--the outcome of one trial does not affect the
outcome of any other trial
Constant p and q from trial to trial
p is the probability of the "success" event
q is the probability of the "failure" event; (q = (1-p) )
"x" is the number of "successes" out of the n trials.
Symmetry is present when p = q
When p < .5, the distribution is positively skewed (high outliers).
When p > .5, the distribution is negatively skewed (low outliers).
Binomial formula--for noncumulative probabilities
Cumulative binomial probabilities--computed by adding the noncumulative
probabilities
Binomial probability tables--may show cumulative or noncumulative probabilities
If cumulative, compute noncumulative probabilities by subtraction
Parameters of the binomial distribution--n and p
Binomial formula: P(x) = [n! / (x!(n-x)!)] * p^x * q^(n-x)
Note that when x=n, the formula reduces to p^n, and when x=0, the formula
reduces to q^n.
These are just applications of the multiplicative rule for independent events.
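The binomial formula translates directly into code. In this sketch the function names and the example values of n and p are assumptions for illustration; cumulative probabilities are obtained by adding the noncumulative ones, as described above.

```python
import math


def binomial_pmf(x, n, p):
    """Noncumulative probability of exactly x successes in n trials."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


def binomial_cdf(x, n, p):
    """Cumulative probability of x or fewer successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 4, 0.3
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 4), round(binomial_cdf(x, n, p), 4))

# With p = q = 0.5 the distribution is symmetric: P(1) equals P(3) when n = 4.
print(binomial_pmf(1, 4, 0.5), binomial_pmf(3, 4, 0.5))
```

Going the other way, a noncumulative probability can be recovered from a cumulative table by subtraction: P(x) = P(x or fewer) - P(x-1 or fewer).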
The Normal Distribution
Normal distribution characteristics--center concentration and symmetry
Parameters of the normal distribution--μ (mu), mean; and σ (sigma), standard
deviation
Z-value formula (four arrangements--for z, x, μ, and σ)
Normal distribution problems have three variables given, and the fourth must be
computed and interpreted.
Z-values determine areas (probabilities) and areas (probabilities) determine
z-values--the normal table or a calculator converts from one to the other.
Normal distribution probability tables--our text table presents one-sided central
areas
Two uses of the normal distribution
Normally-distributed phenomena
To approximate the binomial distribution--this application is far less important
now that computers and calculators can generate binomial probabilities
Binomial parameters (n and p) can be converted to normal parameters μ and σ
μ = np; σ² = npq; σ = √(npq)
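The four arrangements of the z-value formula, and the conversion between z-values and areas, can be sketched with Python's statistics.NormalDist in place of the printed table. The function names and example numbers are assumptions; note that NormalDist.cdf returns the area below z, so the one-sided central area used in the text table is that value minus 0.5.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1


# Four arrangements of z = (x - mu) / sigma.
def z_value(x, mu, sigma):
    return (x - mu) / sigma


def x_value(z, mu, sigma):
    return mu + z * sigma


def mu_value(z, x, sigma):
    return x - z * sigma


def sigma_value(z, x, mu):
    return (x - mu) / z


def central_area(z):
    """One-sided central area: the area between the mean and z."""
    return standard_normal.cdf(abs(z)) - 0.5


def z_from_central_area(area):
    """Positive z-value whose one-sided central area is `area`."""
    return standard_normal.inv_cdf(0.5 + area)


def binomial_to_normal(n, p):
    """Convert binomial parameters to normal ones: mu = np, sigma = sqrt(npq)."""
    return n * p, math.sqrt(n * p * (1 - p))


z = z_value(130, 100, 15)        # hypothetical x, mu, and sigma
print(z, round(central_area(z), 4))
print(round(z_from_central_area(0.475), 3))
print(binomial_to_normal(100, 0.5))
```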
Terminology
MATHEMATICAL EXPECTATION: random variable, discrete variable, continuous variable,
probability distribution, probability histogram, mean of a probability distribution,
variance and standard deviation of a probability distribution, probability-weighted
average of outcomes (mean), probability-weighted average of squared deviations
(variance).
BINOMIAL DISTRIBUTION: binomial experiment, requirements for a binomial
experiment, independent trials, binomial probabilities, cumulative binomial
probabilities, binomial distribution symmetry conditions, binomial distribution
skewness conditions, binomial distribution parameters, mean and variance of a
binomial distribution
NORMAL DISTRIBUTION
normal distribution, normal distribution parameters, mean, standard deviation,
standard normal distribution, z-value, reliability, validity
Skills/Procedures
MATHEMATICAL EXPECTATION:
compute the mean, variance, and standard deviation of a discrete random variable
solve various applications problems involving discrete probability distributions
BINOMIAL DISTRIBUTION:
compute binomial probabilities and verify results with table in textbook
compute cumulative binomial probabilities
compute binomial probabilities with p = q and verify symmetry
solve various application problems using the binomial distribution
NORMAL DISTRIBUTION -- given appropriate data,
determine a normal probability (area), given x, μ, and σ
determine x, given μ, σ, and the normal probability (area)
determine μ, given x, σ, and the normal probability (area)
determine σ, given x, μ, and the normal probability (area)
solve various applications problems involving the normal distribution
compute the sampling standard deviation (standard error) from the population
standard deviation and the sample size
solve various applications problems involving the central limit theorem
Concepts
MATHEMATICAL EXPECTATION
give an example (other than water) of something that looks continuous at a
distance, but, when you get up close, turns out to be discrete
explain why "expected value" may be a misleading name for the mean of a
probability distribution
describe how to compute a weighted average
BINOMIAL DISTRIBUTION:
explain why rolling a die is or is not a binomial experiment
explain why drawing red/black cards from a deck of 52 without replacement is or is
not a binomial experiment
explain why drawing red/black cards from a deck of 52 with replacement is or is not
a binomial experiment
NORMAL DISTRIBUTION
describe conditions under which the normal distribution is symmetric
describe the kind of shift in the graph of a normal distribution caused by a change in
the mean
describe the kind of shift in the graph of a normal distribution caused by a change in
the standard deviation
explain why, as the sample size increases, the distribution of sample means clusters
more and more closely around the population mean
Part IV
Sampling Distributions
Sampling distribution of the mean--the distribution of the means of many samples of
the same size drawn from the same population
Central Limit Theorem--three statements about the sampling distribution of sample
means:
1. Sampling distribution of the means is normal in shape, regardless of the
population distribution shape when the sample size, n, is large. (When n is
small, the population must be normal in order for the sampling distribution of
the mean to be normal.) ("Large" n is usually taken to be 30 or more.)
2. Sampling distribution of the means is centered at the true population mean.
3. Sampling distribution of the means has a standard deviation equal to σ / √n.
This quantity is called the sampling standard deviation or the standard error
(of the mean).
(The full name is "standard deviation of the sampling distribution of the
mean(s).”)
This quantity is represented by the symbol σx bar.
σx bar is less than σ because of the offsetting that occurs within the sample. The larger
the sample size n, the smaller the σx bar (standard error), because the larger the
n, the greater the amount of offsetting that can occur, and the sample means
will cluster more closely around the true population mean μ.
Sampling standard deviation (σx bar or standard error)--key value for inferential
statistics
Two uses of the standard error
Computing the error factor in interval estimation
Computing the test statistic (zc or tc) in hypothesis testing
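A short simulation can make the second and third statements of the Central Limit Theorem concrete. Everything specific here is an assumption for illustration: the population is uniform on the interval from 0 to 1 (deliberately not normal), the sample size is 36, and 2000 samples are drawn with a fixed random seed.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Sampling standard deviation of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 36                   # sample size ("large" n, 30 or more)

# Draw 2000 samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(2000)
]

population_mean = 0.5
sigma = math.sqrt(1 / 12)   # standard deviation of a uniform(0, 1) population

print(round(statistics.fmean(sample_means), 3))   # close to the population mean
print(round(statistics.stdev(sample_means), 4))   # close to the standard error
print(round(standard_error(sigma, n), 4))
```

Rerunning with a larger n should show the sample means clustering more tightly, since the standard error shrinks as the square root of n grows.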
Terminology
normal distribution, normal distribution parameters, mean, standard deviation,
standard normal distribution, z-value, reliability, validity, sampling distribution,
central limit theorem (three parts), sampling standard deviation, standard error,
offsetting, effect of the sample size on the sampling standard deviation (standard
error).
Skills/Procedures--given appropriate data,
determine a normal probability (area), given x, μ, and σ
determine x, given μ, σ, and the normal probability (area)
determine μ, given x, σ, and the normal probability (area)
determine σ, given x, μ, and the normal probability (area)
solve various applications problems involving the normal distribution
compute the sampling standard deviation (standard error) from the population
standard deviation and the sample size
solve various applications problems involving the central limit theorem
Concepts
describe conditions under which the normal distribution is symmetric
describe the kind of shift in the graph of a normal distribution caused by a change in
the mean
describe the kind of shift in the graph of a normal distribution caused by a change in
the standard deviation
explain why, as the sample size increases, the distribution of sample means clusters
more and more closely around the population mean
Part V
Interval Estimation--Large Samples
Four Types of Problems
Means--one-group; two-group
Columns one and two of the four-column formula sheet
Proportions--one-group; two-group
Columns three and four of the four-column formula sheet
Confidence level (confidence coefficient)--the probability that a confidence interval will
actually contain the population parameter being estimated (a confidence interval is a
range of values that is likely to contain the population parameter being estimated).
90%, 95%, and 99% are the most popular confidence levels, and correspond to
z-values of 1.645, 1.960, and 2.576, respectively.
Of these, 95% is the most popular, and is assumed unless another value is
mentioned.
Error (uncertainty) factors express precision, as in 40 ± 3.
Upper confidence limit--the point estimate plus the error factor, 43 in this example
Lower confidence limit--the point estimate minus the error factor, 37 in this
example
Error factor is the product of the z-value and the standard error: zt * σx bar.
Required sample sizes for desired precision may be computed.
Increased precision means a lower error factor.
Precision can be increased by increasing the sample size, n.
Increasing n lowers the standard error, since the standard error = σ / √n.
Taken to the extreme, every member of the population may be sampled, in
which case the error factor becomes zero--no uncertainty at all--and the
population parameter is found exactly.
Economic considerations--the high cost of precision
The required increase in n is equal to the square of the desired increase in
precision.
To double the precision--to cut the error factor in half--the sample size must be
quadrupled. Doubling the precision may thus quadruple the cost.
To triple the precision--to cut the error factor to 1/3 of its previous value, n must be
multiplied by 9.
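The interval-estimation arithmetic for a one-group mean can be sketched as below. The function names and the example values (σ = 15, n = 100) are assumptions, and the z-values are computed with statistics.NormalDist rather than read from a table.

```python
import math
from statistics import NormalDist


def table_z(confidence):
    """Two-sided z-value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def error_factor(sigma, n, confidence=0.95):
    """Error factor = z-value times the standard error (sigma / sqrt(n))."""
    return table_z(confidence) * sigma / math.sqrt(n)


def confidence_interval(point_estimate, sigma, n, confidence=0.95):
    e = error_factor(sigma, n, confidence)
    return point_estimate - e, point_estimate + e   # lower and upper limits


def required_sample_size(sigma, desired_error, confidence=0.95):
    """Smallest n whose error factor does not exceed desired_error."""
    return math.ceil((table_z(confidence) * sigma / desired_error) ** 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(table_z(level), 3), round(error_factor(15, 100, level), 3))

# Quadrupling the sample size cuts the error factor in half.
print(round(error_factor(15, 100), 3), round(error_factor(15, 400), 3))
print(required_sample_size(15, 3))
```

Because n sits under a square root, cutting the error factor to 1/k of its value requires multiplying n by k squared, which is the cost of precision noted above.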
Hypothesis Testing--Large Samples
Four Types of Problems--Four-column formula sheet
Means--one-group; two-group
Proportions--one-group; two-group
Null (H0) and alternate (Ha) hypotheses
Means, one-group
H0: μ = some value
Ha: μ ≠ that same value (two-sided test)
μ > that same value (one-sided test, high end, right side)
μ < that same value (one-sided test, low end, left side)
Means, two-group
H0: μ1 = μ2
Ha : μ1 ≠ μ2 (two-sided test)
μ1 > μ2 (one-sided test, high end, right side)
μ1 < μ2 (one-sided test, low end, left side)
Proportions, one-group
H0: π = some value
Ha : π ≠ that same value (two-sided test)
π > that same value (one-sided test, high end, right side)
π < that same value (one-sided test, low end, left side)
Proportions, two-group
H0: π1 = π2
Ha : π1 ≠ π2 (two-sided test)
π1 > π2 (one-sided test, high end, right side)
π1 < π2 (one-sided test, low end, left side)
Type I error
Erroneous rejection of a true H0
Probability of a Type I error is symbolized by α (alpha)
Type II error
Erroneous acceptance of a false H0
Probability of a Type II error is symbolized by β (beta)
Selecting α--based on researcher’s attitude toward risk
α--the researcher's maximum tolerable risk of committing a type I error
0.10, 0.05, and 0.01 are the most commonly used.
Of these, 0.05 is the most common--known as "the normal scientific standard
of proof."
Table-z (critical value); symbolized by zt; determined by the selected α value
α       2-sided z   1-sided z
0.10    1.645       1.282
0.05    1.960       1.645
0.01    2.576       2.326
Calculated-z (test statistic); symbolized by zc
Fraction--"signal-to-noise" ratio
Numerator ("signal")--strength of the evidence against H0
Denominator ("noise")--uncertainty factor for the numerator
Rejection criteria
Two-sided test: |zc| >= |zt|; also p <= α
One-sided test: |zc| >= |zt|, AND zc and zt have the same sign; also p <= α
Significance level (p-value) ("p" stands for probability)
Actual risk (probability) of a Type I error if H0 is rejected on the basis of the
experimental evidence
Graphically, the area beyond the calculated z-value, zc.
Treatment--in a column-2 test, the difference that the experimenter introduces
between the two groups
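The rejection criteria and p-value described above can be sketched for a one-group test of a mean. The course formula sheet is not reproduced here, so this sketch assumes the standard test statistic zc = (sample mean - hypothesized mean) / standard error; the function name, the tail labels, and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def one_group_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two-sided"):
    """Return (zc, zt, p_value, reject) for a large-sample test of a mean."""
    std_normal = NormalDist()
    standard_error = sigma / math.sqrt(n)          # the "noise"
    zc = (sample_mean - mu0) / standard_error      # "signal" divided by "noise"

    if tail == "two-sided":
        zt = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(zc)))
        reject = abs(zc) >= abs(zt)
    elif tail == "right":
        zt = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(zc)           # area beyond zc, high end
        reject = abs(zc) >= abs(zt) and zc > 0     # same sign as zt
    elif tail == "left":
        zt = std_normal.inv_cdf(alpha)             # negative critical value
        p_value = std_normal.cdf(zc)               # area beyond zc, low end
        reject = abs(zc) >= abs(zt) and zc < 0     # same sign as zt
    else:
        raise ValueError("tail must be 'two-sided', 'right', or 'left'")
    return zc, zt, p_value, reject


# Hypothetical data: sample mean 103, H0: mu = 100, sigma = 15, n = 100.
for alpha in (0.10, 0.05, 0.01):
    zc, zt, p, reject = one_group_z_test(103, 100, 15, 100, alpha)
    print(alpha, round(zc, 3), round(zt, 3), round(p, 4), reject)
```

In every case the comparison of |zc| with |zt| and the comparison of p with α lead to the same decision, so either criterion may be used.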
Terminology
inferential statistics, sample mean, population mean, estimator, estimate, unbiased
estimator, point estimate, interval estimate, confidence interval, degree of confidence,
confidence level, table-z, error factor, required sample size, upper confidence limit,
lower confidence limit, hypothesis test, null hypothesis, alternate hypothesis, type I
error, α, type II error, β, calculated-z (test statistic), critical region, table-z (critical
value of z), rejection of the null hypothesis, non-rejection of the null hypothesis,
p-value, hypothesis-test conclusion, independent samples, standard error of the
difference, sample proportion, population proportion, pooled proportion (two-group
proportion cases), treatment
Skills/Procedures
• given appropriate data, conduct estimation and hypothesis testing on the
population mean of one group (col. 1), using these ten steps:
1. make a point estimate of a population mean
2. compute the sampling standard deviation (standard error) of the sample
means
3. compute and interpret the error factor for the interval estimate for the 90%,
95%, and 99% confidence levels
4. determine the sample size needed to obtain a given desired error factor
5. state the null and alternate hypotheses regarding the population mean
6. determine the table-z (critical value of z) for alpha levels of 0.10, 0.05 and
0.01
7. compute the calculated-z (test statistic)
8. draw the appropriate hypothesis-test conclusion based on the given level of α,
the table-z (critical value) and the calculated-z (test statistic)
9. interpret the conclusion
10. determine and interpret the p-value
• given appropriate data, conduct estimation and hypothesis testing on the
population means of two groups (col. 2), using these ten steps:
1. make a point estimate of the difference between population means
2. compute the sampling standard deviation (standard error) of the difference
between sample means
3. compute and interpret the error factor for the interval estimate for the 90%,
95% and 99% confidence levels
4. determine the sample size needed to obtain a given desired error factor
5. state the null and alternate hypotheses regarding the difference between
population means
6. determine the table-z (critical value of z) for alpha levels of 0.10, 0.05 and
0.01
7. compute the calculated-z (test statistic)
8. draw the appropriate hypothesis-test conclusion based on the given level of α,
the table-z, and the calculated-z
9. interpret the conclusion
10. determine and interpret the p-value
• given appropriate data, conduct estimation and hypothesis testing on the
population proportion of one group (col. 3), using these ten steps:
1. make a point estimate of a population proportion
2. compute the sampling standard deviation (standard error) of the sample
proportions
3. compute and interpret the error factor for the interval estimate for the 90%,
95% and 99% confidence levels
4. determine the sample size needed to obtain a given desired error factor
5. state the null and alternate hypotheses regarding the population proportion
6. determine the table-z (critical value of z) for alpha levels of 0.10, 0.05 and
0.01
7. compute the calculated-z (test statistic)
8. draw the appropriate hypothesis-test conclusion based on the given level of α,
the table-z and the calculated-z
9. interpret the conclusion
10. determine and interpret the p-value
• given appropriate data, conduct estimation and hypothesis testing on the
population proportions of two groups (col. 4), using these ten steps:
1. make a point estimate of the difference between population proportions
2. compute the sampling standard deviation (standard error) of the difference
between sample proportions
3. compute and interpret the error factor for the interval estimate for the 90%,
95% and 99% confidence levels
4. determine the sample size needed to obtain a given desired error factor
5. state the null and alternate hypotheses regarding the difference between
population proportions
6. determine the table-z (critical value of z) for alpha levels of 0.10, 0.05 and
0.01
7. compute the calculated-z (test statistic)
8. draw the appropriate hypothesis-test conclusion based on the given level of α,
the table-z and the calculated-z
9. interpret the conclusion
10. determine and interpret the p-value
Concepts
explain why a confidence interval becomes larger as the confidence level increases
explain why a confidence interval becomes smaller as the sample size increases
describe the nature of the trade-off between precision and cost
identify the type of error that is made if the null hypothesis is "the defendant is
innocent," and an innocent defendant is erroneously convicted
identify the type of error that is made if the null hypothesis is "the defendant is
innocent," and a guilty defendant is erroneously acquitted
explain why a researcher seeking to reject a null hypothesis may tend to prefer a
one-sided alternative hypothesis
COURSE PHILOSOPHY -- STATISTICS
In an article in the Chronicle of Higher Education, Sharon Rubin, assistant dean at the
University of Maryland, states that all course syllabi, in addition to providing the basic
information on texts, topics, schedule, etc., should answer certain questions. The
instructor of this course would like to share these questions with you, and provide some
answers.
You are what you know. You are what you can do.
"What value can you add to our organization?"
1. WHY SHOULD A STUDENT WANT TO TAKE THIS COURSE?
As a decision-maker, you must learn how to analyze and interpret quantitative
information. Such skills will improve your ability to adopt the questioning attitude
and independence of thought that are essential to leadership and success in any
field. You may also have the opportunity to introduce statistical data analyses in
areas where they are not currently in use, thus improving the quality of your
organization's decisions.
2. WHAT IS THE RELEVANCE OF THIS COURSE TO THE DISCIPLINE?
Statistics courses are part of the curriculum in many of BU's programs. But since
this course is part of a program leading to a degree in business, let us interpret the
word "discipline" in this question to mean "management." This can refer to
marketing management, financial management, human resource management, etc.,
even the management of your personal affairs. To MANAGE something requires the
ability to exert some CONTROL over it, and the ability to exert control requires
identification of DEPENDENCIES. In order to manage sales performance, for
example, you must find things upon which sales depend (e.g., advertising budget;
product price; number, training, and compensation of salespersons; interest rates;
and competitive factors), and learn something about the nature of the
dependencies. Statistics is the major tool for identifying dependencies.
Another example of the importance of identifying dependencies: a new disease
appears. Researchers immediately try to find things that enhance the occurrence
rate or the severity of the illness (positive dependencies), and things that reduce
them (negative dependencies). Only after such things are found can there be any
hope of controlling the disease. Again, statistical analysis plays a major role.
Or, the objective may simply be to know more about how the world works.
So-called "pure research" has no immediate application, but seeks to find
relationships among things, thereby securing knowledge that may become useful in
the future.
CAREFUL STATISTICAL ANALYSIS OF DATA OFTEN RESULTS IN THE
IDENTIFICATION OF DEPENDENCIES, and this is the reason why statistics is an
important tool in virtually all disciplines.
3. HOW DOES THIS COURSE FIT INTO THE "GENERAL EDUCATION" PROGRAM?
Statistics is a major way in which human beings learn about the world, and how to
control it. To be familiar with a tool as fundamental and important as this is a
responsibility of every educated person.
Statistics can be viewed as applied quantitative logic, usually seeking to make
inferences about unknown parameters on the basis of observations and
measurements of samples drawn from a target population.
The study of statistics can promote clear and careful thinking, enhance
problem-solving skills, and strengthen one's ability to avoid premature conclusions. These
are traits of the educated person, and are the mental qualities essential for
"knowledge workers" in modern society.
4. WHAT ARE THE OBJECTIVES OF THE COURSE?
The most important objective is the development of your ability to learn this kind of
material on your own, and to continue learning more about the subject after the
course is over. Continuous and independent learning is an important activity of
every successful person. In connection with the objective of independent learning,
the instructor will expect students to study and learn certain topics in the course
without formal discussion of them in class. Questions on these topics, of course, are
always welcomed and encouraged.
The specific objectives are that students learn the terminology, theory,
principles, and computational procedures related to basic descriptive and
inferential statistics, and that they carefully cultivate the logical processes
involved in statistical inference. This will enable students to understand
statistics and communicate statistical ideas using generally-accepted terminology.
Another important objective is that students become aware of the limitations of
various statistical procedures. This is particularly important since most students in
this course will be consumers rather than providers of statistical information and
conclusions. Estimates and forecasts, for example, are generally regarded with too
much faith, and relied upon to a degree not warranted in light of their inherent
limitations.
5. WHAT MUST STUDENTS DO TO SUCCEED IN THIS COURSE?
Your activities in this course should include: reading and studying the relevant
sections of the text; attending class and taking notes; rewriting, reviewing, and
studying your notes; working the recommended exercises in the text; practicing and
experimenting with various spreadsheet files supplied by the instructor; asking and
answering questions in class; spending time just thinking about the procedures and
their underlying logic; forming a study group with other students to review notes on
terminology and concepts, and to practice problem-solving skills; and taking the
quizzes.
These activities should help you to further develop your abilities to read, listen,
record, and organize important information; and to communicate, analyze, compute,
and learn independently the subject matter of statistics.
In order to do well, students must recognize a basic difference between courses like
statistics and courses like history, philosophy, management or organizational
strategy. In the latter type, the emphasis is often on general ideas in broad
contexts, with grades based on essay exams and term papers in which students
have considerable latitude to choose what they are going to discuss. The cogent
expression and defense of well-reasoned opinion are highly valued. Students with
good verbal, logical and writing skills often excel in this type of course. Statistics,
on the other hand, is a skills course, requiring precise knowledge of concepts,
terminology, and computational procedures. Verbal skills are still important, but
now quantitative logic and computational competence are also critical. Grades are
based on knowledge of terminology and concepts, and even more on the ability to
get the right answers to problems.
Regarding study strategy, it is extremely important for most students to read about
statistics, to think about statistics and to do a few problems every day. The most
common error is to neglect the material until shortly before a quiz. But for most
students, many of the concepts in statistics are new and strange, and there will be
many places where they are stopped cold: "What?" "I just don't get this!" Then
there is no time left to cultivate the understanding of new concepts and to refine the
computational procedures. Anyone can learn statistics, but most cannot do it
overnight.
As with most courses, this course is organized with the most fundamental material
coming first. In learning a new language, or how to play a musical instrument, or
any new set of skills, mastery of the basics is essential to success later on. The
subject matter of statistics is not like history, where, if you did not study 14th
century France, it probably did not affect your learning about 17th century England.
In statistics, failure to obtain a good understanding of earlier material will have a
serious adverse effect on your ability to make sense out of what comes later. It is
therefore essential to build a solid foundation of fundamental knowledge early in the
course in order to support the more elaborate logical and computational structures
involved later.
6. WHAT ARE THE PREREQUISITES FOR THE COURSE?
The primary prerequisite is a logical mind. This course is computational, but it is not
a "math" course. Mathematical theorems are not derived or proven; the need to
solve equations is very rare. The emphasis is on concrete applications rather than
abstract theory. Some students with good math backgrounds have done poorly,
while others with little or no math experience have done very well.
The best MBA stats student I ever had was a philosophy major who did not have a
single math course at the college level. When asked about this, he replied: "My
philosophy major gave me excellent training in logic, and that's really what this
course requires."
7. OF WHAT IMPORTANCE IS CLASS PARTICIPATION?
In this course, class participation means frequently asking relevant questions and
supplying answers (right or wrong) to the instructor's and colleagues' questions as
problems and examples are worked out and discussed. These behaviors are
evidence of active involvement with the material and will result in better learning
and an automatic positive effect on your grade. In borderline grade cases, a
history of active participation will enable the instructor to award the higher grade to
the deserving student.
8. WILL STUDENTS BE GIVEN ALTERNATIVE WAYS TO ACHIEVE SUCCESS, BASED ON
DIFFERENT LEARNING STYLES?
Different learning styles do exist. Some prefer a deductive method (deriving specific
knowledge from general principles), while others tend to prefer an inductive method
(deriving the generalities from examples). The inductive learners may need to work
a number of problems before seeing the patterns that are present. The deductive
learners may never need to work a problem--they will know instinctively what to do.
Some will not like the book, and will learn primarily from the class presentations and
discussions, while others will learn mostly from the book and will find class time to
be of lesser importance.
But the intended outcomes are the same for all--those in number 4 above.
9. WHAT IS THE PURPOSE OF THE ASSIGNMENTS?
Problems from the text may be suggested, for the purpose of providing practice in
analyzing what must be done, and in performing the required computations. Even
though computer software is available to perform calculations, students can gain
insight into the logical structure of a sequence of computational steps if they go
through them several times by hand (i.e. using simple calculators).
Computer assignments using instructor-supplied spreadsheet files will require
students to become more familiar with spreadsheet software that they probably are
or will be using in connection with their work. More importantly, the spreadsheets
allow students to experiment with data in order to investigate the quantitative
relationships involved. Such experimentation would be too tedious and
time-consuming for manual or even calculator computation.
10. WHAT WILL THE TESTS TEST? -- MEMORY? UNDERSTANDING? ABILITY TO
SYNTHESIZE? TO PRESENT EVIDENCE LOGICALLY? TO APPLY KNOWLEDGE IN A
NEW CONTEXT?
The tests will test your ability to recognize and use statistical terminology correctly,
and they will test your understanding of the logic and principles underlying various
statistical procedures. In addition, you will have to demonstrate your ability to solve
problems similar to those discussed in class, sometimes using computer spreadsheet
files.
There is a place for memorization in learning. It is not a substitute for
comprehension, but it is better than getting something wrong on a quiz that you
were expected to know. As with prayers among small children, memorization is
often a first step, eventually followed by understanding. But if the memorization (of
terminology, for example) is not done, it is less likely that the comprehension will
ever occur.
11. WHY HAS THIS PARTICULAR TEXT BEEN CHOSEN?
Our text is one of the most widely adopted introductory statistics books. It has gone
through several editions, and its popularity remains high. It is relatively easy to
read, and its exercise material is excellent.
12. WHAT IS THE RELATIONSHIP BETWEEN KNOWLEDGE LEVEL AND GRADES?
Consider this hypothetical but realistic situation.
              Percentage Grade
Knowledge     Course A     Course B
100%          100%         100%
 90%           90%          81%
 80%           80%          64%
 70%           70%          49%
 60%           60%          36%
 50%           50%          25%
 40%           40%          16%
 30%           30%           9%
 20%           20%           4%
 10%           10%           1%
Course A might be like philosophy, history, or management, where the grade is
more-or-less proportional to knowledge level. Course B might be like statistics or
other skills courses, where small deficiencies in knowledge can have disastrous
effects on results. Overstudying is the best strategy for coping with this, with the
dual payoffs of higher grades and, more importantly, greater knowledge.
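The Course B column in the table above is consistent with the grade being the square of the knowledge level (for example, 0.90 x 0.90 = 0.81). The following minimal Python sketch reproduces both columns under that assumption. The squaring rule and the function names are inferred from the table for illustration only; they are not a formula stated by the instructor.

```python
# Illustrative sketch: reproduce the hypothetical Course A / Course B table.
# Assumption: Course A grade is proportional to knowledge, and Course B grade
# equals knowledge squared (inferred from the table values, not stated in the text).

def course_a_grade(knowledge: float) -> float:
    """Grade proportional to knowledge (both expressed as fractions 0..1)."""
    return knowledge

def course_b_grade(knowledge: float) -> float:
    """Grade equal to knowledge squared, so small gaps are penalized heavily."""
    return knowledge ** 2

for pct in range(100, 0, -10):
    k = pct / 100
    print(f"{pct:>3}%   A: {course_a_grade(k):>4.0%}   B: {course_b_grade(k):>4.0%}")
```

Under this assumption, dropping from 100% to 90% knowledge costs 10 points in Course A but 19 points in Course B.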
Weights: QUIZ 0.667, HW 0.333

                                      HW
QUIZ    10     20     30     40     50     60     70     80     90     95     100
100     70.0   73.4   76.7   80.0   83.4   86.7   90.0   93.3   96.7   98.3   100
 95     66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   93.3   95.0   96.7
 90     63.4   66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   91.7   93.3
 85     60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   86.7   88.3   90.0
 80     56.7   60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   85.0   86.7
 75     53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   80.0   81.7   83.3
 70     50.0   53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   78.3   80.0
 65     46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   73.3   75.0   76.7
 60     43.4   46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   71.7   73.3
 55     40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   66.7   68.3   70.0
 50     36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   65.0   66.7
 45     33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   61.7   63.3
 40     30.0   33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   58.3   60.0
 30     23.3   26.7   30.0   33.3   36.7   40.0   43.3   46.7   50.0   51.6   53.3
 20     16.7   20.0   23.3   26.7   30.0   33.3   36.7   40.0   43.3   45.0   46.6

Grade key: A  B  C  D  F
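The entries tabulated above appear consistent with a simple weighted average: 0.667 times the QUIZ score plus 0.333 times the HW score. A minimal Python sketch of that calculation follows. The constant and function names are hypothetical, chosen only for illustration, and the weights are taken from the figures shown with the table.

```python
# Illustrative sketch: weighted course percentage from QUIZ and HW scores.
# Assumption: the course percentage is 0.667 * QUIZ + 0.333 * HW, which matches
# the tabulated values when rounded to one decimal place.

QUIZ_WEIGHT = 0.667
HW_WEIGHT = 0.333

def course_percentage(quiz: float, hw: float) -> float:
    """Return the weighted course percentage for the given scores (0..100)."""
    return QUIZ_WEIGHT * quiz + HW_WEIGHT * hw

# Example: a QUIZ score of 80 combined with an HW score of 50.
print(round(course_percentage(80, 50), 1))
```

Because the quiz weight is about twice the homework weight, a 10-point change in the QUIZ score moves the result by about 6.7 points, while a 10-point change in the HW score moves it by about 3.3 points.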
I can use Excel to
   perform basic computations
   prepare tables
   create charts and graphs
   conduct common statistical procedures
   create dashboards
   handle large data sets ("big data")

I can use Word to
   create various kinds of documents

I can
   compute
      means
      medians
      variances
      standard deviations
      confidence intervals for means and proportions
   use the
      binomial distribution to answer probability questions
      normal distribution to answer probability questions
      chi-square distribution to answer probability questions
      F distribution to answer probability questions
   conduct
      hypothesis tests on
         the means of one group or two
         the proportions of one group or two
         hypothetical vs. observed distributions
         variances of one group or two
         group means using ANOVA
      regression analysis to examine correlation and make forecasts

I can
   perform financial analysis
   compute the NPV of various investment opportunities
   decide between using debt or equity to raise new funds
   determine the optimum mix of debt and equity financing
   compute cost-of-capital
   decide whether to make or buy components for our products
   determine how much direct labor, direct materials, and overhead is going into our products
   create cash budgets
   conduct cost-volume-profit analyses
   prepare a master budget
   prepare performance reports using standard costs and variances
   employ the scientific method to study problems that may come up
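
As a small illustration of the "compute" items in the list above, the following Python sketch uses the standard library's statistics module to find a mean, median, variance, standard deviation, and an approximate confidence interval for a mean. The data values are invented for illustration. The interval uses a normal critical value as a simplifying assumption; for a sample this small, a t critical value would ordinarily be used instead. The course itself works with spreadsheets, so this is only a sketch of the same computations in another tool.

```python
import math
from statistics import NormalDist, mean, median, stdev, variance

# Hypothetical sample, invented purely for illustration.
sample = [12, 15, 11, 18, 14, 16, 13, 17]

sample_mean = mean(sample)
sample_median = median(sample)
sample_variance = variance(sample)  # sample variance (divides by n - 1)
sample_sd = stdev(sample)           # square root of the sample variance

# Approximate 95% confidence interval for the mean.
# Assumption: normal critical value; a t value is more appropriate for small n.
z = NormalDist().inv_cdf(0.975)
margin = z * sample_sd / math.sqrt(len(sample))
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"mean={sample_mean}, median={sample_median}")
print(f"variance={sample_variance}, sd={sample_sd:.3f}")
print(f"approx. 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```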