STA2023 Statistical Methods NOTES – Prof L. Blanchette
Textbook: Elementary Statistics: A Step by Step Approach, a Brief Version, 5th Edition
Allan G. Bluman, McGraw-Hill, 2010
CHAP 1:
This chapter is an introduction to statistics. Read through carefully; focus on the broad
concepts and pay special attention to the following:
Statistics: The science of conducting studies to collect, organize, summarize,
analyze and draw conclusions from data.
Data: measurements, counts, outcomes, or observations
Variable: a characteristic or attribute that can assume different values [such as
temperature, height, etc.]
Random Variable: a variable whose value is determined by chance [such as the face
value of a die.]
Descriptive Statistics: used to describe what was or is actually observed
Inferential Statistics: used to generalize from samples to populations, to estimate, to
predict, to perform hypothesis testing, or to determine relationships; uses probability.
Population: all subjects of the study. The population must be clearly defined.
Sample: a subset of the population; a group from the population. A sample gives
useful, reliable information about the population if it is selected correctly and if it is
large enough.
Probability: has to do with the chance or likelihood of an event (or a specific
outcome) occurring; enables us to predict future occurrences.
Qualitative Variable: non-numeric; descriptive [such as color, gender, etc.]
Quantitative Variable: numerical [such as age, time, etc.]
Discrete Variable: assumes values that can be counted; there are a finite or countable
number of possible outcomes. [Such as number of students, number of cars, etc.]
Continuous Variable: can assume infinitely many values between any two specific
possible values; obtained by measuring. [Such as weight, volume, length, etc.]
Continuous data values must be rounded: for example, a recorded length of 7 inches
implies a length from 6.5 inches up to, but not including, 7.5 inches (7.5 would
actually round up to 8 inches). The boundaries of 7 are written (for convenience) as
6.5-7.5 inches.
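The boundary rule above can be sketched in a few lines of Python. This is only an illustration: the helper name boundaries and its unit parameter are made up for this example, and Decimal is used so that halves such as 0.05 come out exactly.

```python
from decimal import Decimal

def boundaries(recorded, unit="1"):
    """Return (lower, upper) boundaries for a value recorded to the nearest `unit`.

    Hypothetical helper: the lower boundary is included, the upper is not.
    """
    half = Decimal(unit) / 2
    value = Decimal(recorded)
    return value - half, value + half

# A length recorded as 7 inches, measured to the nearest inch.
lower, upper = boundaries("7")
print(lower, upper)  # 6.5 7.5

# A weight recorded as 7.2 pounds, measured to the nearest tenth.
print(boundaries("7.2", unit="0.1"))
```

The same idea applies to any rounding unit: the boundaries sit half a unit below and half a unit above the recorded value.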
Levels of Measure:
(1)
Nominal: think naming. Nominal data classifies data into mutually exclusive
(non-overlapping) exhaustive categories in which no order or ranking can be
imposed; differences between data values and averages are not meaningful.
[Examples: zip codes, gender, etc.]
(2)
Ordinal: think ordering. Ordinal data classifies data into categories that can
be ordered or ranked; differences between data and averages are not
meaningful. [Examples: Small, Medium, Large; A, B, C letter grades, etc.]
(3)
Interval: think no natural zero. Interval data classifies data into categories
which can be ordered or ranked, in which precise differences between data
values and averages of values are meaningful, but in which there is no
meaningful zero – that is, zero does not imply “none.” [Examples:
temperature (in °F or °C), the calendar year, IQ scores, etc.]
(4)
Ratio: think “does ‘twice as much’ make sense?” Ratio data are like interval
data except the zero is meaningful. Ratios are also meaningful. [Examples:
height, weight, time, age, cost, etc.]
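A small sketch of why ratios fail on an interval scale, assuming temperatures in degrees Celsius. The Kelvin conversion is brought in only for comparison (it is not part of these notes): Kelvin has a true zero, so it behaves like ratio data.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on an interval scale: both scales agree.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k
print(round(diff_c, 2), round(diff_k, 2))    # 10.0 10.0

# Ratios are not: 20 °C looks like "twice" 10 °C only because of where
# the Celsius zero happens to be placed.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k
print(round(ratio_c, 3), round(ratio_k, 3))  # 2.0 1.035
```

The 10-degree difference survives the change of scale, but the "twice as warm" claim does not, which is the test for ratio data suggested above.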
Collecting Data: some possible concerns involve time, cost, poorly worded
questions, bias of the question or interviewer, order of the questions, emphasis,
ethical considerations, moral considerations, effect on the subject (the object may
need to be broken in order to get the data result), etc. Often the entire population cannot be
accessed and a sample is used instead.
Bias: tending towards a particular result or outcome.
Unbiased sample: an unbiased sample is representative of the population; each
subject in the population has an equal chance of being selected.
Four Basic Methods for Sampling:
(1)
Random Sampling: Uses chance methods or random numbers generated
by a calculator or computer. With a printed table of random numbers, close
your eyes and point to a value, begin there, and use the next few values
according to the desired sample size.
(2)
Systematic Sampling: List all the subjects (if possible), randomly select a
place to begin, and then choose every kth subject for the sample.
[Example: use every 12th subject from the list.]
(3)
Stratified Sampling: Divide the population into groups (strata) according
to some important characteristic (like age, gender, race, etc.) and then
randomly select subjects from each group.
(4)
Cluster Sampling: Divide the population into groups (clusters) according
to some criteria like geographical or organizational factors; randomly
select a number of these clusters and use all subjects from these clusters.
[Example: specific elementary schools, specific districts, etc.]
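The four methods above can be sketched with Python's standard random module. This is a minimal illustration, not a prescribed procedure: the function names, the made-up population of 120 students, and the "grade" and "school" fields are all assumptions for this example.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every subject has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start, then every kth subject."""
    start = rng.randrange(k)          # random place to begin, 0 .. k-1
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: group by a characteristic, sample from each group."""
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random, keep all their subjects."""
    clusters = {}
    for subject in population:
        clusters.setdefault(cluster_of(subject), []).append(subject)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

# Hypothetical population: 120 students in grades 9-12 at schools A-E.
rng = random.Random(2023)             # fixed seed so the run is repeatable
population = [{"id": i, "grade": i % 4 + 9, "school": "ABCDE"[i % 5]}
              for i in range(120)]

print(len(simple_random_sample(population, 10, rng)))
print([s["id"] for s in systematic_sample(population, 12, rng)])
print(len(stratified_sample(population, lambda s: s["grade"], 3, rng)))
print(len(cluster_sample(population, lambda s: s["school"], 2, rng)))
```

Note the difference between the last two: stratified sampling takes a few subjects from every group, while cluster sampling takes every subject from a few groups.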
Convenience Sampling: when subjects are selected for the sample based on
convenience; this sample is most likely biased and does not represent the population.
[Example: asking only your best friends or surveying only shoppers at one mall close
to your home.]
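A rough simulation of why a convenience sample tends to be biased. The data are invented for illustration only: a hypothetical list of 1,000 shoppers' travel distances, sorted so that the people living closest to one mall come first.

```python
import random
from statistics import mean

rng = random.Random(1)                # fixed seed so the run is repeatable

# Hypothetical population: travel distance in miles, closest shoppers first.
population = sorted(rng.uniform(0, 30) for _ in range(1000))

# Convenience sample: only the 50 shoppers who live closest to the mall.
convenience = population[:50]

# Random sample: 50 shoppers chosen by chance from the whole list.
random_sample = rng.sample(population, 50)

print("population mean: ", round(mean(population), 1))
print("convenience mean:", round(mean(convenience), 1))
print("random mean:     ", round(mean(random_sample), 1))
```

The convenience sample describes only the nearby shoppers, so its mean falls far below the population mean, while the random sample lands much closer.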
Observational Study: a study in which you observe and record, summarize, analyze,
and interpret (draw conclusions) based on what has occurred or is occurring, without
any input
or interference on your part. Advantages include: natural setting, may be more
ethical, may be less expensive. Disadvantages include: cannot directly control the
study, may take more time, and may involve data collected by others which is of
unknown reliability.
Experimental Study: a study in which you manipulate one of the variables and see
what happens. Advantages include: the researcher selects the subjects and directly
manipulates the variable, the study is in a controlled environment like a lab or a test
tube, and the subject may be an animal rather than a human. Disadvantages: the
experiment may be costly, the manipulation may be unethical or immoral, the setting
is often not a natural one and so the results in “real life” may not duplicate the results
found in this artificial setting.
Independent Variable: the variable being manipulated
Dependent Variable: the resultant (outcome) variable; this is the variable being
studied.
Treatment Group: this group receives the treatment (the variable that is manipulated).
Control Group: this group receives no treatment or receives a placebo (a fake
treatment that they think is the real thing).
Hawthorne Effect: when the subjects change their behavior because they know they
are being studied.
Confounding Variable: a different variable not controlled or accounted for, which
influences the outcome.
Suspect Samples: suspect samples are samples that are too small and/or incorrectly
selected; may be self-selected, may be made up of volunteers, may be from one
particular group only, may be a convenience sample, may be too few subjects to
convey meaningful data.
Overgeneralization: be careful to clearly define the population, identify the
sampling method used, and do not overgeneralize the results; the results may not
apply to another region, another culture, another time period, etc.
Ambiguous Averages: when the mean, median, mode, or midrange is referred to simply
as the “average” without stating which one was used.
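A quick sketch of how far apart these "averages" can be for the same data. The salary figures are invented for illustration (in thousands of dollars) and are not from the textbook.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands of dollars; one value is much larger.
salaries = [30, 30, 30, 40, 50, 60, 180]

averages = {
    "mean": mean(salaries),                           # sum divided by count
    "median": median(salaries),                       # middle value when sorted
    "mode": mode(salaries),                           # most frequent value
    "midrange": (min(salaries) + max(salaries)) / 2,  # halfway between extremes
}

for name, value in averages.items():
    print(name, value)
```

Here the mean is 60, the median is 40, the mode is 30, and the midrange is 105, so a report that only says "the average salary" could honestly mean any of the four.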
Misleading Graphs: graphs that imply relationships, proportions or differences that
are not correct.
Implied Connections: when two variables are implied to be related in a significant
way by using words such as “may help,” “suggest,” “in some cases,” or “up to…”
Faulty Survey Questions: some problems include leading questions, the order of the
questions, and the emphasis placed on certain aspects of a question.
Computers, Calculators, and Software Programs: technology helps with
numerical computations, saves time, and allows us to process huge databases.
Remember to enter the data carefully, use appropriate commands or menu options,
and interpret the results correctly. Do not merely record an answer derived from a
calculator or computer without justifying the answer and showing proper steps
towards the conclusions.