Download UNIT 1 Why you need to use statistics in your research

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
МІНИСТЕРСТВО ОСВІТИ І НАУКИ УКРАЇНИ
СХІДНОУКРАЇНСЬКИЙ НАЦІОНАЛЬНИЙ УНІВЕРСИТЕТ
Імені ВОЛОДИМИРА ДАЛЯ
МЕТОДИЧНІ ВКАЗІВКИ
до самостійної роботи з учбовими текстами по англійській мові
(для студентів І-ІІ курсу спеціальності «Статистика»)
Дані методичні вказівки й навчальні завдання призначені для студентів I-II курсів
спеціальності «Статистика». Основна їх ціль - розвиток навичок усного мовлення.
Тексти підібрані з наукової літератури й носять пізнавальний характер. При доборі
текстового матеріалу автор ураховувала інформативну цінність, їхню відповідність рівню
мовної підготовки.
Різні вправи носять творчий характер, стимулюють швидку мовну реакцію. Розуміння
текстів контролюється відповідями на питання різного характеру, а також коротким
переказом текстів. Деякі завдання носять дискусійний характер і дають можливість
виразити своє ставлення до обговорюваної теми.
Методичні вказівки призначені для аудиторної й поза-аудиторної роботи
UNIT 1 Why you need to use statistics in your research
1. Practice the pronunciation of the words:
Procedure, analyse, quantitative, obtain, statistician, scientific, approach, recipient,
phenomenon, occurrence, confusion, technique, concern, area, inference, conclusion,
hypothesis.
2. Look through the text and find the explanation of the following words:
Statistics
Data
Sample
inference
hypothesis
3. Read and translate the text:
What is statistics?
Put simply, statistics is a range of procedures for gathering, organising, analysing and
presenting quantitative data. ‘Data’ is the term for facts that have been obtained and
subsequently recorded, and, for statisticians, ‘data’ usually refers to quantitative data that are
numbers. Essentially therefore, statistics is a scientific approach to analyzing numerical data in
order to enable us to maximise our interpretation, understanding and use. This means that
statistics helps us turn data into information; that is, data that have been interpreted, understood
and are useful to the recipient. Put formally, for your project, statistics is the systematic
collection and analysis of numerical data, in order to investigate or discover relationships among
phenomena so as to explain, predict and control their occurrence. The possibility of confusion
comes from the fact that not only is statistics the techniques used on quantitative data, but the
same word is also used to refer to the numerical results from statistical analysis.
In very broad terms, statistics can be divided into two branches – descriptive and inferential
statistics.
1. Descriptive statistics is concerned with quantitative data and the methods for describing
them. (‘Data’ (facts) is the plural of ‘datum’ (a fact), and therefore always needs a plural verb.)
This branch of statistics is the one that you will already be familiar with because descriptive
statistics are used in everyday life in areas such as government, healthcare, business, and sport.
2. Inferential (analytical) statistics makes inferences about populations (entire groups of
people or firms) by analysing data gathered from samples (smaller subsets of the entire group),
and deals with methods that enable a conclusion to be drawn from these data. (An inference is an
assumption, supposition, deduction or possibility.) Inferential statistics starts with a hypothesis
(a statement of, or a conjecture about, the relationship between two or more variables that you
intend to study), and investigates whether the data are consistent with that hypothesis.
Statistical processing requires mathematics. One of the major problems any researcher
faces is reducing complex situations or things to manageable formats in order to describe,
explain or model them. This is where statistics comes in. Using appropriate statistics, you will be
able to make sense of the large amount of data you have collected so that you can tell your
research story coherently and with justification. Put concisely, statistics fills the crucial gap
between information and knowledge.
4. Complete the following sentences:
 Statistics is a range of procedures for…
 ‘Data’ usually refers to quantitative…
 Statistics helps us turn data…
 Statistics is the systematic collection and…
 Statistics can be divided into…
 Descriptive statistics is concerned with…
 Inferential (analytical) statistics makes inferences about….
 Using appropriate statistics, you will be able to make…
5. Ask 10 general questions to the text.
6. Answer the following questions:
 What does the term “data” mean?
 What does statistics analyze?
 What branches can statistics be divided?
 What is descriptive statistics concerned with?
 What does inferential statistics deal with?
 What does inferential statistics start with?
 What is the major problem any researcher faces?
7. Get ready to speak about what statistics is. Use the following phrases:
 Put simply…
 Essentially therefore….
 This means that….
 Put formally…
 In very broad terms…
UNIT 2 A very brief history of statistics.
1. Practice the pronunciation of the words:
Derive, affair, figure, community, specialist, society, decision, technique, supply, launch,
permeate.
2. Read and memorize the following words and word combinations. Use them in the
sentences. Pay attention to the prepositions.
To deal with
Provide
Make sense of
Reliable
Come in handy
depend on
gut feeling
to cope with
to launch (a new product)
3. Read and translate the text:
The word ‘statistics’ derives from the modern Latin term statisticum collegium (council
of state) and the Italian word statista (statesman or politician). ‘Statistics’ was used in 1584 for a
person skilled in state affairs, having political knowledge, power or influence by Sir William
Petty, a seventeenth-century polymath and statesman, used the phrase ′political arithmetic′ for
‘statistics’. (A book entitled Sir William Petty, 1623–1687, written by Lord Edmond
Fitzmaurice, and published in London in 1895, quotes Petty as saying that ‘By political
arithmetic, we mean the art of reasoning by figures upon things relating to government’.) By
1787, ‘statistic’ (in the singular), meant the science relating to the branch of political science
dealing with the collection, classification and discussion of facts bearing on the condition of a
state or a community.
‘Statists’ were specialists in those aspects of running a state which were particularly
related to numbers. This encompassed the tax liabilities of the citizens as well as the state’s
potential for raising armies. The word ‘statistics’ is possibly the descendant of the word ‘statist’.
By 1837, statistics had moved into many areas beyond government. Statistics, used in the plural,
were (and are) defined as numerical facts (data) collected and classified in systematic ways. In
current use, statistics is the area of study that aims to collect and arrange numerical data, whether
relating to human affairs or to natural phenomena.
The importance of statistics
It is obvious that society can’t be run effectively on the basis of hunches or trial and error,
and that in business and economics much depends on the correct analysis of numerical
information. Decisions based on data will provide better results than those based on intuition or
gut feelings.
What applies to this wider world applies to undertaking research into the wider world. And
learning to use statistics in your studies will have a wider benefit than helping you towards a
qualification. Once you have mastered the language and some of the techniques in order to make
sense of your investigation, you will have supplied yourself with a knowledge and understanding
that will enable you to cope with the information you will encounter in your everyday life.
Statistical thinking permeates all social interaction. For example, take these statements:
 The earlier you start thinking about the topic of your research project, the more likely it is
that you will produce good work.’
 You will get more reliable information about that from a refereed academic journal than a
newspaper.’
 ‘On average, my journey to work takes 1 hour and 40 minutes.’
 ‘More people are wealthier now than ten years ago.’
Or these questions:
 ‘Which university should I go to?’
 ‘Should I buy a new car or a second-hand one?’
 ‘Should the company buy this building or just rent it?’
 ‘Should we invest now or wait till the new financial year?’
 ‘When should we launch our new product?’
All of these require decisions to be made, all have costs and benefits (either financial or
emotional), all are based upon different amounts of data, and all involve or necessitate some kind
of statistical calculation. This is where an understanding of statistics and knowledge of statistical
techniques will come in handy.
4. Complete the following sentences:







The word ‘statistics’ derives from…
First it was used by….
‘Statists’ were specialists in….
Society can’t be run effectively on….
In business and economics much depends on….
Decisions based on data will provide…
Statistical thinking permeates….
5. Ask 10 disjunctive questions to the text.
6. Answer the following questions:







What does the word statistics derives from?
When was the word “statistics” used for the first time?
Who introduced the word “statistics”?
Who was called “statists”?
Why can’t society be run without statistics?
What problems will statistics enable you to cope with?
When will statistics come in handy?
7. Get ready to speak about history of statistics. Use the following phrases:
Our topic deals with…
To begin with…
I’d like to say a few words about…
I’d like to add that…
It’s necessary to note that…
One cannot help saying about…
In addition…
That’s why….
Why you need to use statistics
Much of everyday life depends on making forecasts, and business can’t progress without
being able to audit change or plan action. In your research, you may be looking at areas such as
purchasing, production, capital investment, long-term development, quality control, human
resource development, recruitment and selection, marketing, credit risk assessment or financial
forecasts or others.
And that is why the informed use of statistics is of direct importance to you while you are
collecting your data and analysing them. If nothing else, your results and findings will be more
accurate, more believable and, consequently, more useful.
Some of the reasons why you will be using statistics to analyse your data are the same
reasons why you are doing the research. Ignoring the possibility that you are researching because
the project or dissertation element of your qualification is compulsory, rather than because you
very much want to find something out, you are likely to be researching because you want to:
 measure things;
 examine relationships;
 make predictions;
 test hypotheses;
 construct concepts and develop theories;
 explore issues;
 explain activities or attitudes;
 describe what is happening;
 present information;
 make comparisons to find similarities and differences;
 draw conclusions about populations based only on sample results.
If you didn’t want to do at least one of these things, there would be no point to doing your
research at all.
UNIT 3 What statistical language actually means. Variable and constant. Discrete and
continuous.
Like other academic disciplines, statistics uses words in a different way than they are used
in everyday language. You will find a fuller list of the words you need to understand and use in
the Glossary.
1. Practice the pronunciation of the words:
Variable, trait, value, confusingly, particular, discrete, continuous, quantitative, certain,
height, weight, temperature, intervene, measurements.
2. Read and memorize the following words. Use them in the sentences.
Vary
Be likely to do smth
Depend on
be separated from
for instance
contain
3. Read and translate the texts:
Variable and constant
In everyday language, something is variable if it has a tendency to change. In statistical
language, any attribute, trait or characteristic that can have more than one value is called a
variable.
In everyday language, something that does not change is said to be constant. In statistical
language, an attribute, trait or characteristic that only has one value is a constant. Confusingly,
something may be a variable in one context and a constant in another. For example, if you are
looking at the spending patterns of a number of households, the number of children (which will
vary) in a particular household is a variable, because we are likely to want to know how
household spending depends on the number of children. But, if you are looking at the spending
patterns of households which have, say, three children, then the number of children is a constant.
Strictly speaking, in statistical language, when your variables and constants are categorical, for
example, eye colour or nationality, they are known as attributes.
Discrete and continuous
Quantitative variables are divided into ‘discrete’ and ‘continuous’. A discrete variable is
one that can only take certain values, which are clearly separated from one another – for
instance, a sales department can have 2 or 15 or 30 people within it. It cannot, however, contain
38 or 48.1 people. A continuous variable is one that could take any value in an interval.
Examples of continuous variables include body mass, height, age, weight or temperature. Where
continuous variables are concerned, whatever two values you mention, it is always possible to
have more values (in the interval) between them. An example of this is height – a child may be
1.21 metres tall when measured on 27th September this year, and 1.27 metres on 27 September
next year. In the intervening 12 months, however, the child will have been not just 1.22 or 1.23
or 1.24 and so on up to 1.27 metres, but will have been all the measurements possible, however
small they might be, between 1.21 and 1.27.
Sometimes the distinction between discrete and continuous is less clear. An example of this
is a person’s age, which could be discrete (the stated age at a particular time, 42 in 2007) or
continuous, because there are many possible values between the age today (42 years, 7 weeks
and 3 days) and the age next week (42 years, 8 weeks and 3 days).
4. Write out the definitions out of the texts:
Variable (in everyday language);
Variable (in statistics);
Discrete variable;
Continuous variable;
Constant (in everyday language);
Constant (in statistics);
Attributes.
5. Complete the following sentences:
 In every day language variable….
 In statistical language variable is…..
 In every day language constant ….
 In statistical language constant is …..
 In statistical language, when your variables and constants are categorical…
 Quantitative variables are divided into…
 A discrete variable is…
 A continuous variable is ….
 Where continuous variables are concerned….
 Sometimes the distinction between discrete and….
6. Ask 10 general questions to the text.
7. Give examples of:
Variable (in everyday language);
Variable (in statistics);
Discrete variable;
Continuous variable;
Constant (in everyday language);
Constant (in statistics).
UNIT 4 What statistical language actually means. Cardinal and ordinal. Population and
sample.
1. Practice the pronunciation of the words:
Cardinal, ordinal, highlight, subtract, multiply, divide, techniques, quantitative, applicable,
item, route.
2. Read and memorize the following words. Use them in the sentences.
highlight
express
item
be interested in
own
destination
consern
3. Read and translate the texts:
Cardinal and ordinal
We will deal with cardinal and ordinal numbers later, but here we want to highlight that a
‘cardinal’ in statistics is not a person with high-rank in the Roman Catholic church. Cardinal
numbers are 1, 2, 3 and so on, and they can be added, subtracted, multiplied and divided. An
ordinal number describes position 1st, 2nd, 3rd and so on), and they express order or ranking,
and can’t be added, subtracted, multiplied or divided. Most of the statistical techniques created
for the analysis of quantitative are not applicable to ordinal data. It is therefore meaningless (and
misleading) to use these statistical techniques on rankings.
Population and sample
In statistics, the term ‘population’ has a much wider meaning than in everyday language.
The complete set of people or things that is of interest to you in its own right (and not because
the collection may be representative of something larger) is a population. The number of items,
known as cases, in such a collection is its size. For example, if you are interested in all the
passengers on a particular plane in their own right and not as representatives of the passengers
using the airline which owns that particular plane, then those particular plane passengers are your
population.
But if you do a statistical analysis of those particular plane passengers in order to reach
some conclusion about, say, (1) all plane passengers heading to that destination, or (2) all plane
passengers on any route on that day and at that time, then the passengers are a sample. They are
being used to indicate something about the population (1) or (2). A sample is therefore a smaller
group of people or things selected from the complete set (the population). It hardly goes without
saying that you need to be clear about whether your data are your population or a sample. Most
of statistics concerns using sample data to make statements about the population from which the
sample comes.
4. Write out the definitions out of the texts:
Cardinal numbers;
ordinal numbers;
population;
sample.
5. Complete the following sentences:
 ‘Cardinal’ in statistics is not a person ….
 Cardinal numbers are 1, 2, 3 and so on, and …..
 An ordinal number describes ….
 They express order or ranking, and …..
 Most of the statistical techniques created for the analysis of quantitative …
 In statistics, the term ‘population’ has a much …
 The complete set of people or things that is of interest to …
 A sample is therefore a smaller group of people or ….
 You need to be clear about whether your data are ….
 Most of statistics concerns using sample data to make ….
6. Ask 10 general questions to the text.
7. Give examples of:
Cardinal numbers;
ordinal numbers;
population;
sample.
UNIT 5. Misuses of statistics.
1. Practice the pronunciation of the words:
Measure, manipulate, pursue, alter, ignore, distribution, inconvenient, coherently, exclude,
correlation, height, weight, causation, ethical, truthfully, inexcusable, retribution, supervisor.
Prejudice, elicit, error, procedure, questionnaire, percentage, loyalty, allegiance,
specifically.
2. Read and memorize the following words. Use them in the sentences.
Consist of
Focus on
Provide smb with
Framework
By accident
Remove
refer to
compose of
derive from
deal with
benefit
Consider
be based on
undertake
deliberately
3. Read and translate the texts:
Statistics consists of tests used to analyse data. You have decided what your research
question is, which group or groups you want to study, how those groups should be put together
or divided, which variables you want to focus on, and what are the best ways to categorise and
measure them. This gives you full control of your study, and you can manipulate it as you wish.
Statistical tests provide you with a framework within which you can pursue your research
questions. But such tests can be misused, either by accident or design, and this can result in
potential misinterpretation and misrepresentation.
You could, for instance, decide to:

alter your scales to change the distribution of your data;

ignore or remove high or low scores which you consider to be inconvenient so that your
data can be presented more coherently;

focus on certain variables and exclude others;

present correlation (the relationship between two variables, for example, height and
weight – the taller people in the sample are thinner than the shorter people) as causation
(tallness results in or is a cause of thinness).
It goes without saying that, because research is based on trust, you must undertake your
research in an ethical manner, and present your findings truthfully. Deliberately misusing your
statistics is inexcusable and unacceptable, and if it is discovered by your supervisor or examiner,
retribution will be severe.
Because you are inexperienced in research, the main errors which you might make are bias,
using inappropriate tests, making improper inferences, and assuming you have causation from
correlations.
Bias
In ordinary language, the term ‘bias’ refers simply to prejudice. It could be that when the
data you are using were collected, the respondents were prejudiced in their responses. You might
get this kind of thing if you are eliciting attitudes or opinions. In statistical language, bias refers
to any systematic error resulting from the collection procedures you used. For example, in a
questionnaire, if the non-respondents (those who haven’t answered the questionnaire) are
composed of, say, a large percentage of a higher socio-economic group, it could introduce bias
(systematic error) because you would have an under-representation of that group in your study.
Often the people with the strongest opinions, or those who have a greater interest in the results of
the research, who may derive some benefit from the results, or who have a loyalty or allegiance
to express, are more likely to respond to the questionnaire than those without those views or
interests. There are procedures that can deal with nonresponse in questionnaires and interviews.
It would benefit your research if you read up about these and included them in your research
design if you are collecting data specifically for your research project (primary data) rather than
reanalysing data that have already been collected for some other purpose (secondary data).
4. Complete the following sentences:
 Statistics consists of…..
 Statistical tests provide you with….
 Such tests can be misused, either….
 It goes without saying that, because research is based on….
 Deliberately misusing your statistics is….
 If it is discovered by your supervisor or examiner….




In ordinary language, the term ‘bias’ ….
You might get this kind of thing if …..
In statistical language, bias refers to …..
There are procedures that can deal with …..
5. Ask 10 general questions to the text.
6. Answer the following questions:









What does statistics consist of?
What should you decide before starting your research?
What do statistical tests provide you with?
Why can tests be misused?
Why must you undertake your research in an ethical manner?
What errors might you make in your research?
What does the term ‘bias’ refer to in ordinary language?
When might you get this kind of thing?
What does the term ‘bias’ refer to in statistical language?
7. Sum up all the information about statistics and discuss this topic with your groupmates according to the plan:
 Definition of statistics;
 Descriptive statistics;
 Inferential statistics;
 Brief history of statistics;
 Importance of statistics;
 Variable and constant in statistics;
 Discrete and continuous in statistics;
 Cardinal and ordinal;
 Population and sample;
 Misuses in statistics;
 Bias.
The expressions below will help you to start and continue the conversation:
I didn't know about...;
Our topic deals with...;
It is important to mention...;
To begin with...:
By the way...;
I'd like to say a few words about...;
Furthermore...;
I'd like to add that...;
Move over...;
It is necessary to note that...;
I could add that...;
One cannot help saying about...;
In addition…
It should also be mentioned..,;
I would say that...;
In addition...;
As a matter of fact...;
It is essential to note...;
That's why....
It is common knowledge that...;
As far as I know...;
Test Yourself
1. The portion of the population that is selected for analysis is called:
(a) a sample
(b) a frame
(c) a parameter
(d) a statistic
2. A summary measure that is computed from only a sample of the population is called:
(a) a parameter
(b) a population
(c) a discrete variable
(d) a statistic
3. The height of an individual is an example of a:
(a) discrete variable
(b) continuous variable
(c) categorical variable
(d) constant
4. The body style of an automobile (sedan, coupe, wagon, etc.) is an example of a:
(a) discrete variable
(b) continuous variable
(c) categorical variable
(d) constant
5. The number of credit cards in a person’s wallet is an example of a:
(a) discrete variable
(b) continuous variable
(c) categorical variable
(d) constant
6. Statistical inference occurs when you:
(a) compute descriptive statistics from a sample
(b) take a complete census of a population
(c) present a graph of data
(d) take the results of a sample and draw conclusions about a population
7. The human resources director of a large corporation wants to develop a dental benefits
package and decides to select 100 employees from a list of all 5,000 workers in order to study
their preferences for the various components of a potential package. All the employees in the
corporation constitute the _______.
(a) sample
(b) population
(c) statistic
(d) parameter
8. The human resources director of a large corporation wants to develop a dental benefits
package and decides to select 100 employees from a list of all 5,000 workers in order to study
their preferences for the various components of a potential package. The 100 employees who
will participate in this study constitute the _______.
(a) sample
(b) population
(c) statistic
(d) parameter
9. Those methods involving the collection, presentation, and characterization of a set of data in
order to properly describe the various features of that set of data are called:
(a) statistical inference
(b) the scientific method
(c) sampling
(d) descriptive statistics
10. Based on the results of a poll of 500 registered voters, the conclusion that the Republican
candidate for U.S. president will win the upcoming election is an example of:
(a) inferential statistics
(b) descriptive statistics
(c) a parameter
(d) a statistic
11. A summary measure that is computed to describe a characteristic of an entire population is
called:
(a) a parameter
(b) a population
(c) a discrete variable
(d) a statistic
12. You were working on a project to look at the value of the American dollar as compared to
the English pound. You accessed an Internet site where you obtained this information for the
past 50 years. Which method of data collection were you using?
(a) Published sources
(b) Experimentation
(c) Surveying
13. Which of the following is a discrete variable?
(a) The favorite flavor of ice cream of students at your local elementary school
(b) The time it takes for a certain student to walk to your local elementary school
(c) The distance between the home of a certain student and the local elementary school
(d) The number of teachers employed at your local elementary school
14. Which of the following is a continuous variable?
(a) The eye color of children eating at a fast-food chain
(b) The number of employees of a branch of a fast-food chain
(c) The temperature at which a hamburger is cooked at a branch of a fast-food chain
(d) The number of hamburgers sold in a day at a branch of a fast-food chain
15. The number of cars that arrive per hour at a parking lot is an example of:
(a) a categorical variable
(b) a discrete variable
(c) a continuous variable
(d) a statistic
16. The possible responses to the question “How long have you been living at your current
residence?” are values from a continuous variable.
(a) True
(b) False
17. The possible responses to the question “How many times in the past three months have you
visited a museum?” are values from a discrete variable.
(a) True
(b) False
18. An insurance company evaluates many variables about a person before deciding on an
appropriate rate for automobile insurance. The number of accidents a person has had in the past
three years is an example of a _______ variable.
19. An insurance company evaluates many variables about a person before deciding on an
appropriate rate for automobile insurance. The distance a person drives in a day is an example of
a _______ variable.
20. An insurance company evaluates many variables about a person before deciding on an
appropriate rate for automobile insurance. A person’s marital status is an example of a _______
variable.
Answers to Test Yourself Questions
1. a
8. a
15. b
2. d
9. d
16. a
3. b
10. a
17. a
4. c
11. a
18. discrete
5. a
12. a
19. continuous
6. d
13. d
20. categorical
7. b
14. c
GLOSSARY
Alternative hypothesis (H1)—The opposite of the null hypothesis (H0).
Analysis of variance (ANOVA)—A statistical method that tests the significance of different
factors on a variable of interest.
Arithmetic mean—The balance point in a set of data that is calculated by summing the
observed numerical values in a set of data and then dividing by the number of values involved.
Bar chart—A chart containing rectangles (“bars”) in which the length of each bar represents the
count, amount, or percentage of responses of one category.
Binomial distribution—A distribution that finds the probability of a given number of successes
for a given probability of success and sample size.
Box-and-whisker plot—A graphical representation of the five-number summary that consists of
the smallest value, the first quartile (or 25th percentile), the median, the third quartile (or 75th
percentile), and the largest value.
Categorical variable—The values of these variables are selected from an established list of
categories.
Cell—Intersection of a row and a column in a two-way cross-classification table
Chi-square (χ2) distribution—Distribution used to test relationships in twoway crossclassification tables.
Coefficient of correlation—Measures the strength of the linear relationship between two
variables.
Coefficient of determination—Measures the proportion of variation in Y that is explained by
the independent variable X in the regression model.
Collectively exhaustive events—One in a set of events must occur.
Common causes of variation—Represent the inherent variability that exists in the system.
Completely randomized design—An experimental design in which there is only a single factor.
Confidence interval estimate—An estimate of the population parameter given by an interval
with a lower and upper limit.
Continuous numerical variables—Values of these variables are measurements.
Control chart—A tool for distinguishing between the common and special causes of variation.
Critical value—Divides the nonrejection region from the rejection region.
Degrees of freedom—The actual number of values that are free to vary after the mean is known.
Dependent variable—The variable to be predicted in a regression analysis.
Descriptive statistics—The branch of statistics that focuses on collecting, summarizing, and
presenting a set of data.
Discrete numeric variables—The values are counts of things.
Dot scale diagram—A chart in which each response is represented as a point above a number
line that includes the range of all values.
Error sum of squares (SSE)—Consists of variation that is due to factors other than the
relationship between X and Y.
Event—Each possible type of occurrence.
Expected frequency—Frequency expected in a particular cell if the null hypothesis is true.
Expected value—The mean of a probability distribution.
Experiments—A process that uses controlled conditions to study the effect on the variable of
interest of varying the value(s) of another variable or variables.
Explanatory variable—The variable used to predict the dependent or response variable in a
regression analysis.
F distribution—A distribution used for testing the ratio of two variances.
First quartile (Q1)—The value such that 25.0% of the observations are smaller and 75.0% are
larger.
Five-number summary—Consists of smallest value, Q1, median, Q3, and largest value.
Frame—The list of all items in the population from which samples will be selected.
Frequency distribution—A table of grouped numerical data in which the names of each group
are listed in the first column and the percentages of each group of numerical data are listed in the
second column.
Histogram—A special bar chart for grouped numerical data in which the frequencies or
percentages of each group of numerical data are represented as individual bars.
Hypothesis testing—Methods used to make inferences about the hypothesized values of
population parameters using sample statistics.
Independent events—Events in which the occurrence of one event in no way affects the
probability of the second event.
Independent variable—The variable used to predict the dependent or response variable in a
regression analysis.
Inferential statistics—The branch of statistics that analyzes sample data to draw conclusions
about a population.
Level of significance—Probability of committing a type I error.
Mean—The balance point in a set of data that is calculated by summing the observed numerical
values in a set of data and then dividing by the number of values involved.
Mean squares—The variances in an analysis-of-variance table.
Median—The middle value in a set of data that has been ordered from the lowest to highest
value.
Mode—The value in a set of data that appears most frequently.
Mutually exclusive events—events are mutually exclusive if both events cannot occur at the
same time.
Normal distribution—The normal distribution is defined by its mean (μ) and standard deviation
(σ) and is bell-shaped.
Normal probability plot—A graphical device for helping to evaluate whether a set of data
follows a normal distribution.
Null hypothesis—A statement about a parameter equal to a specific value, or the statement that
there is no difference between the parameters for two or more populations.
Numerical variables—The values of these variables involve a counted or measured value.
Observed frequency—Actual tally in a particular cell of a cross-classification table.
p-chart—Used to study a process that involves the proportion of items with a characteristic of
interest.
p-value—The probability of getting a test statistic equal to or more extreme than the result
obtained from the sample data, given that the null hypothesis H0 is true.
Paired samples—Items are matched according to some characteristic and the differences
between the matched values are analyzed.
Parameter—A numerical measure that describes a characteristic of a population.
Pareto diagram—A special type of bar chart in which the count, amount, or percentage of
responses of each category are presented in descending order left to right, along with a
superimposed plotted line that represents a running cumulative percentage.
Percentage distribution—A table of grouped numerical data in which the names of each group
are listed in the first column and the percentages of each group of numerical data are listed in the
second column.
Pie chart—A chart in which wedge-shaped areas (“pie slices”) represent the count, amount, or
percentage of each category and the circle (the “pie”) itself represents the total.
Placebo—A substance that has no medical effect.
Poisson distribution—A distribution to find the probability of the number of occurrences in an
area of opportunity.
Population—All the members of a group about which you want to draw a conclusion.
Power of a statistical test—The probability of rejecting the null hypothesis when it is false and
should be rejected.
Probability—The numeric value representing the chance, likelihood, or possibility a particular
event will occur.
Probability distribution for a discrete random variable—A listing of all possible distinct
outcomes and their probabilities of occurring.
Probability sampling—A sampling process that takes into consideration the chance of
occurrence of each item being selected.
Published sources—Data available in print or in electronic form, including data found on
Internet Web sites.
Range—The difference between the largest and smallest values in a set of data.
Region of rejection—Consists of the values of the test statistic that are unlikely to occur if the
null hypothesis is true.
Regression sum of squares (SSR)—Consists of variation that is due to the relationship between
X and Y.
Residual—The difference between the observed and predicted values of the dependent variable
for a given value of X.
Response variable—The variable to be predicted in a regression analysis.
Sample—The part of the population selected for analysis.
Sampling—The process by which members of a population are selected for a sample.
Sampling distribution—The distribution of a sample statistic (such as the arithmetic mean) for
all possible samples of a given size n.
Sampling error—Variation of the sample statistic from sample to sample.
Sampling with replacement—A sampling method in which each selected item is returned to the
frame from which it was selected so that it has the same probability of being selected again.
Sampling without replacement—A sampling method in which each selected item is not
returned to the frame from which it was selected. Using this technique, an item can be selected
no more than one time.
Scatter plot—A chart that plots the values of two variables for each response. In a scatter plot,
the X-axis (the horizontal axis) always represents units of one variable, and the Y-axis (the
vertical axis) always represents units of the second variable.
Simple linear regression—A statistical technique that uses a single numerical independent
variable X to predict the numerical dependent variable Y.
Simple random sampling—The probability sampling process in which every individual or item
from a population has the same chance of selection as every other individual or item.
Six Sigma management —A method for breaking processes into a series of steps in order to
eliminate defects and produce near perfect results.
Skewness—A skewed distribution is not symmetric. There are extreme values either in the lower
portion of the distribution or in the upper portion of the distribution.
Slope—The change in Y per unit change in X.
Special causes of variation—Represent large fluctuations or patterns in the data that are not
inherent to a process.
Standard deviation—Measure of variation around the mean of a set of data.
Standard error of the estimate—The standard deviation around the line of regression.
Statistic—A numerical measure that describes a characteristic of a sample.
Statistics—The branch of mathematics that consists of methods of processing and analyzing
data to better support rational decision-making processes.
Sum of squares among groups (SSA)—The sum of the squared differences between the sample
mean of each group and the mean of all the values, weighted by the sample size in each group.
Sum of squares total (SST)—Represents the sum of the squared differences between each
individual value and the mean of all the values.
Sum of squares within groups (SSW)—Measures the difference between each value and the
mean of its own group and sums the squares of these differences over all groups.
Summary table—A two-column table in which the names of the categories are listed in the first
column, and the count, amount, or percentage of responses are listed in a second column.
Survey—A process that uses questionnaires or similar means to gather values for the responses
from a set of participants.
Symmetry—Distribution in which each half of a distribution is a mirror image of the other half
of the distribution.
t distribution—A distribution used to estimate the mean of a population and to test hypotheses
about means.
Test statistic—The statistic used to determine whether to reject the null hypothesis.
Third quartile (Q3)—The value such that 75.0% of the observations are smaller and 25.0% are
larger.
Time-series plot—A chart in which each point represents a response at a specific time. In a time
series plot, the X-axis (the horizontal axis) always represents units of time, and the Y-axis (the
vertical axis) always represents units of the numerical responses.
Two-way cross-classification table—A table that presents the count or percentage of joint
responses to two categorical variables (a mutually exclusive pairing, or cross-classifying, of
categories from each variable). The categories of one variable form the rows of the table, and the
categories of the other variable form the columns.
Type I error—Occurs if the null hypothesis H0 is rejected when in fact it is true and should not
be rejected. The probability of a type I error occurring is α.
Type II error—Occurs if the null hypothesis H1 is not rejected when in fact it is false and
should be rejected. The probability of a type II error occurring is β.
Variable—A characteristic of an item or an individual that will be analyzed using statistics.
Variance—The square of the standard deviation.
Variation—The amount of dispersion, or “spread,” in the data.
Y intercept—The value of Y when X = 0.
Z score—The difference between the value and the mean, divided by the standard deviation.