The Role of Statistics
Sexual Discrimination Problem
A large company had to downsize and fire 10 employees.
Of these 10 employees, 5 were women. However, only
1/3 of the company’s employees were women. This
discrepancy has led the women who were fired to file a
sexual discrimination lawsuit.
Do they have a legitimate claim?
What are the two possibilities in this case?
- They have a legitimate claim: the company fired a higher proportion of women on purpose.
- They don't have a legitimate claim: this could have occurred by random chance.
Which of the two possibilities can we actually assess?
- Not the first one: we cannot know what the boss was thinking.
- However, we can estimate the probability of getting a result as surprising as this by random chance.
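Simulate the firing by using a population of beads to represent the employees of the company, with one white bead for every two black beads (since 1/3 of the employees are women):
- white = women
- black = men
Draw 10 beads at random, without replacement, and count the number of women fired (the number of white beads).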
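The bead draw can also be run on a computer. The sketch below is a minimal Python version under stated assumptions: the company size (100 women, 200 men) is hypothetical, since the slides give only the proportion 1/3, and `simulate_firing` is an illustrative name.

```python
import random

# Hypothetical company size: the slides only say 1/3 of employees are women,
# so we assume 100 women (white beads) and 200 men (black beads).
WOMEN, MEN = 100, 200
BEADS = ["white"] * WOMEN + ["black"] * MEN

def simulate_firing(n_fired=10, rng=random):
    """Draw n_fired beads without replacement; return the number of white beads (women fired)."""
    fired = rng.sample(BEADS, n_fired)
    return fired.count("white")

# One simulated round of firings.
print(simulate_firing())
```

Drawing without replacement matches the situation: the same employee cannot be fired twice.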
Collect the class data and estimate the probability of having 5 or more women fired by random chance alone (that is, assuming the company is telling the truth).
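Pooling many repetitions of the bead draw gives the estimate this step asks for: the fraction of trials with 5 or more white beads. A minimal sketch, again assuming a hypothetical company of 100 women and 200 men; `estimate_p_at_least` and the default of 10,000 trials are illustrative choices.

```python
import random

# Hypothetical company: 100 women (white) and 200 men (black), so 1/3 are women.
BEADS = ["white"] * 100 + ["black"] * 200

def estimate_p_at_least(k=5, n_fired=10, trials=10_000, seed=None):
    """Estimate P(k or more women among n_fired) as the fraction of simulated trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        women = rng.sample(BEADS, n_fired).count("white")
        if women >= k:
            hits += 1
    return hits / trials

print(estimate_p_at_least(seed=2024))
```

The class data play the same role with fewer trials; more trials make the estimate vary less from run to run.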
Does this give evidence of discrimination (the women
were fired on purpose)?
NO! Since it is somewhat likely to get 5 or more
women by random chance alone, we do not have
evidence that women were discriminated against.
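The probability can also be computed exactly under an extra assumption the slides do not make: that the company is large enough for each of the 10 firings to act like an independent draw in which the employee is a woman with probability 1/3. Under that binomial model, with X the number of women fired:

```latex
\begin{aligned}
P(X \ge 5) &= 1 - \sum_{k=0}^{4} \binom{10}{k}\left(\tfrac{1}{3}\right)^{k}\left(\tfrac{2}{3}\right)^{10-k}
            = 1 - \sum_{k=0}^{4} \frac{\binom{10}{k}\, 2^{10-k}}{3^{10}} \\
           &= 1 - \frac{1024 + 5120 + 11520 + 15360 + 13440}{59049} \\
           &= 1 - \frac{46464}{59049} = \frac{12585}{59049} \approx 0.213
\end{aligned}
```

So roughly 1 time in 5, chance alone produces 5 or more women among the 10 fired.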
Summary:
Based on the makeup of the company, we would expect about 3 or 4 women to be fired. However, firing 5 or more women could plausibly have occurred by random chance, so we should not conclude that the company is guilty.
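The "3 or 4" comes from the expected number of women fired: the number of firings times the proportion of women in the company (this holds whether the draws are made with or without replacement):

```latex
E[X] = n\,p = 10 \times \tfrac{1}{3} = \tfrac{10}{3} \approx 3.33
```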
How many women fired would make you suspicious?
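One way to think about this question is to list, for every possible count, how likely it is to fire that many women or more by chance alone. A minimal sketch, assuming (as an approximation the slides do not state) a company large enough that the count follows a Binomial(10, 1/3) model; `p_at_least` is an illustrative name.

```python
from fractions import Fraction
from math import comb

# Assumption: each of the 10 firings acts like an independent draw with
# P(woman) = 1/3, so the number of women fired is Binomial(10, 1/3).
N_FIRED = 10
P_WOMAN = Fraction(1, 3)

def p_at_least(k, n=N_FIRED, p=P_WOMAN):
    """Exact P(X >= k) for X ~ Binomial(n, p), computed with fractions."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

for k in range(N_FIRED + 1):
    print(f"P({k} or more women fired) = {float(p_at_least(k)):.4f}")
```

Under this model, P(6 or more) is about 0.077 and P(7 or more) is already below 0.02; where to draw the line for "suspicious" remains a judgment call.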
In statistics, it is always possible that we make the wrong
decision. More on this later…
STATISTICS is the science of collecting, analyzing, and
drawing conclusions from data. Statistics is also the art of
distilling meaning from data.
The POPULATION OF INTEREST is the entire
collection of individuals or objects about which
information is desired.
When you study an entire population, it is called a
CENSUS.
A SAMPLE is a subset of the population, selected for
study in some prescribed manner.
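One common "prescribed manner" is a simple random sample, in which every group of the same size is equally likely to be chosen. A minimal Python sketch, using a hypothetical population of 1,000 ID numbers:

```python
import random

# Hypothetical population: ID numbers 1 to 1000 (illustrative only).
population = list(range(1, 1001))

# A census studies every individual in the population.
census = population

# A simple random sample of 50: chosen without replacement, so no ID repeats
# and every set of 50 IDs is equally likely to be selected.
rng = random.Random(7)
sample = rng.sample(population, 50)

print(len(census), len(sample))  # 1000 50
```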
DESCRIPTIVE statistics is the branch of statistics that
studies methods for summarizing data.
INFERENTIAL statistics is the branch of statistics that
involves generalizing about a population based on
information from a sample of that population.
Statistical INFERENCE is the process of drawing these
generalizations.
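The difference between the two branches shows up in what we do with a sample. In this minimal sketch the population of heights is simulated (hypothetical values, mean 170 cm), which lets us compare the inference with the truth; in a real study the population mean would be unknown.

```python
import random
import statistics

# Hypothetical population of 10,000 heights in cm (simulated, for illustration only).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A simple random sample of 100 individuals.
sample = rng.sample(population, 100)

# Descriptive statistics: summarize the data we actually observed.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"Sample of 100: mean {sample_mean:.1f} cm, SD {sample_sd:.1f} cm")

# Inferential statistics: generalize to the population by using the sample mean
# as an estimate of the population mean. We can print the actual population mean
# here only because we simulated the population ourselves.
population_mean = statistics.mean(population)
print(f"Estimated population mean: {sample_mean:.1f} cm (actual: {population_mean:.1f} cm)")
```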
A VARIABLE is any characteristic whose value may
change from one individual to another.
Ex:
DATA results from making observations on one or more
variables. It is important to remember that a collection of
numbers or labels is not data unless it comes with a context
(what was observed, on whom, and how it was measured).
A DISTRIBUTION shows the values a variable can take
and how often it takes those values.
Ex:
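A frequency table is one simple way to show a distribution: each value the variable takes, with how often it takes it. A minimal sketch with hypothetical data (number of siblings reported by 12 students):

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
siblings = [0, 1, 1, 2, 1, 3, 0, 2, 1, 1, 2, 4]

# The distribution: each value and how often it occurs.
distribution = Counter(siblings)
for value in sorted(distribution):
    count = distribution[value]
    print(f"{value} siblings: {count} students ({count / len(siblings):.0%})")
```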
A UNIVARIATE data set consists of observations on a
single variable.
Ex:
A BIVARIATE data set consists of observations of two
variables for each member of the sample.
Ex:
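In code, the difference is in the shape of each observation: a single value per individual versus a pair of values per individual. A minimal sketch with hypothetical measurements:

```python
# Hypothetical data, for illustration only.

# Univariate: one variable (height in cm) recorded for each student.
heights = [162, 175, 158, 181, 169]

# Bivariate: two variables (height in cm, shoe size) recorded for each student.
# Each tuple keeps both observations for the same individual together.
height_and_shoe = [(162, 7), (175, 10), (158, 6.5), (181, 11), (169, 9)]

# Splitting the pairs gives back two univariate data sets, but keeping them
# paired is what lets us study how the two variables relate.
heights_only, shoe_sizes = zip(*height_and_shoe)
print(list(heights_only))  # [162, 175, 158, 181, 169]
print(list(shoe_sizes))    # [7, 10, 6.5, 11, 9]
```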
A variable is CATEGORICAL (or qualitative) if the
possible responses fall into categories.
Ex:
A variable is NUMERICAL (or quantitative) if the
possible responses are numerical in nature.
Ex:
Quantitative variables usually include units, which tell
how the variable was measured. For example, if you are
told that an animal weighs 12, you wouldn't know very
much until you were told the unit (e.g., tons or
milligrams).
Observations of categorical data are usually recorded with
words (e.g., Honda, brown), but can also be recorded with
numbers. Area codes are an example: living in the 626
area code isn't necessarily better than living in the 310
area code, even though 626 is the larger number. In cases
like these, the numbers are just labels for different
categories.
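The practical consequence is that arithmetic on such labels is meaningless, while counting categories still makes sense. A minimal sketch with hypothetical survey responses:

```python
from collections import Counter

# Hypothetical responses. Area codes are stored as strings because they are
# labels for categories, not quantities.
area_codes = ["626", "310", "626", "818", "310", "626"]

# Meaningful for a categorical variable: how often each category occurs.
counts = Counter(area_codes)
print(counts.most_common())  # [('626', 3), ('310', 2), ('818', 1)]

# Python will happily average the codes as numbers, but the result
# (about 552.67) is not even a whole number, let alone an area code.
mean_code = sum(int(code) for code in area_codes) / len(area_codes)
print(mean_code)
```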
Many variables can be treated as either categorical or
quantitative. For example, scores on the STAR test are
recorded numerically, but are also placed into categories
such as “proficient” and “basic”.
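Turning a numerical score into a category amounts to comparing it against cut scores. In this minimal sketch the cut scores (300 and 350) and the "below basic" label are assumptions for illustration; the slides do not give the actual STAR cut points.

```python
# Hypothetical cut scores, assumed for illustration only; the slides do not
# give the actual STAR cut points. Checked from highest to lowest.
CUT_SCORES = [(350, "proficient"), (300, "basic"), (0, "below basic")]

def performance_level(score):
    """Map a numerical score to a categorical performance level."""
    for minimum, label in CUT_SCORES:
        if score >= minimum:
            return label
    raise ValueError("score must be non-negative")

scores = [412, 318, 276, 350]
levels = [performance_level(s) for s in scores]
print(levels)  # ['proficient', 'basic', 'below basic', 'proficient']
```

The numerical score supports arithmetic (averages, differences), while the category is easier to report; both describe the same measurement.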
Numerical data is DISCRETE if the possible values are
isolated points on the number line.
Ex:
Numerical data is CONTINUOUS if the possible values
form an entire interval on the number line.
Ex:
In general, you MEASURE continuous variables and
COUNT discrete variables.
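The count-versus-measure rule shows up in the kinds of values each variable can take. A minimal sketch with hypothetical variables (students in a classroom, time to run a mile), simulated only to show the possible values:

```python
import random

rng = random.Random(3)

# Discrete (you COUNT it): the number of students in a classroom can only be
# a whole number, so its possible values are isolated points 0, 1, 2, ...
students = rng.randint(15, 35)

# Continuous (you MEASURE it): the time to run a mile can, in principle, be any
# value in an interval; here a hypothetical time between 6 and 12 minutes.
# (The computer stores it with finite precision, but the variable is continuous.)
mile_time_min = rng.uniform(6.0, 12.0)

print(students, round(mile_time_min, 2))
```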
For each of the following variables, determine whether it is
categorical or numerical. If it is numerical, determine
whether it is continuous or discrete:
- length of a pen
- type of pen
- number of pens in a box
- color of pants
- number of pockets
- length of inseam
- subject of book
- number of pages
- area of a page