Chapter one: The Nature of Probability and Statistics
What is statistics?
Statistics is the science of conducting studies to collect, organize, summarize,
analyze and draw conclusions from data.
Always ask the 3 w’s: Who, what and why?
1-1: Descriptive and inferential statistics
Variable: a characteristic or attribute that can assume different values.
Data: the values (measurements or observations) that the variables can
assume.
Random variables: variables whose values are determined by chance.
Data set: a collection of data values.
Data value (datum): each value in the data set.
Two types of statistics
Descriptive statistics: consists of the collection, organization, summarization
and presentation of data.
Inferential statistics: consists of generalizing from samples to populations,
performing estimations and hypotheses tests, determining relationships
among variables, and making predictions.
(Inferential statistics uses probability, the chance of something occurring.)
Sample vs Population
Population: all subjects (human or otherwise) that are being studied.
Sample: a group of subjects selected from a population.
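The sketch below ties the definitions above together in Python. It assumes a made-up population of 1,000 heights; the numbers, the sample size of 50, and the seed are illustrative assumptions, not course data. It is a minimal illustration of describing a sample versus estimating a population value, not a full inferential procedure.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 subjects, made up for illustration.
population = [rng.gauss(170, 10) for _ in range(1000)]

# Sample: a group of subjects selected from the population.
sample = rng.sample(population, k=50)

# Descriptive statistics: summarize the data actually collected.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to say something about the population.
# Here the sample mean serves as a point estimate of the population mean.
population_mean = statistics.mean(population)  # normally unknown in practice

print(f"sample mean = {sample_mean:.1f}, sample sd = {sample_sd:.1f}")
print(f"population mean = {population_mean:.1f}")
```

In a real study the population mean is not available, which is why the estimate from the sample matters.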
Hypothesis testing
An area of inferential statistics: a decision-making process for evaluating
claims about a population, based on information obtained from samples.
1-2 Variables and types of Data
Two types of data: Qualitative vs quantitative
Qualitative variables: can be placed into distinct categories, according to
some characteristic or attribute.
Quantitative variables: numerical, and can be ordered or ranked.
Two types of quantitative variables:
Discrete variables: assume values that can be counted.
Continuous variables: can assume an infinite number of values between any two specific values.
They are obtained by measuring. They often include fractions and decimals.
Boundaries: see handout
Levels of measurement
Nominal
Ordinal
Interval
Ratio
Nominal (sounds like names)
Categorical/qualitative
Consists of a set of categories that has different labels
Often dichotomous (e.g. biological sex or yes-or-no questions)
Another example: country of residency
ADVANCED: What kind of statistics use nominal scales? The Pearson chi-square
test (both the independent and dependent variables are measured on a nominal
scale, e.g. handedness and dyslexia)
Ordinal: (sounds like order)
Categorical/qualitative
A set of categories organized in an ordered sequence. Ranks; Likert scale
(An order exists, but the quantitative differences between ranks are unknown,
e.g. name your five closest friends)
ADVANCED: Spearman correlation (Likert items)
Interval (numerical scale with a meaningful order)
Equal differences between numbers on the scale reflect equal differences in
magnitude.
Limitations: no true zero, e.g. Celsius (zero is not the absence of
temperature), calendar (zero is not the absence of time), SAT (lowest score
200), IQ (lowest is 40)
ADVANCED: Pearson correlation (IQ and SAT scores), (temperature and SAT)
Ratio (interval + natural zero point)
Can express differences between two values as a ratio (can multiply or divide
values) {with interval you can add and subtract but cannot multiply or
divide}, e.g. height or weight, # of times out of the country, # of items
recalled on a memory test, reaction time.
ADVANCED: Pearson correlation
Note: many statisticians do not differentiate between ratio and interval.
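A short worked example of why ratios fail on an interval scale but hold on a ratio scale. The two temperatures are arbitrary illustrative values, and the Kelvin conversion is brought in here only as a familiar temperature scale with a natural zero; it is not part of the course notes.

```python
# Interval scale: Celsius has no natural zero, so ratios are not meaningful.
c_low, c_high = 10.0, 20.0
celsius_ratio = c_high / c_low   # 2.0, yet 20 C is not "twice as hot" as 10 C

# Ratio scale: Kelvin has a natural zero, so the ratio is meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low    # about 1.035

# Differences are meaningful on both scales, and they agree.
celsius_diff = c_high - c_low    # 10.0
kelvin_diff = k_high - k_low     # 10.0, up to floating-point rounding

print(celsius_ratio, round(kelvin_ratio, 3), celsius_diff, round(kelvin_diff, 6))
```

Adding and subtracting gives the same answer on either scale, while multiplying and dividing only makes sense once zero means "none of the quantity".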
1-3 Data Collection and Sampling techniques
Name and define the four basic sampling methods:
Random: subjects are selected by using chance (think of the rectangles)
Systematic: selecting every kth subject (students entering the classroom)
Stratified: dividing the population into groups (called strata) and sampling
from each group; think of Freshmen, Sophomore, Junior, Senior.
Cluster: dividing the population into intact groups (clusters), often by
geographic area, then randomly selecting whole clusters; think of schools in
a large school district.
Another popular technique is the convenience sample.
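The four basic methods can be sketched in a few lines of Python. The population of 40 students, the group sizes, and the seed are all assumptions made for illustration; the course may present the procedures with different details.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 students, 10 per class year.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
students = [(i, years[i % 4]) for i in range(40)]

# Random: every subject has the same chance of being selected.
random_sample = rng.sample(students, k=8)

# Systematic: pick a random starting point, then take every k-th subject.
k = 5
start = rng.randrange(k)
systematic_sample = students[start::k]

# Stratified: split into strata (class years), then sample within each stratum.
stratified_sample = []
for year in years:
    stratum = [s for s in students if s[1] == year]
    stratified_sample.extend(rng.sample(stratum, k=2))

# Cluster: split into intact groups (say, 4 schools of 10 students),
# randomly pick whole groups, and use every subject in the chosen groups.
clusters = [students[i:i + 10] for i in range(0, 40, 10)]
chosen_clusters = rng.sample(clusters, k=2)
cluster_sample = [s for cluster in chosen_clusters for s in cluster]

print(len(random_sample), len(systematic_sample),
      len(stratified_sample), len(cluster_sample))
```

Note the difference between the last two: stratified sampling takes a few subjects from every group, while cluster sampling takes every subject from a few groups.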
1-4 Observational and experimental studies
An observational study: the researcher merely observes what is happening or
what has happened in the past and tries to draw conclusions based on these
observations.
An experimental study: the researcher manipulates one of the variables and
tries to determine how the manipulation influences other variables.
[Quasi-experimental study: uses intact groups]
WATCH Brown eyes BLUE eyes (https://youtu.be/KpRQ0-ZGNZk)
Variable review
Independent variable, also called the explanatory variable, is the one being
manipulated.
Dependent variable, also called the outcome variable, is the resultant
variable. (The dependent variable is the one that is studied to see if it has
changed significantly due to the manipulation of the independent variable.)
Treatment group and the control group
A confounding variable is one that influences the dependent (outcome) variable
but was not separated from the independent variable.
1-5 Uses and misuses of statistics
Recall the 3 w’s
Suspect samples
Ambiguous averages
Changing the subject
Detached statistics
Implied connections
Misleading graphs
Faulty survey questions
Suspect Samples
Too small of a sample
Bad selection of sample
(convenience sampling)
Ambiguous Averages
Measures of central tendency are the mean, median, mode and midrange. When
someone says "average", which one are they talking about?
Real estate example
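To see how far the four "averages" can drift apart, here is a small Python sketch using made-up home sale prices. The figures are hypothetical and chosen only so that one expensive home skews the data.

```python
import statistics

# Hypothetical home sale prices in thousands of dollars (made up for illustration).
prices = [150, 150, 160, 175, 190, 210, 900]

mean = statistics.mean(prices)              # pulled upward by the one expensive home
median = statistics.median(prices)          # middle value of the sorted list
mode = statistics.mode(prices)              # most frequent value
midrange = (min(prices) + max(prices)) / 2  # halfway between lowest and highest

print(f"mean={mean:.1f} median={median} mode={mode} midrange={midrange}")
```

All four numbers could honestly be reported as "the average price", which is why the claim is ambiguous until the measure is named.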
Changing the subject
Different values are used to represent the same data.
Using percentages vs actual numbers for wow factor
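A brief Python illustration of how the same change looks dramatic or trivial depending on the number chosen to report it. The counts and the town size are hypothetical.

```python
# Hypothetical data: cases rise from 2 to 3 in a town of 10,000 people.
before, after, town_size = 2, 3, 10_000

absolute_change = after - before                    # 1 additional case
percent_change = (after - before) / before * 100    # 50.0 percent increase
rate_change = (after - before) / town_size * 100    # about 0.01 percentage points

print(absolute_change, percent_change, round(rate_change, 4))
```

"Cases up 50%" and "one more case" describe the same data; the percentage is the version with the wow factor.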
Detached Statistics
Advil works 3 times faster.
1/3 fewer calories
Low fat
Implied Connections
“Eating fish may help to reduce your cholesterol”
“Studies suggest that using our exercise machine will reduce your weight”
“Taking calcium will lower blood pressure in some people”
Misleading graphs
Faulty survey questions
Do you feel that the school should build a new football stadium?
Vs Do you favor increasing school taxes for a new athletic field?