Statistics for CS 312
Descriptive vs. inferential statistics
• Descriptive – used to describe an existing
population
• Inferential – used to draw conclusions about a
population from a sample
Graphical descriptions
• Histograms
• Frequency polygons/curves
• Pie charts
Measures of central tendency
• Mean – average – used most often
• Median – midpoint value – used when
data is skewed
• Mode – most frequently occurring value –
used when interested in what most people
think
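A minimal sketch of the three measures, using Python's standard `statistics` module. The `downloads` list is invented illustration data, chosen so that one large value skews the distribution and pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical data: weekly download counts for seven students.
# The single heavy user (40) skews the data to the right.
downloads = [2, 3, 3, 4, 5, 6, 40]

avg = mean(downloads)     # 9: pulled upward by the outlier
mid = median(downloads)   # 4: middle value of the sorted data
most = mode(downloads)    # 3: the most frequently occurring value

print(avg, mid, most)
```

With skewed data like this, the median (4) describes a typical student better than the mean (9).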
Measures of variability
• Range – highest value minus lowest value
• Standard deviation – typical distance of the
individual values from the mean (square root of
the average squared deviation)
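The two measures can be computed by hand as a sketch. The `values` list is made-up data; dividing by n gives the population standard deviation and dividing by n - 1 gives the usual sample estimate.

```python
from math import sqrt

# Hypothetical data set of eight values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Standard deviation: square root of the average squared deviation.
m = sum(values) / len(values)
squared_devs = [(x - m) ** 2 for x in values]
population_sd = sqrt(sum(squared_devs) / len(values))    # divide by n
sample_sd = sqrt(sum(squared_devs) / (len(values) - 1))  # divide by n - 1

print(value_range, population_sd, round(sample_sd, 3))
```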
Normal curve
• Bell shaped curve – 68% of values lie within one
standard deviation of the mean
• Non-normal – skewed either negatively (tail to
left) or positively (tail to right)
• Percentiles – the value below which a given
percentage of the values fall (the 90th percentile
exceeds 90% of the values)
• Standard scores – distance from mean in terms
of the standard deviation – z = (X-m) / s.
• Z scores – transformed standard scores – Z =
10z + 50
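The two score formulas and the 68% figure can be checked with a short sketch. The function names and the example score (85 on a test with mean 70 and standard deviation 10) are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist

def standard_score(x, m, s):
    """z = (X - m) / s: distance from the mean in standard deviations."""
    return (x - m) / s

def transformed_score(z):
    """Z = 10z + 50: rescales z to a mean of 50 and a deviation of 10."""
    return 10 * z + 50

# Hypothetical example: a score of 85 where the mean is 70 and s is 10.
z = standard_score(85, 70, 10)    # 1.5
big_z = transformed_score(z)      # 65.0

# Share of a normal population within one standard deviation of the mean.
std_normal = NormalDist()         # mean 0, standard deviation 1
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(z, big_z, round(within_one_sd, 4))
```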
Variables
• Quantitative – things that can be
measured (age, income, number of
credits)
• Qualitative – categories rather than measurements,
often without an inherent order (college major, address)
Populations and samples
• Population – entire universe from which a
sample is drawn
• Sample – subset of population
• Symbols (sample, population) – mean m, µ;
standard deviation s, σ; variance s², σ²
How representative is the sample
• Random sample – use random numbers to
choose members of the sample
• Stratified sample – sample that represents
subgroups proportionally
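One way to sketch the two sampling schemes in Python. The helper names, the `students` population, and the 10% sampling fraction are all assumptions made for illustration.

```python
import random

def random_sample(population, k, seed=None):
    """Simple random sample: every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, key, fraction, seed=None):
    """Sample the same fraction from each subgroup (stratum)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for group in strata.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical population: 60 CS majors and 40 Math majors.
students = [("CS", i) for i in range(60)] + [("Math", i) for i in range(40)]

simple = random_sample(students, 10, seed=1)
stratified = stratified_sample(students, key=lambda s: s[0], fraction=0.1, seed=1)

cs_count = sum(1 for major, _ in stratified if major == "CS")
math_count = sum(1 for major, _ in stratified if major == "Math")
print(len(simple), cs_count, math_count)
```

The stratified sample always contains 6 CS and 4 Math majors, matching the 60/40 split, whereas the simple random sample only matches it on average.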
Hypothesis testing
• Hypothesis as to relationship of variables
– similar or different
• Inference from a sample to the entire
population
Statistical significance
• Accept true hypotheses and reject false ones
• Based on probability (10 heads in a row occurs
about once in 1024 runs of 10 coin tosses)
• Significant result means a significant departure
from what might be expected from chance alone
• Example – a result two or more standard deviations
above the mean occurs about 2.3% of the time in a
normally distributed population
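The two probabilities quoted above can be checked directly. Note that 2.3% is the area of one tail beyond two standard deviations; counting both tails gives roughly 4.6%. The variable names are illustrative.

```python
from statistics import NormalDist

# Probability of 10 heads in 10 tosses of a fair coin: (1/2)^10 = 1/1024.
p_ten_heads = 0.5 ** 10

# Area of a normal curve more than two standard deviations above the mean.
upper_tail = 1 - NormalDist().cdf(2)

# Area more than two standard deviations from the mean in either direction.
both_tails = 2 * upper_tail

print(p_ten_heads, round(upper_tail, 4), round(both_tails, 4))
```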
Null hypothesis
• Assumption that there is no difference
between two variables
• Example – Male and female college
students do similar amounts of music
downloading using BitTorrent.
• Example – School use of computers is
unrelated to income of the students’
families
Levels of significance
• 5 percent level – Event could occur by
chance only 5 times in 100
• 1 percent level – Event could occur by
chance only 1 time in 100
• Significance level should be chosen before
doing experiment
Types of errors
• Type I error – Rejection of a true null
hypothesis
• Type II error – Acceptance of a false null
hypothesis
• For a fixed sample size, decreasing one type
increases the other
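The trade-off between the two error types can be sketched for a one-tailed z test. This assumes a normally distributed test statistic and a hypothetical true effect of two standard errors; `type_ii_rate` is an illustrative helper, not a library function.

```python
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_rate(alpha, effect):
    """Chance of missing a true effect in a one-tailed z test.

    alpha  - Type I error rate (the significance level)
    effect - assumed true effect, measured in standard errors
    """
    critical = std_normal.inv_cdf(1 - alpha)   # rejection threshold
    return std_normal.cdf(critical - effect)   # P(statistic below threshold)

beta_at_5 = type_ii_rate(0.05, effect=2.0)
beta_at_1 = type_ii_rate(0.01, effect=2.0)

print(round(beta_at_5, 3), round(beta_at_1, 3))
```

Tightening the significance level from 5% to 1% lowers the Type I rate but raises the Type II rate from about 0.36 to about 0.63 under these assumptions.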
One and two tailed tests
• One tailed test – Experimental values will
only fail the null hypothesis in one
direction
• Two tailed test – Values could occur on
either the positive or negative tail of the
curve
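A sketch of how the choice of tails changes the outcome, assuming a test statistic that is already expressed as a z score. The helper names and the observed value of 1.8 are hypothetical; the one-tailed version here tests only for values above the mean.

```python
from statistics import NormalDist

def p_value(z, tails=2):
    """p-value for a z statistic under a standard normal null distribution."""
    if tails == 1:
        # One-tailed (upper): only large positive values count as evidence.
        return 1 - NormalDist().cdf(z)
    # Two-tailed: extreme values in either direction count.
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_null(z, alpha=0.05, tails=2):
    """True when the result is significant at level alpha."""
    return p_value(z, tails) < alpha

z_observed = 1.8  # hypothetical observed statistic
one_tailed = reject_null(z_observed, tails=1)
two_tailed = reject_null(z_observed, tails=2)

print(one_tailed, two_tailed)
```

The same observation is significant at the 5 percent level in the one-tailed test (p about 0.036) but not in the two-tailed test (p about 0.072).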
Estimation
• Concerns the magnitude of relationships
between variables
• Hypothesis testing asks “is there a
relationship”
• Estimation asks “how large is the
relationship”
• Confidence interval – a range of values that, at a
chosen confidence level, is expected to contain the
population mean
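A minimal sketch of a confidence interval for a mean, using the normal approximation. For small samples a t distribution would usually be more appropriate, so treat this as illustrative only; the function name and the data are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Approximate interval for the population mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)        # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error

# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = confidence_interval(sample)
low99, high99 = confidence_interval(sample, confidence=0.99)

print(round(low, 2), round(high, 2))
```

Asking for more confidence (99% instead of 95%) widens the interval.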
Sequence of activities
• Description
• Tests of hypotheses
• Estimation
• Evaluation
Correlation
• Quantifiable relationship between two
variables
• Example – relationship between age and
type of computer games played
• Example – relationship between family
income and speed of home computer
connection.
Correlation chart
• Two (or more) dimensional table
• Variables on the axes, could be intervals
• Scattergram – positively correlated values
scatter with positive slope, negatively correlated
values with negative slope
Product-moment coefficient
• Formula based on deviations from means
• If paired deviations tend to have the same sign
(both above or both below their means), values
are positively correlated
• If paired deviations tend to have opposite signs,
values are negatively correlated
• Most correlations are somewhere in
between +1 and -1
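The slides do not give the formula itself, so the following is a sketch of the standard product-moment (Pearson) coefficient: the sum of products of paired deviations, divided by the square root of the product of the summed squared deviations. The function name and the data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation based on deviations from the means."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Same-sign deviations add to the numerator; opposite signs subtract.
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return numerator / denominator

r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly aligned
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly opposed
r_mixed = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])      # somewhere in between

print(r_positive, r_negative, r_mixed)
```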
Perfect positive correlation: r = +1
[Figure: items A, B, C, D placed on the X and Y scales]
Perfect negative correlation: r = -1
[Figure: items A, B, C, D placed on the X and Y scales]