# Statistics for CS 312

```
Statistics for CS 312
Descriptive vs. inferential statistics
• Descriptive – used to describe an existing
population
• Inferential – used to draw conclusions about
a population from a sample of it
Graphical descriptions
• Histograms
• Frequency polygons/curves
• Pie charts
Measures of central tendency
• Mean – average – used most often
• Median – midpoint value – used when
data is skewed
• Mode – most frequently occurring value –
used when interested in what most people
think
Measures of variability
• Range – highest value minus lowest value
• Standard deviation – typical distance of
the individual values from the mean (square
root of the average squared deviation)
Normal curve
• Bell shaped curve – 68% of values lie within one
standard deviation of the mean
• Non-normal – skewed either negatively (tail to
left) or positively (tail to right)
• Percentiles – the value below which a given
percentage of the values fall
• Standard scores – distance from mean in terms
of the standard deviation – z = (X-m) / s.
• Z scores – transformed standard scores (often
called T scores) – Z = 10z + 50
Variables
• Quantitative – things that can be
measured (age, income, number of
credits)
• Qualitative – things without an inherent
order (college major, address)
Populations and samples
• Population – entire universe from which a
sample is drawn
• Sample – subset of population
• Symbols (sample, population) – mean m, µ;
standard deviation s, σ; variance s², σ²
How representative is the sample
• Random sample – use random numbers to
choose members of the sample
• Stratified sample – sample that represents
subgroups proportionally
Hypothesis testing
• Hypothesis as to relationship of variables
– similar or different
• Inference from a sample to the entire
population
Statistical significance
• Accept true hypotheses and reject false ones
• Based on probability (10 heads in 10 fair coin
tosses has probability 1 in 1024)
• Significant result means a significant departure
from what might be expected from chance alone
• Example – a result at least two standard deviations
above the mean occurs about 2.3% of the time in a
normally distributed population (about 4.6% if both
tails are counted)
Null hypothesis
• Assumption that there is no difference
between groups, or no relationship between
two variables
• Example – Male and female college
students do similar amounts of music
downloading using BitTorrent.
• Example – School use of computers is
unrelated to income of the students’
families
Levels of significance
• 5 percent level – Event could occur by
chance only 5 times in 100
• 1 percent level – Event could occur by
chance only 1 time in 100
• Significance level should be chosen before
doing experiment
Types of errors
• Type I error – Rejection of a true null
hypothesis
• Type II error – Acceptance of a false null
hypothesis
• For a fixed sample size, decreasing one type
increases the other
One and two tailed tests
• One tailed test – Only values in one
direction (one tail of the curve) can reject
the null hypothesis
• Two tailed test – Values could occur on
either the positive or negative tail of the
curve
Estimation
• Concerns the magnitude of relationships
between variables
• Hypothesis testing asks “is there a
relationship”
• Estimation asks “how large is the
relationship”
• Confidence interval – a range of values
expected to contain the population mean
with a stated level of confidence
Sequence of activities
• Description
• Tests of hypotheses
• Estimation
• Evaluation
Correlation
• Quantifiable relationship between two
variables
• Example – relationship between age and
type of computer games played
• Example – relationship between family
income and speed of home computer
connection.
Correlation chart
• Two (or more) dimensional table
• Variables on the axes, could be intervals
• Scattergram – positively correlated values
scatter with positive slope, negatively
correlated values with negative slope
Product-moment coefficient
• Formula based on deviations from means
• If paired deviations tend to have the same
sign, values are positively correlated
• If paired deviations tend to have opposite
signs, values are negatively correlated
• Most correlations are somewhere in
between +1 and -1
Perfect positive correlation: r = +1
[Figure: points A, B, C, D plotted on axes X and Y]
Perfect negative correlation: r = -1
[Figure: points A, B, C, D plotted on axes X and Y]
```
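
The formulas in the transcript above (mean, median, mode, range, standard deviation, z = (X-m) / s, Z = 10z + 50, and the product-moment coefficient) can be tried out with a short Python sketch. This is only an illustration under stated assumptions, not part of the original slides: the data values and every function name (`describe`, `standard_scores`, `transformed_scores`, `pearson_r`, `upper_tail`) are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed because the slides do not say which divisor they intend.

```python
import math
import statistics

# Hypothetical data, invented for illustration only (not from the slides).
ages = [18, 19, 19, 20, 21, 22, 22, 22, 25, 32]
hours_online = [10, 12, 11, 14, 15, 15, 18, 17, 20, 28]


def describe(values):
    """Measures of central tendency and variability for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        # Sample standard deviation s (divides by n - 1): an assumption,
        # since the slides do not state which divisor is meant.
        "stdev": statistics.stdev(values),
    }


def standard_scores(values):
    """Standard scores z = (X - m) / s for every value."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]


def transformed_scores(values):
    """Transformed scores Z = 10z + 50, as defined on the slides."""
    return [10 * z + 50 for z in standard_scores(values)]


def pearson_r(xs, ys):
    """Product-moment coefficient computed from deviations from the means."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def upper_tail(z):
    """Probability that a standard normal value is at least z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    print(describe(ages))
    print([round(t, 1) for t in transformed_scores(ages)])
    print(round(pearson_r(ages, hours_online), 3))
    print(round(upper_tail(2), 4), 0.5 ** 10)
```

Running the script prints the descriptive measures for the hypothetical `ages` list, its transformed scores, the correlation between the two lists, and the two probabilities used in the statistical significance examples (the tail area beyond two standard deviations and the chance of 10 heads in 10 tosses).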