Lesson 7.2 cont.

Different Distributions
• Consider the range of the data (from the minimum point to the maximum point).
• If there is no mode (no value occurs more often than the others), then the distribution is relatively uniform.
• If the mean, median, and mode are about equal, then the distribution is roughly “normal”.
• If the mean and median are not roughly equal, then the distribution is “skewed”.
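The rules of thumb above can be sketched in Python with the standard `statistics` module (Python 3.8+ for `multimode`). The function name `describe_shape` and the tolerance `tol` for "about equal" are assumptions made for this illustration; the slides give no numeric cutoff.

```python
from statistics import mean, median, multimode

def describe_shape(data, tol=0.5):
    """Classify a data set with the slide's rules of thumb.

    tol is a hypothetical cutoff for "about equal"; the slides give none.
    Returns (range, mean, median, modes, shape).
    """
    data_range = (min(data), max(data))   # minimum point to maximum point
    m, med = mean(data), median(data)
    modes = multimode(data)               # all of the most frequent values

    if len(modes) == len(set(data)):
        # Every distinct value occurs equally often, so there is no mode
        shape = "relatively uniform"
    elif abs(m - med) > tol:
        # Mean and median are not roughly equal
        shape = "skewed"
    elif len(modes) == 1 and abs(modes[0] - m) <= tol and abs(modes[0] - med) <= tol:
        # Mean, median, and the single mode are about equal
        shape = "roughly normal"
    else:
        shape = "unclear from these rules"
    return data_range, m, med, modes, shape

for data in ([1, 2, 3, 4, 5], [2, 3, 3, 3, 4], [1, 1, 1, 2, 10]):
    print(data, describe_shape(data)[-1])
```

These are heuristics, not tests: a different `tol` can change the label for the same data.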
Advantages/Disadvantages of Displays
• Categorical data: pictograph, bar graph, circle graph
• Numerical data (one variable): line plot, histogram, stem-and-leaf plot, box-and-whisker plot
• Two categorical variables: double bar graph
• Two data sets on the same numerical variable: side-by-side stem-and-leaf plot, side-by-side box-and-whisker plots
• Two different numerical variables: line graph, scatterplot
Scatterplot
• Used to see whether two variables are related. If they are closely related, we say that the correlation is high: the points cluster closely around a straight line.
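One standard way to put a number on how closely related two variables are is Pearson's correlation coefficient r. The slides do not name a specific measure, so this choice is an assumption. r is near +1 or -1 when the points lie close to a straight line and near 0 when they do not. A minimal Python sketch, with `pearson_r` as a hypothetical helper name:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data xs, ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of paired deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (the same pieces used for standard deviation)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0: points on a rising line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0: points on a falling line
```

A strong negative relationship (r near -1) also looks like a straight line on a scatterplot. r is undefined (division by zero) if either variable never changes.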
Line Graph
• Used to see if there is a trend over
time.
Good place to find graphs
• http://images.google.com/images?um=1&hl=en&client=safari&rls=en&q=usa+today+graphs&btnG=Search+Images
Normal Distributions
• Advantage: lets us see the mean and the spread of the data at a glance.
• Mean is the center of the distribution.
• Standard deviation: describes the spread, that is, how close the data are to the mean.
Quick illustration of standard deviation
Here are two data sets:
3, 4, 10, 11
3, 4, 5, 8
Looking at them, we see that the values in the second data set are much closer together than those in the first, and that they are also closer to the mean. We use the term standard deviation to describe how close the data are to the mean.
3, 4, 10, 11: Mean = 7
3, 4, 5, 8: Mean = 5
• We are essentially finding a typical distance from the mean for each data set (strictly, the square root of the average squared distance). Find the distance from the mean for each data point, then square this distance.
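In symbols, for data points $x_1, \dots, x_n$ with mean $\mu$ (notation introduced here, not used on the slides), the procedure computes the population standard deviation $\sigma$:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$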
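Square each distance and add the squares:
$$(3-7)^2 + (4-7)^2 + (10-7)^2 + (11-7)^2 = 4^2 + 3^2 + 3^2 + 4^2 = 50$$
$$(3-5)^2 + (4-5)^2 + (5-5)^2 + (8-5)^2 = 2^2 + 1^2 + 0^2 + 3^2 = 14$$
Divide by the number of data points:
$$\frac{50}{4} = 12.5 \qquad \frac{14}{4} = 3.5$$
Now take the square root to get the standard deviation:
$$\sqrt{12.5} \approx 3.5 \qquad \sqrt{3.5} \approx 1.9$$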
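The same steps can be checked with a short Python sketch. `std_dev` is a hypothetical helper written for this illustration; it divides by the number of data points as the slides do (the population standard deviation), so it should agree with the standard library's `statistics.pstdev`.

```python
from math import isclose, sqrt
from statistics import pstdev

def std_dev(data):
    """Population standard deviation, following the slide's steps."""
    n = len(data)
    mean = sum(data) / n
    # Distance from the mean for each data point, squared
    squared_distances = [(x - mean) ** 2 for x in data]
    # Add the squares and divide by the number of data points
    average_square = sum(squared_distances) / n
    # Take the square root
    return sqrt(average_square)

for data in ([3, 4, 10, 11], [3, 4, 5, 8]):
    sd = std_dev(data)
    assert isclose(sd, pstdev(data))      # agrees with the standard library
    print(data, round(sd, 1))             # [3, 4, 10, 11] 3.5, then [3, 4, 5, 8] 1.9
```

Dividing by n - 1 instead of n gives the sample standard deviation (`statistics.stdev`), which is slightly larger; the slides divide by n.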
Example
• Graph a normal distribution with a mean of 5 and a standard deviation of 1.
• Compare it to one with a mean of 6 and a standard deviation of 1.
• Compare it to one with a mean of 3 and a standard deviation of 1.
• Compare it to one with a mean of 5 and a standard deviation of 2.
• Compare it to one with a mean of 5 and a standard deviation of 0.5.
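To preview what these comparisons should show, here is a Python sketch that evaluates the normal curve $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ at the whole numbers 0 through 10 (a range assumed here to match the graph's axis) and reports where each curve peaks and how tall it is. `normal_pdf` is a hypothetical helper name.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation at x."""
    z = (x - mean) / sd                    # distance from the mean in standard deviations
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

xs = range(0, 11)                          # whole numbers 0 through 10 (assumed)
for mean, sd in [(5, 1), (6, 1), (3, 1), (5, 2), (5, 0.5)]:
    heights = [normal_pdf(x, mean, sd) for x in xs]
    peak = max(heights)
    peak_x = xs[heights.index(peak)]
    print(f"mean={mean}, sd={sd}: peak at x={peak_x}, height {peak:.3f}")
```

Changing the mean slides the curve left or right without changing its shape. Doubling the standard deviation halves the peak height and spreads the curve out; halving it does the reverse.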
[Figure: Normal Distribution, horizontal axis from 0 to 10]
Issues to consider
• Bias: distortion caused by the way the data were collected, or by the way the data are displayed in order to mislead.
• Validity: the data actually answer the question being asked.
• Reliability: repeating the measurement gives the same answers each time.
Examples of bias
• Asking only freshmen about dorms to learn which dorms college students in general prefer.
• Asking only working mothers about childcare issues.
• Asking only men about marriage issues.
• Using categories or intervals in a display that hide important information.
Examples of poor validity
• Asking questions on the first exam that focus on content from 302A to determine whether students learned the material from Chapter 7.
• Asking about political party affiliation to determine how likely someone is to vote.
Examples of poor reliability
• Are you pleased with your life right now?
• "Select the best way to mislead with a display" vs. "What is the most important concept in not misleading the reader with a display?"
• On a scale of 1 to 10, how ready are you for a test next week?