Download Day 1 - Web4students

Statistics Workshop – Day 1
Reviewing Some Concepts
Part 1 – Snapshot of the Annenberg Series: “Against all Odds”
This is a great FREE resource! Later on remember to register.
http://mcmath.blogdrive.com/
Scroll down to the section “Free Online Annenberg Video Series on Teaching/Learning Math”
Click on the Annenberg Project Video Series – Statistics
Click on the VoD logo and register
Part 2 – Read and Comment
Literary Digest
For the 1936 presidential election, Literary Digest conducted a poll to determine the winner. Over 10
million questionnaires were sent to those who owned automobiles and/or telephones. Over 2.4 million
questionnaires were returned and Literary Digest predicted that Alf Landon would defeat Franklin D.
Roosevelt with 57% of the vote. George Gallup also conducted a poll of 50,000 random voters and
predicted Roosevelt as the winner. Many people laughed at Gallup because Literary Digest had been
correctly predicting the outcome of the presidential election since 1916 and based its predictions on such
a large sample. Gallup was correct and Roosevelt won with 62% of the vote. Where did Literary Digest go
wrong?
In the end, Literary Digest went bankrupt and Gallup started his own company, which still predicts the
elections.
Part 3 – Note: Tutorials are also available on my web page:
http://www.montgomerycollege.edu/faculty/~maronne/
Review some concepts by means of Tutorial (1), then comment on the
following:
- Distinction between:
- Population and sample
- Parameter and statistic
- Descriptive and inferential statistics
- Why sampling?
- Why random selection?
- Importance of simple random samples
- Biased, unbiased sampling techniques
Selecting Random Samples (Section 1.4)
Simulating Experiments
Describing Data Sets with Tables and Graphs (Sections 2.2, 2.3)
Part 4 - Select 5 students at random from your Statistics class.
4-a) Use the TI-83/84 calculator to generate 5 different random integers from 1 to 28.
The instruction in the home screen of your calculator should read:
randInt(1,28)
(Need help? See calculator section, item 1)
4-b) List the five numbers obtained:
4-c) Check with a classmate. Are his/her numbers the same as yours? Explain.
4-d) Check with the class roster shown on the transparency to name the students selected.
4-e) Comment on the importance of random selection
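For readers without a TI-83/84, the selection in Part 4 can be sketched in Python. This is only an illustration, not part of the workshop materials: the function name `pick_students` is made up, and the class size of 28 is taken from the range used in 4-a. `random.sample` draws without replacement, so the five roster numbers are guaranteed to be different, whereas repeated single draws may repeat a number that then has to be discarded.

```python
import random

def pick_students(class_size=28, how_many=5, seed=None):
    """Return `how_many` different roster numbers from 1..class_size."""
    rng = random.Random(seed)  # seed=None gives a different selection on each run
    return sorted(rng.sample(range(1, class_size + 1), how_many))

print(pick_students())
```

Running it twice without a seed will usually give two different lists, which is the point of question 4-c.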
Part 5 - Use random numbers to simulate rolling a “fair” six-sided die 60 times.
5-a) Use the calculator to simulate rolling a fair die once. Indicate the instruction entered in the
calculator. Press ENTER a few times, observe the outcomes and reflect on what you are doing.
5-b) There is a shortcut to simulate rolling the die 60 times. We are going to clear a list (L1), generate
60 integers from 1 to 6, and store the numbers into L1. We’ll access the editor to explore the list and
record the outcomes in a table.
The instruction in the home screen of your calculator should read:
ClrList L1:randInt(1,6,60)→ L1
Note: we use “:” (colon) to separate statements
(Need help? See calculator section, item 2)
......... ..........
5-c) Do you have any suggestions to make the counting process easier?
5-d) Think of a way of graphing the information contained in this table. Show the graph above.
Counting is tedious; on the next page you are given instructions to get help from the TI-83 to
determine the “counts”. First we need to review some vocabulary.
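As a rough counterpart to the calculator steps in Part 5, the simulation and the counting can be sketched in Python. The function names `roll_die` and `frequency_table` are hypothetical, and the row of asterisks is just one possible way of picturing the counts.

```python
import random
from collections import Counter

def roll_die(n_rolls=60, seed=None):
    """Simulate n_rolls of a fair six-sided die, similar to randInt(1,6,60)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

def frequency_table(rolls):
    """Count how many times each face 1..6 appears (0 if a face never shows)."""
    counts = Counter(rolls)
    return {face: counts.get(face, 0) for face in range(1, 7)}

rolls = roll_die()
for face, count in frequency_table(rolls).items():
    print(face, count, "*" * count)  # crude text version of a bar graph
```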
Constructing Frequency Tables and Histograms (Sections 2.2, 2.3)
Part 6 - Review some concepts by means of Tutorial (2), then comment
on the following:
- Why frequency distributions?
- Advantages and disadvantages
- If you have to choose a representative of a class, what number is a good choice?
- Why class boundaries?
Part 7 - Use the calculator to sketch the histogram of the data stored in L1. Trace the histogram to
read the frequencies of the classes. Display results below.
(Need help? See calculator section, item 3)
Frequency Distribution. Label
Histogram. Label
..................
Part 8 – Reflect on what we have done and comment on the following:
- Randomness
- Equally Likely Outcomes (Chapter 3)
- Unpredictability of a single outcome
- Long run regularity
- Law of large numbers (Chapter 3)
- Uniform distribution
- Theoretical distribution
- Sampling error
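The ideas of long run regularity and the law of large numbers can be explored with a short Python sketch. It is an illustration under assumptions: the function name `proportion_of_face` and the choice of sample sizes are ours, not the workshop's. A single roll is unpredictable, but the proportion of sixes should settle near 1/6 as the number of rolls grows.

```python
import random

def proportion_of_face(face, n_rolls, rng):
    """Relative frequency of `face` in n_rolls simulated fair-die rolls."""
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
    return hits / n_rolls

rng = random.Random()
for n in (60, 600, 6000, 60000):
    p = proportion_of_face(6, n, rng)
    print(f"n = {n:>6}: proportion of sixes = {p:.4f} (theoretical 1/6 = {1/6:.4f})")
```

The gap between each simulated proportion and 1/6 is an example of sampling error.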
Part 9 – Copy data into another list
To keep the generated data available for future use, we’d like to copy it into another list labeled RNDIE.
(Need help? See calculator section, item 4)
Law of Large Numbers (Chapter 3)
Collecting Class Results
Sketching Histograms using Grouped Data (Sections 2.2, 2.3)
Part 10: Collecting class’ results.
10-a) In part 7, you constructed the frequency distribution for the simulation of the experiment of rolling
a die 60 times. Look up your results and write them on the board. We’ll produce a new frequency
distribution with the results of the class, and graph the corresponding histogram by hand. What do you
expect the shape of this new distribution to be? Refer to the Law of Large numbers in your explanation.
Class Results: Frequency Distribution
Numbers | Frequencies
Show the histogram. Label
Finding the Mean and Standard Deviation (Section 2.4)
The Standard Deviation as a Ruler (Section 2.4)
Range Rule of Thumb (Section 2.4)
Part 11 - Review some concepts by means of Tutorial (3), then comment
on the following:
- Mean versus median in a skewed distribution
- Resistant measure of the center
- Sum of the deviations, the mean as a “fulcrum”
- Difference between the formulas for standard deviation for a sample and
population
- Usual and unusual values according to the range rule of thumb
- Empirical rule
- Chebyshev’s theorem
Part 12: Mean and standard deviation of grouped data
12-a) Find the mean and standard deviation of the distribution of class’ results shown on the previous
page.
The instruction in the home screen of your calculator should read:
mean(L1, L2)
(or stdDev(L1, L2))
(Need help? See calculator section item 5)
mean (L1,L2) =
standard deviation (L1, L2) =
12-b) Think about our experiment of rolling a die and recording the outcome. Is it unusual to roll a 6?
Are any of the outcomes unusual? Use your intuition to answer.
12-c) Use the values of the mean and standard deviation to label the scale given below. Use the range
rule of thumb to comment on usual and unusual values. Do the results agree with your answer to part 12b?
___|__________|__________|__________|__________|__________|__________|
 x̄-3s       x̄-2s        x̄-s         x̄         x̄+s       x̄+2s       x̄+3s
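The calculation in Part 12 can also be checked away from the calculator. The sketch below assumes that the values are in one list and their frequencies in another (as with L1 and L2) and that the sample standard deviation (dividing by n - 1) is wanted. The function names and the counts are hypothetical; replace the counts with the class results.

```python
import math

def grouped_mean_sd(values, freqs):
    """Mean and sample standard deviation of data given as a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, math.sqrt(ss / (n - 1))

def usual_range(mean, sd):
    """Range rule of thumb: values within 2 standard deviations are 'usual'."""
    return mean - 2 * sd, mean + 2 * sd

faces = [1, 2, 3, 4, 5, 6]
counts = [10, 9, 11, 10, 12, 8]  # hypothetical counts, not the class data
m, s = grouped_mean_sd(faces, counts)
low, high = usual_range(m, s)
print(f"mean = {m:.3f}, s = {s:.3f}, usual values from {low:.3f} to {high:.3f}")
```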
Probability Distributions and Histograms (Chapter 4)
Mean and Standard Deviation of Probability Distributions (Chapter 4)
Part 13 - Review some concepts by means of Tutorial (4), then comment
on the following:
- Similarities/differences between relative frequency distributions/histograms and
probability distributions/histograms
- Is the random variable in our experiment of rolling a die and recording the
outcome, discrete or continuous?
- Correspondence between areas and probabilities
- Requirements for a probability distribution compared to what you learned about
relative frequencies
- Formula for finding the mean of a probability distribution compared to the one
used for finding the mean of a frequency distribution
- Range rule of thumb for determining unusual results compared to the probability
rule
- How to use the calculator to find the mean and standard deviation of probability
distributions
Relative Frequency Distributions and Histograms (Sections 2.2, 2.3)
Probability Distributions and Histograms (Section 4.2)
Mean and Standard Deviation of Probability Distributions (Section 4.2)
Part 14: Use the class results from part 10.
14-a) Construct a relative frequency distribution and a relative frequency histogram.
14-b) Construct a probability distribution and probability histogram.
14-c) Construct the theoretical probability distribution and histogram for the experiment of rolling a die
and recording the outcome.
14-d) Find the mean and standard deviations for the distributions of parts (b) and (c).
(Need help? See calculator section items 5 & 6)
(a) Relative Frequency Distribution from Class Results
Numbers Obtained | Relative Frequencies*100 (%)
Relative Frequency Histogram. Label

(b) Probability Distribution from Class’ Results
Random Variable, x | Probability, P(x)
Probability Histogram. Label
Mean =
St. Deviation =

(c) Theoretical Probability Distribution
Random Variable, x | Probability, P(x)
Probability Histogram. Label
Mean =
St. Deviation =
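One way to check the mean and standard deviation of a probability distribution by hand is to apply the formulas μ = Σ x·P(x) and σ² = Σ (x − μ)²·P(x) directly. The Python sketch below does this for the theoretical fair-die distribution; the function name `distribution_mean_sd` is hypothetical, and exact fractions are used only so that the probabilities add up to exactly 1.

```python
from fractions import Fraction
import math

def distribution_mean_sd(dist):
    """Mean and standard deviation of a discrete probability distribution.

    `dist` maps each value x to its probability P(x).
    """
    if abs(float(sum(dist.values())) - 1) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, math.sqrt(variance)

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
mu, sigma = distribution_mean_sd(fair_die)
print(float(mu), round(sigma, 4))
```

The same function can be applied to the class probability distribution from part (b) by replacing `fair_die` with the observed relative frequencies.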
Distribution of Sample Means (Section 5.5)
Central Limit Theorem (Section 5.5)
Part 15 - Consider the theoretical uniform distribution of the experiment of rolling a die and recording
the outcome. The mean and standard deviation of this population was obtained in part 14-c. The
parameters for that population are: μ =          σ =
15-a) Think of the list RNDIE that you have in your calculator as a sample that was selected at random
from this population. Find the mean of RNDIE and write your result here: x̄ =
15-b) Since each of us has a RNDIE list, we can say that we have selected 28 samples of size 60 from
this theoretical population.
We are going to enter each of the 28 sample means in the overhead calculator. Let’s use list 6. We have
created a new distribution, which is the distribution of sample means for samples of size 60.
Before doing that, just think about this new distribution of sample means for samples of size 60:
Comment on the shape, the mean and the standard deviation. How do you think they compare to the
shape, mean and standard deviation of the original uniform distribution?
15-c) Let’s sketch a histogram for the distribution of sample means for samples of size 60. Observe its
shape, center and variability. Is it what you predicted?
15-d) Let’s find the mean of the distribution of sample means, which is stored in list 6 of the
overhead calculator.  x =
How does it compare to μ?
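The class activity in Part 15 can be imitated with a simulation. In this hedged sketch each of 28 simulated "students" rolls a die 60 times and reports a sample mean; the function name `sample_means` is hypothetical, and the numbers produced will differ from the class results and from run to run.

```python
import math
import random
import statistics

def sample_means(n_samples=28, sample_size=60, seed=None):
    """Mean of each of n_samples simulated samples of fair-die rolls."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        rolls = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(sum(rolls) / len(rolls))
    return means

means = sample_means()
sigma = math.sqrt(35 / 12)  # standard deviation of a single fair-die roll
print("mean of the sample means:", round(statistics.mean(means), 3))
print("std. dev. of the sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(60):", round(sigma / math.sqrt(60), 3))
```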
Part 16 - Review some concepts by means of Tutorial (5), then
comment on the following:
- Distribution of sample means
- Mean and standard deviation of the distribution
- Central Limit Theorem
Distribution of Sample Means, Small Sample Size (Section 5.5)
Central Limit Theorem (Section 5.5)
Part 17 – If time permits, simulate rolling a die 10 times, store the numbers into L2, and find the
mean of L2.
Collect the class’ results to generate the distribution of sample means for samples of size 10.
Sketch the histogram and observe its shape. Find the mean and standard deviation of this distribution.
Compare your results with what is predicted by the Central Limit Theorem.
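To see what the Central Limit Theorem predicts for different sample sizes, the spread of simulated sample means can be compared with σ/√n. This is a sketch under assumptions: the name `spread_of_sample_means` and the choice of 1000 simulated samples are ours, and a class of 28 will show more variation than this.

```python
import math
import random
import statistics

SIGMA = math.sqrt(35 / 12)  # standard deviation of one fair-die roll

def spread_of_sample_means(sample_size, n_samples=1000, seed=None):
    """Standard deviation of the means of n_samples simulated samples."""
    rng = random.Random(seed)
    means = [
        sum(rng.randint(1, 6) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

for n in (10, 60):
    observed = spread_of_sample_means(n)
    predicted = SIGMA / math.sqrt(n)
    print(f"n = {n}: simulated {observed:.3f}, predicted sigma/sqrt(n) = {predicted:.3f}")
```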
Importance of the Class Width (Section 2.3)
Part 18: Histograms
Observe how the selection of the class width “changes” the “story” portrayed by the graph. Decide what
class width provides the best picture of the data.
Access my web page: http://montgomerycollege.edu/~maronne/
Click on Statistics Workshop
Click on Applets
Click on Histogram
18-1) Assume the data represent the grades of students on a test.
a) What is a convenient number to use as the class width?
b) What class width is convenient to use if we want to know if there are
any students who scored above 95%?
18-2) Assume the data represent the number of cars that go through a busy intersection from 4 am
until 10 am. To avoid entering new data that fits this situation, and to be able to use the given
histogram, we’ll have to make the assumption that 40 = 4 a.m., 50 = 5 a.m., etc.
A class width of 10 will mean 1-hour intervals.
A class width of 5 will mean .....-minute intervals
A class width of 2.5 will mean .........-minute intervals
A class width of 1 will mean ............-minute intervals
Change the class width from 10 to 5, to 2.5, to 1. You can use the slider, but it’s more exact if you just
type the number and press enter.
18-2-a) Give the time interval in which the most cars go through the intersection if you use
i) A class width of 10:
ii) A class width of 5:
iii) A class width of 2.5:
iv) A class width of 1:
18-2-b) You cross the intersection sometime after 5:30 a.m. What is the most convenient time interval to
go through the intersection?
18-2-c) What if you are in that area sometime between 6 and 7 a.m.?
18-2-d) What is the best choice of class width if we want to pinpoint the rush hour and avoid the time
when the most cars go through the intersection?
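If the applet is not available, the effect of the class width can be explored with a few lines of Python. The data below are hypothetical test grades, not the applet's data set, and the function name `bin_counts` is made up for this sketch. Each class is taken to include its lower limit and exclude its upper limit.

```python
from collections import Counter

def bin_counts(data, class_width, start=0):
    """Frequency of each class [lower, lower + class_width), keyed by lower limit."""
    counts = Counter()
    for x in data:
        k = int((x - start) // class_width)  # index of the class containing x
        counts[start + k * class_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical test grades, not the applet's data set
grades = [52, 58, 61, 67, 70, 72, 75, 78, 81, 84, 85, 88, 91, 96, 98]
for width in (10, 5, 2.5):
    print(f"class width {width}:", bin_counts(grades, width))
```

With a width of 10 the grades above 95 are hidden inside the 90 to 100 class; a width of 5 separates them.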
Mean versus Median (Section 2.4)
Part 19: Mean and Median
19-1) Access my web page: http://montgomerycollege.edu/~maronne/
Click on Statistics Workshop
Click on Applets
Click on Mean and Median 1
Click on Mean and Median
Read Instructions and play with it
Make sure you drag a point along the line
19-1-a) Observe what happens. Explain.
19-1-b) Which of the two measures of the center is said to be resistant? Explain the meaning of this
term.
19-2) Exploring Mean and Median
Objective: To stress the concept: The median is a resistant measure of the center while the mean is
affected by extreme values.
Access the Applets window in my web page
Click on Mean and Median 2
Click on Mean versus Median
Read instructions and play with it
19-2-a) Salaries of U.S. households are skewed to the .................... If you were reporting results about
this population, what measure of the center would you use? Explain.
19-2-b) Dates of coins
Suppose you and your friends emptied your pockets of coins and recorded the year marked on each coin.
What do you think the shape of the distribution looks like? Explain. What measure of the center is more
appropriate to use, the mean or the median? Explain why.
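The idea of a resistant measure can also be tried numerically. The sketch below uses made-up household incomes (in thousands of dollars), not real data, and adds one extreme value to see how the mean and the median respond.

```python
import statistics

# Hypothetical annual household incomes in thousands of dollars (not real data)
incomes = [32, 38, 41, 45, 47, 52, 58, 63]
with_outlier = incomes + [950]  # add one extremely high income

for label, data in (("without extreme value", incomes),
                    ("with extreme value", with_outlier)):
    print(label, "| mean =", round(statistics.mean(data), 1),
          "| median =", statistics.median(data))
```

Compare how far each measure moves when the extreme value is added.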
Part 20 - Review some concepts by means of Tutorial (6).
Box Plots (Section 2.7)
Part 21 – Using Box Plots to Explore Data
21-1) The data show the average amount of money spent per student in public elementary schools
for each of the 50 states and the District of Columbia. The categories are region: S = south, W = west,
NE = northeast, MW = mid-west. Source: National Center for Education Statistics
We’ll explore the box plot(s) for the Amount (in dollars) Spent per Student in Public Elementary Schools by
States. First we’ll look into the Graph for All Data, then, we’ll explore the Graphs by Category.
Access my web page: http://montgomerycollege.edu/~maronne/
Click on Statistics Workshop
Click on Applets
Click on Box Plots
On the “Select a Data set” drop down menu,
Select “Amount Spent Per Student”
Scroll down and click on
“Graph by Category”, and then on “Graph all Data”
Observe the changes
Scroll down and in the window at the bottom, you can actually see the individual values
21-1-a) Name the variables in this data set.
21-1-b) How would you describe the data: as qualitative or quantitative?
21-1-c) Overall, how many observations are there?
21-1-d) How many observations per region?
21-1-e) Select Graph All Data
- Write down the 5-number summary
Category | N | Min | Q1 | Median | Q3 | Max
- How would you describe the distribution’s shape?
Symmetric, skewed to the right, skewed to the left?
Box Plots
21-1-f) Select Graph by Category
- Write down the 5-number summary for each category
Category | N | Min | Q1 | Median | Q3 | Max
- Use the back of this paper to answer the following questions:
1. Which of the 4 categories is “closer” to a symmetric distribution?
2. Which of the 4 categories has an outlier? (Even though it’s not indicated as one?)
3. Overall, what was the median amount spent per student?
4. How does the median amount spent per student in the NE compare to the other regions?
5. Which region spends the most per student?
6. Which spends the least?
7. How do the middle 50% of the data compare for the different regions?
8. Refer to the box plot of the South to answer the following. Explain your choice.
- True or False? The lengths of the different portions of the box suggest that in the selected sample, there
are more schools with expenditures above $5109 (Q3), than with expenditures below $5109.
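A 5-number summary like the ones requested in Part 21 can be computed with a short Python sketch. Two assumptions are worth flagging: the spending figures below are invented, and the quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd). That convention is commonly attributed to the TI-83/84, but other software may use a different rule and give slightly different quartiles.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max with quartiles as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # values above it (middle value skipped if n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

spending = [4200, 4800, 5100, 5300, 5600, 6100, 6900, 7400, 9800]  # hypothetical dollars
print(five_number_summary(spending))
```

Whatever the data, one quarter of the observations lie in each of the four portions of the box plot, which is a useful fact for the True or False question above.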
Part 22 – Load some data sets into your calculator
(Need help? See calculator section, item 7)
Here are the data sets that will be loaded into the calculator.
Diet Coke Volume = CDTVL
Diet Coke Weight = CDTWT
Regular Coke Volume = CRGVL
Regular Coke Weight = CRGWT
Diet Pepsi Volume = PDTVL
Diet Pepsi Weight = PDTWT
Regular Pepsi Volume = PRGVL
Regular Pepsi Weight = PRGWT
Head circumferences (cm) of Two-Month-Old Baby-Boys = MHED
Head circumferences (cm) of Two-Month-Old Baby-Girls = FHED
Describing Data Sets
Part 23 - Exploring the distribution of volumes of Regular Coke Cans
Before constructing any graphs, think about the following:
23-a) What is the volume posted on a regular Coke can?
23-b) Think about selecting a sample of regular Coke cans, recording their volumes, and using the calculator
to sketch the histogram. What do you think the histogram will look like? What shape will this
distribution have?
23-c) Load the data set into the editor of the calculator.
(Need help? See calculator section item 8)
23-d) Construct a histogram letting your calculator select the window. Is it what you expected? As usual,
expand in your explanations.
(Need help? See calculator section, item 4, Note 9)
23-e) Show the frequency distribution and histogram here. If you think it necessary, modify the window
values.
23-f) Open a second STAT PLOT to construct the box plot for the data. You may want to use a large Ymax value in the window of the calculator to fit both graphs. Use both graphs to describe some
characteristics of the data set.
(Need help? See calculator section item 9)
Histograms and Box Plots (Sections 2.3, and 2.7)
Part 24 – Comparing Weights of Diet Coke and Regular Coke by using Box Plots
Before constructing any graphs, think about both box plots.
24-a) Do you think they will have the same length (range)?
Will they have the same minimum and maximum, or will one of the plots be farther to the right than the
other? If so, which will be to the right?
24-b) Construct a box plot for each of the distributions of the weights of regular and diet Coke. Display
both plots in the same window. Is it what you predicted? Compare the graphs and determine whether
there appears to be a significant difference between the two distributions. If so, provide a possible
explanation for the difference.
Use the scale provided below as a guide to sketch the box plots.
Record the 5 number summary and the outliers for each of the distributions.
Also mention the smallest and largest number of the distributions which are not outliers.
_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|___
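When recording outliers, it helps to state the rule being used. The sketch below assumes the common 1.5 × IQR rule for modified box plots (values more than 1.5 interquartile ranges below Q1 or above Q3 are flagged); check that this matches the rule your calculator or textbook uses. The function names, the weights, and the quartiles are hypothetical, not the workshop data.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences under the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(data, q1, q3):
    """Smallest and largest non-outliers, plus the sorted list of outliers."""
    low, high = outlier_fences(q1, q3)
    inside = [x for x in data if low <= x <= high]
    outside = [x for x in data if x < low or x > high]
    return min(inside), max(inside), sorted(outside)

# Hypothetical can weights in pounds, with hypothetical quartiles
weights = [0.792, 0.812, 0.815, 0.817, 0.818, 0.820, 0.822, 0.851]
print(split_outliers(weights, q1=0.8135, q3=0.821))
```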
Part 25 – Use graphs to compare the head circumferences of two-month-old baby boys and girls.
As usual, think about the situation, make some conjectures and then verify with the graph. Comment on the
results.
Histograms and Box Plots (Sections 2.3, and 2.7)
Part 26 – The Story of Old Faithful
Use the data sets stored in your calculator:
Intervals between eruptions (in minutes), Old Faithful Geyser, Yellowstone National Park = OFINT
Duration of eruptions (in seconds), Old Faithful Geyser = OFDTN
26-a) Use a histogram and a box plot to graph the distribution of time between eruptions which is stored
in the calculator (OFINT). Comment on the shape and other characteristics of the data. Notice the
advantage of showing both graphs. Some characteristics are captured in one of the graphs and hidden in
the other.
Note:
The distribution of intervals between eruptions was close to symmetric before the 1959 earthquake. The
data in our calculator was collected after this earthquake. The mean interval between eruptions has
remained steady at about 65 minutes for the past 100 years, but the earthquake has changed the
distribution of eruption intervals. The range of the distribution has also remained steady, but the
standard deviation of the eruption intervals has changed. Which of the two distributions of eruption
intervals has a larger standard deviation, the one before the 1959 earthquake or the one after?
26-b) In the same window of the calculator display the box plot and the histogram for the distribution of
the duration of eruptions. (OFDTN). Comment on the results.
Data
Part 27 – Here is a revision to Part 22
Here are the data sets that will be loaded into the calculator.
AGE = age of students in two Statistics classes
CDTVL = Diet Coke Volume (oz)
CDTWT = Diet Coke Weight (lb)
CRCTY = City fuel consumption (mi/gal)
CRGVL = Regular Coke Volume (oz)
CRGWT = Regular Coke Weight (lb)
CRHWY = Highway Fuel Consumption (mi/gal)
FHED = Head circumferences of Two-Month-Old Baby-Girls (cm)
HMSP = Selling Price of Homes in Dutchess County, NY (thousand dollars)
MHED = Head circumferences of Two-Month-Old Baby-Boys (cm)
OFDTN = Duration of eruptions of Old Faithful Geyser (seconds)
OFHT = Height of eruptions of Old Faithful Geyser (feet)
OFINT = Intervals between eruptions of Old Faithful Geyser (minutes)
PDTVL = Diet Pepsi volume (oz)
PDTWT = Diet Pepsi weight (lb)
PRGVL = Regular Pepsi volume (oz)
PRGWT = Regular Pepsi weight (lb)
QRTRS = Weight of quarters