MAT206 – Statkey Tutorial – version 2.0
Website: http://lock5stat.com/statkey/
Dot Plot, Histogram, Box Plot, Summary Statistics
Select One Quantitative Variable
Statkey will bring up a dot plot of a default data set. Notice the summary statistics are on the right.
Summary statistics include: sample size, mean, standard deviation, minimum, quartile 1, median (quartile 2),
quartile 3, and maximum.
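For readers who want to check these values outside Statkey, the following is a minimal sketch of the same eight summary statistics using Python's standard library. The data values are hypothetical (not the Statkey default data set), and the quartile method chosen here is an assumption; Statkey may compute quartiles slightly differently.

```python
import statistics

# Hypothetical sample; not the Statkey default data set.
data = [12, 15, 15, 18, 21, 24, 30, 41]

def summary(values):
    """Return the eight summary statistics listed above."""
    # The "inclusive" quartile method is an assumption; other methods exist.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": max(values),
    }

stats = summary(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```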
Selecting the Histogram or Boxplot tab will display the data as a histogram or a box plot.
Categorical data
Click One Categorical Variable.
The data for categorical variables can be edited using the procedure in the Changing the Data section.
Changing the Data
Custom data may be entered by clicking Edit Data.
Pay attention to the “Data has header row” box. If the first row of the data is a label, make sure this box is checked.
Once the data is entered, click OK.
Statkey will not save the data. It is a good idea to keep any data you plan to use in a spreadsheet so that it can be quickly copied and pasted into Statkey.
Confidence Intervals using Bootstrap
Click the appropriate confidence interval. For example: mean, proportion, difference of proportions, etc.
For now, default data will be used, but the data can be edited using the steps in the Changing the Data section.
Once the type of confidence interval is selected, a screen appears containing the following three sections:
- raw data: a dot plot of the raw data along with the sample size, mean, median, and standard deviation
- bootstrap distribution: this box will contain the bootstrap distribution once it is generated
- individual bootstraps: this box contains the individual bootstrap samples
To create a confidence interval, create a bootstrap distribution by selecting Generate 1000 Samples. For most
statistics classes, 1000 bootstrap samples should be sufficient.
The standard error of the distribution is also displayed. (Statkey labels it st. dev.)
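What Generate 1000 Samples does can be sketched roughly as follows. This is an illustration under stated assumptions, not Statkey's actual implementation: the sample values are hypothetical, the statistic is the mean, and `bootstrap_means` is a helper name invented for this example. Each bootstrap sample is drawn with replacement from the original data and has the same size as the original sample; the standard error is the standard deviation of the resulting bootstrap statistics.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=1000, seed=None):
    """Resample with replacement n_boot times and return each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical data; replace with your own sample.
sample = [23, 45, 31, 60, 52, 38, 74, 29, 41, 55]

boot = bootstrap_means(sample, n_boot=1000, seed=1)

# The standard error is the standard deviation of the bootstrap statistics
# (the value Statkey labels "st. dev." on the bootstrap distribution).
standard_error = statistics.stdev(boot)
print(f"bootstrap standard error: {standard_error:.3f}")
```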
Hovering over an individual dot on the dot plot displays the information for that bootstrap sample in the lower right box, including a dot plot and a few summary statistics.
To create the confidence interval, click Two-Tail.
Statkey defaults to a 95% confidence interval. (In this case the confidence interval is 33.104 to 70.917. However, since the bootstrap distribution is generated randomly, your results will vary.) The level of confidence can be changed by clicking the center box and changing the proportion.
In this example, the confidence interval becomes 34.063 to 66.438, but your results will vary slightly.
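The Two-Tail interval can be thought of as a percentile interval: sort the bootstrap statistics and cut an equal share off each tail. The sketch below illustrates the idea with hypothetical data; the helper name `percentile_interval` and the exact rule for picking the cut points are assumptions, so the endpoints will not match Statkey's digit for digit.

```python
import random
import statistics

def percentile_interval(boot_stats, level=0.95):
    """Cut (1 - level) / 2 of the sorted bootstrap statistics off each tail."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1 - tail) * last)]

# Hypothetical data and a bootstrap distribution of its mean.
rng = random.Random(7)
sample = [23, 45, 31, 60, 52, 38, 74, 29, 41, 55]
boot = [statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(1000)]

ci95 = percentile_interval(boot, 0.95)
ci90 = percentile_interval(boot, 0.90)
print("95% interval:", ci95)
print("90% interval:", ci90)
```

Lowering the confidence level moves both endpoints inward, which is why the interval narrows when the proportion in the center box is reduced.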
Creating Randomization Distribution
Click the randomization distribution that you are using (mean, proportion, etc.).
For now, default data will be used, but the data can be edited using the steps in the Changing the Data section. The box in the upper right corner shows a dot plot of the raw (original) data along with the sample size, mean, median, and standard deviation.
Create a randomization distribution by selecting Generate 1000 Samples. For most statistics classes, 1000 randomization samples should be sufficient.
The most important thing to remember with a randomization distribution is that the distribution needs to be centered at the value in the null hypothesis. This is easily changed.
For this example, let’s say we want to test whether the mean sale price of New York homes is greater than 280 (thousand) dollars ($H_0: \mu = 280$, $H_A: \mu > 280$). We will continue with the randomization distribution from the previous step.
Change the null hypothesis to 280 using the null hypothesis button; a box appears for entering the new value.
This will reset the plot, so select Generate 1000 Samples again. Notice that the null hypothesis value is labeled on the x-axis.
Now, this distribution can be used to find any needed p-values.
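One common way to build a randomization distribution for a single mean that is centered at the null value is to shift the sample so its mean equals the null value and then resample with replacement. The sketch below assumes that approach; whether Statkey uses exactly this procedure is an assumption, and the prices and the helper name `randomization_means` are hypothetical.

```python
import random
import statistics

def randomization_means(sample, null_mean, n_samples=1000, seed=None):
    """Shift the sample so its mean equals null_mean, then resample means."""
    rng = random.Random(seed)
    shift = null_mean - statistics.mean(sample)
    shifted = [x + shift for x in sample]
    n = len(shifted)
    return [statistics.mean(rng.choices(shifted, k=n)) for _ in range(n_samples)]

# Hypothetical home prices in thousands of dollars.
prices = [310, 455, 520, 610, 480, 725, 390, 560, 830, 640]

dist = randomization_means(prices, null_mean=280, n_samples=1000, seed=1)
center = statistics.mean(dist)
print(f"center of randomization distribution: {center:.1f}")  # close to 280
```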
P-values (when using randomization distributions)
To find the p-value from the previous example, select Right Tail (because this is an upper-tail test). The selection of Left Tail, Two Tail, or Right Tail depends on the problem and, ultimately, on the hypothesis being tested.
Statkey will give a default cutoff value (559.450 in this example).
Change this default value by clicking on 559.450 and entering the statistic from the original data (565.633).
The p-value is given. Since the distribution is generated randomly, your results will vary slightly.
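The p-value read from a randomization distribution is simply the proportion of randomization statistics at least as extreme as the observed statistic. A minimal sketch follows; the function name `tail_p_value` and the short list of statistics are hypothetical, and doubling the smaller tail for the two-tail case is one common convention assumed here.

```python
def tail_p_value(stats, observed, tail="right"):
    """Proportion of randomization statistics at least as extreme as observed."""
    n = len(stats)
    right = sum(s >= observed for s in stats) / n
    left = sum(s <= observed for s in stats) / n
    if tail == "right":
        return right
    if tail == "left":
        return left
    if tail == "two":
        # Assumed convention: double the smaller tail, capped at 1.
        return min(1.0, 2 * min(right, left))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Tiny hypothetical randomization distribution, for illustration only.
rand_stats = [270, 275, 280, 285, 290, 295, 300, 260, 265, 310]

p_right = tail_p_value(rand_stats, observed=295, tail="right")
p_left = tail_p_value(rand_stats, observed=295, tail="left")
p_two = tail_p_value(rand_stats, observed=295, tail="two")
print(p_right, p_left, p_two)
```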
P-values (using the normal distribution)
If the randomization distribution is approximately normal, the p-value is the area in the appropriate tail(s) beyond the observed statistic.
The tail that you select will depend on the null and alternative hypotheses.
Adjust $\mu$ and $\sigma$ as appropriate. The mean and standard deviation will depend on the problem statement.
For example, consider testing the hypotheses $H_0: \mu = 170$ versus $H_A: \mu > 170$.
Enter the observed statistic from the sample (here, the sample mean). This value depends on your sample data and on whether you are dealing with means, proportions, differences of proportions, etc.
Depending on your problem, select Left Tail, Two Tail, or Right Tail, as appropriate. The p-value is the area shown for the selected tail(s).
P-values (using the t distribution)
To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic.
First select the degrees of freedom. This will depend on the sample size and
the type of test you are doing.
The tail that you select will depend on the null and alternative hypotheses.
For example, consider testing the hypotheses $H_0: \mu = 13$ versus $H_A: \mu < 13$.
Statkey doesn’t have a place to enter the mean and standard deviation, so the test statistic must be calculated by hand. For this problem, the mean is 12.20, the standard deviation is 2.25, and the sample size is 30, so this one-sample test has 29 degrees of freedom. Using these values, the test statistic is
$$t = \frac{12.20 - 13}{2.25/\sqrt{30}} = -1.947.$$
This is the value that we will use to find the p-value.
Enter the t test statistic in Statkey and read off the p-value. This value will change based on your specific problem.
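The hand calculation above can be checked with a few lines of Python. The sketch below also approximates the left-tail p-value by numerically integrating the t density with Simpson's rule, since the standard library has no t distribution; the helper names are invented for this example, and the result should be treated as an approximation to what Statkey reports.

```python
import math

def t_statistic(sample_mean, null_mean, sample_sd, n):
    """One-sample t statistic: (mean - null) / (sd / sqrt(n))."""
    return (sample_mean - null_mean) / (sample_sd / math.sqrt(n))

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

t = t_statistic(sample_mean=12.20, null_mean=13, sample_sd=2.25, n=30)
p_value = t_cdf(t, df=29)  # left-tail area, because H_A is mu < 13
print(f"t = {t:.3f}, left-tail p-value = {p_value:.4f}")
```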
Normal Distribution
Finding the area (probability) under the curve
Pick the tail.
Adjust $\mu$ and $\sigma$ as appropriate. The mean and standard deviation will depend on the problem statement.
Right tail example: Given that the mean is 580 with a standard deviation of 70, what is the probability of being greater than 700?
Adjust the mean and standard deviation to 580 and 70, select Right Tail, and change the cutoff value to 700. The probability is the area shown for the right tail.
Left tail example: Given that the mean is 29.11 with a standard deviation of 0.93, what is the probability of being less than 27.580?
Adjust the mean and standard deviation to 29.11 and 0.93, select Left Tail, and change the cutoff value to 27.580. The probability is the area shown for the left tail.
Two tail example: Given that the mean is 100 with a standard deviation of 15, what is the probability of being between 80 and 110?
Adjust the mean and standard deviation to 100 and 15, select Two Tail, and change the two cutoff values to 80 and 110. The probability is the area shown between the two cutoffs.
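The three examples above can be cross-checked with `statistics.NormalDist` from Python's standard library. This is only a sketch for verifying the areas Statkey displays; the variable names are chosen for this example.

```python
from statistics import NormalDist

# Right tail: P(X > 700) when the mean is 580 and the standard deviation is 70.
right = 1 - NormalDist(mu=580, sigma=70).cdf(700)

# Left tail: P(X < 27.580) when the mean is 29.11 and the standard deviation is 0.93.
left = NormalDist(mu=29.11, sigma=0.93).cdf(27.580)

# Between two values: P(80 < X < 110) when the mean is 100 and the standard deviation is 15.
dist = NormalDist(mu=100, sigma=15)
between = dist.cdf(110) - dist.cdf(80)

print(f"right tail: {right:.3f}")    # roughly 0.043
print(f"left tail:  {left:.3f}")     # roughly 0.050
print(f"between:    {between:.3f}")  # roughly 0.656
```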
Finding $z^*$ (for confidence interval)
Select Two Tail.
Keep $\mu = 0$ and $\sigma = 1$.
Notice that the default is 0.95 (95% confidence). Click the center box to change the confidence level. The value of $z^*$ is the right-hand endpoint of the central region.
For example, when the confidence level is changed to 90%, the value of $z^*$ changes.
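The value of $z^*$ for a given confidence level is the point on the standard normal curve that leaves half of the remaining area in each tail. A minimal sketch for checking the values read from Statkey, with `z_star` being a helper name invented for this example:

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value with `confidence` area between -z* and +z*."""
    upper_area = 1 - (1 - confidence) / 2  # area to the left of +z*
    return NormalDist(mu=0, sigma=1).inv_cdf(upper_area)

z95 = z_star(0.95)
z90 = z_star(0.90)
print(f"z* for 95% confidence: {z95:.3f}")  # about 1.960
print(f"z* for 90% confidence: {z90:.3f}")  # about 1.645
```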
Finding $t^*$ (for confidence interval)
Enter the degrees of freedom.
Select Two Tail.
Notice that the default is 0.95 (95% confidence). Click the center box to change the confidence level. The value of $t^*$ is the right-hand endpoint of the central region.
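Python's standard library has no t distribution, so the sketch below approximates $t^*$ by integrating the t density numerically (Simpson's rule) and then searching for the cutoff by bisection. It is only meant as a rough cross-check of the value Statkey shows; the helper names and the choice of 29 degrees of freedom are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Bisection search for t* with `confidence` area between -t* and +t*."""
    target = 1 - (1 - confidence) / 2  # area to the left of +t*
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t95 = t_star(0.95, df=29)  # df = 29 is an illustrative choice
print(f"t* for 95% confidence, 29 df: {t95:.3f}")  # about 2.045
```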