The T Distribution

©Dr. B. C. Paul 2005

Wasn’t the Herby Assembly Line Problem Fun?

But there is one little problem.
We knew that our mean value could have been all over the map relative to the real true mean.
We calculated our standard deviation from the same sample.
How come our mean could be anything and yet our standard deviation is God’s own value for the standard deviation?

It Isn’t

When our value for the standard deviation is just an estimate, we have another chance for things to be way out in the tails.
Sadisticians (whoops, I mean statisticians) figured out the probability distribution for what would happen then.
They called it the T distribution.
It was first published in 1908 and perfected in 1926.
We look up values for areas under the curve of a T distribution just like we did with a normal distribution.

Let’s Redo Herby’s Problem Right This Time

We will use the T distribution:
$\bar{X} \pm t \cdot \frac{s}{\sqrt{n}}$
s is the estimated standard deviation.
The test statistic has a T distribution (assuming the underlying population really is normally distributed).
The distribution has n-1 degrees of freedom.

Degrees of Freedom! What Are You Talking About? This Isn’t an Amnesty International Class

Consider the number of equations and the number of unknowns.
Each sample is like an equation.
To uniquely solve for 3 unknowns you need 3 independent equations.
If I have one sample, I first use it as an estimate of the mean. I can’t calculate a standard deviation because I don’t have enough data.
If I have two samples, I can estimate the standard deviation and still have one degree of freedom to measure something else, which happens to be the mean.
The question is: how much extra data do I have above the bare minimum?

So How Do I Use This? (I have a really bad feeling you’re going to tell me)

Note that this table is set up differently from the Z values for the normal distribution.
The area under the curve comes from the top line.
Degrees of freedom come from the side.
The value in the middle is the T value (equivalent to the Z value).
Remember that in the normal table the Z value was on the edge and the area under the curve was in the middle of the table.

Let’s Do the Problem

$\bar{X} \pm t \cdot \frac{s}{\sqrt{n}}$
$\bar{X}$ = 3.8
s = 0.73
n = 7
OK, so what is t?

Finding t

If we do this as a two tailed test (i.e. we would be concerned if our balls were too hard or too soft), we can only have 2.5% in each tail.
Pick 97.5.
We have 7 samples, hence n-1 or 6 degrees of freedom.
Read into the table: 2.45.

Plug and Chug

$\text{UpperLimit} = 3.8 + 2.45 \cdot \frac{0.73}{\sqrt{7}} = 4.48$
We can still reject the null hypothesis with an alpha level of 5%, but it is now much closer than before.

Some Observations About Degrees of Freedom and the T Statistic

95% of a normal distribution is within 1.96 standard deviations of the mean.
95% of a T distribution is within 2.45 estimated standard deviations of the mean if the standard deviation estimate came from 7 samples.
With 20 samples it is 2.09 estimated standard deviation units.
With 50 samples it is 2.01.
With 100 samples it is 1.98.
With 500 samples it is 1.96.
Note that as the number of samples increases, the T distribution converges to a normal distribution.

So When Do I Use a T Distribution?

The underlying population must be realistic to model as having a normal distribution.
The standard deviation of the population must have been estimated from a standard deviation calculation using a sample of the population.
You can get out of using the T distribution and pretend that God gave you the standard deviation if you used about 100 or more samples to calculate your estimate of the standard deviation.
People with a lot of experience with a distribution often ignore the T distribution completely because they have seen results from hundreds of samples.
They are not “doing it wrong” using a simple normal distribution if they have that kind of data supporting their standard deviation value.

Why Did You Do a Two Tailed Test?
Herby was going bananas because he thought the line might be putting out soft balls.
That sounds to me like he is only concerned about one side of the distribution.
We may be upset about one particular thing, but that doesn’t mean nothing else is important.
One problem with things that are too hard is that they are often brittle.
Premature ball failure could be due to the balls being too soft or to their breaking up because they are too hard.
We have to ask our own case-specific question about what we are concerned about. You plan a one tailed test only if you are concerned about events on just one tail.

Common Cheating on Random Samples

Experiments should be planned before we look at the data.
If we look at the data and then decide what the experiment should have been, we are “political spin doctors”, not scientists.
Often we had a theory that made us want to look deeper.
A spin doctor looks at a result and then tries to make it say what he wants.
A scientist sets up the test and lets the truth be whatever it is.
Many theories are based on observations, but the scientific method causes you to then plan an experiment and go out and get the data you need to test the theory.
It’s a subtle difference, but it’s often ignored.
The doctrine of “political correctness” is causing us all to lose our integrity.

Back to Herby and the Two Tailed Test

If it is true that hard balls make no difference, only soft ones, then the test should have been set up as one tailed only.
If the concern was the line being out of spec and that causing unhappy customers, we could not know the sample would come out below 4.5 unless we peeked first.
If at that point we decided we only cared about soft balls, we distort the reliability of our analysis.
The data would have not only determined what the values of the test statistics were; it would have determined the test.
Normal distribution theory only accounts for the data determining the test statistic.
We in fact do not have good models for exactly what the consequences are if we let the data set up the test. All we can say is that we are taking a chance of something bad happening.

My Choice

So why did I do this example as a two tailed test?
1- Because that sample size analysis I did is nastier to explain if I’m only working on one side.
2- Because it sets up a great discussion on random samples, peeking, and cherry picking data.
3- Because it allowed me to discuss when I should run one and two tailed tests.
The story problem as told is inconclusive about whether Herby was vulnerable to the line being out of spec on one side only or on both sides.

Look at the Problems We Have Run So Far

We looked at a storm washing out the drainage system in a subdivision.
Only too much rain would create the disaster, so we really were only worried about rain events that are too big.
(And we ran a one tailed test on the upper side.)
We looked at a mine and the amount of ore below cut-off grade that would go to the dump.
We aren’t going to dump our high grade ore, so we really only care about how much stuff is on the lower end.
(And we ran a one tailed test on the lower side.)
We looked at tolerance on a machined part.
The spec said we had to be plus or minus, so our customer would be upset if the pegs were too big or too little.
(And we ran a two tailed test.)
Determine whether to run a one or two tailed test based on the concerns for the process or design you are working on, not from peeking at the data.
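
Checking the Numbers Without a Table

For anyone who would rather compute than read a table, the Herby upper limit and the critical values quoted for 7, 20, 50, 100, and 500 samples can be checked with a short script. This is only a sketch under stated assumptions: it uses nothing but the Python standard library, the helper names (t_pdf, t_cdf, t_value) are made up for illustration, and Simpson's rule plus bisection is just one simple way to get areas under the T curve, not how the printed table was produced.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Height of the T distribution curve at x for df degrees of freedom."""
    # lgamma keeps the gamma-function ratio from overflowing for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """Area under the T curve to the left of x (x >= 0), by Simpson's rule."""
    # The curve is symmetric, so the area left of 0 is 0.5;
    # integrate numerically from 0 to x and add it on.
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_value(area, df):
    """T value with the given area to its left (area > 0.5), by bisection."""
    lo, hi = 0.0, 50.0  # assumed search range, wide enough for df >= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Herby's numbers from the notes: mean 3.8, s = 0.73, n = 7 samples
x_bar, s, n = 3.8, 0.73, 7
t6 = t_value(0.975, n - 1)             # two tailed, 2.5% in each tail
upper_limit = x_bar + t6 * s / math.sqrt(n)
print(f"t for 6 degrees of freedom: {t6:.2f}")
print(f"upper limit: {upper_limit:.2f}")

# Critical values for the sample sizes listed in the notes
critical = {m: t_value(0.975, m - 1) for m in (7, 20, 50, 100, 500)}
for m, t in critical.items():
    print(f"{m:4d} samples: {t:.2f}")
print(f"normal distribution: {NormalDist().inv_cdf(0.975):.2f}")
```

The printed values should line up with the table lookups in the notes (2.45 for 6 degrees of freedom, an upper limit of about 4.48) and show the T value sliding toward the normal 1.96 as the sample count grows.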
