“How Many Tanks Are There”
Teaching Contemporary Mathematics conference 2011
NC School of Science and Math
Philip Rash

Introduction

We begin with a classic problem, often referred to as the “German Tank Problem.” Suppose you are an Allied Forces commander and have captured a small handful of German tank serial numbers. Assuming the Germans numbered their tanks from 1 to N (some unknown maximum), how can we estimate the value of N?

I normally use this activity to introduce the concept of sampling distributions and estimation in an AP Statistics course (i.e., Chapter 9 in The Practice of Statistics by Yates, Moore, and Starnes). Without giving much more information than that, I give students a set of 7 tank serial numbers and tell them that their job is to estimate the total number of tanks. More than that, they have to invent their own method (or algorithm or formula) to estimate that maximum: a method that someone could easily apply to any set of tank serial numbers. We also agree on a few simplifying assumptions: that tanks are indeed numbered from 1 to N, and that each tank is equally likely to have been observed.

Possible solutions

Following is a collection of methods, most of which I have seen students propose:

- 2 × mean (since the mean is in the middle, twice the mean should be the upper end)
- 2 × median (similar justification as previous)
- [formula missing] (since the IQR contains the middle 50% of the data, adding that to the middle should be the upper extreme)
- mean + 2 × SD (two standard deviations above the mean usually captures most of a data set)
- mean + 3 × SD (three standard deviations above the mean usually captures most of a data set)
- [formula missing], where n is the number of tanks in our sample (this is algebraically equivalent to an idea a student had this year: start with [formula missing], but since that will likely overestimate the maximum, subtract from that value the amount by which [formula missing] underestimates the minimum, 1)
- max + min (the justification for this is that the “gap” between the sample maximum and N ought to be about as big as the “gap” between the sample minimum and 0)
- max / P(Z < z_max), where z_max is the z-score of the sample maximum (compute the normal distribution probability of having a z-score less than the z-score of the sample maximum, then divide the sample maximum by this probability)

Assessing the methods

Next we turn our attention to deciding which method is “best.” But first we need to articulate what we mean by “best.” This is the part that is sometimes a leap for students: though we really have only one sample of tank serial numbers, and we really do not know what the real maximum is, we're going to temporarily assume that we do know the maximum, take a random sample, and see how well each method estimates that known maximum. The thinking is that if we can develop some confidence in a particular method (under conditions in which we know the value that is to be estimated), then that confidence extends to the situation in which we honestly do not know the value we're trying to estimate.

For example, let's assume that there are N = 350 tanks in the German army. We'll draw a random sample of 7 tanks from this population and compute our estimate for the maximum using our chosen method. We'll repeat this many times and look at the distribution of those estimates.

First we think about the notion of an unbiased estimator, that is, a statistic whose mean is equal to the parameter the statistic is meant to estimate. The table below lists our candidate methods and their means (based on N = 350 and 10000 simulations).
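The simulation just described can be sketched in Python. This is only a minimal sketch, not the JMP procedure from the student handout. It assumes a true maximum of 350 (the value implied by the errors in the table), samples of 7 drawn without replacement, and the sample (n − 1) standard deviation. It covers only a handful of the candidate methods, and the form `max + max/n - 1` used for the maximum-based method is the standard textbook version, assumed here for illustration. The names `ESTIMATORS` and `simulate` and the seed are likewise hypothetical choices.

```python
import random
import statistics

N_TRUE = 350      # assumed "known" maximum for the simulation
SAMPLE_SIZE = 7   # number of captured serial numbers
REPS = 10_000     # number of simulated samples

# Candidate methods; the keys are labels chosen for this sketch.
ESTIMATORS = {
    "2 * mean": lambda s: 2 * statistics.mean(s),
    "2 * median": lambda s: 2 * statistics.median(s),
    "mean + 2 SD": lambda s: statistics.mean(s) + 2 * statistics.stdev(s),
    "mean + 3 SD": lambda s: statistics.mean(s) + 3 * statistics.stdev(s),
    "max + min": lambda s: max(s) + min(s),
    # Assumed textbook form of the maximum-based method.
    "max + max/n - 1": lambda s: max(s) + max(s) / len(s) - 1,
}


def simulate(n_true=N_TRUE, sample_size=SAMPLE_SIZE, reps=REPS, seed=2011):
    """Return {label: (mean, sd)} of each estimator over many random samples."""
    rng = random.Random(seed)
    population = range(1, n_true + 1)
    estimates = {label: [] for label in ESTIMATORS}
    for _ in range(reps):
        # Each tank is equally likely to be observed; no tank is seen twice.
        sample = rng.sample(population, sample_size)
        for label, estimator in ESTIMATORS.items():
            estimates[label].append(estimator(sample))
    return {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in estimates.items()
    }


if __name__ == "__main__":
    for label, (mean, sd) in simulate().items():
        error = mean - N_TRUE
        print(f"{label:>16}: mean = {mean:6.1f}, error = {error:+6.1f}, sd = {sd:5.1f}")
```

Running the script prints one line per method. The means and standard deviations should be comparable to, though not identical to, the tabulated values, since every run of 10000 samples differs slightly.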
Estimate            Mean    Error
[formula missing]   350.3     0.3
[formula missing]   351.9     1.9
[formula missing]   305.3   -44.7
[formula missing]   372.4    22.4
[formula missing]   471.1   121.1
[formula missing]   350.6     0.6
[formula missing]   350.7     0.7
[formula missing]   349.7    -0.3
[formula missing]   481.6   131.6
[formula missing]   458.7   108.7
[formula missing]   350.5     0.5
[formula missing]   591.9   241.9
[formula missing]   340.7    -9.3

Of our candidate methods, several seem to be unbiased: those whose simulated means fall within about 2 of 350. We also note that two of these are “nearly” algebraically equivalent (the difference being whether we consider 0 or 1 to be the minimum tank value), so from this point on we'll consider only one of the two.

So now we need to decide which of our unbiased estimators is “best.” Among these, the estimator with the least variance would, in a sense, give us the most information about the parameter we're trying to estimate. In other words, less variance means a greater probability that the error is smaller than any given amount.

Estimate            Mean    Standard Deviation
[formula missing]   350.3    76.6
[formula missing]   351.9   116.1
[formula missing]   350.7    44.2
[formula missing]   349.7    88.1
[formula missing]   350.5    58.2

We note that the method with the smallest standard deviation, 44.2 (or, nearly equivalently, the method set aside above), is both unbiased and has minimum variance.

Attached as an appendix are histograms of each of our candidate methods. It's interesting to note not only the center and spread, but also the shape of each distribution. For a given method, for example, how likely is it that we observe a value more than a certain distance from our assumed maximum? These types of questions lead well into discussions of hypothesis testing.

Also attached is a copy of a student handout I often use with this activity. It's written for students to use JMP (a statistical software program), but could be adapted for other software. The second page of this handout references a website where students can simulate drawing samples from many types of populations. They can see sampling distributions for many different sample statistics, such as mean, median, variance, and range.

Credits

Recognition to two of my NCSSM colleagues: Floyd Bullard authored most of the student handout, and Dan Teague provided valuable resources as well.
The Guardian article at http://www.guardian.co.uk/world/2006/jul/20/secondworldwar.tvandradio describes the historical problem, noting that Allied statisticians estimated the number of tanks at 246 (produced per month from June 1940 to September 1942). Post-war records revealed the actual value to be 245.
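The claim that a method built on the sample maximum can be exactly unbiased can also be checked by hand. The following is a sketch under the activity's assumptions (serial numbers 1 to N, every set of n tanks equally likely, sampling without replacement). The symbols m for the sample maximum and n for the sample size, and the specific form m + m/n − 1, are notation assumed here for illustration.

```latex
% Distribution of the sample maximum m: the other n-1 serial numbers
% must all fall below k.
\begin{align*}
P(m = k) &= \frac{\binom{k-1}{n-1}}{\binom{N}{n}},
  \qquad k = n, n+1, \dots, N, \\
% Use k\binom{k-1}{n-1} = n\binom{k}{n}, then the hockey-stick identity
% \sum_{k=n}^{N} \binom{k}{n} = \binom{N+1}{n+1}.
E[m] &= \sum_{k=n}^{N} k \, \frac{\binom{k-1}{n-1}}{\binom{N}{n}}
      = \frac{n}{\binom{N}{n}} \sum_{k=n}^{N} \binom{k}{n}
      = \frac{n \binom{N+1}{n+1}}{\binom{N}{n}}
      = \frac{n(N+1)}{n+1}, \\
E\!\left[m + \frac{m}{n} - 1\right]
     &= \frac{n+1}{n} \cdot \frac{n(N+1)}{n+1} - 1 = N.
\end{align*}
```

With N = 350 and n = 7, the expected sample maximum is 7 · 351 / 8 = 307.125, and 307.125 · 8/7 − 1 = 350, so on average the sample maximum alone falls short of N while the adjusted value does not.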
