How do you know what you know?

1) Maybe you can measure something directly.
2) You can interpret what you have measured within the context of some physical or statistical model.
3) You can accept the authority of some published work or the authority of some person whom you trust. Just because you read it in a Wikipedia article, is it true?

Natural science is based on measurable data interpreted within the context of certain assumptions (e.g. the Earth revolves about the Sun). There can be problems with data, however. As Richard Feynman said, “Any theory that fits all the data must be wrong, because some of the data is probably wrong.”

How can data be wrong?
1) bonehead errors
2) calibration errors
3) small number statistics
4) fraud

Examples: faster-than-light neutrinos, incorrect periods for light curves of variable stars, cold fusion.

Modern statistics was founded by the German mathematician and astronomer Carl Friedrich Gauss (1777-1855). Prior to his work, scientists often hand-picked the “best” data points to derive the “most accurate” results. Gauss showed that the most robust and fair-minded conclusions can be obtained from the use of the entire dataset.

[Figure: example of a Gaussian distribution (bell-shaped curve)]

In the previous example, the mean value is 15.4 and the “standard deviation” of the distribution is +/- 4.4. For a Gaussian distribution, 68.3 percent of the data points are within one standard deviation of the mean value. Data points that are more than 3 standard deviations from the mean are usually considered “outliers” because they would occur less than 0.5 percent of the time.

It is possible that a dataset could be characterized by more than one overlapping distribution, each of which has a different mean and/or standard deviation. Example: test scores with one peak at 60% (people who didn’t study) and another peak at 85% (people who did study).

Not all frequencies of events can be described by a bell-shaped curve.
For example:
1) The net worth of people vs. how many people have that net worth (Pareto’s law of income distribution)
2) The frequencies of word use in a given book, or in a given language (Zipf’s law)
3) The number of authors (N) publishing n papers over the course of a lifetime (Lotka’s law)

Scientific notation and significant figures

A number such as 6378 can be represented as 6.378 × 10^3. Similarly, 0.0005193 is the same as 5.193 × 10^-4. In the first case, because the number is 10 or greater, the exponent is positive. In the second case, because the number is between 0 and 1, the exponent is negative.

A number represented in scientific notation is of the form n.nnnn × 10^A. 7.553 × 10^2 is the same as 755.3. The exponent of “2” means that you move the decimal point two places to the right to write the number in regular notation. Similarly, 1.234 × 10^-2 = 0.01234. The decimal point of “1.234” is moved two places to the left because the exponent is negative.

For a number in scientific notation represented as n.nnnn × 10^A, the digits “n” aren't necessarily the same digit, but the fact that there are 5 of them means that a number such as 2.9979 × 10^8 has 5 significant figures. A number such as 3.00 × 10^8 has only 3 significant figures, and therefore has less precision.

When multiplying or dividing two numbers, the accuracy of the result depends on the accuracy of the two numbers. The result cannot have more significant digits than the less accurate number. For example, 6.378 × 10^3 times 1.123 × 10^-3 is best represented as 7.162 rather than as 7.162494. Similarly, 6.378169 times 1.1 is best represented as 7.0 instead of what your calculator might tell you (7.0159859).

Numbers such as π = 3.14159265358979... have essentially an infinite number of significant figures because they are mathematical constants. 6.7 / π will have a different number of significant figures than 6.70000 / π because the numerator in the first case has less precision than in the second case.
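The rounding rule for multiplication described above can be checked with a few lines of Python. This is only a minimal sketch: the helper names round_sig and sci are made up for this illustration, and the number of significant figures to keep is passed in by hand rather than worked out from the inputs.

```python
from math import floor, log10


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 0 for 7.16, 3 for 6378, -4 for 0.0005193
    exponent = floor(log10(abs(x)))
    return round(x, sig - 1 - exponent)


def sci(x: float, sig: int) -> str:
    """Write x in scientific notation with `sig` significant figures."""
    return f"{x:.{sig - 1}e}"


# Both factors have 4 significant figures, so keep 4 in the product.
print(round_sig(6.378e3 * 1.123e-3, 4))   # 7.162

# The less precise factor (1.1) has 2 significant figures, so keep 2.
print(round_sig(6.378169 * 1.1, 2))       # 7.0

# Scientific notation for the two numbers used in the text.
print(sci(6378, 4))                       # 6.378e+03
print(sci(0.0005193, 4))                  # 5.193e-04
```

Python writes the power of ten as "e+03" or "e-04", which means the same thing as × 10^3 or × 10^-4.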
Imagine a political poll conducted in 2016 involving 1000 registered voters. Say 42 percent preferred Donald Trump for President, while 46 percent preferred Hillary Clinton, and 12 percent were undecided. You might be told that the uncertainty of these numbers is +/- 3 percentage points. Where does the +/- 3 percent come from? It turns out that it's simply 1/sqrt(1000) ~ 0.032, or the square root of the reciprocal of the number of voters asked for a preference.

Now say an astronomer is using a telescope to measure the brightness of a distant quasar. If 1000 photons attributable to the quasar are counted, how accurately can you say you have measured the brightness of the quasar? It turns out that it's the same relative error as in the political poll: 1/sqrt(1000) ~ 3 percent. In order to measure the brightness of the quasar to +/- 1 percent you need to count 10,000 photons, because 1/sqrt(10,000) = 0.01. So your integration time has to be 10 times as long, or you need to use a telescope with a collecting area 10 times as big.

If we took an opinion poll consisting of fair-minded questions and we wanted the results to be good to +/- 1 percent, how many people would we have to poll in our survey?
A. 100
B. 1000
C. 10,000
D. 100,000

W h a t i s t h e l i g h