How do you know what you know?
1) Maybe you can measure something directly.
2) You can interpret what you have measured within the
context of some physical or statistical model.
3) You can accept the authority of some published work
or the authority of some person whom you trust.
Just because you read it in a Wikipedia article, is it
true?
Natural science is based on measurable data interpreted
within the context of certain assumptions (e.g. the Earth
revolves about the Sun).
There can be problems with data, however. As Richard
Feynman said, “Any theory that fits all the data must be
wrong, because some of the data is probably wrong.”
How can data be wrong? 1) bonehead errors; 2) calibration
errors; 3) small number statistics; 4) fraud.
Examples: faster-than-light neutrinos, incorrect periods
for light curves of variable stars, cold fusion.
Modern statistics was founded by the German mathematician
and astronomer Carl Friedrich Gauss (1777-1855). Prior to
his work, scientists often hand-picked the “best” data points
to derive the “most accurate” results.
Gauss showed that the most robust and fair-minded
conclusions can be obtained by using the entire
dataset.
Example of a Gaussian distribution (bell-shaped curve)
In this example, the mean value is 15.4 and
the “standard deviation” of the distribution is 4.4.
68.3 percent of the data points are within one standard
deviation of the mean value for a Gaussian distribution.
Data points that are more than 3 standard deviations from
the mean are usually considered “outliers” because they
would occur only about 0.3 percent of the time.
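The percentages for a Gaussian distribution can be checked with the error function. The following is a minimal sketch using only Python's standard library; the helper name `fraction_within` is made up for illustration, and the mean and standard deviation are the example values quoted above.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a Gaussian distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_1 = fraction_within(1)        # fraction inside +/- 1 standard deviation
outside_3 = 1 - fraction_within(3)   # fraction beyond +/- 3 standard deviations

print(f"within 1 standard deviation:   {within_1:.1%}")
print(f"outside 3 standard deviations: {outside_3:.2%}")

# Example values from the text: mean 15.4, standard deviation 4.4.
mean, sigma = 15.4, 4.4
low, high = mean - 3 * sigma, mean + 3 * sigma
print(f"points below {low:.1f} or above {high:.1f} would be flagged as outliers")
```

With these example values, anything below about 2.2 or above about 28.6 lies more than 3 standard deviations from the mean.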
It is possible that a dataset could be characterized by more
than one overlapping distribution, each of which has a
different mean and/or standard deviation. Example:
test scores with one peak at 60% (people who didn’t study)
and another peak at 85% (people who did study).
Not all frequencies of events can be described by a
bell-shaped curve. For example:
1) The net worth of people vs. how many people
have that net worth (Pareto’s law of income distribution)
2) The frequencies of word use in a given book, or in a
given language (Zipf’s law)
3) The number of authors (N) publishing n papers over
the course of a lifetime (Lotka’s law)
Scientific notation and significant figures
A number such as 6378 can be represented as
6.378 × 10^3.
Similarly, 0.0005193 is the same as 5.193 × 10^-4.
In the first case, because the number is 10 or greater,
the exponent is positive. In the second case, because
the number is between 0 and 1, the exponent is negative.
A number represented in scientific notation is of
the form
n.nnnn × 10^A
7.553 × 10^2 is the same as 755.3. The exponent
of “2” means that you move the decimal point
two places to the right to write the number in
regular notation.
Similarly, 1.234 × 10^-2 = 0.01234. The decimal
point of “1.234” is moved two places to the left
because the exponent is negative.
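Most programming languages write scientific notation with an "e" in place of "× 10^". As a small illustration (not part of the original slides), Python can convert in both directions using the numbers from the examples above:

```python
# Regular notation -> scientific notation (3 digits after the decimal point)
print(f"{6378:.3e}")        # 6.378e+03
print(f"{0.0005193:.3e}")   # 5.193e-04

# Scientific notation -> regular notation
big = float("7.553e2")      # decimal point moves two places to the right
small = float("1.234e-2")   # decimal point moves two places to the left
print(big)                  # 755.3
print(small)                # 0.01234
```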
For a number in scientific notation represented as
n.nnnn × 10^A
the digits “n” aren't necessarily the same number,
but the fact that there are 5 of them means that
a number such as 2.9979 × 10^8 has 5 significant
figures. A number such as 3.00 × 10^8 has only
3 significant figures, and therefore has less precision.
When multiplying or dividing two numbers, the
precision of the result depends on the precision of
the two numbers. The result cannot have more
significant digits than the less precise number.
For example, 6.378 × 10^3 times 1.123 × 10^-3 is
best represented as 7.162 rather than as 7.162494.
Similarly, 6.378169 times 1.1 is best represented
as 7.0 instead of what your calculator might tell you.
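The rounding rule for products can be sketched in a few lines of Python. The helper `round_sig` is a hypothetical name introduced here, and the number of significant figures to keep is supplied by hand, since a floating-point value does not remember how precisely it was measured.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, sig - 1 - exponent)

# Both factors have 4 significant figures, so keep 4 in the product.
product_a = round_sig(6.378e3 * 1.123e-3, 4)

# The less precise factor (1.1) has 2 significant figures, so keep 2.
product_b = round_sig(6.378169 * 1.1, 2)

print(product_a)
print(product_b)
```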
Numbers such as π = 3.14159265358979.... have
essentially an infinite number of significant figures
because they are mathematical constants.
6.7 / π will have fewer significant
figures than 6.70000 / π because the numerator
in the first case has less precision than in the second
case.
Imagine a political poll conducted in 2016 involved
1000 registered voters. Say 42 percent preferred Donald
Trump for President, while 46 percent preferred Hillary
Clinton, and there were 12 percent undecided. You might
be told that the uncertainty of these numbers is +/- 3
percentage points.
Where does the +/- 3 percent come from? It turns out that
it's approximately 1/sqrt(1000), or the square root of the
reciprocal of the number of voters asked for a preference.
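One way to see why 1/sqrt(N) works is to compare it with the textbook margin of error for a proportion. The sketch below makes assumptions the slide does not state: a simple random sample, a 95 percent confidence level (z = 1.96), and the worst case of an even 50/50 split. The function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a sampled proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000
rule_of_thumb = 1 / math.sqrt(n)   # the shortcut quoted in the text
worst_case = margin_of_error(n)    # z * sqrt(0.25 / n), close to 0.98 / sqrt(n)

print(f"1/sqrt(N):        {rule_of_thumb:.1%}")
print(f"95%, 50/50 split: {worst_case:.1%}")
```

Under those assumptions the two values agree to within a fraction of a percentage point, which is why the shortcut is commonly quoted.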
Now say an astronomer is using a telescope to measure
the brightness of a distant quasar. If 1000 photons
attributable to the quasar are counted, how accurately
can you say you have measured the brightness of
the quasar?
It turns out that it's the same relative error as in the
political poll: 1/sqrt(1000) ~ 3 percent. In order
to measure the brightness of the quasar to +/- 1
percent you need to measure 10,000 photons, so your
integration time has to be 10 times as long, or you
need to use a telescope with a collecting area 10 times
as big.
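The photon-counting argument can be turned around to ask how many counts a target precision requires. This is a rough sketch that assumes pure counting statistics (relative error 1/sqrt(N)) and ignores background and detector noise; the function names are hypothetical.

```python
import math

def relative_error(counts: int) -> float:
    """Relative error of a measurement based on a number of counted events."""
    return 1 / math.sqrt(counts)

def counts_needed(target: float) -> int:
    """Counts required to reach a target relative error (e.g. 0.01 for 1 percent)."""
    return round(1 / target**2)

current = 1000
needed = counts_needed(0.01)
factor = needed / current   # how much longer to integrate, or how much more collecting area

print(f"{current} photons give {relative_error(current):.1%} relative error")
print(f"1% precision needs {needed} photons, a factor of {factor:.0f} more")
```

Because the error falls only as the square root of the count, cutting the error by a factor of about 3 costs a factor of 10 in photons.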
If we took an opinion poll consisting of fair-minded
questions and we wanted the results to be good to
+/- 1 percent, how many people would we have to
poll in our survey?
A. 100
B. 1000
C. 10,000
D. 100,000
What is the ligh