
Notes on the History of Mathematics
Jeremy Martin and Judith Roitman
January 14, 2014

1 Length, area, volume

Length, area, and volume have been studied from the beginning of recorded history (and probably before) because of the importance of accurate measurement: consider what happens when rivers flood and recede, changing property lines. Every major ancient civilization was able to calculate the areas of rectangles and triangles, and the volumes of rectangular solids.

In ancient Egypt, the Rhind papyrus (an Egyptian mathematical text dated to approximately 1650 BCE) [1] had sample problems on finding the area of a rounded field, finding the amount of grain in a cylindrical granary (i.e., calculating the volume of a cylinder), calculating the area of an octagon inscribed in a square (not the same as a regular octagon), and so on. By 300 BCE (the Cairo papyrus) the Egyptians knew the Pythagorean theorem. They had an incorrect general formula for the area of convex quadrilaterals, but their formula was correct for special cases (e.g., trapezoids).

Mesopotamia also had basic area formulas for triangles and trapezoids (hence rectangles); knew the Pythagorean theorem; knew the area of a regular hexagon; approximated π as 3; calculated √2 as 30547/21600 (accurate to 5 decimal places); and so on.

Ancient China knew the Pythagorean theorem (there’s a theme here, isn’t there); knew most of the polygonal and polyhedral area and volume formulas; while they used 3 to approximate π they knew it was wrong (and Liu Hui in 263 CE calculated π using a 192-sided regular polygon to get π ≈ 3.1416); by the 5th century CE Zu Chongzhi and Zu Gengzhi (father and son) knew the volume of a sphere and approximated π to 7 decimal places.
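As a quick arithmetic check of the Mesopotamian value of √2 quoted above, the following Python sketch compares 30547/21600 with a floating-point square root. The variable names are ours, and the check says nothing about how the value was found historically.

```python
import math
from fractions import Fraction

# The Mesopotamian approximation to sqrt(2) quoted in the text.
babylonian_sqrt2 = Fraction(30547, 21600)

# Compare it with the double-precision value of sqrt(2).
approx = float(babylonian_sqrt2)
error = abs(approx - math.sqrt(2))

print(f"30547/21600 = {approx:.8f}")
print(f"sqrt(2)     = {math.sqrt(2):.8f}")
print(f"difference  = {error:.2e}")
```

The difference comes out below one millionth, which is what the "decimal places" claim in the text is measuring.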
And here is the Chinese diagram for the proof of the Pythagorean theorem, taken from the Zhoubi suanjing (Zhou Shadow Gauge Manual; various parts were written sometime between 1100 BCE and 100 CE):

[Figure: Chinese diagram for the proof of the Pythagorean theorem]

Footnote 1: although the writer said it was actually a transcription of another document from 200 years before that.

Ancient India was obsessed by the importance of accurately constructing altars according to Vedic instructions; in particular, how do you construct a square altar with the same area as a given circular altar? They knew that π is a little bigger than 3; they knew how to construct a square whose area is twice that of a given square; they had many constructions using, and several proofs of, the Pythagorean theorem (proofs via diagram). By the 7th century CE Brahmagupta knew the area of any quadrilateral inscribed in a circle; knew Heron’s formula; knew the volume of a cone. By the 9th century Sridhara calculated the volume of a frustum of a cone (what you get when you cut off the top) using π ≈ √10. In the 9th century Mahavira had many correct area formulas. By the 12th century Bhaskara approximated π to be about 3.1429, and knew the surface area and volume of a sphere as follows: if d = diameter and C = circumference, then S = dC and V = Sd/6. These are very elegant formulas. Plug in C = 2πr and d = 2r to see that indeed these are the usual formulas.

The ancient Greeks were obsessed with geometry (as we will learn later, they saw algebra as, in some sense, a branch of geometry), hence they were very sophisticated regarding measurement.

Now for some examples.

1. The octagon in the square

This comes from ancient Babylonia. Take a square, divide each side into equal thirds, and connect as in the picture to get a nonregular octagon. What’s its area compared to the area of the square? Each of the small triangles you’re throwing out has area $\frac{1}{2}\cdot\frac{s^2}{9}$, where s is the side of the square. You’re throwing away four of them, so you’re throwing away an area of $\frac{2s^2}{9}$.
Hence you’re left with an area of $\frac{7s^2}{9}$, so the octagon has $\frac{7}{9}$ the area of the square.

2. Egyptian calculation of the area of a circle [2]

Example of a round field of diameter 9 khet. What is its area? Take away 1/9 of the diameter, 1; the remainder is 8. Multiply 8 times 8; it makes 64. Therefore it contains 64 setat of land.

Footnote 2: from the Rhind Papyrus.

Here are some features of this excerpt that are characteristic of ancient Babylonian and Egyptian mathematics:

• It’s pretty accurate. The relationship between area A and radius r is $A = 64d^2/81 = (256/81)r^2$ (where d = 2r is the diameter), which is equivalent to approximating π ≈ 256/81 = 3.16049.... This is not bad at all, and would be perfectly fine for any applications the Egyptians used it for.

• Mathematics is described verbally instead of symbolically. In modern notation, we can rewrite the relationship between area A and diameter d given in the excerpt as $A = (d - d/9)^2 = \frac{64}{81}d^2$, but the Egyptians lacked the notational tools to do this (to be fair, Western mathematics didn’t come up with modern algebraic notation until the 1600s or so). [3]

• In these cultures, mathematics was concerned with solving applied, practical problems. Rather than talking about the area of a circle, the problem talks about a “round field”. There is little, if any, geometric abstraction in extant Babylonian and Egyptian texts.

• We have no idea what a “khet” or a “setat” is, but we can infer it from context; one setat is presumably one square khet. In particular, they had units of measurement.

• The Babylonian and Egyptian writings tend not to include explanations (much less formal proofs). There’s more focus on how to solve a problem (by following an algorithm) than why the given solution works.

3. Eratosthenes’ calculation of the size of the earth

It wouldn’t be a history of math course without showing how Eratosthenes (who lived in the Greek areas of Egypt, 276–194 BCE) calculated the size of the earth.
First of all, the ancient Greeks (and they weren’t the only ones) knew perfectly well that the earth was round; watching ships’ masts disappear over the horizon was one piece of evidence. Lunar eclipses (where the earth is directly between the sun and moon) are further evidence: you can see the earth’s curved shadow falling on the moon. Once you believe that the earth is curved, a sphere is the simplest possibility. They also knew that the sun was very, very far away from the earth (our modern measurement is about 93 million miles).

So Eratosthenes started by assuming the earth was a sphere. He also assumed that, because the sun was so far away, the sun’s rays striking two places on earth that weren’t very far away from each other were nearly parallel. He also knew that the sun shone directly into a particular well at Syene, Egypt (modern-day Aswan) at the summer solstice. He knew that Alexandria was 5000 stadia (a unit of length; the singular is stadium) due north of Syene, and that a staff in Alexandria cast a short shadow when the sun was at its zenith. [4] Eratosthenes put all this information together in the following diagram.

Footnote 3: One wonders what elements of modern mathematics notation will become obsolete in a millennium or two.

Footnote 4: That is, when the sun was as high as it would possibly get that day.

[Figure: Measuring the circumference of the earth. Labels: the sun’s rays; B: base of staff; C: center of earth; D: shadowless place; S: tip of staff; T: tip of shadow; angle θ at S; angle φ at C.]

The diagram, of course, is not at all to scale: the staff is much smaller in real life than in the diagram. Be that as it may, Eratosthenes made the following observations about the diagram.

First, the fraction of the earth’s circumference swept out by the arc BD is precisely φ/2π (if we measure φ in radians). We know that the length of the arc is 5000 stadia, so this reduces the problem to figuring out the value of φ.
Second, while we obviously can’t determine the angle φ directly (without a very powerful drill!), it must be almost equal to θ, because the two red lines (the sun’s rays) are very close to parallel. (This is a direct application of the alternate interior angle theorem: if two parallel lines are cut by a third line, then alternate interior angles are equal. This fact is in Euclid’s Elements and was certainly known to Eratosthenes.)

Third, we can determine θ from knowing SB (the height of the staff) and BT (the length of its shadow), which are easy to measure directly. Since θ is the angle at the tip S of the staff, in modern trigonometric notation,

$$\tan\theta = \frac{BT}{SB} \quad\text{or}\quad \theta = \arctan\frac{BT}{SB}.$$

Eratosthenes would not have expressed this equation using the term “arctangent”, but he was able to determine angles from their tangents, at least approximately. He found that θ was 1/50 of a complete angle (i.e., 2π/50 or π/25), which meant that the earth’s circumference was 50 · 5000 = 250,000 stadia.

We don’t know how long Eratosthenes’ stadium was. If he was using the Egyptian stadium of 515.7 ft [5], his estimate would have been remarkably accurate: 24419 miles, compared to our modern value of about 24900 miles. His stadium may have been a bit longer or shorter than this (and there are additional complications: how do you know the well points toward the center of the earth, for example?), but whatever its exact value is, it doesn’t diminish the insightfulness of his method.

Footnote 5: Close to 3 khet, if you’re wondering.

About 700 years later the great Indian mathematician Aryabhata I (476–550 CE) gave an even more accurate estimate of the earth’s circumference: 24,835 miles in our modern units. This remained the best measurement for over a thousand years.

4. The volume of the cone

It’s not known who first figured out that the volume of a right circular cone is 1/3 the volume of the cylinder in which it is inscribed, but this fact was known to Archimedes (and probably known long before). Its discovery might have come about by pouring liquid from a cone to a cylinder.
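The 1/3 ratio can also be made plausible numerically. The Python sketch below approximates a cone by a stack of thin discs and compares the total with the volume of the enclosing cylinder. This is a modern stand-in for the pouring experiment, not a historical method; the function name, the sample dimensions, and the number of discs are our own choices.

```python
import math

def cone_volume_by_discs(r: float, h: float, n: int = 100_000) -> float:
    """Approximate the volume of a right circular cone (base radius r,
    height h) by stacking n thin discs, measured from the apex."""
    dy = h / n                      # thickness of each disc
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * dy          # distance from the apex to the middle of disc k
        x = r * y / h               # disc radius at that distance (similar triangles)
        total += math.pi * x * x * dy
    return total

r, h = 2.0, 5.0
cylinder_volume = math.pi * r * r * h
ratio = cone_volume_by_discs(r, h) / cylinder_volume
print(ratio)  # should be very close to 1/3
```

Changing r and h leaves the ratio unchanged, which is the content of the statement that the cone is 1/3 of its cylinder.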
Or it could have come about by proportions, in the following manner: Suppose we know that the volume of a pyramid with a square base is 1/3 the volume of the parallelepiped in which it’s inscribed. Since there’s a constant ratio between the area of a circle and the area of the square in which it’s inscribed ($\frac{\pi r^2}{4r^2} = \frac{\pi}{4}$), and since volume is really an infinite sum of stacked up areas (we think of this as integral calculus, but it was known to the ancients), there is a constant ratio of π/4 between the volume of a right circular cone and the volume of the square-based pyramid in which it is inscribed. The same ratio holds between the cylinder and the parallelepiped. So it suffices to establish that the ratio between the volume of a square pyramid and the parallelepiped in which it’s inscribed is 1/3. For then

$$\frac{V(\text{cone})}{V(\text{cylinder})} = \frac{\frac{\pi}{4}\cdot V(\text{pyramid})}{\frac{\pi}{4}\cdot V(\text{parallelepiped})} = \frac{\pi/4}{\pi/4}\cdot\frac{1}{3} = \frac{1}{3}.$$

So the next step is to sketch how to establish, geometrically, the ratio between the volume of a square pyramid and the parallelepiped in which it’s inscribed. (The technique we’ll use is associated with Liu Hui.)

First, we define a yangma. It’s what you get when you take a parallelepiped with a square base, pick one of the corners of the top face, and connect that point to the vertices of the bottom square:

[Figure: a yangma inside a parallelepiped]

I.e., a yangma is a slanted pyramid, so it has the same volume as an upright pyramid with the same base and height. The fact that the yangma and the pyramid have the same volume is based on a principle known as Cavalieri’s principle (17th century). [6] You can fit three yangmas together to get the parallelepiped. (For an illustration go to http://nrich.maths.org/public/viewer.php?obj id=1408.) So the volume of the yangma is 1/3 the volume of the parallelepiped. Hence the volume of the square pyramid is 1/3 the volume of the parallelepiped.

Footnote 6: Cavalieri’s principle says that you can calculate a volume by adding up the cross-sections.
This principle was well known long before Cavalieri, e.g., it was used by both Archimedes and Liu Hui, and we will meet it again in the next section.

Now let’s prove by a different method that the volume of a right circular cone is 1/3 the volume of the cylinder in which it’s inscribed. This is a method that Archimedes could have used (though we present it in modern notation). The diagram below is a cross section through the center of a right circular cone inscribed in a cylinder:

[Figure: cross section of a cone inscribed in a cylinder, with labels x, y, r, h]

r is the radius, h is the height. At an intermediate stage (the red triangle) x is the radius and y is the height. By similar triangles, $\frac{r}{h} = \frac{x}{y}$, so $x = \frac{ry}{h}$. The ancients knew that you calculate a volume by stacking up infinitely many small areas so, in modern notation, the volume of the big cone is obtained by stacking up all the $\pi x^2$’s. Since each $\pi x^2 = \pi\frac{r^2}{h^2}y^2$, this means the volume of the cone is, in modern notation,

$$\pi\frac{r^2}{h^2}\int_0^h y^2\,dy = \pi\frac{r^2}{h^2}\cdot\frac{1}{3}h^3 = \frac{1}{3}\pi r^2 h.$$

Now Archimedes didn’t have calculus, but what he did know was how to find the area of a parabolic segment. [7] And this was enough to find the value of the integral $\int_0^h y^2\,dy$.

[Figure: a parabola with the point (h, h^2) marked, and the wedge under it from 0 to h]

$\int_0^h y^2\,dy$ is the area of the wedge on the right. [8] We know that the parabolic segment cut off by the top horizontal line has area $\frac{4}{3}\cdot\frac{1}{2}\cdot 2h\cdot h^2 = \frac{4}{3}h^3$. We know that the area of the rectangle in which the parabolic segment is inscribed is $2h\cdot h^2 = 2h^3$. We are interested in 1/2 the difference, i.e., $\frac{1}{2}\left(2h^3 - \frac{4}{3}h^3\right) = \frac{1}{3}h^3$. So the volume of the cone is, once again, $\pi\frac{r^2}{h^2}\cdot\frac{1}{3}h^3 = \frac{1}{3}\pi r^2 h$. Since the volume of the cylinder is $\pi r^2 h$, we’re done.

Archimedes went on to use the volume of a cone to establish the volume of a sphere, and Liu Hui used the volume of a pyramid to establish the volume of a sphere, roughly 500 years apart. Archimedes’ version is the next thing we’ll do.

Footnote 7: We’ll do this in chapter 2.

5. Archimedes’ derivation of the volume of the sphere [9]
This derivation uses a little bit of physics. We conflate volume with mass (which you can do if the density is uniform) and then we balance things on a lever. We start off with two facts: (1) the volume of a right circular cone is 1/3 the volume of the cylinder in which it’s inscribed, and (2) the law of the lever: [10] two objects are balanced to the left and right of the fulcrum of a lever iff Aa = Bb, where A is the mass of the object a units to the left of the fulcrum (or balance point) and B is the mass of the object b units to the right of the fulcrum.

[Figure: a lever with a sphere of weight A at distance a to the left of the fulcrum and a sphere of weight B at distance b to the right]

The sphere with weight A weighs more than the sphere with weight B, so it is closer to the fulcrum.

We also notice that if a slice of a solid is very thin (a lamina), then its area essentially is (modulo a constant factor) its mass. We will assume that the material we’re using has a constant density of 1, that is, the area of a lamina is its mass.

We are going to hang a sphere and a cone on one string to the left of a fulcrum, and place a cylinder on its side to the right so that one end goes through the fulcrum. The radius of the sphere is r, the radius and height of the cone are 2r, the radius and height of the cylinder are 2r, and the distance of the string from the fulcrum is 2r.

[Figure: the sphere (radius r) and cone (radius and height 2r) hanging at distance 2r to the left of the fulcrum; the cylinder (radius and height 2r) lying on its side to the right]

The claim is that this configuration is balanced. Suppose we know that. Then Aa = Bb where A is the combined volume of the sphere and the cone, a = 2r, B is the volume of the cylinder, and b = r (because the center of mass is halfway along the cylinder). The volume of the cylinder is $\pi(4r^2)(2r) = 8\pi r^3$. The volume of the cone is $\frac{8}{3}\pi r^3$.

Footnote 8: Note that we flipped our x and y variables; that doesn’t matter.

Footnote 9: Liu Hui has a different approach, involving inscribing the sphere in a cube; see http://donwagner.dk/SPHERE/SPHERE.html. He ends up using a pyramid, rather than a cone.
Footnote 10: which Archimedes discovered.

So, letting V = the volume of the sphere, by the law of the lever, $(V + \frac{8}{3}\pi r^3)(2r) = 8\pi r^3\cdot r$, i.e., $V = 4\pi r^3 - \frac{8}{3}\pi r^3 = \frac{4}{3}\pi r^3$. So we are done if we can prove that this configuration is balanced.

We look at balancing the laminae we get from cross sections. The cross sections we consider are: (a) a horizontal cross section of the sphere a distance x from the top of the sphere, the circle $S_x$; (b) a horizontal cross section of the cone the same distance x from the top of the cone, the circle $C_x$; (c) a vertical cross section of the cylinder a distance x to the right of the fulcrum, the circle $D_x$.

[Figure: the cross sections $D_x$, $S_x$, and $C_x$]

In this diagram, x is the length of each red segment. To find the area of each lamina, we have to find the length of each associated black segment, which is the radius of each circular lamina. We will do this shortly. First we explain why this will be sufficient.

Note that the distance of the weight of $S_x$ from the fulcrum is 2r. Similarly, the distance of the weight of $C_x$ from the fulcrum is 2r. And the distance of the weight of $D_x$ from the fulcrum is x. Let A(E) denote the area of a figure E. Suppose that, for each x,

(†) $2r\,(A(S_x) + A(C_x)) = x\,A(D_x)$.

Then, by the law of the lever, each set of laminae balances. Since, by Cavalieri’s principle, $2r\,(V(\text{sphere}) + V(\text{cone})) = 2r\int_0^{2r}(A(S_x) + A(C_x))\,dx$ and $r\,V(\text{cylinder}) = \int_0^{2r} x\,A(D_x)\,dx$, (†) finishes the proof.

First we find the radius of $S_x$:

[Figure: right triangle inside the sphere with legs x − r and y and hypotenuse r]

The radius of $S_x$ is $y = \sqrt{r^2 - (x-r)^2} = \sqrt{2rx - x^2}$, so $A(S_x) = \pi(2rx - x^2)$. [11]

Footnote 11: The alert reader will notice we are not considering the case x < r. This is left to the reader.

Now we turn to $C_x$:

[Figure: cross section of the cone with legs x and y]

By similar triangles, since the height of the cone equals its radius, y = x. So $A(C_x) = \pi x^2$. Finally, $A(D_x) = 4\pi r^2$. We are ready to prove (†):

$$2r\,(A(S_x) + A(C_x)) = 2r\,(\pi(2rx - x^2) + \pi x^2) = 4\pi r^2 x$$

and

$$x\,A(D_x) = x\cdot 4\pi r^2 = 4\pi r^2 x.$$

So (†) holds and we are done.
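The balance condition (†) and the resulting volume can be checked numerically. The Python sketch below uses the three cross-section areas derived above, tests (†) at a range of values of x, and then recovers the sphere's volume from the cylinder's moment about the fulcrum. The function names, the sample radius, and the midpoint-sum integration are our own illustrative choices, not part of Archimedes' argument.

```python
import math

def sphere_slice_area(r: float, x: float) -> float:
    """A(S_x): slice of the sphere at distance x from its top."""
    return math.pi * (2 * r * x - x * x)

def cone_slice_area(x: float) -> float:
    """A(C_x): slice of the cone at distance x from its apex."""
    return math.pi * x * x

def cylinder_slice_area(r: float) -> float:
    """A(D_x): every slice of the cylinder is a circle of radius 2r."""
    return 4 * math.pi * r * r

def dagger_holds(r: float, x: float) -> bool:
    """Check 2r(A(S_x) + A(C_x)) = x A(D_x) up to rounding error."""
    left = 2 * r * (sphere_slice_area(r, x) + cone_slice_area(x))
    right = x * cylinder_slice_area(r)
    return math.isclose(left, right, rel_tol=1e-9, abs_tol=1e-12)

def sphere_volume_via_lever(r: float, n: int = 10_000) -> float:
    """Sum x * A(D_x) over [0, 2r] (midpoint rule), divide by the lever
    arm 2r, and subtract the cone's volume (8/3) pi r^3."""
    dx = 2 * r / n
    moment = sum((k + 0.5) * dx * cylinder_slice_area(r) * dx for k in range(n))
    cone_volume = math.pi * (2 * r) ** 2 * (2 * r) / 3
    return moment / (2 * r) - cone_volume

r = 1.5
print(all(dagger_holds(r, 2 * r * k / 20) for k in range(21)))
print(sphere_volume_via_lever(r), 4 / 3 * math.pi * r ** 3)
```

The two numbers on the last line should agree to many decimal places, matching the formula V = (4/3)πr³ obtained from the law of the lever.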
2 Thales and Pythagoras

The great achievement of the Greek mathematicians was developing the idea of proof. As opposed to their Babylonian and Egyptian predecessors, who were mostly concerned with how to solve practical problems, the Greeks were interested in why mathematics worked.

The mathematicians in ancient Babylon and Egypt were priests and government officials [12] and focused on practical administrative problems: measurement of land, division of goods, tax assessment, architecture, etc. The intended audience of mathematical texts was probably other administrators, and so there was no perceived need to explain why the rules worked, just how to use the rules. The excerpt from the Rhind Papyrus is an excellent example of this.

By contrast, many Greek mathematicians were independently wealthy and had spare time on their hands to concern themselves with knowledge for its own sake. Furthermore, the Greeks discovered that this abstract knowledge could often be put to practical use: e.g., measuring the height of the Great Pyramid, or estimating the circumference of the earth.

Thales (624–547 BCE) is often considered the first Greek mathematician in this tradition. Here are some of the theorems credited to him:

• The base angles of an isosceles triangle are equal. (I.e., if ∆ABC is a triangle and AB = AC, then m∠ABC = m∠ACB.)

• Any angle inscribed in a semicircle is a right angle (you observed this in Math 409 in problem SA4).

• A circle is bisected by any diameter.

These are simple geometric results from a modern standpoint, but the important difference between them and earlier geometry was that Thales stated them as abstract observations about lines, circles, angles and triangles, rather than about counting oxen or measuring fields.

There is a legend that Thales impressed the Egyptians by determining the height of the Great Pyramid. (The square base could be measured directly, but not the height.)
He placed his staff on the ground and measured the lengths of the shadows cast by the pyramid and by his staff.

[Figure, adapted from Burton, p. 86: the pyramid and the staff with their shadows, labeled 378, 342, 6, and 9]

(Presumably these units are in khet; see chapter 1.) In the figure, the number 378 is half the length of a side of the square base; this could be measured directly. So could 342 (the length of the shadow) and 6 and 9 (the height of the staff and the length of its shadow).

Footnote 12: The source for this and much other material in these notes is chapters 2–3 of D. Burton, Burton’s History of Mathematics: An Introduction, 3rd edn., Wm. C. Brown Publishers, 1991.

Now, said Thales, we have two similar triangles, so the height h of the pyramid is given by the equation

$$\frac{h}{378 + 342} = \frac{6}{9},$$

which can be solved to give h = 480 khet.

Thales was able to realize that an abstract theorem about similar triangles could be applied to easily solve this practical problem. This is one of the first examples of modeling: solving a real-life problem by replacing it by a mathematical problem in order to be able to apply general theorems. Abstract mathematical knowledge can have concrete benefits!

Pythagoras (569–475 BCE) was a mystic whose religious beliefs included a strong mathematical component: “All is number” (where by “number” he and his followers, the Pythagoreans, meant “integer”). This meant that they believed any two quantities should be able to be compared by a ratio n : m where n, m are integers.
For an example of how useful ratios can be, Pythagoras (or his followers) noticed that if you take two strings made of the same material, one twice the length of the other, and pluck them, then the longer string will produce a sound an octave lower. [13] They noticed that other good-sounding musical intervals arose from pairs of strings whose lengths were in ratios of small integers.

Here’s a question that the Pythagoreans might have asked: what pitch is half an octave? That is, we have two strings of lengths L and 2L, and we want to determine the length M of a string that will produce a sound halfway in between. Since relative pitch is controlled by ratios, the lengths M and L must satisfy the equation

$$\frac{M}{L} = \frac{2L}{M}.$$

Clearing denominators gives $M^2 = 2L^2$, so $M = \sqrt{2}\,L$. If you are a Pythagorean, this raises a natural question: what is √2? The Babylonians had found an excellent approximation to √2, [14] but the Pythagoreans were not interested in approximations; they wanted to understand the exact value. And, because they believed quantities should be comparable by means of ratios, they wanted this exact value to be what we call a rational number. To their consternation, they discovered that √2 cannot be expressed that way. It is what we call irrational. Here is their famous proof.

Footnote 13: We know today that musical pitch depends on frequency, and that doubling the length will halve the frequency.

Footnote 14: Accurate to five decimal places: see, e.g., this entry from mathematician John Baez’s blog.

Theorem 1. √2 is an irrational number. That is, it is impossible to express it as a fraction P/Q, where P and Q are integers.

Proof. Suppose that √2 is rational. That is, suppose that there exist positive [15] integers P, Q such that

$$\sqrt{2} = P/Q. \qquad (1)$$

If we square both sides of equation (1) and clear denominators, we get

$$2Q^2 = P^2. \qquad (2)$$

One thing to notice from this equation is that P > Q. Another observation is that P has to be even. If it were odd, then $P^2$ would be odd as well, and equation (2) would say that an even number (namely $2Q^2$) equals an odd number, and that’s impossible. Since P is even, we can write it as P = 2R, where R is another positive integer. If we substitute P = 2R into equation (2), we get $2Q^2 = P^2 = (2R)^2 = 4R^2$, which implies that

$$Q^2 = 2R^2. \qquad (3)$$

Footnote 15: No one is disputing that √2 > 0, so P and Q have to be the same sign, and we might as well make them both positive instead of both negative.

Looking at equation (3), we now see in turn that Q > R and that Q has to be even; otherwise we would again have an even number equalling an odd number. So we may write Q = 2S, where S is another positive integer. Substituting Q = 2S into equation (3), we obtain $2R^2 = Q^2 = (2S)^2 = 4S^2$, that is,

$$R^2 = 2S^2. \qquad (4)$$

This looks just like equation (2); we see that R > S and that R is even. Have we gotten anywhere, or are we just chasing ourselves around in circles?

In fact, this process has to end. Otherwise, we will end up with an infinite sequence of positive integers P, Q, R, S, T, ..., Z, α, β, ..., ω, ℵ, ... that keeps getting smaller and smaller (remember, P > Q, Q > R, and so on). But this can’t happen! This means that eventually, it is impossible to continue the process. The only way to resolve this is to realize that our original assumption (namely, that we could find positive integers P and Q such that √2 = P/Q) had to be false. Therefore, √2 is irrational.

In case you don’t like going down, here’s a proof where you go up:

Proof. Putting together (2) and (3) says that $P^2 = 2Q^2 = 2(2R^2) = 4R^2$, so P = 2R. We saw from (4) that R is even, so P is divisible by 4, and in particular P ≥ 4. Carrying the process two steps further shows that R = 2T, where T is a positive integer that is itself even. So R is divisible by 4, which means that P = 2R is divisible by 8, and so P ≥ 8. In the next step of the proof, we’ll be able to deduce that P is at least as big as 16, and then 32, and then 64... P turns out to be at least as big as every power of 2.
But every number is smaller than some power of 2. So P doesn’t exist.

There is a widespread legend, almost certainly false, that the Pythagoreans were so upset over the existence of irrational numbers that they killed the discoverer of the theorem or the person who revealed it to the rest of the world. (Pythagoras left no writings himself, but he did acquire a mythical status after his death, with the result that we have to rely on secondary sources, many of whom basically made up stories about him.)

This was one of the first proofs by contradiction. In a proof by contradiction, instead of deriving conclusions directly from your assumptions, you start by assuming that what you are trying to prove is false, and then show that this necessarily leads to something false: for example, that 2 + 2 = 6, or that there exists an infinitely long decreasing sequence of positive integers. Proof by contradiction is a vital tool in mathematics.

3 More and mostly Archimedes

Archimedes was perhaps the greatest mathematician of the ancient Greek world, and perhaps also its greatest astronomer, physicist, and engineer.

Archimedes was from a noble family, and was related to King Hieron. He was born in 287 BCE and died in 212 BCE. According to legend, he was killed by a Roman soldier while working on a geometry problem he had drawn in the sand; in one version of the story, he enrages the soldier by complaining of the soldier’s shadow on his diagram. If any version of this story is true, the Roman soldier would have gotten into big trouble: Archimedes’ expertise would have been invaluable to the Roman army. Capturing him alive would have been a high priority.

Much of what we think of as Greek did not happen in Greece. Archimedes lived in Syracuse on the island of Sicily, one of the most wealthy and learned cities of the ancient Greek civilization, visited Egypt as a young man, and studied at Euclid’s academy in Alexandria, Egypt.
But Syracuse was lost to Greek influence when the Romans won the Second Punic War, the one during which he died, and now Sicily is one of the poorest parts of Italy, best known for being the home of the Mafia.

It is often said that ancient Greek mathematics and science were uninterested in practical matters, and that the nobility considered such things beneath them. Archimedes is a strong counterexample to these stereotypes.

Here is a brief list of some of the things Archimedes explored in mathematics:

• area
• center of gravity
• precursors to integral calculus & infinite series
• identified π (but didn’t call it such) as the ratio of the circumference of a circle to its diameter; showed that this was the same ratio as the area of a circle to its radius squared
• approximated π: between 3 + 10/71 and 3 + 10/70
• first to be interested in curves traced by moving points, e.g., the Archimedean spiral: a point moves with constant speed along a ray that turns with constant angular velocity; r = a + bθ
• area of parabolic segment (quadrature of parabola) = 4/3 area of triangle with same base and height
• Inscribe a sphere of radius r in a cylinder with the same radius and height h = 2r. Deduced from this that the ratio of the volume of the sphere to the volume of the cylinder = ratio of the surface area of the sphere to the surface area of the cylinder = 2/3.

And here is a brief list of some of the things Archimedes explored in physics (note the connection with the mathematical topics above; he was perhaps the first mathematical physicist):

• parabolic mirror focuses light to a point
• statics (= analysis of loads in static equilibrium)
• properties of the lever (used to analyze area and center of gravity)
• hydrostatics: law of buoyancy, law of equilibrium of fluids
• noticed gravity [16]
• center of gravity

Finally, here is a very brief list of some of the things Archimedes worked on as an engineer (note the connection to physics):

• ship designer
• improved catapult
• mirror as weapon? [17]
• Archimedean screw (still used)

Footnote 16: Does this seem trivial? Noticing something as universal as gravity takes a special kind of imagination.

Footnote 17: The reason for the question mark is that it’s not clear whether this was ever used, or even whether he actually came up with it; it’s part of the Archimedean legend. For a terrific discussion of its plausibility both from a historical and scientific point of view, see http://skullsinthestars.com/2010/02/07/mythbusters-were-scooped-by130-years-archimedes-death-ray/

Now for some mathematics that Archimedes actually did.

I. The approximation of π.

Archimedes knew that the ratio [circumference : diameter] is constant for all circles. [18] Here is a diagram that shows one way he approximated π:

[Figure: How Archimedes estimated π. Labels: s = old side; s* = new side; x; 1 − x; s/2; radius 1; π ≈ (1/2)ns; when n = 6, s = 1.]

Footnote 18: although he didn’t call it π; William Jones was the first to do so, in 1706.

Here’s what’s behind the diagram:

Step 1: Inscribe a regular polygon (all sides the same, all angles the same) with n sides in a circle of radius 1. Call the side $s_n$. (In the diagram, this is called s.)

Step 2: Now bisect a side and extend a radius through that point to get a regular polygon with 2n sides. Call the new side $s_{2n}$. (In the diagram, this is called s*.)

Step 3: Calculate $s_{2n}$ in terms of $s_n$.

Step 4: Now go back to steps 2 and 3, to find $s_{4n}$ in terms of $s_{2n}$. And so on.

Oops, we need a place to start.

Step 0: Start with a regular hexagon. Why? Because its side, i.e., $s_6$, has length 1.

Let’s do step 3: By the Pythagorean theorem $x = \sqrt{1 - \frac{s^2}{4}} = \frac{1}{2}\sqrt{4 - s^2}$. Another application of the Pythagorean theorem gives $s_{2n} = \sqrt{(1-x)^2 + \frac{s^2}{4}}$. After algebraic simplification we get

$$s_{2n} = \sqrt{2 - \sqrt{4 - (s_n)^2}}.$$

Note that 2πr = C. And $C \approx n s_n$. And r = 1. So $\pi \approx \frac{1}{2} n s_n$. In fact, $\pi = \lim_{n\to\infty} \frac{1}{2} n s_n$.

For n = 6, $n s_n = 6$, so π ≈ 3. At the next stage, $s_{12} = \sqrt{2 - \sqrt{4 - 1}} \approx .517$ and π ≈ 3.106. At the next stage, $s_{24} = \sqrt{2 - \sqrt{4 - (s_{12})^2}} \approx .261$ and π ≈ 3.132. At the next stage, $s_{48} = \sqrt{2 - \sqrt{4 - (s_{24})^2}} \approx .131$ and π ≈ 3.139. At the next stage $s_{96} \approx .0654$ and π ≈ 3.141. In only 5 stages (and the first was trivial) we’ve calculated π to within 1/1000. That’s converging fairly fast.

The problem with this method, from Archimedes’ point of view, is that while he could calculate various steps in an infinite process, he couldn’t estimate the error at each step. In particular, this method gave him a lower bound for π, but without an upper bound he had no way to estimate the error. [19] In the homework we’ll look at another method that Archimedes used which gave him an upper bound for π as well as a lower one. Knowing that π was squeezed between the two bounds could give him an idea of how good his approximation at any stage was.

Footnote 19: If he could have calculated the exact error then, of course, he would have had an exact value for π.

II. The quadrature of the parabola.

First, a quick detour: How would Archimedes have thought of a parabola? He would not have thought of it as the set of all points (x, y) in the plane such that $y = ax^2 + b$ for some a, b ∈ R. He certainly would have thought of it as the curve you get by cutting a cone with a plane parallel to a side, and that’s what he used.

The “quadrature of the parabola” is the following problem: Take a piece of a parabola and find its area. Here’s how Archimedes did it.

[Figure: a parabolic segment with the points Q, R, P, S, Q′ marked on the parabola]

We’re interested in the area of the region bounded between the straight line segment QQ′ and the parabola that has point P on it. Let’s call this region the parabolic segment QPQ′. We could have chosen any point on the parabola between Q and Q′ but we chose P for a reason: it has the special property that the tangent to the parabola through P is parallel to QQ′. We’ll take the area of ∆QPQ′ as our first approximation to the area of the parabolic segment. Now let’s consider the parabolic segments PRQ and PSQ′.
Any parabolic segment has points that act like P: the tangent line through the point is parallel to the straight line defining the segment. We pick R and S so the tangent line through R is parallel to QP, and the tangent line through S is parallel to PQ′.

From the definition of the parabola by means of conic sections,[20] Archimedes deduced that the area of ∆QPQ′ is 8 times the area of ∆QRP and also 8 times the area of ∆PSQ′. We’ll write this as ∆QPQ′ = 8∆PRQ = 8∆PSQ′.[21] Thus, the two new triangles you added (∆QRP and ∆PSQ′) have, when added together, $\frac{1}{4}$ the area of the original triangle QPQ′.

So our second approximation of the area of the original segment is ∆QPQ′ + ∆QRP + ∆PSQ′ = $(1 + \frac{1}{4})$∆QPQ′.

Do this again: you’ll now be adding 4 new triangles, two for each of ∆QRP, ∆PSQ′. You’ve now added an area of $\frac{1}{4}$(∆QRP + ∆Q′SP) = $\frac{1}{4} \cdot \frac{1}{4}$∆QPQ′ = $(\frac{1}{4})^2$∆QPQ′.

And so on. You end up with the area of the parabolic segment = $[1 + \frac{1}{4} + (\frac{1}{4})^2 + (\frac{1}{4})^3 + \dots]$ · ∆QPQ′.

So all we need to know is: what is $1 + \sum_{n=1}^{\infty} (\frac{1}{4})^n$? You could use geometric series to calculate this, but it has a lovely calculation using only basic geometry.

[20] We’re leaving out a lot of detail here.
[21] I.e., we’re conflating our notation for a triangle and its area.

Take a square with area 1 and cut it into 4 congruent squares. Shade the upper left square dark yellow: that’s $\frac{1}{4}$. Leave the lower right square alone and color the other two squares light yellow. You have a yellow “L.” Now take the lower right square and cut it into 4 congruent squares. Shade its upper left square dark pink: that’s $(\frac{1}{4})^2$. Leave the new lower right square alone and shade the other two squares light pink. You have a pink “L” disjoint from the yellow “L.” Do it again (using blue). And again. And again. Every time you do this you’re working with the lower right square.
Each time, the square that you shade dark is $\frac{1}{3}$ of the “L” shape you’re not going to work with any more: the dark yellow square is $\frac{1}{3}$ the area of the yellow region, the dark pink square is $\frac{1}{3}$ the area of the pink region, the dark blue square (which somehow doesn’t show up as dark blue on my screen) is $\frac{1}{3}$ the area of the blue region...

The area of the original square (which is 1) is the sum of all these “L” shapes. And each dark square is $\frac{1}{3}$ the area of its “L” shape. So $\sum_{n=1}^{\infty} (\frac{1}{4})^n = \frac{1}{3}$. And the area of the parabolic segment QPQ′ is $\frac{4}{3}$ the area of ∆QPQ′.

III. A modern application

Let’s use Archimedes’ method to calculate $\int_0^h x^2\,dx$. Of course Archimedes wouldn’t have had that notation, but he might have asked himself what happens if you inscribe a parabolic segment in a rectangle and look at what’s left over:

[Figure: a parabolic segment inscribed in a rectangle, with the corner (h, h^2) marked and half of the leftover region shaded.]

The shaded portion is half of what’s left over, and its area is $\int_0^h x^2\,dx$. Let’s calculate it without calculus.

The triangle has base 2h and height $h^2$, so its area is $h^3$. Hence the parabolic segment has area $\frac{4}{3}h^3$. The rectangle has area $2h^3$. The shaded portion has area $\frac{1}{2}(2h^3 - \frac{4}{3}h^3) = h^3(1 - \frac{2}{3}) = \frac{1}{3}h^3$ which, as we know from first semester calculus, is $\int_0^h x^2\,dx$.

IV. A detour

Aristaeus, who was roughly contemporary with Archimedes, proved that a conic section is a parabola iff it has the following property: Start with a single point (called the focus) and a line (called the directrix). The parabola is the set of points whose distance from the focus is the same as their distance from the directrix.[22]

[Figure: a parabola with its focus, its directrix, and a point p on the curve.]

Let’s prove half of Aristaeus’ theorem using modern algebraic methods: we’ll prove that if a curve has this property then it’s a parabola.

First, it doesn’t matter where the parabola is in the plane or how it’s oriented, so let’s assume the focus is the point (0, 0) and that the directrix is the line y = c. (In the diagram above, c is negative, but this doesn’t matter for the algebraic argument.)
The distance between a point (x, y) and the focus is $\sqrt{x^2 + y^2}$. And the distance between a point and the directrix is $|y - c| = \sqrt{(y - c)^2}$.

The distance from (x, y) to the focus = the distance from (x, y) to the directrix iff $\sqrt{x^2 + y^2} = \sqrt{y^2 - 2cy + c^2}$ iff $x^2 + y^2 = y^2 - 2cy + c^2$ iff $x^2 = c^2 - 2cy$ iff $y = \frac{1}{2c}(c^2 - x^2)$. Which is the equation of a parabola.

The other direction (start with a parabola and find its focus and directrix) is a little harder.

Note that none of this is what Aristaeus did. He proved that the focus/directrix characterization was equivalent to the conic section definition. He didn’t have the algebraic tools we take for granted.

[22] This means that all parabolas are similar, see chapter 3.

4 Conics

Conics consist of circles, ellipses, parabolas, and hyperbolas. They are named “conics” because they arise from slicing cones in certain ways (see below). They are studied using geometric, algebraic, and analytical (e.g., calculus) points of view, and thus are a good case study in the inter-relationship among various fields of mathematics. Conics also provide a good case study on how mathematics moves through cultures, since their study began in ancient Greece, moved through Persia and Arabia into Renaissance Europe, and is fundamental in much of modern mathematics.

The first definition of conics was due to Menaechmus (380 - 320 BCE).[23] He defined conics via planes slicing through cones (hence the name), and he studied tangents, normals, and evolutes of conics.[24]

Hypatia (c. 370 - 415 CE) studied conics intensively, but her work is now lost. A charismatic figure, she was the intellectual leader of Alexandria, brutally murdered by a Christian mob. In addition to her work on conics, she edited Euclid’s Elements and worked on Diophantine equations.[25]

Over 600 years after Hypatia, the great Persian mathematician and poet Omar Khayyam (1048 - 1131) translated Apollonius’ work into Arabic, which is how Apollonius’ work was preserved.
Arabian mathematics made its way into Europe, and eventually Descartes (1596 - 1650) found the algebraic expressions for conics that most of us first met in high school.

Conics can most easily be described by slices through double cones: A plane slicing parallel to the inclination of the double cone gives a parabola. A plane slicing which is not parallel but still intersects only one of the cones gives an ellipse (of which a circle is a special case). A plane slicing which intersects both cones gives a hyperbola.

There are many other ways to construct conics. For example, if you attach a string to a piece of paper at each end, hold it taut with a pencil, and move your pencil around, you get an ellipse.

[Figure: a string attached at end A and end B, held taut by the point of a pencil.]

If you Google “ellipse construction” you’ll find a bewildering array of methods. One worth talking about is the trammel of Archimedes: http://en.wikipedia.org/wiki/Ellipse, which we’ll do on Sketchpad.

For simple constructions of parabolas and hyperbolas see: mathdemos.org/mathdemos/conic via locus/.

[23] His work was motivated by the problem of duplicating a cube, and he solved it via the intersection of two parabolas, i.e., by solving a cubic equation.
[24] An evolute of a curve is the locus, or path, of the centers of curvature as a point moves around the curve.
[25] Diophantine equations, named after Diophantus, are polynomial equations for which integer solutions are sought.

While geometric constructions differ from conic to conic, there is in fact a uniform definition of conics: via focus and directrix. You start with a fixed point F in the plane (the focus) and a line l in the plane (the directrix) where F is not on l. You also fix a positive number e called the eccentricity. Then you consider the curve C = {p : d(p, F) = e · d(p, l)}, where d denotes distance.

If 0 < e < 1, C is an ellipse. The circle is the special case e = 0 and l is infinitely far away from F.

If e = 1, C is a parabola.
Note that, by this definition, all parabolas are similar.[26]

If e > 1, C is half a hyperbola; to get the other half, reflect about l.

There is also a uniform definition using polar coordinates.[27] Consider the function $r = \frac{s}{1 + e\cos\theta}$, where e, s are constant, and e ≥ 0 (e is called the eccentricity).

If e < 1 the curve is an ellipse. The circle is the special case e = 0.

If e = 1 the curve is a parabola. This gives us another proof that all parabolas are similar.

If e > 1 the curve is a hyperbola.

Finally, we have the general Cartesian form for conics:[28] every conic is the set of solutions to some equation of the form $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$ where at least one of A, B, C is nonzero.

To determine which conic is which, we define the discriminant to be $B^2 - 4AC$.[29]

If $B^2 - 4AC < 0$, the curve is either an ellipse or a degenerate form, i.e., an equation with no real solutions, hence no graph on $R^2$. For example, $x^2 + y^2 + 1 = 0$ is a degenerate form.

If $B^2 - 4AC = 0$, the curve is a parabola.

If $B^2 - 4AC > 0$, the curve is a hyperbola.

In the late 18th century people started noticing that you could generalize the notion of conics in useful ways in other areas of mathematics. Here are three generalizations, mathematical details not included.

Second order partial differential equations (PDE’s)

Second order PDE’s (i.e., those which involve first and second partial derivatives) are associated with quadratic forms, i.e., symmetric polynomial equations in several variables where the degree of each term is 2.[30] The quadratic forms of interest here have the form $Ax^2 + Bxy + Cy^2 = 0$ (or similar symmetric polynomial equations with perhaps more variables). Quadratic forms allow us to classify differential equations. (For details on this see a differential equations text.)

To give examples of this classification, we need to mention the Laplacian operator $\nabla^2$. Instead of defining it I’ll give the three-variable example: $\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}$.
The Laplace equation $\nabla^2 u = 0$ is elliptic.

The heat equation $\frac{\partial u}{\partial t} - \alpha \nabla^2 u = 0$ is parabolic.

The wave equation $\frac{\partial^2 u}{\partial t^2} - c^2 \nabla^2 u = 0$ is hyperbolic.

[26] We used this in chapter 2.
[27] This definition is the first aspect in our discussion of conics which was not known in the ancient world. Polar coordinates appeared around the 17th century; I’m not sure when the conic equations first appeared.
[28] This needed the development of the Cartesian plane (Descartes wrote it up in 1637) and co-ordinate geometry; again, I’m not sure when the general Cartesian form was noticed or by whom.
[29] If this looks familiar from the quadratic formula: it isn’t. This B is the coefficient of xy, not x; this C is the coefficient of $y^2$, not the constant term.
[30] Quadratic forms are important in several areas of mathematics; their study traces back at least to the 7th century Indian mathematician Brahmagupta, who studied them from an algebraic point of view.

The terminology here comes from considering a matrix M related to the general form of a second order differential equation.[31]

If det M < 0, the equation is elliptic.

If det M = 0, the equation is parabolic.

If det M > 0, the equation is hyperbolic.

Gaussian curvature

In calculus you studied the local geometry (i.e., the geometric attributes that change from point to point) of a curve: tangent lines, curvature, and so on. Similarly, there is a notion of local geometry of a surface. One of the important local properties of a surface is called the Gaussian curvature. If it is positive, the geometry is called elliptic, defined as: no two distinct lines are parallel. If the Gaussian curvature is 0, the geometry is called Euclidean (i.e., all the standard two-dimensional axioms apply). If the Gaussian curvature is negative, the geometry is called hyperbolic: given a line l and a point p on the surface but not on l, there are infinitely many lines through p parallel to l.

The terminology here is easy to explain.
If you rotate an ellipse about an axis you get an elliptic surface (a sphere is the most familiar one). If you rotate a hyperbola about its semi-minor axis (perpendicular to the line connecting the two foci) you get a hyperbolic surface.

Discrete probability distributions

Discrete probability distributions describe behavior that is not continuous. For example, a coin flip can be either heads or tails; there is nothing in between. It turns out that these can be related to quadratic forms.

The binomial distribution (the probability of k successes in n trials) is elliptical.

The Poisson distribution (used for analyzing rare events) is parabolic.

The negative binomial distribution (e.g., the probability of getting 7 heads before the fourth tail in a sequence of coin tosses) is hyperbolic.

The terminology here is not so easy to explain.

[31] For details, check http://mathworld.wolfram.com/HyperbolicPartialDifferentialEquation.html.

5 Trigonometry

In the ancient world, trigonometry was largely motivated by astronomy. Other uses such as surveying, navigation and optics didn’t achieve prominence until the 13th century (Arab mathematicians used trigonometry for surveying and optics) and the 16th century (European mathematicians used trigonometry for surveying and navigation).

In the ancient world they wouldn’t have spoken about “the trigonometric functions” because they didn’t have the notion of function, and they thought of what we call trigonometry rather differently.

For example, here’s how they thought of the sine of an angle: Take a circle of diameter one. Inscribe the angle in the circle. Look at the base of the resulting triangle. Its length is the sine.

Let’s sketch a proof of why this gives the same value as our definition of sine.
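Before the proof, the claim can be sanity-checked numerically. The following is only a sketch under our own set-up: we place three points on a circle of diameter one by their polar angles, measure the inscribed angle from coordinates, and compare the chord with the modern sine. The helper names are hypothetical.

```python
import math

RADIUS = 0.5  # a circle of diameter one, centered at the origin

def point_on_circle(theta):
    """Point of the circle at polar angle theta."""
    return (RADIUS * math.cos(theta), RADIUS * math.sin(theta))

def angle_at(c, a, b):
    """Angle ACB at the vertex c, computed from coordinates only."""
    ux, uy = a[0] - c[0], a[1] - c[1]
    vx, vy = b[0] - c[0], b[1] - c[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    return abs(math.atan2(cross, dot))

def chord_and_inscribed_angle(theta_a, theta_b, theta_c):
    """Return (length of chord AB, inscribed angle ACB)."""
    a, b, c = (point_on_circle(t) for t in (theta_a, theta_b, theta_c))
    return math.dist(a, b), angle_at(c, a, b)

chord, alpha = chord_and_inscribed_angle(0.3, 1.7, 4.0)
print(chord, math.sin(alpha))  # the two numbers should agree
```

Moving C around the circle (the third argument) changes neither the chord nor the sine of the inscribed angle, which is the content of the inscribed-angle theorem used in the proof.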
[Figure: a circle with center O and points A, B, C on it; D lies on the chord AB; angles labeled α, α/2, α/2.]

In this diagram, the inscribed angle α = ∠ACB, and the ancients’ definition of sin α was AB.[32] By a theorem on inscribed angles in a circle (in the geometry notes, this is homework problem EP7),[33] $\angle BCA = \frac{1}{2}\angle BOA$, so ∠DOA also equals α. Since AD ⊥ OD,[34] our definition of sin is that $\sin\alpha = \frac{AD}{OA}$. But OA has length $\frac{1}{2}$, so $\sin\alpha = \frac{AD}{1/2} = \frac{(1/2)AB}{1/2} = AB$, which is how the ancients defined sin α. So the two definitions are the same.

Similarly, tangents were thought of in the context of calculating lengths of shadows. (Yes, this is implicitly a ratio...) It wasn’t until the 16th century that Copernicus’ student Rheticus explicitly defined the trigonometric functions as ratios.

Why chords on circles? The general model for the universe in the ancient world was of the stars and planets moving around the earth. The general notion was that the movement of a particular heavenly body was restricted to a particular sphere around the earth. The line segments connecting two points of such a body’s movement through space were chords on a great circle. So knowing the length of such chords was crucial to understanding planetary and stellar motion. Later this notion developed with smaller spheres of motion moving along spherical paths, and then even smaller spheres... the whole thing became unwieldy and collapsed when Copernicus proposed instead that the earth and planets moved around the sun. Copernicus still proposed circular orbits; Kepler was the one who proposed elliptical orbits. But that’s outside our story.

[32] We’re conflating the length of a line segment with the name of the line segment.
[33] If you aren’t taking Math 409 this semester, EP7 says that if O is the center of a circle, and A, B, C are points on the circle, then $\angle ABC = \frac{1}{2}\angle AOC$.
[34] Why?

The development of trigonometry was driven by the need for better calculations of (what we call) trigonometric functions, largely because these were needed for other calculations.
Many of the theorems about them (the formulas for sin(a + b), sin(a − b), etc.) were driven by such calculation. The real theoretical breakthroughs (trigonometric functions as ratios, their association with the unit circle, the notions of trigonometric functions as functions, their periodicity, etc.) came much later.

While several aspects of trigonometry were studied by many Greek mathematicians, including Eudoxus (4th century BC), Euclid (c. 300 BC), and Archimedes (3rd century BC), it was Hipparchus (2nd century BC) who was in some sense the inventor of systematic trigonometry. In particular, Hipparchus published 12 books of trigonometric tables (all of which have been lost). Plane trigonometry developed simultaneously with spherical trigonometry (essentially the study of triangles on a sphere; AAA is a congruence criterion on the sphere, unlike in the plane)[35] and this was systematized by Menelaus of Alexandria (c. 100 CE). Ptolemy (c. 150 CE) systematized and greatly developed trigonometry (and much other mathematics) in his Mathematical Synthesis (which Arab mathematicians called the Almagest, i.e., “The Greatest,” and that is the name that caught on).

Here is some of the trigonometry published in the Almagest, as we would express it:

• sine tables in increments of 1/4°
• $\sin^2 x + \cos^2 x = 1$
• formulas for sin(a + b), sin(a − b), and for $\sin(\frac{x}{2})$
• law of sines: $\frac{A}{\sin a} = \frac{B}{\sin b} = \frac{C}{\sin c} = 2r$ where A, B, C are the lengths of sides of a triangle, a is the angle opposite side A etc., and r is the radius of the circle circumscribed around the triangle.

Euclid knew a form of the law of cosines ($C^2 = A^2 + B^2 - 2AB\cos c$ where A, B, C, a, b, c are as above).

None of these people had the modern notion of trigonometric function. Instead, they talked about chords of a circle (recall the definition of sine that this section starts with). Everything was done in the language of chords of a circle.

Mathematicians in India learned about Greek trigonometry and took it further.
In the 6th century CE they developed all six trigonometric functions, and thought of them as ratios, i.e., they had our modern notion of sin, cos, etc. In 1150 CE Bhaskara knew how to calculate the sine of any angle. By the 15th or 16th century they had power series for sine, cosine, and inverse tangent; this was two centuries before Euler developed such series in Europe. Some sources suggest that they used tangent lines of trigonometric functions to predict eclipses; I haven’t seen details of how this was done.

China imported Hindu astronomers (who were necessarily also mathematicians) who had a strong influence on Chinese mathematics. The Chinese mathematician I-Hsing (also a Buddhist monk) published tangent tables in 724 CE.

Both Hindu and Greek mathematics made it to the Arabian peninsula. Around 860 CE, tangent and cotangent ratios were developed by Arab mathematicians. In the late 9th and early 10th centuries, al-Battani developed better sine and tangent tables. The great 10th century mathematician Abu’l-Wafa was the first to consider the trigonometric functions as being related to the unit circle (today we think of this as the coordinates of points on the unit circle: (x, y) = (cos α, sin α), where α is the angle between the x-axis and the vector from the origin to the point). By the 13th century Arab mathematicians had broken free of the identification of trigonometry with astronomy. They knew of all six trigonometric functions, knew many identities, knew how to construct trigonometric tables by interpolation, and used trigonometry for surveying and optics.

[35] AAA is the statement that if two triangles have corresponding angles congruent, then they are congruent; SSS is the statement that if two triangles have corresponding sides congruent then they are congruent. In the plane, SSS holds and AAA does not.

Greek and Arab mathematics in turn influenced European mathematicians in the 15th century.
Regiomontanus wrote the first systematic treatment in Europe of both plane and spherical trigonometry. Rheticus defined the trigonometric functions as ratios. Work on trigonometric tables continued, and by 1700 European trigonometric tables were accurate to 15 decimal places, and this was without modern decimal notation. Such tables were crucial for surveying, navigation, and telling time (in the calendrical sense).

In the 16th century Viète united trigonometry with algebra, which was a crucial step, and after that its development exploded. That trigonometric functions could be calculated for any number and were periodic functions was a major realization, and in 1635 Roberval created the first graph of the sine function. By the late 17th and mid 18th centuries the Bernoulli brothers were considering trigonometric functions of complex numbers. In the 18th century Euler considered trigonometric functions in all of their aspects: as ratios, as periodic functions, and as infinite series. From the latter he developed the formula $e^{ix} = \cos x + i \sin x$, from which he concluded (since sin π = 0 and cos π = −1) that $e^{\pi i} = -1$.

Fourier (1768 - 1830) realized that every continuous function on a closed interval is equal to an infinite sum of trigonometric functions. In particular, if the interval is [−π, π] then the series has the form $\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$.

With this, trigonometry became firmly embedded in the rest of mathematics; the mathematics that comes out of the study of Fourier series is known as harmonic analysis.

Trigonometry theorems from geometry: Ptolemy’s theorem

Ptolemy’s theorem is another one of those theorems that need to be included in any history of mathematics course. But where to put it? Since it can be used to derive the law of cosines and the formula for sin(α + β), I’m putting it here.
Ptolemy’s theorem says that if a quadrilateral can be inscribed in a circle, then the product of the diagonals is the sum of the products of the opposite sides. I.e., in the following picture, AC · BD = AD · BC + AB · CD.[36]

[Figure: a quadrilateral ABCD inscribed in a circle.]

[36] Again we’re conflating names of sides with their lengths.

Proof. To prove Ptolemy’s theorem, we need to add a point E on the diagonal AC so ∠ABE = ∠DBC, as follows:

[Figure: the quadrilateral ABCD with both diagonals drawn, the point E on AC, and angles marked α, β, γ, δ.]

Note that if the center of the circle is on BD then E will be on BD. The proof in that case is much easier, so we leave it to the reader.

The angles marked α are the same by construction. The angles marked β are the same because they are inscribed in the circle with the same base, BC (e.g., this is a generalization of EP 7 in the geometry notes). Similarly, the angles marked δ are the same.

So, since plane triangles whose angles are congruent are similar, ∆ABE ∼ ∆DBC. Hence $\frac{AE}{AB} = \frac{DC}{DB}$. I.e., AB · DC = AE · DB.

By the same reason, ∆ABD ∼ ∆EBC, so $\frac{AD}{EC} = \frac{DB}{BC}$, i.e., AD · BC = DB · EC.

Putting this all together, AB · DC + AD · BC = AE · DB + EC · DB = (AE + EC) · DB = AC · DB as desired.

Ptolemy’s theorem immediately gives us the Pythagorean theorem when ABCD is a rectangle: c is the length of each diagonal, and a, b are the lengths of the sides, so the theorem reads c · c = a · a + b · b.

The law of cosines comes from considering Ptolemy’s theorem applied to a trapezoid, as follows:

[Figure: an isosceles trapezoid ABCD inscribed in a circle, with the original triangle drawn in red.]

Start with an arbitrary triangle (the red triangle) and inscribe it in a circle. Reflect about the perpendicular bisector of one side to get an isosceles trapezoid (i.e., two sides have the same length, in this case AD and BC).

If the diagonals have length c, the identical sides have length a, and the other sides have length b and d respectively (we’ll set AB = b and CD = d) then, by Ptolemy’s theorem, $a^2 + bd = c^2$.

The claim is that d = b − 2a cos ∠ABC. If that were true, then we’d have $a^2 + b^2 - 2ab \cos\angle ABC = c^2$ which is exactly the law of cosines.

So we need to prove that d = b − 2a cos ∠ABC.
First we extend the trapezoid into a rectangle.[37]

[Figure: the trapezoid ABCD extended to a rectangle with the additional corners E and F.]

[37] The alert reader will notice that the proof needs to be adapted when $\angle ABC > \frac{\pi}{2}$.

We denote ∠ABC = γ and note that, by construction, γ = ∠DAB, so by transversals of parallel lines, γ = ∠ADE. Also, $ED = \frac{1}{2}(b - d)$.

Since cos γ = cos ∠ADE we have $\cos\gamma = \frac{ED}{AD} = \frac{(1/2)(b - d)}{a}$. So $\frac{1}{2}(b - d) = a\cos\gamma$. I.e., b − d = 2a cos γ and d = b − 2a cos γ as desired.

To derive the formula for sin(α + β) (at least where $\alpha, \beta < \frac{\pi}{2}$) we consider the following picture, where BD is the diameter of a circle of radius $\frac{1}{2}$, and α, β are the desired angles:

[Figure: a quadrilateral ABCD inscribed in a circle with diameter BD, with the angles α and β marked at B.]

By EP 7 in the geometry notes, $\angle BAD = \angle BCD = \frac{\pi}{2}$. Since BD = 1, CD = sin α and AD = sin β; also, BC = cos α and AB = cos β. Meanwhile, by the definition of sin in the beginning of this chapter, AC = sin(α + β).

By Ptolemy’s theorem, AC · BD = AB · CD + BC · AD. But BD = 1, so AC = AB · CD + BC · AD. I.e., sin(α + β) = cos β sin α + cos α sin β, as desired.

A quick and dirty proof of the law of sines

Let’s inscribe a triangle in a circle (the red triangle) and consider one angle of the triangle (α). Drawing a diameter through another vertex of the triangle we can construct (using the theorem about angles inscribed in a circle with the same base) a right triangle with one angle equal to α:

[Figure: a red triangle inscribed in a circle, with a blue right triangle built on a diameter; the side a and the two angles α are marked.]

If a is the length of the side opposite α and r is the radius of the circle, from the blue triangle we have $\sin\alpha = \frac{a}{2r}$. I.e., $\frac{a}{\sin\alpha} = 2r$. But this is true for all angles of the triangle, so if α, β, γ are the angles and a, b, c are respectively the opposite sides, we have $2r = \frac{a}{\sin\alpha} = \frac{b}{\sin\beta} = \frac{c}{\sin\gamma}$, which implies the law of sines: $\frac{\sin\alpha}{a} = \frac{\sin\beta}{b} = \frac{\sin\gamma}{c}$.

An application of trigonometry to the path of a point moving under a constraint: the versed sine curve

Maria Agnesi was a mathematician in the 18th century who published an important book on curves.
One of the curves she studied was called the “versed sine curve” or averisera (in Latin; the word is derived from the Latin word for “turn”, vertere). This somehow became avversiera or “witch” so the curve is often known as the witch of Agnesi. It was defined as the locus of a moving point. A still picture of it looks like this:

[Figure: a circle tangent to two horizontal parallel lines, with the points q and p marked.]

The curve will trace the path of p, but before doing that let’s define p carefully. We start with a circle tangent to two parallel lines. For convenience, we assume the lines are horizontal. We take the point at the bottom of the circle (not labeled) and call it s. We draw a line from s through the circle (the straight red line) and call it l; q is where l meets the circle. We extend l until it meets the top horizontal line at point r (also not labeled) and draw the vertical line m through r. The point p is the point on m whose height is the height of q.

That’s the set-up. Then we start moving q around the circle, i.e., we move l. As q moves, we sketch the movement of p; that’s the red curve below:

[Figure: the same set-up with the path of p drawn as a red curve.]

We need to refer to this diagram, so let’s repeat it on the same page as our calculations:

[Figure: the diagram repeated, with q and p marked.]

Let x be the x-coordinate of p and y the y-coordinate. We want to find x, y as a set of parametric equations: x is the larger dotted green line; y is the thick part of the diameter of the circle.

We are given that the diameter of the circle is 2a. We let t be the angle between the red and blue solid lines; t is our parameter.

x is easy: $\frac{x}{2a} = \tan t$, so x = 2a tan t.

Finding y is a bit trickier. First let z be the short thick green line inside the circle. By the definition of sin, $z = \frac{1}{2} \cdot 2a \sin 2t$. (The 2a is because that’s the diameter of the circle, not 1.) And sin 2t = 2 sin t cos t. So z = 2a sin t cos t. By similar triangles, $\frac{y}{z} = \frac{2a}{x}$. So

$y = \frac{2az}{x} = \frac{2a \cdot 2a \sin t \cos t}{2a \sin t / \cos t} = 2a\cos^2 t$

If you eliminate t in the parametric equations, you get $y = \frac{8a^3}{x^2 + 4a^2}$. You can check this by substituting the parametric functions for x and y, and using the fact that $1 + \tan^2 t = \sec^2 t = \frac{1}{\cos^2 t}$.

6 Curves before functions

Mathematicians were studying functions for thousands of years before the notion of “function” was defined. We have already seen conics, what we would call the trigonometric functions, formulas for finding area and volume, and curves such as the versed sine curve which are defined by constrained paths, all of which were studied long before our modern notion of function, and all of which are either functions or closely related to functions.[38]

Because algebraic notation did not exist until a few hundred years ago, curves were described geometrically, usually as a constrained path (called a locus).[39] For example, a circle (not a function) is described as “the locus of all points equidistant from a given point” (called the center). The idea is that you have a center and a single point p at the desired distance from the center. As p travels around the center, keeping the same distance, it traces out a circle. The versed sine curve (which is a function) is another example of a curve described by a locus.

In this chapter we discuss a few more of these curves which arose before functions. Their geometric definitions are complicated, but it’s important to understand that they arose from considerations of very concrete problems. For example here are three major problems of ancient Greece:

• Duplicating a cube: given a side of a cube s, can you construct (using only straightedge and compass) a side s* of a cube whose volume is exactly twice that of the first cube?
• Trisecting an angle: given an angle α, can you construct (using only straightedge and compass) another angle whose measure is $\frac{\alpha}{3}$?
• Squaring a circle: given a circle, can you construct (using only straightedge and compass) a square with the same area?
It’s important to note that if you take away the phrase “using only straightedge and compass” all of these constructions can be done (and we’ll see a few examples). It’s the constraint that makes the following theorem true:

Theorem 2. Doubling a cube, trisecting an angle, and squaring a circle are impossible using only straightedge and compass.

Here’s a sketch of the proof.

Proof. For doubling a cube: Given a cube whose sides have length a, if $x^3 = 2a^3$, then $x = a\sqrt[3]{2}$. But (by methods similar to what’s in Stahl’s algebra text on the construction of regular polygons) you can’t construct the cube root of 2 by straightedge and compass.

For trisecting an angle: If you could trisect α you could construct $\sin\frac{\alpha}{3}$. So let $x = \frac{\alpha}{3}$. By the triple angle formula for sin, $4\sin^3 x - 3\sin x + \sin\alpha = 0$. I.e., you’d be solving a cubic. By straightedge and compass. Which you can’t.

For squaring a circle: Given the radius r you’d need to construct a line of length $\sqrt{\pi}$ (taking r = 1). But if you can construct $\sqrt{a}$ using straightedge and compass, then you can construct a. And transcendental numbers such as π can’t be constructed.

[38] It’s important to note that not every function describes a curve and not every curve can be described as a function.
[39] Although we have already seen conic curves described in other ways, e.g., by slicing cones.

The alert reader will notice that we’ve already used some functions: a simple cubic in the first proof, and trigonometric functions and cubics in the second. But the main purpose of this section is to show that much more complicated functions arise naturally.

Let’s look at trisecting an angle when you’re allowed more techniques than just straightedge and compass. If you Google “trisecting an angle” you’ll find many techniques for doing this. These generally involve complicated curves, i.e., functions.
We will try to look at these curves and their uses the way they were originally described, but will find ourselves naturally falling into algebraic terminology because that’s just how folks think these days.

We give two detailed examples of trisecting an angle: using the Archimedean spiral, and using the quadratrix of Hippias. These are both curves but, failing the vertical line test, neither of them is a function.

As a preliminary step let’s show that, using just straightedge and compass, you can trisect a straight line segment. Suppose you want to trisect AB. Draw another line m through A. On this other line, mark off some length Ap three times. Now draw parallel lines as in the diagram.

[Figure: the segment AB and a second line m through A, with the points p, p′, p″ marked off on m and parallel lines drawn from them to AB.]

By similar triangles, you’ve just trisected AB.

How Archimedes trisected an angle

Using this, let’s trisect an angle using the Archimedean spiral.

Given positive constants a, b, the Archimedean spiral determined by a, b is the locus of points p so that if r is the distance from p to the origin, then r − a equals b times the measure of the angle θ formed by the line Op (where O is the origin) and the x-axis.[40] (By using the variables r and θ we’re already anachronistic. But it’s really hard to make sense of the Archimedean spiral without this algebraic notation.)

In modern terms, the Archimedean spiral has the following equation in polar coordinates: r = a + bθ, where a, b are constants. That’s a lot easier to make sense of. Try sketching it without a graphing calculator when a = 0 and b = 1; when a = 0 and b = 2 [hint: let $\theta = 2\pi, \pi, \frac{\pi}{2}, \frac{\pi}{3}, \dots$].

Now suppose you have an Archimedean spiral with a = 0 and you want to trisect an angle α. Place the angle α so the origin O is the vertex, and the x-axis is one side of the angle. Let p be the point where the spiral intersects the other side of the angle, and let R be the length of Op. Trisect Op to get length $\frac{R}{3}$. Construct a circle centered at O with radius $\frac{R}{3}$. This circle intersects the Archimedean spiral at a point s.
Os is the side of the desired angle $\frac{\alpha}{3}$.

[40] Note that angular measurement and linear measurement are equated here. This might not seem so natural to us, but it would have seemed natural to the ancient Greeks, since they thought of an angle measure as a measurement of the length of a segment of a circle.

[Figure: the angle α with vertex O, the spiral, and the points p, q, s.]

In this picture, the dark lines form the original angle α and the blue curve is the spiral $r = \theta$.

How do we know that we’ve trisected α? On this spiral the distance from O equals the angle, so $Op = \alpha$ and $Os = Oq = \frac{1}{3}Op = \frac{1}{3}\alpha$. The point of the spiral at distance $\frac{\alpha}{3}$ from O is therefore the one at angle $\frac{\alpha}{3}$.

How Hippias trisected an angle

Next, let’s trisect an angle using the quadratrix of Hippias.

What is the quadratrix of Hippias? Here’s how Hippias thought of it: Pick a point O, draw a horizontal line through the point, and let α be an angle (in radian measure) whose vertex is O with one side coinciding with the horizontal line; call the other side m. Now let p be the point on the vertical line through O whose distance from O is $\frac{2\alpha}{\pi}$.[41] Draw a horizontal line l through p. The point q where l meets m is a point on the quadratrix. As α varies, the points q trace out the quadratrix.

[41] See previous footnote.

[Figure: plot of the quadratrix.]

To see how this curve is constructed, ask yourself what happens when $\alpha = \frac{\pi}{2}, \frac{\pi}{3}, \frac{\pi}{4}, \ldots$ and so on.

In modern notation, the equation for the quadratrix is $x = y\cot\left(\frac{\pi y}{2}\right)$. Or, in polar coordinates, $r = \frac{2\alpha}{\pi\sin\alpha}$.

Now let’s return to trisecting an angle. Because of the way p is defined from α in the quadratrix, if you trisect Oq to get the point r (so $Or = \frac{Oq}{3}$), and then draw a horizontal line from r meeting the quadratrix at point s, Os forms the desired angle. [Proof not given.]

[Figure: the quadratrix with the points q, r, s.]

Other curves invented by the ancient Greeks include the conchoid of Nicomedes (about 200 BCE), which satisfies the polar equation $r = a + b\sec\theta$ (this curve can be used both to duplicate the cube and trisect an angle), and the cissoid of Diocles (about 180 BCE), which satisfies the polar equation $r = 2a\sin\theta\tan\theta$.
And so on: the ancient Greeks discovered many interesting curves via loci.

Another locus problem due to Apollonius, and discussed by Pappus in the 3rd century CE, was the following: suppose you have four lines l, k, m, n, and a positive real number r. Find all points p so that d(p, l) · d(p, k) = r · d(p, m) · d(p, n). This is known as the four line problem. It is worth mentioning because, when Descartes wrote his Geometry in the 17th century, setting forth much of what we now know as Cartesian geometry (a.k.a. analytic geometry, which was independently developed by Fermat), he specifically cited this problem. I.e., people worked on this problem for 1900 years and still hadn’t solved it.

Once you think of lines having equations of the form $y = m_i x + b_i$, then, because of the formula for the distance from a point to a line,[42] this problem reduces to finding all solutions (x, y) to the equation

$$\frac{|m_1 x - y + b_1|}{\sqrt{m_1^2 + 1}} \cdot \frac{|m_2 x - y + b_2|}{\sqrt{m_2^2 + 1}} = r\,\frac{|m_3 x - y + b_3|}{\sqrt{m_3^2 + 1}} \cdot \frac{|m_4 x - y + b_4|}{\sqrt{m_4^2 + 1}}.$$

Finding all x, y satisfying such an equation is not a trivial task, but look how the problem has shifted: from geometry to algebra.

Note that Descartes’ Geometry was an appendix to his major philosophical work A Discourse on Method. Scholarship wasn’t split up back then.

[42] which you learned in calculus

7 Functions and relations

So far we’ve seen curves largely described as paths or loci: how a point moves according to certain constraints. We also discussed other geometric definitions of hyperbolas, ellipses, and parabolas. In the ancient world people deduced enough information about curves defined in these ways to come very close to being able to describe coordinates on a coordinate system. But they didn’t have coordinate systems.

Meanwhile there were instructions that, when translated into symbols, would have looked like functions. E.g., “Given a number, square it and add two.” We would write this as: $x^2 + 2$.
For the two approaches to come together, algebra and geometry had to be conflated. I.e., you needed a coordinate system, you needed algebraic language, you needed to be able to express geometric notions (such as distance) algebraically, and you needed to put this all together.

This point of view gives the following: a circle in the coordinate plane centered at (a, b) is defined by an equation $(x - a)^2 + (y - b)^2 = R$ where R > 0. That is, it consists of all points with coordinates (x, y) which satisfy this equation.

This is the point of view you first met in 6th or 7th grade. Like many other things you met in elementary or middle school (e.g., the alphabet, or place value notation) the development of these ideas was non-trivial and took a long time. The work of Fermat and Descartes was crucial here.

Some of these geometric objects behaved, from an algebraic point of view, very nicely: one variable was completely determined by the other one (or by the other ones in a multivariable situation). I.e., in modern terminology, they are algebraically defined as functions. The algebraic descriptions of objects that don’t satisfy this property, such as circles, ellipses, and hyperbolas, are called relations.

These sorts of objects were very useful to physicists, who wanted to quantify phenomena as theoretical relationships between variables. In particular, Galileo thought of physics in terms of what we would now call functions: the value of all but one of the variables predicts the value of the remaining variable.

Calculus could not have developed without the notion of function, and much of the early theory of functions (and of physics) came from calculus. The (independent) inventors of calculus were Newton and Leibniz (who first used the term “function” in 1673). And while we’ve been discussing functions in the context of algebraic equations, it’s important to note that infinite series were a major way of describing functions right from the beginning. For example, $y = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ describes the function $y = e^x$.

In the late 18th century, mathematicians worked hard on the problem of describing the motion of a vibrating string, i.e., fix an elastic string at two points, pull a point in between: where is that point at time t? Major contributors here were d’Alembert, Euler, Daniel Bernoulli, and Lagrange. This problem led to the notion of partial differential equations, and its solutions were infinite trigonometric series.

Through this history, there was an informal notion of a function as necessarily being very smooth (we would call such functions infinitely differentiable, and there are generalizations for higher dimensions), largely because of the connection with physics. In the 19th century, however, people started thinking of functions that did not have this property. For example, the Dirichlet function (f(x) = 1 if x is rational, 0 otherwise) has a derivative at no point.

The notion of function was caught up in the drive to put mathematics on a formally sound foundation. Fairly simple objects were ambiguous without this kind of foundation. For example, what is the infinite sum 1 − 1 + 1 − 1 + 1 ...? You can make a case for 1, for −1, for 0, and even (this was Euler’s preference) $\frac{1}{2}$.

The case for 1: 1 − 1 + 1 − 1 + ... = 1 + (−1 + 1) + (−1 + 1) + ...

The case for −1: 1 − 1 + 1 − 1 + ... = −1 + 1 − 1 + 1 ... = −1 + (1 − 1) + (1 − 1) + ...

The case for 0: 1 − 1 + 1 − 1 + ... = (1 − 1) + (1 − 1) + ...

The case for $\frac{1}{2}$: By geometric series, 1 − 1 + 1 − 1 + ... = $\sum_{k=0}^{\infty} (-1)^k = \lim_{n\to\infty} \sum_{k=0}^{n} (-1)^k$, where the partial sum $\sum_{k=0}^{n} (-1)^k$ is 0 if n is odd and 1 if n is even. Rather than throw up his hands and say the sum was undefined, as we do, Euler took the mean $\frac{0+1}{2}$ to get $\frac{1}{2}$.

Why did Euler do this? Because of the human propensity to reify, that is, when we have a nice formal expression for something (in this case $\sum_{k=0}^{\infty} (-1)^k$) it’s very difficult to admit that maybe it doesn’t describe anything.[43]
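The oscillation of the partial sums is easy to see numerically. The following is a minimal Python sketch, not anything Euler wrote: the function names `partial_sums` and `running_means` are made up for illustration, and taking the running average of the partial sums is only one assumed way to mimic Euler's "take the mean" step.

```python
from itertools import accumulate


def partial_sums(n_terms):
    """Return the first n_terms partial sums of 1 - 1 + 1 - 1 + ..."""
    terms = [(-1) ** k for k in range(n_terms)]
    return list(accumulate(terms))


def running_means(values):
    """Return the running averages of a list of numbers."""
    totals = accumulate(values)
    return [total / (i + 1) for i, total in enumerate(totals)]


sums = partial_sums(10)
print(sums)                      # the partial sums alternate between 1 and 0
print(running_means(sums)[-1])   # average of the first ten partial sums
```

The partial sums never settle down, which is why we call the series divergent, while their averages hover around the value Euler preferred.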
Major work on the problem of finding a firm foundation for the notion of convergence and smoothness (some of which you learn in Math 500) was done by Bolzano, Cauchy, Abel, Dirichlet, Weierstrass, Riemann, Cantor, Fourier.

The modern notion of function really derives from the work of Cantor via tweaking some of what’s in Bourbaki:[44] a function is a set of ordered pairs f so that if (x, y) ∈ f and (x, z) ∈ f then y = z. Which means that “function” is a notion that then can detach itself from the notion of continuous curve. In fact it doesn’t need geometry at all. You can have a function that takes one function to another (e.g., the derivative, or the anti-derivative). You can have functions that take one kind of mathematical object to another (e.g., a ring to its additive group). And so on. This general notion of function was revolutionary, and is essential to much of contemporary mathematics.

[43] You can apply this analysis to a lot of things, from unicorns to beauty.

[44] a conglomerate of French mathematicians in the mid 20th century who fairly successfully sought to provide the kind of foundational and comprehensive treatment of mathematics that Euclid managed in an earlier time.

Some definitions of function[45]

[45] The source is the Mathematical Association of America’s CD Historical Modules for the Teaching and Learning of Mathematics.

Below are some quotes which give an idea of how mathematicians eventually found their way to our modern notion of function.

Isaac Newton, 1713: I call any quantity a genitum which is. . . generated or produced in arithmetic by the multiplication, division, or extraction of the root of any terms whatsoever. . . These quantities I here consider as variable and indetermined, and increasing or decreasing, as it were, by a continual motion or flux.

Comment. A function (here called genitum) takes numbers to numbers, is algebraic, and smooth. Multivariable functions are allowed, but the variables take only numerical values.

Johann Bernoulli, 1718: I call a function of a variable magnitude a quantity composed in any manner whatsoever from this variable magnitude and from constants.

Comment. Not restricted to algebraic definitions (perhaps he was thinking of functions generated by physics where a formula is not known) nor restricted to smooth functions (but Bernoulli didn’t consider any other kinds). Restricted to functions of one variable.[46]

Leonhard Euler, 1748: A function of a variable quantity is an analytic expression composed in any way whatsoever of the variable quantity and numbers or constant quantities. If, therefore, x denotes a variable quantity, then all quantities which depend upon x in any way or are determined by it are called functions of it.

Comment. “Analytic expression” is key here. Again, numbers go to numbers, and multi-variable functions aren’t described. Smoothness isn’t mentioned, but Euler didn’t consider other sorts of functions.

Leonhard Euler, 1755: If some quantities so depend on other quantities that if the latter are changed the former undergo change, then the former quantities are called functions of the latter.

Comment. Here Euler doesn’t mention analytic expression or any other sort of expression. This seems closer to physics. Multi-variable functions are allowed.

Joseph-Louis Lagrange, 1797: We define a function of one or more quantities [as] any mathematical expression in which those quantities appear in any manner, linked or not with some other quantities that are regarded as having given and constant values, whereas the quantities of the function may take all possible values.

Comment. Numbers go to numbers: “mathematical expression” is key here. Multi-variable functions allowed.

Jean Baptiste Joseph Fourier, 1822: In general, the function f(x) represents a succession of values or ordinates each of which is arbitrary. An infinity of values being given to the abscissa x, there is an equal number of ordinates f(x) . . .
We do not suppose these ordinates to be subject to a common law; they succeed each other in any manner whatever, and each of them is given as if it were a single quantity.

Comment. Fourier is freeing the notion of function from intelligibility: a function is a function whether or not we know how to generate it. He is gluing the notion of function to the xy-plane. No multi-variable functions.

Nikolai Ivanovich Lobachevsky, 1834: General conception demands that a function of x be called a number which is given for each x and which changes gradually together with x. The value of the function could be given by an analytical expression, or by a condition which offers a means for testing all numbers and selecting one of them; or, lastly, the dependence may exist but remain unknown.

Comment. Lobachevsky, like Fourier, doesn’t demand that functions come with generating rules, but he’s still talking about numbers turning into numbers. The use of “gradually” implies some kind of continuity. Again, not multi-variable.

Karl Weierstrass, 1861: Two variable magnitudes may be related in such a way that to every definite value of one there corresponds a definite value of the other; then the latter is called a function of the former.

Comment. Functions are still about numbers. As with Lobachevsky and Fourier, intelligibility isn’t necessary for a function to be a function. Not multi-variable.

Hermann Hankel, 1870: y is called a function of x when to every value of the variable quantity x inside of a certain interval there corresponds a definite value of y, no matter whether y depends on x according to the same law in the entire interval or not, or whether the dependence can be expressed by a mathematical operation or not.

Comment. Hankel wants the domain to be a union of intervals. Functions are still about numbers. Not multi-variable. Not requiring “the same law” allows piecewise definitions.

[46] No matter how they defined “function,” each of these mathematicians worked with what we would call functions of several variables. But, as with Bernoulli, they might not have considered the latter to be functions.

Nicolas Bourbaki, 1939: Let E and F be two sets, which may or may not be distinct. A relation between a variable element x of E and a variable element y of F is called a functional relation in y if, for all x an element of E, there exists a unique y an element of F which is in the given relation with x.

Comment. Now a function can go from any set to any other set. We aren’t restricted to numbers. Multi-variable functions are included by allowing E to be a set of tuples, e.g., if E is a set of pairs, then the function can be described by f(x, y) = z. The word “relation” comes from formal logic and doesn’t imply intelligibility (i.e., we need not have a formula or any other way to find y given x).[47]

Which is where we stop, because Bourbaki’s is the modern notion of function, often expressed as: a function is a set of ordered pairs S where if (x, y) ∈ S and (x, z) ∈ S then y = z.

[47] Bourbaki’s definition builds on the exposition of set theory by Cantor and, later, Zermelo: their analysis of all of mathematics in terms of sets, and the consequent formal definition of ordered pair.

8 Algebra

For thousands of years there was sophisticated mathematics centered around what we would call polynomials, even though there was no notation in which one could easily denote anything we would recognize as a polynomial. Eventually people started looking at systems of numbers, then systems of functions (e.g., permutations; real-valued functions). It is only within the last 200 years that the more abstract systems into which all of these things could be embedded were developed.

Here’s a rough outline of how this played out.
Babylonia 3500 BCE to 600 CE: Could calculate square roots, understood linear interpolation, had something analogous to exponential and logarithmic tables; could solve linear and quadratic equations and even a few special cases of equations of higher degree. Largely practical; carefully worked out examples took the place of proofs. For example, 4000 years ago they were asking problems like: “Given an interest rate of $\frac{1}{60}$ per month, compute the doubling time.”[48]

China from 1600 BCE: Centralized bureaucracy meant the central importance of mathematics (taxes, standardized weights and measures, commerce and salaries, and so on) from a very early time. Yet in 212 BCE, nearly all of China’s mathematical works (as well as many other written works) were destroyed at the command of the emperor. Chinese mathematics recovered quickly, as witnessed by a major text from the Han dynasty (approximately 200 BCE to 200 CE), the Nine Chapters on the Mathematical Art. It consisted of 246 problems divided into 9 chapters. These chapters are:

1. Field Measurement: finding area and computing with fractions;
2. Cereals: proportion problems (this has to do with exchanging one kind of grain for another);
3. Distribution by proportion: more proportion;
4. What width?: given the area or volume, find the lengths of sides, i.e., finding square roots and cube roots;
5. Construction calculations: calculations involved in construction, especially volume;
6. Fair taxes: how to distribute grain and labor based on population and distance;
7. Excess and deficiency: the method of false position;
8. Rectangular arrays: simultaneous linear equations, adding and subtracting positive and negative numbers;
9. Gougu: the Chinese name for the Pythagorean theorem, hence PT and applications.

Nine Chapters on the Mathematical Art was largely practical. Carefully worked out examples mostly took the place of proofs, but there was some theoretical discussion.
Later Chinese mathematicians developed sophisticated algebraic concepts such as negative numbers and matrices. It’s important to note that, until the last two or three hundred years, Chinese mathematics developed essentially independently from mathematics elsewhere, with very little communication with other cultures.

The following statement from the Zhoubi suanjing is a good explanation of how most ancient cultures thought about mathematical reasoning: “A person gains knowledge by analogy, that is, after understanding a particular line of argument they can infer various kinds of similar reasoning... Whoever can draw inferences about other cases from one instance can generalize... To be able to deduce and then generalize... is the mark of an intelligent person.”[49]

India 1000 BCE to 1200 CE: Ancient India was crucial to the development of algebra in Europe. They invented 0 and our decimal number notation. They were nuts about number theory and problems we would describe as solutions to algebraic equations. The surviving literature is both theoretical and practical. In particular, the translation of Brahmagupta’s Siddhanta into Arabic was seminal.

[48] See http://en.wikipedia.org/wiki/Babylonian_mathematics. The fraction $\frac{1}{60}$ has to do with their number notation.

[49] from the MacTutor overview of Chinese mathematics.

Greece 800 BCE to 800 CE: Unlike the Babylonians and the Chinese, Greek mathematicians were largely theoretical. They could solve some quadratic and cubic equations. But the way they thought about algebra was quite different from the way we do, since they thought of most mathematics in geometrical terms. For example, they didn’t have a clear concept of a variable quantity (although they did have a clear concept of a variable point, that is, a point moving according to constraints). We’ll do some activities relating to their notion of geometric algebra, which was central to what they did.
Islamic culture 700 to 1200 CE: Algebra really explodes in the Islamic culture of Arabia and Persia. In particular, al-Khwarizmi’s discussion of quadratic equations had a revolutionary effect; in some sense the notion of variable can be traced to him, and he developed the unifying theory we now call algebra in which, for example, numbers of different kinds are all treated as (what we would now call) algebraic objects. The way we think of number theory and algebra, so different from the Greek geometric approach, is largely rooted in his work, and this approach was taken up by other Arabian and Persian mathematicians, although they did not abandon the geometric approach (using it, for example, to solve cubic equations). The word algebra comes from al-Khwarizmi’s use of the Arabic term al-jabr, the reunion of broken parts. (And al-Khwarizmi’s name gave us our word algorithm.)

Europe 1200 CE to present: European mathematicians learned a lot from Arabian ones and took the direction even further. Explicit algebraic solutions to cubics and quartics were found. If you’re curious about these formulas, look them up on Google; they are quite complicated. What about quintics (= degree 5)? In the mid-1820s, Niels Henrik Abel showed that there is no formula by which you can solve an arbitrary quintic equation. None. Zero. Zip. This was absolutely revolutionary. Also in the early 19th century, Evariste Galois generalized this kind of thing into notions that we now call groups and fields (Math 558 deals with this stuff).

Over the course of the 19th century, algebra became less and less about individual polynomials and more and more about abstract structures (such as groups and rings and fields); the set of polynomials, for example, forms a ring under the operations of + and ×.
Over the same centuries, in a reverse of the Greek attitude, algebra became a way of doing geometry (think of the Cartesian coordinate system) and both algebra and geometry intertwined in ways very different from how they did two thousand years earlier. Geometric objects (such as symmetries) became objects in algebraic systems (this is discussed in Math 409) and the interplay between algebra and geometry (and its generalizations) led to entire new fields, such as algebraic geometry, or algebraic topology.

So a good way to describe the development of what we call algebra would be: India to Arabia (which includes, in this shorthand, Persia) to Europe. The sophisticated methods of the Babylonians, Chinese, and ancient Greeks (and their work was highly sophisticated) had little influence.

Now for some examples of how people thought of algebra before there was algebra.

Square roots

Here’s how the Babylonians calculated square roots:

To calculate $\sqrt{r}$: given an approximation $s_n$, define $s_{n+1} = \frac{1}{2}\left(s_n + \frac{r}{s_n}\right)$. You can start from any $s_1$.

In other words, the Babylonian technique provided a sequence of better and better approximations. The further out you took this, the closer your result was to the actual square root.

Since the Babylonians didn’t provide proofs, we don’t know how they knew this worked. Here’s a modern proof that it works.

Suppose $s = \lim_{n\to\infty} s_n$ where $s_n$ is defined as above. Then $s = \lim_{n\to\infty} s_{n+1}$, i.e., $s = \lim_{n\to\infty} \frac{1}{2}\left(s_n + \frac{r}{s_n}\right)$. So $s = \frac{1}{2}\left(s + \frac{r}{s}\right)$, i.e., $2s^2 = s^2 + r$. I.e., $s^2 = r$, as desired.

Let’s try $\sqrt{2}$, $s_1 = 1$. The function we want to iterate is $y = \frac{1}{2}\left(x + \frac{r}{x}\right)$ with r = 2.

$s_2 = y(1) = 1.5$; $s_3 = y(s_2) = y(y(1)) = 1.41666\ldots$; $s_4 = y(s_3) = y(y(y(1))) = 1.414215\ldots$

The exact value of $\sqrt{2}$ is 1.414213...; in three steps we’ve come within 5 decimal places.

Try this with $\sqrt{3}$ [= 1.73205...], again $s_1 = 1$. How many steps until you’re within 5 decimal places?
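For readers who would rather let a machine do the arithmetic, the iteration described above can be sketched in a few lines of Python. This is only an illustration in modern floating-point arithmetic, not a reconstruction of how the Babylonians computed; the function name `babylonian_sqrt_steps` and its parameters are made up for this sketch.

```python
import math


def babylonian_sqrt_steps(r, s1=1.0, steps=6):
    """Return [s1, s2, ...] where s_{n+1} = (s_n + r / s_n) / 2."""
    approximations = [s1]
    for _ in range(steps):
        s = approximations[-1]
        approximations.append(0.5 * (s + r / s))
    return approximations


# Print each approximation to sqrt(2) together with its error.
for n, s in enumerate(babylonian_sqrt_steps(2), start=1):
    print(f"s{n} = {s:.6f}   error = {abs(s - math.sqrt(2)):.2e}")
```

Changing the first argument to 3 or 50 is a quick way to check your answers to the exercises.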
Just for fun, try it with $\sqrt{50}$ [= 7.071067...], again $s_1 = 1$, just to see how things converge when $s_1$ is a really bad estimate. Again, how many steps until you’re within 5 decimal places?

The proof we used to show that this method worked actually only shows: if the sequence $(s_n)_n$ converges, then its limit is $s = \sqrt{r}$.

Here’s a sequence that doesn’t converge, but you can prove that if it converged it would converge to $\sqrt{r}$: $s_{n+1} = 2s_n - \frac{r}{s_n}$.

If the sequence did converge, and $s = \lim_{n\to\infty} s_n$, then $s = 2s - \frac{r}{s}$, so (after a little junior high algebra) $r = s^2$, i.e., $s = \sqrt{r}$.

But try this when r = 2. The function being iterated is $y = 2x - \frac{r}{x}$. First let $s_1 = 1$: $s_2 = 2 - 2 = 0$; $s_3$ =... whoops, $s_3$ isn’t defined. Try $s_1 = 1.5$, you get a sequence that goes to ∞. Try $s_1 = 1.4$. You get a sequence that goes to −∞. The hypothesis ($s = \lim_{n\to\infty} s_n$) is false so the conclusion ($s = \sqrt{r}$) need not be true and, in these examples, is in fact meaningless: there is no s to talk about.

By the way, if you go to MathWorld you’ll find this algorithm credited to Newton. He came several thousand years later. But by then what the Babylonians knew had been forgotten.

For the algorithm I learned in school (back in the Pleistocene era when things like this were taught in fifth grade),[50] go to the second method (“Finding square roots using an algorithm”) at http://www.homeschoolmath.net/teaching/square-root-algorithm.php.

Geometric algebra

When people realized that not every number was rational (this is attributed to the Pythagoreans, 5th century BCE), the notion of number became a bit fraught. Lengths were more comfortable to work with, because they seemed more concrete, more like things in the real world. So a very elaborate algebraic apparatus arose which allowed people to prove what we would call algebraic theorems, but which they thought of as theorems about length, area, and volume.

The easiest example is $(a + b)^2 = a^2 + 2ab + b^2$.
Remember that this notation didn’t arise until well over 2000 years after Pythagoras. How did the ancient Greeks do it?

[Figure: a square of side a + b, divided into a square of side a, a square of side b, and two a-by-b rectangles.]

[50] Note that taught ≠ learned.

The area of the big square is $(a + b)^2$. The area of the yellow square is $a^2$. The area of the green square is $b^2$. The area of each blue rectangle is $ab$. So the area of the big square is also $a^2 + 2ab + b^2$.

Now let’s sketch a more complicated example.

First of all, it was known (by an argument similar to the one about $(a + b)^2$, but much more complicated) that $ab = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2$. The question we’ll ask is: how can you find x so that $x^2 = ab$? To translate into geometric terms, given a, b how can you construct the side of a square whose area is ab?

Here’s how it was done. In the diagram below, AB has length a + b, O is the midpoint of AB, and OC has length $d = \frac{a-b}{2}$. DC ⊥ AB, where D is on the circle about O of radius $c = \frac{a+b}{2}$, and x is the length of DC.

[Figure: the circle about O with diameter AB; C lies on AB and D lies on the circle directly above C, with OD = c, OC = d, and DC = x.]

Hence, by the Pythagorean theorem, $x^2 = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2 = ab$.

Note what is going on here: instead of looking for ways to calculate a number, the ancient Greeks were looking for ways to construct a length.

The general question of “construct a square whose area is a given area” was of great interest in antiquity. The particular instance “construct a square whose area is the area of a given circle” was known as the problem of “squaring a circle,” which was discussed in chapter 6. Recall that it can’t be done, because then π could be constructed as the length of a straight line segment, and while some irrational lengths (e.g., $\sqrt{2}$) can be constructed, the lengths that can be constructed are algebraic, that is, they come about by iterations of arithmetic operations (including taking integer roots) on integers. But π is not algebraic. This was not known until the 19th century, when it was proved by von Lindemann in 1882, a mere 38 years after Liouville proved the existence of transcendental (= non-algebraic) numbers in 1844.
9 Prime numbers

Recall that a positive integer is prime iff it has exactly two factors: itself and 1. Thus, 1 is not prime; 2, 3, 5, 7, 11, ... are prime.

How can you tell if a number is prime? The oldest method we know was the sieve of Eratosthenes (dating back to at least 2nd century BCE Greece), conceptually very simple but in practice quite difficult to execute. Do you want to know if n is prime? Simply try dividing n by all positive integers k with 1 < k < n. If none of them divides n, then n is prime. It’s not hard to see that in fact you only need to try to divide n by all positive integers k with $1 < k \le \sqrt{n}$. The fundamental theorem of arithmetic (every positive integer ≥ 2 is a product of primes in a unique way; see below) makes the algorithm a little easier: you only need to try to divide n by all primes $k \le \sqrt{n}$.

But if, say, n has 10,000 digits, then the original version would have roughly $10^{10{,}000}$ steps if n were prime; in the second version of the sieve of Eratosthenes you’d be dividing n by all k with 5,000 or fewer digits (since $\sqrt{n}$ has about half as many digits as n), hence you’d have about $10^{5{,}000}$ steps, and in the third version you’d be dividing by all primes with 5,000 or fewer digits, and think of how much work you had to do to find those.

This is not an academic exercise: almost all security codes involve factoring numbers.

How many prime numbers are there? As you probably learned in elementary school:

Theorem 3. (Euclid’s prime number theorem) There are infinitely many prime numbers.

The first recorded proof is due to Euclid:

Proof. Suppose S is the set of all primes, and suppose S is finite. Let N be the product of all the primes in S. What about N + 1? It’s bigger than any element of S, i.e., bigger than any prime, so it isn’t prime. By the fundamental theorem of arithmetic (see below) some prime, say n, divides it. Since n ∈ S, n divides N and n divides N + 1. So n divides N + 1 − N = 1, i.e., n = 1, which is not a prime. Contradiction. Therefore S is infinite.
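Both ideas above are easy to experiment with. The sketch below is a hedged illustration in Python: `is_prime_trial_division` implements the divide-by-every-k test described above (usually called trial division), checking every k up to and including $\sqrt{n}$ so that perfect squares such as 9 are caught, and `euclid_number` builds the number N + 1 from Euclid's proof. Both function names are made up for this example.

```python
import math


def is_prime_trial_division(n):
    """Test primality by dividing n by every k with 2 <= k <= sqrt(n)."""
    if n < 2:
        return False
    for k in range(2, math.isqrt(n) + 1):
        if n % k == 0:
            return False
    return True


def euclid_number(primes):
    """Return N + 1, where N is the product of the given primes."""
    return math.prod(primes) + 1


first_six = [2, 3, 5, 7, 11, 13]
m = euclid_number(first_six)
print(m, is_prime_trial_division(m))
# Every prime in the list leaves remainder 1, so none of them divides m.
print([m % p for p in first_six])
```

Note that N + 1 need not itself be prime (the example above is 59 × 509). The proof only needs the fact that none of the primes in the list divides it.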
Let’s ask a similar question: how many pairs of twin primes are there? Here we define n, m to be twin primes if n, m are both prime and either n = m + 2 or m = n + 2. For example, 3, 5 is such a pair, as is 5, 7, and 11, 13, and 17, 19... The twin prime conjecture says that there are infinitely many such pairs.

Is the twin prime conjecture true? Nobody knows. This simple, elegant problem remains unsolved.

3, 5, 7 is a finite sequence of primes with an interesting property: you get from one to the next by adding a constant, in this case 2: 3 + 2 = 5, 5 + 2 = 7. Such a sequence is called an arithmetic sequence. Since 7 + 2 = 9, which is not a prime, this particular arithmetic sequence of primes stops: 3, 5, 7 and no more.

There are many infinite arithmetic sequences, for example 2, 4, 6, 8...; or 2, 5, 8, 11...

Infinite arithmetic sequences cannot consist of only primes. Why? Every infinite arithmetic sequence has the same form: a, a + b, a + 2b, a + 3b, a + 4b... But then a + ba = a(1 + b) is in the sequence, and it is not prime when a > 1 (and if a = 1 the first term is already not prime).

However, there are infinite arithmetic sequences which contain infinitely many primes. In 1837 Dirichlet proved that a, b are relatively prime[51] iff the sequence a, a + b, a + 2b, a + 3b... contains infinitely many primes. You can think of this theorem as a consolation prize for not having an arithmetic sequence all of whose members are prime.

[51] i.e., have no common factors other than 1

Is there a bound on finite arithmetic sequences of primes? That is, is there some number N so that any arithmetic sequence of primes has at most N elements? Put another way, are there arbitrarily long arithmetic sequences all of whose members are prime? This would be another consolation prize for having no infinite arithmetic sequence of primes.

The answer is due to Ben Green and Terence Tao, and was proved in 2004:

Theorem 4. The set of prime numbers contains arbitrarily long arithmetic sequences.
You can access their paper on arXiv: http://arxiv.org/abs/math.NT/0404188. It’s worth seeing the abstract, which refers to notions like positive density and pseudo randomness. These concepts come from areas like analysis and probability theory, and give yet more evidence for the close interrelationships among various areas of mathematics.

Now let’s go back to the ancient world. Why did people pay attention to prime numbers? A major reason is that they form the building blocks of the natural numbers.

Theorem 5. (The fundamental theorem of arithmetic) Every positive integer ≥ 2 factors into primes and, except for rearrangement, this factorization is unique.

The fundamental theorem of arithmetic (FTA) is thousands of years old. You implicitly learned it soon after you learned how to multiply, most probably by means of factor trees. Here’s an example:

[Figure: factor tree for 72, splitting 72 into 4 × 18, then 4 into 2 × 2 and 18 into 3 × 6, then 6 into 2 × 3.]

In this example, $72 = 4 \times 18 = 2 \times 2 \times 3 \times 6 = 2 \times 2 \times 3 \times 2 \times 3 = 2^3 \times 3^2$.

We could have done it another way, for example: $72 = 2 \times 36 = 2 \times 4 \times 9 = 2 \times 2 \times 2 \times 3 \times 3 = 2^3 \times 3^2$.

And so on. But however we started we would have ended up in the same place: $72 = 2^3 \times 3^2$.

FTA has two parts: (1) Any positive integer ≥ 2 is a product of primes. (2) Except for rearrangement, this factorization is unique.

Most ancient cultures did not write down formal proofs, so it’s not clear why people believed the FTA other than the fact that it worked whenever you tried it. The Greeks did write down formal proofs, and Euclid’s Elements contained a proof. Or, rather, a “proof”: his attempt at proving (2) was not sufficient.

We prove (1). Suppose some positive integer ≥ 2 is not a product of primes. Then there is a smallest positive number n > 1 which is not a product of primes.[52] Since n is not prime, there are k, m with 1 < k ≤ m < n and n = mk. By definition of n, both m and k are products of primes. But then n is a product of primes.
Which contradicts our assumption about n.[53]

FTA looks backward: given a positive integer n ≥ 2 you factor it into smaller primes. Another way of looking backward is to ask, given a number n, how many primes are smaller than n. We define π(n) = the number of primes ≤ n. For example, π(2) = 1, π(−1) = 0, π(π) = 2 (because 2, 3 ≤ π; note that we’re using π in two ways: as a function and as a number), π(5) = π(6) = 3 and π(7) = 4. What can we say about the function π?

[52] This step uses the fact that every nonempty set of positive integers has a least element. This is a fundamental property of whole numbers, equivalent to the principle of induction. It’s so fundamental that it’s often used without being noticed, in the same way that we breathe without realizing we are breathing oxygen.
[53] You can also prove (1) by mathematical induction. If you’re familiar with induction, this is a nice exercise.

Theorem 6. (prime number theorem) $\lim_{n \to \infty} \frac{\pi(n)}{n / \ln n} = 1$.

Why is this theorem given such an important name? The prime number theorem is a deep result about the distribution of primes among the positive integers. It says that if n is very large, then $\pi(n) \approx \frac{n}{\ln n}$. It was first conjectured in 1796 by Legendre, and proved independently by Hadamard and de la Vallée Poussin in the late 19th century. Its proof involves the deep analytic theory of complex numbers, in particular the Riemann ζ-function.[54]

Not every non-ancient theorem about primes involves analysis. Here’s a typical “elementary number-theory” type theorem about primes.

Theorem 7. (Fermat’s little theorem) Let p be prime. Then for every positive integer n, $n^p - n$ is divisible by p.

This theorem was first stated by Fermat, and proved by Leibniz in 1683. It is called Fermat’s little theorem to distinguish it from Fermat’s last theorem, also not proved by Fermat (in fact, proved only in the late 20th century).[55]

The proof of Fermat’s little theorem is a straightforward induction proof, so let’s do it. Fix p, and let n vary.
If n = 1, then $1^p - 1 = 0$ which is divisible by everything except 0, so certainly divisible by p. Hence Fermat’s little theorem holds for n = 1. Now suppose Fermat’s little theorem holds for n. We will show that it holds for n + 1:

$(n + 1)^p - (n + 1)$ = [the binomial expansion of $(n + 1)^p$] − (n + 1)
  = [$n^p$ + (the sum of a whole bunch of terms all of which are multiples of p) + 1] − (n + 1)
  = $(n^p - n)$ + (p × [something])

By induction hypothesis, $n^p - n$ is divisible by p. And p × [something] is divisible by p. So $(n + 1)^p - (n + 1)$ is divisible by p.

When we talked about the sieve of Eratosthenes we mentioned that factoring and primes are crucial in making security codes work. The basic idea is: it’s easy to multiply numbers. But there are numbers which are hard to factor. Here “easy” and “hard” refer to how many steps are used by an algorithm.

Example. It’s easy to multiply 47 × 31 = 1457. The usual algorithm is 3 steps: 1 × 47; 30 × 47; add. In fact any multiplication of two two-digit numbers (an input of 4 digits, two for each number) takes at most three steps. But going backwards is harder: you have to test all the prime numbers < √1457 to find the first factor... I.e., you have to check 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31: 11 steps before you get the first factor! So this input of 4 digits gives us 11 steps to work through. And of course a 4 digit prime will give us even more steps, possibly as many as 25.

The most famous algorithm for codes is the RSA algorithm, named for its inventors (Rivest, Shamir, Adleman). It’s famous because it’s the first algorithm in which a key is sent publicly, a

[54] $\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}$.
[55] Fermat’s last theorem says that if there are positive integers a, b, c, k with $a^k + b^k = c^k$ then k ≤ 2. Its proof was announced by Andrew Wiles in 1993, an error was discovered in the proof, and a correct proof was given in 1994, published in 1995.
Unlike the simple proof of Fermat’s little theorem, the proof of Fermat’s last theorem involves advanced areas of mathematics, including algebraic geometry, Galois theory, ring theory, and analytic number theory.

message is sent publicly back, but nobody except the person sending the key (we’ll call her Ann) and the person sending the message (we’ll call her Betty) can decode the message reliably. I.e., Ann publicly sends out a key to the code that Betty must use. Betty codes a message, transforms the code, and then sends her transformation to Ann. And even if I knew both messages it still would be prohibitively difficult for me to decode them.

Step 1. Ann chooses two (very large) primes p, q and lets n = pq. She also chooses a number e < (p − 1)(q − 1) so that e has no common factors with (p − 1)(q − 1). e is called the public key exponent. She publicly sends a message (n, e) to Betty. She can put it on Facebook, Twitter, talk about it on Ellen, engrave it on her silverware and write it on the sidewalk in permanent ink. None of this matters. The pair (n, e) is called the public key.

Step 2. Betty codes her message by a pre-determined method whose details depend on Ann’s key[56] and gets a single number as the code, call it m. She doesn’t send m to Ann; instead she sends $c = m^e \pmod{n}$. Maybe one of the neighbors works for Wikileaks[57] and publishes Betty’s message. Betty doesn’t care.

Step 3. Meanwhile Ann has secretly calculated $d = e^{-1} \pmod{(p - 1)(q - 1)}$, i.e., the number d with de = 1 modulo (p − 1)(q − 1). d is called the private key exponent.

Step 4. Ann receives Betty’s message! With trembling hands she unlocks the code by calculating $c^d \pmod{n}$. Because that’s m: $m = c^d \pmod{n}$. And since Ann knows the predetermined method, if she knows m she knows the message. But if you knew the predetermined method and the key and c, you still wouldn’t know the message.

Why does this work? There are two aspects:

1. Why does $m = c^d \pmod{n}$? The answer to this is complicated.
You can’t say “because $(m^e)^{e^{-1}} = m$” because it is not in general true in modular arithmetic that if $x = y^{-1} \pmod{n}$ then $(a^x)^y = a \pmod{n}$. Example: $(3^2)^3 = 4 \pmod{5}$, even though $3 = 2^{-1} \pmod{5}$. But in this situation n is a product of two primes, n = pq, e is relatively prime to (p − 1)(q − 1), and the inverse $d = e^{-1}$ is taken modulo (p − 1)(q − 1), not modulo n. In this situation you can prove that $(m^e)^d = m \pmod{n}$.[58]

2. Why is this code hard to crack? Because finding d is difficult. If you know p, q then it’s not difficult: there are formulas that help you. But if you don’t know p, q then it’s hard. And if both p, q have, say, 1500 digits, then with any known method there isn’t enough computer time in the universe to factor n.

A final comment on factoring: the problem of factoring is an example of an NP problem. NP stands for non-deterministic polynomial time, and an NP problem is one whose solutions are easy to check, whether or not they are easy to find. Problems that are easy to do are called P (for polynomial time) problems. For example, if I give you a factorization of a number n, you can easily check if my factorization is correct. But finding that factorization seems to be hard. So factoring is NP. Multiplication, as we have seen, is P.

Maybe the problems we think are hard only seem this way. Maybe there are fantastically clever ways of making them easy. Maybe P = NP. If P = NP then our entire way of life, based as it is on codes to protect privacy and keep secrets, comes crashing down, because just about everything we do online or with a credit card will no longer be secure: everything banks do to transfer money, everything stock exchanges do to sell stocks, all kinds of records that governments and businesses keep, all kinds of information transfer in industry and defense. All of that will come crashing down.

[56] agreed on between Ann and Betty ahead of time; even if you knew this method you still wouldn’t be able to figure out what Betty’s code is
[57] which still exists
[58] The proof is non-trivial, so we won’t give it.
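Returning to Steps 1 through 4 of the RSA exchange, here is a toy run with absurdly small primes. It is only a sketch: the primes 61 and 53, the exponent 17, the message 65, and the function name `rsa_demo` are all chosen for this illustration, the private exponent is computed as the inverse of e modulo (p − 1)(q − 1), and real RSA uses enormous primes plus padding schemes that are not shown here. It needs Python 3.8 or later, where the built-in `pow` accepts a negative exponent together with a modulus.

```python
from math import gcd


def rsa_demo(p: int, q: int, e: int, m: int) -> dict:
    """Run the four RSA steps on toy numbers and return every quantity."""
    n = p * q                      # Step 1: Ann's public modulus
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must have no common factor with (p-1)(q-1)")
    c = pow(m, e, n)               # Step 2: Betty sends c = m^e mod n
    d = pow(e, -1, phi)            # Step 3: Ann's private exponent
    recovered = pow(c, d, n)       # Step 4: Ann computes c^d mod n
    return {"n": n, "e": e, "d": d, "c": c, "recovered": recovered}


if __name__ == "__main__":
    result = rsa_demo(p=61, q=53, e=17, m=65)
    print(result)
    # An eavesdropper sees n, e and c, but needs p and q to find d quickly.
```

With numbers this small anyone can factor n = 3233 by hand, which is exactly why the primes in practice have hundreds or thousands of digits.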
Whether P = NP is such an important problem that the Clay Mathematics Institute has offered $1,000,000 to whoever solves it. But an interesting aspect of this problem is that it might be solved and we just don’t know it. This is because if whoever solves it works for the National Security Agency or a similar organization, that person is sworn to secrecy, and not only cannot collect the $1,000,000 but can’t tell anyone without a very high security clearance.[59] Homework problem #7 is about a different approach to P = NP.

[59] This might sound unlikely, but aspects of the RSA algorithm were apparently known in the NSA long before anyone else had thought of them. It was only after outsiders figured them out that people in the NSA were able to talk about them publicly.

10 Negative and complex numbers

Negative numbers were known in China (200 CE), Greek Alexandria (Diophantus, 250 CE), India (500 CE), and Arabia (800 CE) but were not known in Europe until the 12th century (Fibonacci). What took everyone so long?

Early algebraic reasoning was often geometric (ancient Greece to Arabia, until al Khwarizmi) and numbers were seen as lengths, so what kind of number could a negative number be? On the other hand, negative numbers arise somewhat naturally in commerce. Consider debt, or loss. But rather than saying that someone had net assets of -$300, say, people would say that they owed $50 and had lost their remaining $250 in assets when the town flooded. Even today, people tend to do that.

Early negative numbers often seemed to have an operative meaning. For example, if you said “add -2 to 3” this was seen as shorthand for “subtract 2 from 3.” Remember, there was no real algebraic or even arithmetic notation. People did mathematics using words. Negative numbers could appear in equations (that is, sentences that we would translate into equations), but they were not accepted as solutions.
They were described pejoratively: “absurd,” or “unacceptable...”

The history of negative numbers is closely linked to the history of mathematical notation. For example, Shang numerals, used in China in the 16th century BC, had place notation, notation for 0 (as a space, not a symbol), and by 263 CE had been adapted to include negative numbers. Similarly, negative numbers had a place in Diophantus’ syncopated algebra, and in Indo-Arabic notation (which developed in India and was adopted by Arabian and Persian mathematicians). In all of these cultures the rules for adding and subtracting negative numbers were known early, and the rules for multiplication and division of negative numbers came later.

Because of the close relation between the historical development of our understanding of negative numbers and the historical development of notation, the two can’t be considered separately, and we will take a short detour into India to see how people talked mathematics and talked about mathematics.

In 625 CE the great mathematician Brahmagupta knew quite a bit about negative numbers, and even considered division by 0, claiming that n/0 is infinity, since “In this quantity consisting of that which has zero for its divisor, there is no alteration, though many be inserted or extracted; as no change takes place in the infinite and immutable God, at the period of the destruction or creation of worlds, though numerous orders of beings are absorbed or put forth.”

The 9th century Indian mathematician Mahavira made an eloquent claim for the importance of mathematics: “In all those transactions which relate to worldly ... or ... religious affairs, calculation is of use. In the science of love, in the science of wealth, in music and in the drama, in the art of cooking, and similarly in medicine and in things like the knowledge of architecture; in prosody, in poetics and poetry, in logic and grammar and such other things,...the science of computation is held in high esteem.
In relation to movements of the sun and other heavenly bodies, in connection with eclipses and the conjunction of planets...it is utilized. The number, the diameter and the perimeter of islands, oceans and mountains, the extensive dimensions of the rows of habitations and halls belonging to the inhabitants of the world, ... all of these are made out by means of computation.”

You might guess that the line between mathematical language and the rest of language was not tightly drawn, and this is clear from the following mathematical problem stated by the 12th century mathematician Bhaskara:

“The square root of half the number of bees in a swarm
Has flown out upon a jasmine bush;
Eight ninths of the swarm has remained behind;
A female bee flies about a male who is buzzing inside a lotus flower;
In the night, allured by the flower’s sweet odor, he went inside it
And now he is trapped!
Tell me, most enchanting lady, the number of bees.”

Note that the problem is asked of a woman he is trying to please: instead of bringing flowers, he brings math problems. Mahavira’s “science of love” indeed.

End of detour. While notation for numbers was highly developed in a number of cultures (China, India, Arabia and Persia), notation that would be useful for algebra developed later, in Europe. In the 12th century the Europeans started getting active. In 1489 the German mathematician Johann Widman used the symbols + and - for the first time. That’s right, it wasn’t until the late 15th century that people wrote things like 2+3. In 1557, the English mathematician Robert Recorde introduced the symbol = for equality. That’s right, it wasn’t until the mid 16th century that people wrote things like 2+3 = 5. In 1584 the Dutch mathematician Stevin introduced decimals and the √ sign.
This is about the time that people really started shaking off the notion of number as (necessarily positive) magnitude, and Stevin was an important figure in arguing that negative numbers should be accepted as, well, numbers, just like 3 and 1/3. But even then, in Europe at least, people were not quite comfortable with the notion of a negative number.

So imagine people’s shock when, again in the mid 16th century, Tartaglia and Cardano discovered that if you wanted to solve quadratic and cubic equations you found yourself considering, not just negative numbers, but square roots of negative numbers. For example, Cardano solved the system of equations x + y = 10 and xy = 40 (i.e., $10x - x^2 = 40$) with the solutions $5 + \sqrt{-15}$, $5 - \sqrt{-15}$. A little later in the century, Bombelli was the first to become comfortable with square roots of negative numbers. Barry Mazur’s wonderful book Imagining Numbers (particularly the square root of minus fifteen) is about exactly this development, and in some sense the birth of modern algebra can be traced to the development of complex numbers.

Here’s an example of how square roots of negative numbers arose in solving cubics: $x^3 = 15x + 4$ has the solution $\sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}}$, which turns out to equal 4. For how this works, see below. At this point, the important thing to note is that square roots of negative numbers turn out to be closely connected to plain old garden variety positive integers. This had an emotional impact similar to discovering that your mother is actually the daughter of extraterrestrials.

By the 17th century negative numbers were used, like infinitesimals in calculus, without a firm agreement on their meaning. But even without such agreement on what they were, and even given some of the strange distortions that allowed their presence,[60] they had permeated mathematics, as had complex numbers.
For example, in 1629 Girard first stated the fundamental theorem of algebra (a polynomial of degree n is the product of n linear functions, and has exactly n roots up to multiplicity); this only works if complex numbers are allowed. Newton thought of positive/negative as modeling several distinct types of phenomena: asset/debt, forward/backward motion, direction of vectors... The familiar number line was taking shape, with negative numbers to the left and positive to the right. Yet, still, Descartes called negative numbers “false roots” and positive numbers “true roots.”

[60] for example, a negative root is actually a positive root of an equation which changes the sign of the odd powers; compare the roots of $x^2 - x - 6$ and $x^2 + x - 6$

In the 18th century, Euler started to explicitly distinguish algebra and arithmetic, i.e., algebra became more than just generalized arithmetic; this gave negative numbers (and hence complex numbers) a firmer logical base. Yet even in the 19th century some folks still objected. It wasn’t until Peacock and DeMorgan further separated algebra from arithmetic, Hamilton invented the notion of an abstract algebraic system with his quaternions,[61] and Weierstrass, Dedekind and Cantor firmly put all our number systems on solid logical grounds that “negative number” stopped being seen as somehow second class.

An application. Why does $\sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}} = 4$?

Note that $\sqrt{-121} = 11i$, so we are talking about $\sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}$. The number $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$ (the same number, since $\sqrt[3]{2 - 11i} = -\sqrt[3]{-2 + 11i}$) turns out to be a root of $x^3 = 15x + 4$. Why? First we need to note Dal Ferro’s formula for finding a root of the equation $x^3 = bx + c$:

$$x = \sqrt[3]{\frac{c}{2} + \sqrt{\frac{c^2}{4} - \frac{b^3}{27}}} + \sqrt[3]{\frac{c}{2} - \sqrt{\frac{c^2}{4} - \frac{b^3}{27}}}$$

From Dal Ferro’s formula[62] for solving a cubic, we know that one solution of $x^3 = 15x + 4$ is

$$\sqrt[3]{\frac{4}{2} + \sqrt{\frac{4^2}{4} - \frac{15^3}{27}}} + \sqrt[3]{\frac{4}{2} - \sqrt{\frac{4^2}{4} - \frac{15^3}{27}}}$$

Here $\frac{4^2}{4} - \frac{15^3}{27} = 4 - 125 = -121$ and $\frac{4}{2} = 2$.
So

$$\sqrt[3]{\frac{4}{2} + \sqrt{\frac{4^2}{4} - \frac{15^3}{27}}} + \sqrt[3]{\frac{4}{2} - \sqrt{\frac{4^2}{4} - \frac{15^3}{27}}} = \sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}}.$$

Maybe you don’t believe Dal Ferro’s formula. We’ll show outright that $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$ really is a root of the equation $x^3 = 15x + 4$.

Suppose $x = \sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$. Then $15x = 15[\sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}]$ and

$$x^3 = (2 + 11i) - 3(2 + 11i)^{2/3}(-2 + 11i)^{1/3} + 3(2 + 11i)^{1/3}(-2 + 11i)^{2/3} - (-2 + 11i)$$
$$= 4 + 3[(2 + 11i)(2 - 11i)]^{1/3}[(2 + 11i)^{1/3} + (2 - 11i)^{1/3}]$$

But $x = (2 + 11i)^{1/3} + (2 - 11i)^{1/3}$. And $(2 + 11i)(2 - 11i) = 4 - 121i^2 = 4 + 121 = 125$. And $125^{1/3} = 5$. So indeed $x^3 = 4 + 15x$.

Now why should $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i} = 4$? If you try a few whole numbers, it turns out that 4 is also a solution of $x^3 = 15x + 4$. But there are two other solutions, so just because 4 and $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$ are both solutions of $x^3 = 4 + 15x$ doesn’t mean they are equal.

[61] which are now used in writing game software, don’t ask
[62] passed down to Tartaglia, and either stolen by or given to Cardano

A little algebra shows that $x^3 - 15x - 4 = (x - 4)(x^2 + 4x + 1)$. So its roots are 4 and the roots of $x^2 + 4x + 1$. The roots of $x^2 + 4x + 1$ are $-2 + \sqrt{3}$, $-2 - \sqrt{3}$. Both of these roots are negative real numbers. So we’ll be done if we can show that $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$ is a positive real number.

If you remember how to represent complex numbers on the plane and how to add them and how to take their roots (see below), you can see that $\sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}$ is a positive real number. So it must be 4.

So we need to represent complex numbers and their arithmetic on the plane. A complex number a + bi corresponds to the point (a, b) on the plane. Draw the line segment between the point and the origin. This line segment has a length, r, and makes an angle, θ, with the x-axis:

[Figure: the point a + bi in the plane, joined to the origin (0,0) by a segment of length r at angle θ to the x-axis.]

To take the cube root of a + bi simply take the point whose associated angle is θ/3 and whose length is $\sqrt[3]{r}$:

[Figure: the points a + bi and (a + bi)^(1/3) in the plane, each joined to the origin (0,0).]

Now suppose we’ve got $z = (a + bi)^{1/3}$, $w = (a - bi)^{1/3}$.
w is a reflection of z about the x-axis, and to add them together (you saw this in linear algebra) you complete the parallelogram, getting a picture like this (if a, b > 0):

[Figure: z = (a+bi)^(1/3) above the x-axis, w = (a-bi)^(1/3) below it, and their sum z + w on the positive x-axis, forming a parallelogram with the origin (0,0).]

I.e., the sum z + w is a positive real number.

Since 2, 11 > 0, $\sqrt[3]{2 + 11i} - \sqrt[3]{-2 + 11i}$ is a positive real number. (You can also check the value directly: $(2 + i)^3 = 2 + 11i$ and $(2 - i)^3 = 2 - 11i$, so the sum is (2 + i) + (2 − i) = 4.)

11 Classifying numbers

So far we’ve talked about integers (including a section on primes), reals, and complex numbers. The purpose of this note is to talk about real and complex numbers more carefully, dividing them into different classes than we have before.

1. Rational vs. irrational

We’ve already talked about rational and irrational numbers. A rational number is a fraction (ratio) of the form n/m where n, m are integers and m ≠ 0. An irrational number is a real number which is not rational. For example (a very famous example) √2.

An important characterization is the following:

Theorem 8. A real number r is rational iff its decimal expansion repeats.

Proof. Suppose r = n/k and use the long division algorithm. There are only finitely many possible remainders (0, 1, ..., k − 1), so eventually you start repeating yourself. On the other hand, if you’re in base b and the expansion repeats, say it’s $a_1 \dots a_l . a_{l+1} \dots a_t \overline{d_1 \dots d_m}$, then the repeating block contributes a shifted copy of $0.\overline{d_1 \dots d_m} = \frac{d_1 \dots d_m}{b^m - 1}$ (the numerator is the whole number with base-b digits $d_1 \dots d_m$), which is rational, and the non-repeating part is a finite sum of fractions, so r is rational.

For example, $2.3\overline{56} = 2.3565656...$ is rational. But 2.02002000200002... [keep adding one more zero before the next 2] is not rational.

This characterization is not dependent on the number base. Any number base will work. Thus, 2.02002000200002... [keep adding one more zero before the next 2] in base 3 (where it equals $2 + 2/3^2 + 2/3^5 + 2/3^9 + \dots$) is not rational, nor is 2.02002000200002... [keep adding one more zero before the next 2] in base 4 (where it equals $2 + 2/4^2 + 2/4^5 + 2/4^9 + \dots$) rational, etc.

It wasn’t until the mid-18th century that Lambert proved that π is not rational, nor is $e^q$ where q is rational, q ≠ 0.
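The long-division half of the proof of Theorem 8 can be acted out directly. The sketch below (the function name `expand` and its return format are inventions for this illustration) divides n by k in a chosen base and watches the remainders: as soon as a remainder comes back, the digits from its first appearance onward form the repeating block.

```python
def expand(n: int, k: int, base: int = 10):
    """Long division of n / k (n >= 0, k > 0) in the given base.

    Returns (integer_part, prefix_digits, repeating_digits); the
    repeating list is empty when the expansion terminates.
    """
    integer_part, rem = divmod(n, k)
    digits = []
    first_seen = {}                  # remainder -> index of the digit it produced
    while rem != 0 and rem not in first_seen:
        first_seen[rem] = len(digits)
        digit, rem = divmod(rem * base, k)
        digits.append(digit)
    if rem == 0:
        return integer_part, digits, []
    start = first_seen[rem]          # the cycle begins where this remainder first appeared
    return integer_part, digits[:start], digits[start:]


if __name__ == "__main__":
    print(expand(1, 7))        # 1/7: six digits repeat
    print(expand(2333, 990))   # 2.3565656...
    print(expand(1, 4))        # terminates
    print(expand(1, 3, base=2))
```

Because there are at most k − 1 nonzero remainders, the loop runs at most k − 1 times before either stopping at remainder 0 or finding a repeat.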
It turns out that the rationals and irrationals look different geometrically. In topology, the rational numbers are a countable dense linear order with no endpoints, and any other such space looks exactly like (is homeomorphic to) the rational numbers. Meanwhile, the irrational numbers turn out to be homeomorphic to the product of countably many countable discrete spaces. This is all late 19th and early 20th century stuff.

2. Algebraic vs. transcendental

√2 may be irrational, but it still is easily described: x = √2 iff x > 0 and $x^2 = 2$. Similarly, i satisfies the equation $x^2 = -1$. Numbers, whether real or complex, which are roots of nonzero polynomials with integer coefficients are called algebraic. Numbers which are not algebraic are called transcendental. There are only countably many algebraic numbers.[63] I.e., almost all real or complex numbers are transcendental.[64]

Any sum, difference, product, or quotient of algebraic numbers is algebraic (as long as you don’t divide by 0). You can find a sketch of a proof at http://en.wikipedia.org/wiki/Resultant; go to Applications.

Most transcendental numbers can’t be described (because language is countable[65]) but some of them can. For example, π, and $e^q$ where q is rational non-zero have nice descriptions that you learned in, respectively, elementary school and your first calculus course.

[63] We’ll talk about countable soon. Right now, you just need to know it’s the smallest kind of infinity.
[64] I.e., there are uncountably many real or complex numbers, see below.
[65] see previous footnote

It wasn’t until the late 19th century that Hermite proved e was transcendental, and Lindemann proved the same for π. Lindemann’s proof was improved over the next 10 or 20 years by a number of mathematicians, including Weierstrass, Hilbert, and, finally, Hurwitz and Gordan, who made it elementary enough for people to not try to simplify further.

Here’s a nice question: is e + π algebraic? You’d think we’d know the answer by now, but we don’t.
We also don’t know if e − π is algebraic. But we do know that at least one of e + π, e − π is not algebraic. Why? Let a = e + π, b = e − π. Then a + b = 2e, which is transcendental. If a and b were both algebraic, their sum would be algebraic too. So at least one of a, b is transcendental.

Note: The proofs that π and $e^q$ for rational non-zero q are irrational and the later proofs that they are transcendental involve quite a bit of analysis, and in fact the people who proved these theorems were among the major people who developed analysis.

3. What is a real number anyway?

The set of rationals, thought of as sitting on the real line, has holes, lots of them. For example, √2 is not rational; it represents a hole in the rationals. Same for π, e, etc. Any irrational number is, in some sense, a hole in the rationals.

How do we fill these holes? In the 19th century, this was an urgent question, since it was equivalent to the question: what is a real number anyway? A number of approaches were given (all of them equivalent, in the sense that they all give us the same set of real numbers).

First, we’ll use the approach of Cauchy sequences. A sequence $\{a_n : n \in \mathbb{N}\}$ is a Cauchy sequence iff its elements get closer and closer to each other. More precisely, $\{a_n : n \in \mathbb{N}\}$ is Cauchy iff for every ε > 0 there is N ∈ ℕ so that if N < n < m then $|a_n - a_m| < \varepsilon$. I.e., no matter how close you want the elements in your sequence to get (that’s the ε: we want the elements to be within ε of each other) you can get far enough out (that’s the N) so that any two elements that far out are within ε distance of each other.

Every time we have a Cauchy sequence, it should converge to some point on the number line, i.e., to a real number. Because of the holes, not every Cauchy sequence converges to a rational number. So we just say: okay, if a Cauchy sequence doesn’t converge to a rational number, fill the hole with a real number.
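To see a hole being approached from inside the rationals, here is a sketch of one Cauchy sequence of rationals whose squares close in on 2. The choice of sequence is an assumption made for illustration (it is the averaging rule x → (x + 2/x)/2, the Babylonian square-root procedure), and the names `sqrt2_approximations` and `tail_spread` are invented here. Exact fractions are used so that every term really is a rational number.

```python
from fractions import Fraction
from itertools import combinations


def sqrt2_approximations(steps: int) -> list[Fraction]:
    """Rationals 1, 3/2, 17/12, ... produced by x -> (x + 2/x) / 2."""
    x = Fraction(1)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms


def tail_spread(terms: list[Fraction], start: int) -> Fraction:
    """Largest |a_n - a_m| over all pairs of terms with index >= start."""
    tail = terms[start:]
    return max(abs(a - b) for a, b in combinations(tail, 2))


if __name__ == "__main__":
    terms = sqrt2_approximations(6)
    for k, x in enumerate(terms[:5]):
        print(k, x, float(x * x))
    # The terms bunch together (the Cauchy condition on this finite stretch) ...
    print(float(tail_spread(terms, 3)))
    # ... yet no term squares to exactly 2, so the limit is not one of them.
    print(any(x * x == 2 for x in terms))
```

A finite computation can only illustrate the definition, not prove it; the point is that the terms crowd together around a location where no rational number sits.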
The problem with this construction is that you can have lots of Cauchy sequences which converge to the same number, and you have to have some way of filling the hole only once. You use equivalence classes for this, and it’s kind of messy. The basic idea is to say that two Cauchy sequences are equivalent iff they converge to the same number. But that’s a little circular. So you have to come up with a definition (similar to the definition of a single Cauchy sequence) of when two Cauchy sequences are equivalent. This is left to the reader.

A second way of filling holes is very simple. Instead of worrying about Cauchy sequences, we worry about intervals of rationals which go all the way down to −∞, are bounded above, but don’t have a maximum element. For example {q ∈ Q : q < 2}. We call these Dedekind cuts. Then we define, for every Dedekind cut, a least upper bound. It’s the set of these least upper bounds of Dedekind cuts that make up the real line.[66]

[66] Technically, the Dedekind cuts themselves are the real line, but this is a little strange to anyone but a set theorist...

12 Infinity

What does “infinite” mean? How do you measure the size of an infinite set? Do line segments of different lengths have the same number of points? Are there more points in a big line segment or a small triangle? These questions are hard, and have occupied both mathematicians and philosophers for millennia (and still do).

To make things a bit more concrete, consider two line segments of lengths 1 and 2; for example, the intervals A = [0, 1] and B = [0, 2] on the number line. (Strictly speaking, we’re thinking of each of A and B as a set of real numbers. For example, π/4 ∈ A, π/2 ∈ B, and π/2 ∉ A.)[67] On the one hand, A is a proper subset of B (that is, every point in A is also in B, but not vice versa), so it is plausible that A should have fewer points than B does. On the other hand, both sets are clearly infinite, so they ought to have the same size.
Actually, neither of these arguments is correct. Just because A is a subset of B does not mean that A has fewer points than B. In fact, these two intervals have exactly the same size. On the other hand, something even more mind-boggling is true: there are lots of different sizes of infinite sets. In fact, there are infinitely many different infinities!

The ancient Greeks (and the ancient Babylonians, recall the Babylonian procedure for approximating a square root, and others in the ancient world) were familiar with infinite processes, which they thought of as potentially infinite (you could go as far as you want but couldn’t reach the end). But they did not think of infinity as a number, as something you could use to measure things (“this infinite set is bigger than that one”). It was illegal to use infinity in certain ways when doing mathematics.[68] Aristotle codified this by distinguishing between “actual infinity” (which doesn’t exist) and “potential infinity” (which does).

An example of this is Euclid’s theorem on prime numbers (see “Prime numbers” from the course notes). Today, we would phrase the theorem as “There are infinitely many prime numbers,” or “The set of all prime numbers is infinite,” but Euclid said, “No finite set of prime numbers can possibly be the complete list of all prime numbers.” Another kind of phrasing you might see in ancient mathematical writings is “No finite set of prime numbers exhausts all the prime numbers.”

The Greeks found ways to do mathematics without resorting to infinity. For example, let’s consider Archimedes’ “quadrature of the parabola”, developed in chapter 3, whose ideas anticipate modern integral calculus, but are phrased differently. To summarize, Archimedes showed that each triangle in the diagram on p. 16 has one-eighth the area of its parent triangle.

[67] Remember, the symbols ∈ and ∉ mean “is a member of” and “is not a member of” respectively.
[68] At times (as recently as the late 19th century) this issue had religious overtones: only God could conceive of an infinite object, and it was blasphemy to think that a human being could do so.

So the total area covered in the first n steps is

$$\alpha\left(1 + 2 \cdot \frac{1}{8} + 4 \cdot \frac{1}{64} + \cdots + 2^{n-1} \cdot \frac{1}{8^{n-1}}\right) = \alpha\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots + \frac{1}{4^{n-1}}\right).$$

This is a partial sum of a geometric series; it is equal to

$$\frac{4\alpha}{3}\left(1 - \left(\frac{1}{4}\right)^n\right). \qquad (5)$$

A modern mathematician would say that the area of the parabolic segment is therefore

$$\mathrm{Area}(P) = \lim_{n \to \infty} \frac{4\alpha}{3}\left(1 - \left(\frac{1}{4}\right)^n\right) = \frac{4\alpha}{3}.$$

But Archimedes didn’t have tools like limits. Instead, he reasoned as follows:

• Any number x < 4α/3 must be less than the area of P, because if you repeat this process enough times, the area covered by the triangles eventually exceeds x.
• Any number y > 4α/3 must be greater than the actual area of P, because no matter how many triangles you draw, the area they cover (namely the quantity in formula (5)) is always less than 4α/3.
• Therefore, the area of P must be exactly 4α/3.

This reasoning is perfectly correct, and actually contains some fairly deep ideas about limits and sequences hidden inside it. But notice how it’s phrased so as to avoid any explicit mention of infinity, or limits, or convergence, or infinitesimals.

Now let’s look at the question of how you can decide whether one infinite set is bigger than another. The fact that we use a singular noun, infinity, tells us that people were conflating all infinite sets. Until the late 19th century and the work of Georg Cantor, everyone thought that all infinite sets had the same size. Cantor’s realization that this is false was a pioneering one. To do this, he had to precisely define what “same size” and “smaller size” meant.

Suppose that F and G are two sets. How can we tell if F and G are the same size? For that matter, what does “size” (or “cardinality”, which is the technical term) mean?
To say that F has n elements (symbolically, |F| = n) is to say that there’s a function q : F → {1, 2, . . . , n} that is one-to-one and onto. Such a function is called a bijection. So for finite sets F, G, |F| = |G| iff there exist bijections q : {1, 2, . . . , n} → F, r : {1, 2, . . . , n} → G for some nonnegative integer n.

On the other hand, we don’t need to know the actual size of two sets to know that they are the same cardinality. We can just say

Definition 1. |F| = |G| iff there exists a bijection b : F → G; |F| ≤ |G| iff there is a 1-1 map from F into G.

In other words, |F| = |G| iff it is possible to pair the elements of F with the elements of G so that every element of F has exactly one “mate” in G and vice versa; |F| ≤ |G| iff it’s possible to pair each element of F with some element of G so that no two elements of F have the same “mate” (but you could have some elements of G left over).

Our instinct that “subset” means “smaller” isn’t totally off the mark. From definition 1 it’s immediate that

Fact 1. If F ⊆ G then |F| ≤ |G|.

The definition of “same size” and “smaller size” in definition 1 has the advantage that it applies to infinite sets as well. A good way to think about a bijection between two sets is as a way of labelling the elements of one set with the elements of the other set, using each label exactly once.

Under this definition, the intervals [0, 1] and [0, 2] have the same cardinality, because there is a bijection between them, namely f : [0, 1] → [0, 2] defined by f(x) = 2x. It doesn’t matter that the first interval is a proper subset of the second. In fact, the rule “If A is a proper subset of B, then |A| < |B|” is guaranteed to hold only when A is finite.

This definition, while sensible, has a number of striking consequences. For example, the function f(x) = 2x defines a bijection between the infinite sets N = {0, 1, 2, 3, . . . }, E = {0, 2, 4, 6, . . . }.
So |N| = |E|, even though there are infinitely many numbers that are in N but not in E. Similarly, |N| = |Z|, as you can see by listing Z as {0, −1, 1, −2, 2, −3, 3, . . . }.

Comparing a set to N is of some interest:

Definition 2. X is countable iff X is finite or |X| = |N|. X is uncountable iff |N| < |X|.

So E and Z are countable. What about Q and R? Definition 1 is due to Georg Cantor, who also proved the following theorem:

Theorem 9. |N| = |Q| < |R|.

In other words: Q is countable and R is uncountable. Or, to put it another way: there are the same number of rational numbers as non-negative integers, but there are more real numbers than rational numbers.

Proof. First we show that |N| = |Q|. We begin by fixing a way of writing each rational number. For each q ∈ Q with q ≠ 0, we assign integers n_q, m_q with n_q > 0 so that q = n_q/m_q and n_q, m_q have no common divisor except 1. (I.e., q is in lowest terms, with the sign carried by the denominator.) We define a 1-1 onto function f : N → Q by induction, as follows: f(0) = 0/1 and f(1) = 1/1. Suppose we know f(k) = n/m for some k ≥ 1. If m > 0, let f(k + 1) = n/(−m). If m < 0, we ask: is there some integer i with 0 < i < n so that n − i and |m| + i have no common divisor except 1? If so, let i* be the smallest such, and let f(k + 1) = (n − i*)/(|m| + i*). If not, let f(k + 1) = (n + |m|)/1.

f is 1-1, because each rational number has exactly one representation in lowest terms. An inductive argument (not given here) shows that f is onto.

Another way to understand this proof (we'll just do the non-negative rationals) is to follow the path through the diagram below, which visits 0, 1, 2, 1/2, 3, 1/3, 4, 3/2, 2/3, 1/4, 5, . . . :

    0    1    2    3    4    5    6   ...
         1/2       3/2       5/2      ...
         1/3  2/3       4/3  5/3      ...
         1/4       3/4       5/4      ...
         1/5  2/5  3/5  4/5       6/5 ...
         1/6                 5/6      ...

You go down each successive finite diagonal, from right to left. When you hit the left margin, scoot up to the first number in the top row you haven't hit yet. Be sure to skip all fractions not in lowest terms (I simply left them out of the diagram). Repeat.
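The diagonal walk can also be carried out mechanically. Here is a minimal Python sketch of it, not the authors' construction verbatim: the function names are made up for illustration, and following each positive rational immediately by its negative is just one convenient way to reach all of Q.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def nonnegative_rationals():
    """Yield 0, 1, 2, 1/2, 3, 1/3, 4, 3/2, ... one diagonal at a time."""
    yield Fraction(0)
    total = 2  # numerator + denominator is constant along each diagonal
    while True:
        for denominator in range(1, total):
            numerator = total - denominator
            # Skip fractions that are not in lowest terms (e.g. 2/2).
            if gcd(numerator, denominator) == 1:
                yield Fraction(numerator, denominator)
        total += 1


def all_rationals():
    """Yield every rational once: each positive q is followed by -q."""
    for q in nonnegative_rationals():
        yield q
        if q != 0:
            yield -q


first = list(islice(all_rationals(), 13))
print(", ".join(str(q) for q in first))
# 0, 1, -1, 2, -2, 1/2, -1/2, 3, -3, 1/3, -1/3, 4, -4
```

The position at which a rational appears in this list is its label in N. No value is ever repeated, because each rational has only one representation in lowest terms.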
To get from the non-negative rationals to all of Q, consider the following table:

Table 1: Counting Q

    k     0   1   2   3   4   5     6     7   8   9     10    11  12  13    14    15    16    ...
    f(k)  0   1   −1  2   −2  1/2   −1/2  3   −3  1/3   −1/3  4   −4  3/2   −3/2  2/3   −2/3  ...

We've just proved |N| = |Q| twice.

Now we give Cantor's proof, by contradiction, that |N| < |R|, i.e., that R is uncountable.[69] For technical reasons, we'll show that X = {x ∈ [0, 1] : 9 does not appear in the decimal expansion of x} is uncountable. Since X ⊂ R, by Fact 1 we'll have shown that R is uncountable.[70]

[69] This method of proof is called a diagonal argument.

[70] The technical reason is that if we include 9's the decimal representation isn't unique, e.g., 0.29999... = 0.3. And that will mess up our proof.

Suppose that f : N → X is any function (recall that X ⊆ [0, 1]). Make a table of values of f, where the 1st row contains the decimal expansion of f(1), the 2nd row contains the decimal expansion of f(2), . . . , the nth row contains the decimal expansion of f(n), . . . Perhaps the table starts out like this:

    n    f(n)
    1    0.3141562653...
    2    0.3735173737...
    3    0.1427071428...
    4    0.7071067811...
    5    0.3758800000...
    ...  ...

Of course, only part of the table can be shown on a piece of paper; it goes on forever down and to the right.

Can f possibly be onto? That is, can every number in X appear somewhere in the table? In fact, the answer is no: there are lots and lots of numbers that can't possibly appear! For example, let's highlight the digits in the main diagonal of the table (shown here in brackets).

    n    f(n)
    1    0.[3]141562653...
    2    0.3[7]35173737...
    3    0.14[2]7071428...
    4    0.707[1]067811...
    5    0.3758[8]00000...
    ...  ...

The highlighted digits are 0.37218 . . . . Suppose that we add 1 to each of these digits (because of our technical condition, we are adding them mod 9, so if a digit is 8, "adding 1" gives us 0) to get the number 0.48320 . . . . Now, this number can't be in the table.
Why not? Because

• it differs from f(1) in its first digit;
• it differs from f(2) in its second digit;
• ...
• it differs from f(n) in its nth digit;
• ...

So it can't equal f(n) for any n; that is, it can't appear in the table.

This looks like a trick, but in fact there are lots of numbers that are not in the table. For example, we could subtract 1 from each of the highlighted digits (changing 0's to 8's), getting 0.26107 . . . ; by the same argument, this number isn't in the table. Or we could subtract 3 from the odd-numbered digits and add 4 to the even-numbered digits. Or we could even highlight a different set of digits:

[Table: the same table of values of f(n), with a different set of digits highlighted.]

As long as we highlight at least one digit in each row and at most one digit in each column, we can change each of the digits to get another number not in the table. Here, if we add 1 to all the highlighted digits, we end up with 0.042081 . . . , and, again, this is a real number that does not equal f(n) for any positive integer n.

What is the point of all this? Precisely that the function f can't possibly be onto: there will always be (infinitely many, in fact uncountably many) missing values. Therefore, there does not exist a bijection between N and X, so X is uncountable, and with it [0, 1] and R.

The basic idea of this proof, the diagonal argument, can be applied in other contexts.

Definition 3. If S is a set, then the power set P(S) is defined as the set of all subsets of S.

For example, if S = {1, 3, 4}, then

    P(S) = { ∅, {1}, {3}, {4}, {1, 3}, {1, 4}, {3, 4}, {1, 3, 4} }.

When S is finite, it's not hard to see that |P(S)| = 2^|S| (because to choose a subset R of S, you need to decide whether each element of S does or does not belong to R, and there are 2^|S| many such choices). In the above example, |S| = 3 and |P(S)| = 8 = 2^3. What about infinite sets?
Using a version of Cantor's argument, it is possible to prove the following theorem:

Theorem 10. For every set S, |S| < |P(S)|.

Proof. The map sending each s ∈ S to the one-element set {s} is 1-1, so |S| ≤ |P(S)|. Now let f : S → P(S) be any function and define X = {s ∈ S : s ∉ f(s)}. Is it possible that X = f(s) for some s ∈ S? If so, then either s belongs to X or it doesn't. But by the very definition of X, if s belongs to X then it doesn't belong to X, and if it doesn't then it does. This situation is impossible, so X cannot equal f(s) for any s. Just as in the original diagonal argument, this proves that f cannot be onto.

To give a sense of how the proof works, here is a finite example: If S = {1, 2, 3, 4}, then perhaps f(1) = {1, 3}, f(2) = {1, 3, 4}, f(3) = ∅ and f(4) = {2, 4}. In this case X does not contain 1 (because 1 ∈ f(1)), X does contain 2 (because 2 ∉ f(2)), X does contain 3 (because 3 ∉ f(3)), and X does not contain 4 (because 4 ∈ f(4)), so X = {2, 3}.

As a corollary to Theorem 10, the set P(N), whose elements are all sets of non-negative integers, has more elements than N itself; that is, P(N) is uncountable. As a consequence of this result, the sequence of infinite sets

    N, P(N), P(P(N)), P(P(P(N))), . . .

must keep increasing in cardinality. That is, there are infinitely many different sizes of infinity![71]

The diagonal argument is also useful in recursion theory and theoretical computer science, and is the heart of the proof of Gödel's incompleteness theorem, which, roughly, says that you can't know everything: if you have a way of knowing all of the axioms of a mathematical system, and if the system is complicated enough to produce very basic number theory, then there are statements which cannot be proved true and cannot be proved false in the system.

[71] And it gets worse, because we've only indicated how to generate countably many different sizes of infinity. There are many many many many more than many lots more than many more.
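The finite example in the proof of Theorem 10 is small enough to check by machine. The Python sketch below is only an illustration (the helper names `power_set` and `diagonal_set` are made up here): it rebuilds X for the sample f, and then, for a three-element set, tries every one of the 8^3 = 512 possible functions into the power set and confirms that the diagonal set is always missed.

```python
from itertools import combinations, product


def power_set(s):
    """Return all subsets of s as a list of frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def diagonal_set(f):
    """Return X = {s : s is not in f(s)} for a dict f from elements to sets."""
    return frozenset(s for s, image in f.items() if s not in image)


# The example from the text.
S = {1, 2, 3, 4}
f = {1: {1, 3}, 2: {1, 3, 4}, 3: set(), 4: {2, 4}}
X = diagonal_set(f)
print(sorted(X))                                # [2, 3]
print(len(power_set(S)))                        # 16, which is 2**4
print(any(X == image for image in f.values()))  # False: X is not a value of f

# Brute force on a 3-element set: no function into the power set hits its own X.
small = [1, 2, 3]
subsets = power_set(small)
never_onto = all(
    diagonal_set(dict(zip(small, images))) not in images
    for images in product(subsets, repeat=len(small))
)
print(never_onto)                               # True
```

For a finite set this is no surprise, since 2^n > n, but the check uses only the definition of X, which is exactly what carries over to infinite sets.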
13 The origins of graph theory

In the 18th century, the city of Königsberg in Prussia lay along the Pregel River. The river has several branches, which divided the city into four districts connected by seven bridges,[72] as in the figure shown below (rivers in blue, bridges in red).

[72] Not any more. Two of the bridges were destroyed in World War II, two have been demolished, two more bridges have been built. Königsberg is now Kaliningrad, Russia and the Pregel is now the Pregolya river.

A longstanding puzzle for the residents of Königsberg was as follows: Is it possible to design a stroll around the city which would cross each of the seven bridges exactly once?

Here's another problem: Draw a figure made out of line segments: a square, or a triangle, or a square inside another square, or any of the examples below or anything similar. Is it possible to draw that figure without ever (a) picking your pencil up off the paper or (b) retracing any segment you've already drawn? A drawing with these properties is called "unicursal", and, in fact, the Königsberg bridge problem is a problem about a particular unicursal drawing, albeit differently stated. And both problems are about graphs: not a curve like y = x^2, but a collection of points and lines, defined below in Definition 4.

In 1736, the great Swiss mathematician Leonhard Euler solved the Königsberg bridge problem. Euler's key insight was that the islands and bridges could be modeled by a simple mathematical structure called a graph. Graph theory, the theory of Euler's kind of graphs, has since developed into an extremely beautiful and useful area of mathematics, with deep theorems and surprisingly diverse applications.

Definition 4. A graph G consists of a set of vertices and a set of edges, where each edge connects two vertices.

For example, the map of Königsberg can be represented by a graph with four vertices a, b, c, d, representing the districts of the city, and seven edges 1, . . . , 7, representing the bridges. Edge #1 connects vertices a and b; edge #2 connects vertices b and c; etc.

[Figure: the map of Königsberg with districts a, b, c, d and bridges 1 through 7, next to the corresponding graph with vertices a, b, c, d and edges 1 through 7.]

It looks like there's a mistake in the figure: shouldn't edges 2 and 4 be interchanged? (After all, in the map, bridge 2 is west of bridge 4.) Actually, it doesn't matter. The definition of a graph has nothing to do with location; the only information the graph knows is the names of the vertices and edges and which is attached to which. So as long as both these edges connect the same pair of vertices (namely b and c), it doesn't matter how we draw them.

Here are the graph-theoretic definitions we need to talk about the Königsberg bridge problem:

Definition 5. The degree of a vertex in a graph is the number of edges attached[73] to that vertex.

For example, vertex a has degree 3 (because it is attached to edges 1, 3, and 5) and vertex b has degree 5 (because it is attached to edges 1, 2, 3, 4, 6).

[73] There's one slight complication. It is permissible for an edge to connect a vertex to itself; such an edge is called a loop. Many of the graphs we want to study don't have any loops. For example, no bridge connects one of the districts of Königsberg to itself. However, if an edge connects a vertex to itself, we usually think of that edge as contributing 2, not 1, to the degree of the vertex. The reason for this will become clear later on.

Definition 6. An Eulerian path in a graph is a way to walk through the vertices of a graph, one edge at a time, so as to traverse every edge exactly once. (So an Eulerian path can be thought of as an order for the set of edges.) An Eulerian circuit is an Eulerian path whose starting vertex is the same as its ending vertex.

For example, in the graph below, the sequence of edges 1,2,3,4,5,6 (pictured) and the sequence 2,6,3,4,5,1 (not pictured) form Eulerian paths. They are not Eulerian circuits because the starting and ending vertices are not the same.

[Figure: a graph with edges labeled 1 through 6, drawn twice; one copy shows the Eulerian path 1,2,3,4,5,6.]

The Königsberg bridge problem is simply this: If G is the graph whose vertices are regions of Königsberg and whose edges are the bridges, then does G have an Eulerian path or an Eulerian circuit? Here is Euler's answer.

Theorem 11. If G is a connected[74] graph, then:

• If G has no vertices of odd degree, then G has an Eulerian circuit.
• If G has 2 vertices of odd degree, then G has an Eulerian path but no Eulerian circuit.
• If G has 4 or more vertices of odd degree, then G has no Eulerian path (or Eulerian circuit).

Here's part of Euler's reasoning. Suppose that a graph G has an Eulerian path P. If v is a vertex that is neither the first nor last vertex of P, then P must enter v exactly as many times as it leaves v. Since every edge incident to v is traversed exactly once, this means that the number of such edges, that is, the degree of v, must be even. Therefore, G has at most two vertices of odd degree, namely the first and last vertices of P.

On the other hand, if P is actually an Eulerian circuit (not just an Eulerian path), then the first and last vertices are the same vertex x, and in fact x has even degree (because, again, P enters x exactly as many times as it leaves x). So in this case G has no vertices of odd degree.

This argument is not a complete proof; it is still necessary to show that if G has zero or two odd-degree vertices, then G really does have an Eulerian circuit or Eulerian path respectively. (Another way of saying this is that we have to rule out the possibility of a connected graph that happens to have, say, zero odd-degree vertices but happens to have no Eulerian circuit.) But this can be done.

There are a couple of obvious missing cases. What if G has one or three odd vertices? Investigating that is a homework problem.

By the way, remember that a loop contributes 2, not 1, to the degree of the vertex x to which it is attached.
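Theorem 11 turns the Königsberg question into a count of odd-degree vertices, with loops counted twice. A minimal Python sketch of that count follows. The text pins down edges 1 through 4 (1 and 3 join a and b; 2 and 4 join b and c) and forces edge 7 to join c and d; the endpoints used below for edges 5 and 6 are an assumption, chosen to match the stated degrees of a and b. The function names are illustrative, and the classifier takes connectedness for granted rather than checking it.

```python
from collections import Counter


def degrees(edges):
    """Map each vertex to its degree; a loop (u, u) adds 2 to the degree of u."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def euler_classification(edges):
    """Apply Theorem 11 to a graph that is ASSUMED to be connected."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    if odd == 0:
        return "Eulerian circuit"
    if odd == 2:
        return "Eulerian path but no Eulerian circuit"
    return "no Eulerian path"


# Bridges 1..7 in order. Endpoints of bridges 5 and 6 are an assumption.
konigsberg = [("a", "b"), ("b", "c"), ("a", "b"), ("b", "c"),
              ("a", "d"), ("b", "d"), ("c", "d")]

print(dict(degrees(konigsberg)))          # {'a': 3, 'b': 5, 'c': 3, 'd': 3}
print(euler_classification(konigsberg))   # no Eulerian path

# Adding a loop at a raises its degree by 2, so the parity count is unchanged.
print(euler_classification(konigsberg + [("a", "a")]))  # no Eulerian path
```

All four districts have odd degree, so the stroll the residents wanted does not exist.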
(See the footnote about loops.) This rule makes sense in light of Euler's theorem. On the one hand, adding a loop at x doesn't affect the existence or non-existence of an Eulerian path or circuit: just insert the loop into an Eulerian path whenever you're standing at x. On the other hand, if the loop contributes 2 to the degree of its vertex, then ignoring loops doesn't affect the number of odd-degree vertices in the graph.

Why should we care about graphs? Aside from their intrinsic interest, graphs come up in all kinds of real-world situations. Here are just a few:

• The Web can be thought of as a graph, where the vertices are webpages and the edges are links.
• Facebook can be modeled as a graph: vertices are people and edges are friendships. (Likewise for "Six Degrees of Kevin Bacon.")
• Family trees are graphs: vertices represent people and edges represent relationships such as marriage or parenthood. So are the trees that evolutionary biologists use to model relationships between different species.
• The GPS device in your car uses graph theory to calculate the shortest driving route between two points. For example, edges are blocks and vertices are intersections.

[74] This means that it is possible to walk from any vertex to any other by some sequence of edges.

14 Statistics and probability

Here's a coarse outline of what we're talking about:

• data analysis: various ways of analyzing data. Visual representations are important.
• probability: theoretical probability calculates precisely what to expect from a random process; experimental probability conjectures probabilities (e.g., what to expect) from data and then uses those conjectures to calculate what to expect from a random process.
• statistics: predictions based on models, such as the normal curve, the t-distribution, and so on. I.e., given summary data from some kind of sample, you analyze the summary data according to the appropriate model and then predict things about the population.
For example, "cigarette smoking approximately doubles the risk of stroke."[75]

Probability and statistics are truly modern. They developed in tandem beginning in 17th century Europe; there really wasn't anything like them in the ancient or pre-modern world, or anywhere outside Europe until Europeans started colonizing everywhere. Data analysis, on the other hand, was necessary in many societies (although visual representations didn't develop until the late 18th and 19th centuries in Europe) and especially important in large societies with centralized government.

In these notes we'll try to trace the historical development. In some places our notes get very technical. Feel free to skip over the technical parts.

First came data analysis in the form of a census. Ancient societies would do a census of land, or of people, or of property, or of certain people (for example, free citizens only), or of certain kinds of land (for example, farmland), or of certain kinds of property, or of various combinations. The first census of which we have a record was in Egypt around 3050 BC. Several censuses are mentioned in the Bible, in both the Old and New Testaments. Here's an extract from the English Domesday Book, a census written up in 1086:

"In (North) Allerton there are 44 carucates of land taxable, which 30 ploughs can plough. Earl Edwin held this as one manor before 1066, and he had 66 villagers with 35 ploughs. To this manor are attached 11 outliers [i.e., other estates, listed in the original; we won't list them here]... Now it is in the King's hands. Waste. Value then 80 pounds. There is there, meadow, 40 acres; wood and open land, 5 leagues long and as wide."

The first thing you learn these days about data analysis is to have a clear purpose (or clear purposes) that guides your gathering of data. The second thing is to describe your data clearly. By modern standards the Domesday Book is quite jumbled and unclear. Do the "66 villagers" include women? children? old people?
Why are estates counted but not the nobility? (Surely some estates had more than one nobleman in residence.) Why count ploughs but no other kind of property? Presumably "taxable land" means farmland. What is worth 80 pounds? Does "waste" refer to all the outlying estates? Or the meadow, wood, and open land?

The next step in data analysis was gathering together parish records on deaths in order to keep track of the plague, begun by the English King Henry VIII in 1532. The data was not accurate: a parish clerk might miss a week, and then make up for it the next week by reporting two weeks' worth of data as one; families might not correctly report the cause of death out of fear of being shunned; and, because medicine was far from an exact science, an honest report could simply be wrong.

[75] From a Centers for Disease Control website, http://www.cdc.gov/tobacco/basic_information/health_effects/heart_disease/.

In the mid 17th century, John Graunt systematically studied many decades of this data, looking for patterns and trends, and using what we would now recognize as statistical processes to draw conclusions about the data. For example, he noticed that in 1625 the number of reported deaths not due to plague formed a sharp spike in the data. He found that unbelievable, and concluded that there were more plague deaths in 1625 than reported.

While much of modern statistics has come about through such fields as quality control in manufacturing (e.g., the t-distribution was discovered by the statistician William S. Gosset, who worked for the Guinness Brewery), agriculture (the USDA was established in 1862, and its Division of Statistics was formed one year later), public health (the work of Florence Nightingale), or social science (especially psychology), the first real statistical problem that was widely studied was the problem of astronomical measurement.

Even in the second century BC, astronomers knew that their measurements were not precise.
They knew that different measurements of the same event would be different because of conditions they could not control, such as the vagaries of the instrument, or of the atmosphere, or of the weather. Different astronomers dealt with this situation differently: some used what we now call the mean, others the median, others grouped data, or resorted to using ad hoc formulas (which they often didn't report)... It wasn't until the mid 18th century that the mean was the standard method of summing up repeated observations, and even then there was controversy over whether you shouldn't just take one observation and stick to it. Why complicate your thinking with all these other observations? What could they really add to our understanding?

Meanwhile, back in the mid 17th century, Fermat and Pascal were dealing with the queries of the aristocratic (and somewhat intellectual) gambler Chevalier de Méré. This was the birth of probability theory: essentially all the basic rules of probability theory came from this work, as well as quite a bit of advanced probability theory.

You are probably (excuse the expression) familiar with some basic probability. For example, the probability of flipping a fair coin and getting a tail is 1/2 and the probability of flipping two fair coins and getting two tails is 1/4. You probably know that, as a consequence, if you flip a fair coin 100 times you can expect to get about 50 tails. You may not know that if you get exactly 50 you should be suspicious of the randomness of the process.
And while you probably know that a run of 100 would be highly improbable, you may not know that you can expect runs of 3, 4, or 5 tails in a sequence of 100 coin flips.[76]

[76] A good sleight-of-hand magician can easily produce a run of 100 tails, or a run of 100 heads, etc.; coin tossing isn't really random, it just seems random because most of us can't consciously control it.

Trying to come up with probabilities for things like sequences of coin flips led to the normal distribution, which began its mathematical life as a way to calculate probabilities but which was re-interpreted by Gauss as a statistical model, and is in most common use as a statistical model for certain kinds of mass behavior. That's the bell-shaped curve (although there are many other curves that are bell-shaped): you expect a lot of stuff in the middle, and not so much at the ends (think of how much people weigh, or how tall they are, or scores on the ACT...).

The next page or so contains a technical discussion of how this came about:

Probability theorists found themselves calculating the binomial distribution, used to figure out what you could expect in a repetition of n trials when there were only two possible outcomes (for example, flipping a coin n times). There are handy formulas for the binomial distribution, and if n is small it's easy to calculate. But if n is large this is hard. So people tried to find good ways of approximating the binomial distribution. De Moivre, in 1733, proved that the probability of getting exactly n/2 + d heads in n flips is approximately

\[ \frac{2}{\sqrt{2\pi n}}\, e^{-2d^{2}/n}, \]

which means that the probability of getting between n/2 and n/2 + d heads in n flips of a coin is approximately

\[ \frac{2}{\sqrt{2\pi}} \int_{0}^{d/\sqrt{n}} e^{-2y^{2}}\, dy. \]

If you've had a calculus based statistics course, you recognize \( \frac{2}{\sqrt{2\pi n}} e^{-2d^{2}/n} \) as one of the family of curves we now call the normal distribution, and you may remember that part of the course involved using the normal distribution to approximate the binomial distribution.
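To see how good De Moivre's approximation is, one can compare it with the exact binomial probability for n = 100 flips. The following Python sketch is only a numerical illustration; the function names are made up here, and n is assumed to be even so that n/2 is a whole number.

```python
from math import comb, exp, pi, sqrt


def binomial_exact(n, d):
    """Exact probability of n/2 + d heads in n fair coin flips (n even)."""
    return comb(n, n // 2 + d) / 2 ** n


def de_moivre(n, d):
    """De Moivre's approximation: 2 / sqrt(2 pi n) * exp(-2 d^2 / n)."""
    return 2 / sqrt(2 * pi * n) * exp(-2 * d * d / n)


n = 100
for d in (0, 5, 10):
    exact = binomial_exact(n, d)
    approx = de_moivre(n, d)
    print(f"d = {d:2d}   exact = {exact:.4f}   approximation = {approx:.4f}")
```

For n = 100 the two columns agree to about three decimal places. The d = 0 row also puts a number on the earlier remark about getting exactly 50 tails: the probability is only about 8 percent.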
Given the centrality of the normal distribution to the rest of statistics this might have struck you as somewhat quaint, but in fact it is exactly the effort to approximate the binomial distribution that is the origin of the normal distribution.

Going back to astronomy, mathematicians were trying to figure out what curves could best describe the errors of observation they saw in astronomical data. Some principles were clear, for example:

1. Small errors are more likely than large errors.
2. For any real number ε, errors of magnitudes ε and −ε are equally likely.

The goal was to find a function y = φ(x) so that the probability of an error between r and s was \( \int_{r}^{s} y\, dx \). This automatically adds a third principle: the total area between the curve and the x-axis must be 1.

Using these principles, Laplace proposed two curves which had the disadvantage of either not being differentiable at 0 (\( y = \frac{m}{2} e^{-m|x|} \) for some constant m), or having a vertical asymptote at 0 (\( y = \frac{1}{2a} \ln\frac{a}{|x|} \) where −a ≤ x ≤ a). This meant that these principles were not sufficient to determine the error curve.

Gauss added a fourth principle: Given several measurements of the same quantity, the most likely value of the quantity is their average.

Using these four principles he determined that

\[ \varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/2\sigma^{2}}, \]

where σ is the standard deviation. (Actually, he didn't quite do this, since he didn't have the notion of standard deviation. Instead, he had a quantity h which he thought of as the "precision of the measurement process".) This is, of course, yet another normal distribution.

Gauss did not do this in an abstract context. On January 1, 1801, an astronomical object (in fact, the first asteroid to be noticed by humans) was discovered by the Italian astronomer Giuseppe Piazzi, who named it Ceres. To ascertain its orbit, many people observed it and recorded their observations. But six weeks later Ceres disappeared behind the sun. Where would it reappear?
Gauss suggested searching an area of the sky that differed from the one most astronomers predicted, and he was right. It was this error curve that enabled him to make the prediction.

When Gauss published his work in 1809, he claimed that his fourth principle depended on the method of least squares, what we call least squares regression. This was first published by Legendre in 1805, although Gauss claimed that he had known about it since 1795. This method is used to find a straight line that best fits the data, and since it's very technical, that's all I'll say about it.

The error curve has the property that its maximum height is at x = 0. Generalizing the formula to account for maximum heights elsewhere gives the family of normal distributions:

\[ N(\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/2\sigma^{2}}, \]

where µ is the mean and σ the standard deviation.

End of technical discussion.

In the mid 19th century, Quetelet was the first to apply the normal distribution to social science data. He was the first to hypothesize the mythical creature known as the "average man." To Quetelet, this creature was not mythological but ideal, and the rest of us simply represented deviations from this norm. If indeed the rest of us (including Quetelet) are deviations from this ideal creature, it would be important to know its dimensions. The first data set he dealt with consisted of measurements of the chest circumference of Scottish soldiers. (By modern standards this is ridiculously biased. Only Scottish? Only soldiers?) Quetelet set out to prove that the distribution of these measurements was normal. In fact he was wrong on several counts: the original measurements did not form a normal distribution; he copied several of the measurements incorrectly; and the notion of an "average man" (or woman, or bird, or fish, or earthworm, or...) compared to whom the rest of us are some kind of error was useless.
But his ideas that certain kinds of data should fit the normal distribution (later there were other distributions to fit data to) and that you should be able to prove that data fit a particular distribution, were important to the development of statistics.

The true importance of the normal distribution didn't become apparent until the end of the 19th century, when Lyapunov proved the central limit theorem (his proof was published in 1901; the theorem itself is often attributed in an early form to Laplace). The main idea is: if you take all possible samples of size n from a very large population, then for large n the distribution of the sample means is essentially a normal curve, whose mean is the population mean. (This was actually implicit in the early work on binomial probabilities.)

The explosion of applications of statistics and data analysis came not only from high-level mathematics (such as the normal distribution and the central limit theorem) but also from carefully developed ways of presenting data visually. William Playfair (late 18th century) was a leader in coming up with clever ways of presenting data. He developed the time series graph, the bar graph, and the pie chart. Florence Nightingale's famous graph of the monthly death rate during the Crimean War raised the pie chart to a level of sophistication it has seldom reached again. Playfair's circle graph (not a pie chart) was another way to present complex data in a visually clear manner. (Edward Tufte's book The Visual Display of Quantitative Information, written in the late 20th century, is the classic work on such innovative data displays.) The histogram was invented by A. M. Guerry in 1833 (although not named until 1895 in Karl Pearson's The Mathematical Theory of Evolution). The psychologist Galton invented the ogive (cumulative frequency distribution) in 1875. The history of the scatterplot is more obscure (nobody knows who invented it) but it was popularized by Galton.
In 1952 Mary Eleanor Spear essentially created the box plot, later refined by J. W. Tukey in 1977; Tukey also invented the stem and leaf graph.

And, finally, the explosion of these applications came about from the 19th century's desire to put things on a scientific, i.e., mathematical, footing, so that issues of public health and welfare, agriculture, psychology, and so on were to be decided not only on political or theological or philosophical grounds, but by looking at trends in data (data analysis) and making predictions based on the data (statistics). We are still reaping the benefits and flaws of this approach.

15 Women in mathematics

The history of women in mathematics is both more extensive than you might think, and far less extensive (due to cultural norms) than it should be. Here are brief biographies of some important women mathematicians.

Theano (? - ?, Greek)
Theano was the wife of Pythagoras. According to legend, she took over the Pythagorean cult after he died. But note that just about everything about the Pythagoreans is legend, and almost nothing is verifiable.

Hypatia (360 CE? 355 CE? - 415, lived in the Greek culture of Alexandria, Egypt)
Rather unusually, her father Theon of Alexandria, himself a major intellectual figure, encouraged her education, which was both wide and deep. She was a major orator, and recognized as a leading scholar. She worked in astronomy, astrology, and mathematics, doing major work on conic sections. Towards the end of her life, Alexandria moved from a society tolerant of many religions to one dominated by Christianity. Although Hypatia corresponded with and taught many highly placed Christians, she was seen as a symbol of pagan culture, and was pulled from her chariot and brutally murdered by Christian rioters. Once Christianity became securely dominant, she was (long after her death of course) considered a model of virtue and chastity.
Maria Agnesi (1718 - 1799, Italian)
The eldest of 21 children, she was a child prodigy, giving a speech at the age of 9 on the importance of educating women. Her major book on analysis, Analytical Institutions, appeared in 1748. It covered maxima and minima, tangent lines, inflection points, differentials, integral calculus, and differential equations. It was a major contribution, perhaps the first systematic presentation of the state of the art at the time. Highly influential, it was widely translated and used as a text. She is famous for the versed sine curve (see chapter 5), unfortunately known as the Witch of Agnesi, which was intensively studied by a number of mathematicians, including Fermat. She was elected to the Bologna Academy of Sciences, a high honor, and appointed as only the second woman professor in a European university. After her father died she gave up mathematics and spent the rest of her life studying theology and leading a life of pious service.

Sophie Germain (1776 - 1831, French)
Her parents tried to stop her learning mathematics because it was unsuitable for women. So she obtained lecture notes from the École Polytechnique and taught herself from them. When she anonymously submitted a paper on analysis to Lagrange, he was so impressed that he became her mentor. She corresponded with Gauss on number theory, in particular on Fermat's last theorem, and proved that if x^5 + y^5 = z^5 with x, y, z integers, then at least one of them is divisible by 5. (We now know there are no such triples of nonzero integers.) This was one of the major results in early 19th century number theory. She switched to the study of elastic surfaces, and in 1816 won a major prize on this work (again, her entry was anonymous). Fourier was another mentor. She was allowed to attend sessions of the Institut de France, and offered an honorary degree from Göttingen just before she died.
Ada Byron, Lady Lovelace (1815 - 1851, English)
Her father was the poet Lord Byron. Her mother, who herself loved mathematics, left him when Ada was an infant and raised her daughter to be a mathematician and scientist. Ada liked both mathematics and poetry, and, marrying into nobility, moved in both high society and intellectual society. She worked with Babbage on early computing machines (never built). In a theoretical sense, because there were no computers on which to test her ideas, she was the inventor of computer programming. She led a fairly stable life as an upper-class wife and mother, but is often seen through a romantic lens, and has appeared as or inspired fictional characters in novels, plays, cartoons, movies, and comic strips (for the latter, go to http://sydneypadua.com/2dgoggles/).

Sofia Kovalevskaya (1850 - 1891, Russian, also lived in France, Germany, and Sweden)
Born in Russia to a wealthy family, she taught herself calculus from wallpaper made from a calculus book. Because women could not get university degrees in Russia, she had to leave Russia for a university education; because Russian women could not get their own passports, she had to marry to be allowed to leave. She worked with Weierstrass in Berlin (but was not allowed to take classes from him; he tutored her privately). Her major work was in PDEs and other areas of analysis and mathematical physics. She was one of the first women to receive a PhD from Göttingen. She found permanent employment in Stockholm. Her work On the Rotation of a Solid Body about a Fixed Point won the Prix Bordin in spectacular fashion (the judges were so impressed they increased the value of the prize). A member of the political left, she moved back and forth among Germany, Russia, France, and Sweden. A major mathematician, she was also a talented writer.
At times she would suspend her work in mathematics for literature; at times she would suspend her work in literature for mathematics; at times she would do both. Her personal life was dramatic and, in many ways, tragic: she married for convenience; after many years convenience turned into deep love; she gave birth to a daughter; her husband killed himself because of financial reverses, leaving her a single mother; late in life she had a scandalous and passionate affair; until she was hired by the University of Stockholm, her financial situation was insecure; and she died of pneumonia incurred during a difficult journey in winter. She wrote a charming autobiography, Recollections of Childhood, which says almost nothing about mathematics. An asteroid is named after her.

Florence Nightingale (1820 - 1910, English)
She founded modern nursing. And she was a major figure in statistics. She did important work on how to present data visually, and also was a major pioneer in applied statistics.

Emmy Noether (1882 - 1935, German, moved to the U.S. in 1933)
She far surpassed her father, the well-known mathematician Max Noether. She was one of the (perhaps the major) founders of modern algebra and also did important work on relativity theory. The University of Erlangen wouldn’t let her take undergraduate classes but let her audit; when she passed the test for admission to doctoral study they let her become an official student. But they wouldn’t hire her after she got her PhD. So she worked for no salary at the Mathematics Institute there. She was invited by Felix Klein and David Hilbert to Göttingen, where she was an unpaid lecturer for three years (her mentor Hilbert angrily asked, “Is the university a bathhouse that it keeps women out?”) before receiving a small salary. She mentored many graduate students and did important work at Göttingen, but, being a Jew, left for the U.S. because of Hitler.
No major university in America would hire her, so she went to Bryn Mawr College where she continued influencing students and producing seminal mathematics until her death two years later from cancer. When she began, the study of algebra was focused on specific objects, such as algebraic curves. Her work embedded these notions in far more general and abstract notions, such as ideals in rings. This was a fundamental shift in the way mathematicians thought about algebra. Her life was a quiet one, devoted to mathematics and to her students, and she was greatly loved for her warmth and caring; she was unquestionably among the greatest mathematicians of the 20th century.

16 Africa, Pre-Columbian America, and U.S. minorities: an overview

Comparatively little is known about mathematics in southern Africa before the European invasions, although we know much more than we used to. For example, the Fida in Benin were able to do complex calculations without pencil and paper and essentially memorized their financial records; this sort of thing can’t be done without a sophisticated arithmetic that goes beyond basic algorithms. The Fulani Nigerian Muhammad ibn Muhammad al-Fullani al-Kishnawi (his Arabic name) travelled to Egypt and wrote a major treatise on magic squares. And design and architecture, as in every culture, showed that there was extensive study of geometry, including transformational geometry. Some information about ancient mathematics in southern Africa can be found at saxakali.com/COLOR ASP/historymaf.htm, but much of this site is about northern Africa, especially ancient Egypt. More information on mathematics south of the Sahara can be found at http://www.math.buffalo.edu/mad/AMU/amu chma 09.html#2 (the AMU is the African Mathematical Union).

As in Africa, there was significant mathematical activity in the Americas before the European invasions.
For example, recent scholarship on the Mayan calendar (done by both archeologists and mathematicians) has uncovered the complex abstract algebra notions that underlay its construction, and there is other evidence that there was a lot of sophisticated mathematics going on; scholars are still not quite sure what. A good reference on pre-Columbian American culture in general is the book 1491, and a more technical overview is found in the anthology Native American Mathematics edited by Michael Closs.

Studies of mathematics in cultures outside the web of Mediterranean/European/Arabic/south Asian/east Asian cultures belong to the field of ethnomathematics, and major scholars in the field include Michael Closs, Marcia Ascher, and Ubiratan D’Ambrosio. Claudia Zaslavsky was a popularizer in the field, and her books are fairly accessible (although necessarily not as deep).

A significant difficulty in doing such studies is the tendency to project our ways of thinking about mathematics on cultures with different ways of looking at things. For example, here is a description from the AMU web page about techniques of traditional sand drawings in contemporary (i.e., 1970s) Angola:

“After cleaning and smoothing the ground, they [i.e., the artists] first set out with their fingertips an orthogonal net of equidistant points. Now one or more lines are drawn that ‘embrace’ the points of the reference frame. By applying their method the drawing experts reduce the memorisation of a whole drawing to that of mostly two numbers (the dimensions of the reference frame) and a geometric algorithm (the rule of how to draw the embracing line(s)). Most drawings belong to a long tradition.”

I.e., undoubtedly these sand drawings are using sophisticated mathematical techniques, but the language we use to describe them is our mathematical language, the way we think about mathematics.
Whatever the artists in Angola are doing, we can be pretty sure that they are not thinking of it as “an orthogonal net”; they are almost certainly working in a different context. This is, of course, a general problem in the history of mathematics (it is essentially impossible to think the way Archimedes did), but the gap is much larger between cultures that did not particularly influence each other.

Why do we know so little? The mathematics we use comes from a complex lineage involving ancient India, ancient Greece, not-so-ancient Arabia, and semi-modern Europe. Chinese mathematics (remember the Silk Road) was not so different. But it is sometimes difficult to recognize mathematics that is not explicitly stated in terms that we relate to as mathematics, e.g., the Mayan calendar (which, from our point of view, is all about modular arithmetic). And many of these cultures embedded their mathematics in what they did, without writing it down: we know the Fida in Benin were up to something, but whatever they were up to is lost. We never thought to ask what they were doing, and now it may be too late to know.

Or maybe not. For example, there has been extensive work on trying to understand how the Maya thought about their calendar. In this way, intellectual historians are able to at least partially reconstruct ways of thought that no longer exist.

As for modern Africa, in the last 80 years or so there have been a number of excellent African mathematicians, for example: George Okikiolu from Nigeria (whose daughter Katherine Okikiolu, born in Britain and now at UCLA, is an important American mathematician); James Ezeilo, also from Nigeria; and Themba Dube from South Africa (who was a featured speaker at a conference here in June 2011). Many Africans who do serious research in math and science tend not to live in Africa, but there are more and more exceptions to this generalization.
For example, Ezeilo helped create a thriving research environment in Nigeria, and most recently was working to do the same in Swaziland; Dube has created a strong research community in South Africa. Mathematicians in modern Africa generally have high teaching loads, which makes research difficult, but that too is changing.

What about the situation for U.S. minorities? There were a handful of known prodigiously mathematically talented African American youths in the late 18th through early 20th centuries, and while some of them were highly accomplished, their later accomplishments generally were not mathematical. For example Kelly Miller, in the late 19th century, was the first African American mathematics graduate student (at Johns Hopkins). Forced to leave Johns Hopkins for financial reasons, he began teaching at Howard University (a historically black university). While there he also got an MA in mathematics and a law degree (all from Howard) but focused most of his energies on both general administration (as a dean) and especially on the relatively new discipline of sociology.

A relatively small number of African Americans received PhDs in mathematics in the first half of the 20th century, and while the number has continued to grow, the proportion remains small. The same is true of Hispanic American and Native American mathematicians, where the proportions are even smaller.

The amount of discrimination U.S. minorities faced was shocking, even when their talents were both remarkable and obvious. For example, J. Ernest Wilkins earned a PhD from the University of Chicago at the age of 19, only the eighth African-American to earn a PhD in mathematics. He was unable to get a job in a research institution, and eventually left mathematics for engineering. David Blackwell, who earned a PhD from the University of Illinois at the age of 22, could not get a job at a research institution for 13 years.
He persevered, was hired at Berkeley, and went on to receive many honors as a distinguished statistician.

Even today, much depends on a small number of relatively welcoming departments. For example, by 1945 there were 14 African-Americans with math PhDs; half of them were from the University of Michigan. This pattern of a small number of places granting a disproportionately large percentage of degrees continues to this day, with the University of Maryland (which had a prominent African-American mathematician, Ray Johnson, in its administration) and Howard University accounting for a disproportionately large number of relatively recent African-American mathematics PhDs.

1943 saw the first African-American woman to get a PhD in mathematics (Euphemia Lofton Haynes at Catholic University). The basic situation of African-American mathematicians in the 21st century, as far as numbers go, is about the same as the situation for women mathematicians in the beginning of the 20th century. The major difference is the lack of overt discrimination and stereotyping, but covert discrimination and stereotyping remain, especially at crucial early levels in K-12.

A comprehensive Web site devoted to black mathematicians is Mathematicians of the African Diaspora, founded by Prof. Scott Williams of SUNY Buffalo: http://www.math.buffalo.edu/mad/. It is encyclopedic and quite wonderful.

Unfortunately, there doesn’t seem to be a central website devoted to Hispanic or Native American mathematicians. Because race and ethnicity are social constructs, just who counts as what is problematic, and neither names nor life histories are helpful. For example, the mathematician Bob Megginson had a British father and a Sioux (or maybe part-Sioux) mother; he considers himself an Oglala Sioux. Cora Sadosky’s name is far from Hispanic, but she was Argentinean-American (she came to the U.S. after obtaining her PhD).
Further complicating matters, from the mid-19th century to the early 20th century, people who could “pass” as white sometimes chose to do so, severing later generations from their origins, so there are mathematicians with significant Native American, Hispanic, or African-American ancestry who have been raised with no or little connection to their non-European ancestry.

17 Homework

Rules of the game

If a problem (or part of a problem) is marked with an asterisk (*) you cannot use the internet in any way. For unmarked problems you can look things up.

* 1. (a) In the 6th century, the Indian mathematician Aryabhata wrote: “Half the circumference multiplied by half the diameter is the area of a circle.” Is his statement correct? Why or why not?
(b) Aryabhata also gave a method to approximate π: “Add four to one hundred, multiply by eight and then add sixty-two thousand. The result is approximately the circumference of a circle of diameter twenty thousand. By this rule the relation of the circumference to diameter is given.” What numerical value of π does this method give?

2. In chapter 3 we showed one way in which Archimedes calculated π. Here’s another way he did it. Start with a circle, inscribe a regular n-gon inside it, and circumscribe a regular n-gon outside it, as in the picture below (in general, $A_n$ is the perimeter of the circumscribed polygon, and $B_n$ is the perimeter of the inscribed polygon):

[Figure: a circle of radius r with inscribed and circumscribed squares, and the same circle with inscribed and circumscribed hexagons. Perimeter of large square: $A_4$; perimeter of small square: $B_4$; perimeter of large hexagon: $A_6$; perimeter of small hexagon: $B_6$.]

He knew that the circumference of the circle was squeezed between $A_n$, the perimeter of the large n-gon, and $B_n$, the perimeter of the small n-gon [77]. Using this, he got an estimate for π using regular 96-gons [78]. After that lengthy introduction, here’s the homework problem: find the formulas (figure it out or go online) for the perimeters $A_n$ and $B_n$ (we’ve essentially already done one of these), and explain why it works.

* 3. The year is 500 BCE.
You are a Greek mathematician, currently working for the Egyptian government as a consultant. You’ve been asked to determine the distance from a lighthouse, which stands on a rock in the sea, to the mainland (see figure).

[Figure: a lighthouse on a rock offshore; the distance from the lighthouse to the shoreline is to be determined.]

Explain to your Egyptian employers how you’re going to carry out the project, and why your method works. You can assume you know the point on shore that is closest to the rock. You can use your compass, and you can measure as many line segments as you need, as long as they are all between points on land. You can’t use a protractor since they haven’t been invented yet.
Bonus: How do you do it even if you don’t know where the nearest point on shore is? [Hint: if you need a hint, ask me the Tuesday before the problem is due.]

[77] So, with both an upper and a lower bound, Archimedes could get an idea of how good his approximation was.
[78] And no calculator.

* 4. Consider the Babylonian method of approximating $\sqrt{r}$ in chapter 8 [79].
(a) Use this to approximate $\sqrt{3}$, starting with an estimate of 1. How many iterations does it take to be accurate to within 7 decimal places? (Use your calculator to find the correct 7 decimal approximation.)
(b) What happens if you start with the really bad estimate of 5?
(c) And what happens if you start with the even worse estimate of -1? [80]
(d) Now try using this method to approximate $\sqrt{-3}$ with a starting estimate of -1 [81]. What happens?

5. (a) State the following: (i) the Goldbach conjecture; (ii) Dirichlet’s theorem on primes; (iii) the prime number theorem; (iv) the Green-Tao theorem.
(b) In the last year there has been a tremendous leap forward in our understanding of the twin prime conjecture. What happened and who did it?

* 6. Here is a test for divisibility by 3: A number is divisible by 3 iff the sum of its digits is divisible by 3.
Example: 72,465,702 is divisible by 3 because 7 + 2 + 4 + 6 + 5 + 7 + 0 + 2 = 33, which is divisible by 3, but 69,428,123 is not, because 6 + 9 + 4 + 2 + 8 + 1 + 2 + 3 = 35, which is not divisible by 3.
Here is a test for divisibility by a mystery number m: A number is divisible by m iff the alternating sum of its digits is 0. That is, 3,438,556 is divisible by m because 3 - 4 + 3 - 8 + 5 - 5 + 6 = 0, but 4,438,557 is not divisible by m because 4 - 4 + 3 - 8 + 5 - 5 + 7 = 2, which is not divisible by m.
(a) Make a conjecture: what’s m? [Hint: experiment with numbers less than 100.]
(b) Test your conjecture on the numbers 2009, 3124, 4567, 8481, and 21,870,2088. That is, for each of these numbers, form the alternating sum and see whether or not it’s 0. Then divide by your conjectured value of m from (a). Does your conjecture work out?
(c) Honors problem: prove that your conjecture works. [Hint: think of multiplication as repeated addition and use induction [82].]

7. Do a Web search on Shor’s algorithm and quantum computing.
(a) In a short sentence or two, explain what this has to do with P = NP.
(b) If indeed quantum computing becomes practical, what will have to be done differently? (Just mention two or three things briefly. For example, and this is not correct, by the way: “we will have to find an alternative to using metal in cars, and an alternative to eating broccoli.”)

[79] You can find an extended discussion at Babylon and the Square Root of 2.
[80] Which the Babylonians wouldn’t have done because, as far as we know, they didn’t know about negative numbers.
[81] Which, if you’d tried it in Babylonia, probably would have gotten you burned as a witch or something: $\sqrt{-3}$? What’s that?
[82] If you know how to use induction.

* 8. The Persian Omar Khayyam (c. 1050–1130 CE), best known as a poet, was also an outstanding mathematician.
Several centuries before del Ferro (and/or Tartaglia and/or Cardano) solved the cubic equation algebraically, Khayyam came up with a geometric solution. His solution is non-Euclidean because it involves a parabola, but it’s not hard to see that it works [83]. Khayyam considered the equation $x^3 + a^2 x = b$, where a and b are positive real numbers. His solution is as follows (in modern coordinate notation):
1. Construct the parabola with equation $x^2 = ay$ (shown in blue below).
2. Construct a semicircle with diameter $AC = b/a^2$ on the x-axis (shown in red below).
3. Let P be the point where the parabola meets the semicircle. Drop a perpendicular from P to the x-axis to find the point Q.
4. Let z be the length of segment AQ.

[Figure: the parabola $ay = x^2$ and the semicircle on diameter AC, meeting at P, with Q the foot of the perpendicular from P to the x-axis.]

Claim: z is a solution of the equation. Verify the claim by the following steps.
(a) Prove that $z^2 = a \cdot PQ$.
(b) Prove that $\frac{z}{PQ} = \frac{PQ}{QC}$. (Hint: Use similar triangles and a theorem or two from Euclidean geometry.)
(c) Use (b) to write $(PQ)^2$ in terms of a, b and z.
(d) (Before going on, take a step back and remind yourself of what Khayyam was trying to do!) Combine the equations from parts (a) and (c) to complete your verification that Khayyam’s construction is correct.

[83] Adapted from Burton’s History of Mathematics: An Introduction, pp. 300–301.

* 9. The Hilbert Hotel has infinitely (countably infinitely) many rooms.
(a) A guest wants to check in, but all the rooms are full. You are the manager. How can you accommodate the new guest and not kick anyone out? [Hint: Some guests might have to change rooms.]
(b) Now 100 new guests arrive. All the rooms are still full. You can still accommodate the new guests without kicking anyone out. How?
(c) Now infinitely (countably infinitely) many guests arrive. And all the rooms are full! Yet you can still accommodate everyone. How?
(d) Now uncountably many guests arrive, one for every real number. Can you accommodate them? Briefly explain.

10. This problem is in three steps.
* Step 1.
Do your best to write down a “random” sequence of fifteen 0’s and 1’s. No assistance from coin-tossing or computer apps or dice or anything else; you have to do it out of your own head. Write down your sequence.
Step 2. Go to random.org, random integer generator (at random.org/integers/) and generate fifteen random integers, with values between 0 and 1. Write down random.org’s sequence.
Step 3. Read this blog on one way to tell if a sequence of heads and tails is random: http://blogs.sas.com/content/iml/2013/10/09/how-to-tell-if-a-sequence-is-random/. Check your sequence and check random.org’s sequence with this test. Give me the results. What conclusion can you draw?

11. The normal distribution is a particular bell-shaped curve (look it up online). The central limit theorem (discussed in the chapter on statistics and probability) says that the distribution of the means of samples of a given size should resemble a normal distribution. The rest of this exercise is designed to explain what the previous sentence means.
Go to random.org’s random integer generator (see the previous problem) and generate a sequence of 10 random integers between 0 and 1. Take the average. For example, if your sequence is 0111010000, its average is (0+1+1+1+0+1+0+0+0+0)/10 = .4. Do this 100 times. Plot a graph in which the x-coordinates give the possible averages (0, .1, .2, .3, .4, ... 1) and the y-coordinates give how many times each value occurs (a.k.a. frequency). For example, if you got an average of .1 thirteen times, then the frequency of .1 is 13 and the point (.1, 13) would be on your graph. Now connect the points on your graph as smoothly as possible. Hand in your data (in a table like table 2) and your finished smoothly connected graph. Given the central limit theorem, does this graph surprise you?

Table 2: sample data

value    frequency
0        4
.1       8
.2       17
...      ...

12. Measure the height of the Campanile!
Do it the way the ancient Greeks would have done it: no tools other than some kind of linear measuring tool (measuring tape, ruler, etc.) and, if it’s appropriate, compass. No trig functions! Length and area calculations are okay, as are similar triangles and the Pythagorean theorem. Explain yourself clearly, including diagrams as needed (explanation and diagrams are what you will be graded on). Finally, compare your result with the official height on the KU website. (You will not be graded on how close you got, but you should tell me anyway.)

13. In the proof of theorem 11 some parts are missing. What are they? Fill at least one of them in.
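For readers who want to check their hand-collected data in problem 11, the experiment can also be simulated. The sketch below is only an illustration under stated assumptions: it uses Python's pseudorandom `random` module as a stand-in for random.org, and the function name `sample_mean_frequencies` and its parameters are hypothetical, not part of the notes. It draws 100 samples of 10 bits each and tabulates how often each average occurs, in the layout of table 2.

```python
import random
from collections import Counter


def sample_mean_frequencies(num_samples=100, sample_size=10, seed=None):
    """Tabulate how often each sample average occurs.

    Assumption: random.Random (pseudorandom) stands in for random.org.
    Returns a dict mapping each possible average (0, .1, ..., 1 when
    sample_size is 10) to its frequency among num_samples samples.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        # One sample: sample_size random bits; record how many are 1.
        ones = sum(rng.randint(0, 1) for _ in range(sample_size))
        counts[ones] += 1
    # Convert "number of ones" to the sample average ones / sample_size.
    return {k / sample_size: counts.get(k, 0) for k in range(sample_size + 1)}


if __name__ == "__main__":
    table = sample_mean_frequencies(seed=2014)
    print("value  frequency")
    for value, freq in table.items():
        # The row of '#' characters is a crude text version of the graph.
        print(f"{value:>5.1f}  {freq:>9d}  {'#' * freq}")
```

Running it with different seeds gives different tables, which is itself part of the point of the exercise: the individual counts vary, while the overall shape tends to stay the same.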
