Differentiation - Keele Astrophysics Group
1. Calculus: Differentiation

In Physics we are often interested in how a quantity changes with time or distance, or in response to changing some other physical quantity (e.g., temperature). Mathematically, we can represent such a relationship using algebraic notation. For instance, suppose an object travels a distance $y$ in a time $t$, and that $y(t) = t^2$ tells us the relationship between the two. Graphically, we can draw this on a two-dimensional plot of $y$ versus $t$.

Fig. 1.1. A relationship between distance travelled ($y$) and time ($t$). The gradient (slope) of the straight line tangent to the curve $y(t)$ at the point P is equal to the instantaneous velocity (speed plus direction, the direction being given by the sign of the slope) of the object at that point and time.

Suppose we want to work out how fast the object is travelling at any moment in time. Speed (or velocity, if we care about the sign) is defined as the rate of change of distance with respect to time: in other words, how far an object travels in a given time interval, divided by the length of that interval. But unless the relationship between distance and time is linear (i.e., it looks like a straight line when we plot $y$ against $t$), the speed is changing all the time, and a measurement of it will give different values in different time intervals. What we need in general is a way to find the instantaneous speed or velocity at an arbitrary instant. At any point this is given by the slope, or gradient, of the straight line that just touches (i.e., is tangent to) the graph of $y(t)$ at that point (see Figure 1.1).

1.1. Mathematical definition of differentiation

The process of differentiation is what tells us the instantaneous slope of the relationship between two physical quantities. Suppose we have a linear relationship $y(t) = 3t$ (sketched in the left part of Figure 1.2). This relationship has the same slope, 3, at all times. If $y$ is measured in metres and $t$ in seconds, then the velocity is a constant $3\ \mathrm{m\,s^{-1}}$.
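A short numerical sketch makes the point about time intervals concrete. The Python below is illustrative only: the helper names `average_velocity`, `linear` and `quadratic` and the one-second intervals are assumptions of this example, not part of the notes. It divides the change in position by the elapsed time over successive intervals for $y = 3t$ and for $y = t^2$.

```python
def average_velocity(y, t1, t2):
    """Average velocity over [t1, t2]: change in position divided by elapsed time."""
    return (y(t2) - y(t1)) / (t2 - t1)


def linear(t):
    """Illustrative y(t) = 3t (metres, with t in seconds): constant slope."""
    return 3.0 * t


def quadratic(t):
    """Illustrative y(t) = t^2: the slope changes with time."""
    return t ** 2


# Successive one-second intervals (an arbitrary choice for this sketch).
for t1, t2 in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]:
    print(f"[{t1}, {t2}] s: y = 3t gives {average_velocity(linear, t1, t2):.1f} m/s, "
          f"y = t^2 gives {average_velocity(quadratic, t1, t2):.1f} m/s")
```

For $y = 3t$ every interval gives 3.0 m/s, whereas for $y = t^2$ the three intervals give 1.0, 3.0 and 5.0 m/s, so no single interval measurement captures the velocity at an instant.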
But now consider the relationship $y(t) = t^2$ again (Figure 1.2, right). Here the slope, or gradient, of the tangent to the curve changes with time. The instantaneous gradient of a function at any point is known as the derivative of the function. Here, the derivative of position $y$ with respect to time $t$ is our instantaneous velocity. It is written $dy/dt$, and in general (i.e., for anything other than a linear relationship between $y$ and $t$) it is itself a function of $t$. It is evaluated by finding the slope of a straight line from a point $y(t)$ on the curve at a particular time $t$ to a second point $y(t+h)$ on the curve at a second time $t+h$, and then letting these two points come arbitrarily close to each other by taking the limit $h \to 0$ (so that $t + h \to t$ and, provided the function is continuous, $y(t+h) \to y(t)$). In this limit, the slope of the straight line between $[t, y(t)]$ and $[t+h, y(t+h)]$ approaches the slope of the tangent to the curve at time $t$.

Fig. 1.2. The plot on the left shows a linear relationship between position $y$ and time $t$, which has a constant gradient (and thus a constant velocity) equal to $+3$. The plot on the right shows a curved relationship where the instantaneous gradient of the curve changes with time, so that velocity is a function of time.

The slope of the straight line connecting any two points at times $t$ and $t+h$ is given by "rise over run":

$$\frac{\Delta y}{\Delta t} = \frac{y(t+h) - y(t)}{(t+h) - t}.$$

Taking the limit $h \to 0$ then gives the formal definition of the derivative of any function $y(t)$:

$$\frac{dy}{dt} = \lim_{h\to 0}\left[\frac{y(t+h) - y(t)}{(t+h) - t}\right] = \lim_{h\to 0}\left[\frac{y(t+h) - y(t)}{h}\right].$$

1.2. Examples of differentiation

For the relationship $y(t) = t^2$ between position and time:

$$\frac{dy}{dt} = \lim_{h\to 0}\left[\frac{y(t+h) - y(t)}{h}\right] = \lim_{h\to 0}\left[\frac{(t+h)^2 - t^2}{h}\right] = \lim_{h\to 0}\left[\frac{t^2 + 2th + h^2 - t^2}{h}\right] = \lim_{h\to 0}\left[\frac{2th + h^2}{h}\right] = \lim_{h\to 0}\,[2t + h] = 2t.$$

Thus, the velocity of the object at time $t$ in this case is just twice the value of the time. More generally, irrespective of any physical interpretation of the variable $t$ as a time or the variable $y$ as a position, we have

$$\frac{d}{dt}\left(t^2\right) = 2t.$$

Now consider $y(t) = 1/t$ instead:

$$\frac{dy}{dt} = \lim_{h\to 0}\left[\frac{y(t+h) - y(t)}{h}\right] = \lim_{h\to 0}\left[\frac{1}{h}\left(\frac{1}{t+h} - \frac{1}{t}\right)\right] = \lim_{h\to 0}\left[\frac{1}{h}\,\frac{t - (t+h)}{t\,(t+h)}\right] = \lim_{h\to 0}\left[\frac{-1}{t^2 + ht}\right] = -\frac{1}{t^2}.$$

Since $1/t = t^{-1}$ and $1/t^2 = t^{-2}$, we can equivalently (and, in fact, more conveniently) write

$$\frac{d}{dt}\left(t^{-1}\right) = -t^{-2}.$$

1.3. Standard derivatives

So far we have used the symbols $y$ and $t$ to represent the dependent and independent variables; i.e., the value of $y$ depends on the value of $t$. But the basic ideas, and the rules and definitions that follow, apply whatever symbols are being used and whatever physical quantities are being considered. It is standard to use $x$ to denote a generic independent variable, and this is the convention that we shall follow for the most part. Whether $x$ represents a time $t$, a position $y$, an angle $\theta$, a temperature $T$, etc., is a detail to be decided in the context of a specific problem.

The Keele Handbook of Mathematics, Physics and Astronomy contains listings of a large number of standard derivatives: that is, the derivatives of many elementary and some more complicated functions $f(x)$, which you do not have to prove every time you encounter them. Unless you are specifically told otherwise, you can simply look up these standard derivatives in the Handbook and use them as necessary. But it won't hurt to memorise the most basic (and most common) of them, and this will happen with several of them anyway, just through repeated use. Some of the most important standard derivatives are:

Powers of $x$

The derivative of any power of $x$ is given by

$$\frac{d}{dx}\left(x^n\right) = n x^{n-1}.$$
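The limit in this definition can also be watched numerically. In the sketch below, the helper name `difference_quotient`, the evaluation point $t = 1.5$ and the step sizes are illustrative assumptions. For $y = t^2$ the chord slope equals $2t + h$ exactly, and for $y = 1/t$ it equals $-1/(t^2 + ht)$, so both approach the limits derived above ($2t = 3$ and $-1/t^2 \approx -0.444444$) as $h$ shrinks. The last loop spot-checks the power rule for a few non-integer exponents.

```python
def difference_quotient(y, t, h):
    """Slope of the chord from (t, y(t)) to (t + h, y(t + h))."""
    return (y(t + h) - y(t)) / h


t = 1.5   # illustrative evaluation point
for h in [0.1, 0.01, 0.001, 1e-6]:
    chord_square = difference_quotient(lambda s: s ** 2, t, h)     # equals 2t + h algebraically
    chord_inverse = difference_quotient(lambda s: 1.0 / s, t, h)   # equals -1/(t^2 + t h)
    print(f"h = {h:<8g} t^2: {chord_square:.6f} (limit {2 * t})   "
          f"1/t: {chord_inverse:.6f} (limit {-1 / t ** 2:.6f})")

# The power rule d/dt t^n = n t^(n-1) also holds for non-integer n.
for n in [0.5, -1.5, 2.7]:
    chord = difference_quotient(lambda s: s ** n, t, 1e-6)
    print(f"n = {n:>4}: chord slope {chord:.5f}, n t^(n-1) = {n * t ** (n - 1):.5f}")
```

Pushing $h$ far below about $10^{-8}$ would start to lose accuracy to floating-point round-off (the subtraction $y(t+h) - y(t)$ cancels most of the significant digits). This is a limitation of computer arithmetic rather than of the definition.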
The two examples we looked at just above, where we found $dy/dt$ for $y = t^2$ and $y = 1/t$, are special cases of this rule for the integer exponents $n = 2$ and $n = -1$. In general, though, the power $n$ does not have to be an integer; it can be any real number whatsoever (but it must be a constant, not itself a function of $x$ or of any other variable).

Trigonometric functions (sine and cosine)

We very often need the derivatives of trigonometric functions, and it is important that you become very familiar with them. At the most basic level, it is enough to think about $\sin(x)$ and $\cos(x)$ only. More complicated functions, including $\tan(x)$, products and powers of trigonometric functions, and functions of arguments more complicated than just $x$, can be dealt with using the "rules" of differentiation that we discuss in Section 1.4 below.

Consider $y(x) = \sin(x)$. Its derivative is obtained from first principles as

$$\begin{aligned}
\frac{dy}{dx} &= \lim_{h\to 0}\left[\frac{y(x+h) - y(x)}{h}\right] = \lim_{h\to 0}\left[\frac{\sin(x+h) - \sin x}{h}\right] \\
&= \lim_{h\to 0}\left[\frac{\sin x\cos h + \cos x\sin h - \sin x}{h}\right] = \lim_{h\to 0}\left[\cos x\,\frac{\sin h}{h} + \sin x\,\frac{\cos h - 1}{h}\right] \\
&= \sin x \times \lim_{h\to 0}\left[\frac{\cos h - 1}{h}\right] + \cos x \times \lim_{h\to 0}\left[\frac{\sin h}{h}\right].
\end{aligned}\tag{1.1}$$

The second line here has used the relation $\sin(A+B) = \sin A\cos B + \cos A\sin B$ (which can be found in the Handbook of Mathematics, Physics and Astronomy). On the third line, there are two non-trivial limits to deal with. Both can be evaluated by looking at an isosceles triangle with two sides of length 1 and an angle $h$ (measured in radians) between these two sides, inscribed in a circle of radius 1. Let the third side of the isosceles triangle have length $H$. This set-up is shown in Figure 1.3: first for a relatively large angle $h$ and, just below that, a blow-up of the case when the angle $h$ is very small (i.e., $|h| \ll 1$ or $h \to 0$).
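Before following the geometric argument, equation (1.1) can be examined numerically. In this sketch the helper name `sine_chord_terms`, the point $x = 0.7$ radians and the step sizes are illustrative assumptions. For each $h$ the chord slope $[\sin(x+h) - \sin x]/h$ coincides with the split $\sin x\,(\cos h - 1)/h + \cos x\,(\sin h)/h$ (the addition formula makes this exact), while the two ratios drift towards 0 and 1, so the chord slope drifts towards $\cos x$.

```python
import math


def sine_chord_terms(x, h):
    """Chord slope of sin at x, and the two ratios appearing in equation (1.1)."""
    chord = (math.sin(x + h) - math.sin(x)) / h
    cos_ratio = (math.cos(h) - 1) / h     # tends to 0 as h -> 0
    sin_ratio = math.sin(h) / h           # tends to 1 as h -> 0
    return chord, cos_ratio, sin_ratio


x = 0.7   # illustrative angle, in radians
for h in [0.5, 0.1, 0.01, 0.001]:
    chord, cos_ratio, sin_ratio = sine_chord_terms(x, h)
    # The addition formula makes this split exact for every h, not just in the limit.
    split = math.sin(x) * cos_ratio + math.cos(x) * sin_ratio
    print(f"h = {h:<6g} chord {chord:.6f}  split {split:.6f}  "
          f"(cos h - 1)/h {cos_ratio:.6f}  sin(h)/h {sin_ratio:.6f}")
print(f"cos(x) = {math.cos(x):.6f}")
```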
It is apparent from the upper part of Figure 1.3 that: (1) the angle $h$ is subtended by an arc of length $h$ on the perimeter of the unit-radius circle (recalling the definition of the radian); and (2) the length $H$ of the third side of the inscribed triangle is given by (recalling Pythagoras' theorem, and $\sin^2 h + \cos^2 h = 1$)

$$H^2 = (\sin h)^2 + (1 - \cos h)^2 = \sin^2 h + \left(1 - 2\cos h + \cos^2 h\right) = 2 - 2\cos h. \tag{1.2}$$

This equality holds for any value of the angle $h$, large or small.

Turning now to the limit of very small $h \to 0$, the lower part of Figure 1.3 makes it clear that: (1) the vertical line segment of length $\sin h$ must be approximately equal to the base $H$ of the isosceles triangle; and (2) the base $H$ must in turn be approximately equal to the arc length $h$ subtended by the angle $h$. That is,

$$\text{for } |h| \ll 1:\quad \sin h \approx h \;\Longrightarrow\; \frac{\sin h}{h} \approx 1.$$

This so-called small-angle approximation, $\sin h \approx h$ for $|h| \ll 1$ (again: in radians!), is an extremely useful approximation in its own right. It is routinely employed in analyses that need not have anything to do with differentiation. It (and so, of course, $\sin h/h \approx 1$) is most definitely an approximate, not exact, equality; but it gets better and better as $|h|$ gets smaller and smaller, and ultimately, in the limit $h \to 0$, we have exactly

$$\lim_{h\to 0}\left[\frac{\sin h}{h}\right] = \lim_{h\to 0}\,[1] = 1.$$

Fig. 1.3. Diagram justifying the small-angle approximations for the sine and cosine functions, $\sin(h) \approx h$ and $\cos(h) \approx 1 - h^2/2$, for angles with values $|h| \ll 1$ in radians.

This is one of the limits needed in equation (1.1) to find the derivative of $\sin x$ with respect to $x$. To find the other limit, which involves $\cos h$, we make use of equation (1.2). Again, for $h \to 0$ we have $H \approx h$, and therefore $H^2 \approx h^2$, so

$$\text{for } |h| \ll 1:\quad 2 - 2\cos h \approx h^2 \;\Longrightarrow\; \cos h \approx 1 - \frac{h^2}{2} \;\Longrightarrow\; \frac{\cos h - 1}{h} \approx -\frac{h}{2}.$$

The small-angle approximation $\cos h \approx 1 - h^2/2$ for $|h| \ll 1$ is again just that, an approximation, but again it gets better as $h$ gets smaller, and again the limit as $h \to 0$ is exact:

$$\lim_{h\to 0}\left[\frac{\cos h - 1}{h}\right] = \lim_{h\to 0}\left[-\frac{h}{2}\right] = 0.$$

Putting both of these limits into equation (1.1), then, we have our main result for the derivative of the sine function:

$$\frac{d}{dx}(\sin x) = \cos x.$$

A similar analysis, which makes use of the same limits for $(\cos h - 1)/h$ and $(\sin h)/h$ as $h \to 0$, further shows that the derivative of the cosine function is

$$\frac{d}{dx}(\cos x) = -\sin x.$$

In both cases, it is always understood that the angle $x$ is in radians.

The exponential function

The derivative of the exponential function is

$$\frac{d}{dx}\left(e^x\right) = e^x,$$

where $e = 2.71828\ldots$ is Napier's constant. Thus, $e^x$ is its own derivative. This is a large part of what makes $e$ such an important number in Physics (and Mathematics), and it is essential that you know how to work with exponentials and their derivatives. Note that another standard way of writing $e^x$ is as $\exp(x)$.

A proof that $d(e^x)/dx = e^x$ is straightforward if we make use of one definition of $e$ itself, as the limit

$$e \equiv \lim_{h\to 0}\,(1 + h)^{1/h}.$$

(This was discovered in the 17th century by Jacob Bernoulli, in his solution to a problem about compound interest.) Therefore, for $h$ "small enough" we can write

$$\text{for } |h| \ll 1:\quad e \approx (1+h)^{1/h} \;\Longrightarrow\; e^h \approx 1 + h.$$

These are only approximate equalities for any value of $h$ that is actually non-zero, but the approximations get increasingly accurate as $|h|$ gets smaller and are exact in the limit $h \to 0$. The derivative of $e^x$ with respect to $x$ follows directly:

$$\begin{aligned}
\frac{d}{dx}\left(e^x\right) &= \lim_{h\to 0}\left[\frac{e^{x+h} - e^x}{h}\right] = \lim_{h\to 0}\left[\frac{e^x e^h - e^x}{h}\right] = \lim_{h\to 0}\left[e^x\,\frac{e^h - 1}{h}\right] \\
&= e^x \lim_{h\to 0}\left[\frac{e^h - 1}{h}\right] \;\longrightarrow\; e^x \cdot \left[\frac{(1+h) - 1}{h}\right] = e^x \cdot \left[\frac{h}{h}\right] = e^x.
\end{aligned}$$

The natural logarithm

The derivative of the natural logarithm of $x$ is

$$\frac{d}{dx}(\ln x) = \frac{1}{x} \qquad (\text{for } x > 0 \text{ only}).$$
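The exponential argument rests on two limits, $(1+h)^{1/h} \to e$ and $(e^h - 1)/h \to 1$. These can be tabulated, and the standard derivatives of this subsection can then be compared with a first-principles chord slope at a sample point. In this sketch the helper name `exp_limits`, the point $x = 0.7$ and the step $h = 10^{-6}$ are illustrative assumptions. In Python's `math` module, `math.log` is the natural logarithm and `math.e` is $e$.

```python
import math


def exp_limits(h):
    """The two ratios used in the proof that d(e^x)/dx = e^x."""
    bernoulli = (1 + h) ** (1 / h)        # tends to e = 2.71828... as h -> 0
    exp_ratio = (math.exp(h) - 1) / h     # tends to 1, reflecting e^h close to 1 + h
    return bernoulli, exp_ratio


for h in [0.5, 0.1, 0.01, 0.001]:
    bernoulli, exp_ratio = exp_limits(h)
    print(f"h = {h:<6g} (1+h)^(1/h) = {bernoulli:.6f}   (e^h - 1)/h = {exp_ratio:.6f}")
print(f"e = {math.e:.6f}")

# First-principles chord slopes compared with the standard derivatives
# at one illustrative point x, using one illustrative small step h.
x, h = 0.7, 1e-6
pairs = {
    "sin": ((math.sin(x + h) - math.sin(x)) / h, math.cos(x)),
    "cos": ((math.cos(x + h) - math.cos(x)) / h, -math.sin(x)),
    "exp": ((math.exp(x + h) - math.exp(x)) / h, math.exp(x)),
    "ln": ((math.log(x + h) - math.log(x)) / h, 1 / x),
}
for name, (chord, standard) in pairs.items():
    print(f"d/dx {name}(x) at x = {x}: chord slope {chord:.6f}, standard result {standard:.6f}")
```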
Recall that the natural logarithm is the inverse function of the exponential function, i.e., $\ln(e^x) = x$ and $e^{\ln x} = x$. We shall use this fact below to prove that $d(\ln x)/dx = 1/x$.

1.4. Rules of differentiation

In the following, $u \equiv u(x)$ and $v \equiv v(x)$ are any two differentiable functions of an independent variable $x$, and $k$ is any constant (a real or a complex number).

Sum rule:
$$\frac{d}{dx}(u + v) = \frac{du}{dx} + \frac{dv}{dx} \tag{1.3}$$

Factor rule:
$$\frac{d}{dx}(k\,u) = k\,\frac{du}{dx} \tag{1.4}$$

Product rule:
$$\frac{d}{dx}(u\,v) = \left(\frac{du}{dx}\right) v + u\left(\frac{dv}{dx}\right) \tag{1.5}$$

Chain rule:
$$\frac{dy}{dx} = \left(\frac{dy}{du}\right) \times \left(\frac{du}{dx}\right) \tag{1.6}$$

In the chain rule, $y$ depends on $x$ through the intermediate function $u(x)$.

All of these rules can be derived from first principles by applying the basic definition of a derivative and doing some algebra involving limits. The sum rule and the factor rule together express the fact that differentiation is a linear operation on functions, that is,

$$\frac{d}{dx}\left[k_1\,u(x) + k_2\,v(x)\right] = k_1\,\frac{du}{dx} + k_2\,\frac{dv}{dx}.$$

This is of further mathematical interest, and it also has important consequences for the structure of physical theories such as quantum mechanics. We will come to it again in our study of differential equations later in this course, but for the moment it is more or less a note in passing.

The product and chain rules can be used, sometimes together, to derive other "rules" of differentiation, which are in fact not as fundamental. One of these is the so-called quotient rule. Again, if $u(x)$ and $v(x)$ are any two differentiable functions of $x$, then, wherever $v \neq 0$:

$$\begin{aligned}
\frac{d}{dx}\left(\frac{u}{v}\right) &= \frac{d}{dx}\left(u\,v^{-1}\right) \\
&= u\,\frac{d}{dx}\left(v^{-1}\right) + v^{-1}\,\frac{du}{dx} && \text{(using the product rule)} \\
&= u\,\frac{d}{dv}\left(v^{-1}\right)\frac{dv}{dx} + v^{-1}\,\frac{du}{dx} && \text{(using the chain rule)} \\
&= u\left(-v^{-2}\right)\frac{dv}{dx} + v^{-1}\,\frac{du}{dx} \\
&= \frac{-u \times (dv/dx)}{v^2} + \frac{du/dx}{v} \\
&= \frac{v \times (du/dx) - u \times (dv/dx)}{v^2}.
\end{aligned}$$

This illustrates how knowing the derivatives of just a few elementary functions (powers, exponentials, logarithms and trigonometric functions), plus the rules in equations (1.3) through (1.6), will allow you to differentiate any "well-behaved" function of a single variable.
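The rules can be spot-checked numerically by comparing a difference estimate of the derivative of a combined function with what each rule predicts. In this sketch the choices $u = \sin x$ and $v = x^2 + 1$, the constants $k_1 = 3$ and $k_2 = -2$, the point $x = 0.8$ and the helper name `derivative` are all illustrative assumptions. A symmetric (central) difference, $[f(x+h) - f(x-h)]/2h$, stands in for the exact limit, so agreement is only to within a small numerical tolerance.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx (a numerical stand-in for the exact limit)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Illustrative functions with known derivatives.
def u(x):
    return math.sin(x)


def du(x):
    return math.cos(x)


def v(x):
    return x ** 2 + 1   # never zero, so u/v is defined everywhere


def dv(x):
    return 2 * x


x, k1, k2 = 0.8, 3.0, -2.0   # illustrative point and constants

checks = {
    # sum and factor rules together (linearity)
    "linearity": (derivative(lambda s: k1 * u(s) + k2 * v(s), x), k1 * du(x) + k2 * dv(x)),
    # product rule (1.5)
    "product": (derivative(lambda s: u(s) * v(s), x), du(x) * v(x) + u(x) * dv(x)),
    # quotient rule, derived from the product and chain rules
    "quotient": (derivative(lambda s: u(s) / v(s), x),
                 (v(x) * du(x) - u(x) * dv(x)) / v(x) ** 2),
}
for name, (numeric, by_rule) in checks.items():
    print(f"{name:>9}: numerical {numeric:.8f}   from the rule {by_rule:.8f}")
```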
Using the chain rule

We very often encounter "functions of functions" of some variable $x$, which require the chain rule to find their derivatives. For example, consider $y = \sin(t^3)$. To make this look simpler (i.e., more like a standard derivative that we already know), we define the function $u(t) = t^3$. We can then write $y = \sin u$, and we know immediately that $dy/du = \cos u$. Moreover, the derivative of $u = t^3$ with respect to $t$ is easy to compute: $du/dt = 3t^2$. The chain rule pulls everything together:

$$\frac{d}{dt}\left(\sin t^3\right) = \frac{d}{du}(\sin u) \times \frac{du}{dt} = \cos u \times \left(3t^2\right) = \cos t^3 \times \left(3t^2\right) = 3t^2\cos t^3.$$

As another example, let us use the chain rule to prove the standard derivative $d(\ln x)/dx = 1/x$, starting from the standard derivative of the exponential function, $d(e^x)/dx = e^x$, taken as given. To do this, we first note that, by definition, $\ln(e^x) = x$, so differentiating both sides of this trivial equation tells us that

$$\frac{d}{dx}\left[\ln\left(e^x\right)\right] = \frac{d}{dx}(x) = 1.$$

Now we use the chain rule to manipulate the left-hand side of this equality. Defining $u(x) \equiv e^x$, so that $du/dx = e^x = u$, we have

$$\frac{d}{dx}\left[\ln\left(e^x\right)\right] = \frac{d}{dx}[\ln u] = \frac{d}{du}(\ln u) \times \frac{du}{dx} = \frac{d}{du}(\ln u) \times u.$$

This must equal 1, so

$$u \times \frac{d}{du}(\ln u) = 1 \;\Longrightarrow\; \frac{d}{du}(\ln u) = \frac{1}{u}.$$

But this is exactly the standard derivative that we are after, only with $u$ chosen instead of $x$ to represent the independent variable (it no longer matters that we said $u \equiv e^x$ earlier in the argument).

Using the chain and product rules together

One example of the chain and product rules working together was seen above, in the derivation of the quotient rule. As another example, take the function

$$y(x) = x\,e^{-ax^2},$$

where $a$ is a constant. In this case, use the product rule first, with $u = x$ and $v = e^{-ax^2}$, to find

$$\frac{dy}{dx} = x\,\frac{d}{dx}\left(e^{-ax^2}\right) + e^{-ax^2}\,\frac{d}{dx}(x) = x\,\frac{d}{dx}\left(e^{-ax^2}\right) + e^{-ax^2}.$$

Next, use the chain rule to find $d\left(e^{-ax^2}\right)/dx$ by setting $w = -ax^2$, so that $dw/dx = -2ax$ and

$$\frac{d}{dx}\left(e^{-ax^2}\right) = \frac{d}{dx}\left(e^w\right) = \frac{d}{dw}\left(e^w\right) \times \frac{dw}{dx} = e^w \times (-2ax) = -2ax\,e^{-ax^2}.$$

Finally, then,

$$\frac{dy}{dx} = x \times \left(-2ax\,e^{-ax^2}\right) + e^{-ax^2} = \left(1 - 2ax^2\right)e^{-ax^2}.$$

1.5. Velocity and acceleration, and higher-order derivatives

Suppose that $y(t)$, $v(t)$ and $a(t)$ are the position, velocity and acceleration of an object at time $t$ as it moves along the $y$-axis. Since velocity is the rate of change of position with respect to time, we have

$$v(t) = \frac{dy}{dt};$$

and since acceleration is the rate of change of velocity with respect to time, we also have

$$a(t) = \frac{dv}{dt}.$$

But this means that we have arrived at the acceleration by successive differentiation of the position. That is,

$$a(t) = \frac{dv}{dt} = \frac{d}{dt}\left(\frac{dy}{dt}\right) = \frac{d}{dt}\frac{d}{dt}(y) \equiv \frac{d^2}{dt^2}(y).$$

Thus, we say that velocity is the first derivative of position (with respect to time) and write this in the familiar way, $v(t) = dy/dt$. We can then say either that acceleration is the (first) derivative of velocity (the notation $a(t) = dv/dt$ says precisely this) or that it is the second derivative of position, which is what $a(t) = d^2y/dt^2$ denotes. Be careful here: $d^2y/dt^2$ is not the same as $(dy/dt)^2$; the latter is the square of the velocity, which is not (in general) the acceleration.

If we wanted to know the rate of change of acceleration with time (a quantity known as "jerk"), we could speak of this either as the (first) derivative of $a$ with respect to $t$ (i.e., $da/dt$), or as the second derivative of velocity with respect to time (i.e., $d^2v/dt^2$), or as the third derivative of position with respect to time (i.e., $d^3y/dt^3$).

In general, for a function $f$ of an independent variable $x$, the notation $d^n f/dx^n$ means "differentiate $f(x)$ with respect to $x$, $n$ times in succession", and the result is called the "$n$th (or $n$th-order) derivative" of $f$. In this context, $n$ is only ever a positive integer.

1.6. Maxima and minima

One of the most useful and powerful applications of differentiation is to determine maxima and minima: in other words, to optimise some quantity by varying another.
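The worked chain-rule examples of §1.4, and the warning in §1.5 that $d^2y/dt^2$ is not $(dy/dt)^2$, can be spot-checked numerically. In this sketch several choices are illustrative assumptions: the helper names, the value $a = 0.5$ for the constant in $x\,e^{-ax^2}$, the evaluation point $1.2$, and the test function $y = t^3$ (for which $d^2y/dt^2 = 6t$ but $(dy/dt)^2 = 9t^4$). The second derivative is estimated with the standard symmetric formula $[f(x+h) - 2f(x) + f(x-h)]/h^2$, a numerical technique not used in the notes.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of d^2f/dx^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


a, t = 0.5, 1.2   # illustrative constant and evaluation point

# Chain rule: d/dt sin(t^3) = 3 t^2 cos(t^3)
chain = (derivative(lambda s: math.sin(s ** 3), t), 3 * t ** 2 * math.cos(t ** 3))

# Product and chain rules: d/dx [x exp(-a x^2)] = (1 - 2 a x^2) exp(-a x^2)
product_chain = (derivative(lambda s: s * math.exp(-a * s ** 2), t),
                 (1 - 2 * a * t ** 2) * math.exp(-a * t ** 2))


def cube(s):
    return s ** 3


# For y = t^3: second derivative 6t, versus the square of the first derivative, 9t^4.
higher = (second_derivative(cube, t), 6 * t, derivative(cube, t) ** 2, 9 * t ** 4)

print("chain rule:       ", chain)
print("product and chain:", product_chain)
print("d2y/dt2 vs 6t, (dy/dt)^2 vs 9t^4:", higher)
```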
If the graph of $f(x)$ is a curve with peaks and troughs, then the maxima and minima occur wherever the instantaneous gradient of the curve is zero (i.e., wherever the straight line tangent to the curve is horizontal); see Figure 1.4. The instantaneous gradient (or slope of the tangent line) is the derivative $df/dx$, so maxima and minima occur wherever $df/dx = 0$. There can be several minima and/or maxima (collectively referred to as stationary points or extrema). If there are, any one of them need not correspond to a global maximum or minimum of the function $f(x)$, only to a local maximum or minimum.

Fig. 1.4. A relationship that shows two local maxima and one local minimum of a function $f(x)$. At each of these stationary points, the instantaneous gradient (or derivative, $df/dx$) of the curve is zero. At either maximum, the gradient changes from positive (just to the left of the maximum) to negative (just to the right of the maximum), so the second derivative of the function, $d^2f/dx^2$, is negative. At the minimum, the gradient changes from negative (just to the left of the minimum) to positive (just to the right of it), so the second derivative is positive.

Solving the equation $df/dx = 0$ algebraically will give all the values of $x$ where maxima or minima occur, but it will not tell us whether any one stationary point is a local maximum or a local minimum. To distinguish between these, we need to calculate the second derivative, $d^2f/dx^2$, at each value of $x$ where $df/dx = 0$. If the second derivative is negative at a stationary point, the instantaneous gradient of the function changes from positive, to zero, to negative as $x$ goes through the stationary point from left to right. This identifies the point as the top of a peak: a local maximum (see Figure 1.4).
If the second derivative is positive at a stationary point, the instantaneous gradient of the function changes from negative, to zero, to positive as $x$ goes through the stationary point from left to right. This identifies the point as the bottom of a trough: a local minimum. Therefore:

A local maximum occurs at any $x$ value where $df/dx = 0$ and $d^2f/dx^2 < 0$.
A local minimum occurs at any $x$ value where $df/dx = 0$ and $d^2f/dx^2 > 0$.

There are functions for which some values of $x$ give both $df/dx = 0$ and $d^2f/dx^2 = 0$; a few examples are $f(x) = x^4$, $f(x) = 1 - x^4$ and $f(x) = x^3$ at the point $x = 0$. Any point where $df/dx = 0$ is always called a stationary point, no matter what value $d^2f/dx^2$ takes there. A stationary point that also has $d^2f/dx^2 = 0$ might be a local minimum (e.g., $x^4$ has a minimum at $x = 0$), or a local maximum (e.g., $1 - x^4$ is maximised at $x = 0$). It might also simply be an inflection point, which is neither a minimum nor a maximum but a sort of instantaneous "plateau" (e.g., the behaviour of $x^3$ at $x = 0$). To distinguish between these possibilities, derivatives of higher than second order need to be examined at the stationary point.

1.7. A sample optimisation problem

(a) A projectile launched from the ground with speed $v_0$, at an angle $\theta$ to the horizontal, has (in the absence of air resistance) a horizontal range given by

$$R(\theta) = \frac{v_0^2\sin(2\theta)}{g},$$

where $g$ is the acceleration due to gravity at the surface of the Earth. For what launch angle $\theta$ is the range maximised, and what is the maximum range in terms of $v_0$ and $g$?

This can be solved without calculus, of course, by simply recalling that the maximum value of the sine function is $+1$, and that this value is achieved for an angle of $\pi/2$ radians ($90^\circ$). Thus, the launch angle $\theta$ that maximises the range $R$ is obtained by putting $2\theta_{\max} = \pi/2$, so that $\theta_{\max} = \pi/4$ radians (or $45^\circ$). The maximum range is then just $(v_0^2/g) \times \sin(\pi/2) = v_0^2/g$.
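The classification just described, together with part (a), can be sketched in a few lines. The helper `classify` (a hypothetical name) estimates $d^2f/dx^2$ numerically at a point that is assumed already to be stationary. It reports "maximum", "minimum" or "inconclusive", where the tolerance `tol` deciding when the estimate counts as zero is an arbitrary illustrative choice. The test is applied to three cases:

- $f(x) = 2x^2 - x^4$, an illustrative function (not the one plotted in Figure 1.4) with $df/dx = 4x(1 - x^2)$, and hence two local maxima at $x = \pm 1$ and a local minimum at $x = 0$;
- the degenerate cases $x^4$ and $x^3$ at $x = 0$;
- the range $R(\theta)$, with illustrative values $v_0 = 20\ \mathrm{m\,s^{-1}}$ and $g = 9.81\ \mathrm{m\,s^{-2}}$.

A brute-force scan of launch angles also confirms the non-calculus answer $\theta_{\max} = 45^\circ$.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of d^2f/dx^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def classify(f, x, tol=1e-6):
    """Second-derivative test at a point x assumed to satisfy df/dx = 0.

    Returns "inconclusive" when d^2f/dx^2 is numerically zero (|value| <= tol, an
    illustrative threshold), in which case higher-order derivatives would be needed."""
    curvature = second_derivative(f, x)
    if curvature < -tol:
        return "maximum"
    if curvature > tol:
        return "minimum"
    return "inconclusive"


def quartic(x):
    return 2 * x ** 2 - x ** 4   # illustrative; stationary at x = -1, 0, +1


for x0 in (-1.0, 0.0, 1.0):
    print(f"2x^2 - x^4 at x = {x0:+.0f}: {classify(quartic, x0)}")
print("x^4 at 0:", classify(lambda x: x ** 4, 0.0))   # test fails, though x^4 has a minimum here
print("x^3 at 0:", classify(lambda x: x ** 3, 0.0))   # test fails; this is an inflection point

v0, g = 20.0, 9.81   # illustrative launch speed (m/s) and gravitational acceleration (m/s^2)


def R(theta):
    """Horizontal range R = v0^2 sin(2 theta) / g, with theta in radians."""
    return v0 ** 2 * math.sin(2 * theta) / g


# Brute-force scan of launch angles from 0 to 90 degrees in 0.1-degree steps.
angles = [math.radians(k / 10) for k in range(901)]
best = max(angles, key=R)
print(f"best angle {math.degrees(best):.1f} deg, R = {R(best):.3f} m, v0^2/g = {v0 ** 2 / g:.3f} m")
print("theta = pi/4:", classify(R, math.pi / 4), "  theta = 3pi/4:", classify(R, 3 * math.pi / 4))
```

A numerical test like this mirrors the rule in the text. Like the rule, it cannot decide the degenerate cases where the second derivative vanishes.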
However, realistic maximum/minimum problems are rarely this easy; usually they do require calculus to identify and classify all the stationary points of more complicated functions. To see how this works in principle, then: the first step in this particular problem is to differentiate $R$ with respect to $\theta$. We use the factor rule to take the constants $v_0^2$ and $g$ outside the derivative, and then apply the chain rule and recall the standard derivative of the sine function:

$$\frac{dR}{d\theta} = \frac{d}{d\theta}\left[\frac{v_0^2\sin(2\theta)}{g}\right] = \frac{v_0^2}{g} \times \frac{d}{d\theta}\sin(2\theta) = \frac{v_0^2}{g} \times \left[2\cos(2\theta)\right] = \frac{2v_0^2}{g}\cos(2\theta).$$

To find any angles that make $R$ a (local) maximum or minimum, we set $dR/d\theta = 0$. This requires $\cos(2\theta) = 0$ and therefore $2\theta = \pm\pi/2, \pm 3\pi/2, \pm 5\pi/2, \ldots$, so that, finally, $\theta = \pm\pi/4, \pm 3\pi/4, \pm 5\pi/4, \ldots$. Physically, we discount any negative angles and any greater than $\pi$ (i.e., $180^\circ$), which correspond to "launching" the projectile into the ground (draw a sketch). This leaves the optimal launch angle to be one or both of $\theta = \pi/4$ or $\theta = 3\pi/4$ (that is, $45^\circ$ or $135^\circ$).

It remains to determine the sign of $d^2R/d\theta^2$ at each of these $\theta$ values, since $d^2R/d\theta^2 < 0$ is required along with $dR/d\theta = 0$ if $R$ is to be maximised (rather than minimised). The second derivative of $R$ with respect to $\theta$ is (using the chain rule)

$$\frac{d^2R}{d\theta^2} = \frac{2v_0^2}{g}\,\frac{d}{d\theta}\left[\cos(2\theta)\right] = -\frac{4v_0^2}{g}\sin(2\theta),$$

which is negative for $\theta = \pi/4$ (since then $\sin(2\theta) = \sin(\pi/2) = +1$) and positive for $\theta = 3\pi/4$ (because then $\sin(2\theta) = \sin(3\pi/2) = -1$). Thus we conclude, as expected, that the launch angle for maximum range is $\theta_{\max} = \pi/4$ (or $45^\circ$), and the value of the maximum range is

$$R_{\max} = R(\theta_{\max}) = \frac{v_0^2}{g} \times \sin(2\theta_{\max}) = \frac{v_0^2}{g}.$$

Notice that this procedure has also given us a launch angle, $\theta = 3\pi/4$ (i.e., $135^\circ$), that gives $dR/d\theta = 0$ and $d^2R/d\theta^2 > 0$, and so yields a minimum range of $R_{\min} = -v_0^2/g$. What does this mean physically?

(b) The vertical height $y$ of the same projectile is given as a function of time by

$$y(t) = v_0\sin(\theta)\,t - \frac{1}{2}gt^2.$$

Calculate the maximum height reached by the projectile, in terms of the launch angle $\theta$.

Now the launch angle $\theta$ is considered to be fixed (at some unspecified value), and the variable quantity is the time $t$. Thus, to find the maximum height for a given $\theta$, we differentiate $y$ with respect to $t$:

$$\frac{dy}{dt} = v_0\sin(\theta) - gt.$$

The maximum (or minimum!) occurs when $dy/dt = 0$ (physically: when the vertical velocity is instantaneously zero), which happens at time

$$t_{\min/\max} = \frac{v_0\sin(\theta)}{g}.$$

Differentiating again, we see that $d^2y/dt^2 = -g$, which is always negative in this case (it is, of course, the downward acceleration due to gravity). So the zero in velocity indeed corresponds to a maximum, not a minimum, in height. Putting the time of maximum height back into the original expression for $y(t)$, we obtain

$$y_{\max} = y(t_{\max}) = v_0\sin(\theta) \times \frac{v_0\sin(\theta)}{g} - \frac{1}{2}g \times \left[\frac{v_0\sin(\theta)}{g}\right]^2 = \frac{1}{2}\,\frac{v_0^2\sin^2(\theta)}{g}.$$

The maximum height naturally depends on the launch angle $\theta$. If $\theta = \pi/4$, which gives the maximum horizontal range, then the maximum height reached by the projectile in flight is $y_{\max} = (v_0^2/2g) \times \sin^2(\pi/4) = (v_0^2/2g) \times (1/2) = v_0^2/4g$, exactly one-quarter of the maximum range.

1.8. Differentiation: What you need to know

You must know how to differentiate simple mathematical functions, simple products or ratios of these functions, and compound functions using the chain rule. These will crop up all the time in the Physics/Astrophysics course, so the happier you are with differentiation, the more time you can spend understanding the Physics. In addition, you must know how to find the point(s) at which a mathematical function is maximised or minimised, and use this to solve "optimisation" problems such as the example in §1.7. Another crucial skill that you must practise is to translate a problem described in words into graphical and/or mathematical representations, and then use your mathematics and physics knowledge to obtain quantitative solutions.
In particular, you need to be aware of situations where one physical quantity can be obtained by differentiating another (e.g., if you have an expression for velocity, you can differentiate it to get the acceleration, and then use Newton's second law to deduce the applied force as a function of time).

Below is a set of mathematical expressions of types that will be encountered frequently throughout the course. If you can differentiate all of these, then you should have no problem with differentiation throughout the Physics/Astrophysics degree.

1.8.1. Functions to differentiate

Find $dy/dx$ for each of the following:

$y = x^3 + 3x^2 - 2x + 1$
$y = x^{1/2} + x^{-1/2}$
$y = \dfrac{1}{x} - \dfrac{1}{x^2}$
$y = \sin(2x) + \cos(2x)$
$y = \sin^2(2x)$
$y = x\sin(x)$
y = e2x + e−2x 2
y = e2x + e−2x 2
$y = 4\exp(\sin^2 x)$
$y = \ln(x) + \ln(1 + x)$
$y = \ln(2 + x^2)$
y = (1 + x) x2
$y = x\,e^{-x} - (1 + x)\,e^{x}$
y = x (1 + √ x )1/2
$y = 2x^2\ln(2x + x^3)$

It won't hurt to attempt $d^2y/dx^2$ for all of the above as well.
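When working through the list, a numerical comparison can catch algebra slips, though it proves nothing. The sketch below compares a proposed formula for $dy/dx$ against a central-difference estimate. The helper `agrees` is a hypothetical name, and the sample points are arbitrary choices, all positive so that logarithms and square roots are defined. The example is $y = x\cos(x)$, which is not one of the exercises, so that no answer is given away.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


def agrees(f, proposed, points=(0.3, 0.7, 1.1, 1.9), tol=1e-6):
    """True if proposed(x) matches the numerical slope of f at every sample point.

    The sample points and tolerance are illustrative; agreement at a few points
    is evidence, not proof, that the formula is right."""
    return all(
        abs(derivative(f, x) - proposed(x)) <= tol * max(1.0, abs(proposed(x)))
        for x in points
    )


def y(x):
    return x * math.cos(x)


print(agrees(y, lambda x: math.cos(x) - x * math.sin(x)))   # product rule applied correctly
print(agrees(y, lambda x: -math.sin(x)))                     # product rule forgotten
```

Any item from the list can be checked the same way by writing it as `f` and your answer as `proposed`.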