Notes for Math 25b: Honors Linear Algebra and Real Analysis II (Spring 20[13–]14)

If you find a mistake, omission, etc., please let me know by e-mail.

Classroom: Math 25b meets MWF 11–12 in Harvard Hall, Room 201.

𝄞 This is one of the rare Harvard classrooms outside the music building that has a piano. Here’s what it was used for at the beginning of each class.

Textbooks:
Simmons, G.F.: Introduction to Topology and Modern Analysis, McGraw-Hill 1963
Edwards C.H.: Advanced Calculus of Several Variables, Dover 1994 (Academic Press 1973).

We shall use Simmons for metric topology, and Edwards for differential and integral calculus in R and Rn. For topology see also the notes I wrote for Math 55 (which go further than what we’ll cover in 25b): 1, 2, 3, 4, 5, 6.
Application: Fundamental Theorem of Algebra, as promised in Math 25a (corrected: towards the end, |a0|, not a0)

Problem set 1: Metric topology basics
Revised Feb.1, see Problem 10 (which now gives a different hint for Problem 3); and Feb.2, to fix the title (oops…).
[and here’s the PDF from the Harvard site that includes the mapping from problems to CAs]
Problem set 2: Continuity, sequences, and compactness [and the version with the CA assignments]
Problem set 3: Uniform continuity, compactness, and completeness [and the version with the CA assignments]
Corrected Feb.15 (typo in Problem 4: last clause is for nx ≤ −1, not nx ≤ 1; in the following text, specified the value of the pointwise limit also at x = 0; also fixed a trivial spelling error in Problem 6)
For Problems 4 and 5: here’s a PostScript plot showing  fn for n = 1, 2, 3, 4, 5, 10 in black, red, orange, green, blue, and purple respectively
Problem set 4: More on completeness and compactness, and on polynomials of a complex variable; start differential calculus [and the version with the CA assignments]
Problem 6 (1.1 in Edwards p.61) postponed till next week because we didn’t cover local max/min until Friday.
Problem set 5: Differentiation cont’d [and the version with the CA assignments]
Corrected March 2 to fix a typo in Problem 2: g(x), not G(x); and March 6 to fix a mistake in Problem 6:  f  returns maps from Rn to Rp, and g returns maps from Rm to Rn, not the other way around!
Problem 10 postponed till next week because we didn’t cover the multivariable Chain Rule until Friday.
Problem set 6: Differentiation cont’d [and the version with the CA assignments]
Corrected March 11: part d of Problem 2 (= Edwards 2.5) is wrong! The function of x,y given by y^2 for x = 0 and x^3 sin(1/x) + y^2 otherwise does have a partial derivative with respect to x that is continuous at the origin. The formula for x ≠ 0 should be x^2 sin(1/x) + y^2 (square, not cube).
Problem set 7: The Laplacian; single-variable Taylor series [and the version with the CA assignments]
Corrected March 26 to fix typos in Problem 1: stray “i)” removed, necessary + sign (before “(C2+D2)”) inserted
[Re Problem 6: Edwards writes (x−1)^4, not (1−x)^4. The two are equal. Likewise for (x−1)^n and (1−x)^n when n is even; if n is odd they’re off by a factor of −1, but I trust that if you can find the maxima and minima of a function  f  then you can do it for −f  as well.]
Problem set 8: Taylor series in one and more variables; multivariate critical points [and the version with the CA assignments]
Typo in Problem 6 = Edwards 7.7: the last term of the Taylor polynomial should be not (−1)^n x^(4n)/(2n)! but (−1)^(n+1) x^(4n−2)/(2n−1)!. (Edwards [and I] must have been thinking of cos(x^2), though then the beginning of the series is wrong.)
Problem set 9: Inverse and implicit functions [and the version with the CA assignments]
Problem set 9¾, due Wednesday April 16 at 5PM: Edwards, Exercises 1.4 through 1.9 (inclusive) on pages 213–214. For 1.4, show that more generally if  f  and g are continuous real-valued functions on some metric space then so are max( f, g) and min( f, g), by writing the max and min as [( f +g) ± | f −g| ] / 2.
[here’s the version with the CA assignments]
Problem set 10½: Integrable functions (and the version with CA assignments)
Final problem set (#12) (with CA assignments), due Monday 5/5 at 5PM
(As noted in class, Exercise 5.3 [= Problem 5] contains 5.1 as a special case.)
Monday, 27 January: Introduction to Math 25b

You might wonder why it’s taking us so long to get to the calculus part of Math 25: the first semester has come and gone and we have yet to even define the derivative of a function of one real variable. The reason is that a coherent treatment of multivariate calculus requires the context of linear algebra and metric topology. Linear algebra was of course the topic of Math 25a; topology will occupy us for the first few weeks of 25b.

While one can study calculus in one variable without explicit discussion of vector spaces etc., linear structures can already be recognized there, and become ubiquitous in calculus of several variables. Some key examples:

• Differentiation and integration are linear. If f and g have derivatives f ' and g', then any linear combination af +bg (for real numbers a and b) is differentiable, with derivative af ' + bg'. Likewise for the integral of a linear combination of integrable functions. [Indeed linearity is so ubiquitous that students are sometimes tempted to incorrectly generalize it to nonlinear functions, resulting in “Freshman’s dream” errors such as (x+y)^n = x^n + y^n and cos(x+y) = cos x + cos y.]
• The derivative at a point of a function from Rm to Rn is a linear transformation from Rm to Rn. Recall that a function f of one variable has derivative f ' (x0) at x0 iff f (x) is approximately f (x0) + f ' (x0) (x − x0) for x near x0 (in a sense we shall make precise before long). We shall use the same formula for a function from Rm to Rn, with f ' (x0) interpreted as a linear transformation. That is,  f  is differentiable at x0 iff f (x) − f (x0) is approximated by a linear function of x − x0 for x near x0. In short, a differentiable function is locally constant + linear.
• The dy/dx factor in the change-of-variable formula for integrals over one variable generalizes to a determinant for multivariate integrals. In univariate calculus, if y is a differentiable function of x over some interval, then the integral of any g(y) dy can be written as the integral of g(y(x)) |y'(x)| dx. If y is a differentiable function from Rn to Rn (NB same dimension, i.e. m=n in the previous item) then y'(x) is a square matrix, and we shall obtain the same kind of formula but with y'(x) generalized to the determinant of that matrix; this is related to the interpretation of the determinant of a linear transformation as a volume ratio.
What, then, of topology? Topology is our language for saying precisely what we mean by “approximately” and “near” in the above informal description of the derivative. Admittedly it is not strictly necessary to develop either linear algebra or topology to give a rigorous treatment of calculus, because such a treatment was known first (though it took a long time, see historical note below), and both topology and linear algebra arose around 1900 as a generalization of results that had already been obtained in the 19th century and before. Still, it is now recognized that — for both linear algebra and metric topology — the additional investment in developing a general theory pays for itself many times over when we reuse the same concepts and results in different settings (already in Math 25b, and all the more so in further study of mathematics) instead of repeating the same argument each time we need an instance of the rank-nullity theorem or the convergence of a Cauchy sequence in a complete metric space. For metric topology, we shall begin by axiomatizing the notion of a distance (a.k.a. metric) implicit in “approximately” and “near”, and using the distance to define and study limits, continuous functions, and other building blocks that we’ll use in constructing most of our calculus proofs.

Historical note(*): The discovery/invention of integral and differential calculus is generally attributed to Newton (1642–1727) and Leibniz (1646–1716), who engaged in a notorious priority battle over it. Nontrivial results predate them; e.g. Fermat (1604±3–1665) integrated x^r dx for any rational r, Bhaskara II (1114–1185) is “credited with knowledge of Rolle’s theorem”, and the “method of exhaustion” of classical Greek geometry prefigures Riemann integration. But the fundamental theorem of calculus, linking differentiation and integration, was not known before the second half of the 17th century. Still the definition of the derivative, and indeed the notion of a function, was only made precise in the late 19th century, culminating in the work of Weierstrass (1815–1897). The difficulty was already recognized by George Berkeley (1685–1753), who famously lampooned the tiny-but-not-quite-zero dx’s and dy’s as “ghosts of departed quantities”; but Euler (1707–1783) (in his Introductio in Analysin Infinitorum) and even Gauss (1777–1855) relied on those evanescent “ghosts”, and most of their analytical work survived the transition to ε-δ calculus. The impetus for the rigorization of real analysis came not from outside challenges such as Berkeley’s, but from within mathematics, notably the discovery of certain Fourier series, such as the piecewise linear but discontinuous “sawtooth wave” σ(x) = −∑_{k≥1} sin(kx)/k, that challenged the notion of a mathematical “function”, let alone the derivative of such a function (which one is tempted to construct by termwise differentiation to an even less well-behaved series “σ'(x)” = −∑_{k≥1} cos(kx)).

(*) NB I am not a historian of mathematics; this sequence of events is told in many secondary sources, but I do not claim to have read most of the primary sources (many of them in Latin) myself.

Wednesday, 29 January: Metric topology I: basic definitions and examples

For the first few weeks there will not be much commentary here because it would duplicate what’s in the TeXed lecture notes on metric topology. These notes are being edited so that the references and some notations match what’s in our textbook (Simmons, mostly Chapter 2) rather than the Rudin textbook that accompanied Math 55 when the notes were first written. We shall have very little to say about general topological spaces (see Chapter 3 in Simmons) beyond the definition, but will note when some concept or argument is purely topological (i.e. can be stated purely in terms of open sets, as with the fundamental concept of a continuous function between metric spaces), because sometimes we’ll have several different choices for the metric that yield the same topology (e.g. the sup metric and Euclidean metric on Rn), and thus the same result for any topological notion (e.g. a continuous function on the sup-metric Rn remains continuous with respect to the Euclidean metric). By the way, we won’t officially cover Chapter 1, but you may want to review it for fundamentals of Boolean algebra, cardinality, etc.

Friday, 31 January: Metric topology II: open and closed sets

To the notes about terminology (ball, sphere, neighborhood) at the start of the second lecture notes, I add that “neighborhood of p” is used nowadays to mean “open set containing p” (equivalently: containing some open ball centered at p), and that a set that is simultaneously closed and open is sometimes known as a “clopen set” (though some deprecate this portmanteau as an ugly word).

Monday, 3 February: Metric topology III: continuous maps between metric spaces

(In class we covered almost all the key points here except that we didn’t actually prove the topological characterization of continuity! So this will have to be next time, together with sequences which provide yet another equivalent condition.)

Wednesday, 5 February: Metric topology IV: Sequences, convergence, and uniform convergence

(This topic will occupy us also for at least part of Friday’s lecture.)

It can sometimes be helpful to know that an equivalent condition for convergence is that for all ε>0 there exists N such that d(p, pn) ≤ ε once n ≥ N (note the “ ≥ ” rather than “ > ” signs; it also works with only one sign changed). The reason is basically that the closed ε-ball contains the open one which in turn contains the closed ε/2-ball, and we can always change N to N+1 to go between “ >N ” and “ ≥ N+1 ”.

Friday, 7 February: Function spaces and uniform convergence; Preview of Metric topology V: Compactness

As an example of the note on “ ≥ ” vs. “ > ”: Logically, the “unwound” definition displayed near the top of page 3 of the “Metric Topology IV” notes isn’t quite right, because we could have d( fn, f ) = ε even though d( fn(x), f (x)) < ε for all x. But it still works because we require it for every positive ε.

Note that Simmons postpones discussion of compactness until Chapter 4, after introducing (general, i.e. not necessarily metric) topological spaces in Chapter 3. Most of the results we’ll cover are in the first few pages of Chapter 4 and in “24. Compactness for metric spaces” starting on page 120. As long as we work only in subsets of Rn, the compact subsets will be precisely those that are closed and bounded (the Heine-Borel theorem; as Simmons suggests on page 114, the proof given there is more complicated than we’ll need, and we’ll give one of the usual proofs via ε-nets). In general, though, the “closed and bounded” condition is necessary but not sufficient, even in a complete metric space (more on completeness next week): a simple counterexample is an infinite metric space with the discrete metric, which is complete, closed, and bounded, but not compact (why?).

Monday, 10 February: Metric topology V: Compactness

You may have noticed that I’m soft-pedalling the notion of “limit point”. Limit points essentially duplicate limits of (sub)sequences, and I don’t think we’ll ever need both notions, so for example I won’t say much if anything about the “Bolzano-Weierstrass property” because sequential compactness accomplishes much the same thing. (It feels like this is the usual expository practice for this material nowadays.)

Wednesday, 12 February: Metric topology V cont’d — Sequential compactness

For the Proposition at the bottom of page 3: Recall that a subset S of a metric space X is said to be dense when it intersects every open ball; equivalently (and showing that this is a topological notion), when every nonempty open subset of X has nonempty intersection with S.

Friday, 14 February: Metric topology VI: Completeness; compact = complete & totally bounded, and the Heine-Borel theorem

We have several times used (at least implicitly) the “Archimedean axiom” (a.k.a. the “Eudoxus axiom”; we shall meet Eudoxus’ name again when we discuss integration): for every real number x there exists an integer N > x; equivalently (by considering 1/ε), for every real ε > 0 there exists an integer N such that 1/N < ε. In classical Greek geometry one might say that any two lengths l, L are comparable, in the sense that one can divide L into finitely many intervals none of which exceeds the length of l (to see that this is the same, let x = L / l, or ε = (L / l)^(−1) = l / L). This is in effect what we do when we construct an ε-net in [0,1], or more generally in any bounded real interval.
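The ε-net construction in [0,1] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `epsilon_net` and `distance_to_net` are hypothetical helpers, not anything from the lecture notes or from Simmons, and the check runs over a finite sample of [0,1] rather than the whole interval.

```python
import math
from fractions import Fraction

def epsilon_net(eps):
    """Return a finite eps-net for [0, 1]: the points k/N, where 1/N < eps."""
    # Archimedean axiom: some integer N exceeds 1/eps, so that 1/N < eps.
    n = math.floor(1 / eps) + 1
    return [Fraction(k, n) for k in range(n + 1)]

def distance_to_net(x, net):
    """Distance from x to the nearest point of the net."""
    return min(abs(x - p) for p in net)

eps = Fraction(3, 10)
net = epsilon_net(eps)
# Every sampled point of [0, 1] lies within eps of some net point.
samples = [Fraction(j, 1000) for j in range(1001)]
worst = max(distance_to_net(x, net) for x in samples)
print(len(net), worst, worst < eps)
```

The same recipe, rescaled and translated, gives an ε-net in any bounded real interval.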

Another example: once we have shown that a sequentially compact space X (or even a space X where every sequence has a Cauchy subsequence) is totally bounded, we can quickly deduce that X is separable. Just take the union over integers n of (1/n)-nets to get a countable set (countable union of finite sets) that is dense (any positive ε is less than some 1/n).

Monday, 17 February: NO CLASS — University holiday (Presidents’ [Presidents? President’s?!] Day)
Wednesday, 19 February: Metric topology VI cont’d: Lebesgue numbers; continuous functions on compact spaces are uniformly continuous

Simmons gives as exercises the results that the continuous image of a compact set is bounded (page 115, Exercise 7) and that a continuous real-valued function on a compact set attains its supremum and infimum (page 115, Exercise 8).

Here’s the alternative approach (via sequential compactness rather than Lebesgue numbers) used in class to prove that every continuous function  f  from a compact space X to some metric space Y is uniformly continuous. Suppose not. Then there is some ε > 0 such that no δ works. So (again exploiting Archimedes), for each n there exist x_n, x'_n such that d(x_n, x'_n) < 1/n but d( f (x_n),  f (x'_n)) ≥ ε. Use sequential compactness to find a subsequence {x_{n_i}} of {x_n} that converges to some x. Then {x'_{n_i}} converges to the same x (why?). But then f (x) is the limit of both f (x_{n_i}) and f (x'_{n_i}), which is impossible because the distance between f (x_{n_i}) and f (x'_{n_i}) is always at least ε. This contradiction proves that  f  is uniformly continuous.

Friday, 21 February: Metric topology: conclusion; application to the Fundamental Theorem of Algebra

Besides the fundamental [sic] importance of the result, this proof of the Fundamental Theorem of Algebra illustrates several techniques and ideas that we shall use repeatedly in the development of multivariate calculus. (See the Wikipedia article on this theorem for an idea of other approaches that have been used to prove it.) The fact that the absolute value of a polynomial  f  never has a local minimum except at a zero (a.k.a. root) of  f also generalizes to a fundamental property of differentiable complex-valued functions of a complex variable, as you’ll see when you take Math 113 or study complex analysis in some other setting.

Monday, 24 February: Start multivariate differential calculus

We start with Chapter II of Edwards. Single-variable calculus studies functions from (nice subsets of) R to R; this is the special case m = n = 1 of functions from (nice subsets of) Rm to Rn, which are the topic of multivariate calculus. As long as we fix m at 1, letting n be arbitrary doesn’t change much at first, though beware that Rolle’s theorem already fails for n=2 (see the last problem on the fourth problem set).

An equivalent definition of differentiability that does not explicitly single out h = 0: the function  f  has derivative  f '(a) at a if for all ε > 0 there exists δ > 0 such that ||f (a+h) − f (a) − hf '(a)|| ≤ ε |h| for all h such that |h| < δ. (Note the use of “≤” rather than “<” which makes the inequality hold also for h = 0; it would also be OK to require the inequality whenever |h| ≤ δ, via the usual trick of halving δ.)

Whether we use this definition or the usual one given by Edwards, we must make sure that both a and a+h are in the domain of  f, because we want to be able to differentiate functions such as 1/x or x^(1/2) that are not defined on all of R. Note that if the domain is open then as long as a is in the domain we know that once δ is small enough all choices of h with |h| < δ yield a+h in the domain too.

A function  f  from (a subset of) R to Rn is just an n-tuple of functions  fi from (that subset of) R to R, and then  f  is differentiable iff each of the coordinate functions  fi is differentiable, in which case the derivative of  f is just the n-tuple of derivatives of the  fi. Once we’ve proven this we soon recover the formulas for differentiating products and compositions of functions R → Rn, which Edwards gives in Theorem 1.1 (pages 59–60), from the corresponding formulas for n = 1. But we haven’t yet given ε-δ proofs of these single-variable formulas! The formula for the derivative of a product of differentiable functions is a bit tricky to prove; since we’ll soon prove the multivariate chain rule, we’ll be able to recover the formula for the product of two differentiable real-valued functions f, g from the chain rule together with the derivative of the single function R^2 → R taking (x, y) to xy, which is the same kind of trick we used to show that the product of two continuous real-valued functions is continuous. In any case we’ll need the notion of a differentiable function of two real variables to properly do “implicit differentiation” even though this topic is often presented already in texts on single-variable calculus. (It’s not hard to use single-variable techniques to differentiate a function y(x) defined implicitly by a relation like sin(x^3 + y^3) = 3xy assuming that the derivative exists, but not so easy if we don’t know this in advance!)
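As a worked instance of the single-variable technique for the relation sin(x^3 + y^3) = 3xy, assume that y = y(x) is differentiable (which is exactly the assumption that has not yet been justified) and differentiate both sides with respect to x:

```latex
\begin{align*}
\sin(x^3 + y^3) &= 3xy \\
\cos(x^3 + y^3)\,\bigl(3x^2 + 3y^2 y'\bigr) &= 3y + 3x y' \\
y'\,\bigl(y^2 \cos(x^3 + y^3) - x\bigr) &= y - x^2 \cos(x^3 + y^3) \\
y' &= \frac{y - x^2 \cos(x^3 + y^3)}{y^2 \cos(x^3 + y^3) - x}
\end{align*}
```

The last step is valid only where the denominator is nonzero; the computation says nothing about whether y'(x) exists in the first place.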

[NB there are two forms of the lower-case Greek letter φ, and (at least in the edition I’m looking at now) both appear in the statement of Theorem 1, one in the introductory text and the other in formulas (3) and (5) and in the proof on page 60; there’s no distinction between them, and Edwards should have used the same φ throughout. (In TeX the two forms are obtained in math mode using the commands \phi and \varphi .) While I’m at it, I see a typo in the second displayed block on page 60: f ('g(t)) should be f '(g(t)).]

For any subset G of R, Theorem 1.1 (generalized routinely to functions on G) includes (how?) the result that the differentiable functions from G to Rn form a vector space, and differentiation is a linear transformation from that space to the space of functions from G to Rn. The kernel contains all constant functions (and even all locally constant functions, i.e. functions such that each x in G has a neighborhood on which the function is constant); when G is an interval, we shall soon show using Rolle’s theorem that this is the entire kernel. Describing the image of this linear transformation is a notoriously hard problem, even when G is an interval. (Note that these vector spaces are “very” infinite-dimensional, so we can’t use the rank-nullity theorem…) When we prove the Fundamental Theorem of Calculus we’ll see that the image includes all continuous functions on G; but there are also differentiable functions whose derivative isn’t everywhere continuous.

Wednesday, 26 February: Differentiable vector-valued functions of one variable, cont’d

Yet another equivalent definition of the derivative of a function  f  from G to Rn: the derivative at a exists, and equals f '(a), iff there exists a function s (“s” as in “slope”) on G − a = {h : a+h ∈ G} = {x − a : x ∈ G} such that f (a+h) = f (a) + h s(h) for all h in G − a and s is continuous at h=0, in which case f '(a) = s(0). This makes it easy to prove that the product of functions differentiable at a is itself differentiable at a and to give the formula for its derivative there. By induction we obtain the formula for the derivative of the product of n differentiable functions for each integer n.
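To see how the slope-function definition yields the product rule, here is a sketch for two real-valued functions f and g differentiable at a. The names t and u for the second slope function and for the slope function of the product are my own notation, introduced only for this illustration.

```latex
\begin{align*}
f(a+h) &= f(a) + h\,s(h), \qquad g(a+h) = g(a) + h\,t(h), \\
f(a+h)\,g(a+h) &= f(a)\,g(a) + h\,u(h), \\
u(h) &= s(h)\,g(a) + f(a)\,t(h) + h\,s(h)\,t(h).
\end{align*}
```

Since s and t are continuous at h = 0, so is u, and u(0) = s(0) g(a) + f(a) t(0) = f '(a) g(a) + f (a) g'(a), which is the familiar formula.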

Friday, 28 February: More about single-variable differential calculus

An essential application of the derivative is to local extrema (minima and maxima): if a function from G ⊆ R to R has a local maximum or minimum at an interior point a of G, and the function is differentiable at a, then its derivative at a equals zero.

Combining this with Heine-Borel gives a rigorous foundation of the familiar method for finding the maximum and minimum of a differentiable function on an interval. It also yields the proof of Rolle’s theorem, and thence of the Mean value theorem, which as promised shows that a function on an interval whose derivative is everywhere zero is a constant function. We first prove this result (kernel of the derivative map equals constant functions) for functions taking values in R, and then deduce it for functions from an interval to Rn by considering each coordinate separately.

Monday, 3 March: Starting multivariable differential calculus: partial derivatives, directional derivatives, and the derivative and differential at a (usually interior) point

See Edwards, section 2 of Chapter 2. We introduced this by solving the two-variable optimization problem of finding the largest value of xyz for nonnegative reals x, y, z such that x + y + z = 1. (Two variables, not three, because we can solve for z.) For this purpose partial derivatives suffice, but for other applications we do need the stronger condition of differentiability.
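A quick way to check the xyz example: substitute z = 1 − x − y and look for an interior point where both partial derivatives vanish. The sketch below is an illustration only, not the argument given in class; the partial derivatives in `partials` were computed by hand, and the grid search is merely a sanity check on the triangle x, y ≥ 0, x + y ≤ 1.

```python
from fractions import Fraction

def g(x, y):
    """The product xyz with z = 1 - x - y substituted in."""
    return x * y * (1 - x - y)

def partials(x, y):
    """Partial derivatives of g with respect to x and y, computed by hand."""
    return (y * (1 - 2 * x - y), x * (1 - x - 2 * y))

third = Fraction(1, 3)
print(partials(third, third))   # both partials at (1/3, 1/3)
print(g(third, third))          # the candidate maximum value

# Brute-force comparison on a grid of the triangle x, y >= 0, x + y <= 1.
n = 60
grid_max = max(g(Fraction(i, n), Fraction(j, n))
               for i in range(n + 1) for j in range(n + 1 - i))
print(grid_max)
```

On the boundary of the triangle the product is zero, so the maximum must occur at an interior critical point, which is why checking the partial derivatives is enough here.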

Wednesday, 5 March: multivariable differential calculus, cont’d: the gradient; continuous differentiability

In the special case m=1, i.e. a function  f  from a neighborhood of a ∈ Rn to R, if  f  is differentiable at a then its derivative is a row vector of length n, called the gradient of  f  at a, written ∇f (a). [Edwards, pages 70–71. “∇f ” may be pronounced “del f ”. The upside-down-capital-Delta symbol ∇ itself is called a nabla, produced in TeX by writing \nabla in math mode; it was named for the Aramaic word for “harp” (cognate with modern Hebrew NEVEL), but it seems that my recollection that this name was introduced by Tullio Levi-Civita (1873–1941) was mistaken.] If ∇f (a) exists then it points in the direction where  f  increases fastest, because the directional derivative of  f  in the direction v is the scalar product ∇f (a) · v. If  f  has a local extremum at a then ∇f (a) = 0; in general a point a where ∇f (a) = 0 is called a “critical point” of  f. (NB Already for n=1 we know that there can be critical points that are not local extrema; in dimension 2 and higher we shall see that there are even more possibilities for the local behavior of  f at a critical point.)
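The relation between the gradient and directional derivatives can be checked numerically. The sketch below is an illustration under assumptions of my own: the test function x^2 + 3xy, the point (1, 2), and the helper names are arbitrary choices, and the directional derivative is only estimated by a central difference.

```python
import math

def f(x, y):
    return x * x + 3 * x * y

def grad_f(x, y):
    """Gradient of f computed by hand: (2x + 3y, 3x)."""
    return (2 * x + 3 * y, 3 * x)

def directional_derivative(func, a, v, t=1e-6):
    """Central-difference estimate of the derivative of func at a along v."""
    forward = func(a[0] + t * v[0], a[1] + t * v[1])
    backward = func(a[0] - t * v[0], a[1] - t * v[1])
    return (forward - backward) / (2 * t)

a = (1.0, 2.0)
v = (0.6, 0.8)                       # a unit vector
g = grad_f(*a)
dot = g[0] * v[0] + g[1] * v[1]      # scalar product of the gradient with v
print(dot, directional_derivative(f, a, v))

# Among unit vectors (sampled one degree apart), the estimate is largest
# for the direction closest to that of the gradient.
angles = [2 * math.pi * k / 360 for k in range(360)]
best = max(angles, key=lambda th: directional_derivative(
    f, a, (math.cos(th), math.sin(th))))
print(best, math.atan2(g[1], g[0]))
```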

For general m, we can write F as an ordered m-tuple (F(1), F(2), …, F(m)) of real-valued functions. Then we easily show that F is differentiable iff each of its coordinates F(j) (j = 1, 2, …, m) is differentiable. [Edwards, Lemma 2.3 on page 71.] In this case, the j-th row of the derivative F '(a) is the gradient of F(j). This gives us an interpretation of each of the rows of the derivative matrix, as we earlier interpreted each of the columns as a partial derivative. The individual entries of F '(a) are then the partial derivatives DiF(j)(a) of the components of F [Theorem 2.4 on page 72].

There are plenty of (counter)examples of functions on open sets in Rn that have partial derivatives or even directional derivatives but are not differentiable at some point. [Edwards notes the cases of 2x^2 y / (x^4 + y^2) and 2xy^2 / (x^2 + y^2) as Example 4 on page 69 and Exercise 3 on page 75 respectively.] However, if all the partial derivatives Di f (b) exist for all b in some neighborhood of a, and are all continuous at a, then  f  is differentiable at a (and the partial derivatives are the columns of f '(a) as usual). This is Theorem 5 on page 72, proved (on pages 72–73) by moving from a to a+h by changing one coordinate at a time. Such  f  is then said to be continuously differentiable at a. (Thus we naturally say that  f  is “continuously differentiable on G” for some open set G in Rn if  f  is continuously differentiable at a for every a in G.)
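The first of these counterexamples is easy to probe numerically. In the sketch below (an illustration only) the function is extended by the value 0 at the origin, which is my assumption about the intended definition there; the helper `directional_quotient` is a hypothetical name.

```python
def f(x, y):
    """2 x^2 y / (x^4 + y^2), extended by 0 at the origin (an assumption)."""
    if x == 0 and y == 0:
        return 0.0
    return 2 * x * x * y / (x ** 4 + y * y)

def directional_quotient(a, b, t):
    """Difference quotient (f(ta, tb) - f(0, 0)) / t along the direction (a, b)."""
    return f(t * a, t * b) / t

# Along the direction (1, 1) the quotients settle down as t -> 0
# (algebraically they equal 2 / (t^2 + 1), which tends to 2) ...
for t in (1e-1, 1e-3, 1e-5):
    print(t, directional_quotient(1.0, 1.0, t))

# ... yet along the parabola y = x^2 the function is identically 1,
# so f is not even continuous at the origin, let alone differentiable.
for t in (1e-1, 1e-3, 1e-5):
    print(t, f(t, t * t))
```

Approaching the origin along straight lines never detects the problem; only a curved path such as the parabola does.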

Friday, 7 March: multivariable differential calculus, cont’d: the multivariate Chain Rule

See Edwards Theorem 3.1 (pages 76–77). This is much like the proof for the single-variable case, except that we need the fact that if T is a linear transformation from Rn to Rm then there exists a real number M such that ||Tv|| ≤ M ||v|| for all v in Rn. (We needed this in the single-variable case too, but there it was immediate because T was a 1×1 matrix, i.e. a scalar, and we could simply use M = |T|.) Recall that if v has coordinates a1, a2, …, an then Tv = a1Te1 + a2Te2 + … + anTen, where e1, e2, …, en are the unit vectors in Rn. By the triangle inequality it follows that ||Tv|| is at most ||a1Te1|| + ||a2Te2|| + … + ||anTen||. Each term ||aiTei|| equals |ai| ||Tei||. Since each |ai| is no larger than ||v||, we deduce that ||Tv|| is at most ||v|| ||Te1|| + ||v|| ||Te2|| + … + ||v|| ||Ten||, so we may take M = ||Te1|| + ||Te2|| + … + ||Ten|| and we’re done.
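The bound M = ||Te1|| + … + ||Ten|| is just the sum of the Euclidean norms of the columns of the matrix of T, so it is easy to test on random vectors. The following is a minimal sketch, assuming the Euclidean norm on both spaces; the matrix and the helper names are arbitrary choices made for illustration.

```python
import math
import random

def apply(T, v):
    """Multiply the matrix T (a list of rows) by the vector v."""
    return [sum(t_ij * v_j for t_ij, v_j in zip(row, v)) for row in T]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def bound_M(T):
    """M = ||T e_1|| + ... + ||T e_n||, the sum of the column norms of T."""
    columns = zip(*T)
    return sum(norm(col) for col in columns)

random.seed(0)
T = [[1.0, -2.0, 0.5], [3.0, 0.0, 4.0]]     # an arbitrary map from R^3 to R^2
M = bound_M(T)
worst_ratio = 0.0
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(3)]
    if norm(v) > 0:
        worst_ratio = max(worst_ratio, norm(apply(T, v)) / norm(v))
print(M, worst_ratio)
```

The observed ratios stay below M, usually with room to spare: this M is a convenient bound, not the smallest possible one.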

The existence of M can also be obtained by continuity of T (which we already know from an early problem set): choose any ε, say ε=1, and find δ such that ||v|| < δ implies ||Tv|| < ε; by continuity it follows that ||v|| ≤ δ implies ||Tv|| ≤ ε; and then linearity of T, together with the identity ||ax|| = |a| ||x||, gives us ||Tv|| ≤ (ε/δ) ||v|| for all v, so we succeed with M = ε/δ. But then if you unwind the proof of continuity of T you’ll see that this proof is actually not all that different from the argument in the previous paragraph.

Monday, 10 March: First midterm examination
Wednesday, 12 March: Midterm post-mortem; continuous mixed partials commute
Friday, 14 March: No class
Monday–Friday, 17–21 March: Spring Break
Monday, 24 March: Taylor’s formula for functions of one variable

Sections 4 and 5 of Chapter II are oddly placed: they develop special cases of material that is covered many pages later, and they even refer to results from later sections in the book. It seems that this is done in part to motivate the more general treatment of constrained max/min problems later in the book, so you might want to read through those pages for this reason; but for now we proceed to section 6. Meanwhile Theorem 4.3 (page 95), describing positive-definite quadratic forms in two variables, should be familiar from Math 25a.

Some context: if  f  is a real-valued function of one variable that is infinitely differentiable in a neighborhood of a then it has a formal Taylor series Σ_{n≥0} c_n (x − a)^n, where each coefficient c_n is 1/n! times the n-th derivative of  f  at x=a. But these coefficients might grow too fast for the series to converge at any x ≠ a; e.g. one can concoct an infinitely differentiable function with Taylor series Σ_{n≥0} n! (x − a)^n. Indeed it is known that every sequence (c_0, c_1, c_2, … ) arises for some  f. Moreover, even if the series does converge it need not converge to  f. A famous counterexample is the function sketched in Edwards page 124, which equals exp(−1/x^2) for nonzero x and vanishes at x=0: the Taylor series has c_n = 0 for each n, and thus does not converge to  f except at x=0. [In linear-algebra terms, the infinitely differentiable functions form a vector space, as do the sequences (c_0, c_1, c_2, … ); the map taking any function to its Taylor series about x=a is a linear transformation, and this map is surjective (in particular the image need not yield a convergent Taylor series) but not injective (indeed the kernel is infinite dimensional).] So it is a nontrivial question how well a function is approximated by finite sums of its Taylor series.

Typo in the text: in formula (5) on page 118 (statement of Theorem 6.1, Lagrange form of the remainder in the Taylor expansion) the denominator should be (k+1)!, not k+1 as written. (The factorial does appear correctly in the two displayed equations immediately following.)

At the bottom of page 121 Edwards gives a proof of e<4 using integration and the logarithm function, neither of which we have discussed yet. (To be sure, we haven’t discussed the exponential function either, nor developed enough of a theory of power series to justify the termwise differentiation needed to prove that the Taylor series for e^x is its own derivative…) I’m not sure why Edwards takes that detour, because he’s just proved the formula e^x = Σ_{n≥0} x^n/n!, and taking x=1 easily yields the better bound e≤3: the first three terms are 1+1+½ = 2½, and each succeeding term 1/n! is less than 1/2^(n−1) (compare n! = 2·3·4···n with 2·2·2···2), so for n>2 the n-th partial sum is less than 3 − 1/2^(n−1) (by the formula for a geometric series, proved by induction or “telescoping series”), whence the limit is at most 3 as claimed.
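The comparison with the geometric series can be verified exactly for small n using rational arithmetic. This is only a finite check of the inequality, not a substitute for the induction; `partial_sum` is a hypothetical helper name.

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    """Sum of 1/k! for k = 0, ..., n, as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

for n in range(3, 11):
    s = partial_sum(n)
    bound = 3 - Fraction(1, 2 ** (n - 1))
    # Term-by-term comparison with the geometric series gives s < bound < 3.
    assert s < bound < 3
    print(n, float(s), float(bound))
```

For n = 2 the partial sum equals 3 − 1/2 exactly, which is why the strict inequality is claimed only for n > 2.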

There aren’t many functions  f  whose k-th derivatives are all simple enough that we can easily use Theorem 6.1 to prove the Taylor series converges to  f : Basically just linear combinations of exp(bx), sin(bx) and cos(bx) (which are essentially exp(ibx) by Euler’s “cis” formula), and power functions (x − x0)^r (which yield the binomial formula for arbitrary real exponent, though curiously only for ¾ of the correct interval; e.g. the Taylor series about x=0 for (1+x)^r converges to that function for all x in (−1,1), but we cannot derive this from Theorem 6.1 if x < −½). [Added later: also logarithmic functions such as log(1+x), for which the first derivative yields a power function with exponent r = −1.] Still we shall make good use of this Lagrange form for fixed k even when we cannot effectively use it for all k at once.

Wednesday, 26 March: Proof of Taylor’s formula with remainder for functions of one variable; derivative tests for local maxima and minima

For the derivative tests (Theorem 6.3), it is enough to assume that the (k+1)-st derivative is bounded in a neighborhood of a, since that is the hypothesis of Corollary 2. In fact the (k+1)-st derivative is not needed at all as long as the k-th derivative is continuous in a neighborhood of a; this can be seen by applying Theorem 6.1 to the Taylor remainder R_{k−1}. But in practice it will hardly ever matter: we’ll almost(?) never have occasion to apply these derivative tests to a function that is not infinitely differentiable.

Friday, 28 March: Start on multivariate Taylor

At a first pass through section 7 of the text it may be hard to see through the thicket of multiple indices. To get a feeling for what’s going on, it may help to first work out the case n=2 (and check that the formulas also work in the known case n=1). For instance, start with the example that spans pages 130–131 (second and third directional derivatives in R²).

It will also help a bit to correct yet another missing-factorial typo in the text: on page 131, the third formula in the multi-line display just below formula (3) should have a factor of 1/k! before the Σ. [Equivalently, divide the multinomial coefficient by k! to get 1 / (j_1! j_2! … j_n!).]

Monday, 31 March; Wednesday, 2 April: multivariate Taylor, cont’d

In Lemma 7.3 and Theorem 7.4 (page 135), the polynomials are to be of degree at most k; such polynomials form a vector space, whereas those of degree exactly equal to k don’t quite.

In Lemma 7.3, strictly speaking Edwards should prove that if P is a polynomial of several variables (a.k.a. a polynomial function of a vector) that is not the zero polynomial then there is some vector b at which P(b) ≠ 0. In Math 25a this was shown for polynomials in one variable [with the observation that the proof depended on the field R being infinite, since in the case of a finite field (such as the integers mod 2) there are polynomials that vanish at every field element without being zero identically (such as x² − x for the 2-element field)]. Once we have the one-variable result we can do the general case by induction on the number of variables: assume it for polynomials in n≥1 variables, and write any polynomial in n+1 variables as P(b) = P_0 + P_1 b_{n+1} + P_2 (b_{n+1})² + ··· + P_d (b_{n+1})^d where each P_j is a polynomial in the first n coordinates b_1, …, b_n. Then by the 1-variable lemma from Math 25a, if P is always zero then each of the P_j is always zero; and then by the inductive assumption each P_j is the zero polynomial.
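To see concretely why the field must be infinite, here is a small sketch (the function name vanishes_everywhere and the coefficient-list convention are my own) that tests whether a polynomial with integer coefficients is zero at every element of the integers mod p.

```python
def vanishes_everywhere(coeffs, p):
    """True if sum(coeffs[k] * x^k) is 0 mod p for every x in Z/p.

    Coefficients are listed from the constant term upward.
    """
    return all(
        sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p == 0
        for x in range(p)
    )

# x^2 - x over the 2-element field: a nonzero polynomial that vanishes identically.
print(vanishes_everywhere([0, -1, 1], 2))
# The same polynomial over the 3-element field does not vanish at x = 2.
print(vanishes_everywhere([0, -1, 1], 3))
```

The analogous polynomial for the integers mod p is x^p − x.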

(Alternatively, now that we know about partial derivatives, we can observe that if P is everywhere zero then so is each of its partial derivatives, and thus each k-th partial derivative for every k; but we’ve already seen how to isolate any coefficient of P by taking the corresponding partial derivative, dividing by the appropriate product of factorials, and evaluating at the origin — and since this recipe always yields zero we conclude that every coefficient of P is zero, as desired.)

As in the single-variable case, Corollary 7.2 and Theorem 7.4 require only that  f  have k continuous derivatives at a (and thus Theorem 7.5 on page 138 requires only two continuous derivatives, not three); but in practice we’ll hardly ever (never?) have use for this refinement.

Friday, 4 April: critical points, and the diagonalization of quadratic forms

If q is a quadratic form on R^n (some integer n>0), then the associated Rayleigh quotient R(x) is defined for all nonzero vectors x by R(x) = q(x) / ||x||² = (x, Ax) / (x, x) where A is the associated symmetric matrix and (·,·) is the inner product. Clearly R is homogeneous of degree zero: R(cx) = R(x) for all nonzero vectors x and scalars c≠0. If y is any vector orthogonal to x then the directional derivative of R at x in the direction y is 2(Ax, y) / ||x||². If we already know that A has an orthonormal basis of eigenvectors, i.e. that we can choose coordinates for which q(x) = Σ_i λ_i x_i², then it is clear that R is maximized (minimized) at an eigenvector with maximal (resp. minimal) eigenvalue. Conversely, we calculate that if some nonzero x is a critical point for R then every y that is orthogonal to x is also orthogonal to Ax. But this forces Ax to be a multiple of x, i.e. makes x an eigenvector of the symmetric matrix A. Moreover, there must be at least one critical point: R gives a continuous function on the unit sphere S^{n−1} in R^n, and this sphere is compact (Heine-Borel), so R attains its supremum at some vector x, which is then a local maximum for R even on R^n by homogeneity. So we have found an eigenvector x of A (with eigenvalue equal to R(x)), and can deduce that A has an orthonormal basis of eigenvectors by an inductive argument that you probably remember from Math 25a (if n=1 we’re done; otherwise consider the action of A on the orthogonal complement of x, which has dimension n−1 and is mapped to itself by A because once x is an eigenvector we have x⊥y ⇒ 0 = (λx, y) = (Ax, y) = (x, Ay) ⇒ x⊥Ay, etc.).
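A small numerical sketch of this argument for n=2. The matrix A below is an arbitrary illustrative choice (its eigenvalues are 1 and 3), and the brute-force scan of the unit circle stands in for the compactness argument; none of these names come from Edwards.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]  # illustrative symmetric matrix; eigenvalues 1 and 3

def rayleigh(x):
    """R(x) = (x, Ax) / (x, x) for a nonzero vector x in R^2."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return (x[0] * Ax[0] + x[1] * Ax[1]) / (x[0] * x[0] + x[1] * x[1])

def maximize_on_circle(steps=3600):
    """Scan unit vectors (cos t, sin t), 0 <= t < pi; return (max of R, maximizer)."""
    candidates = (
        (math.cos(k * math.pi / steps), math.sin(k * math.pi / steps))
        for k in range(steps)
    )
    best = max(candidates, key=rayleigh)
    return rayleigh(best), best

r_max, x_max = maximize_on_circle()
print(r_max, x_max)
```

It suffices to scan half the circle because R(−x) = R(x). The maximizer found is (up to rounding) an eigenvector, and the maximum of R is the corresponding eigenvalue.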

P.S. I see that I never explained the term “Hessian” for the matrix A (or its determinant, see page 151 of Edwards). It is named for Ludwig Otto Hesse (1811–1874); see the Wikipedia page on “Hessian matrix”.

Monday, 7 April, and Wednesday, 9 April: inverse and implicit functions etc.

The main theme of Chapter 3 is the differential behavior of “implicit functions”, i.e. functions y(x) defined implicitly by a relation G(x, y) = 0. This is familiar from single-variable calculus, but is also useful in our multivariable generality for constructing a function from R^m to R^n as long as G=0 gives n conditions on y, that is, as long as G is a function from (an open subset of) R^{m+n} to R^n. If y is a differentiable function of x then the Chain Rule gives G_x + G_y (dy/dx) = 0, which we can solve for dy/dx provided that G_y is invertible (NB this makes sense because G_y is a linear map from R^n to R^n). But it is not obvious that y is even well-defined, let alone differentiable, in a neighborhood of any point where G(x, y) = 0 and G_y is invertible. In fact one can construct counterexamples already when m=n=1 in which G is continuous and differentiable but G=0 does not define an implicit function. It turns out that such a difficulty can arise when G_y, though defined at every point, is not a continuous function. The main result of Chapter 3 is that y is well-defined and differentiable in a neighborhood of any point where G(x, y) = 0 and G_y is invertible, as long as G is continuously differentiable near that point.

A closely related notion is an inverse function, where we have a differentiable map f from an open set in R^n to R^n, and seek a differentiable inverse function f^{−1} in a neighborhood of some f(x_0). Again it turns out that such an inverse function exists provided f'(x) (recall that for each x this is a linear map from R^n to R^n) is invertible and continuous at x_0. This is clearly a special case of the implicit function theorem, namely the case with m=n and G(x, y) = f(y)−x. But conversely once we work in spaces of arbitrary (finite) dimension we can derive the implicit function theorem from the inverse function theorem! Consider the function of m+n variables taking (x, y) to (x, G(x, y)). Its derivative is a block-triangular matrix (shown at the top of page 191 in Edwards) that is invertible iff G_y is invertible, and once we have a differentiable inverse function we can restrict it to the subspace y=0 to recover the desired implicit function. This is the approach Edwards (and we) take, first proving the inverse function theorem (Theorem 3.3 on page 185), and deducing from it the implicit function theorem (Theorem 3.4 on page 190).

It will be convenient (though not logically necessary) to prove the inverse mapping theorem in the special case a = f(a) = 0 and f'(0) = Id. The general case can be deduced from this via the Chain Rule by composing with translations and linear transformations. This is essentially Edwards’ approach (see Lemma 3.2 on page 183, preceding the proof of the full inverse function theorem (3.3) starting on page 185), though since we are skipping section 2 which defines the operator norm ||df_x − I|| we just say that each coordinate of df_x − I is less than ε and thus that df_x − I multiplies norms by at most nε. Hence we replace the estimates (1±ε)r by (1±nε)r.

For the inverse mapping theorem, Edwards asserts on page 182 that “It is easy to see that invertibility of df_a is a necessary condition for the local invertibility of f near a.” In fact this is necessary if we require the inverse function to be differentiable as well, but it is possible for f to have a continuous (but not differentiable) inverse even where df is not invertible, even in dimension 1: consider f(x) = x³. (See also Exercise 3.3 on page 194 of Edwards, which gives a map from R³ to R³ which has an inverse function g which is not differentiable at the origin; where else is this g not differentiable, and why?)

Edwards’ (and our) key tool is the contraction mapping theorem, introduced and motivated in Section 1 of this chapter. The notion of contraction mapping (page 162) makes sense in any metric space: just replace (3) by the condition d(ϕ(x), ϕ(y)) ≤ k d(x, y). Theorem 1.1 (p.162) and its proof (pages 162–163) then hold provided the space is nonempty and complete. [Note that Edwards requires a closed interval in R. In a non-complete space, there might not be any fixed point; e.g. consider the contraction mapping x ↦ x/2 of (0,1). But if there is a fixed point then it is still unique for the reason given on page 163.] This is not just generalization for generality’s sake: it contains Theorem 3.1 (pages 181–182), and indeed also Theorem 1.4 (which in effect constructs f as the fixed point of a contraction of a closed subset of the function space C([a, b]) ). This strategy of using a contraction mapping is also a key ingredient in the existence theorem for differential equations, though we probably won’t reach this result in Math 25b. [There is no need to assume that our metric space is bounded, as Edwards suggests on page 182; but in the present context we shall apply the contraction mapping theorem only to bounded spaces. Indeed, when we start with a function ϕ that is initially defined on a larger unbounded space S, we’ll sometimes have to find a suitable subspace S_0 that is mapped to (a subset of) itself under ϕ and for which the restriction of ϕ to S_0 is a contraction (which it might not be on all of S); and this S_0 will often have to be bounded.]
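Here is a minimal sketch of the fixed-point iteration behind the contraction mapping theorem, together with the non-complete example x ↦ x/2 on (0,1). The choice of cos on [0,1] as the sample contraction, the helper name iterate, and the iteration counts are all my own illustrative assumptions.

```python
import math

def iterate(phi, x, steps):
    """Apply phi to x repeatedly, `steps` times, and return the result."""
    for _ in range(steps):
        x = phi(x)
    return x

# cos maps [0, 1] into itself and |cos'| = |sin| <= sin(1) < 1 there,
# so it is a contraction of a nonempty complete space.
fixed = iterate(math.cos, 1.0, 200)

# x -> x/2 is a contraction of the non-complete space (0, 1); its iterates
# approach 0, which is not in the space, so there is no fixed point.
drifting = iterate(lambda x: x / 2, 0.5, 50)

print(fixed, drifting)
```

Starting the first iteration from any other point of [0,1] gives the same limit, in line with the uniqueness of the fixed point.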

We shall skip Section 2; the tools developed in that section, interesting and important though they are for other purposes(*), are more than we need, which is basically that if all the entries of a matrix A are of absolute value at most ε then ||Ax|| ≤ Kε||x|| for every vector x, where K depends only on the size of A; for instance if we use the sup metric then we can take K to be the length of x (i.e. the dimension of the domain of the linear transformation).
(*) For example, if a symmetric matrix A is positive-definite then its norm (relative to the Euclidean norm) is the largest eigenvalue, and for a general symmetric A the norm is the maximum of |λ| as λ ranges over the eigenvalues of A.

Does the two-dimensional Example 1 on page 183 look familiar? It’s just the map z ↦ z² of the complex plane. So of course it is two-to-one on the complement of the origin. This might also help with Exercise 3.1 on page 191.

Like Edwards (see the brief Section 5 = pages 201 and 202), we shall punt on proving the Ck versions of the implicit and inverse function theorems, whose proofs are somewhat tedious and messy, and introduce no fundamentally new ideas. However, in some special cases we get the higher derivatives more-or-less for free. For example, suppose we know that log(x) has derivative 1/x for all x > 0. Then it follows that the inverse function (a.k.a. exp(x)) is its own derivative, and then by induction its own k-th derivative for each k = 2, 3, 4, ….

Friday, 11 April: Introduction to volume and integration in R^n

Edwards defines the integral of a function on a bounded subset of R^n via volumes of bounded subsets of R^{n+1}. This approach has very old roots in the “method of exhaustion” of classical geometry. We cannot reasonably expect to assign a volume to every bounded subset of R^n; for example the subsets of R³ that occur in the Banach-Tarski paradox cannot all have volume without violating at least one of the basic properties/axioms that we need volumes to have (see the list on page 203 at the start of Chapter IV of Edwards), or else assigning a volume of zero to the sphere, which is not useful to us either. One could strengthen the axioms to require additivity under countable unions of sets with pairwise disjoint interiors; but while such a theory of volume can be developed, it is much harder, and we relegate it to a class such as Math 114. Edwards’ definition excludes even countable sets such as Q ∩ [0,1], which “should” be negligible (i.e. should have zero length [= 1-dimensional volume]); but it does assign a volume to a rich enough collection of subsets of R^n to let us integrate any continuous function, and also some mildly discontinuous functions such as characteristic functions of intervals. Perhaps because his notion of volume is more restrictive than the countably-additive one you might encounter in the future, Edwards sometimes calls it “content”, and refers to a set with a well-defined volume or content as “contented” (be sure to accent the first syllable!).

One key point that Edwards does not sufficiently emphasize is that we need to make sure that the volume is well-defined, i.e. that the definition does not allow more than one value for the same subset of R^n! In Greek geometry it is basically assumed that every reasonable region in the plane or in space has a single area or volume, and that these satisfy the properties on page 203; but this is not easy to show directly, and it is well known that one can construct dissections that seem to violate this assumption, such as this “infinite chocolate” GIF (see also this Wikipedia page and these two variations on my extension of one of those classes of dissections). This is why Edwards defines his volume using only finite unions of boxes (“intervals”) whose sides are parallel to the coordinate axes, i.e. Cartesian products of bounded intervals in the n coordinates of R^n. This lets us show the key fact that if one such union, call it A (with pairwise disjoint interiors), is contained in another, say B, then the total volume of A is no larger than the total volume of B. (This is done by listing for each i the numbers that occur as an i-th coordinate of some box of A or B, sorting each of these n lists in increasing order, and splitting each box into sub-boxes each of whose coordinates is bounded by consecutive numbers in its list. One then need only check that each box is assigned the same volume as the sum of its sub-boxes’ volumes.) This means that the volume is not obviously invariant under rigid motions of R^n, but we can then prove this invariance as a theorem, and show more generally that a map of the form x ↦ Ax + b (where b is a fixed vector and A is any n × n matrix) multiplies the volume of any set by |det(A)| (assuming that our set had a volume to begin with).
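The sub-box refinement described in the parenthesis above can be sketched in a few lines. This is my own illustration, not Edwards' notation: a box is represented as a tuple of (low, high) pairs, one per coordinate, and union_volume is a hypothetical helper that cuts along every endpoint that occurs and adds up the covered cells.

```python
from itertools import product

def union_volume(boxes):
    """Volume of a finite union of axis-parallel boxes in R^n (overlaps allowed).

    Each box is a tuple of n (low, high) pairs.
    """
    n = len(boxes[0])
    # For each coordinate, the sorted list of all endpoints that occur.
    cuts = [sorted({e for box in boxes for e in box[i]}) for i in range(n)]
    total = 0
    # Every cell of the refined grid is a product of consecutive cut intervals.
    for cell in product(*[list(zip(c, c[1:])) for c in cuts]):
        covered = any(
            all(box[i][0] <= cell[i][0] and cell[i][1] <= box[i][1] for i in range(n))
            for box in boxes
        )
        if covered:
            vol = 1
            for lo, hi in cell:
                vol *= hi - lo
            total += vol
    return total

A = [((0, 2), (0, 2))]
B = [((0, 2), (0, 2)), ((1, 3), (1, 3))]
print(union_volume(A), union_volume(B))
```

Because each cell is either inside a given box or meets it only along its boundary, a union contained in another uses a subset of the other's cells, which is the monotonicity claimed in the text.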

[Added later: Edwards does hint at this technique with the “partitions” introduced on page 215 (proof of Theorem 2.1: sets with volume are precisely the bounded sets with negligible boundary); our collection of sub-boxes is a partition, and such partitions suffice for the proof of Theorem 2.1.]

Monday, 14 April: Integration over bounded intervals (or more generally over boxes in R^n); the Fundamental Theorem of Calculus

When we use the area in R² to define the integral of a bounded function on an interval we are naturally led to upper and lower Riemann sums. The integrability (= existence of the relevant areas) of any continuous function on a closed interval is then a consequence of the fact that the interval is compact (Heine-Borel) and continuous functions on compact metric spaces are uniformly continuous. This generalizes readily to integrals over boxes in R^n, which exist for continuous functions on closed boxes for much the same reason.

Some basic properties of the area, and thus of the integral, already suffice to recover the Fundamental Theorem of Calculus: all we need is that ∫_a^b f(x) = ∫_a^c f(x) + ∫_c^b f(x) for any integrable function f on an interval [a, b] that contains c, and that if f takes values in [m, M] then ∫_a^b f(x) is in the interval [m(b−a), M(b−a)]. We shall find many uses for this theorem in the remaining weeks of Math 25b. For starters, we can use it to find a function f on the positive reals whose derivative is 1/x; and we have already seen that the inverse function of such f is its own derivative, which lets us construct the exponential function and prove its standard properties. (One can construct the trigonometric functions in the same way by integrating 1 / sqrt(1−x²) or 1 / (1+x²) and forming the inverse function, though it takes some more work to recover the periodicity and basic identities satisfied by these functions.)
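As a sanity check on the construction of the logarithm, here is a sketch that computes lower and upper Riemann sums for 1/t on [1, 2] and confirms that they trap log 2. The helper riemann_sums is my own and assumes a decreasing integrand, so that the extreme values on each piece sit at its endpoints; the number of pieces is arbitrary.

```python
import math

def riemann_sums(f, a, b, n):
    """Lower and upper sums of a decreasing function f on [a, b], n equal pieces."""
    h = (b - a) / n
    # For decreasing f, the minimum on each piece is at its right endpoint
    # and the maximum is at its left endpoint.
    lower = h * sum(f(a + (k + 1) * h) for k in range(n))
    upper = h * sum(f(a + k * h) for k in range(n))
    return lower, upper

lower, upper = riemann_sums(lambda t: 1 / t, 1.0, 2.0, 1000)
print(lower, math.log(2), upper)
```

For a monotone function the gap between the two sums telescopes to |f(a) − f(b)| (b−a)/n, which makes the integrability visible without appealing to uniform continuity.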

Wednesday, 16 April: More about integration in R^n and volume (a.k.a. content) in R^{n+1}

A bounded set in R^n has content if and only if its boundary is negligible (of content zero, i.e. is contained in a finite union of boxes of total volume <ε for every ε > 0). This is Theorem 2.1 in Edwards chapter IV (pages 215–216). A finite union of negligible sets, or an arbitrary subset of a negligible set, is again negligible; and if a set A has volume v then so does any set A' obtained from A by adding or removing a negligible set.

A function F from a metric space to R is said to have bounded support if there is a bounded set B such that F(x) = 0 for all x not in B. F is said to have compact support if B can be taken to be compact. For functions on R^n the two notions are the same because of Heine-Borel (and because the closure of a bounded set is again bounded). Edwards says (p. 219) that a function F: R^n → R is admissible if it has bounded support (equivalently, compact support) and is continuous outside a negligible set. In this case the ordinate sets of F⁺ and F⁻ have volume, so F is integrable, and we readily obtain the four basic properties or axioms of integrals enumerated on page 218. Even if there are integrable functions that are not admissible, the admissible functions are all that we shall need.

Note that even in R the cardinality of a bounded set S does not correlate all that well with the “volume” (a.k.a. length, in this 1-dimensional setting) of S; both notions are trying to get at “how large” S is, but in rather different senses. It is true that any finite set is negligible, and a set with positive volume must have cardinality c (the continuum) because it contains an interval. But there are countable sets that are not negligible, such as the rational numbers in (0,1): the boundary of this set is the entire interval [0,1], which is certainly not negligible, so the criterion of Theorem 2.1 fails. Conversely, there are negligible sets of cardinality c. This is clear in dimensions 2 or above (use an interval), but even R contains such sets, for example the Cantor set C. (In step k of the construction of C we find 2^k intervals, each of length 1/3^k, whose union contains C; since the total length (2/3)^k of these intervals approaches zero as k → ∞, we can find for every ε > 0 a finite union of intervals that cover C and have total length less than ε, so C is negligible as claimed.) It follows that there are as many negligible subsets of R as there are arbitrary subsets of R: both cardinalities are 2^c (any subset of the Cantor set is again negligible, and this already gives us 2^c negligible sets).
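The covering of the Cantor set at step k can be generated explicitly. This is a minimal sketch in exact arithmetic; the function names cantor_cover and total_length are mine.

```python
from fractions import Fraction

def cantor_cover(k):
    """The 2^k closed intervals left after k steps of removing open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            next_intervals.append((lo, lo + third))
            next_intervals.append((hi - third, hi))
        intervals = next_intervals
    return intervals

def total_length(intervals):
    """Sum of the lengths of a list of (low, high) intervals."""
    return sum(hi - lo for lo, hi in intervals)

for k in range(6):
    print(k, len(cantor_cover(k)), total_length(cantor_cover(k)))
```

Given ε > 0, any k with (2/3)^k < ε supplies the finite cover required by the definition of a negligible set.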

Friday, 18 April: Second midterm examination
Monday, 21 April: Step functions and Riemann sums; midterm post-mortem
Wednesday, 23 April: More applications of the integral

Some examples of the power of the tools we have developed so far:

Mean value and Taylor revisited. If F is continuously differentiable on [x, y] then the Fundamental Theorem of Calculus says F(y)−F(x) is the integral of the derivative F' over [x, y]. Thus if the derivative is always in [m, M] then (by our basic axioms or properties of the integral) F(y)−F(x) is in [m(y−x), M(y−x)]. This isn’t quite our statement of the Mean Value Theorem, but it is a consequence of it, and is equivalent in our present case of a function with a continuous derivative, which will almost always be the case for us (and indeed it is these bounds on F(y)−F(x) that we usually need, as in that contraction problem on the second midterm). With some more work we can obtain Taylor’s theorem with remainder for functions with a continuous derivative of order k+1. [“More work” = integration by parts plus induction, plus remembering to treat both y > x and x > y, which are not equivalent once k > 0.]

Termwise differentiation of convergent power series in the interior of their interval of convergence. In general the pointwise limit, or even the uniform limit, of a sequence {F_n} of differentiable functions need not be differentiable. However, we can go in the opposite direction thanks to the Fundamental Theorem of Calculus and Exercise 3.4 (which we cover in class today): given an interval I containing a point x_0, any sequence {F_n} of continuously differentiable functions on I whose derivatives converge uniformly to some f, and for which F_n(x_0) is constant, does converge uniformly to a function F whose derivative is f. Applying this to the sequence of partial sums of a power series yields the differentiability of power series inside their interval of convergence. (For this we also need the fact that the derivative of a power series has the same interval of convergence, which ultimately comes down to the fact that as n→∞ the n-th root of n (which enters into the limsup formula for the radius of convergence of the derivative) approaches 1.)

Linear transformations and volume. We have yet to make good on our promise of showing that our definition of volume is invariant under rigid motions of Euclidean space: so far only translations and coordinate permutations are easy. But by now we know enough to prove more generally that if A has volume v(A) and T is any linear transformation then T(A) has volume |det(T)| v(A). Not only will this yield the invariance of volume under rigid motions as a special case, but it will be one of two essential ingredients for the change-of-variable formula for integrals in dimension greater than 1 (for which we can no longer use the Fundamental Theorem of Calculus as Edwards does in dimension 1). The proof reinterprets a construction from Math 25a: T is a finite composition (matrix product) of coordinate permutations, diagonal matrices, and shears. (These correspond to the row operations of switching rows, multiplying a row by a scalar, and adding a multiple of one row to another.) If we prove the |det(T)| v(A) formula for each of these elementary matrices, then the general result will follow by multiplicativity of the determinant! And only the shears present more than ε difficulty.
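A small sanity check of the |det(T)| formula in the plane, for one matrix of each elementary type and for their product. This is only a sketch: the three matrices are arbitrary examples, and the area of the image parallelogram is computed with the classical shoelace formula, which is a convenient stand-in here and not the box-based definition of content used in the course.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def matmul2(S, T):
    """Product of two 2x2 matrices."""
    return [[sum(S[i][k] * T[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def image_area(T):
    """Area of the image of the unit square under T, via the shoelace formula."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    pts = [(T[0][0] * x + T[0][1] * y, T[1][0] * x + T[1][1] * y) for x, y in corners]
    twice_area = sum(
        pts[i][0] * pts[(i + 1) % 4][1] - pts[(i + 1) % 4][0] * pts[i][1]
        for i in range(4)
    )
    return abs(twice_area) / 2

swap = [[0, 1], [1, 0]]     # coordinate permutation
scale = [[3, 0], [0, -2]]   # diagonal matrix
shear = [[1, 5], [0, 1]]    # shear: adds 5 times the second coordinate to the first
T = matmul2(swap, matmul2(scale, shear))
print(image_area(T), abs(det2(T)))
```

The shear leaves the area at 1 even though it distorts the square, which is the case the text singles out as needing a real argument.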

Friday, 25 April: The change-of-variable formula in R^n

Motivation for the last few sections of Chapter IV: the classical derivation of the amazing definite integral ∫_R exp(−x²) = π^½. (NB it is known that the antiderivative of exp(−x²) does not have an elementary formula; the normalized antiderivative (2/π^½) ∫_0^x exp(−t²) dt is known as the error function erf(x), and is needed in some contexts in mathematics, and especially in statistics, where normal distributions often arise naturally.)

Step 1: By symmetry, the integral is 2I where I := ∫_0^∞ exp(−x²).

Step 2: Because we know that I will involve a square root, consider I², and write it as a double integral of exp(−x²) exp(−y²) = exp(−(x²+y²)) over positive x and y, that is, over (x, y) in the “first quadrant” of R². [Note that when Gauss first obtained this integral he did not have the hint that I will be a multiple of π^½ ! Though it is conceivable that he surmised it by numerical computation, as Euler first surmised that ζ(2)=π²/6.]

Step 3: Change to polar coordinates (x, y) = (r cos θ, r sin θ). The integrand is exp(−r²), and the Jacobian derivative ∂(x, y) / ∂(r, θ) has absolute value r. So I² is the integral of r exp(−r²) over the infinite rectangle (0, ∞) × (0, π/2) in the (r, θ) plane.

Step 4: This integral factors as the integral of r exp(−r²) over r>0 times the integral of 1 over θ in (0, π/2). The latter integral is of course π/2. The former is easy because thanks to the factor of r there is an elementary antiderivative −exp(−r²)/2. Hence I² is (1/2)(π/2) = π/4. Since I is manifestly positive, it must be the positive square root π^½ / 2 of π/4, so ∫_R exp(−x²) = 2I = π^½, QED!

Each step requires some justification.
• In step 1: the integral is “improper (of the first kind)” because it extends over all real x, so we must write it as a limit over large but finite intervals so that the integral is well-defined. Such limit definitions of improper integrals are treated more systematically in section 6 of Chapter IV, but we have run out of time and cannot cover this in class. For our purposes it is enough to define the desired integral as the limit as M→∞ of 2I_M where I_M is the integral of exp(−x²) over (0, M). [Compare the “careful proof” in the Wikipedia page for this integral (which currently uses a for what is called M in this paragraph) with the “computation by polar coordinates” which is essentially what we did in steps 2 to 4; I inserted the first step to avoid the additional complication that integrating exp(−(x²+y²)) over the entire plane covers the positive real axis twice (θ=0 and θ=2π) in the transformation to polar coordinates.]
• In step 2, we use Exercise 4.1, which is a special case of Fubini (Theorem 4.1, pages 238–239).
• Step 3 uses the change of variable formula, Theorem 5.5 (stated at the bottom of page 252, proved in the next few pages). Since the integral is improper we actually apply this argument to the square of I_M, bounding it between the integrals over quarter-circles of radius M and 2^½ M, both of which are seen (in step 4) to approach π/4 as M→∞.
• Finally in step 4 we use Exercise 4.1 again (though we don’t really need it in this special case that the function depends on just one of the two variables), together with the Fundamental Theorem of Calculus which we have already proved in Chapter III.
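The squeeze between the two quarter-circles can be watched numerically. In this sketch (names and step counts are my own choices) I_M approximates the integral of exp(−x²) over (0, M) by the midpoint rule, and quarter_disc uses the closed form (π/4)(1 − exp(−R²)) that step 4 gives for the quarter-disc of radius R.

```python
import math

def I_M(M, n=20000):
    """Midpoint-rule approximation to the integral of exp(-x^2) over (0, M)."""
    h = M / n
    return h * sum(math.exp(-((k + 0.5) * h) ** 2) for k in range(n))

def quarter_disc(radius):
    """Integral of exp(-(x^2 + y^2)) over the quarter-disc of the given radius,
    from polar coordinates: (pi/2) times (1 - exp(-radius^2)) / 2."""
    return (math.pi / 4) * (1 - math.exp(-radius ** 2))

for M in (1, 2, 3):
    print(M, quarter_disc(M), I_M(M) ** 2, quarter_disc(math.sqrt(2) * M))
```

The square [0, M]² contains the quarter-disc of radius M and is contained in the one of radius 2^½ M, and the integrand is positive, so the middle column must lie between the outer two; all three approach π/4.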

Today we finished the outline of the proof of the change of variable formula.

First prove that linear changes of variable T multiply the volume of any box by |det(T)|; it follows (via some routine “epsilonics” [Edwards, p.252]) that the same is true for the volume of any contented set. (For shears, Edwards uses a result from section 4 of this chapter; we do not have this result yet, but the basic idea is readily available from what we’ve done already.) It is also clear that the volume is preserved by any translation, i.e. any transformation x ↦ x+c for some constant vector c.

For a general change of variable we need the behavior of volumes under an arbitrary map F from (an open set U in) R^n to R^n for which both F and its inverse are continuously differentiable. We have seen already (Inverse Function Theorem) that this is the case iff F is continuously differentiable and its derivative is an invertible linear transformation. Under this hypothesis, if we fix some point x_0 in U, we can assume (after a linear change of variable, and translation of both domain and range) that x_0 = F(x_0) = 0 and the derivative of F at x_0 is the identity. Then we know that for any ε>0 there is a δ>0 such that for all r<δ our map F takes the box [−r, r]^n (a.k.a. the r-neighborhood of 0 under the sup norm) to some set contained in the ((1+ε)r)-neighborhood of the origin and containing the ((1−ε)r)-neighborhood. It follows that the volume of the image (which exists by the negligible-boundary criterion) is between (1−ε)^n and (1+ε)^n times the volume of [−r, r]^n.

To prove the change-of-variable formula, we now divide the region of integration into small boxes and apply the above result to each one. We need to know that δ can be chosen uniformly given ε; we can do this using uniform continuity of the partial derivatives, at least if F and its inverse extend to a neighborhood of the closure of U (a hypothesis that soon gets removed in Addendum 5.6 on pages 255–256; NB we need this, too, for our motivating example ∫_R exp(−x²) = π^½, because the polar-coordinates map is not invertible at the origin). Then it’s just a few pages of epsilonics (253–255) to check that everything fits together as expected.

Monday, 28 April: Fubini, etc.: integration over R^{m+n} reduces to integration over R^m and over R^n

See Section 4, pages 235 to 240. Even after solving Exercise 3.3, Theorem 4.1 remains useful because 3.3 requires continuity, which is lost even when we want to integrate continuous functions over simple regions R other than boxes (because multiplying by the characteristic function φ_R yields a function that, though integrable, is generally not continuous on the boundary of R). I gave in class the example of the formula for the volume of a pyramid (which was mentioned after the second midterm in connection with the probability that the sum of three rounded numbers differs from the rounding of their sum). I also noted that in this case we have the following underhanded alternative for avoiding integration altogether (once we know how volume behaves under linear transformations, and know that the pyramid is “contented”): it is enough to show that the pyramid P defined by 0 < x < y < z < 1 has volume 1/6, because any pyramid with triangular base of area B and height h is the image of P under a linear transformation (followed by a translation) of determinant ±2Bh; and six congruent copies of P (corresponding to the 3! possible orders of x, y, z) can be combined to form the unit cube. The same trick yields the volume of a pyramid (a.k.a. “simplex”) in R^n for any dimension n, and explains the appearance of the factor 1/n! in the formula. We could then even use “Cavalieri’s principle” (Theorem 4.2 on page 240) to recover the formula for ∫ x^{n−1} in a way that replaces the usual Riemann-sum or antiderivative approach with n-dimensional geometry. See Edwards, pages 241–242 (and the Exercises for this section) for further examples of applications of the theorems of Fubini and Cavalieri.
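The symmetry behind the factor 1/n! can be seen by brute-force counting on a finite grid. This is a rough sketch with a made-up helper name: among grid points with distinct coordinates, exactly one of the n! orderings of the coordinates is increasing, mirroring the n! congruent simplices that fill the cube.

```python
from itertools import product
from math import factorial

def count_ordered(n, grid):
    """Among points of {0, ..., grid-1}^n with distinct coordinates, count
    those with increasing coordinates; return (increasing, distinct)."""
    distinct = increasing = 0
    for p in product(range(grid), repeat=n):
        if len(set(p)) == n:
            distinct += 1
            if all(p[i] < p[i + 1] for i in range(n - 1)):
                increasing += 1
    return increasing, distinct

inc, dist = count_ordered(3, 8)
print(inc, dist, factorial(3))
```

Points with a repeated coordinate lie on the hyperplanes that separate the simplices; in the continuous setting those hyperplanes are negligible, which is why they can be ignored.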

At the end of the proof of Theorem 4.1, the punchline |∫ f − ∫ F| < ε follows from the observation that ∫ f and ∫ F are known to be in the same interval (∫ h, ∫ k) = (∫ H, ∫ K) whose length is <ε. (In presenting this proof in class I may have switched the roles of k&K with h&H compared with Edwards; sorry if this caused an ε of confusion.)

Towards the end of Monday’s class I gave an overview of the little of Chapter V that we will be able to cover in our final meeting on Wednesday. Inevitably this is only a teaser, and even in Math 55, when we had already covered exterior algebra (the natural context for “differential forms”) in the linear-algebra semester, I could only barely give an honest treatment of Stokes’ theorem. For us, an outline of Green’s theorem (the very special case which connects line and surface integrals in R²) will have to suffice. You can learn the larger story in its proper context and depth in a class on differential geometry (usually Math 132).

Wednesday, 30 April: Line integrals and Green’s Theorem

With only 1¼ class meetings left for Chapter V [the ¼ was the end of Monday], we barely have the time to introduce line integrals (a.k.a. path integrals) and Green’s Theorem, which is the very special case of “Stokes’ Theorem” ∫_D ∂ω = ∫_{∂D} ω in which D is a plane region (and one well enough behaved that ∂D makes sense). [The scare quotes(?) are because the result is actually due to Lord Kelvin (Stokes had no more to do with it than Pell did with “Pell’s equation” x² − Dy² = 1; such misattributions happen surprisingly often…), plus the 1850 result is also a very special case, where D is a region in 3-space.] See Math 132 for the full story. Meanwhile here are some lecture notes to supplement the CAs’ “live lecture notes” for the day.

Already these very special cases are useful. The line-integral one (F(s) − F(r) = ∫_γ dF, where γ is a path from r to s) becomes the “work-energy theorem” in Newtonian mechanics. For Green’s theorem, Edwards gives several applications, to which we can add a proof of the Fundamental Theorem of Algebra. There are at least two ways to use Green’s Theorem to obtain this result. The first uses the following corollary of ∫_D ∂ω = ∫_{∂D} ω: if ∂ω vanishes throughout D then ∫_{∂D} ω = 0. If P were a nonconstant polynomial with complex coefficients and no zeros then we could take ω = dP / P and use a large disc for D to get a contradiction (though the fact that ∂(dP / P) = 0 is not immediately obvious and requires some verification, as does the behavior of ∫_{∂D} ω as the radius goes to infinity; complex analysis (as in Math 113) explains why this works). A second route is to use the fact that P has harmonic real and imaginary parts (see Problem 2 of the 7th problem set), and thus can be evaluated at a point z_0 by averaging over any circle centered at z_0 (see the final problem of Math 25b). From this one can deduce that the function |P(·)| has no local minimum, and we already know that this is what we need to finish the topological proof of the Fundamental Theorem of Algebra.
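The averaging property used in the second route is easy to test numerically. In this sketch the polynomial P, the center, and the radii are arbitrary illustrative choices, and circle_average is a hypothetical helper; for a polynomial, averaging over n equally spaced points of the circle already gives the exact value at the center (up to rounding) as soon as n exceeds the degree.

```python
import cmath

def P(z):
    """An illustrative polynomial; any complex polynomial would do."""
    return z**3 - 2 * z + (1 + 2j)

def circle_average(f, center, radius, n=64):
    """Average of f over n equally spaced points of the circle |z - center| = radius."""
    return sum(
        f(center + radius * cmath.exp(2j * cmath.pi * k / n)) for k in range(n)
    ) / n

z0 = 0.3 + 0.4j
print(P(z0), circle_average(P, z0, 1.5))
```

The exactness for polynomials comes from the fact that the powers of an n-th root of unity other than 1 sum to zero; for general harmonic functions the discrete average only approximates the integral over the circle.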