Notes for
Math 25b: Honors Linear Algebra and Real Analysis II
(Spring 20[13–]14)
If you find a mistake, omission, etc., please
let me know
by e-mail.
Classroom: Math 25b meets MWF 11–12 in
Harvard Hall, Room 201.
𝄞
This is one of the rare Harvard classrooms outside the music building
that has a piano.
Here’s what it was
used for at the beginning of each class.
Textbooks:
Simmons, G.F.: Introduction to Topology and Modern Analysis,
McGraw-Hill 1963
Edwards C.H.: Advanced Calculus of Several Variables,
Dover 1994 (Academic Press 1973).
We shall use Simmons for metric topology,
and Edwards for differential and integral calculus in
R and R^{n}.
For topology see also the notes I wrote for Math 55
(which go further than what we’ll cover in 25b):
1,
2,
3,
4,
5,
6.
Application:
Fundamental Theorem of Algebra,
as promised in Math 25a (corrected: towards the end,
|a_{0}|,
not a_{0})
Problem set 1:
Metric topology basics
Revised Feb.1, see Problem 10
(which now gives a different hint for Problem 3);
and Feb.2, to fix the title (oops…).
[and
here’s the PDF from the Harvard site
that includes the mapping from problems to CAs]
Problem set 2:
Continuity, sequences, and compactness
[and the version with the CA assignments]
Problem set 3:
Uniform continuity, compactness, and completeness
[and the version with the CA assignments]
Corrected Feb.15 (typo in Problem 4: last clause
is for nx ≤ −1,
not nx ≤ 1; in the following text,
specified the value of the pointwise limit also at
x = 0; also fixed a trivial
spelling error in Problem 6)
For Problems 4 and 5:
here’s
a PostScript plot
showing f_{n}
for n = 1, 2, 3, 4, 5, 10
in black, red, orange, green, blue, and purple
respectively
Problem set 4:
More on completeness and compactness, and on polynomials of
a complex variable; start differential calculus
[and the version with the CA assignments]
Problem 6 (1.1 in Edwards p.61) postponed
till next week because we didn’t cover local max/min until Friday.
Problem set 5: Differentiation cont’d
[and the version with the CA assignments]
Corrected March 2 to fix a typo in Problem 2:
g(x), not G(x);
and March 6 to fix a mistake in Problem 6:
f returns maps from
R^{n}
to R^{p},
and g returns maps from
R^{m}
to R^{n},
not the other way around!
Problem 10 postponed
till next week because we didn’t cover the multivariable Chain Rule
until Friday.
Problem set 6: Differentiation cont’d
[and the version with the CA assignments]
Corrected March 11: part d of Problem 2 (= Edwards 2.5)
is wrong! The function of x,y given by
y^{2} for x = 0 and
x^{3} sin(1/x)
+ y^{2}
otherwise does have a partial derivative
with respect to x that is continuous at the origin.
The formula for x ≠ 0 should be
x^{2} sin(1/x)
+ y^{2}
(square, not cube).
Problem set 7:
The Laplacian; single-variable Taylor series
[and the version with the CA assignments]
Problem 1 typos corrected March 26:
stray “i)” removed, necessary + sign (before
“(C^{2}+D^{2})”) inserted
[Re Problem 6: Edwards writes (x−1)^{4},
not (1−x)^{4}. The two are equal.
Likewise for (x−1)^{n}
and (1−x)^{n}
when n is even; if n is odd they’re off
by a factor of −1, but I trust that if you can find the
maxima and minima of a function f
then you can do it for −f as well.]
Problem set 8:
Taylor series in one and more variables; multivariate critical points
[and the version with the CA assignments]
Typo in Problem 6 = Edwards 7.7:
the last term of the Taylor polynomial should be not
(−1)^{n}x^{4n}/(2n)!
but
(−1)^{n+1}x^{4n−2}/(2n−1)!.
(Edwards [and I] must have been thinking of
cos(x^{2}),
though then the beginning of the series is wrong.)
Problem set 9: Inverse and implicit functions
[and the version with the CA assignments]
Problem set 9¾, due Wednesday April 16 at 5PM:
Edwards, Exercises 1.4 through 1.9 (inclusive) on pages 213–214.
For 1.4, show that more generally if f
and g are continuous real-valued functions on some
metric space then so are
max( f, g)
and min( f, g),
by writing the max and min as
[( f +g) ±
| f −g| ] / 2.
[here’s the version with the
CA assignments]
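A quick sanity check of the max/min identity (a Python sketch of my own, not part of the assignment): the pointwise identities below hold for all real s, t, and continuity of max(f, g) and min(f, g) then follows from continuity of sums, differences, and absolute values.

```python
# Verify max(s,t) = ((s+t) + |s-t|)/2 and min(s,t) = ((s+t) - |s-t|)/2
# on a small grid of real values.
def max_via_abs(s, t):
    return ((s + t) + abs(s - t)) / 2

def min_via_abs(s, t):
    return ((s + t) - abs(s - t)) / 2

for s in [-2.0, -0.5, 0.0, 1.25, 3.0]:
    for t in [-1.0, 0.0, 0.75, 2.5]:
        assert max_via_abs(s, t) == max(s, t)
        assert min_via_abs(s, t) == min(s, t)
```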
Problem set 10½: Integrable functions
(and the version with CA assignments)
Final problem set (#12) (with CA assignments),
due Monday 5/5 at 5PM
(As noted in class,
Exercise 5.3 [= Problem 5] contains 5.1 as a special case.)
Monday, 27 January: Introduction to Math 25b
You might wonder why it’s taking us so long to get to the
calculus part of Math 25: the first semester
has come and gone and we have yet to even define the derivative
of a function of one real variable. The reason is that a
coherent treatment of multivariate calculus requires the
context of linear algebra and metric topology.
Linear algebra was of course the topic of Math 25a;
topology will occupy us for the first few weeks of 25b.
While one can study calculus in one variable without explicit
discussion of vector spaces etc., linear structures
can already be recognized there, and become ubiquitous in
calculus of several variables. Some key examples:
- Differentiation and integration are linear.
If f and g have derivatives
f ' and g',
then any linear combination af +bg
(for real numbers a and b) is differentiable,
with derivative af ' + bg'.
Likewise for the integral of a linear combination of integrable functions.
[Indeed linearity is so ubiquitous that students are sometimes tempted to
incorrectly generalize it to nonlinear functions,
resulting in “Freshman’s dream” errors such as
(x+y)^{n} =
x^{n} + y^{n} and
cos(x+y) =
cos x + cos y.]
- The derivative at a point of a function from
R^{m}
to R^{n}
is a linear transformation from
R^{m}
to R^{n}.
Recall that a function f of one variable has derivative
f ' (x_{0})
at x_{0} iff
f (x) is approximately
f (x_{0}) +
f ' (x_{0})
(x−x_{0})
for x near x_{0}
(in a sense we shall make precise before long).
We shall use the same formula for a function from
R^{m}
to R^{n},
with f ' (x_{0})
interpreted as a linear transformation. That is,
f is differentiable at x_{0}
iff
f (x) −
f (x_{0})
is approximated by a linear function of
x−x_{0}
for x near x_{0}.
In short, a differentiable function is locally constant + linear.
- The dy/dx factor in the change-of-variable formula
for integrals over one variable generalizes to a determinant
for multivariate integrals. In univariate calculus,
if y is a differentiable function of x
over some interval, then the integral of any
g(y) dy
can be written as the integral of
g(y(x))
|y'(x)|
dx.
If y is a differentiable function from
R^{n}
to R^{n}
(NB same dimension, i.e. m=n in the previous item)
then y'(x) is a square matrix,
and we shall obtain the same kind of formula but with
y'(x) generalized to the
determinant of that matrix — this is related to the interpretation
of the determinant of a linear transformation as a volume ratio.
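Here is a numeric sketch of the univariate case (my own illustrative choices y(x) = x² on [1, 2] and g(y) = 1/y, not from the text): both sides of the substitution formula should equal log 4, and midpoint Riemann sums confirm this.

```python
# Check  ∫_1^4 g(y) dy  =  ∫_1^2 g(y(x)) |y'(x)| dx  with y(x) = x²,
# g(y) = 1/y; both integrals equal log 4.  Midpoint Riemann sums.
import math

def midpoint_integral(h, a, b, n=100000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

g = lambda y: 1.0 / y
lhs = midpoint_integral(g, 1.0, 4.0)                           # ∫_1^4 dy/y
rhs = midpoint_integral(lambda x: g(x * x) * 2 * x, 1.0, 2.0)  # ∫_1^2 (1/x²)·2x dx
assert abs(lhs - math.log(4)) < 1e-6
assert abs(rhs - math.log(4)) < 1e-6
```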
What, then, of topology? Topology is our language for saying precisely
what we mean by “approximately” and “near”
in the above informal description of the derivative. Admittedly it is not
strictly necessary to develop either linear algebra or topology to give
a rigorous treatment of calculus, because such a treatment was known first
(though it took a long time, see historical note below), and both topology
and linear algebra arose around 1900 as a generalization of results that
had already been obtained in the 19th century and before. Still, it is
now recognized that — for both linear algebra and metric topology
— the additional investment in developing a general theory
pays for itself many times over when we reuse the same
concepts and results in different settings (already in Math 25b,
and all the more so in further study of mathematics) instead of
repeating the same argument each time we need an instance of the
rank-nullity theorem or the convergence of a Cauchy sequence
in a complete metric space. For metric topology, we shall begin by
axiomatizing the notion of a distance (a.k.a. metric)
implicit in “approximately” and “near”, and using
the distance to define and study limits, continuous functions, and other
building blocks that we’ll use in constructing most of our
calculus proofs.
Historical note(*):
The discovery/invention of integral and differential calculus
is generally attributed to
Newton (1642–1727)
and
Leibniz (1646–1716),
who engaged in a notorious priority battle over it.
Nontrivial results predate them; e.g.
Fermat (1604±3–1665)
integrated x^{r} dx
for any rational r,
Bhaskara II (1114–1185)
is “credited with knowledge of Rolle’s theorem”,
and the
“method of exhaustion” of classical
Greek geometry prefigures Riemann integration. But the
fundamental theorem of calculus,
linking differentiation and integration, was not known before
the second half of the 17th century. Still the definition of
the derivative, and indeed the notion of a function,
was only made precise in the late 19th century, culminating in the work of
Weierstrass (1815–1897).
The difficulty was already recognized by
George Berkeley (1685–1753),
who famously lampooned the tiny-but-not-quite-zero
dx’s and dy’s as
“ghosts of departed quantities”; but
Euler (1707–1783)
(in his
Introductio in Analysin Infinitorum) and
even Gauss (1777–1855)
relied on those evanescent “ghosts”,
and most of their analytical work survived the transition to
ε-δ calculus.
The impetus for the rigorization of real analysis came
not from outside challenges such as Berkeley’s,
but from within mathematics, notably the discovery of certain
Fourier series, such as the piecewise linear
but discontinuous
“sawtooth wave”
σ(x) =
−∑_{k≥1} sin(kx)/k,
that challenged the notion of a mathematical
“function”, let alone the derivative of
such a function (which one is tempted to construct by termwise
differentiation to an even less well-behaved series
“σ'(x)” =
−∑_{k≥1} cos(kx)).
(*) NB I am not a historian of mathematics; this sequence of events
is told in many secondary sources, but I do not claim to have read
most of the primary sources (many of them in Latin) myself.
Wednesday, 29 January: Metric topology I:
basic definitions and examples
For the first few weeks there will not be much commentary here
because it would duplicate what’s in the TeXed lecture notes
on metric topology. These notes are being edited so that the
references and some notations match what’s in our textbook
(Simmons, mostly Chapter 2) rather than the Rudin textbook that
accompanied Math 55 when the notes were first written.
We shall have very little to say about general topological spaces
(see Chapter 3 in Simmons) beyond the definition, but will note when
some concept or argument is purely topological (i.e. can be stated
purely in terms of open sets, as with the fundamental concept of a
continuous function between metric spaces), because sometimes
we’ll have several different choices for the metric that yield
the same topology (e.g. the sup metric and Euclidean metric
on R^{n}),
and thus the same result for any topological notion
(e.g. a continuous function on the sup-metric
R^{n}
remains continuous with respect to the Euclidean metric).
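A numeric illustration (not from Simmons) of why these two metrics give the same topology: each norm bounds the other, ||x||_sup ≤ ||x||_2 ≤ √n · ||x||_sup, so every open ball in one metric contains a concentric open ball in the other.

```python
# Check the two-sided norm comparison on random vectors in R^5.
import math, random

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    sup_norm = max(abs(c) for c in x)
    euc_norm = math.sqrt(sum(c * c for c in x))
    assert sup_norm <= euc_norm + 1e-12             # ||x||_sup ≤ ||x||_2
    assert euc_norm <= math.sqrt(n) * sup_norm + 1e-12  # ||x||_2 ≤ √n ||x||_sup
```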
By the way, we won’t officially cover Chapter 1,
but you may want to review it for fundamentals of Boolean algebra,
cardinality, etc.
Friday, 31 January: Metric topology II:
open and closed sets
To the notes about terminology (ball, sphere, neighborhood) at the start of the
second lecture notes,
I add that “neighborhood of p” is used nowadays to mean
“open set containing p” (equivalently: containing
some open ball centered at p), and that a set that is
simultaneously closed and open is sometimes known as a
“clopen set”
(though some deprecate this
portmanteau as an ugly word).
Monday, 3 February: Metric topology III:
continuous maps between metric spaces
(In class we covered almost all the key points here except that we
didn’t actually prove the topological characterization of
continuity! So this will have to be next time, together with
sequences which provide yet another equivalent condition.)
Wednesday, 5 February: Metric topology IV:
Sequences, convergence, and uniform convergence
(This topic will occupy us also for at least part of Friday’s lecture.)
It can sometimes be helpful to know that an equivalent condition for
convergence is that for all ε>0 there exists N such that
d(p, p_{n}) ≤ ε
once n ≥ N
(note the “ ≥ ”
rather than “ > ” signs;
it also works with only one sign changed). The reason is basically that
the closed ε-ball contains the open one
which in turn contains the closed
ε/2-ball, and we can always
change N to N+1 to go between
“ >N ”
and “ ≥ N+1 ”.
Friday, 7 February: Function spaces and uniform convergence;
Preview of Metric topology V: Compactness
As an example of the note on
“ ≥ ”
vs. “ > ”:
Logically, the “unwound” definition displayed
near the top of page 3 of the “Metric Topology IV” notes
isn’t quite right, because we could have
d( f_{n}, f )
= ε
even though
d( f_{n}(x), f (x))
< ε
for all x. But it still works because
we require it for every positive ε.
Note that Simmons postpones discussion of compactness until
Chapter 4, after introducing (general, i.e. not necessarily metric)
topological spaces in Chapter 3. Most of the results we’ll cover
are in the first few pages of Chapter 4 and in “24. Compactness
for metric spaces” starting on page 120.
As long as we work only in subsets
of R^{n},
the compact subsets will be precisely those that are closed and bounded
(the Heine-Borel theorem; as Simmons suggests on page 114,
the proof given there is more complicated than we’ll need,
and we’ll give one of the usual proofs via ε-nets).
In general, though, the “closed and bounded” condition
is necessary but not sufficient, even in a complete metric space
(more on completeness next week): a simple counterexample is
an infinite metric space with the discrete metric, which is
complete, closed, and bounded, but not compact (why?).
Monday, 10 February: Metric topology V: Compactness
You may have noticed that I’m soft-pedalling
the notion of “limit point”. Limit points
essentially duplicate limits of (sub)sequences, and I don’t think
we’ll ever need both notions, so for example I won’t
say much if anything about the “Bolzano-Weierstrass property”
because sequential compactness accomplishes much the same thing.
(It feels like this is the usual expository practice for this material
nowadays.)
Wednesday, 12 February:
Metric topology V cont’d — Sequential compactness
For the Proposition at the bottom of page 3: Recall that a subset
S of a metric space X is said to be dense
when it intersects every open ball; equivalently (and showing that this
is a topological notion), when every nonempty open subset of X
has nonempty intersection with S.
Friday, 14 February:
Metric topology VI: Completeness; compact = complete & totally bounded,
and the Heine-Borel theorem
We have several times used (at least implicitly)
the “Archimedean axiom” (a.k.a. the
“Eudoxus axiom”
— we shall meet Eudoxus’ name again when we discuss integration):
for every real number x there exists an integer
N > x; equivalently (by considering 1/ε),
for every real ε > 0
there exists an integer N such that
1/N < ε.
In classical Greek geometry one might say that any two lengths
l, L are comparable,
in the sense that one can divide L into finitely many intervals
none of which exceeds the length of l
(to see that this is the same,
let x = L / l,
or ε
= (L / l)^{−1}
= l / L).
This is in effect what we do when we construct an
ε-net in [0,1],
or more generally in any bounded real interval.
Another example: once we have shown that a sequentially compact
space X (or even a space X where
every sequence has a Cauchy subsequence)
is totally bounded, we can quickly deduce that
X is separable. Just take the union over integers n
of (1/n)-nets to get a countable set
(countable union of finite sets) that is dense
(any positive ε is less than some 1/n).
Monday, 17 February:
NO CLASS — University holiday
(Presidents’ [Presidents?
President’s?!] Day)
Wednesday, 19 February:
Metric topology VI cont’d: Lebesgue numbers;
continuous functions on compact spaces are uniformly continuous
Simmons gives as exercises the results that the continuous image of
a compact set is bounded (page 115, Exercise 7)
and that a continuous real-valued function on a compact set
attains its supremum and infimum (page 115, Exercise 8).
Here’s the alternative approach (via sequential compactness
rather than Lebesgue numbers) used in class to prove that
every continuous function f from
a compact space X to some metric space Y
is uniformly continuous. Suppose not. Then there is some
ε > 0 such that no δ works.
So (again exploiting Archimedes), for each n there exist
x_{n}, x'_{n}
such that
d(x_{n}, x'_{n})
< 1/n
but
d( f (x_{n}),
f (x'_{n})) ≥ ε.
Use sequential compactness to find a subsequence
{x_{ni}}
of {x_{n}}
that converges to some x. Then the
x'_{ni}
converges to the same x (why?).
But then f (x)
is the limit of both
f (x_{ni})
and
f (x'_{ni}),
which is impossible because the distance between
f (x_{ni})
and
f (x'_{ni})
is always at least ε. This contradiction proves that
f is uniformly continuous.
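The same two-sequence pattern, run in reverse on a non-compact domain where uniform continuity fails (an illustrative example of my own, not from Simmons): for f(x) = 1/x on (0, 1), the points x_n = 1/n and x'_n = 1/(2n) get arbitrarily close, but their images stay n apart, so no single δ works for ε = 1.

```python
# f(x) = 1/x is continuous but not uniformly continuous on (0, 1):
# |x_n - x'_n| → 0 while |f(x_n) - f(x'_n)| = n → ∞.
f = lambda x: 1.0 / x
for n in [10, 100, 1000]:
    xn, xpn = 1.0 / n, 1.0 / (2 * n)
    assert xn - xpn <= 1.0 / n                  # the points get close
    assert abs(f(xn) - f(xpn)) >= n - 1e-6      # the images stay far apart
```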
Friday, 21 February:
Metric topology: conclusion;
application to the
Fundamental Theorem of Algebra
Besides the fundamental [sic] importance of the result,
this proof of the Fundamental Theorem of Algebra
illustrates several techniques and ideas that we shall use repeatedly
in the development of multivariate calculus. (See the
Wikipedia article on this theorem for
an idea of other approaches that have been used to prove it.)
The fact that the absolute value of a polynomial f
never has a local minimum except at a zero (a.k.a. root)
of f also generalizes
to a fundamental property of differentiable complex-valued functions
of a complex variable, as you’ll see when you take
Math 113 or study complex analysis in some other setting.
Monday, 24 February: Start multivariate differential calculus
We start with Chapter II of Edwards.
Single-variable calculus studies functions from (nice subsets of)
R to R;
this is the special case m = n = 1 of
functions from (nice subsets of)
R^{m} to
R^{n},
which are the topic of multivariate calculus.
As long as we fix m at 1,
letting n be arbitrary doesn’t change much at first,
though beware that Rolle’s theorem already fails for n=2
(see the last problem on the
fourth problem set).
An equivalent definition of differentiability that does not explicitly
single out h = 0:
the function f
has derivative f '(a)
at a if for all ε > 0
there exists δ > 0 such that
|| f (a+h) −
f (a) −
h f '(a)||
≤ ε |h|
for all h such that |h| < δ.
(Note the use of “≤” rather than “<”
which makes the inequality hold also for h = 0;
it would also be OK to require the inequality whenever
|h| ≤ δ,
via the usual trick of halving δ.)
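A numeric reading of this definition, with the illustrative choices f(x) = x² and a = 1 (not from Edwards): here the error term f(a+h) − f(a) − h f'(a) is exactly h², which is ≤ ε|h| precisely when |h| ≤ ε, so δ = ε works.

```python
# For f(x) = x², a = 1, f'(a) = 2:  |f(a+h) - f(a) - 2h| = h² ≤ ε|h|
# whenever |h| ≤ ε, including h = 0 (thanks to the "≤").
f = lambda x: x * x
a, fprime = 1.0, 2.0
for eps in [0.1, 0.01, 0.001]:
    delta = eps
    for k in range(1, 100):
        h = delta * k / 100.0          # sample 0 < h < delta
        for s in (h, -h, 0.0):
            assert abs(f(a + s) - f(a) - fprime * s) <= eps * abs(s) + 1e-15
```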
Whether we use this definition or the usual one given by Edwards,
we must make sure that both a and a+h
are in the domain of f, because we want to be able to
differentiate functions such as 1/x or
x^{½} that are not defined
on all of R. Note that if the domain is
open then as long as a is in the domain
we know that once δ is small enough all choices of h
with |h| < δ yield
a+h in the domain too.
A function f from
(a subset of) R to
R^{n}
is just an n-tuple of functions
f_{i} from
(that subset of) R to R,
and then f is differentiable iff
each of the coordinate functions f_{i}
is differentiable, in which case the derivative
of f is just the
n-tuple of derivatives of
the f_{i}.
Once we’ve proven this we soon recover the formulas for
differentiating products and compositions of functions
R → R^{n},
which Edwards gives in Theorem 1.1 (pages 59–60),
from the corresponding formulas for n = 1.
But we haven’t yet given ε-δ proofs
of these single-variable formulas! The formula for the derivative
of a product of differentiable functions is a bit tricky to prove;
since we’ll soon prove the multivariate chain rule,
we’ll be able to recover the formula for the product of
two differentiable real-valued functions
f, g
from the chain rule together with the derivative of the single function
R^{2} → R
taking (x, y) to xy
— which is the same kind of trick we used to show that the product of
two continuous real-valued functions is continuous.
In any case we’ll need the notion of
a differentiable function of two real variables to properly do
“implicit differentiation” even though this topic
is often presented already in texts on single-variable calculus.
(It’s not hard to use single-variable techniques to differentiate
a function y(x) defined implicitly by a relation like
sin(x^{3}+y^{3}) = 3xy
assuming that the derivative exists, but not so easy if we
don’t know this in advance!)
[NB there are two forms of the lower-case Greek letter φ,
and (at least in the edition I’m looking at now)
both appear in the statement of Theorem 1,
one in the introductory text and the other in formulas (3) and (5)
and in the proof on page 60; there’s no distinction between
them, and Edwards should have used the same φ throughout.
(In TeX the two forms are obtained in math mode using the commands
\phi and \varphi .) While I’m at it,
I see a typo in the second displayed block on page 60:
f ('g(t))
should be f '(g(t)).]
For any subset G of R,
Theorem 1.1 (generalized routinely to functions on G)
includes (how?) the result that the differentiable functions from
G to R^{n}
form a vector space, and differentiation is a linear transformation
from that space to the space of functions from
G to R^{n}.
The kernel contains all constant functions (and even all locally
constant functions, i.e. functions such that each x
in G has a neighborhood on which the function is constant);
when G is an interval, we shall soon show using Rolle’s
theorem that this is the entire kernel. Describing the image
of this linear transformation is a notoriously hard problem, even when
G is an interval. (Note that these vector spaces are
“very” infinite-dimensional, so we can’t use
the rank-nullity theorem…) When we prove the Fundamental
Theorem of Calculus we’ll see that the image includes all
continuous functions on G; but there are also differentiable
functions whose derivative isn’t everywhere continuous.
Wednesday, 26 February: Differentiable vector-valued functions
of one variable, cont’d
Yet another equivalent definition of the derivative of a function
f from
G to R^{n}:
the derivative at a exists, and equals
f '(a),
iff there exists a function s
(“s” as in “slope”) on
G − a =
{h : a+h ∈ G}
=
{x−a : x ∈ G}
such that
f (a+h)
= f (a) + h s(h)
for all h in G − a
and s is continuous at h=0, in which case
f '(a) = s(0).
This makes it easy to prove that the product of functions differentiable
at a is itself differentiable at a
and to give the formula for its derivative there.
By induction we obtain the formula for the derivative of the product
of n differentiable functions for each integer n.
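The bookkeeping behind the product-rule argument can be sketched numerically (illustrative functions of my own choosing): multiplying f(a+h) = f(a) + h s_f(h) by g(a+h) = g(a) + h s_g(h) gives (fg)(a+h) = (fg)(a) + h [s_f(h) g(a) + f(a) s_g(h) + h s_f(h) s_g(h)], and the bracket is continuous at h = 0 with value f'(a)g(a) + f(a)g'(a).

```python
# Check that the bracket from the slope-function computation equals the
# difference quotient of fg exactly, and tends to f'(a)g(a) + f(a)g'(a).
import math

a = 0.7
f, fp = math.sin, math.cos                 # f' = cos
g, gp = lambda x: x ** 3, lambda x: 3 * x ** 2

def slope(func, h):                        # s(h) = (func(a+h) - func(a))/h
    return (func(a + h) - func(a)) / h

for h in [1e-3, 1e-4, 1e-5]:
    bracket = slope(f, h) * g(a) + f(a) * slope(g, h) + h * slope(f, h) * slope(g, h)
    product_slope = (f(a + h) * g(a + h) - f(a) * g(a)) / h
    assert abs(bracket - product_slope) < 1e-9          # exact identity
    assert abs(bracket - (fp(a) * g(a) + f(a) * gp(a))) < 1e-2  # → product rule
```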
Friday, 28 February: More about single-variable differential calculus
An essential application of the derivative is to local extrema
(minima and maxima):
if a function from
G ⊆ R
to R has a local maximum or minimum at
an interior point a of G,
and the function is differentiable at a,
then its derivative at a equals zero.
Combining this with Heine-Borel gives a rigorous foundation of the
familiar method for finding the maximum and minimum of a differentiable
function on an interval. It also yields the proof of
Rolle’s theorem,
and thence of the
Mean value theorem,
which as promised
shows that a function on an interval
whose derivative is everywhere zero is the constant function.
We first prove this result (kernel of the derivative map
equals constant functions) for functions taking values in R,
and then deduce it for functions from an interval
to R^{n}
by considering each coordinate separately.
Monday, 3 March: Starting multivariable differential calculus:
partial derivatives, directional derivatives, and the derivative and
differential at a (usually interior) point
See Edwards, section 2 of Chapter 2. We introduced this by solving
the two-variable optimization of finding the largest value of
xyz for nonnegative reals
x, y, z such that
x + y + z = 1.
(Two variables, not three, because we can solve for z.)
For this purpose partial derivatives suffice, but for other applications
we do need the stronger condition of differentiability.
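A brute-force numeric check of that optimization (a sketch, not how we solved it in class): substituting z = 1 − x − y gives f(x, y) = xy(1 − x − y), whose partials f_x = y(1 − 2x − y) and f_y = x(1 − x − 2y) vanish at the interior critical point x = y = 1/3, so z = 1/3 and the maximum is 1/27.

```python
# Grid search over the triangle x, y ≥ 0, x + y ≤ 1 confirms the maximum
# of f(x, y) = x·y·(1 - x - y) is 1/27, attained near x = y = 1/3.
def f(x, y):
    return x * y * (1 - x - y)

best = max(
    (f(i / 200.0, j / 200.0), i / 200.0, j / 200.0)
    for i in range(201) for j in range(201)
    if i / 200.0 + j / 200.0 <= 1.0
)
val, x, y = best
assert abs(val - 1.0 / 27.0) < 1e-3
assert abs(x - 1.0 / 3.0) < 1e-2 and abs(y - 1.0 / 3.0) < 1e-2
```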
Wednesday, 5 March: multivariable differential calculus, cont’d:
the gradient; continuous differentiability
In the special case m=1, i.e. a function f
from a neighborhood of
a ∈ R^{n}
to R, if f is differentiable
at a
then its derivative is a row vector of length n, called
the gradient of f at a,
written ∇f (a).
[Edwards, pages 70–71.
“∇f ” may be pronounced
“del f ”.
The upside-down-capital-Delta symbol ∇ itself is called a
nabla,
produced in TeX by writing \nabla in math mode;
it was named for the Aramaic word for “harp”
(cognate with modern Hebrew NEVEL),
but it seems that my recollection that this name was introduced by
Tullio Levi-Civita (1873–1941)
was mistaken.]
If ∇f (a) exists
then it points in the direction where f
increases fastest, because the directional derivative of
f in the direction v is
the scalar product
∇f (a) · v.
If f has a local extremum at a
then ∇f (a) = 0;
in general a point a where
∇f (a) = 0
is called a “critical point” of f.
(NB Already for n=1 we know that there can be critical points that
are not local extrema; in dimension 2 and higher we shall see that there are
even more possibilities for the local behavior
of f at a critical point.)
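A numeric sketch of the steepest-ascent property (with an illustrative f of my own, not from Edwards): among all unit directions v, the directional derivative ∇f(a) · v is largest when v points along ∇f(a), and never exceeds ||∇f(a)||.

```python
# f(x, y) = x² + 3y², a = (1, 1), ∇f(a) = (2, 6): scan unit directions
# v = (cos θ, sin θ) and compare numerical directional derivatives.
import math

f = lambda x, y: x * x + 3 * y * y
a = (1.0, 1.0)
grad = (2 * a[0], 6 * a[1])

def directional(theta, h=1e-6):
    v = (math.cos(theta), math.sin(theta))
    return (f(a[0] + h * v[0], a[1] + h * v[1]) - f(*a)) / h

best_theta = max((directional(t / 100.0), t / 100.0) for t in range(629))[1]
grad_theta = math.atan2(grad[1], grad[0])
assert abs(best_theta - grad_theta) < 0.02       # max in the gradient direction
gnorm = math.hypot(*grad)
assert all(directional(t / 100.0) <= gnorm + 1e-3 for t in range(629))
```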
For general m,
we can write F as an ordered m-tuple
(F^{(1)},
F^{(2)},
…,
F^{(m)})
of real-valued functions. Then we easily show
that F is differentiable iff each of its coordinates
F^{(j)}
(j = 1, 2, …, m) is differentiable.
[Edwards, Lemma 2.3 on page 71.]
In this case, the j-th row of the derivative
F '(a) is the gradient
of F^{(j)}.
This gives us an interpretation of each of the rows of the
derivative matrix, as we earlier interpreted each of the columns
as a partial derivative. The individual entries
of F '(a)
are then the partial derivatives
D_{i}F^{(j)}(a)
of the components of F [Theorem 2.4 on page 72].
There are plenty of (counter)examples of functions on open sets in
R^{n} that have partial derivatives
or even directional derivatives but are not differentiable at some point.
[Edwards notes the cases of
2x^{2}y /
(x^{4} + y^{2})
and
2xy^{2} /
(x^{2} + y^{2})
as Example 4 on page 69 and Exercise 3 on page 75
respectively.]
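The first of these can be checked numerically (a sketch of Edwards’ Example 4): the function vanishes on both axes, so both partials at the origin are 0, yet it is identically 1 on the parabola y = x² for x ≠ 0, so it is not even continuous at the origin, hence not differentiable there.

```python
# g(x, y) = 2x²y / (x⁴ + y²), with g(0, 0) = 0: zero on the axes,
# constant 1 along y = x².
def g(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return 2 * x * x * y / (x ** 4 + y * y)

for x in [0.1, 0.01, 0.001]:
    assert g(x, 0.0) == 0.0 and g(0.0, x) == 0.0   # partials at 0 are 0
    assert abs(g(x, x * x) - 1.0) < 1e-12          # but g ≡ 1 on y = x²
```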
However, if all the partial derivatives
D_{i} f (b)
exist for all b in some neighborhood of a,
and are all continuous at a,
then f is differentiable at a
(and the partial derivatives are the columns of
f '(a) as usual).
This is Theorem 5 on page 72,
proved (on pages 72–73) by moving from
a to a+h by changing one coordinate at a time.
Such f is then said to be
continuously differentiable at a.
(Thus we naturally say that f is
“continuously differentiable on G”
for some open set G in R^{n} if
f is continuously differentiable at a
for every a in G.)
Friday, 7 March: multivariable differential calculus, cont’d:
the multivariate Chain Rule
See Edwards Theorem 3.1 (pages 76–77). This is much like the
proof for the single-variable case, except that we need the fact that
if T is a linear transformation
from R^{n}
to R^{m}
then there exists a real number M such that
||Tv|| ≤ M ||v|| for all
v in R^{n}.
(We needed this in the single-variable case too, but there it was
immediate because T was a 1×1 matrix, i.e. a scalar, and
we could simply use M = |T|.)
Recall that if v has coordinates
a_{1},
a_{2}, …, a_{n}
then
Tv = a_{1}Te_{1}
+ a_{2}Te_{2}
+ …
+ a_{n}Te_{n},
where
e_{1},
e_{2}, …, e_{n}
are the unit vectors
in R^{n}.
By the triangle inequality it follows that
||Tv|| is at most
||a_{1}Te_{1}||
+ ||a_{2}Te_{2}||
+ …
+ ||a_{n}Te_{n}||.
Each term ||a_{i}Te_{i}||
equals |a_{i}| ||Te_{i}||.
Since each |a_{i}| is no larger than ||v||,
we deduce that
||Tv|| is at most
||v|| ||Te_{1}||
+ ||v|| ||Te_{2}||
+ …
+ ||v|| ||Te_{n}||,
so we may take
M = ||Te_{1}||
+ ||Te_{2}||
+ …
+ ||Te_{n}||
and we’re done.
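The bound from that argument, checked on a random matrix (a sketch, with arbitrary choices of dimensions and entries): with M = Σ_j ||T e_j|| (the sum of the column norms of T), every vector v satisfies ||Tv|| ≤ M ||v||.

```python
# Verify ||Tv|| ≤ (Σ_j ||T e_j||) · ||v|| for a random 3×4 matrix T.
import math, random

random.seed(1)
n, m = 4, 3
T = [[random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]

def apply(T, v):
    return [sum(T[i][j] * v[j] for j in range(n)) for i in range(m)]

def norm(w):
    return math.sqrt(sum(c * c for c in w))

M = sum(norm([T[i][j] for i in range(m)]) for j in range(n))  # Σ ||T e_j||

for _ in range(1000):
    v = [random.uniform(-5, 5) for _ in range(n)]
    assert norm(apply(T, v)) <= M * norm(v) + 1e-9
```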
The existence of M can also be obtained by continuity of T
(which we already know from an early problem set): choose any ε,
say ε=1, and find δ such that
||v|| < δ implies
||Tv|| < ε;
by continuity it follows that
||v|| ≤ δ implies
||Tv|| ≤ ε;
and then linearity of T,
together with the identity
||ax|| = |a| ||x||,
gives us
||Tv|| ≤ (ε/δ) ||v||
for all v, so we succeed with
M = ε/δ.
But then if you unwind the proof of continuity of T
you’ll see that this proof is actually not all that different from
the argument in the previous paragraph.
Monday, 10 March: First midterm examination
Wednesday, 12 March: Midterm post-mortem; continuous mixed partials
commute
Friday, 14 March: No class
Monday–Friday, 17–21 March:
Spring Break
Monday, 24 March: Taylor’s formula for functions of one variable
Sections 4 and 5 of Chapter II are oddly placed: they develop
special cases of material that is covered many pages later,
and they even refer to results from later sections in the book.
It seems that this is done in part to motivate the more general
treatment of constrained max/min problems later in the book,
so you might want to read through those pages for this reason;
but for now we proceed to section 6.
Meanwhile Theorem 4.3 (page 95), describing positive-definite
quadratic forms in two variables, should be familiar from Math 25a.
Some context: if f is a real-valued function
of one variable that is infinitely differentiable in a neighborhood
of a then it has a formal Taylor series
Σ_{n≥0}
c_{n} (x−a)^{n},
where each coefficient c_{n} is 1/n!
times the n-th derivative
of f
at x=a.
But these coefficients might grow too fast for the series to converge
at any x≠a;
e.g. one can concoct an infinitely differentiable function with
Taylor series
Σ_{n≥0}
n! (x−a)^{n}.
Indeed it is known that every sequence
(c_{0},
c_{1}, c_{2}, … )
arises for some f.
Moreover, even if the series does converge it need not converge
to f. A famous counterexample
is the function sketched in Edwards page 124,
which equals exp(−1/x^{2})
for nonzero x and vanishes at x=0:
the Taylor series has c_{n} = 0
for each n, and thus does not converge
to f
except at x=0.
[In linear-algebra terms, the infinitely differentiable functions
form a vector space, as do the sequences
(c_{0},
c_{1}, c_{2}, … );
the map taking any function to its Taylor series about
x=a is a linear transformation,
and this map is surjective (in particular the Taylor series of an
infinitely differentiable function need not converge) but not injective (indeed the kernel is
infinite dimensional).] So it is a nontrivial question how well
a function is approximated by finite sums of its Taylor series.
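A numeric look at that flat function (a sketch; the sample points are my own choices): exp(−1/x²) goes to 0 faster than any power of x as x → 0, which is why every Taylor coefficient at 0 vanishes even though the function is positive for x ≠ 0.

```python
# f(x) = exp(-1/x²) for x ≠ 0, f(0) = 0: difference quotients at 0 are
# essentially zero, and f is smaller than x^20 near 0 yet still positive.
import math

def f(x):
    return 0.0 if x == 0.0 else math.exp(-1.0 / (x * x))

for h in [0.1, 0.05]:
    assert f(h) / h < 1e-40          # f'(0) = lim f(h)/h = 0

for x in [0.15, 0.1, 0.05]:
    assert 0.0 < f(x) < x ** 20      # vanishes to infinite order at 0
```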
Typo in the text: in formula (5) on page 118
(statement of Theorem 6.1,
Lagrange form of the remainder in the
Taylor expansion)
the denominator should be (k+1)!,
not k+1 as written.
(The factorial does appear correctly in the two displayed equations
immediately following.)
At the bottom of page 121 Edwards gives a proof of e<4
using integration and the logarithm function, neither of which we have
discussed yet. (To be sure, we haven’t discussed the exponential
function either, nor developed enough of a theory of power series
to justify the termwise differentiation needed to prove that the
Taylor series for e^{x} is its own derivative…)
I’m not sure why Edwards takes that detour,
because he’s just proved the formula
e^{x} =
Σ_{n≥0} x^{n}/n!,
and taking x=1 easily yields the better bound e≤3:
the first three terms are 1+1+½ = 2½,
and each succeeding term 1/n! is less than
1/2^{n−1}
(compare n! =
2·3·4···n
with 2·2·2···2),
so for n>2 the n-th partial sum
is less than 3 − (1/2^{n−1})
(by the formula for a geometric series, proved by induction or
“telescoping series”),
whence the limit is at most 3 as claimed.
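That estimate is easy to check numerically (a sketch of the computation, not needed for the proof): for n ≥ 3 we have 1/n! < 1/2^(n−1), and the n-th partial sum of Σ 1/n! indeed stays below 3 − 1/2^(n−1).

```python
# Partial sums of Σ_{n≥0} 1/n! versus the geometric-series bound.
import math

partial = 0.0
fact = 1
for n in range(0, 30):
    if n >= 1:
        fact *= n                     # fact = n!
    partial += 1.0 / fact
    if n >= 3:
        assert 1.0 / fact < 1.0 / 2 ** (n - 1)          # 1/n! < 1/2^(n-1)
        assert partial < 3.0 - 1.0 / 2 ** (n - 1)       # partial sum bound
assert abs(partial - math.e) < 1e-12                    # the limit is e
```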
There aren’t many functions f
whose k-th derivatives
are all simple enough that we can easily use Theorem 6.1
to prove the Taylor series converges to f :
Basically just linear combinations exp(bx),
sin(bx) and cos(bx)
(which are essentially exp(ibx)
by Euler’s “cis” formula),
and power functions
(x−x_{0})^{r}
(which yield the binomial formula for arbitrary real exponent, though
curiously only for ¾ of the correct interval; e.g. the
Taylor series about x=0 for
(1+x)^{r}
converges to that function for all x in (−1,1),
but we cannot derive this from Theorem 6.1 if
x < −½).
[Added later: also logarithmic functions such as
log(1+x), for which the first derivative
yields a power function with exponent r = −1.]
Still we shall make good use of this Lagrange form for fixed k
even when we cannot effectively use it for all k at once.
Wednesday, 26 March: Proof of Taylor’s formula with remainder
for functions of one variable; derivative tests for local maxima and minima
For the derivative tests (Theorem 6.3), it is enough to assume that
the (k+1)-st derivative is bounded
in a neighborhood of a, since that is the hypothesis of
Corollary 2. In fact the (k+1)-st derivative
is not needed at all as long as the k-th derivative
is continuous in a neighborhood of a;
this can be seen by applying Theorem 6.1 to the
Taylor remainder R_{k−1}.
But in practice it will hardly ever matter: we’ll almost(?) never
have occasion to apply these derivative tests to a function that is not
infinitely differentiable.
Friday, 28 March: Start on multivariate Taylor
At a first pass through section 7 of the text
it may be hard to see through the thicket of multiple indices.
To get a feeling for what’s going on,
it may help to first work out the case n=2
(and check that the formulas also work in the known case n=1).
For instance, start with the example that spans pages 130–131
(second and third directional derivatives
in R^{2}).
It will also help a bit to correct
yet another missing-factorial typo in the text:
on page 131, the third formula in the multi-line display
just below formula (3) should have a factor of
1/k! before the Σ.
[Equivalently, divide the multinomial coefficient by k! to get
1 / ( j_{1}!
j_{2}! … j_{n}!).]
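One can watch the corrected factorials at work by checking the multinomial theorem numerically (a Python sketch of my own; the helper name is hypothetical):

```python
import math
from itertools import product

def multinomial_expand(xs, k):
    # sum over j1+...+jn = k of  k!/(j1! ... jn!) * x1^j1 ... xn^jn,
    # which the multinomial theorem says equals (x1+...+xn)^k
    n = len(xs)
    total = 0.0
    for js in product(range(k + 1), repeat=n):
        if sum(js) != k:
            continue
        coef = math.factorial(k)
        term = 1.0
        for j, x in zip(js, xs):
            coef //= math.factorial(j)  # divides exactly
            term *= x ** j
        total += coef * term
    return total

xs, k = (1.5, -0.5, 2.0), 4
assert abs(multinomial_expand(xs, k) - sum(xs) ** k) < 1e-9
```

Dividing k! by the j_i! here recovers exactly the 1/(j_1! j_2! … j_n!) factor of the corrected Taylor coefficient.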
Monday, 31 March; Wednesday, 2 April:
multivariate Taylor, cont’d
In Lemma 7.3 and Theorem 7.4 (page 135), the polynomials are to be of degree
at most k; such polynomials form a vector space,
whereas those of degree exactly equal to k don’t quite.
In Lemma 7.3, strictly speaking Edwards should prove that
if P is a polynomial of several variables (a.k.a. a polynomial
function of a vector) that is not the zero polynomial then there is some
vector b at which P(b) ≠ 0.
In Math 25a this was shown for polynomials in one variable
[with the observation that the proof depended on the field R
being infinite, since in the case of a finite field (such as the
integers mod 2) there are polynomials that vanish at every
field element without being zero identically (such as
x² − x
for the 2-element field).] Once we have the one-variable result
we can do the general case by induction on the number of variables:
assume it for polynomials in n≥1 variables,
and write any polynomial in n+1 variables as
P(b) =
P_{0}
+ P_{1} b_{n+1}
+ P_{2} (b_{n+1})^{2}
+ ···
+ P_{d}
(b_{n+1})^{d}
where each P_{j} is a polynomial in the first n
coordinates
b_{1}, …, b_{n}.
Then by the 1-variable lemma from Math 25a,
if P is always zero then each of the P_{j}
is always zero; and then by the inductive assumption each
P_{j} is the zero polynomial.
(Alternatively, now that we know about partial derivatives,
we can observe that if P is everywhere zero then so is
each of its partial derivatives, and thus each k-th
partial derivative for every k; but we’ve already seen
how to isolate any coefficient of P by taking the corresponding
partial derivative, dividing by the appropriate product of factorials,
and evaluating at the origin — and since this recipe always
yields zero we conclude that every coefficient of P is zero,
as desired.)
As in the single-variable case, Corollary 7.2 and Theorem 7.4 require only
that f have k continuous derivatives
at a (and thus Theorem 7.5 on page 138 requires only
two continuous derivatives, not three); but in practice we’ll
hardly ever (never?) have use for this refinement.
Friday, 4 April: critical points, and the diagonalization of
quadratic forms
If q is a quadratic form on R^{n}
(some integer n>0), then
the associated Rayleigh quotient R(x)
is defined for all nonzero vectors x by
R(x) = q(x) / ||x||²
= (x, Ax) / (x, x)
where A is the associated symmetric matrix and
(·,·) is the inner product.
Clearly R is homogeneous of degree zero:
R(cx) = R(x)
for all nonzero vectors x and scalars c≠0.
If y is any vector orthogonal to x then the
directional derivative of R at x
in the direction y is
2(Ax, y) / ||x||².
If we already know that A has an orthonormal basis of
eigenvectors, i.e. that we can choose coordinates for which
q(x) = Σ_{i}
λ_{i} x_{i}^{2},
then it is clear that R is maximized (minimized) at an
eigenvector with maximal (resp. minimal) eigenvalue. Conversely,
we calculate that if some nonzero x is
a critical point for R
then every y that is orthogonal to x
is also orthogonal to Ax. But this makes
Ax a scalar multiple of x, i.e. makes
x an eigenvector of the symmetric matrix A.
Moreover, there must be at least one critical point:
R gives a continuous function on the unit sphere
S^{n−1}
in R^{n},
and this sphere is compact (Heine-Borel), so R
attains its supremum at some vector x,
which is then a local maximum for R even
on R^{n} by homogeneity.
So we have found an eigenvector x of A
(with eigenvalue equal to R(x)),
and can deduce that A has an orthonormal basis of
eigenvectors by an inductive argument that you probably remember from
Math 25a (if n=1 we’re done; otherwise
consider the action of A on the orthogonal complement
of x, which has dimension n−1
and is mapped to itself by A because
once x is an eigenvector we have
x ⊥ y ⇒
0 = (Ax, y)
= (λx, y)
= (x, Ay)
⇒ x ⊥ Ay, etc.).
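A small numerical illustration (my own Python sketch, with an example matrix not taken from Edwards): sampling the Rayleigh quotient on the unit circle recovers the extreme eigenvalues.

```python
import math

# Rayleigh quotient R(x) = (x, Ax)/(x, x) for the symmetric matrix
# A = [[2, 1], [1, 2]], whose eigenvalues are 3 and 1, with
# eigenvectors (1, 1) and (1, -1).
A = ((2.0, 1.0), (1.0, 2.0))

def rayleigh(x):
    ax = (A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1])
    return (x[0]*ax[0] + x[1]*ax[1]) / (x[0]*x[0] + x[1]*x[1])

# sample R on the unit circle; max/min approach the extreme eigenvalues
vals = [rayleigh((math.cos(t), math.sin(t)))
        for t in [2 * math.pi * i / 1000 for i in range(1000)]]
assert abs(max(vals) - 3.0) < 1e-4
assert abs(min(vals) - 1.0) < 1e-4
```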
P.S. I see that I never explained the term “Hessian” for
the matrix A (or its determinant, see page 151 of Edwards).
It is named for Ludwig Otto
Hesse (1811–1874);
see the Wikipedia page on
“Hessian matrix”.
Monday, 7 April, and Wednesday, 9 April: inverse and implicit functions etc.
The main theme of Chapter 3 is the differential behavior of
“implicit functions”, i.e. functions
y(x) defined implicitly
by a relation
G(x, y) = 0.
This is familiar from single-variable calculus, but is also useful
in our multivariable generality for constructing a function from
R^{m}
to R^{n}
as long as G=0 gives n conditions on y,
that is, as long as G is a function from (an open subset of)
R^{m+n}
to R^{n}.
If y is a differentiable function of x
then the Chain Rule gives
G_{x}
+ G_{y} (dy/dx)
= 0,
which we can solve for dy/dx
provided that G_{y} is invertible
(NB this makes sense because G_{y}
is a linear map from R^{n}
to R^{n}).
But it is not obvious that y is even well-defined,
let alone differentiable, in a neighborhood of any point where
G(x, y) = 0
and G_{y} is invertible.
In fact one can construct counterexamples already when
m=n=1 in which
G is continuous and differentiable but
G=0 does not define an implicit function.
It turns out that such a difficulty can arise when
G_{y}, though defined at every point,
is not a continuous function. The main result of Chapter 3
is that y is well-defined and differentiable in a neighborhood
of any point where
G(x, y) = 0
and G_{y} is invertible,
as long as G is continuously differentiable
near that point.
A closely related notion is an inverse function,
where we have a differentiable map f from
an open set in R^{n}
to R^{n},
and seek a differentiable inverse function
f ^{−1}
in a neighborhood of
some f (x_{0}).
Again it turns out that such an inverse function exists provided
f ' (recall that for each x
the derivative f '(x) is a linear map
from R^{n} to R^{n})
is continuous near x_{0}
and invertible at x_{0}.
This is clearly a special case of the implicit function theorem,
namely the case with m=n and
G(x, y) =
f (y)−x.
But conversely once we work in spaces of arbitrary (finite) dimension
we can derive the implicit function theorem from the inverse function
theorem! Consider the function of m+n variables
taking (x, y)
to (x, G(x, y)).
Its derivative is a block-triangular matrix (shown at the top of
page 191 in Edwards) that is invertible
iff G_{y} is invertible,
and once we have a differentiable inverse function we can restrict it
to the subspace y=0 to recover the desired implicit function.
This is the approach Edwards (and we) take, first proving the
inverse function theorem (Theorem 3.3 on page 185),
and deducing from it the implicit function theorem
(Theorem 3.4 on page 190).
It will be convenient (though not logically necessary) to prove
the inverse mapping theorem in the special case
a = f (a) = 0
and f'(0) = Id. The general case
can be deduced from this via the Chain Rule by composing with
translations and linear transformations.
This is essentially Edwards’ approach
(see Lemma 3.2 on page 183, preceding the proof of the full
inverse function theorem (3.3) starting on page 185),
though since we are skipping section 2 which defines the operator norm
|| df_{x}−I ||
we just say that each entry of
df_{x}−I
has absolute value less than ε and thus that
df_{x}−I
multiplies norms by at most nε.
Hence we replace the estimates
(1±ε)r by
(1±nε)r.
For the inverse mapping theorem, Edwards asserts on page 182
that “It is easy to see that invertibility of
df_{a} is a necessary condition
for the local invertibility of f
near a.” In fact this is necessary if we require
the inverse function to be differentiable as well, but it is possible
for f to have a continuous (but not
differentiable) inverse even where df
is not invertible, even in dimension 1: consider
f (x) = x³.
(See also Exercise 3.3 on page 194 of Edwards, which gives a map from
R³ to R³
which has an inverse function g which is not differentiable
at the origin; where else is this g not differentiable, and why?)
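A quick numerical look at the one-dimensional example (a Python sketch of my own): the inverse g of x³ is continuous everywhere, but its difference quotients at 0 blow up.

```python
# f(x) = x^3 has the continuous inverse g(y) = sign(y) * |y|^(1/3),
# but g is not differentiable at 0.
def g(y):
    return y ** (1 / 3) if y >= 0 else -((-y) ** (1 / 3))

assert abs(g(8.0) - 2.0) < 1e-12
assert abs(g(-8.0) + 2.0) < 1e-12
# (g(h) - g(0))/h = h^(-2/3) -> infinity as h -> 0+
q1 = g(1e-6) / 1e-6
q2 = g(1e-12) / 1e-12
assert q2 > q1 > 1.0
```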
Edwards’ (and our) key tool is the contraction mapping theorem,
introduced and motivated in Section 1 of this chapter.
The notion of contraction mapping (page 162)
makes sense in any metric space: just replace (3) by the condition
d(ϕ(x), ϕ(y))
≤ k d(x, y).
Theorem 1.1 (p.162) and its proof (pages 162–163)
then hold provided the space is nonempty and complete.
[Note that Edwards requires a closed interval in R.
In a non-complete space, there might not be any fixed point;
e.g. consider the contraction mapping
x ↦ x/2 of (0,1).
But if there is a fixed point then it is still unique for
the reason given on page 163.] This is not just generalization for
generality’s sake: it contains Theorem 3.1 (page 181–182),
and indeed also Theorem 1.4 (which in effect constructs
f as the fixed point of a contraction of
a closed subset of the function space
C([a, b]) ).
This strategy of using a contraction mapping is also a key ingredient
in the existence theorem for differential equations, though we probably
won’t reach this result in Math 25b.
[There is no need to assume that our metric space is bounded,
as Edwards suggests on page 182; but in the present context
we shall apply the contraction mapping theorem only to bounded spaces.
Indeed, when we start with a function ϕ that is initially defined on
a larger unbounded space S, we’ll sometimes have to
find a suitable subspace S_{0} that is mapped
to (a subset of ) itself under ϕ and for which
the restriction of ϕ to S_{0} is a contraction
(which it might not be on all of S); and this
S_{0} will often have to be bounded.]
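Here is the fixed-point iteration in action (an illustrative Python sketch; the example ϕ = cos is mine, not Edwards’): ϕ is a contraction on the complete space [0, 1], since |ϕ′| = |sin x| ≤ sin 1 < 1 there, so the iterates converge to the unique fixed point from any starting value.

```python
import math

# Fixed-point iteration illustrating the contraction mapping theorem
# for phi(x) = cos(x) on the complete metric space [0, 1].
def iterate(phi, x, n):
    for _ in range(n):
        x = phi(x)
    return x

a = iterate(math.cos, 0.0, 200)
b = iterate(math.cos, 1.0, 200)
assert abs(a - b) < 1e-12            # same limit from different starts
assert abs(a - math.cos(a)) < 1e-12  # and it is a fixed point
```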
We shall skip Section 2; the tools developed in that section,
interesting and important though they are for other purposes(*),
are more than we need, which is basically that if all the entries
of a matrix A are of absolute value at most ε then
||Ax|| ≤ Kε||x||
for every vector x, where K depends only on
the size of A; for instance if we use the sup metric
then we can take K to be the number of coordinates of x
(i.e. the dimension of the domain of the linear transformation).
(*) For example, if a symmetric matrix A is positive-definite
then its norm (relative to the Euclidean norm) is the largest eigenvalue,
and for a general symmetric A the norm is the maximum of
|λ| as λ ranges over the eigenvalues
of A.
Does the two-dimensional Example 1 on page 183 look familiar?
It’s just the map
z ↦ z²
of the complex plane. So of course it is two-to-one on the
complement of the origin. This might also help with Exercise 3.1
on page 191.
Like Edwards (see the brief Section 5 = pages 201 and 202),
we shall punt on proving the C^{k} versions
of the implicit and inverse function theorems,
whose proofs are somewhat tedious and messy,
and introduce no fundamentally new ideas.
However, in some special cases we get the higher derivatives
more-or-less for free. For example, suppose we know that
log(x) has derivative 1/x
for all x > 0.
Then it follows that the inverse function
(a.k.a. exp(x))
is its own derivative, and then by induction its own
k-th derivative for each
k = 2, 3, 4, ….
Friday, 11 April: Introduction to volume and integration in
R^{n}
Edwards defines the integral of a function on a bounded subset
of R^{n} via volumes of
bounded subsets of R^{n+1}.
This approach has very old roots in the
“method of exhaustion”
of classical geometry. We cannot reasonably expect to assign a volume to
every bounded subset of R^{n};
for example the subsets of R^{3}
that occur in the
Banach-Tarski paradox cannot all have volume
without violating at least one of the basic properties/axioms
that we need volumes to have (see the list on page 203
at the start of Chapter IV of Edwards) — or assigning
a volume of zero to the ball, which is not useful to us either.
One could strengthen the axioms to require additivity under
countable unions of sets with pairwise disjoint interiors;
but while such a theory of volume can be developed,
it is much harder, and we relegate it to a class such as Math 114.
Edwards’ definition excludes even countable sets such as
Q ∩ [0,1],
which “should” be negligible
(i.e. should have zero length [= 1-dimensional volume]);
but it does assign a volume to a rich enough collection of subsets
of R^{n} to let us integrate
any continuous function, and also some mildly discontinuous functions
such as characteristic functions of intervals. Perhaps because his
notion of volume is more restrictive than the countably-additive one you
might encounter in the future, Edwards sometimes calls it
“content”, and refers to a set with a well-defined volume
or content as “contented” (be sure to accent the first syllable!).
One key point that Edwards does not sufficiently emphasize is that
we need to make sure that the volume is well-defined, i.e. that
the definition does not allow more than one value for the same subset
of R^{n}! In Greek geometry
it is basically assumed that every reasonable region in the plane
or in space has a single area or volume, and that these satisfy the
properties on page 203; but this is not easy to show directly,
and it is well known that one can construct dissections that seem
to violate this assumption, such as this
“infinite chocolate” GIF
(see also
this Wikipedia page
and these
two
variations
on my extension of one of those classes of dissections).
This is why Edwards defines his volume using only finite unions of
boxes (“intervals”) whose sides are parallel to the
coordinate axes, i.e. Cartesian products of bounded intervals in the
n coordinates of R^{n}.
This lets us show the key fact that if one such union,
call it A (with pairwise disjoint interiors),
is contained in another, say B, then
the total volume of A
is no larger than the total volume of B.
(This is done by listing for each i
the numbers that occur as an endpoint of the i-th side of some box of
A or B, sorting each of these n
lists in increasing order, and splitting each box into sub-boxes
each of whose coordinates is bounded by consecutive numbers
in its list. One then need only check that each box is assigned
the same volume as the sum of its sub-boxes’ volumes.)
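The coordinate-splitting idea can be turned into a small computation (a two-dimensional Python sketch of my own; `grid_area` is a hypothetical helper, not anything from the text): list all coordinates that occur, split the plane into the grid cells they determine, and add up the cells inside the union.

```python
from itertools import product

def grid_area(boxes):
    # boxes: list of ((x0, x1), (y0, y1)) with x0 < x1, y0 < y1;
    # returns the area of their union via the common refinement grid
    xs = sorted({c for b in boxes for c in b[0]})
    ys = sorted({c for b in boxes for c in b[1]})
    total = 0.0
    for (x0, x1), (y0, y1) in product(zip(xs, xs[1:]), zip(ys, ys[1:])):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2   # cell midpoint
        if any(b[0][0] <= cx <= b[0][1] and b[1][0] <= cy <= b[1][1]
               for b in boxes):
            total += (x1 - x0) * (y1 - y0)
    return total

# two overlapping unit squares: union has area 2 - 0.25 = 1.75
boxes = [((0, 1), (0, 1)), ((0.5, 1.5), (0.5, 1.5))]
assert abs(grid_area(boxes) - 1.75) < 1e-12
```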
This means that the volume is not obviously invariant under
rigid motions of R^{n},
but we can then prove this invariance as a theorem, and
show more generally that a map of the form
x ↦ Ax + b
(where b is a fixed vector and A is any
n × n matrix)
multiplies the volume of any set by
|det(A)|
(assuming that our set had a volume to begin with).
[Added later: Edwards does hint at this technique with the
“partitions” introduced on page 215 (proof of Theorem 2.1:
sets with volume are precisely the bounded sets with negligible boundary);
our collection of sub-boxes is a partition, and such partitions suffice
for the proof of Theorem 2.1.]
Monday, 14 April: Integration over bounded intervals (or more generally
over boxes in R^{n}); the
Fundamental Theorem of Calculus
When we use the area in R² to define the integral of
a bounded function on an interval we are naturally led to upper and lower
Riemann sums.
The integrability (= existence of the relevant areas)
of any continuous function on a closed interval
is then a consequence of the fact that the interval is compact
(Heine-Borel) and continuous functions on compact metric spaces
are uniformly continuous. This generalizes readily to integrals
over boxes in R^{n}, which exist for
continuous functions on closed boxes for much the same reason.
Some basic properties of the area, and thus of the integral,
already suffice to recover the Fundamental Theorem of Calculus:
all we need is that
∫_{a}^{b}
f (x)
= ∫_{a}^{c}
f (x)
+ ∫_{c}^{b}
f (x)
for any integrable function f
on an interval [a, b]
that contains c, and that if f
takes values in [m, M] then
∫_{a}^{b}
f (x)
is in the interval
[m(b-a),
M(b-a)].
We shall find many uses for this theorem in the remaining weeks of
Math 25b. For starters, we can use it to find a function
f on the positive reals whose derivative is
1/x; and we have already seen that the inverse function
of such f is its own derivative, which
lets us construct the exponential function and prove its
standard properties. (One can construct the trigonometric
functions in the same way by integrating
1 / sqrt(1−x²)
or 1 / (1+x²)
and forming the inverse function, though it takes some more work
to recover the periodicity and basic identities satisfied by
these functions.)
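Here is a numerical version of that construction (my own Python sketch; the step counts are arbitrary): define log by a Riemann sum of 1/t, then invert by bisection to recover exp, and compare with the library value.

```python
import math

# log(x) as a midpoint Riemann sum of the integral of 1/t over [1, x]
def log_approx(x, n=100000):
    h = (x - 1.0) / n
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(n))

# exp as the inverse function, found by bisection (log is increasing)
def exp_approx(y, lo=0.001, hi=100.0):
    for _ in range(60):
        mid = (lo + hi) / 2
        if log_approx(mid, 2000) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

assert abs(log_approx(math.e) - 1.0) < 1e-6
assert abs(exp_approx(1.0) - math.e) < 1e-3
```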
Wednesday, 16 April: More about integration in R^{n}
and volume (a.k.a. content) in R^{n+1}
A bounded set in R^{n} has content
if and only if its boundary is negligible (of content zero,
i.e. is contained in a finite union of boxes of total volume
<ε for every ε > 0).
This is Theorem 2.1 in Edwards chapter IV (pages 215–216).
A finite union of negligible sets, or an arbitrary subset of
a negligible set, is again negligible; and if a set A
has volume v then so does any set A'
obtained from A by adding or removing a negligible set.
A function F from a metric space to R
is said to have bounded support
if there is a bounded set B such that
F(x) = 0
for all x not in B.
F is said to have compact support
if B can be taken to be compact. For functions
on R^{n} the two notions are the same
because of Heine-Borel (and because the closure of a bounded set
is again bounded). Edwards says (p. 219) that a function
F: R^{n} → R
is admissible if it has bounded support (equivalently,
compact support) and is continuous outside a negligible set.
In this case the ordinate sets of F_{+}
and F_{−} have volume, so
F is integrable, and we readily obtain the
four basic properties or axioms of integrals enumerated on page 218.
Even if there are integrable functions that are not admissible,
the admissible functions are all that we shall need.
Note that even in R the cardinality of a bounded set S
does not correlate all that well with the “volume”
(a.k.a. length, in this 1-dimensional setting)
of S; both notions are trying to get at
“how large” S is, but in rather different senses.
It is true that any finite set is negligible,
and a set with positive volume must have cardinality c
(the continuum) because it contains an interval. But there are
countable sets that are not negligible, such as the rational numbers
in (0,1): the boundary of this set is the entire interval
[0,1], which is certainly not negligible, so the
criterion of Theorem 2.1 fails. Conversely, there are
negligible sets of cardinality c. This is clear
in dimensions 2 or above (use an interval), but even R
contains such sets, for example the Cantor set C.
(In step k of the construction of C
we find 2^{k} intervals,
each of length 1/3^{k},
whose union contains C; since the total length
(2/3)^{k} of these intervals
approaches zero as k → ∞,
we can find for every ε > 0
a finite union of intervals that cover C
and have total length less than ε,
so C is negligible as claimed.) It follows that
there are as many negligible subsets of R
as there are arbitrary subsets of R:
both cardinalities are 2^{c}
(any subset of the Cantor set is again negligible,
and this already gives us 2^{c} negligible sets).
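The step-k covering of the Cantor set is easy to generate exactly (a Python sketch of my own, using exact rational arithmetic):

```python
from fractions import Fraction

# Step-k covering of the Cantor set: start from [0, 1] and repeatedly
# keep the two outer thirds; after k steps, 2^k intervals of length 3^-k.
def cantor_step(intervals):
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

cover = [(Fraction(0), Fraction(1))]
for k in range(8):
    cover = cantor_step(cover)
assert len(cover) == 2 ** 8
assert all(b - a == Fraction(1, 3 ** 8) for a, b in cover)
# total length (2/3)^k tends to 0, so the Cantor set is negligible
assert sum(b - a for a, b in cover) == Fraction(2 ** 8, 3 ** 8)
```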
Friday, 18 April: Second midterm examination
Monday, 21 April: Step functions and Riemann sums; midterm post-mortem
Wednesday, 23 April: More applications of the integral
Some examples of the power of the tools we have developed so far:
Intermediate value and Taylor revisited.
If F is continuously differentiable on
[x, y]
then the Fundamental Theorem of Calculus says
F(y)−F(x)
is the integral of the derivative F'
over [x, y].
Thus if the derivative is always in
[m, M]
then (by our basic axioms or properties of the integral)
F(y)−F(x)
is in
[m(y−x), M(y−x)].
This isn’t quite our statement of the Mean Value Theorem,
but it is a consequence of it, and is equivalent in our present case of
a function with a continuous derivative, which will almost always be
the case for us (and indeed it is these bounds on
F(y)−F(x)
that we usually need, as in that contraction problem on the second midterm).
With some more work we can obtain Taylor’s theorem with remainder
for functions with a continuous derivative of order k+1.
[“More work” = integration by parts plus induction, plus remembering
to treat both y > x and
x > y, which are not
equivalent once k > 0.]
Termwise differentiation of convergent power series
in the interior of their interval of convergence. In general
the pointwise limit, or even the uniform limit, of a sequence
{F_{n}} of differentiable functions
need not be differentiable. However, we can go in the opposite
direction thanks to the Fundamental Theorem of Calculus and
Exercise 3.4 (which we cover in class today): given x_{0},
an interval I containing x_{0}, and a sequence
{F_{n}} of continuously differentiable
functions on I whose derivatives converge
uniformly to some f, and for which
F_{n}(x_{0}) is independent of n,
the sequence does converge uniformly to a function F whose derivative
is f. Applying this to the sequence of partial sums of
a power series yields the differentiability of power series inside their
interval of convergence. (For this we also need the fact that the
derivative of a power series has the same interval of convergence,
which ultimately comes down to the fact that as
n→∞ the n-th
root of n (which enters into the
limsup formula for the radius of convergence
of the derivative) approaches 1.)
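A quick check of that last fact (Python, illustrative only):

```python
# The n-th root of n tends to 1, which is why multiplying the n-th
# coefficient by n (termwise differentiation) leaves the radius of
# convergence unchanged in the limsup formula.
vals = [n ** (1.0 / n) for n in (10, 100, 1000, 10 ** 6)]
assert all(v > 1.0 for v in vals)
assert abs(vals[-1] - 1.0) < 1e-4
```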
Linear transformations and volume.
We have yet to make good on our promise of showing that
our definition of volume is invariant under rigid motions of
Euclidean space: so far only translations and coordinate permutations
are easy. But by now we know enough to prove more generally that
if A has volume v(A)
and T is any linear transformation then
T(A) has volume
|det(T)| v(A).
Not only will this yield the invariance of volume under rigid motions
as a special case, but it will be one of two essential ingredients
for the change-of-variable formula for integrals in dimension
greater than 1 (for which we can no longer use the
Fundamental Theorem of Calculus as Edwards does in dimension 1).
The proof reinterprets a construction from Math 25a: T
is a finite composition (matrix product) of coordinate permutations,
diagonal matrices, and shears. (These correspond to the row operations
of switching rows, multiplying a row by a scalar, and adding a multiple
of one row to another.) If we prove the
|det(T)| v(A) formula
for each of these elementary matrices, then the general result
will follow by multiplicativity of the determinant! And only the shears
present more than ε difficulty.
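A toy check of the determinant bookkeeping (my own 2×2 Python sketch, not the proof itself): the determinant of a product of elementary matrices is the product of their determinants.

```python
# Determinant multiplicativity for the three elementary types
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

swap  = [[0, 1], [1, 0]]   # row switch (permutation): det -1
scale = [[3, 0], [0, 1]]   # row scaling (diagonal):   det 3
shear = [[1, 5], [0, 1]]   # row addition (shear):     det 1

T = matmul(matmul(swap, scale), shear)
assert det(T) == det(swap) * det(scale) * det(shear) == -3
```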
Friday, 25 April: The change-of-variable formula in R^{n}
Motivation for the last few sections of Chapter IV:
the classical derivation of the amazing definite integral
∫_{ R} exp(−x²) = π^{½}.
(NB it is known that the antiderivative of
exp(−x²) does not have
an elementary formula; the normalized antiderivative
(2/π^{½})
∫_{ 0}^{x}
exp(−t²) dt
is known as the error function erf(x),
and is needed in some contexts in mathematics, and especially in statistics,
where normal distributions often arise naturally.)
Step 1: By symmetry, the integral is 2I where
I := ∫_{ 0}^{∞}
exp(−x²).
Step 2:
Because we know that I will involve a square root, consider
I^{ 2}, and write it as a double integral of
exp(−x²) exp(−y²)
= exp(−(x²+y²))
over positive x and y, that is, over
(x, y) in the “first quadrant”
of R^{2}.
[Note that when Gauss first obtained this integral he did not have the hint
that I will be a multiple of π^{½} !
Though it is conceivable that he surmised it by numerical computation,
as Euler first surmised that ζ(2)=π²/6.]
Step 3: Change to polar coordinates
(x, y) =
(r cos θ, r sin θ).
The integrand is exp(−r²),
and the Jacobian derivative
∂(x, y) / ∂(r, θ)
has absolute value r. So I^{ 2}
is the integral of r exp(−r²)
over the infinite rectangle (0, ∞) × (0, π/2)
in the (r, θ) plane.
Step 4: This integral factors as the integral of
r exp(−r²) over
r>0 times the integral of 1 over θ in
(0, π/2). The latter integral is of course
π/2. The former is easy because thanks to the
factor of r there is an elementary antiderivative
−exp(−r²)/2.
Hence I^{ 2} is
(1/2)(π/2) = π/4.
Since I is manifestly positive, it must be the positive
square root π^{½} / 2
of π/4, so
∫_{ R} exp(−x²)
= 2I = π^{½}, QED!
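The value π^{½} can be checked numerically (an illustrative Python sketch; the cutoff M and step count n are arbitrary choices of mine):

```python
import math

# Midpoint Riemann sum of exp(-x^2) over [-M, M]; since the integrand
# decays so fast, the truncation error beyond |x| = 8 is negligible.
def gauss_riemann(M=8.0, n=100000):
    h = 2 * M / n
    return sum(h * math.exp(-(-M + (i + 0.5) * h) ** 2) for i in range(n))

assert abs(gauss_riemann() - math.sqrt(math.pi)) < 1e-6
```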
Each step requires some justification.
• In step 1: the integral is
“improper (of the first kind)”
because it extends over all real x,
so we must write it as a limit over large but finite intervals
so that the integral is well-defined. Such limit definitions of
improper integrals are treated more systematically in section 6
of Chapter IV, but we have run out of time and cannot cover this in class.
For our purposes it is enough to define the desired integral as
the limit as M→∞ of 2I_{M}
where I_{M} is the integral of
exp(−x²) over (0, M).
[Compare the “careful proof”
in the Wikipedia page for this integral (which currently uses a
for what is called M in this paragraph) with the
“computation by polar coordinates”
which is essentially what we did in steps 2 to 4 —
I inserted the first step to avoid the additional complication that
integrating exp(−(x²+y²))
over the entire plane covers the positive real axis twice (θ=0
and θ=2π) in the transformation to polar coordinates.]
• In step 2, we use Exercise 4.1, which is a special case of
Fubini (Theorem 4.1, pages 238–239).
• Step 3 uses the change of variable formula, Theorem 5.5
(stated at the bottom of page 252, proved in the next few pages).
Since the integral is improper we actually apply this argument to
the square of I_{M}, bounding it between
the integrals over quarter-circles of radius M and
2^{½}M, both of which are seen
(in step 4) to approach π/4 as M→∞.
• Finally in step 4 we use Exercise 4.1 again
(though we don’t really need it in this special case
that the function depends on just one of the two variables),
together with the Fundamental Theorem of Calculus
which we have already proved in Chapter III.
Today we finished the outline of the proof of the change of variable
formula.
First prove that linear changes of variable T
multiply the volume of any box by |det(T)|;
it follows (via some routine “epsilonics” [Edwards, p.252])
that the same is true for the volume of any contented set.
(For shears, Edwards uses a result from section 4 of this chapter;
we do not have this result yet, but the basic idea is readily available
from what we’ve done already.) It is also clear that the volume
is preserved by any translation, i.e. any transformation
x ↦ x+c
for some constant vector c.
For a general change of variable we need the behavior of volumes
under an arbitrary map F from (an open set U in)
R^{n}
to R^{n}
for which both F and its inverse are continuously differentiable.
We have seen already (Inverse Function Theorem) that this is the case
iff F is continuously differentiable and
its derivative is an invertible linear transformation.
Under this hypothesis, if we fix some point x_{0}
in U, we can assume (after a linear change of variable,
and translation of both domain and range) that
x_{0} = F(x_{0}) = 0
and the derivative of F at x_{0}
is the identity. Then we know that for any ε>0
there is a δ>0 such that
our map F takes the box
[−δ, δ]^{n}
(a.k.a. the δ-neighborhood of 0
under the sup norm) to some set contained in the
((1+ε)δ)-neighborhood of the origin
and containing the ((1−ε)δ)-neighborhood.
It follows that the volume of the image (which exists by the negligible-boundary
criterion) is between
(1−ε)^{n} and
(1+ε)^{n} times the volume of
[−δ, δ]^{n}.
To prove the change-of-variable formula, we now divide the region of
integration into small boxes and apply the above result to each one.
We need to know that δ can be chosen uniformly given ε;
we can do this using uniform continuity of the partial derivatives,
at least if F and its inverse extend to a neighborhood of
the closure of U (a hypothesis that soon gets removed
in Addendum 5.6 on page 255–256; NB we need this, too, for our
motivating example
∫_{ R} exp(−x²) = π^{½},
because the polar-coordinates map is not invertible at the origin).
Then it’s just a few pages of epsilonics (253–255)
to check that everything fits together as expected.
Monday, 28 April: Fubini, etc.: integration over
R^{m+n} reduces to integration over
R^{m} and over R^{n}
See Section 4, pages 235 to 240. Even after solving Exercise 3.3,
Theorem 4.1 remains useful because 3.3 requires continuity, which is
lost even when we want to integrate continuous functions over
simple regions R other than boxes (because multiplying by
the characteristic function φ_{R} yields a
function that, though integrable, is generally not continuous on
the boundary of R). I gave in class the example of
the formula for the volume of a pyramid (which was mentioned after
the second midterm in connection with the probability that the sum of
three rounded numbers differs from the rounding of their sum).
I also noted that in this case we have the following underhanded
alternative for avoiding integration altogether (once we know
how volume behaves under linear transformations, and know that
the pyramid is “contented”): it is enough to show that
the pyramid P defined by
0 < x < y < z < 1
has volume 1/6, because any pyramid of base area B and height h
is the image of P under a linear transformation of determinant
±2Bh, hence has volume 2Bh·(1/6) = Bh/3; and six congruent copies of P
(corresponding to the 3! possible orders of
x, y, z) can be combined to form
the unit cube. The same trick yields the volume of a pyramid
(a.k.a. “simplex”) in R^{n}
for any dimension n, and explains the appearance of the
factor 1/n! in the formula. We could then even use
“Cavalieri’s principle” (Theorem 4.2 on page 240)
to recover the integral of x^{n−1}
by an argument that replaces the usual Riemann-sum or antiderivative approach with
n-dimensional geometry. See Edwards, pages
241–242 (and the Exercises for this section) for further examples
of applications of the theorems of Fubini and Cavalieri.
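A discrete sanity check of the 1/6 (my own Python sketch, counting lattice triples rather than integrating): among the N³ triples (i, j, k) with entries in {0, …, N−1}, those with i < j < k number C(N, 3) ≈ N³/6, matching vol{0 < x < y < z < 1} = 1/6.

```python
from math import comb

# fraction of strictly increasing lattice triples approaches 1/6
N = 500
frac = comb(N, 3) / N ** 3
assert abs(frac - 1 / 6) < 0.01
```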
At the end of the proof of Theorem 4.1, the punchline
|∫ f −
∫ F | < ε
follows from the observation that
∫ f and ∫ F
are known to be in the same interval
(∫ h, ∫ k)
= (∫ H, ∫ K)
whose length is <ε.
(In presenting this proof in class I may have switched the roles of
k&K with h&H
compared with Edwards; sorry if this caused an ε of confusion.)
Towards the end of Monday’s class I gave an overview of the little of
Chapter V that we will be able to cover in our final meeting on Wednesday.
Inevitably this is only a teaser, and even in Math 55, when we
had already covered exterior algebra (the natural context for
“differential forms”) in the linear-algebra semester,
I could only barely give an honest treatment of Stokes’ theorem.
For us, an outline of Green’s theorem (the very special case which
connects line and surface integrals in R²)
will have to suffice. You can learn the larger story in its proper
context and depth in a class on differential geometry (usually Math 132).
Monday, 30 April: Line integrals and Green’s Theorem
With only 1¼ class meetings left for Chapter V [the ¼
was the end of Monday], we barely have the time to introduce
line integrals (a.k.a. path integrals) and
Green’s Theorem,
which is the very special case of
“Stokes’ Theorem”
∫_{ D }∂ω
= ∫_{ ∂D }ω
in which D is a plane region (and one well enough behaved
that ∂D makes sense). [The scare quotes(?) are because
the result is actually due to Lord Kelvin — Stokes had no more
to do with it than Pell did with
“Pell’s equation”
x² − Dy² = 1;
such misattributions happen surprisingly often… — plus the
1850 result is also a very special case, where D is a surface
in 3-space.] See Math 132 for the full story. Meanwhile
here are some lecture notes
to supplement the CAs’ “live lecture notes” for the day.
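As a concrete sanity check (a toy example of mine, not from Edwards): taking P = −y/2 and Q = x/2 in Green’s theorem gives Q_x − P_y = 1, so the line integral (1/2)∮_{∂D}(x dy − y dx) computes the area of D. Discretizing this for a circle and an ellipse recovers the familiar areas.

```python
import math

def green_area(x, y, dx, dy, n=10_000):
    """Approximate (1/2) * integral of (x dy - y dx) over a closed curve,
    given a parametrization x(t), y(t) with derivatives dx(t), dy(t) on [0, 2*pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        total += x(t) * dy(t) - y(t) * dx(t)
    return 0.5 * total * h

# Unit circle: x = cos t, y = sin t; Green's theorem predicts area = pi.
area_circle = green_area(math.cos, math.sin,
                         lambda t: -math.sin(t), math.cos)
print(area_circle)  # close to pi

# Ellipse with semi-axes 2 and 3; predicted area = 6*pi.
area_ellipse = green_area(lambda t: 2 * math.cos(t), lambda t: 3 * math.sin(t),
                          lambda t: -2 * math.sin(t), lambda t: 3 * math.cos(t))
print(area_ellipse)  # close to 6*pi
```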
Already these very special cases are useful. The line-integral one
(F(s) − F(r)
= ∫_{γ} dF, where γ is
a path from r to s) becomes the
“work-energy theorem”
in Newtonian mechanics. For Green’s theorem, Edwards gives several
applications, to which we can add a proof of the Fundamental Theorem of
Algebra. There are at least two ways to use Green’s Theorem
to obtain this result. The first uses the following corollary of
∫_{ D }∂ω
= ∫_{ ∂D }ω:
if ∂ω vanishes throughout D then
∫_{ ∂D }ω = 0.
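Here is a quick numerical illustration of that corollary (my own toy example, not from the text): the 1-form ω = y dx + x dy is d(xy), so ∂ω = 0, and its integral over any closed curve, e.g. the unit circle, should vanish.

```python
import math

def closed_form_integral(n=10_000):
    """Integrate omega = y dx + x dy over the unit circle x = cos t, y = sin t.
    Since omega = d(xy) satisfies d(omega) = 0, the answer should be 0."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t) * h, math.cos(t) * h
        total += y * dx + x * dy
    return total

print(closed_form_integral())  # close to 0
```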
If P were a nonconstant polynomial with complex coefficients
and no zeros then we could take
ω = dP / P
and use a large disc for D to get a contradiction.
(The fact that ∂(dP / P) = 0
is not immediately obvious and requires some verification,
as does the behavior of
∫_{ ∂D }ω
as the radius goes to infinity; complex analysis (as in Math 113)
explains why this works.) A second route is to use the fact that
P has harmonic real and imaginary parts (see Problem 2 of
the 7th problem set),
and thus can be evaluated at a point z_{0} by averaging
over any circle centered at z_{0} (see
the final problem of Math 25b).
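That averaging claim is easy to verify numerically for a specific polynomial (an illustrative example of mine): the average of P over a circle centered at z_0 equals P(z_0), and hence the same holds for the harmonic real and imaginary parts.

```python
import cmath, math

def circle_average(f, z0, r, n=1000):
    """Average f over n equally spaced points of the circle |z - z0| = r."""
    return sum(f(z0 + r * cmath.exp(2j * math.pi * k / n))
               for k in range(n)) / n

# Illustrative choice: P(z) = z^3 - 2z + 5, center z0 = 2 + i, radius 3.
P = lambda z: z**3 - 2*z + 5
z0 = 2 + 1j
avg = circle_average(P, z0, 3.0)
print(avg, P(z0))  # the two agree: the mean-value property
```

(For a polynomial of degree d the discrete average with n > d points is even exact, since the nonconstant powers of e^{iθ} average to zero over equally spaced angles.)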
From this one can deduce that the function |P(·)|
has no local minimum at any point where P does not vanish,
and we already know that this is what we need to
finish the topological proof of the
Fundamental Theorem of Algebra.
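Returning to the first route: it too can be made concrete numerically (again my own illustration, with P(z) = z³ + z + 1 as a hypothetical example). For a polynomial P whose roots all lie inside the circle |z| = R, the integral ∮ P′(z)/P(z) dz over that circle comes out to 2πi·deg P rather than 0, which is exactly the contradiction with the corollary when P is assumed zero-free.

```python
import cmath, math

def boundary_integral(P, dP, R, n=4096):
    """Approximate the contour integral of dP/P = P'(z)/P(z) dz
    over the circle |z| = R, by the trapezoid rule in the angle."""
    h = 2 * math.pi / n
    total = 0 + 0j
    for k in range(n):
        z = R * cmath.exp(1j * k * h)
        dz = 1j * z * h            # dz = i z dtheta
        total += dP(z) / P(z) * dz
    return total

# Illustrative example: P(z) = z^3 + z + 1 has degree 3,
# and all of its roots lie well inside |z| = 10.
P  = lambda z: z**3 + z + 1
dP = lambda z: 3 * z**2 + 1
I = boundary_integral(P, dP, 10)
print(I)  # close to 2*pi*i*3, so the boundary integral is NOT 0
```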