Every matrix is related to an upper triangular matrix in a particularly significant way.
This is Schur’s theorem and it is the most important theorem in the spectral theory of
matrices.
Lemma 6.4.1 Let {x_1,⋅⋅⋅,x_n} be a basis for F^{n}. Then there exists an orthonormal basis {u_1,⋅⋅⋅,u_n} for F^{n} which has the property that for each k ≤ n, span(x_1,⋅⋅⋅,x_k) = span(u_1,⋅⋅⋅,u_k).
Proof: Let {x_1,⋅⋅⋅,x_n} be a basis for F^{n}. Let u_1 ≡ x_1∕|x_1|. Thus for k = 1, span(u_1) = span(x_1) and {u_1} is an orthonormal set. Now suppose for some k < n, u_1,⋅⋅⋅,u_k have been chosen such that {u_1,⋅⋅⋅,u_k} is orthonormal and span(u_1,⋅⋅⋅,u_k) = span(x_1,⋅⋅⋅,x_k). Then define
$$u_{k+1} \equiv \frac{x_{k+1} - \sum_{j=1}^{k}\left(x_{k+1}\cdot u_{j}\right)u_{j}}{\left|x_{k+1} - \sum_{j=1}^{k}\left(x_{k+1}\cdot u_{j}\right)u_{j}\right|}, \qquad (6.10)$$
where the denominator is not equal to zero because the x_j form a basis, and so x_{k+1} ∉ span(x_1,⋅⋅⋅,x_k) = span(u_1,⋅⋅⋅,u_k). Thus u_{k+1} ∈ span(u_1,⋅⋅⋅,u_k,x_{k+1}) = span(x_1,⋅⋅⋅,x_k,x_{k+1}). Also, x_{k+1} ∈ span(u_1,⋅⋅⋅,u_k,u_{k+1}), which is seen easily by solving 6.10 for x_{k+1}, and it follows
$$\operatorname{span}(x_1,\cdots,x_k,x_{k+1}) = \operatorname{span}(u_1,\cdots,u_k,u_{k+1}).$$
If l ≤ k,
$$(u_{k+1}\cdot u_{l}) = C\left((x_{k+1}\cdot u_{l}) - \sum_{j=1}^{k}(x_{k+1}\cdot u_{j})(u_{j}\cdot u_{l})\right) = C\left((x_{k+1}\cdot u_{l}) - \sum_{j=1}^{k}(x_{k+1}\cdot u_{j})\delta_{lj}\right) = C\left((x_{k+1}\cdot u_{l}) - (x_{k+1}\cdot u_{l})\right) = 0.$$
The vectors {u_j}_{j=1}^{n} generated in this way are therefore an orthonormal basis because each vector has unit length. ■
The process by which these vectors were generated is called the Gram-Schmidt process. Here is a fundamental definition.
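The proof of Lemma 6.4.1 is constructive and can be carried out numerically. Here is a minimal sketch of the Gram-Schmidt process in Python with numpy; the library and the sample basis of ℝ³ below are illustrative choices, not part of the text.

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent),
    following Lemma 6.4.1: u_1 = x_1/|x_1|, and each u_{k+1} is x_{k+1}
    minus its projections onto u_1, ..., u_k, then normalized."""
    n, m = X.shape
    U = np.zeros((n, m), dtype=complex)
    for k in range(m):
        v = X[:, k].astype(complex)
        for j in range(k):
            # subtract the component of x_{k+1} along u_j: (x_{k+1}.u_j) u_j
            v = v - np.vdot(U[:, j], X[:, k]) * U[:, j]
        U[:, k] = v / np.linalg.norm(v)
    return U

# an arbitrary basis of R^3 (columns are x_1, x_2, x_3)
X = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
U = gram_schmidt(X)
```

The columns of U are orthonormal and, as in the lemma, span(u_1,⋅⋅⋅,u_k) = span(x_1,⋅⋅⋅,x_k) for each k.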
Definition 6.4.2 An n × n matrix U is unitary if UU^{∗} = I = U^{∗}U, where U^{∗} is defined to be the transpose of the conjugate of U.
Proposition 6.4.3 An n × n matrix is unitary if and only if the columns (rows) are an orthonormal set.
Proof: This follows right away from the way we multiply matrices. If U is an n×n complex matrix, then
$$(U^{*}U)_{ij} = u_{i}^{*}u_{j} = \overline{(u_{i},u_{j})}$$
and the matrix is unitary if and only if this equals δ_{ij}, if and only if the columns are orthonormal.
Note that if U is unitary, then so is U^{T}. This is because
$$\left(U^{T}\right)^{*}U^{T} \equiv \overline{\left(U^{T}\right)^{T}}\,U^{T} = \overline{U}\,U^{T} = \left(U\,\overline{U}^{T}\right)^{T} = \left(UU^{*}\right)^{T} = I^{T} = I.$$
Thus an n × n matrix is unitary if and only if the rows are an orthonormal set. ■
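Proposition 6.4.3 and the remark about U^{T} are easy to check numerically. A small sketch, assuming numpy is available; the unitary matrix here is the Q factor of a QR factorization of a random complex matrix, an arbitrary way to produce one for illustration.

```python
import numpy as np

# Build a unitary matrix as the Q factor of a random complex matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(A)

cols_orthonormal = np.allclose(U.conj().T @ U, np.eye(4))  # U*U = I
rows_orthonormal = np.allclose(U @ U.conj().T, np.eye(4))  # UU* = I
```

Both checks succeed, and the same holds for U^{T}, as the remark above asserts.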
Theorem 6.4.4 Let A be an n × n matrix. Then there exists a unitary matrix U such that
$$U^{*}AU = T, \qquad (6.11)$$
where T is an upper triangular matrix having the eigenvalues of A on the main diagonal, listed according to multiplicity as roots of the characteristic equation.
Proof: The theorem is clearly true if A is a 1 × 1 matrix. Just let U = 1, the 1 × 1 matrix which has 1 down the main diagonal and zeros elsewhere. Suppose it is true for (n−1) × (n−1) matrices and let A be an n×n matrix. Then let v_1 be a unit eigenvector for A. Then there exists λ_1 such that
$$Av_{1} = \lambda_{1}v_{1}, \qquad |v_{1}| = 1.$$
Extend {v_1} to a basis and then use Lemma 6.4.1 to obtain {v_1,⋅⋅⋅,v_n}, an orthonormal basis in F^{n}. Let U_0 be a matrix whose i^{th} column is v_i. Then from the above, it follows U_0 is unitary.
Then U_0^{∗}AU_0 is of the form
$$B \equiv \begin{pmatrix} \lambda_{1} & * \\ 0 & A_{1} \end{pmatrix}$$
where A_1 is an (n−1) × (n−1) matrix. The above matrix B has the same eigenvalues as A. Also
note in case of an eigenvalue μ for B,
$$\mu \begin{pmatrix} a \\ x \end{pmatrix} = B \begin{pmatrix} a \\ x \end{pmatrix} = \begin{pmatrix} * \\ A_{1}x \end{pmatrix}$$
so if x ≠ 0, x is an eigenvector for A_1 with the same eigenvalue μ. Now by induction there exists an (n−1) × (n−1) unitary matrix Ũ_1 such that
$$\widetilde{U}_{1}^{*}A_{1}\widetilde{U}_{1} = T_{n-1},$$
an upper triangular matrix. Let
$$U_{1} \equiv \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_{1} \end{pmatrix}.$$
Then U_1 is unitary and
$$U_{1}^{*}U_{0}^{*}AU_{0}U_{1} = \begin{pmatrix} \lambda_{1} & * \\ 0 & \widetilde{U}_{1}^{*}A_{1}\widetilde{U}_{1} \end{pmatrix} = \begin{pmatrix} \lambda_{1} & * \\ 0 & T_{n-1} \end{pmatrix} \equiv T,$$
where T is upper triangular. Then let U = U_0U_1. Since (U_0U_1)^{∗} = U_1^{∗}U_0^{∗}, it follows A is
similar to T and that U_{0}U_{1} is unitary. Hence A and T have the same characteristic
polynomials and since the eigenvalues of T are the diagonal entries listed according to
algebraic multiplicity, these are also the eigenvalues of A listed according to multiplicity.
■
Corollary 6.4.5 Let A be a real n×n matrix having only real eigenvalues. Then there exists a real orthogonal matrix Q and an upper triangular matrix T such that
$$Q^{T}AQ = T$$
and furthermore, if the eigenvalues of A are listed in decreasing order, Q can be chosen such that they occur in this order down the main diagonal of T.
Proof: Repeat the above argument but pick a real eigenvector for the first step which
corresponds to λ_{1} as just described. Then use induction as above. Simply replace the word
“unitary” with the word “orthogonal”. ■
As a simple consequence of the above theorem, here is an interesting lemma.
Lemma 6.4.6 Let A be of the form
$$A = \begin{pmatrix} P_{1} & \cdots & * \\ & \ddots & \vdots \\ 0 & & P_{s} \end{pmatrix}$$
where P_k is an m_k × m_k matrix. Then
$$\det(A) = \prod_{k}\det(P_{k}).$$
Also, the eigenvalues of A consist of the union of the eigenvalues of the P_{j}.
Proof: Let U_k be an m_k × m_k unitary matrix such that
$$U_{k}^{*}P_{k}U_{k} = T_{k}$$
where T_k is upper triangular. Then it follows that for U the block diagonal matrix having the U_k down the main diagonal, U is unitary and U^{∗}AU is the upper triangular matrix
$$U^{*}AU = \begin{pmatrix} T_{1} & \cdots & * \\ & \ddots & \vdots \\ 0 & & T_{s} \end{pmatrix},$$
which has the same determinant as A because det(U^{∗})det(U) = det(U^{∗}U) = 1.
Therefore, since the determinant of an upper triangular matrix is the product of the diagonal
entries,
$$\det(A) = \prod_{k}\det(T_{k}) = \prod_{k}\det(P_{k}).$$
From the above formula, the eigenvalues of A consist of the eigenvalues of the upper triangular
matrices T_{k}, and each T_{k} has the same eigenvalues as P_{k}. ■
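Both conclusions of Lemma 6.4.6 can be verified numerically on a small example. A sketch assuming numpy; the particular blocks below are arbitrary illustrations.

```python
import numpy as np

# A block upper triangular matrix with blocks P1 (eigenvalues 2, 3)
# and P2 (eigenvalues i, -i); the upper right block is arbitrary.
P1 = np.array([[2., 1.],
               [0., 3.]])
P2 = np.array([[0., -1.],
               [1.,  0.]])
A = np.block([[P1, np.array([[5., 7.], [1., 2.]])],
              [np.zeros((2, 2)), P2]])

# det(A) = det(P1) det(P2), and eig(A) = eig(P1) union eig(P2).
det_product = np.linalg.det(P1) * np.linalg.det(P2)
eigs_A = np.sort_complex(np.linalg.eigvals(A))
eigs_blocks = np.sort_complex(np.concatenate([np.linalg.eigvals(P1),
                                              np.linalg.eigvals(P2)]))
```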
What if A is a real matrix and you only want to consider real unitary matrices?
Theorem 6.4.7 Let A be a real n × n matrix. Then there exists a real unitary (orthogonal) matrix Q and a matrix T of the form
$$T = \begin{pmatrix} P_{1} & \cdots & * \\ & \ddots & \vdots \\ 0 & & P_{r} \end{pmatrix} \qquad (6.12)$$
where P_i equals either a real 1 × 1 matrix or P_i equals a real 2 × 2 matrix having as its eigenvalues a conjugate pair of eigenvalues of A, such that Q^{T}AQ = T. The matrix T is called the real Schur form of the matrix A. Recall that a real unitary matrix is also called an orthogonal matrix.
Proof: Suppose
$$Av_{1} = \lambda_{1}v_{1}, \qquad |v_{1}| = 1$$
where λ_1 is real. Then let {v_1,⋅⋅⋅,v_n} be an orthonormal basis of vectors in ℝ^{n}. Let Q_0 be a matrix whose i^{th} column is v_i. Then Q_0^{∗}AQ_0 is of the form
$$\begin{pmatrix} \lambda_{1} & * & \cdots & * \\ 0 & & & \\ \vdots & & A_{1} & \\ 0 & & & \end{pmatrix}$$
where A_1 is a real (n−1) × (n−1) matrix. This is just like the proof of Theorem 6.4.4 up to this point.
Now consider the case where λ_1 = α + iβ where β ≠ 0. It follows, since A is real, that v_1 = z_1 + iw_1 and that $\overline{v}_1$ = z_1 − iw_1 is an eigenvector for the eigenvalue α − iβ. Here z_1 and w_1 are real vectors. Since v_1 and $\overline{v}_1$ are eigenvectors corresponding to distinct eigenvalues, they form a linearly independent set. From this it follows that {z_1, w_1} is an independent set of vectors in ℂ^{n}, hence in ℝ^{n}. Indeed, {v_1, $\overline{v}_1$} is an independent set and also span(v_1, $\overline{v}_1$) = span(z_1, w_1). Now using the Gram-Schmidt theorem in ℝ^{n}, there exists {u_1, u_2}, an orthonormal set of real vectors such that span(u_1, u_2) = span(z_1, w_1). Extend {u_1, u_2} to {u_1,⋅⋅⋅,u_n}, an orthonormal basis in ℝ^{n}, and let Q_0 be a unitary matrix whose i^{th} column is u_i, so Q_0 is a real orthogonal matrix. Then Au_j ∈ span(u_1, u_2) for j = 1, 2, and so u_k^{T}Au_j = 0 whenever k ≥ 3. It follows that Q_0^{∗}AQ_0 is of the form
$$\begin{pmatrix} P_{1} & * \\ 0 & A_{1} \end{pmatrix}$$
where A_1 is now an (n−2) × (n−2) matrix and P_1 is a 2 × 2 matrix. Now this is similar to A and so
two of its eigenvalues are α + iβ and α − iβ.
Now find $\hat{Q}_1$, an (n−2) × (n−2) matrix, to put A_1 in an appropriate form as above and come up with A_2 either an (n−4) × (n−4) matrix or an (n−3) × (n−3) matrix. Then the only other difference is to let
$$Q_{1} \equiv \begin{pmatrix} I_{2} & 0 \\ 0 & \hat{Q}_{1} \end{pmatrix},$$
thus putting a 2 × 2 identity matrix in the upper left corner rather than a 1. Repeating this
process with the above modification for the case of a complex eigenvalue leads eventually to 6.12
where Q is the product of real unitary matrices Q_{i} above. When the block P_{i} is 2 × 2, its
eigenvalues are a conjugate pair of eigenvalues of A and if it is 1 × 1 it is a real eigenvalue of
A.
It remains to consider the claim about the eigenvalues. Note that
$$\lambda I - T = \begin{pmatrix} \lambda I_{1} - P_{1} & \cdots & * \\ & \ddots & \vdots \\ 0 & & \lambda I_{r} - P_{r} \end{pmatrix}$$
where I_k is the 2 × 2 identity matrix in the case that P_k is 2 × 2 and is the number 1 in the case where P_k is a 1 × 1 matrix. Now by Lemma 6.4.6,
$$\det(\lambda I - T) = \prod_{k=1}^{r}\det(\lambda I_{k} - P_{k}).$$
Therefore, λ is an eigenvalue of T if and only if it is an eigenvalue of some P_{k}. This proves the
theorem since the eigenvalues of T are the same as those of A including multiplicity
because they have the same characteristic polynomial due to the similarity of A and T.■
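The real Schur form of Theorem 6.4.7 is also available numerically. A sketch assuming scipy is available (the library and the sample matrix are illustrative): `scipy.linalg.schur` with `output='real'` produces the quasi-triangular T of 6.12, with 2 × 2 blocks for conjugate pairs.

```python
import numpy as np
from scipy.linalg import schur

# A real matrix with one conjugate pair of eigenvalues and one real
# eigenvalue, 5 (an arbitrary example).
A = np.array([[0., -2., 1.],
              [1.,  0., 3.],
              [0.,  0., 5.]])

# Real Schur form: A = Q T Q^T with Q real orthogonal and T block upper
# triangular (1x1 blocks for real eigenvalues, 2x2 for conjugate pairs).
T, Q = schur(A, output='real')
```

T is quasi-triangular: everything below the first subdiagonal vanishes, and the 2 × 2 diagonal block carries the conjugate pair.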
Of course there is a similar conclusion which says that the blocks can be ordered according to the size of the eigenvalues.
Corollary 6.4.8 Let A be a real n×n matrix. Then there exists a real orthogonal matrix Q and an upper triangular matrix T such that
$$Q^{T}AQ = T = \begin{pmatrix} P_{1} & \cdots & * \\ & \ddots & \vdots \\ 0 & & P_{r} \end{pmatrix}$$
where P_i equals either a real 1 × 1 matrix or P_i equals a real 2 × 2 matrix having as its eigenvalues a conjugate pair of eigenvalues of A. If P_k corresponds to the two eigenvalues α_k ± iβ_k ≡ σ(P_k), Q can be chosen such that
$$|\sigma(P_{1})| \geq |\sigma(P_{2})| \geq \cdots$$
where
$$|\sigma(P_{k})| \equiv \sqrt{\alpha_{k}^{2} + \beta_{k}^{2}}.$$
The blocks P_k can be arranged in any other order as well.
Definition 6.4.9 When a linear transformation A, mapping a linear space V to V, has a basis of eigenvectors, the linear transformation is called non defective. Otherwise it is called defective. An n×n matrix A is called normal if AA^{∗} = A^{∗}A. An important class of normal matrices is that of the Hermitian or self adjoint matrices. An n×n matrix A is self adjoint or Hermitian if A = A^{∗}.
You can check that an example of a normal matrix which is neither symmetric nor Hermitian is
$$\begin{pmatrix} 6i & -(1+i)\sqrt{2} \\ (1-i)\sqrt{2} & 6i \end{pmatrix}.$$
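The suggested check can be done by hand or, as sketched here, numerically (numpy is an illustrative choice; the matrix is the one from the text).

```python
import numpy as np

# The example matrix from the text.
s = np.sqrt(2)
A = np.array([[6j, -(1 + 1j) * s],
              [(1 - 1j) * s, 6j]])

is_normal = np.allclose(A @ A.conj().T, A.conj().T @ A)  # AA* = A*A
is_symmetric = np.allclose(A, A.T)
is_hermitian = np.allclose(A, A.conj().T)
```

The first check succeeds while the other two fail, confirming the claim.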
The next lemma is the basis for concluding that every normal matrix is unitarily similar to a
diagonal matrix.
Lemma 6.4.10If T is upper triangular and normal, then T is a diagonal matrix.
Proof: This is obviously true if T is 1 × 1. In fact, it can’t help being diagonal in this case.
Suppose then that the lemma is true for (n−1) × (n−1) matrices and let T be an upper triangular normal n × n matrix. Thus T is of the form
$$T = \begin{pmatrix} t_{11} & a^{*} \\ 0 & T_{1} \end{pmatrix}.$$
Then
$$T^{*}T = \begin{pmatrix} \overline{t_{11}} & 0 \\ a & T_{1}^{*} \end{pmatrix} \begin{pmatrix} t_{11} & a^{*} \\ 0 & T_{1} \end{pmatrix} = \begin{pmatrix} |t_{11}|^{2} & \overline{t_{11}}a^{*} \\ t_{11}a & aa^{*} + T_{1}^{*}T_{1} \end{pmatrix}$$
$$TT^{*} = \begin{pmatrix} t_{11} & a^{*} \\ 0 & T_{1} \end{pmatrix} \begin{pmatrix} \overline{t_{11}} & 0 \\ a & T_{1}^{*} \end{pmatrix} = \begin{pmatrix} |t_{11}|^{2} + a^{*}a & a^{*}T_{1}^{*} \\ T_{1}a & T_{1}T_{1}^{*} \end{pmatrix}$$
Since these two matrices are equal, comparing the top left entries shows a^{∗}a = 0 and it follows a = 0. But now it follows that T_1^{∗}T_1 = T_1T_1^{∗} and
so by induction T_{1} is a diagonal matrix D_{1}. Therefore,
$$T = \begin{pmatrix} t_{11} & 0^{T} \\ 0 & D_{1} \end{pmatrix},$$
a diagonal matrix.
Now here is a proof which doesn’t involve block multiplication. Since T is normal,
T^{∗}T = TT^{∗}. Writing this in terms of components and using the description of the adjoint as the transpose of the conjugate yields the following for the ik^{th} entry of T^{∗}T = TT^{∗}:
$$\sum_{j}\overline{t_{ji}}\,t_{jk} = \sum_{j}t_{ij}\,\overline{t_{kj}}.$$
Now use the fact that T is upper triangular. Letting i = k = 1, the left side reduces to $|t_{11}|^{2}$ while the right side equals $\sum_{j}|t_{1j}|^{2}$, and so t_{1j} = 0 for all j > 1. Next let i = k = 2 and, using what was just shown, conclude in the same way that t_{2j} = 0 for all j > 2. Next let i = k = 3 and obtain that T looks like a diagonal matrix in so far as the first 3 rows and columns are concerned. Continuing in this way, it follows T is a diagonal matrix.
■
Theorem 6.4.11 Let A be a normal matrix. Then there exists a unitary matrix U such that U^{∗}AU is a diagonal matrix. Also if A is normal and U is unitary, then U^{∗}AU is also normal.
Proof:From Theorem 6.4.4 there exists a unitary matrix U such that U^{∗}AU equals an upper
triangular matrix. The theorem is now proved if it is shown that the property of being normal is
preserved under unitary similarity transformations. That is, verify that if A is normal and if
B = U^{∗}AU, then B is also normal. But this is easy.
$$B^{*}B = U^{*}A^{*}UU^{*}AU = U^{*}A^{*}AU = U^{*}AA^{*}U = U^{*}AUU^{*}A^{*}U = BB^{*}.$$
Therefore, U^{∗}AU is a normal and upper triangular matrix and by Lemma 6.4.10 it must be a
diagonal matrix. ■
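Theorem 6.4.11 can be observed numerically: for a normal matrix, the triangular factor of the Schur decomposition is in fact diagonal. A sketch assuming scipy is available, using the normal example matrix from above.

```python
import numpy as np
from scipy.linalg import schur

# The normal (non-Hermitian) example matrix from the text.
s = np.sqrt(2)
A = np.array([[6j, -(1 + 1j) * s],
              [(1 - 1j) * s, 6j]])

# Complex Schur form; for a normal matrix, T comes out diagonal
# (up to roundoff), realizing U*AU = D.
T, U = schur(A, output='complex')
```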
Corollary 6.4.12 If A is Hermitian, then all the eigenvalues of A are real and there exists an orthonormal basis of eigenvectors. Also there exists a unitary U such that U^{∗}AU = D, a diagonal matrix whose diagonal is comprised of the eigenvalues of A. The columns of U are the corresponding eigenvectors. By permuting the columns of U one can cause the diagonal entries of D to occur in any desired order.
Proof: Since A is normal, there exists unitary U such that U^{∗}AU = D, a diagonal matrix whose diagonal entries are the eigenvalues of A. Therefore,
$$D^{*} = U^{*}A^{*}U = U^{*}AU = D,$$
showing D is real.
Finally, let
$$U = \begin{pmatrix} u_{1} & u_{2} & \cdots & u_{n} \end{pmatrix}$$
where the u_{i} denote the columns of U and
$$D = \begin{pmatrix} \lambda_{1} & & 0 \\ & \ddots & \\ 0 & & \lambda_{n} \end{pmatrix}.$$
The equation U^{∗}AU = D implies
$$AU = \begin{pmatrix} Au_{1} & Au_{2} & \cdots & Au_{n} \end{pmatrix} = UD = \begin{pmatrix} \lambda_{1}u_{1} & \lambda_{2}u_{2} & \cdots & \lambda_{n}u_{n} \end{pmatrix} \qquad (6.13)$$
where the entries denote the columns of AU and UD respectively. Therefore, Au_{i} = λ_{i}u_{i} and since
the matrix is unitary, the ij^{th} entry of U^{∗}U equals δ_{ij} and so
$$\delta_{ij} = u_{i}^{*}u_{j} \equiv u_{j}\cdot u_{i}.$$
This proves the corollary because it shows the vectors {u_i} are orthonormal. Therefore, they form
a basis because every orthonormal set of vectors is linearly independent. It follows
from 6.13 that one can achieve any order for the λ_{i} by permuting the columns of U.
■
Corollary 6.4.13 If A is a real symmetric matrix, then A is Hermitian and there exists a real unitary (orthogonal) matrix U such that U^{T}AU = D where D is a diagonal matrix whose diagonal entries are the eigenvalues of A. By arranging the columns of U, the diagonal entries of D can be made to appear in any order.
Proof: It is clear that A = A^{∗} = A^{T}. Thus A is Hermitian and all its eigenvalues are real. Now by Corollary 6.4.5, there is an orthogonal matrix U such that U^{T}AU = T. Since A is normal, so is T by Theorem 6.4.11. Hence by Lemma 6.4.10, T is a diagonal matrix. Then it
follows the diagonal entries are the eigenvalues of A and the columns of U are the corresponding
eigenvectors. Permuting these columns, one can cause the eigenvalues to appear in any order on
the diagonal. ■
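Corollary 6.4.13 is what numerical routines for symmetric eigenproblems compute. A minimal sketch with numpy's `eigh` (the library and the tridiagonal example are illustrative assumptions, not from the text).

```python
import numpy as np

# A real symmetric matrix; its eigenvalues are 2 - sqrt(2), 2, 2 + sqrt(2).
A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])

# eigh returns the eigenvalues in increasing order together with an
# orthogonal matrix U of eigenvectors, so U^T A U = D.
w, U = np.linalg.eigh(A)
```

Here the diagonal entries of D appear in increasing order; by Corollary 6.4.13 the columns of U could be permuted to produce any other order.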
The converse for the above theorems about normal and Hermitian matrices is also
true. That is, the Hermitian matrices, (A = A^{∗}) are exactly those for which there is a
unitary U such that U^{∗}AU is a real diagonal matrix. The normal matrices are exactly
those for which there is a unitary U such that U^{∗}AU is a diagonal matrix, maybe not
real.
To summarize these types of matrices which have just been discussed, here is a little
diagram.