In the case where A has distinct positive eigenvalues, it was shown above that, under reasonable conditions related to a certain matrix having an LU factorization, the QR algorithm produces a sequence of matrices {A_{k}} which converges to an upper triangular matrix. What if A is just an n × n matrix having possibly complex eigenvalues but A is nondefective? What happens with the QR algorithm in this case? The short answer to this question is that the A_{k} of the algorithm typically cannot converge. However, this does not mean the algorithm is not useful in finding eigenvalues. It turns out the sequence of matrices {A_{k}} has the appearance of a block upper triangular matrix for large k in the sense that the entries below the blocks on the main diagonal are small. Then looking at these blocks gives a way to approximate the eigenvalues. An important example of the concept of a block triangular matrix is the real Schur form for a matrix discussed in Theorem 6.4.7, but the concept as described here allows for blocks of any size centered on the diagonal.
First it is important to note a simple fact about unitary diagonal matrices. In what follows Λ
will denote a unitary matrix which is also a diagonal matrix. These matrices are just the identity
matrix with some of the ones replaced with a number of the form e^{iθ} for some θ. The important
property of multiplication of any matrix by Λ on either side is that it leaves all the zero entries the
same and also preserves the absolute values of the other entries. Thus a block triangular
matrix multiplied by Λ on either side is still block triangular. If the matrix is close to
being block triangular this property of being close to a block triangular matrix is also
preserved by multiplying on either side by Λ. Other patterns depending only on the size of
the absolute value occurring in the matrix are also preserved by multiplying on either
side by Λ. In other words, in looking for a pattern in a matrix, multiplication by Λ is
irrelevant.
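This invariance is easy to check numerically. Below is a small Python sketch; the matrix and the angles are made-up illustrative values, not taken from the text.

```python
import cmath

# Lambda* A Lambda with Lambda = diag(e^{i theta_1}, ...): every entry keeps
# its absolute value, so zero/small-entry patterns are unchanged.
# A and the angles are made-up illustrative values.
A = [[3.0, 2.0, 1e-3],
     [-2.0, 0.0, -1.0],
     [1e-9, -2.0, 0.5]]
Lam = [cmath.exp(1j * t) for t in (0.3, 1.1, -2.0)]  # diagonal of Lambda

# (Lambda* A Lambda)_{ij} = conj(Lam[i]) * A[i][j] * Lam[j]
B = [[Lam[i].conjugate() * A[i][j] * Lam[j] for j in range(3)] for i in range(3)]

assert all(abs(abs(B[i][j]) - abs(A[i][j])) < 1e-12
           for i in range(3) for j in range(3))
```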
Now let A be an n × n matrix having real or complex entries. By Lemma 14.2.3 and the assumption that A is nondefective, there exists an invertible S,

A^{k} = Q^{(k)}R^{(k)} = SD^{k}S^{−1}  (14.14)

where

D = \begin{pmatrix} λ_{1} & & 0 \\ & ⋱ & \\ 0 & & λ_{n} \end{pmatrix}

and by rearranging the columns of S, D can be made such that

|λ_{1}| ≥ |λ_{2}| ≥ ⋅⋅⋅ ≥ |λ_{n}|.
Assume S^{−1} has an LU factorization. Then
A^{k} = SD^{k}LU = SD^{k}LD^{−k}D^{k}U.
Consider the matrix in the middle, D^{k}LD^{−k}. The ij^{th} entry is of the form

(D^{k}LD^{−k})_{ij} = \begin{cases} λ_{i}^{k}L_{ij}λ_{j}^{−k} & \text{if } j < i \\ 1 & \text{if } i = j \\ 0 & \text{if } j > i \end{cases}

and these all converge to 0 whenever |λ_{i}| < |λ_{j}|. Thus

D^{k}LD^{−k} = L_{k} + E_{k}
where L_{k} is a lower triangular matrix which has all ones down the diagonal and some subdiagonal terms of the form

λ_{i}^{k}L_{ij}λ_{j}^{−k}  (14.15)

for which |λ_{i}| = |λ_{j}|, while E_{k} → 0. (Note the entries of L_{k} are all bounded independent of k but some may fail to converge.) Then
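The decay of these entries can be checked numerically. The sketch below uses made-up eigenvalues and subdiagonal entries: an entry with |λ_{i}| < |λ_{j}| decays geometrically, while an entry with |λ_{i}| = |λ_{j}| is merely bounded.

```python
# middle factor: (D^k L D^{-k})_{ij} = lam_i^k L_ij lam_j^{-k} for j < i;
# made-up values with |lam_1| > |lam_2| = |lam_3|
lam = [2.0, 1.0, 1.0]
L21, L31, L32 = 0.7, -0.4, 0.9   # subdiagonal entries of a unit lower triangular L

def middle_entry(i, j, k, Lij):
    return lam[i] ** k * Lij * lam[j] ** (-k)

# entries with |lam_i| < |lam_j| tend to 0 ...
assert abs(middle_entry(1, 0, 50, L21)) < 1e-12
assert abs(middle_entry(2, 0, 50, L31)) < 1e-12
# ... while an entry with |lam_i| = |lam_j| keeps a constant absolute value
for k in (1, 10, 50):
    assert abs(abs(middle_entry(2, 1, k, L32)) - abs(L32)) < 1e-15
```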
Q^{(k)}R^{(k)} = S(L_{k} + E_{k})D^{k}U
Let

SL_{k} = Q_{k}R_{k}  (14.16)

where this is the QR factorization of SL_{k}. Then
Q^{(k)}R^{(k)} = (Q_{k}R_{k} + SE_{k})D^{k}U
= Q_{k}(I + Q_{k}^{∗}SE_{k}R_{k}^{−1})R_{k}D^{k}U
= Q_{k}(I + F_{k})R_{k}D^{k}U

where F_{k} → 0. Let I + F_{k} = Q_{k}^{′}R_{k}^{′}. Then Q^{(k)}R^{(k)} = Q_{k}Q_{k}^{′}R_{k}^{′}R_{k}D^{k}U. By Lemma 14.2.4,

Q_{k}^{′} → I and R_{k}^{′} → I.  (14.17)
Now let Λ_{k} be a diagonal unitary matrix which has the property that Λ_{k}^{∗}D^{k}U is an upper
triangular matrix which has all the diagonal entries positive. Then
Q^{(k)}R^{(k)} = Q_{k}Q_{k}^{′}Λ_{k}(Λ_{k}^{∗}R_{k}^{′}R_{k}Λ_{k})Λ_{k}^{∗}D^{k}U
That matrix in the middle has all positive diagonal entries because it is itself an upper triangular
matrix, being the product of such, and is similar to the matrix R_{k}^{′}R_{k} which is upper triangular
with positive diagonal entries. By Lemma 14.2.4 again, this time using the uniqueness
assertion,
Q^{(k)} = Q_{k}Q_{k}^{′}Λ_{k}, R^{(k)} = (Λ_{k}^{∗}R_{k}^{′}R_{k}Λ_{k})Λ_{k}^{∗}D^{k}U
Note the term Q_{k}Q_{k}^{′}Λ_{k} must be real because the algorithm gives all Q^{(k)} as real matrices. By 14.17 it follows that for k large enough, Q^{(k)} ≈ Q_{k}Λ_{k} where ≈ means the two matrices are close. Recall A_{k} = Q^{(k)T}AQ^{(k)} and so for large k,
A_{k} ≈ (Q_{k}Λ_{k})^{∗}A(Q_{k}Λ_{k}) = Λ_{k}^{∗}Q_{k}^{∗}AQ_{k}Λ_{k}
As noted above, the form of Λ_{k}^{∗}Q_{k}^{∗}AQ_{k}Λ_{k} in terms of which entries are large and small is not
affected by the presence of Λ_{k} and Λ_{k}^{∗}. Thus, in considering what form this is in, it suffices to
consider Q_{k}^{∗}AQ_{k}.
This could get pretty complicated but I will consider the case where

if |λ_{i}| = |λ_{i+1}|, then |λ_{i+2}| < |λ_{i+1}|.  (14.18)
This is typical of the situation where the eigenvalues are all distinct and the matrix A is real so
the eigenvalues occur as conjugate pairs. Then in this case, L_{k} above is lower triangular with some
nonzero terms on the diagonal right below the main diagonal but zeros everywhere else. Thus, from 14.16,

Q_{k} = SL_{k}R_{k}^{−1}

where R_{k}^{−1} is upper triangular. Also recall from the definition of S in 14.14, it follows that
S^{−1}AS = D. Thus the columns of S are eigenvectors of A, the i^{th} being an eigenvector for λ_{i}. Now from the form of L_{k}, it follows that L_{k}R_{k}^{−1} is a block upper triangular matrix, denoted by T_{B}, and so Q_{k} = ST_{B}. It follows from the above construction in 14.15 and the given assumption on the sizes of the eigenvalues that there are finitely many 2 × 2 blocks centered on the main diagonal along with possibly some diagonal entries. Therefore,
for large k the matrix A_{k} = Q^{(k)T}AQ^{(k)} is approximately of the same form as that of

Q_{k}^{∗}AQ_{k} = T_{B}^{−1}S^{−1}AST_{B} = T_{B}^{−1}DT_{B}
which is a block upper triangular matrix. As explained above, multiplication by the various
diagonal unitary matrices does not affect this form. Therefore, for large k, A_{k} is approximately a
block upper triangular matrix.
How would this change if the above assumption on the size of the eigenvalues were relaxed but
the matrix was still nondefective with appropriate matrices having an LU factorization as above?
It would mean the blocks on the diagonal would be larger. This immediately makes the problem
more cumbersome to deal with. However, in the case that the eigenvalues of A are distinct, the
above situation really is typical of what occurs and in any case can be quickly reduced to this
case.
To see this, suppose condition 14.18 is violated and λ_{j}, ⋅⋅⋅, λ_{j+p} are complex eigenvalues having nonzero imaginary parts such that each has the same absolute value but they are all distinct. Then let μ > 0 and consider the matrix A + μI. The corresponding eigenvalues of A + μI are λ_{j} + μ, ⋅⋅⋅, λ_{j+p} + μ. A short computation (|λ + μ|^{2} = |λ|^{2} + 2μ Re λ + μ^{2}) shows |λ_{j} + μ|, ⋅⋅⋅, |λ_{j+p} + μ| are distinct except when two of the eigenvalues form a conjugate pair, which is exactly what 14.18 allows, and so the situation of 14.18 is obtained. Of course, if there are repeated eigenvalues, it may not be possible to reduce to the case above and you would end up with large blocks on the main diagonal which could be difficult to deal with.
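Here is a small numerical illustration of this shift trick; the three equal-modulus eigenvalues with distinct real parts are made-up values.

```python
import cmath

# three made-up eigenvalues of equal modulus 2 with distinct real parts
eigs = [2 * cmath.exp(0.4j), 2 * cmath.exp(1.0j), 2 * cmath.exp(2.2j)]
mu = 1.0   # the shift

before = [abs(z) for z in eigs]
after = [abs(z + mu) for z in eigs]

assert max(before) - min(before) < 1e-12   # equal moduli before the shift
# pairwise distinct moduli after the shift
assert min(abs(a - b) for i, a in enumerate(after) for b in after[i + 1:]) > 0.1
```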
So how do you identify the eigenvalues? You know A_{k} and observe that it is close to a block upper triangular matrix T_{B}^{′}. You know A_{k} is also similar to A. Therefore, T_{B}^{′} has eigenvalues
which are close to the eigenvalues of A_{k} and hence those of A provided k is sufficiently large. See
Theorem 6.9.2 which depends on complex analysis or the exercise on Page 532 which gives another
way to see this. Thus you find the eigenvalues of this block triangular matrix T_{B}^{′} and assert that
these are good approximations of the eigenvalues of A_{k} and hence to those of A. How do
you find the eigenvalues of a block triangular matrix? This is easy from Lemma 6.4.6.
Say

T_{B}^{′} = \begin{pmatrix} B_{1} & ⋅⋅⋅ & ∗ \\ & ⋱ & ⋮ \\ 0 & & B_{m} \end{pmatrix}

Then forming λI − T_{B}^{′} and taking the determinant, it follows from Lemma 6.4.6 this equals

∏_{j=1}^{m} det(λI_{j} − B_{j})
and so all you have to do is take the union of the eigenvalues for each B_{j}. In the case emphasized
here this is very easy because these blocks are just 2 × 2 matrices.
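For the 2 × 2 case this is just the quadratic formula. Here is a short sketch; the block triangular matrix is a made-up example.

```python
import cmath

def eig2(a, b, c, d):
    # eigenvalues of [[a, b], [c, d]] by the quadratic formula
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]

# a made-up block upper triangular T with diagonal blocks [[1, -1], [1, 1]] and [3]
T = [[1.0, -1.0, 5.0],
     [1.0,  1.0, 7.0],
     [0.0,  0.0, 3.0]]
eigs = eig2(T[0][0], T[0][1], T[1][0], T[1][1]) + [T[2][2]]

def det3(M):
    # 3 x 3 determinant, expansion along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# each value makes det(lambda I - T) = 0, so it really is an eigenvalue of T
for lam in eigs:
    M = [[(lam if i == j else 0) - T[i][j] for j in range(3)] for i in range(3)]
    assert abs(det3(M)) < 1e-9
```

The union of the block eigenvalues 1 ± i and 3 is exactly the spectrum of T.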
How do you identify approximate eigenvectors from this? First try to find the approximate eigenvectors for A_{k}. Pick an approximate eigenvalue λ, an exact eigenvalue for T_{B}^{′}. Then find v solving T_{B}^{′}v = λv. It follows, since T_{B}^{′} is close to A_{k}, that A_{k}v ≈ λv and so

Q^{(k)T}AQ^{(k)}v = A_{k}v ≈ λv

Hence

AQ^{(k)}v ≈ λQ^{(k)}v

and so Q^{(k)}v is an approximation to the eigenvector which goes with the eigenvalue of A which is close to λ.
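The algebra behind this recovery can be checked on a toy example: if A_{k} = Q^{T}AQ exactly and A_{k}v = λv, then Qv is an eigenvector of A. The rotation angle and the matrix below are made-up values.

```python
import math

# Q is an orthogonal (rotation) matrix, A_k a matrix with a known eigenpair
t = 0.7
Q = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]
Ak = [[2.0, 1.0],
      [0.0, 3.0]]
v = [1.0, 1.0]              # A_k v = 3 v

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]

QT = [[Q[j][i] for j in range(2)] for i in range(2)]
A = matmul(matmul(Q, Ak), QT)   # defined so that A_k = Q^T A Q

w = matvec(Q, v)                # candidate eigenvector of A
Aw = matvec(A, w)
assert all(abs(Aw[i] - 3.0 * w[i]) < 1e-12 for i in range(2))
```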
Example 14.2.8 Here is a matrix.

\begin{pmatrix} 3 & 2 & 1 \\ −2 & 0 & −1 \\ −2 & −2 & 0 \end{pmatrix}

It happens that the eigenvalues of this matrix are 1, 1 + i, 1 − i. Let's apply the QR algorithm as if the eigenvalues were not known.
Applying the QR algorithm to this matrix yields a sequence of matrices A_{k}. After enough iterations, the bottom two entries on the left part of the bottom row are both very small, so it appears the real eigenvalue is near 1.0. The complex eigenvalues are obtained from solving the characteristic equation of the upper left 2 × 2 block.
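Since the iterates themselves are not reproduced here, the computation can be rerun. The following pure-Python sketch (a from-scratch modified Gram-Schmidt QR factorization, not the text's implementation) carries out the iteration on the matrix of this example and checks the claims about the limiting block form.

```python
def qr(A):
    # modified Gram-Schmidt QR factorization with positive diagonal in R
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    V = [[A[i][j] for i in range(n)] for j in range(n)]  # V[j] = column j of A
    for j in range(n):
        for i in range(j):
            R[i][j] = sum(Q[k][i] * V[j][k] for k in range(n))
            for k in range(n):
                V[j][k] -= R[i][j] * Q[k][i]
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        for k in range(n):
            Q[k][j] = V[j][k] / R[j][j]
    return Q, R

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[3.0, 2.0, 1.0],
     [-2.0, 0.0, -1.0],
     [-2.0, -2.0, 0.0]]
for _ in range(120):          # unshifted QR steps: factor A = QR, replace A by RQ
    Q, R = qr(A)
    A = matmul(R, Q)

# the bottom-left entries become very small, exposing the real eigenvalue near 1
assert abs(A[2][0]) < 1e-8 and abs(A[2][1]) < 1e-8
assert abs(A[2][2] - 1.0) < 1e-6
# the top 2 x 2 block carries the complex pair: trace ~ 2 and determinant ~ 2,
# so its eigenvalues are ~ 1 + i and 1 - i
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert abs(tr - 2.0) < 1e-6 and abs(det - 2.0) < 1e-6
```

Here the entries below the block decay roughly like (1/√2)^{k}, since |1 ± i| = √2 while |1| = 1.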
Example 14.2.9 The equation x^{4} + x^{3} + 4x^{2} + x − 2 = 0 has exactly two real solutions. You can see this by graphing it. However, the rational root theorem from algebra shows neither of these solutions is rational. Also, graphing it does not yield any information about the complex solutions. Let's use the QR algorithm to approximate all the solutions, real and complex.
A matrix whose characteristic polynomial is the given polynomial is a companion matrix of the polynomial. Applying the QR algorithm to this matrix yields approximations to all four solutions. When these are plugged into the polynomial equation, you see that each is close to being a solution of the equation.
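Here is a sketch of the whole computation, assuming the standard companion matrix whose subdiagonal is all ones and whose last column holds the negated coefficients (the text's matrix may use a different but equivalent convention); the QR factorization is again a from-scratch modified Gram-Schmidt.

```python
import cmath

def qr(A):
    # modified Gram-Schmidt QR factorization with positive diagonal in R
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    V = [[A[i][j] for i in range(n)] for j in range(n)]
    for j in range(n):
        for i in range(j):
            R[i][j] = sum(Q[k][i] * V[j][k] for k in range(n))
            for k in range(n):
                V[j][k] -= R[i][j] * Q[k][i]
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        for k in range(n):
            Q[k][j] = V[j][k] / R[j][j]
    return Q, R

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

p = lambda x: x**4 + x**3 + 4*x**2 + x - 2

# companion matrix: subdiagonal ones, last column = negated coefficients
A = [[0.0, 0.0, 0.0, 2.0],
     [1.0, 0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0, -4.0],
     [0.0, 0.0, 1.0, -1.0]]
for _ in range(400):
    Q, R = qr(A)
    A = matmul(R, Q)

# read eigenvalue estimates off the (nearly) block triangular limit
eigs, i = [], 0
while i < 4:
    if i < 3 and abs(A[i + 1][i]) > 1e-6:   # a 2 x 2 block: quadratic formula
        tr = A[i][i] + A[i + 1][i + 1]
        det = A[i][i] * A[i + 1][i + 1] - A[i][i + 1] * A[i + 1][i]
        disc = cmath.sqrt(tr * tr - 4 * det)
        eigs += [(tr + disc) / 2, (tr - disc) / 2]
        i += 2
    else:                                   # a 1 x 1 block: diagonal entry
        eigs.append(A[i][i])
        i += 1

# each estimate nearly solves the original equation
assert len(eigs) == 4
assert all(abs(p(z)) < 1e-4 for z in eigs)
```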
It seems like most of the attention to the QR algorithm has to do with finding ways to get it to “converge” faster. Great and marvelous are the clever tricks which have been proposed to do this, but my intent is to present the basic ideas, not to go into the numerous refinements of this algorithm. However, there is one thing which is usually done. It involves reducing to the case of an upper Hessenberg matrix, one which is zero below the main subdiagonal. Every matrix is unitarily similar to one of these.
Let A be an invertible n × n matrix. Let Q_{1}^{′} be a unitary matrix which maps (a_{21}, ⋅⋅⋅, a_{n1})^{T}, the bottom n − 1 entries of the first column of A, to (a, 0, ⋅⋅⋅, 0)^{T}, where |a| equals the norm of this vector. Then let

Q_{1} = \begin{pmatrix} 1 & 0 \\ 0 & Q_{1}^{′} \end{pmatrix}
It follows

Q_{1}AQ_{1}^{∗} = \begin{pmatrix} 1 & 0 \\ 0 & Q_{1}^{′} \end{pmatrix} A Q_{1}^{∗} = \begin{pmatrix} a_{11} & a_{12} & ⋅⋅⋅ & a_{1n} \\ a & & & \\ ⋮ & & A_{1}^{′} & \\ 0 & & & \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Q_{1}^{′∗} \end{pmatrix} = \begin{pmatrix} ∗ & ∗ & ⋅⋅⋅ & ∗ \\ a & & & \\ ⋮ & & A_{1} & \\ 0 & & & \end{pmatrix}
Now let Q_{2}^{′} be the (n − 2) × (n − 2) matrix which does to the first column of A_{1} the same sort of thing that the (n − 1) × (n − 1) matrix Q_{1}^{′} did to the first column of A. Let

Q_{2} ≡ \begin{pmatrix} I & 0 \\ 0 & Q_{2}^{′} \end{pmatrix}

where I is the 2 × 2 identity. Then applying block multiplication,
the matrix Q_{2}Q_{1}AQ_{1}^{∗}Q_{2}^{∗} has zeros below the subdiagonal in its first two columns, where A_{2}, the block playing the role A_{1} played before, is now an (n − 2) × (n − 2) matrix. Continuing this way, you eventually get a unitary matrix Q which is a product of those discussed above such that QAQ^{∗} equals zero below the subdiagonal. It is called an upper Hessenberg matrix.
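A Householder reflector is the usual concrete choice for each Q_{i}^{′}. The sketch below is a from-scratch implementation (the 4 × 4 matrix is a made-up example); it carries out the reduction and checks the claimed zero pattern together with a similarity invariant.

```python
def hessenberg(A):
    # reduce A to upper Hessenberg form by Householder similarity transforms
    n = len(A)
    H = [row[:] for row in A]
    for c in range(n - 2):
        # reflector sending the part of column c below the subdiagonal to a*e1
        x = [H[i][c] for i in range(c + 1, n)]
        alpha = -(1.0 if x[0] >= 0 else -1.0) * sum(t * t for t in x) ** 0.5
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        if vnorm2 < 1e-30:
            continue          # column already has the desired zeros
        m = n - 1 - c
        # left multiply by P = I - 2 v v^T / (v^T v) on rows c+1 .. n-1
        for j in range(n):
            s = 2 * sum(v[i] * H[c + 1 + i][j] for i in range(m)) / vnorm2
            for i in range(m):
                H[c + 1 + i][j] -= s * v[i]
        # right multiply by P on columns c+1 .. n-1 (P = P^T = P^{-1})
        for i in range(n):
            s = 2 * sum(H[i][c + 1 + j] * v[j] for j in range(m)) / vnorm2
            for j in range(m):
                H[i][c + 1 + j] -= s * v[j]
    return H

A = [[4.0, 1.0, -2.0, 2.0],
     [1.0, 2.0, 0.0, 1.0],
     [-2.0, 0.0, 3.0, -2.0],
     [2.0, 1.0, -2.0, -1.0]]
H = hessenberg(A)

# zero below the subdiagonal, and the trace (sum of eigenvalues) is unchanged
assert all(abs(H[i][j]) < 1e-12 for i in range(4) for j in range(4) if i - j >= 2)
assert abs(sum(H[i][i] for i in range(4)) - sum(A[i][i] for i in range(4))) < 1e-12
```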
It happens that in the QR algorithm, if A_{k} is upper Hessenberg, so is A_{k+1}. To see this, note that A_{k} being upper Hessenberg means that (A_{k})_{ij} = 0 whenever i − j ≥ 2. Now

A_{k+1} = R_{k}Q_{k}

where A_{k} = Q_{k}R_{k}. Therefore, as shown before,

A_{k+1} = R_{k}A_{k}R_{k}^{−1}

Let the ij^{th} entry of A_{k} be a_{ij}^{k}. Then if i − j ≥ 2,

a_{ij}^{k+1} = ∑_{p=i}^{n} ∑_{q=1}^{j} r_{ip}a_{pq}^{k}r_{qj}^{−1}

where the limits of summation come from R_{k} and R_{k}^{−1} being upper triangular, so that r_{ip} = 0 for p < i and r_{qj}^{−1} = 0 for q > j. It is given that a_{pq}^{k} = 0 whenever p − q ≥ 2. However, from the above sum, p ≥ i and q ≤ j, so

p − q ≥ i − j ≥ 2

and so the sum equals 0.
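This invariance is easy to test numerically with a from-scratch modified Gram-Schmidt QR; the Hessenberg matrix below is a made-up example.

```python
def qr(A):
    # modified Gram-Schmidt QR factorization
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    V = [[A[i][j] for i in range(n)] for j in range(n)]
    for j in range(n):
        for i in range(j):
            R[i][j] = sum(Q[k][i] * V[j][k] for k in range(n))
            for k in range(n):
                V[j][k] -= R[i][j] * Q[k][i]
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        for k in range(n):
            Q[k][j] = V[j][k] / R[j][j]
    return Q, R

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# a made-up 4 x 4 upper Hessenberg matrix
A = [[2.0, 1.0, 3.0, -1.0],
     [1.0, 3.0, 0.0, 2.0],
     [0.0, 2.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
for _ in range(3):
    Q, R = qr(A)
    A = matmul(R, Q)   # one QR step
    # the Hessenberg pattern (zeros where i - j >= 2) survives each step
    assert all(abs(A[i][j]) < 1e-12 for i in range(4) for j in range(4) if i - j >= 2)
```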
Since upper Hessenberg matrices stay that way in the algorithm, and an upper Hessenberg matrix is closer to being upper triangular, it is reasonable to suppose the QR algorithm will yield good results more quickly for this upper Hessenberg matrix than for the original matrix. This would be especially true if the matrix is large. The other important thing to observe is that, starting with an upper Hessenberg matrix, the algorithm restricts the blocks which occur on the main diagonal to 2 × 2 blocks, which are easy to deal with. These blocks allow you to identify the complex roots.