Here is a very useful theorem due to Sylvester. The field of scalars will be F and the entries of A, B come from F. We will have in mind F = ℂ but the results hold more generally. You really only need to be able to completely factor something called the minimal polynomial, which will be described below.
Theorem A.11.1 Let A, B be matrices such that BA makes sense. Then
dim(ker(BA)) ≤ dim(ker(B)) + dim(ker(A)).
Proof: If x ∈ ker(BA), then Ax ∈ ker(B), and so A(ker(BA)) ⊆ ker(B). The following picture may help.
[Figure: A maps ker(BA) into ker(B).]
Now let {x_1,⋅⋅⋅,x_n} be a basis of ker(A) and let {Ay_1,⋅⋅⋅,Ay_m} be a basis for A(ker(BA)). Take any z ∈ ker(BA). Then Az = ∑_{i=1}^m a_i Ay_i and so
A(z − ∑_{i=1}^m a_i y_i) = 0
which means z − ∑_{i=1}^m a_i y_i ∈ ker(A) and so there are scalars b_j such that
z − ∑_{i=1}^m a_i y_i = ∑_{j=1}^n b_j x_j.
It follows that span(x_1,⋅⋅⋅,x_n,y_1,⋅⋅⋅,y_m) ⊇ ker(BA). By the first part (see the picture), the Ay_i are linearly independent vectors in ker(B), so m ≤ dim(ker(B)). Therefore
dim(ker(BA)) ≤ n + m ≤ dim(ker(A)) + dim(ker(B)). ■
Of course this result holds for any finite product of square matrices by induction. One way this is quite useful is in the case where you have a finite product ∏_{i=1}^l L_i, all of which are square matrices of the same size. Then
dim(ker(∏_{i=1}^l L_i)) ≤ ∑_{i=1}^l dim(ker(L_i))
and so if you can find a linearly independent set of vectors in ker(∏_{i=1}^l L_i) of size ∑_{i=1}^l dim(ker(L_i)), then it must be a basis for ker(∏_{i=1}^l L_i).
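To see the inequality in action, here is a minimal numerical sketch in Python, assuming numpy is available; the helper dim_ker and the random rank-deficient test matrices are illustrative choices, not part of the text.

```python
import numpy as np

def dim_ker(M):
    """dim(ker(M)) = number of columns minus rank, by rank-nullity."""
    return M.shape[1] - np.linalg.matrix_rank(M)

rng = np.random.default_rng(0)
# Random singular 4x4 matrices: each thin product has rank at most 3.
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 4))
C = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 4))

# Two-factor form of Sylvester's theorem ...
assert dim_ker(B @ A) <= dim_ker(B) + dim_ker(A)
# ... and the three-factor version obtained by induction.
assert dim_ker(C @ B @ A) <= dim_ker(C) + dim_ker(B) + dim_ker(A)
print("Sylvester's inequality holds for these samples")
```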
Definition A.11.2 Let {V_i}_{i=1}^r be subspaces of some vector space V. We have in mind V = F^p but this is not necessary. Then
∑_{i=1}^r V_i
denotes all sums of the form ∑_{i=1}^r v_i where v_i ∈ V_i. If whenever
∑_{i=1}^r v_i = 0, v_i ∈ V_i,    (1.38)
it follows that v_i = 0 for each i, then a special notation is used to denote ∑_{i=1}^r V_i. This notation is
V_1 ⊕ ⋅⋅⋅ ⊕ V_r
and it is called a direct sum of subspaces.
Lemma A.11.3 If V = V_1 ⊕ ⋅⋅⋅ ⊕ V_r and if β_i = {v_1^i,⋅⋅⋅,v_{m_i}^i} is a basis for V_i, then a basis for V is {β_1,⋅⋅⋅,β_r}.
Proof: These vectors span V because V = V_1 + ⋅⋅⋅ + V_r. To see they are linearly independent, suppose ∑_{i=1}^r ∑_{j=1}^{m_i} c_{ij} v_j^i = 0. Then since it is a direct sum, it follows that for each i,
∑_{j=1}^{m_i} c_{ij} v_j^i = 0
and now since {v_1^i,⋅⋅⋅,v_{m_i}^i} is a basis, each c_{ij} = 0. ■
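As a quick illustration of Definition A.11.2 and Lemma A.11.3, the following sketch (assuming numpy; the particular subspaces of F^3 are an illustrative choice) checks that concatenating bases of two subspaces whose sum is direct yields a basis.

```python
import numpy as np

# V1 = span{(1,1,0)}, V2 = span{(0,1,1),(1,0,1)}; here F^3 = V1 ⊕ V2.
beta1 = np.array([[1.0, 1.0, 0.0]])              # basis of V1 (rows)
beta2 = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])              # basis of V2 (rows)
combined = np.vstack([beta1, beta2])             # β1 and β2 stacked

# Full row rank <=> the combined list is linearly independent, hence a
# basis of F^3, exactly as Lemma A.11.3 predicts for a direct sum.
print(np.linalg.matrix_rank(combined) == combined.shape[0])   # True
```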
Here is a useful lemma.
Lemma A.11.4 Let L_i be a square matrix for i = 1,⋅⋅⋅,p, each of the same size, and suppose for i ≠ j, L_i L_j = L_j L_i and also L_i is one to one on ker(L_j) whenever i ≠ j. Then
ker(∏_{i=1}^p L_i) = ker(L_1) ⊕ ⋅⋅⋅ ⊕ ker(L_p)
Here ∏_{i=1}^p L_i is the product of all the matrices. A symbol like ∏_{j≠i} L_j is the product of all of them except for L_i.
Proof: Note that since the matrices commute, L_j : ker(L_i) → ker(L_i). Here is why. If L_i y = 0 so that y ∈ ker(L_i), then
L_i L_j y = L_j L_i y = L_j 0 = 0
and so L_j : ker(L_i) → ker(L_i). Next observe that it is obvious that, since the operators commute,
∑_{i=1}^p ker(L_i) ⊆ ker(∏_{i=1}^p L_i)
Suppose
∑_{i=1}^p v_i = 0, v_i ∈ ker(L_i),
but some v_i ≠ 0. Then apply ∏_{j≠i} L_j to both sides. Since the matrices commute, this results in
∏_{j≠i} L_j v_i = 0
which contradicts the assumption that these L_j are one to one and the above observation that they map ker(L_i) to ker(L_i). Thus if
∑_i v_i = 0, v_i ∈ ker(L_i)
then each v_i = 0. It follows that
ker(L_1) ⊕ ⋅⋅⋅ ⊕ ker(L_p) ⊆ ker(∏_{i=1}^p L_i)    (*)
From Sylvester’s theorem and the observation about direct sums in Lemma A.11.3,
∑_{i=1}^p dim(ker(L_i)) = dim(ker(L_1) ⊕ ⋅⋅⋅ ⊕ ker(L_p))
≤ dim(ker(∏_{i=1}^p L_i)) ≤ ∑_{i=1}^p dim(ker(L_i))
which implies all these are equal. Now in general, if W is a subspace of V, a finite dimensional vector space, and the two have the same dimension, then W = V. This is because W has a basis, and if v were not in the span of this basis, then v adjoined to the basis of W would be a linearly independent set, so the dimension of V would then be strictly larger than the dimension of W. It follows from (*) that
ker(L_1) ⊕ ⋅⋅⋅ ⊕ ker(L_p) = ker(∏_{i=1}^p L_i) ■
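Here is a minimal numerical sketch of Lemma A.11.4, assuming numpy; the diagonal matrix A is an illustrative choice. The factors L_1 = A − I and L_2 = A − 2I are polynomials in A, so they commute, and each is one to one on the kernel of the other.

```python
import numpy as np

def dim_ker(M):
    return M.shape[1] - np.linalg.matrix_rank(M)

A = np.diag([1.0, 1.0, 2.0])
L1 = A - np.eye(3)        # ker(L1) = span{e1, e2}
L2 = A - 2.0 * np.eye(3)  # ker(L2) = span{e3}

# dim ker(L1 L2) = dim ker(L1) + dim ker(L2), matching the direct sum.
print(dim_ker(L1 @ L2), "=", dim_ker(L1), "+", dim_ker(L2))   # 3 = 2 + 1
```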
Here is a situation in which the above holds. ker(A − λ_i I)^{r_i} is sometimes called a generalized eigenspace in case λ_i is an eigenvalue.
Theorem A.11.5 Let A be an n × n matrix and suppose {λ_1,⋅⋅⋅,λ_p} are distinct scalars. Define for r_i a positive integer,
V_i = ker(A − λ_i I)^{r_i}    (1.39)
Then
ker(∏_{i=1}^p (A − λ_i I)^{r_i}) = V_1 ⊕ ⋅⋅⋅ ⊕ V_p.    (1.40)
Proof: It is obvious the linear transformations (A − λ_i I)^{r_i} commute. Now here is a claim.
Claim :Let μ≠λi. Then
(A − μI)
m : Vi
↦→
Vi and is one to one and onto for every m a positive
integer.
Proof: It is clear that (A − μI)^m maps V_i to V_i because v ∈ V_i is equivalent to (A − λ_i I)^{r_i} v = 0. Consequently,
(A − λ_i I)^{r_i} (A − μI)^m v = (A − μI)^m (A − λ_i I)^{r_i} v = (A − μI)^m 0 = 0
which shows that (A − μI)^m v ∈ V_i.
It remains to verify that (A − μI)^m is one to one. This will be done by showing that (A − μI) is one to one. Let w ∈ V_i and suppose (A − μI)w = 0 so that Aw = μw. Then for m ≡ r_i, (A − λ_i I)^m w = 0 and so by the binomial theorem,
(μ − λ_i)^m w = ∑_{l=0}^m \binom{m}{l} (−λ_i)^{m−l} μ^l w = ∑_{l=0}^m \binom{m}{l} (−λ_i)^{m−l} A^l w
= (A − λ_i I)^m w = (A − λ_i I)^{r_i} w = 0.
Therefore, since μ ≠ λ_i, it follows that w = 0 and this verifies (A − μI) is one to one. Thus (A − μI)^m is also one to one on V_i. Letting {u_1^i,⋅⋅⋅,u_{r_k}^i} be a basis for V_i, it follows that {(A − μI)^m u_1^i,⋅⋅⋅,(A − μI)^m u_{r_k}^i} is also a basis and so (A − μI)^m is also onto. The desired result now follows from Lemma A.11.4. ■
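The following sketch checks the dimension count in (1.40), assuming numpy; the matrix A, with a size-2 Jordan block for λ = 2 and a simple eigenvalue 5, is an illustrative choice.

```python
import numpy as np

def dim_ker(M):
    return M.shape[1] - np.linalg.matrix_rank(M)

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
I = np.eye(3)
N1 = np.linalg.matrix_power(A - 2.0 * I, 2)   # (A - 2I)^2, r_1 = 2
N2 = A - 5.0 * I                              # (A - 5I)^1, r_2 = 1

# dim V1 = 2 (generalized eigenspace), dim V2 = 1, and their direct
# sum fills out the kernel of the product, which here is all of F^3.
print(dim_ker(N1 @ N2), "=", dim_ker(N1), "+", dim_ker(N2))   # 3 = 2 + 1
```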
By the Cayley-Hamilton theorem, an n × n complex matrix A satisfies its characteristic polynomial, which is a polynomial of degree n, denoted as q(λ). Let the minimal polynomial p(λ) be the monic polynomial (leading coefficient is 1) of smallest degree such that p(A) = 0. In all of this, A^0 is defined as I.
Thus the minimal polynomial is of the form
p(λ) = λ^m + a_{m−1} λ^{m−1} + ⋅⋅⋅ + a_1 λ + a_0,  m ≤ n
and
p(A) ≡ A^m + a_{m−1} A^{m−1} + ⋅⋅⋅ + a_1 A + a_0 I = 0
where the 0 is the n × n zero matrix.
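As a sketch of how one could find the minimal polynomial numerically (assuming numpy; the matrix and the search strategy are illustrative, not the text's method): look for the smallest m such that A^m is a linear combination of I, A, ⋅⋅⋅, A^{m−1}.

```python
import numpy as np

A = np.diag([2.0, 2.0, 5.0])    # minimal polynomial (λ-2)(λ-5), degree 2 < n = 3
n = A.shape[0]

powers = [np.eye(n).ravel()]    # vectorized A^0
Ak = np.eye(n)
for m in range(1, n + 1):
    Ak = Ak @ A                 # A^m
    M = np.column_stack(powers) # columns vec(A^0), ..., vec(A^{m-1})
    c, *_ = np.linalg.lstsq(M, Ak.ravel(), rcond=None)
    if np.allclose(M @ c, Ak.ravel()):
        # A^m = c_0 I + ... + c_{m-1} A^{m-1}, so
        # p(λ) = λ^m - c_{m-1} λ^{m-1} - ... - c_0
        print("degree:", m, "lower-power coefficients:", np.round(c, 6))
        break
    powers.append(Ak.ravel())
```

For this A, the loop stops at m = 2 with A² = 7A − 10I, giving p(λ) = λ² − 7λ + 10 = (λ − 2)(λ − 5).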
Lemma A.11.6 The minimal polynomial divides the characteristic polynomial q(λ).