Recall the following definition of what it means for a matrix to be diagonalizable.
Definition 14.9.1 Let A be an n × n matrix. It is said to be diagonalizable if there exists an invertible matrix S such that
\[ S^{-1} A S = D \]
where D is a diagonal matrix.
Also, here is a useful observation.
Observation 14.9.2 If A is an n × n matrix and AS = SD for D a diagonal matrix, then each column of S is an eigenvector of A or else it is the zero vector. This follows from the way we multiply matrices: for s_{k} the k^{th} column of S and λ_{k} the k^{th} diagonal entry of D, comparing the k^{th} columns of AS and SD gives
\[ A s_k = \lambda_k s_k. \]
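The observation can be checked numerically on a small case. The following is only an illustrative sketch: the matrices A, S, D and the helper names `matmul`, `column`, and `matvec` are made up for this example. S is deliberately chosen to be singular, to show that a zero column is compatible with AS = SD.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def column(M, k):
    """Return the k-th column of M as a list."""
    return [row[k] for row in M]

def matvec(M, v):
    """Multiply the matrix M by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical data: S is not invertible, its second column is zero.
A = [[2, 1], [1, 2]]
S = [[1, 0], [1, 0]]
D = [[3, 0], [0, 7]]

# AS = SD holds for this choice.
products_agree = matmul(A, S) == matmul(S, D)

# Column k of AS = SD reads A s_k = lambda_k s_k.
column_checks = []
for k in range(2):
    s_k = column(S, k)
    lam_k = D[k][k]
    column_checks.append(matvec(A, s_k) == [lam_k * x for x in s_k])
```

Here the first column of S is an eigenvector of A for the eigenvalue 3, while the second column is the zero vector, which satisfies A s = 7 s trivially.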
It is sometimes interesting to consider the problem of finding a single similarity transformation which will diagonalize all the matrices in some set.
Lemma 14.9.3 Let A be an n × n matrix and let B be an m × m matrix. Denote by C the matrix
\[ C \equiv \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}. \]
Then C is diagonalizable if and only if both A and B are diagonalizable.
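Proof: Suppose S_{A}^{−1}AS_{A} = D_{A} and S_{B}^{−1}BS_{B} = D_{B} where D_{A} and D_{B} are diagonal matrices. You should use block multiplication to verify that
\[ S \equiv \begin{pmatrix} S_A & 0 \\ 0 & S_B \end{pmatrix} \]
is such that S^{−1}CS = D_{C}, a diagonal matrix with diagonal blocks D_{A} and D_{B}.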
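Conversely, suppose C is diagonalizable. It is necessary to show that A has a basis of eigenvectors for F^{n} and that B has a basis of eigenvectors for F^{m}. Suppose C is diagonalized by
\[ S = \begin{pmatrix} s_1 & \cdots & s_{n+m} \end{pmatrix}. \]
Thus S has columns s_{i}. Write each of these columns in the form
\[ s_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix} \]
where x_{i} ∈ F^{n} and where y_{i} ∈ F^{m}. The result is
\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \]
where S_{11} is an n×n matrix and S_{22} is an m×m matrix. Then there is a diagonal matrix
\[ D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}, \]
with D_{1} being n×n and D_{2} being m×m, such that
\[ \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}. \]
Thus, by block multiplication,
\[ A S_{11} = S_{11} D_1, \quad A S_{12} = S_{12} D_2, \quad B S_{21} = S_{21} D_1, \quad B S_{22} = S_{22} D_2. \]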
It follows from Observation 14.9.2 that each of the x_{i} is an eigenvector of A or else is the zero vector and that each of the y_{i} is an eigenvector of B or is the zero vector. If there are n linearly independent x_{i}, then A is diagonalizable by Theorem 6.3.7.
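The row rank of the top half of S, the matrix (x_{1} ⋯ x_{n+m}), must be n, because otherwise the rank of S would be less than n + m, which would mean S^{−1} does not exist. Since the column rank equals the row rank, this matrix has column rank n, so there are n linearly independent x_{i}, each an eigenvector of A. Hence A is diagonalizable. Similar reasoning applies to B. ■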
Note that once you know that each of A,B are diagonalizable, you can then use the specific method used in the first part to accomplish the diagonalization.
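The first half of the proof can be tried on a concrete case. The sketch below uses made-up 2 × 2 matrices A and B together with hypothetical helpers `matmul` and `block_diag`. It assembles S from S_{A} and S_{B} as in the proof and checks that S^{−1}CS is diagonal. Exact rational arithmetic from the standard `fractions` module avoids rounding questions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block_diag(X, Y):
    """Return the block diagonal matrix with diagonal blocks X and Y."""
    n, m = len(X), len(Y)
    top = [list(row) + [0] * m for row in X]
    bottom = [[0] * n + list(row) for row in Y]
    return top + bottom

half = Fraction(1, 2)

# Hypothetical data: both A and B are diagonalized by the same 2 x 2 matrix,
# but any diagonalizing pair S_A, S_B would do.
A = [[2, 1], [1, 2]]
B = [[0, 1], [1, 0]]
S_A = [[1, 1], [1, -1]]
S_A_inv = [[half, half], [half, -half]]
S_B, S_B_inv = S_A, S_A_inv

C = block_diag(A, B)
S = block_diag(S_A, S_B)
S_inv = block_diag(S_A_inv, S_B_inv)  # inverse of a block diagonal matrix

D_C = matmul(matmul(S_inv, C), S)     # diagonal, with blocks D_A and D_B
```

For this data D_C has diagonal entries 3, 1, 1, −1: the eigenvalues of A followed by those of B.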
The following corollary follows from the same type of argument as the above.
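Corollary 14.9.4 Let A_{k} be an n_{k} × n_{k} matrix for k = 1, ⋯, r and let C denote the block diagonal
\[ \left( \sum_{k=1}^{r} n_k \right) \times \left( \sum_{k=1}^{r} n_k \right) \]
matrix given below.
\[ C \equiv \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_r \end{pmatrix} \]
Then C is diagonalizable if and only if each A_{k} is diagonalizable.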
Definition 14.9.5 A set ℱ of n × n matrices is said to be simultaneously diagonalizable if and only if there exists a single invertible matrix S such that for every A ∈ ℱ, S^{−1}AS = D_{A} where D_{A} is a diagonal matrix. ℱ is a commuting family of matrices if AB = BA whenever A, B ∈ ℱ.
Lemma 14.9.6 If ℱ is a set of n × n matrices which is simultaneously diagonalizable, then ℱ is a commuting family of matrices.
Proof: Let A, B ∈ ℱ and let S be a matrix which has the property that S^{−1}AS is a diagonal matrix for all A ∈ ℱ. Then S^{−1}AS = D_{A} and S^{−1}BS = D_{B} where D_{A} and D_{B} are diagonal matrices. Since diagonal matrices commute,
\[ AB = S D_A S^{-1} S D_B S^{-1} = S D_A D_B S^{-1} = S D_B D_A S^{-1} = S D_B S^{-1} S D_A S^{-1} = BA. \]
■
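As a quick sanity check of Lemma 14.9.6, one can build two matrices from a shared S and confirm that they commute. The matrix S, the diagonal matrices, and the helper `matmul` below are invented for illustration. S was chosen with determinant 1 so that all entries stay integers.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Hypothetical shared similarity transformation with det S = 1.
S = [[1, 1], [1, 2]]
S_inv = [[2, -1], [-1, 1]]

D_A = [[1, 0], [0, 2]]
D_B = [[3, 0], [0, 5]]

# A = S D_A S^{-1} and B = S D_B S^{-1} are simultaneously diagonalized by S.
A = matmul(matmul(S, D_A), S_inv)
B = matmul(matmul(S, D_B), S_inv)

AB = matmul(A, B)
BA = matmul(B, A)
```

Neither A nor B is diagonal here, yet AB and BA agree, exactly as the computation in the proof predicts.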
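Lemma 14.9.7 Let D be a diagonal matrix of the form
\[ D \equiv \begin{pmatrix} \lambda_1 I_{n_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{n_2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_r I_{n_r} \end{pmatrix} \tag{14.18} \]
where I_{n_{i}} denotes the n_{i} × n_{i} identity matrix and λ_{i} ≠ λ_{j} for i ≠ j, and suppose B is a matrix which commutes with D. Then B is a block diagonal matrix of the form
\[ B = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_r \end{pmatrix} \tag{14.19} \]
where B_{i} is an n_{i} × n_{i} matrix.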
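Proof: Let B = (B_{ij}) be partitioned into blocks in the same way as D, so that B_{ij} is an n_{i} × n_{j} matrix. By block multiplication, the ij^{th} block of BD is λ_{j}B_{ij} and the ij^{th} block of DB is λ_{i}B_{ij}. Since B commutes with D,
\[ \lambda_j B_{ij} = \lambda_i B_{ij}. \]
Therefore, if i ≠ j, B_{ij} = 0. Hence B has the form which is claimed, with B_{i} = B_{ii}. ■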
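The block computation in this proof is easy to see on a small case. In the sketch below, D has eigenvalue 1 repeated twice and eigenvalue 2 once, so n_{1} = 2 and n_{2} = 1. The test matrices and the helpers `matmul` and `commutator` are hypothetical names chosen for this example.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def commutator(B, D):
    """Return BD - DB."""
    BD, DB = matmul(B, D), matmul(D, B)
    return [[BD[i][j] - DB[i][j] for j in range(len(B))] for i in range(len(B))]

# D = diag(1, 1, 2), that is lambda_1 = 1 with n_1 = 2 and lambda_2 = 2 with n_2 = 1.
D = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]

# A matrix with nonzero off-diagonal blocks does not commute with D:
# entry (i, j) of BD - DB equals (d_j - d_i) * B[i][j].
B_full = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
full_comm = commutator(B_full, D)

# Zeroing the off-diagonal blocks gives a matrix of the form 14.19, which commutes.
B_block = [[1, 2, 0], [4, 5, 0], [0, 0, 9]]
block_comm = commutator(B_block, D)
```

The commutator of `B_full` with D is nonzero exactly in the off-diagonal blocks, where the factor λ_{j} − λ_{i} does not vanish, while the commutator of `B_block` with D is the zero matrix.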
Lemma 14.9.8 Let ℱ denote a commuting family of n × n matrices such that each A ∈ℱ is diagonalizable. Then ℱ is simultaneously diagonalizable.

Proof: First note that if every matrix in ℱ has only one eigenvalue, there is nothing to prove. This is because for A such a matrix with eigenvalue λ,
\[ S^{-1} A S = \lambda I \]
and so
\[ A = S (\lambda I) S^{-1} = \lambda I. \]
Thus all the matrices in ℱ are diagonal matrices and you could pick any invertible S to diagonalize them all. Therefore, without loss of generality, assume some matrix in ℱ has more than one eigenvalue.
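The significant part of the lemma is proved by induction on n. If n = 1, there is nothing to prove because all the 1 × 1 matrices are already diagonal matrices. Suppose then that the lemma is true for all k ≤ n − 1 where n ≥ 2 and let ℱ be a commuting family of diagonalizable n × n matrices. Pick A ∈ ℱ which has more than one eigenvalue and let S be an invertible matrix such that S^{−1}AS = D where D is of the form given in 14.18. By permuting the columns of S there is no loss of generality in assuming D has this form. Now denote by \widetilde{ℱ} the collection of matrices
\[ \widetilde{\mathcal{F}} \equiv \left\{ S^{-1} C S : C \in \mathcal{F} \right\}. \]
Note that \widetilde{ℱ} is defined using the single matrix S. It follows easily that \widetilde{ℱ} is also a commuting family of diagonalizable matrices. Indeed, for M, N ∈ ℱ,
\[ \left( S^{-1} M S \right) \left( S^{-1} N S \right) = S^{-1} M N S = S^{-1} N M S = \left( S^{-1} N S \right) \left( S^{-1} M S \right) \]
so the matrices commute. Now if M is a matrix in \widetilde{ℱ}, then M = S^{−1}CS for some C ∈ ℱ. By assumption, there exists an invertible T_{C} such that T_{C}^{−1}CT_{C} = D_{C}, a diagonal matrix, and so
\[ D_C = T_C^{-1} C\, T_C = T_C^{-1} S M S^{-1} T_C = \left( S^{-1} T_C \right)^{-1} M \left( S^{-1} T_C \right) \]
showing that M is also diagonalizable.
By Lemma 14.9.7 every B ∈ \widetilde{ℱ} is a block diagonal matrix of the form given in 14.19, because each of these commutes with D = S^{−1}AS, which belongs to \widetilde{ℱ}. By block multiplication, the diagonal blocks B_{i} corresponding to different B ∈ \widetilde{ℱ} commute.
By Corollary 14.9.4 each of these blocks is diagonalizable. This is because B is known to be so. Therefore, by induction, since all the blocks are no larger than (n − 1) × (n − 1), thanks to the assumption that A has more than one eigenvalue, there exist invertible n_{i} × n_{i} matrices T_{i} such that T_{i}^{−1}B_{i}T_{i} is a diagonal matrix whenever B_{i} is one of the matrices making up the block diagonal of any B ∈ \widetilde{ℱ}. It follows that for T defined by
\[ T \equiv \begin{pmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & T_r \end{pmatrix}, \]
then T^{−1}BT = a diagonal matrix for every B ∈ \widetilde{ℱ}, including D. Consider ST. It follows that for all C ∈ ℱ,
\[ T^{-1} \left( S^{-1} C S \right) T = (ST)^{-1} C\, (ST) = \text{a diagonal matrix}. \]
■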
Theorem 14.9.9 Let ℱ denote a family of matrices which are diagonalizable. Then ℱ is simultaneously diagonalizable if and only if ℱ is a commuting family.
Proof: If ℱ is a commuting family, it follows from Lemma 14.9.8 that it is simultaneously diagonalizable. If it is simultaneously diagonalizable, then it follows from Lemma 14.9.6 that it is a commuting family. ■
This is really a remarkable theorem. Recall that if S^{−1}AS = D, a diagonal matrix, then the columns of S are a basis of eigenvectors of A. Hence the theorem says that the matrices in a commuting family of nondefective matrices share a common basis of eigenvectors. This shows how remarkable it is when a set of matrices commutes.
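A small numerical illustration of Theorem 14.9.9 may help. The matrices below are made up for this purpose, and `matmul`, `is_diagonal`, and `conjugate` are hypothetical helper names. A and B commute and a single S diagonalizes both. N is diagonalizable but does not commute with A, and the same S fails to diagonalize it.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_diagonal(M):
    """Return True if every off-diagonal entry of the square matrix M is zero."""
    return all(M[i][j] == 0
               for i in range(len(M)) for j in range(len(M)) if i != j)

def conjugate(S_inv, M, S):
    """Return S^{-1} M S, given S and its inverse."""
    return matmul(matmul(S_inv, M), S)

half = Fraction(1, 2)

# Hypothetical data.
A = [[2, 1], [1, 2]]
B = [[0, 1], [1, 0]]   # commutes with A
N = [[1, 0], [0, 2]]   # diagonal, hence diagonalizable, but AN != NA

# Columns of S are eigenvectors of A for the eigenvalues 3 and 1.
S = [[1, 1], [1, -1]]
S_inv = [[half, half], [half, -half]]

commute_AB = matmul(A, B) == matmul(B, A)
commute_AN = matmul(A, N) == matmul(N, A)

D_A = conjugate(S_inv, A, S)   # diagonal
D_B = conjugate(S_inv, B, S)   # diagonal, using the same S
M_N = conjugate(S_inv, N, S)   # not diagonal
```

Because A has two distinct eigenvalues, its eigenvectors are determined up to scaling, so any matrix diagonalizing A has the columns of S up to scaling and order. Such a matrix cannot diagonalize N, which is consistent with Lemma 14.9.6 since A and N do not commute.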