When I learnt about Cholesky decomposition it struck me immediately the elegance and simplicity of the solution. It was clear that the symmetry contained some sort of redundancy inside, but it was not clear at all how that symmetry could be exploited. Here I will try to pay my insignificant tribute to the man who discovered it and present you his story.
We are talking about France
The story of Andre Louis Cholesky traces back probably in Poland, when it is believed his ancestors moved to France with the Napoleon army1. He was born in Montguyon (October 15th, 1875), north of Bordeaux where he absolved Lycée and entered L’Ecole Polytechnique. There he had brilliant teachers, among many being Camille Jordan (yep, the Jordan normal form was not invented by a basketball player) and after graduation he entered Army’s artillery branch as Second Lieutenant.
He was designated to work in Tunisia and Algeria, and after he moved to the Geodesic Section of the Army Geographic Service 1905, where he was reported to have “… a sharp intelligence and a great facility for mathematical work, having an inquiring spirit and original ideas”2. He was part of various geographical works like the topological cartography of various territories.
Following Gauss
The methods used at the time dates back to Gauss when he was calculating planet’s orbits. It consisted in the following ideas. The triangulation of the terrain started with the measurement of a line segment, and iterate from there measuring only angles (horizontal and vertical). The angles are much easier to measure than the length for long distances. After many observations one will end up with a lot of numbers. Using the numbers directly poses some problems, since those numbers contains errors. The procedure than uses some nonlinear equations which are linearly approximated by the first terms of Taylor approximation.
$$C x = b$$
But this poses a problem, since \(C\) is not square. The system of equations has more equations than unknowns. The situation is not improved by the fact that the values involved \(C\) and \(b\) contains errors, as we mentioned. Here Gauss came in place and solution is found as the one which minimize the \( L_2 \) norm.
One recognize that this is a quadratic problem and has a minima where the derivative is zero (since the Hessian is positive). Thus, taking derivatives and equaling with 0, the well-know normal equations arise:
$$C^T C x = C^T b$$
where \( C \) is the matrix of coefficients, \(x\) is the desired solution and \(b\) is the equality vector. At that time the solution was found by Gaussian elimination. This process is tedious when there were many unknowns and computed by hand. Many wise people from Geographical Service were in a search for a better way to solve this.
Hidden in plain sight
A careful observer will notice that \(C^T C\) is symmetrical. This is intriguing since this is not an ordinary linear system, it has some regularity inside, some redundancy given by symmetry, which one can feel it can be exploited. It must be something more, but what could that be?
Among many who perhaps saw the same facts, the young officer found the solution. Perhaps he did not saw the non negative requirement (know today), besides symmetry, but he knew something. A triangular system of equations is much easier to solve, than a regular system. This is provided by Gaussian elimination process itself, but he pushed this much further. Let’s see. A lower triangular system looks like this:
$$\begin{aligned} l_{11}x_1 & & &= b_1 \\ l_{21}x_1 &+l_{22}x_2 & &= b_2 \\ l_{31}x_1 &+l_{32}x_2 &+l_{33}x_3 &= b_3 \end{aligned}$$
And one can notice that the first unknown \(x_1\) is trivial to find. Knowing \(x_1\), computing the second unknown \(x_2\) is again trivial, and so on. The previous example has 3 unknowns for illustration purposes only, you should think about its generalization. This procedure is called forward substitution and a similar one exists for high triangular systems named backward substitution.
Let’s start afresh by replacing \(C^T C\) with\(A\). $$A x = b$$
The Cholesky method is to find a decomposition $$A = L L^T$$ where \(L\) is a lower triangular matrix. Having done that we can see that: $$ Ax = L L^T x = L ( L^T x) = L y = b$$
First we solve for \(y\) in \( L y = b \) by forward substitution. After we found \(y\) we solve for \(x\) in \(L^T x = y\) by backward substitution. This is clever, isn’t it? But how to find \(L\)? And here comes the elegance, because if you see how it’s done it is remarkably natural.
$$ L L^T = \begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & l_{31}\\ 0 & l_{22} & l_{32}\\ 0 & 0 & l_{33} \end{bmatrix} = \begin{bmatrix} l_{11}^2 & & \text{symmetric}\\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & \\ l_{31}l_{11} & l_{31}l_{21}+l_{32}l_{22} & l_{31}^2 + l_{32}^2 +l_{33}^2 \end{bmatrix} = \begin{bmatrix} a_{11} & & \text{symmetric}\\ a_{21} & a_{22} & \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = A $$
Two matrices are equal if values from each position are equal.
$$ L = \begin{bmatrix} \sqrt{a_{11}} & 0 & 0\\ a_{21}/l_{11} & \sqrt{a_{22}-l_{21}^2} & 0\\ a_{31}/l_{11} & (a_{32}-l_{31}l_{21})/l_{22} & \sqrt{a_{33}-l_{31}^2-l_{32}^2} \end{bmatrix} $$
If it looks intimidating, than it’s fine. You are not alone. Imagine that you are Cholesky trying to figure it, having at disposal some paper pieces and a pen. You will appreciate more his efforts. After some time of paying attention to the patterns, you start to see the regularities.
$$l_{jj} = \sqrt{a_{jj} – \sum_{k=1}^{j-1}l_{jk}^2}$$
$$l_{ij}=\frac{1}{l_{jj}}(a_{ij}-\sum_{k=1}^{j-1}l_{ik}l_{jk}) \text{ for }i>j$$
Still, if it looks intimidating, look at the code for Cholesky-Banachiewitcz algorithm below:
for (i = 0; i < dimensionSize; i++) { for (j = 0; j <= i; j++) { float sum = 0; for (k = 0; k < j; k++) sum += L[i][k] * L[j][k]; if (i == j) L[i][j] = sqrt(A[i][i] - sum); else L[i][j] = (1.0 / L[j][j] * (A[i][j] - sum)); } }
On today computers, this task is trivial. At that time it was a real challenge. Cholesky described in his paper3 that he was able to solve a system of 10 equations with 5 decimal digits in 4 to 5 hours. He reported that his method was also successfully used for several systems of dimensions 30, and for one of dimension 56 (but he did not mention how long it took). This is top performance by any measure!
Legacy
The life of Andre Cholesky was a mix between duty and passion. Perhaps his first ideas about his method came to him in France, when he participated at the topological measurement of the country. He definitely enrich those ideas during his missions in Tunis and Algeria.
But this is not over. He was sent to Crete where he conducted the topological efforts and he managed to end his mission against nature impediments. He then served as artillery officer. Since 1916 until February 1918, he was sent to Romania, serving as the Head of Romanian Geographical Service4. There, again, he impressed so much that he was promoted to the Commander rank.
Back to France he was serving as artillery commander and in August 31th, 1918 he fell for his country5. A loss for his country, for Geodesy and for the whole world. But his legacy, his method remains with us, being the first choice for solving systems of equations when coefficient matrix is symmetric non negative definite. This is very often the case when solving over determined systems with least squares.
Bibliography