24 February 2025

There is a bookcase in my office holding an eclectic mix of non-fiction books.

It’s all the more eclectic as I couldn’t be bothered to sort out the books when we last moved. They were put on the shelf in no particular order, which can be very irritating when looking for a specific title. I’d really like the books sorted by author, and within author by date of publication. Think for a moment: how would you go about it? It’s an interesting challenge because it is the kind of problem software designers face all the time.

A sorting algorithm

To fix this, I need a sorting algorithm. In principle, I could just take all the books down and randomly reshelve them. There’s a chance that they would end up correctly sorted. But this is hardly a useful approach. In total, there are 240 books on the shelves, which means there are around 1058 ways to arrange them – that’s 1 followed by 58 zeroes – only one of which is correct. The chances of success are distinctly small.

No programmer would take this approach. But, as a first attempt, they might resort to a bubble sort. To operate this on my books I would take the first two down and check if they are in the correct order. If not, I swap them. Then I do the same with the second and third books, and so on through the entire bookcase. I repeat this process, starting with the first book again, but this time I can ignore the last two books as the book that should be last will already be in the final position. After the second pass I can ignore the final three and so on. When I’ve eliminated all the books, I stop with them correctly ordered.

This could take a while. In the worst case, I might have to perform the swapping task 28,680 times. For me and my bookshelf, this is way too boring – but computers don’t get bored, and could, happily if not particularly quickly, complete a sort this way. A lazy programmer might indeed consider the task completed. But my 240 books only represent a small amount of data. To handle far larger quantities, we need to speed things up.

The mathematical genius, John von Neumann

This was clear to the mathematical genius behind early US computer development, John von Neumann. Before the development of the bubble sort, he had already devised a more sophisticated approach called a merge sort. If I were to do this with my books, I’d take all of them off the shelves and put them on a table in pairs, getting each pair in the right order as I did so. I’d then take the first two pairs and get all four in the correct order by first comparing the leftmost of each pair and so on. After repeating this until I had all the books in ordered sets of four I’d then group them up in blocks of eight… and so on.

It might seem that I’ve added considerable complexity – but this new sorting technique cuts down the maximum number of swapping operations required from 28,680 to just 1,640. That could save me a lot of time.

Not surprisingly, there have been many more sorting innovations since von Neumann came up with the merge sort in 1945, providing a whole range of different ways to deal with my books. But the progression from random selection through two levels of sophistication gives a good feel for how algorithms progress.

We take it for granted

When we use a wide variety of computer applications, from File Explorer or Finder to Word, we take it for granted that it will be capable of sorting information at the click of a mouse. It’s the kind of task that coders cut their teeth on – but seeing some of the simpler mechanisms making this possible is a great way to get a better feel for the hidden algorithms that work out of sight of the user.

Next time you sort a lengthy table with a click, give a little thought to the creativity that has gone into that speedy operation.

Brian Clegg is an award-winning science writer with over 50 books in print and articles in a wide range of newspapers and magazines (www.brianclegg.net).

Back to Curo News