Sorting Algorithms:

[DISCLAIMER] all my code is my own code, and therefore appropriately terrible.

Humans have been sorting for almost as long as we've been counting. For as long as we've been around it's been useful to be able to keep things in their correct order, and for millenia we've done this by eye. This works very well for small lists of items, but what about a hundred, or even thousands of items? Then computers came along, and suddenly we needed to sort millions of items at once. Clearly a human isn't going to cut it.

Instead, allow me to introduce you to the idea of sorting algorithms. An algorithm is a series of steps that you can follow to obtain a desired result, and each of these algorithms are guaranteed to always produce a list that is in order. There are four major sorting algorithms:

Bubble Sort:

The first sorting algorithm is called bubble sort. For a list length N, this algorithm performs N passes of the array. In each pass it compares each element from the start with the one next to it. If they are out of order, they get swapped. As a result, the largest unsorted item in the array gets 'bubbled' up to the end in each pass.

Time Complexity: O(N^2)

Space Complexity: O(1)

C implementation:

void bubbleSort_i (int *array, int length) {
  for (int pass = 1; pass < length; pass++) {
    int swappedInPass = 0;
    for (int i = 0; i < length - pass; i++) {
      if (array[i] > array[i + 1]) {
        swap_i(array, i, i + 1); // Swap
        swappedInPass = 1;
      }
    }
    if (swappedInPass == 0) return;
  }
}

While good for learning about sorting algorithms, bubble sort is very inefficient due to the fact that it has to parse the entire array in each pass, and should never be used, as insertion sort is literally always faster.

Insertion Sort:

Next we'll look at insertion sort. This is like bubble sort's cousin that's better than it in every way. In theory, this algorithm takes each item in the original array, and inserts it into its correct position in a new array until every item is in the new array. Sounds like the time complexity is O(N), right? Wrong. There are a few problems with this approach: firstly, how will the computer know which is the 'correct' place for an item? The answer is to put the item to be inserted at the start or end of the new array and repeatedly compare it with the item next to it, swapping if needed, just like bubble sort. The speedup for bubble sort actually comes in the fact that because our array is already sorted, we can stop comparing as soon as we get two items in the right order. Secondly, this has a space complexity of O(N), and we would prefer something better. You may have noticed that whenever the old array shrinks by 1, the new array grows by 1. So why don't we just create the old array in the same space as the new array? In general, this approach is called sorting in place, and it's very powerful, since without it every sorting algorith, no matter how efficient, would be limited to a space complexity of O(N).

The end result of these optimisations is that the actual insertion sort algorithm doesn't appear to be actually inserting anything at all at first glance.

Time Complexity: O(N^2) (but faster than bubble sort)

Space Complexity: O(1)

C implementation:

void insertionSort_i (int *array, int length) {
  for (int pass = 1; pass < length; pass++) {
    int current = array[pass];
    int offset = 1;
    while (offset <= pass && current < array[pass - offset]) {
      swap_i(array, pass - offset + 1, pass - offset);
      offset++;
    }
  }
}

Unlike bubble sort, insertion sort actually finds some use in places, as most faster algorithms are much more complex and will generally take longer to sort small numbers of items. (Around 25 or so in my experience). For example, the implementation of quicksort that I present later actually uses insertion sort when subsections get small enough to squeeze a little bit of extra speed out.

Merge Sort:

Insertion sort is a nice upgrade to the painfully slow bubble sort, but it has a major problem: A time complexity of O(N^2) is, to put be frank, absolutely abysmal. A few hundred items is fine, but sorting several thousand already takes significantly longer, and although you just might, after several hours, succeed in sorting a million items, a billion is right out of the question. We need something much better. And thankfully, in 1945, John Von Neumann (look him up, he's absolutely everywhere), managed to find exactly that.

Merge sort is the first algorithm here that is O(NlogN) time complexity instead of O(N^2). To understand how this algorithm works, consider the following question: given two arrays that are in order, could you combine them into a single array that is also sorted? The answer is yes, and it's pretty simple: since both arrays start with their smallest member, just remove the smaller of the two first members and place it in the new array, and then rinse and repeat until both old arrays are empty. But this doesn't have anything to do with sorting, does it? It turns out that it does. The key insight that Von Neumann had was that if you break down any array into its individual elements, each one of those is sorted by definition. Then, you can progressively merge higher and higher orders of array, with them all being sorted. Since we're splitting the array into two each call, there are log(N) calls. And since we make one comparison for every item we add to the merged arrays in a 'layer', each call has O(N) comparisons. Therefore, the algorithm has a time complexity of O(NlogN). There is one small caveat, however: to do this, unlike insertion sort we very much do have to essentially create a new array, giving the algorithm a much higher space complexity that we might like. Furthermore, most implementations of the algorithm are recursive, which tends to put a limit on the number of items that can actually be sorted due to how slow recursion is compared with iteration.

Time Complexity: O(NlogN) (but usually recursive)

Space Complexity: O(N)

C implementation:

int *mergeSort_i (int *array, int length) {
  if (length <= 1) {
    return array; // Base case
  }

  // Split
  int middle = length / 2;
  int *sub1 = malloc(sizeof(int) * middle);
  int *sub2 = malloc(sizeof(int) * (length - middle));
  for (int i = 0; i < middle; i++) {
    sub1[i] = array[i];
  }
  for (int i = middle; i < length; i++) {
    sub2[i - middle] = array[i];
  }
  free(array);

  // Merge sort the subarrays
  sub1 = mergeSort_i(sub1, middle);
  sub2 = mergeSort_i(sub2, length - middle);

  // Merge the arrays
  int *rtn = mergeArrays_i(sub1, middle, sub2, length - middle);
  return rtn;
}

Merge sort is a key sorting algorithm to learn about due to its inventiveness and simplicity. However, in the modern day more efficient algorithms such as quicksort are generally used.

Merge Sort visualisation sadly not available yet

Quicksort:

Merge sort is nice, and any recursive algorithm can be written iteratively, but what can't be fixed is the space complexity. When we're sorting billion item arrays, we really don't want to have to create all of that memory again; it would be much better if we found an O(NlogN) algorithm that can sort in place. Allow me to introduce you to quicksort, the final member of our party. Quicksort is, as the name might suggest, quick. Similarly to merge sort, it works by splitting the array into progressively smaller chunks. It does this by selecting a pivot point in the array. This could be the first item, the middle item, the end item, or decided by some other (preferably constant time complexity) means; the resulting algorithm works the same way. Next, we move all items smaller than the pivot to the left of it, and all items bigger than the pivot to its right. Now we repeat the process, using the subsections formed on either side of the pivot as the arrays. By default, a pivot is always sorted, so once every single item is a pivot, the array is fully sorted (or we can use insertion sort or some other algorithm on the smaller sections, but the end result is the same). As usual, doing this in place is a little harder than it sounds and involves some complicated swapping gymnastics. As you might guess from the similarity of this algorithm to merge sort, the time complexity is O(NlogN). However, quicksort can be worst in merge sort in that if we choose the wrong pivot it can have drastic consequences on how long the algorithm takes. It will always be at least O(NlogN), but if the pivots are too consistently imbalanced then it will be a lot closer to O(N^2). Luckily, this is unlikely on long arrays, and algorithms do exist to select a pivot makes it less likely for this to happen (for example, you can look at the first element, the last element and the middle element and choose the median one of these as your pivot).

Time Complexity: O(NlogN)

Space COmplexity: O(logN)

C implementation:

void quickSort_i (int *array, int length) {
  // Initialise pointer arrays
  bool allPivots = false;
  int numSections = 1;
  int *leftPointers = malloc(sizeof(int));
  int *rightPointers = malloc(sizeof(int));
  int *pivots = malloc(sizeof(int));
  *leftPointers = 1;
  *rightPointers = length - 1;
  *pivots = 0;

  // Iterate until every element is a pivot
  while (!allPivots) {
    // Prepare to get section sizes
    int *sectionSizes = malloc(sizeof(int) * 2 * numSections);
    int nextNumSections = 0;
    allPivots = true;

    for (int i = 0; i < numSections; i++) {
      // Get values
      int currentPivot = pivots[i];
      int currentLeft = leftPointers[i];
      int currentRight = rightPointers[i];

      // Insertion sort for group sizes under threshold value (25 appears to be best)
      if (currentRight - currentPivot + 1 <= 25) {
        insertionSort_i(array + currentPivot, currentRight - currentPivot + 1);
        sectionSizes[2 * i] = 0;
        sectionSizes[2 * i + 1] = 0;
        continue;
      }

      // Iterate over section (assume the pivot is always before the initial currentLeft position)
      int leftSize = 0;
      int rightSize = 0;
      while (currentLeft < currentRight) {
        // Swap left with right if left is greater than pivot, and decrement right
        if (array[currentLeft] > array[currentPivot]) {
          swap_i(array, currentLeft, currentRight);
          currentRight--;
          rightSize++;
        } else {
          currentLeft++;
          leftSize++;
        }
        while (array[currentRight] > array[currentPivot]) {
          currentRight--; // Move currentRight to the first item that should be on the left
          rightSize++;
        }
      }
      swap_i(array, currentPivot, currentRight);
      if (currentPivot != currentRight) {
        leftPointers[i] = currentPivot;
        leftSize++;
      }
      pivots[i] = currentRight;

      sectionSizes[2 * i] = leftSize;
      if (leftSize > 1) {
        nextNumSections++;
        allPivots = false;
      }
      sectionSizes[2 * i + 1] = rightSize;
      if (rightSize > 1) {
        nextNumSections++;
        allPivots = false;
      }
    }

    // Create new sections
    int *newLeft = malloc(sizeof(int) * nextNumSections);
    int *newRight = malloc(sizeof(int) * nextNumSections);
    int *newPivots = malloc(sizeof(int) * nextNumSections);

    if (!allPivots) {
      int offset = 0;
      for (int i = 0; i < numSections; i++) {
        if (sectionSizes[2 * i] > 1) {
          newPivots[2 * i - offset] = leftPointers[i];
          newLeft[2 * i - offset] = leftPointers[i] + 1;
          newRight[2 * i - offset] = pivots[i] - 1;
        } else offset++;
        if (sectionSizes[2 * i + 1] > 1) {
          newPivots[2 * i + 1 - offset] = pivots[i] + 1;
          newLeft[2 * i + 1 - offset] = pivots[i] + 2;
          newRight[2 * i + 1 - offset] = rightPointers[i];
        } else offset++;
      }
    } else return;

    // Delete old sections
    free(leftPointers);
    leftPointers = newLeft;
    free(rightPointers);
    rightPointers = newRight;
    free(pivots);
    pivots = newPivots;
    numSections = nextNumSections;
    free(sectionSizes);
  }
}

Quicksort visualisation sadly not available yet

Home