Next up on the sorting train: Bucket Sort! I like the name; I like the visuals; it’s a fun sort. And in plenty of cases it can be oh, so efficient. The main idea behind bucket sort, also called bin sort, is to take a list of data and divide it into buckets based on value ranges, and then sort the individual buckets using either another sort or recursively. Insertion sort is a commonly used one; for larger data sets a merge or quick sort might be more appropriate. The sort is best suited to sorting data that is fairly evenly distributed within a range. For example, an array of pretty evenly spaced out numbers 1-100, or of non integers between 0 and 1. In a bad use case all the data in the array would be tightly packed together and just be placed into one bucket anyway, adding no efficiency to the algorithm used to sort the individual buckets. Depending on the algorithm used to sort the individual buckets, the Big O can reach O(n^2).
Bucket Sort consists of 4 steps:
Create an array of empty buckets
Scatter the input array data into the buckets
Sort the data inside the buckets
Gather the sorted data from the buckets
In an average case, the first two steps can both take O(n) time to run. The 3rd step, which is sorting the data in the buckets ends up at O(n^2/k + n), where k is the number of buckets. The final step, which consists of gathering all the data into a concatenated array is O(k), so the total average Big O for a bucket sort is 0(n + n^2/k + k). Worst-case space complexity is O(n+k). Here’s a gif:
OK, so let’s get coding. In this example, we’ll be using an insertion sort to sort through the individual buckets. In a worst case scenario the Big O for this sort can reach O(n^2), but let’s hope not! For those not familiar with it, it looks like this:
Now, for the actual bucket sort in ruby; it’s going to look a little something like this:
And there we have it! Bucket sort has some real downsides in terms of efficiency, but for evenly distributed data it can sort pretty darn well. Plus, we get to throw around the word ‘bucket’ in a technical context, so really, everybody wins here. Enjoy!