August172011

Sorting things out

Technically.

Sorting algorithms employ multiple approaches to complete the same task: put things in order.
Up to date, I’ve managed to master 3 techniques, two of which were my own idea (which happen to overlap with the world’s two simplest sorting techniques). Below is the source code for all three.

Selection sort:

def custom_sort(a):

    i = 0

    while i < len(a)-1:

        # finding the smallest number

        min = a[i]

        mindex = i

        for j in range(i, len(a)):

            if min > a[j]:

                min = a[j]

                mindex = j

        # swap smallest num and 0th index

        temp = a[i]

        a[i] = a[mindex]

        a[mindex] = temp

        # finished swapping the smallest number and the number in the 0th index

        i = i+1

Insertion sort:

def insertion_sort(a):

    i = 1

    while i<len(a):

        j = i - 1

        while j>=0 and a[j+1] < a[j]:

            temp = a[j+1]

            a[j+1] = a[j]

            a[j] = temp

            j = j - 1

        i = i + 1

Merge sort:

def merge_sort(array):

    if(len(array) > 1):

        left = []

        right = []

        middle = len(array)/2

        for i in range(0,middle):

            left.append(array[i])

        for i in range(middle,len(array)):

            right.append(array[i])

        left = merge_sort(left)

        right = merge_sort(right)

        result = merge(left,right)

        return result

    else:

        return array

def merge(left,right):

    result = []

    j = 0

    for i in range(len(left)):

        while(j<len(right) and left[i] > right[j]):

            result.append(right[j])

            j += 1

        result.append(left[i])

    if(j < len(right)): # one extra element on the right hand

        for a in range(j, len(right)):

            result.append(right[a]) # append the last elements

    return result

Among the three, selection sort is the simplest technique. The methodology is as followed:
- Find the smallest value in the sequence
- Put it in the first slot of the array/list and put whatever value was in the first slot into the now-vacant slot (in essence, swapping the 2 values)
- Without considering the above smallest value, find the smallest value in the rest of the elements
- Put the next smallest value in the second slot of the array/list
- Rinse, repeat until finished. 
Selection sort is, as expected, not quite efficient especially with larger quantities to compare with. The main reason/intuition behind this issue is the inefficient repetition in the task. Assume the list [9,8,7,6,5,4,3,2,1] is to be sorted, the algorithm will first have to go through the 9 digits, comparing them pair by pair to find the smallest (number 1), do a 3-step swap, then go through the same 8 digits one more time to know that 2 is the next smallest. The majority of comparisons (or in computer’s perspective: operations) are obsolete and waste time/memory/processing power.

Insertion sort is a more elegant technique in that it is not as easy/intuitive to figure out and does not feel as awkward. Below is the procedure:
- Start from the 2nd element
- Compare each element being accessed with its immediate left neighbor, if the right is bigger, swap it with the left (thus, they are sorted locally).
- After the move, continue to check with its new immediate left neighbor, if legal, move left one more time, rinse, repeat until finding a left neighbor that is smaller.
- Move one over to the right, repeat the same procedure until the sorted portion (left hand side) becomes the whole sequence.
Theoretically, this technique seems a lot more efficient than Selection sort in that there is much less extra obsolete movement/work. In reality, it’s worse than Selection sort in the experiments I’ve managed to reproduce.

Merge sort is a god-send. The technique is elegant, and its performance is breath-taking compared with the above two. Employing divide and conquer, merge sort breaks the sequence down into pairs of 2, compare them (thus sorting them), then build up the already sorted branches into the original big sequence using the same comparing/sorting philosophy that was used to (seemingly painlessly) sort the 2-element pairs. Procedure:
- Take sequence, find middle point
- Copy values before middle point into a list, and values from and after the middle point to another list
- Redo step 2 for each of the two lists
- Redo step 2 for each of the four lists
- …
- When each of the 2^n lists have 1 element after break down, stop.
- Start sorting each individual element in groups of 2 and building up the sequence.
- If the first value of list A (now called alpha) is smaller than the first element of list B (now called beta), add alpha to the empty list called result. Otherwise, add beta.
- Apply the same philosophy for bigger lists of A and B until no more elements exist to be added from either A or B. Check either A or B for any extra elements and add them all to the rear of result.

Page 1 of 1