Introduction to Sorting Networks

Doug Hoyte

What are sorting networks?

Compare-swap operations

Consider this array:

    >>> array = [3,2,4]
  

Normal way to sort it:

    >>> array.sort()
  

With a sorting network for size 3:

>>> def compare_swap(array, a, b):
...     if array[a] < array[b]:
...         (array[a], array[b]) = (array[b], array[a])
... 
>>> compare_swap(array, 0, 1)
>>> compare_swap(array, 0, 2)
>>> compare_swap(array, 1, 2)
  

knuth diagrams

>>> def compare_swap(array, a, b):
...     if array[a] > array[b]:
...         (array[a], array[b]) = (array[b], array[a])
...
>>> compare_swap(array, 0, 1)
>>> compare_swap(array, 0, 2)
>>> compare_swap(array, 1, 2)
  
This is the corresponding sorting network notation (aka knuth diagram):

x86 asm compiled with CMU lisp

      2E:       MOV     EAX, [EDX+1] ;;; (LET ((A (AREF ARR 0)) (B (AREF ARR 2))) ...)
      31:       MOV     ECX, [EDX+9]
      34:       CMP     EAX, ECX
      36:       JLE     L0
      38:       MOV     [EDX+1], ECX ;;; (SETF (AREF ARR 0) B (AREF ARR 2) A)
      3B:       MOV     [EDX+9], EAX
      3E: L0:   MOV     EAX, [EDX+1] ;;; (LET ((A (AREF ARR 0)) (B (AREF ARR 1))) ...)
      41:       MOV     ECX, [EDX+5]
      44:       CMP     EAX, ECX
      46:       JLE     L1
      48:       MOV     [EDX+1], ECX ;;; (SETF (AREF ARR 0) B (AREF ARR 1) A)
      4B:       MOV     [EDX+5], EAX
      4E: L1:   MOV     EAX, [EDX+5] ;;; (LET ((A (AREF ARR 1)) (B (AREF ARR 2))) ...)
      51:       MOV     ECX, [EDX+9]
      54:       CMP     EAX, ECX
      56:       JLE     L2
      58:       MOV     [EDX+5], ECX ;;; (SETF (AREF ARR 1) B (AREF ARR 2) A)
      5B:       MOV     [EDX+9], EAX
      5E: L2:   ...

Net Construction: Base case

Adding another element

Sink the smallest element to the bottom:

Next treat the top two elements as an independent network and apply the two-element base case:

The second largest element is now in the middle and the largest on top.

Example: size 5, step 1

Move the smallest element to the bottom

Example: size 5, step 2

Then the next smallest to the second from the bottom

Example: size 5, step 3

And the next

Example: size 5, step 4

And finally the base case.


This is bubble sort except that the normal bubble sort returns as soon as it detects the array is sorted. Because this sorting network doesn't, it's a non-natural bubble sort:

Hasse diagrams

  • Order theory is concerned with binary relations between items in a set
  • Useful for analyzing sorting networks
  • A sorting network's job is to turn a partially ordered set (aka poset) into a totally ordered set
  • Here is a hasse diagram depicting an unsorted array (numbers are wire indices):

Hasse diagrams, 2

  • If we do a compare-swap on wires 0 and 2, we end up with a 3-segment poset
  • The hasse diagram indicates we now know the relative order between the elements on wires 0 and 2, but no other orders

Hasse diagrams, 3

  • Comparing wires 1 and 3 gives a 2-segment poset

Hasse diagrams, 4

  • Now we have a single poset to work with
  • This diagram shows that element 0 now contains the largest value
  • The smallest value isn't yet known: it could be in 2 or 3

Hasse diagrams, 5

  • After sorting 2 and 3, the smallest value is now in 3
  • The only thing remaining is to figure out which of the elements in 1 or 2 are larger

Hasse diagrams, 6

  • We now have a totally ordered set
  • Using hasse diagrams we have shown that this is a valid sorting network

Various network algorithms

Bubble
Bose-Nelson
Bitonic
Merge-Exchange

Micro-scale growth of algorithms

Macro-scale growth of algorithms

Big-𝓞 Complexity

Parallelism

Parallelism, 2

Min/Max selection networks

Median selection networks

3x3 image kernel Paeth's 9-element median filter
012
345
678

Bi-directional networks

  • Down arrows are normal comparators that move the smallest to the bottom wire
  • Up-arrows are the opposite: they move the smallest upwards
  • Bi-directional networks can be converted into normal networks by twisting all up comparators into down comparators
  • In order to prevent ugly wire crossings, untwist the wires all the way to the outputs

0-1 Principle

Sort stability

Sort stability, 2

Swaps with 2n input combinations

Input ArraySwaps Required
0000
0012
0101
0112
1000
1011
1100
1110
Avg0.75
Input ArraySwaps Required
0000
0011
0101
0111
1000
1011
1100
1110
Avg0.5
  • Even networks that have the same number of comparators and stages can perform differently depending on input
  • In order to sort the array, the first network performs more swaps for some inputs
  • Whether the first network is slower because of this depends on your architecture and compiler

Swaps with n! input combinations

Input ArraySwaps Required
2100
1201
1022
2011
0212
0123
Avg1.5
Input ArraySwaps Required
2100
1201
1022
2011
0212
0121
Avg1.17
  • A slightly more intuitive way to look at the same thing is to consider n! array permutations instead of a bitstring
  • Here we see that the first network has a pathologically bad case where all swaps are executed
  • In the second network, the first operation is guaranteed to have put either the largest or smallest element into its final position

Closing a timing side-channel

Possibly leaky Constant-time (simple) Constant-time (best)
    int temp;
    if (a[i] < a[j]) {
      temp = a[i];
      a[i] = a[j];
      a[j] = temp;
    }
  
    int temp[2];
    int noswap = !(a[i] < a[j]);
    temp[0] = a[i];
    temp[1] = a[j];
    a[i] = temp[!noswap];
    a[j] = temp[noswap];
  
    #define max(x, y) (x ^ ((x ^ y) & -(x < y)))
    #define min(x, y) (y ^ ((x ^ y) & -(x < y)))
    int temp;
    temp = max(a[i], a[j]);
    a[j] = min(a[i], a[j]);
    a[i] = temp;
  

x86-64 Asm, gcc 4.5.2 -O3

Possibly Leaky
39: mov  0x4(%rdi),%edx   ;
42: mov  0x44(%rdi),%r13d ;
46: cmp  %r13d,%edx       ; a[0] < a[16]
49: mov  %edx,-0x4(%rsp)  ; temp = a[0]
53: jge  0x400746 <+70>   ; maybe goto 70
55: mov  %r13d,0x4(%rdi)  ; a[0] = a[16]
59: mov  %r13d,-0x4(%rsp) ;
64: mov  %edx,%r13d       ;
67: mov  %edx,0x44(%rdi)  ; a[16] = temp
70: ...
  
  • The jge conditional jump at address 53 causes a timing leak on some systems
Constant-time (simple)
16: mov    0x40(%rdi),%edx
19: mov    (%rdi),%ecx
21: cmp    %edx,%ecx        ; diff = !(a[0] < a[16])
31: setge  %al
27: mov    %ecx,-0x8(%rsp)  ; temp[0] = a[0]
23: mov    %edx,-0x4(%rsp)  ; temp[1] = a[16]
37: mov    %eax,%edx
41: xor    $0x1,%edx
48: movslq %edx,%rdx
51: mov    -0x8(%rsp,%rdx,4),%edx
68: mov    %edx,-0x30(%rsp) ; a[0] = temp[!diff]
39: cltq
44: mov    -0x8(%rsp,%rax,4),%eax
59: mov    %eax,-0x40(%rsp) ; a[16] = temp[diff]
63: mov    %eax,0x40(%rdi)
  
  • No jumps
  • Edited/re-ordered for clarity

Timing results: Sorting 32 ints

Intel Atom N550 (gcc 4.5.2)
Algo Asc Desc Const
qsort(3) 6.828 μs 6.025 μs
Leaky 1.841 μs 1.924 μs
Const-simple 3.057 μs 3.057 μs
Const-best 1.900 μs 1.901 μs
AMD Athlon II X2 215 (gcc 4.4.3)
Algo Asc Desc Const
qsort(3) 1.102 μs 0.955 μs
Leaky 0.412 μs 0.398 μs ?
Const-simple 0.749 μs 0.748 μs
Const-best 0.458 μs 0.457 μs
SPARC sun4v (gcc 4.4.3)
Algo Asc Desc Const
qsort(3) 5.900 μs 20.112 μs
Leaky 5.696 μs 4.240 μs
Const-simple 16.512 μs 16.452 μs ?
Const-best 8.048 μs 8.048 μs
ARM v7 rev 2 (gcc 4.4.5)
Algo Asc Desc Const
qsort(3) 10.041 μs 8.166 μs
Leaky 2.333 μs 2.333 μs
Const-simple 4.000 μs 4.000 μs
Const-best 2.833 μs 2.833 μs

Misc Applications

Research directions

Thanks

Conclusion

The end

Questions?