Feeds:
Posts
Comments

Archive for the ‘Algorithms’ Category

During a puzzle solving contest (among programmers), much like the one Bilbo and Gollum had, and Gollum… I mean one of the players poses a question like the following:

You have 2 breakable things (could be eggs, lightbulbs, etc) and you have to find the exact point in an integer range (height, speed, etc e.g. 100 ft drop) where they break with a limited number to tests (usually 20). Close is not good enough. The answer must be exact.

The first off-the-cuff answer is usually “binary search… I’d start at 50 …” at which point the person who posed the question says “Ok. Let’s say the first item breaks at 50. Now what?” You only have 19 tries left and 49 positions to check. It can’t be done that way.

The problem solver has to think a minute but comes back with something new, usually even spacing like “start at 20. If it breaks go by 1s. If it doesn’t break try 40.” at this point the questioner follows with “Ok. What if it breaks at 40?” You only have 18 tries left and 19 positions to check. Works for the first step but not for the nth.

Now the problem solver has something that almost works so they back down to 10. “start a 10, then 20, then 30. Worst case it breaks at 100 and you have to go by 1s from 91 to 99 to find out where.” This does in fact solve the problem but it isn’t optimal.

So how do we revise the problem to get a more optimal solution? Attach a motivation. If, like on a game show, you’re going to keep $1000 for every test you can save in the worst case could you make any money? If so how much?

After a while the problem solver hits upon the idea of a sliding window. “start at 20 then go 18 then 17 more…” this works. Consider:

20 (skip 18) 38 (skip 17) 55 (skip 16) 71 (skip 15) 86 (skip 14) 100 breaks!
87 88 89 90 91 92 93 94 95 96 97 98 99

19 tests in the worst case and the average case takes 20 so that’s $1000, thank you very much. But wait, what if you want to make money in the average case?

Hmm. So now the problem solver hits upon the idea to build in the profit from the beginning. “Let’s say I wan’t to make $1000 every time. That means I have to build the profit in to the number of tests. I’ll only give myself 19 from the beginning” Ho ho! now we’re on to something.

19 (skip 17) 36 (skip 16) 52 (skip 15) 67 (skip 14) 81 (skip 13) 94 breaks!
82 83 84 85 86 87 88 89 90 91 92 93

Check it. The problem solver makes a minimum of $1000. Now the problem solver gets excited and wants to do the same process again but reserving 2 tests… then 3 tests. etc until she finds the most profitable solution.

At this point the questioner refines the problem “Wait. This is taking too long. What if the range was a million and you could do 200000 tests?” Can you think of a faster way than going step by step?

The problem solver will likely suggest binary searching for the optimal number of tests to reserve. “So we know a sliding scale starting at 20 works. Try 10. That doesn’t make money in the worst case. So then try 15. etc.”

And that is quite good enough as far as solving this problem goes… as long as the range isn’t dynamic.

Now here’s the deal. A mathematician (or someone who remembers a couple of useful things from college math) can get an optimal solution for this problem even if the range is dynamic. Here’s how:

Start off with the observation that we’re simply shrinking the range at each step:

(n)+(n-1)+(n-2)+…+1 where the sum is less than or equal to 100

You may recall a story about Einstein being asked by his teacher to add up all the numbers between 1 and 100 in order to keep him busy. Naturally Einstein found a quick way to do it and we can use his equation

(n(n-1))/2

to optimize our solution. We want to pick a sufficiently small n so that the sum of the series either hits the target (100) exactly or barely goes over. This gives us the equation:

(n(n-1))/2 >= 100

multiply both sides by 2

(n(n-1)) >= 200

distribute n

(n * n^2) >= 200

Recalling our exposure to Big-O notation we can observe that n^2 will far outweigh n for sufficiently large values on the right side of the equals so we can simplify to

n^2 >= 200

Take the square root of both sides to solve for n

n >= 14.14

floor (or round down) 14.14 to get 14 (and the smallest value that meets our needs) then check the result by plugging 14 back into the equation.

14 + 14^2 = 210

try one lower (because of the n+ part) to make sure

13 + 13^2 => 182

yep. 14 is the answer. Try it

14 (skip 12) 26 (skip 11) 37 (skip 10) 47 (skip 9) 56 (skip 8 ) 64 (skip 7) 71 (skip 6) 77 (skip 5) 82 (skip 4) 86 (skip 3) 89 (skip 2) 91 92 93 94 95 96 97 98 99 100

Hmm. In the worst case (100) it takes 21 tests. The problem is we didn’t build the cost of the tests into the equation. Let’s do that:

(n(n-1)) / 2 >= 100 + (n-profit)

n will always be larger than profit and we can probably further bound profit to at most n/2 so

(n + n^2)/2 >= 100 + 1/2n

n + n^2 >= 200 + n

n^2 >= 200

get 14.14 again and plug it in.

14^2 >= 200

196 >= 200

false. This means we have to increment n to 15… which we can also do by rounding up instead or rounding down when we take the square root, getting

225 >= 200

Based on our question we can also estimate the profit to be at most 1/2n or 7.5 steps. Try it:

15 (skip 13) 28 (skip 12) 40 (skip 11) 51 (skip 10) 61 (skip 9) 70 (skip 8 ) 78 (skip 7) 85 (skip 6) 91 (skip 5) 96 (skip 4) 100 break
97 98 99

The problem solver can pocket $5000 on average and $6000 in the case where the item breaks at 100.

There we have it, a quicker and more cost effective way to solve the problem for any range.

based on this StackOverflow answer and this StackOverflow question and other variants I’ve heard over the years.

Read Full Post »

Let’s build a Sort method that uses the quicksort algorithm. Note, we’re not trying to use TDD to derive the quicksort algorithm.

Step 1 – test for a null input

Stub the test

[TestFixture]
public class When_asked_to_sort_a_list
{
    [Test]
    public void Given_null()
    {
        Test.Verify(
            with_a_null_input,
            when_asked_to_sort_the_input,
            should_throw_an_ArgumentNullException
            );
    }
	
	public void Sort<T>(IList<T> input)
    {
        throw new NotImplementedException();
	}
}

next fill in the implied details

    private Exception _exception;
    private List<char> _input;

    [SetUp]
    public void BeforeEachTest()
    {
        _input = null;
        _exception = null;
    }
	
    private void with_a_null_input()
    {
        _input = null;
    }

    private void when_asked_to_sort_the_input()
    {
        try
        {
            Sort(_input);
        }
        catch (Exception <span class="hiddenGrammarError" pre="">exception)
        {
            _exception</span> = exception;
        }
    }

    private void should_throw_an_ArgumentNullException()
    {
        _exception.ShouldNotBeNull();
        _exception.ShouldBeOfType<ArgumentNullException>();
    }

run the test

WITH a null input
WHEN asked to sort the input
SHOULD throw an  Argument Null Exception - FAILED

then build the necessary implementation.

    public void Sort<T>(IList<T> input)
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        throw new NotImplementedException();
    }

Step 2 – test for an empty input

Write the test

    [Test]
    public void Given_an_empty_list()
    {
        Test.Verify(
            with_an_empty_list,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception
            );
    }
	
    private void with_an_empty_list()
    {
        _input = new List<char>();
    }

    private void should_not_throw_an_Exception()
    {
        _exception.ShouldBeNull();
    }	

run the test

WITH an empty list
WHEN asked to sort the input
SHOULD not throw an  Exception - FAILED

FluentAssert.Exceptions.ShouldBeNullAssertionException :   Expected: null
  But was:  System.NotImplementedException: The method or operation is not implemented.

then update the implementation.

    public void Sort<T>(IList<T> input)
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        if (!input.Any())
        {
            return;
        }
        throw new NotImplementedException();
    }

Step 3 – test for a single-element input

    [Test]
    public void Given_a_single_element_list()
    {
        Test.Verify(
            with_a_list,
            that_has_element__A,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_element_of_the_list_should_be__A
            );
    }

    private void with_a_list()
    {
        with_an_empty_list();
    }

    private void that_has_element__A()
    {
        _input.Add('A');
    }

    private void the_element_of_the_list_should_be__A()
    {
        _input.ShouldContainAllInOrder(new[] { 'A' });
    }	

run the test

WITH a list
that has element   A - PASSED
WHEN asked to sort the input
SHOULD not throw an  Exception - FAILED

FluentAssert.Exceptions.ShouldBeNullAssertionException :   Expected: null
  But was:  System.NotImplementedException: The method or operation is not implemented.

update the implementation.

    public void Sort<T>(IList<T> input)
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        if (input.Count <= 1)
        {
            return;
        }
        throw new NotImplementedException();
    }

Step 4a – test sorting list { A B }

    [Test]
    public void Given_a_list_with_elements__A_B()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__A,
            followed_by_element__B,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B__in_that_order
            );
    }

    private void that_starts_with_element__A()
    {
        that_has_element__A();
    }

    private void followed_by_element__B()
    {
        _input.Add('B');
    }

    private void the_elements_of_the_sorted_list_should_be__A_B__in_that_order()
    {
        _input.ShouldContainAllInOrder("AB".ToArray());
    }

run the test

WITH a list
that starts with element   A - PASSED
followed by element   B - PASSED
WHEN asked to sort the input
SHOULD not throw an  Exception - FAILED

FluentAssert.Exceptions.ShouldBeNullAssertionException :   Expected: null
  But was:  System.NotImplementedException: The method or operation is not implemented.

update the implementation.

    public void Sort<T>(IList<T> input)
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        if (input.Count <= 1)
        {
            return;
        }

    }

Step 4b – test sorting list { B A }

    [Test]
    public void Given_a_list_with_elements__B_A()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__B,
            followed_by_element__A,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B__in_that_order
            );
    }

    private void that_has_element__B()
    {
        _input.Add('B');
    }

    private void followed_by_element__B()
    {
        that_has_element__B();
    }

    private void that_starts_with_element__B()
    {
        that_has_element__B();
    }

    private void followed_by_element__A()
    {
        that_has_element__A();
    }

run the test

WITH a list
that starts with element   B - PASSED
followed by element   A - PASSED
WHEN asked to sort the input
SHOULD not throw an  Exception - PASSED
the elements of the sorted list should be   A  B  in that order - FAILED

FluentAssert.Exceptions.ShouldBeEqualAssertionException : at offset 0

update the implementation.

    public void Sort<T>(IList<T> input) 
        where T : IComparable
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        if (input.Count <= 1)
        {
            return;
        }
        SwapIfOutOfOrder(input, 0, 1);
    }

    private static void SwapIfOutOfOrder<T>(IList<T> input, int first, int second)
        where T : IComparable
    {
        if (input[first].CompareTo(input[second]) == 1)
        {
            var temp = input[first];
            input[first] = input[second];
            input[second] = temp;
        }
    }

Step 4c – track comparisons

In order to verify that the sort algorithm used by Sort is quicksort we need the ability to introspect what it is doing. Verifying that the output is sorted is insufficient because it could be using some other sort algorithm and we specifically want to implement quicksort. In a previous post we passed an instrumented IComparer along with the list to the sorted to the Sort method. This time we’ll try something different. We’ll wrap the objects being sorted so that we can count the number of comparisons performed during the sort. This has the advantage of working with any Sort implementation that can sort IComparable elements and with lists that contain duplicate elements.

public class ComparableWrapper<T> : IComparable
    where T : IComparable
{
    private readonly T _value;
    private readonly Action _action;

    public ComparableWrapper(T value, Action action)
    {
        _value = value;
        _action = action;
    }

    public T Value
    {
        get { return _value; }
    }

    public int CompareTo(object obj)
    {
        var otherWrapper = obj as ComparableWrapper<T>;
        if (otherWrapper == null)
        {
            throw new ArgumentException("Object is not a ComparableWrapper");
        }
        _action();
        return _value.CompareTo(otherWrapper._value);
    }
}

update the tests to use the wrapper

    private List<ComparableWrapper<char>> _input;
    private int _comparisonCount;

    private void IncrementCount()
    {
        _comparisonCount++;
    }
	
	[SetUp]
    public void BeforeEachTest()
    {
        _comparisonCount = 0;
        _input = null;
        _exception = null;
    }
	
    private void with_an_empty_list()
    {
        _input = new List<ComparableWrapper<char>>();
    }	
	
    private void Add(char value)
    {
        _input.Add(new ComparableWrapper<char>(value, IncrementCount));
    }

    private void that_has_element__A()
    {
        Add('A');
    }

    private void that_has_element__B()
    {
        Add('B');
    }

    private void the_element_of_the_list_should_be__A()
    {
        _input.Select(x => x.Value)
              .ShouldContainAllInOrder(new[] {'A'});
    }
	
    private void the_elements_of_the_sorted_list_should_be__A_B__in_that_order()
    {
        _input.Select(x => x.Value)
          .ShouldContainAllInOrder("AB".ToArray());
    }	

and add to the tests to check the comparison count.

    [Test]
    public void Given_a_list_with_elements__A_B()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__A,
            followed_by_element__B,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B__in_that_order,
            should_perform_at_most_1_comparison
            );
    }

    private void should_perform_at_most_1_comparison()
    {
        _comparisonCount.ShouldBeLessThanOrEqualTo(1);
    }
	
    [Test]
    public void Given_a_list_with_elements__B_A()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__B,
            followed_by_element__A,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B__in_that_order,
            should_perform_at_most_1_comparison
            );
    }	

Step 5a – test sorting list { C B A }

    [Test]
    public void Given_a_list_with_elements__C_B_A()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__C,
            followed_by_element__B,
            followed_by_element__A,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B_C__in_that_order,
            should_perform_at_most_3_comparisons
            );
    }

    private void that_starts_with_element__C()
    {
        that_has_element__C();
    }

    private void that_has_element__C()
    {
        Add('C');
    }

    private void the_elements_of_the_sorted_list_should_be__A_B_C__in_that_order()
    {
        _input.Select(x => x.Value)
            .ShouldContainAllInOrder("ABC".ToArray());
    }

    private void should_perform_at_most_3_comparisons()
    {
        _comparisonCount.ShouldBeLessThanOrEqualTo(3);
    }

run the test

WITH a list
that starts with element   C - PASSED
followed by element   B - PASSED
followed by element   A - PASSED
WHEN asked to sort the input
SHOULD not throw an  Exception - PASSED
the elements of the sorted list should be   A  B  C  in that order - FAILED

FluentAssert.Exceptions.ShouldBeEqualAssertionException : at offset 0

update the implementation.

    public void Sort<T>(IList<T> input)
        where T : IComparable
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        if (input.Count <= 1)
        {
            return;
        }
        SwapIfOutOfOrder(input, 0, 1);
        if (input.Count == 2)
        {
            return;
        }
        SwapIfOutOfOrder(input, 0, 2);
        SwapIfOutOfOrder(input, 1, 2);
    }

Step 5b – test sorting all permutations of list { C B A }

To make sure the code works for all permutations of the list we’ll make use of the Permute extension and FormatForDisplay extension built in previous postings.

    [Test]
    public void Should_be_able_to_sort_all_permutations_of__C_B_A__with_at_most_3_comparisons_each()
    {
        with_a_list();
        that_starts_with_element__C();
        followed_by_element__B();
        followed_by_element__A();

        foreach (var permutation in _input.Permute(3))
        {
            _comparisonCount = 0;
            _input = permutation.ToList();
            try
            {
                when_asked_to_sort_the_input();
                the_elements_of_the_sorted_list_should_be__A_B_C__in_that_order();
                should_perform_at_most_3_comparisons();
            }
            catch (Exception e)
            {
                Console.WriteLine("failed on permutation " + permutation.Select(x => x.Value).FormatForDisplay());
                throw;
            }
        }
    }

run the test and get an unexpected failure

failed on permutation { C A B }

FluentAssert.Exceptions.ShouldBeEqualAssertionException : at offset 0

due to an assumption made about the values returned by CompareTo. Fix it:

    private static void SwapIfOutOfOrder<T>(IList<T> input, int first, int second)
        where T : IComparable
    {
        if (input[first].CompareTo(input[second]) > 0)
        {
            var temp = input[first];
            input[first] = input[second];
            input[second] = temp;
        }
    }

Step 6a – test sorting list { D C A B }

    [Test]
    public void Given_a_list_with_elements__D_C_A_B()
    {
        Test.Verify(
            with_a_list,
            that_starts_with_element__D,
            followed_by_element__C,
            followed_by_element__A,
            followed_by_element__B,
            when_asked_to_sort_the_input,
            should_not_throw_an_Exception,
            the_elements_of_the_sorted_list_should_be__A_B_C_D__in_that_order,
            should_perform_at_most_6_comparisons
            );
    }
	
	private void that_starts_with_element__D()
    {
        that_has_element__D();
    }
	
    private void that_has_element__D()
    {
        Add('D');
    }

	private void followed_by_element__C()
    {
        that_has_element__C();
    }
	
    private void the_elements_of_the_sorted_list_should_be__A_B_C_D__in_that_order()
    {
        _input.Select(x => x.Value)
            .ShouldContainAllInOrder("ABCD".ToArray());
    }

    private void should_perform_at_most_6_comparisons()
    {
        _comparisonCount.ShouldBeLessThanOrEqualTo(6);
    }	

run the test

WITH a list
that starts with element   D - PASSED
followed by element   C - PASSED
followed by element   A - PASSED
followed by element   B - PASSED
WHEN asked to sort the input
SHOULD not throw an  Exception - PASSED
the elements of the sorted list should be   A  B  C  D  in that order - FAILED

FluentAssert.Exceptions.ShouldBeEqualAssertionException : at offset 0

update the implementation to use bubble sort.

    public void Sort<T>(IList<T> input) where T : IComparable
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }
        for (int outer = input.Count; outer >= 1; outer--)
        {
            for (int inner = 0; inner < outer - 1; inner++)
            {
                SwapIfOutOfOrder(input, inner, inner + 1);
            }
        }
    }

Step 6b – test sorting all permutations of list { D C A B } with at most 6 comparisons each

    [Test]
    public void Should_be_able_to_sort_all_permutations_of__D_C_A_B__with_at_most_6_comparisons_each()
    {
        with_a_list();
        that_starts_with_element__D();
        followed_by_element__C();
        followed_by_element__A();
        followed_by_element__B();

        foreach (var permutation in _input.Permute(4))
        {
            _comparisonCount = 0;
            _input = permutation.ToList();
            try
            {
                when_asked_to_sort_the_input();
                the_elements_of_the_sorted_list_should_be__A_B_C_D__in_that_order();
                should_perform_at_most_6_comparisons();
            }
            catch (Exception e)
            {
                Console.WriteLine("failed on permutation " + permutation.Select(x => x.Value).FormatForDisplay());
                throw;
            }
        }
    }

test passes.

Step 6c – test sorting all permutations of list { D C A B } with less than 6 comparisons each on average

This is the where we force the implementation away from bubble sort. Bubble sort always uses 6 comparisons to sort 4 elements but quicksort only needs 6 in the worst case.

    [Test]
    public void Should_be_able_to_sort_all_permutations_of__D_C_A_B__with_less_than_6_comparisons_each_on_average()
    {
        with_a_list();
        that_starts_with_element__D();
        followed_by_element__C();
        followed_by_element__A();
        followed_by_element__B();

        int totalComparisons = 0;
        int totalPermutations = 0;
        foreach (var permutation in _input.Permute(4))
        {
            totalPermutations++;
            _comparisonCount = 0;
            _input = permutation.ToList();
            try
            {
                when_asked_to_sort_the_input();
                the_elements_of_the_sorted_list_should_be__A_B_C_D__in_that_order();
                should_perform_at_most_6_comparisons();
                totalComparisons += _comparisonCount;
            }
            catch (Exception e)
            {
                Console.WriteLine("failed on permutation " + permutation.Select(x => x.Value).FormatForDisplay());
                throw;
            }
        }

        var average = totalComparisons / (double)totalPermutations;
        average.ShouldBeLessThan(6, () => "Incorrect average number of comparisons - " + average);
    }

run the test

FluentAssert.Exceptions.ShouldBeLessThanAssertionException : Incorrect average number of comparisons - 6

update the implementation using the wikipedia algorithm as a guide.

    public void Sort<T>(IList<T> input)
        where T : IComparable
    {
        if (input == null)
        {
            throw new ArgumentNullException();
        }

        var segments = new Stack<int[]>();
        segments.Push(new[] { 0, input.Count - 1 });
        while (segments.Any())
        {
            var segment = segments.Pop();
            int first = segment[0];
            int last = segment[1];
            int middle = GetMiddle(first, last);
            while (first < last)
            {
                while (first < middle && AreInCorrectOrder(input, first, middle))
                {
                    first++;
                }
                while (last > middle && AreInCorrectOrder(input, middle, last))
                {
                    last--;
                }
                if (first < last)
                {
                    Swap(input, first, last);
                    first++;
                    last--;
                    if (first - 1 == middle)
                    {
                        middle = ++last;
                    }
                    else if (last + 1 == middle)
                    {
                        middle = --first;
                    }
                }
            }
            if (middle - segment[0] > 1)
            {
                segments.Push(new[] { segment[0], middle - 1 });
            }
            if (segment[1] - middle > 1)
            {
                segments.Push(new[] { middle + 1, segment[1] });
            }
        }
    }

    private static int GetMiddle(int first, int last)
    {
        return first + ((last - first) >> 1);
    }
	
    private static bool AreInCorrectOrder<T>(IList<T> input, int first, int second)
        where T : IComparable
    {
        return input[first].CompareTo(input[second]) <= 0;
    }	

    private static void Swap<T>(IList<T> input, int first, int second)
    {
        var temp = input[first];
        input[first] = input[second];
        input[second] = temp;
    }

Step 7 – make sure the implementation can handle all permutations of list { A B C D E F G H }

    [Test]
    public void Should_be_able_to_sort_all_permutations_of__A_B_C_D_E_F_G_H()
    {
        with_a_list();
        "ABCDEFGH".ToList().ForEach(Add);

        foreach (var permutation in _input.Permute(8))
        {
            _comparisonCount = 0;
            _input = permutation.ToList();
            try
            {
                when_asked_to_sort_the_input();
                the_elements_of_the_sorted_list_should_be__A_B_C_D_E_F_G_H_in_that_order();
            }
            catch (Exception e)
            {
                Console.WriteLine("failed on permutation " + permutation.Select(x => x.Value).FormatForDisplay());
                throw;
            }
        }
    }

    private void the_elements_of_the_sorted_list_should_be__A_B_C_D_E_F_G_H_in_that_order()
    {
        _input.Select(x => x.Value)
            .ShouldContainAllInOrder("ABCDEFGH".ToArray());
    }

test passes.

Final – how does this implementation compare to List.Sort?

First get the number of comparisons our Sort uses across all permutations of { A B C D E F G H }

    var input = "ABCDEFGH"
        .Select(x=>new ComparableWrapper<char>(x, IncrementCount))
        .ToList();

    foreach (var permutation in input.Permute(8))
    {
        Sort(permutation);
    }
    Console.WriteLine("total comparisons: "+_comparisonCount);

output

total comparisons: 682272

Now get the number of comparisons List.Sort uses for the same set of inputs.

    var input = "ABCDEFGH"
        .Select(x=>new ComparableWrapper<char>(x, IncrementCount))
        .ToList();

    foreach (var permutation in input.Permute(8))
    {
        permutation.ToList().Sort();
    }
    Console.WriteLine("total comparisons: "+_comparisonCount);

output

total comparisons: 1209888

whoa! Our no-frills implementation of quicksort uses only 56% of the comparisons that List.Sort requires to achieve the same goal!

682272 / 1209888 = 0.563913354

It kind of makes you wonder: which sort algorithm does List.Sort use?

Read Full Post »

I’m curious which sort algorithm the built in List.Sort uses in the .Net framework (runtime version 2.0.50727). One way to find out is to observe the code in action when we give it an array to sort. How? List.Sort has a convenient overload that takes an IComparer<T>. We can instrument an IComparer<T> to record which items are compared and see if we can discern the algorithm. Let’s start with a small array:

[Test]
public void Show_List_Sort_algorithm()
{
    var input = new List<char>{ 'B', 'A' };

make a copy of it

    var copy = input.ToList();

write out the contents of the original to the display

    Console.WriteLine("starting with " + input.FormatForDisplay());

next create an instrumented comparer

    var charComparer = new InstrumentedComparer<char>(copy);

then sort the copy using the built-in List.Sort() and our instrumented comparer

    copy.Sort(charComparer);

write out the sorted contents to the display

    Console.WriteLine("results in "+copy.FormatForDisplay());

once the sort is complete we get the list of comparisons and swaps that were performed from the instrumented comparer

    var result = charComparer.Comparisons;

and write them to the display too.

    WriteComparisonsAndSwaps(result, input);
}

Next we need an instrumented comparer

public class InstrumentedComparer<T> : IComparer<T>
{

that takes the list we are going to sort.

    private readonly IList<T> _current;
    private List<T> _previousState;

    public InstrumentedComparer(IList<T> original)
    {
        _current = original;
        _previousState = original.ToList();
    }

We’ll need access to the comparisons that were performed.

    readonly List<Comparison> _comparisons = new List<Comparison>();
    public IEnumerable<Comparison> Comparisons
    {
        get { return _comparisons; }
    }

We also need to implement the only method on the IComparer<T> interface. We don’t know whether this comparison is going to be used to perform a swap or not but we can determine and record that the previous one resulted in a swap and finish the method by returning the result of comparing the two elements.

    public int Compare(T elementA, T elementB)
    {
        RecordSwaps();
        RecordComparisons(elementA, elementB);

        _previousState = _current.ToList();

        return elementA.CompareTo(elementB);
    }

The use of CompareTo requires T to be comparable so add a constraint to the class.

public class InstrumentedComparer<T> : IComparer<T> where T : IComparable

Now implement the recording methods. Figure out if any values changed positions and if so get their indexes and create a swap record.

    private void RecordSwaps()
    {
        var changedIndexes = Enumerable.Range(0, _current.Count)
            .Where(i => _current[i].CompareTo(_previousState[i]) != 0)
            .ToList();

        // If nothing was swapped then we're done.
        bool swapped = changedIndexes.Any();
        if (!swapped)
        {
            return;
        }

        var last = _comparisons.Last();
        Comparison swappedComparison;
        if (last.IndexA == changedIndexes[0] && last.IndexB == changedIndexes[1])
        {
            swappedComparison = last;
        }
        else
        {
            swappedComparison = new Comparison(changedIndexes[0], changedIndexes[1]);
            _comparisons.Add(swappedComparison);
        }
        swappedComparison.Swapped = true;
    }

As for recording the comparisons, the method’s parameters are array elements instead of indexes so we’ll have to determine the indexes ourselves. For the purposes of recording we’ll assume that there are no duplicates in the array.

    private void RecordComparisons(T elementA, T elementB)
    {
        var indexA = _current.IndexOf(elementA);
        var indexB = _current.IndexOf(elementB);

        var comparison = new Comparison(indexA, indexB)
            {
                Compared = true
            };
        _comparisons.Add(comparison);
    }
}

Next we need to implement the Comparison class.

public class Comparison
{

it takes the two indexes that were compared

    public Comparison(int indexA, int indexB)
    {
        IndexA = indexA;
        IndexB = indexB;
    }

    public int IndexA { get; private set; }
    public int IndexB { get; private set; }

tracks whether a comparion, swap, or both occurred

    public bool Swapped { get; set; }
    public bool Compared { get; set; }

provides a convenient method for swapping two elements

    public void Swap<T>(IList<T> data)
    {
        T temp = data[IndexA];
        data[IndexA] = data[IndexB];
        data[IndexB] = temp;
    }

and overrides ToString for display purposes

    public override string ToString()
    {
        return (1 + IndexA) + "<->" + (1 + IndexB);
    }
}

Next we need to implement the method that writes the comparison information to the display.

private static void WriteComparisonsAndSwaps(IEnumerable<Comparison> comparisons, IList<char> input)
{

Write each comparison that was performed

    foreach(var comparison in comparisons)
    {

and if a swap was performed

        bool swap = comparison.Swapped;
        if (swap)
        {

perform the swap on the original

            comparison.Swap(input);
        }

and display information about the swap

        string text = String.Format("{0}{1}{2} {3} {4}",
                                    comparison.Compared ? "compare" : "",
                                    comparison.Compared && swap ? " and " : "",
                                    swap ? "swap" : "",
                                    comparison,
                                    input.FormatForDisplay());
        Console.WriteLine(text);
    }

Finally we need to implement the FormatForDisplay extension method.

public static class IEnumerablTExtensions
{
    public static string FormatForDisplay<T>(this IEnumerable<T> input)
    {
        return "{ " + String.Join(" ", input.Select(x => x.ToString()).ToArray()) + " }";
    }
}

Okay, now we have a framework for observing List.Sort() in action. Let’s try it out on { B A}

starting with { B A }
results in { A B }
compare and swap 1<->2 { A B }
compare  1<->2 { A B }
compare  1<->1 { A B }
compare  1<->2 { A B }
compare  1<->1 { A B }

Huh? That’s strange. What kind of algorithm would use 6 comparisons to order 2 elements? One might hypothesize from the duplicated

compare  1<->2 { A B }
compare  1<->1 { A B }

pairs that some kind of partitioning is happening. There may also be an off-by-one error due to the comparison of the element in position 1 with itself. Let’s try a larger input to get more data.

    var input = new[] { 'D', 'C', 'B', 'A' };

gets

starting with { D C B A }
results in { A B C D }
compare and swap 1<->2 { C D B A }
compare and swap 1<->4 { A D B C }
compare and swap 2<->4 { A C B D }
compare 1<->2 { A C B D }
compare 2<->2 { A C B D }
compare 2<->4 { A C B D }
compare and swap 2<->3 { A B C D }
compare 1<->2 { A B C D }
compare 1<->2 { A B C D }
compare 1<->1 { A B C D }
compare 1<->2 { A B C D }
compare 1<->1 { A B C D }
compare 3<->4 { A B C D }
compare 3<->4 { A B C D }
compare 3<->3 { A B C D }
compare 3<->4 { A B C D }
compare 3<->3 { A B C D }

We can see more comparison of elements with themselves (1<->1, 2<->2, 3<->3) confirming the off-by-one error int the code. There’s also definitely some partitioning happening. Compare

compare  1<->2 { A B C D }
compare  1<->2 { A B C D }
compare  1<->1 { A B C D }
compare  1<->2 { A B C D }
compare  1<->1 { A B C D }

with

compare  3<->4 { A B C D }
compare  3<->4 { A B C D }
compare  3<->3 { A B C D }
compare  3<->4 { A B C D }
compare  3<->3 { A B C D }

the same relative positions are compared in the same order. But what is happening in the first part?

compare and swap 1<->2 { C D B A }
compare and swap 1<->4 { A D B C }
compare and swap 2<->4 { A C B D }
compare  1<->2 { A C B D }
compare  2<->2 { A C B D }
compare  2<->4 { A C B D }
compare and swap 2<->3 { A B C D }

Let’s compare this sequence with that of a sorted array.

    var input = new[] { 'A', 'B', 'C', 'D' };

output

results in { A B C D }
starting with { A B C D }
compare  1<->2 { A B C D }
compare  1<->4 { A B C D }
compare  2<->4 { A B C D }
compare  1<->2 { A B C D }
compare  2<->2 { A B C D }
compare  2<->4 { A B C D }
compare  2<->3 { A B C D }
compare  2<->2 { A B C D }
compare  3<->4 { A B C D }
compare  3<->4 { A B C D }
compare  3<->3 { A B C D }
compare  3<->4 { A B C D }
compare  3<->3 { A B C D }

The first 7 and last 5 are the same between the two runs so we can figure those are part of the standard algorithm. In the middle the ordered run does

compare  2<->2 { A B C D }

while the reversed run does

compare  1<->2 { A B C D }
compare  1<->2 { A B C D }
compare  1<->1 { A B C D }
compare  1<->2 { A B C D }
compare  1<->1 { A B C D }

Let’s capture the comparisons for an even longer array to get a better idea what is happening in the first part and to see if the partitioning is recursive.

        var input = "ABCDEFGH".Reverse().ToArray(); 

output

starting with { H G F E D C B A }
results in { A B C D E F G H }
compare and swap 1<->4 { E G F H D C B A }
compare and swap 1<->8 { A G F H D C B E }
compare and swap 4<->8 { A G F E D C B H }
compare 1<->4 { A G F E D C B H }
compare 2<->4 { A G F E D C B H }
compare 4<->8 { A G F E D C B H }
compare 4<->7 { A G F E D C B H }
swap 2<->7 { A B F E D C G H }
compare 3<->4 { A B F E D C G H }
compare 4<->6 { A B F E D C G H }
swap 3<->6 { A B C E D F G H }
compare 4<->4 { A B C E D F G H }
compare and swap 4<->5 { A B C D E F G H }
compare 1<->2 { A B C D E F G H }
compare 1<->4 { A B C D E F G H }
compare 2<->4 { A B C D E F G H }
compare 1<->2 { A B C D E F G H }
compare 2<->2 { A B C D E F G H }
compare 2<->4 { A B C D E F G H }
compare 2<->3 { A B C D E F G H }
compare 2<->2 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }
compare 5<->6 { A B C D E F G H }
compare 5<->8 { A B C D E F G H }
compare 6<->8 { A B C D E F G H }
compare 5<->6 { A B C D E F G H }
compare 6<->6 { A B C D E F G H }
compare 6<->8 { A B C D E F G H }
compare 6<->7 { A B C D E F G H }
compare 6<->6 { A B C D E F G H }
compare 7<->8 { A B C D E F G H }
compare 7<->8 { A B C D E F G H }
compare 7<->7 { A B C D E F G H }
compare 7<->8 { A B C D E F G H }
compare 7<->7 { A B C D E F G H }

We definitely have some recursion. Also note the bare swaps without direct comparison (highlighted).

I think we have enough information to begin reconstructing the algorithm.

The first 3 comparisons

compare and swap 1<->4 { E G F H D C B A }
compare and swap 1<->8 { A G F H D C B E }
compare and swap 4<->8 { A G F E D C B H }

tell us there are 3 indexes. One each for first, middle, and last. Let’s built a Sort method to test against.

    public void Sort<T>(IList<T> items, IComparer<T> comparer)
    {
        int first = 0;
        int last = items.Count -1;
        int middle = last / 2;

Compare the first and middle elements and if they are out of order swap them.

        if (comparer.Compare(items[first], items[middle]) > 0)
        {
            Swap(items, first, middle);
        }

Do the same with the first and last elements.

        if (comparer.Compare(items[first], items[last]) > 0)
        {
            Swap(items, first, last);
        }

And then with the middle and last elements.

        if (comparer.Compare(items[middle], items[last]) > 0)
        {
            Swap(items, middle, last);
        }

Now analyzing the log again we find two sequences of comparison against the middle. One rising

compare 1<->4 { A G F E D C B H }
compare 2<->4 { A G F E D C B H }
compare 3<->4 { A B F E D C G H }
compare 4<->4 { A B C E D F G H }

and one falling.

compare 4<->8 { A G F E D C B H }
compare 4<->7 { A G F E D C B H }
compare 4<->6 { A B F E D C G H }
compare and swap 4<->5 { A B C D E F G H }

This implies a loop where first increments toward the middle while last is decremented towards the middle.

        while(first < last)
        {
            
        }

Now look at log lines 4-8

compare 1<->4 { A G F E D C B H }
compare 2<->4 { A G F E D C B H }
compare 4<->8 { A G F E D C B H }
compare 4<->7 { A G F E D C B H }
swap 2<->7 { A B F E D C G H }

We start at the top and search for the first element that should be swapped with the element in the middle. Element 2 (G) should be swapped with element 4 (E). Then starting with the last position and descending we search for the first element that is less than the element in the middle. Element 7 (B) should be swapped with element 4 (E). We can therefore reason that element 2 (G) should be swapped with element 7 (B) so swap them. We get:

        while(first < last)
        {
            while (first < middle && comparer.Compare(items[first], items[middle]) <= 0)
            {
                first++;
            }

            while (last > middle && comparer.Compare(items[middle], items[last]) <= 0)
            {
                last--;
            }

            if (first < last)
            {
                Swap(items, first, last);
            }
            first++;
            last--;
        }

We can see the outer while loop continuing to run in log lines 9-11

compare 3<->4 { A B F E D C G H }
compare 4<->6 { A B F E D C G H }
swap 3<->6 { A B C E D F G H }

In lines 12-13 first = middle but last > middle

compare 4<->4 { A B C E D F G H }
compare and swap 4<->5 { A B C D E F G H }

This implies a change to our implementation to allow first to equal middle.

        while(first < last)
        {
            while (first <= middle && comparer.Compare(items[first], items[middle]) < 0)
            {
                first++;
            }

            while (last > middle && comparer.Compare(items[middle], items[last]) <= 0)
            {
                last--;
            }

            if (first < last)
            {
                Swap(items, first, last);
            }
            first++;
            last--;
        }

This also reproduces the off-by-one error we saw in the log earlier. The algorithm shouldn’t be comparing elements that have the same index.

So once we’ve finished the run through the while loop, first >= last, all elements that have values greater than that of the middle element should have been moved to positions after it, and all elements that have values less than that of the middle element should have been moved to positions before it. Let’s assume that is true for the moment. What happens next? Look at log positions 14-16

compare 1<->2 { A B C D E F G H }
compare 1<->4 { A B C D E F G H }
compare 2<->4 { A B C D E F G H }

This is recursion kicking in. We’re now repeating the entire sequence for elements 1-4 (top half of the input). We see the same thing happen in log lines 27-29 for the bottom half of the input.

compare 5<->6 { A B C D E F G H }
compare 5<->8 { A B C D E F G H }
compare 6<->8 { A B C D E F G H }

This means we need to finish off our method with calls to repeat the procedure for the top half and bottom half of the input. This further means we need to extract an overload for the Sort method that takes first and last as parameters. To do that we’ll have to keep a copy of the original values of first and last so we can use them in the recursion. We also need a way to stop the recursion. We’ll do that when first and last are adjacent.

    public void Sort<T>(IList<T> items, IComparer<T> comparer)
    {
        int first = 0;
        int last = items.Count -1;
        Sort(items, comparer, first, last);
    }

    private void Sort<T>(IList<T> items, IComparer<T> comparer, int first, int last)
    {
        if (last - first <= 1)
        {
            return;
        }
        int middle = (last + first) / 2;
        if (comparer.Compare(items[first], items[middle]) > 0)
        {
            Swap(items, first, middle);
        }
        if (comparer.Compare(items[first], items[last]) > 0)
        {
            Swap(items, first, last);
        }
        if (comparer.Compare(items[middle], items[last]) > 0)
        {
            Swap(items, middle, last);
        }

        int originalFirst = first;
        int originalLast = last;
        while(first < last)
        {
            while (first <= middle && comparer.Compare(items[first], items[middle]) < 0)
            {
                first++;
            }

            while (last > middle && comparer.Compare(items[middle], items[last]) <= 0)
            {
                last--;
            }

            if (first < last)
            {
                Swap(items, first, last);
            }
            first++;
            last--;
        }

        Sort(items, comparer, originalFirst, middle);
        Sort(items, comparer, 1+middle, originalLast);
    }

So to continue checking our implementation against the log. Lines 14-21 show what happened in the recursion against the top half of the input.

compare 1<->2 { A B C D E F G H }  // first vs middle
compare 1<->4 { A B C D E F G H }  // first vs last
compare 2<->4 { A B C D E F G H }  // middle vs last
// now the loop
compare 1<->2 { A B C D E F G H }  // first vs middle; first++
compare 2<->2 { A B C D E F G H }  // first vs middle
compare 2<->4 { A B C D E F G H }  // last vs middle; last--
compare 2<->3 { A B C D E F G H }  // last vs middle; last--
compare 2<->2 { A B C D E F G H }  // last vs middle !!

Ah. We found a difference in our implementation. The second inner while loop should be

            while (last >= middle && comparer.Compare(items[middle], items[last]) < 0)
            {
                last--;
            }

This is the other off-by-one error in the algorithm that causes an unnecessary comparison.

Continuing with the log in lines 22-26

compare 3<->4 { A B C D E F G H } 
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }

We know that the method tried to recurse again for the top half of the top half (first=0 and last=1) and hit the recursion guard clause. It then recursed for the bottom half of the top half. This implies that there is a recursion guard just before each of the recursion calls and our initial guard only checks for single-element inputs. So we get the following revised implementation:

    private void Sort<T>(IList<T> items, IComparer<T> comparer, int first, int last)
    {
        if (last - first < 1)
        {
            return;
        }
        int middle = (last + first) / 2;
        if (comparer.Compare(items[first], items[middle]) > 0)
        {
            Swap(items, first, middle);
        }
        if (comparer.Compare(items[first], items[last]) > 0)
        {
            Swap(items, first, last);
        }
        if (comparer.Compare(items[middle], items[last]) > 0)
        {
            Swap(items, middle, last);
        }

        int originalFirst = first;
        int originalLast = last;
        while(first < last)
        {
            while (first <= middle && comparer.Compare(items[first], items[middle]) < 0)
            {
                first++;
            }

            while (last >= middle && comparer.Compare(items[middle], items[last]) < 0)
            {
                last--;
            }

            if (first < last)
            {
                Swap(items, first, last);
            }
            first++;
            last--;
        }

        if (middle - originalFirst >= 1)
        {
            Sort(items, comparer, originalFirst, middle);
        }
        if (originalLast - middle > 1)
        {
            Sort(items, comparer, 1 + middle, originalLast);
        }
    }

Continuing our analysis of log lines 22-26

compare 3<->4 { A B C D E F G H } // first vs middle !
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }

Wait a second. We recursed with 1+middle = 2 and originalLast=3 so the new first is 2 (offset 3) and the new last is 3 (offset 4). Line 22 shows that the middle is also 3 (offset 4) so we aren’t calculating the middle correctly. Should it be the following?

     int middle = (last + first + 1) / 2;

Nope, that gets the wrong value when first = 0 and last = 7. Ok, let’s pull it out to a method

    int middle = GetMiddle(first, last);
	 
    public int GetMiddle(int first, int last)
    {
	    // not (first + last) / 2;
        // not (first + last + 1) / 2;
        throw new NotImplementedException();
    }	 

and get some tests around the known values

    [Test]
    public void Middle_should_be_3_when_first_is_0_and_last_is_7()
    {
	    // based on log line 1
        const int first = 0;
        const int last = 7;
        var result = GetMiddle(first, last);
        result.ShouldBeEqualTo(3);
    }

    [Test]
    public void Middle_should_be_1_when_first_is_0_and_last_is_3()
    {
        // based on log line 14
        const int first = 0;
        const int last = 3;
        var result = GetMiddle(first, last);
        result.ShouldBeEqualTo(1);
    }	
	
    [Test]
    public void Middle_should_be_3_when_first_is_2_and_last_is_3()
    {
	    // based on log line 22
        const int first = 2;
        const int last = 3;
        var result = GetMiddle(first, last);
        result.ShouldBeEqualTo(3);
    }

This looks like a rounding issue. Let’s see here…

(7 + 0) = 7 / 2 = 3.5 expect 3
(3 + 0) = 3 / 2 = 1.5 expect 1
(3 + 2) = 5 / 2 = 2.5 expect 3

So if the result is odd we round down otherwise round up? The built-in Math.Round only offers rounding options AwayFromZero and ToEven so I guess we have to roll our own. We can use the shift operator to do a quick divide by 2 and truncate.

    public int GetMiddle(int first, int last)
    {
        var truncMiddle = (first + last) >> 1;
        return (truncMiddle&1) == 1 ? truncMiddle : 1 + truncMiddle;
    }

Cool. That fixes it. So now back to analysis of log lines 22-26

compare 3<->4 { A B C D E F G H } // first vs middle
compare 3<->4 { A B C D E F G H } // first vs last
compare 3<->3 { A B C D E F G H } // middle vs last !! expected 4<->4
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }

What happened here? it looks like instead of comparing middle to last it compared first to first. Oh wait, look at the last 3 lines. Doesn’t that look like the first-middle, first-last, middle-last pattern? It skipped the middle check.

compare 3<->4 { A B C D E F G H } // first vs middle
compare 3<->4 { A B C D E F G H } // first vs last
// skip middle vs last
compare 3<->3 { A B C D E F G H }
compare 3<->4 { A B C D E F G H }
compare 3<->3 { A B C D E F G H }

We saw this same pattern before when we sorted {B A}.

starting with { B A }
results in { A B }
compare and swap 1<->2 { A B } // first vs middle (rounded up)
compare  1<->2 { A B } // first vs last
// skip middle vs last
compare  1<->1 { A B } // first vs first?
compare  1<->2 { A B } // first vs last?
compare  1<->1 { A B } // first vs first?

Ok, here’s what it looks like it is doing. If it skips the middle vs last it is setting middle to the value of first then entering the while loop.

        if (middle < last)
        {
            if (comparer.Compare(items[middle], items[last]) > 0)
            {
                Swap(items, middle, last);
            }
        }
        else
        {
            middle = first; 
        }

The problem is when we get to the recursion check middle is now equal to first so we recurse again with the same values. One solution is to set a flag when that happens

        bool recurseBottomHalf = true;
        if (middle < last)
        {
            if (comparer.Compare(items[middle], items[last]) > 0)
            {
                Swap(items, middle, last);
            }
        }
        else
        {
            recurseBottomHalf = false;
            middle = first; 
        }

and check it as part of the recursion guard.

        if (originalLast - middle > 1 && recurseBottomHalf)
        {
            Sort(items, comparer, 1 + middle, originalLast);
        }

We now return from recursion and do the same process for the bottom half (4-7) of the input. Working through the remaining log lines gives us this slightly revised result:

    private void Sort<T>(IList<T> items, IComparer<T> comparer, int first, int last)
    {
        if (last - first < 1)
        {
            return;
        }

        int middle = GetMiddle(first, last);
        bool swappedMiddle = false;
        if (comparer.Compare(items[first], items[middle]) > 0)
        {
            swappedMiddle = true;
            Swap(items, first, middle);
        }
        if (comparer.Compare(items[first], items[last]) > 0)
        {
            swappedMiddle = true;
            Swap(items, first, last);
        }
        bool recurseBottomHalf = true;
        if (middle < last)
        {
            if (comparer.Compare(items[middle], items[last]) > 0)
            {
                Swap(items, middle, last);
            }
        }
        else
        {
            recurseBottomHalf = false;
            middle = first; 
        }

        int originalFirst = first;
        int originalLast = last;
        while(first < last)
        {
            while (first <= middle && comparer.Compare(items[first], items[middle]) < 0)
            {
                first++;
            }

            while (last >= middle && comparer.Compare(items[middle], items[last]) <= 0)
            {
                last--;
            }

            if (first < last)
            {
                Swap(items, first, last);
            }
            first++;
            last--;
        }

        if (middle - originalFirst >= 1)
        {
            Sort(items, comparer, originalFirst, middle  - (swappedMiddle ? 0 : 1));
        }
        if (originalLast - middle > 1 && recurseBottomHalf)
        {
            Sort(items, comparer, 1 + middle, originalLast);
        }
    }

You’ll see that the result of running the instrumented comparer against our Sort method is identical to that of the original method.

Are we done? No, because we haven’t tried a range of inputs that would likely chase out more subtleties of the implementation. But this post is quite long enough already.

If you’d like to continue this experiment you can run all permutations of the input string through both Sort methods and see if they get the same result. The Permute method from my previous post will help.

    [Test]
    public void Compare_List_Sort_with_our_Sort()
    {
        var input = "ABCDEFGH".ToArray();
        var permutations = input
            .Permute(input.Length);

        foreach(var permutation in permutations)
        {
            var listSortCopy = permutation.ToList();
            var listSortComparer = new InstrumentedComparer<char>(listSortCopy);
            listSortCopy.Sort(listSortComparer);
            var ourCopy = permutation.ToList();
            var ourComparer = new InstrumentedComparer<char>(listSortCopy);
            Sort(ourCopy, ourComparer);

            bool anyDifferences = Enumerable.Range(0, permutation.Count)
                .Any(i => listSortCopy[i] != ourCopy[i]);

            anyDifferences.ShouldBeFalse(()=>"results differ with input "+permutation.FormatForDisplay());
        }
    }

At this point we have enough information to determine that List.Sort uses a version of the QuickSort algorithm, which was the goal. Interestingly we’ve also identified a couple of places where the built-in implementation is performing unnecessary comparisons.

Update:

After some more experimentation I reported the implementation bugs found above to Microsoft and they fixed them. Cool!

Read Full Post »

The self-referencing string indexer examines an input string and builds an array of index values for that input. The index is built according to rules as outlined below.

Create a String indexer with method int[] Index(string input)

  1. return an empty array if the input is null
  2. return an empty array if the input is empty
  3. the index array should have the same number of elements as the input
  4. the index array values should default to 0
  5. the first value of the index array should always be -1
  6. the second value of the index array should always be 0
  7. when an input character is one position to the right of a character that is the same as the input character at that character’s corresponding index value then its corresponding index value should be 1 + the index value of the character to its left

example:

input = "aab" 
output = [-1,0,1]
	output[0] is -1 // rule 5
	output[1] is 0  // rule 6
	output[2] is 1  // rule 7 because input[2-1] == input[output[2-1]] => input[1] == input[0] => 'a' == 'a'

example:

input = "abcabda" 
output = [-1,0,0,0,1,2,0]
	output[0] is -1 // rule 5
	output[1] is 0  // rule 6
	output[2] is 0  // rule 4
	output[3] is 0  // rule 4
	output[4] is 1  // rule 7 because input[4-1] == input[output[4-1]] => input[3] == input[0] => 'a' == 'a'
	output[5] is 2  // rule 7 because input[5-1] == input[output[5-1]] => input[4] == input[1] => 'b' == 'b'
	output[6] is 0  // rule 4

You have implemented the table-building portion of the Knuth-Morris-Pratt substring search algorithm.

Read Full Post »

Based on this stackoverflow question.

Problem: Given an input array of integers of size n, and a query array of integers of size k, find the smallest window of the input that contains all the elements of the query in the same order.

   var inputArray = new[] { 2, 1, 5, 6, 8, 1, 8, 6, 2, 9, 2, 9, 1, 2 };
   var queryArray = new[] { 6, 1, 2 };

Algorithm:

  1. get all indexes into the inputArray of all queryArray values
  2. order them ascending by index
  3. using each index (x) as a starting point find the first higher index (y) such that the segment inputArray[x-y] contains all queryArray values
  4. keep only those segments that have the queryArray items in order
  5. order the segments by their lengths, ascending

First get all indexes into the inputArray of all queryArray values and order them ascending by index.

    public static int[] SmallestWindow(int[] inputArray, int[] queryArray)
    {
        var indexed = queryArray
            .SelectMany(x => inputArray
                                 .Select((y, i) => new
                                     {
                                         Value = y,
                                         Index = i
                                     })
                                 .Where(y => y.Value == x))
            .OrderBy(x => x.Index)
            .ToList();

Next, using each index (x) as a starting point find the first higher index (y) such that the segment inputArray[x-y] contains all queryArray values.

        var segments = indexed
            .Select(x =>
                {
                    var unique = new HashSet<int>();
                    return new
                        {
                            Item = x,
                            Followers = indexed
                                .Where(y => y.Index >= x.Index)
                                .TakeWhile(y => unique.Count != queryArray.Length)
                                .Select(y =>
                                    {
                                        unique.Add(y.Value);
                                        return y;
                                    })
                                .ToList(),
                            IsComplete = unique.Count == queryArray.Length
                        };
                })
            .Where(x => x.IsComplete);

Now keep only those segments that have the queryArray items in order.

        var queryIndexed = segments
            .Select(x => x.Followers.Select(y => new
                {
                    QIndex = Array.IndexOf(queryArray, y.Value),
                    y.Index,
                    y.Value
                }).ToArray());

        var queryOrdered = queryIndexed
            .Where(item =>
                {
                    var qindex = item.Select(x => x.QIndex).ToList();
                    bool changed;
                    do
                    {
                        changed = false;
                        for (int i = 1; i < qindex.Count; i++)
                        {
                            if (qindex[i] <= qindex[i - 1])
                            {
                                qindex.RemoveAt(i);
                                changed = true;
                            }
                        }
                    } while (changed);
                    return qindex.Count == queryArray.Length;
                });

Finally, order the segments by their lengths, ascending. The first segment in the result is the smallest window into inputArray that contains all queryArray values in the order of queryArray.

        var result = queryOrdered
            .Select(x => new[]
                {
                    x.First().Index,
                    x.Last().Index
                })
            .OrderBy(x => x[1] - x[0]);

        var best = result.FirstOrDefault();
        return best;
    }

run it with

    public void Run()
    {
        var inputArray = new[] { 2, 1, 5, 6, 8, 1, 8, 6, 2, 9, 2, 9, 1, 2 };
        var queryArray = new[] { 6, 1, 2 };

        var result = SmallestWindow(inputArray, queryArray);

        if (result == null)
        {
            Console.WriteLine("no matching window");
        }
        else
        {
            Console.WriteLine("Smallest window is indexes " + result[0] + " to " + result[1]);
        }
    }

output

Smallest window is indexes 3 to 8

caveat: This implementation does not handle a queryArray that has duplicate values.

Read Full Post »

Based on this stackoverflow question.

Given a matrix find the highest sum by choosing elements such that no two have the same row or column.

Consider the following Matrix:

      m1 m2 m3
n1    1  2  3
n2    4  5  6
n3    7  8  9
n4    10 11 12

The highest sum is 12+8+4 = 24

For the following:

     m1  m2
n1   17  1
n2   18  15 

The highest sum is 17+15 = 32.

For a reasonably small matrix you could just find all the combinations by taking one item from each column.

Here’s a c# implementation that solves the 3 column problem.

	public void solve_3_cols()
	{
		var rows = new List<List<int>>
			{
				new List<int>{1,2,3},
				new List<int>{4,5,6},
				new List<int>{7,8,9},
				new List<int>{10,11,12}
			};

		var matrix = rows
			.SelectMany((x, row) => x.Select((y, col) => new
			{
				Row = row,
				Col = col,
				Value = y
			}))
			.ToList();

		var sums = from c1 in matrix
		           let lev1Cols = new HashSet<int>(new[] { c1.Col })
				   let lev1Rows = new HashSet<int>(new[] { c1.Row })
		           from c2 in matrix
		           where !lev1Cols.Contains(c2.Col) && ! lev1Rows.Contains(c2.Row)
				   let lev2Cols = new HashSet<int>(lev1Cols.Concat(new[]{c2.Col}))
				   let lev2Rows = new HashSet<int>(lev1Rows.Concat(new[]{c2.Row}))
				   from c3 in matrix
				   where !lev2Cols.Contains(c3.Col) && !lev2Rows.Contains(c3.Row)
		             select new 
		             	{
		             		Sum = c1.Value + c2.Value + c3.Value,
		             		Items = new[] { c1, c2, c3 }
		             	};

		var result = sums.OrderByDescending(x => x.Sum).First();

		Console.WriteLine(result.Sum);
		result.Items.ForEach(Console.WriteLine);
	}

output:

    24
    { Row = 1, Col = 0, Value = 4 }
    { Row = 2, Col = 1, Value = 8 }
    { Row = 3, Col = 2, Value = 12 }

To solve a different number of columns change the way sums is being built. For example, here’s a variant that solves the 2-column example:

	[Test]
	public void solve_2_cols()
	{
		var rows = new List<List<int>>
			{
				new List<int>{17,1},
				new List<int>{18,15},
			};

		var matrix = rows
			.SelectMany((x, row) => x.Select((y, col) => new
			{
				Row = row,
				Col = col,
				Value = y
			}))
			.ToList();

		var sums = from c1 in matrix
				   let lev1Cols = new HashSet<int>(new[] { c1.Col })
				   let lev1Rows = new HashSet<int>(new[] { c1.Row })
				   from c2 in matrix
				   where !lev1Cols.Contains(c2.Col) && !lev1Rows.Contains(c2.Row)
				   select new
				   {
					   Sum = c1.Value + c2.Value,
					   Items = new[] { c1, c2 }
				   };

		var result = sums.OrderByDescending(x => x.Sum).First();

		Console.WriteLine(result.Sum);
		result.Items.ForEach(Console.WriteLine);
	}

output:

    32
    { Row = 0, Col = 0, Value = 17 }
    { Row = 1, Col = 1, Value = 15 }

If we were to introduce a couple of objects like Cell and Sum we could create a solver that works for a variable number of columns.

The rub is, finding all combinations is not a good solution for very large matrices.

Read Full Post »

%d bloggers like this: