Given the ability to generate prime numbers we can easily determine the prime factors of an integer.

public IEnumerable<int> GetFactors(int input)
{
    int first = Primes()
        .TakeWhile(x => x <= Math.Sqrt(input))
        .FirstOrDefault(x => input % x == 0);
    return first == 0
            ? new[] { input }
            : new[] { first }.Concat(GetFactors(input / first));
}

usage:

public void GetPrimeFactors()
{
    var factors = GetFactors(825);
    factors.ToList().ForEach(Console.WriteLine);
}

output:

3
5
5
11

If we multiply these prime numbers together we get the original input: 825.
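A quick way to verify that claim is to fold the factors back together with LINQ's Aggregate (a sketch; the class and Main wrapper here are just for illustration):

```csharp
using System;
using System.Linq;

class FactorSanityCheck
{
    static void Main()
    {
        // The factors that GetFactors(825) produced above.
        var factors = new[] { 3, 5, 5, 11 };

        // Fold them back together; the product should be the original input.
        int product = factors.Aggregate(1, (acc, f) => acc * f);
        Console.WriteLine(product); // 825
    }
}
```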


The Wikipedia definition:

A prime is a natural number that has exactly two distinct natural number divisors: 1 and itself.

Based on this definition we can skip the divisible-by-1 test and run our divisibility tests from 2 up to one less than the number being tested for primality.

public IEnumerable<int> Primes()
{
	var ints = Enumerable.Range(2, Int32.MaxValue - 1);
	return ints.Where(x => !ints
							.TakeWhile(y => y < x)
							.Any(y => x % y == 0));
}

output:

2 3 5 7 11 13 17 19 23 29 31 …

On my box this implementation takes 13.18 seconds to get the 10,000th prime number.
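The post doesn't show its timing harness; one way to take such a measurement (a sketch using Stopwatch, wrapping the implementation above) is:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class PrimeBenchmark
{
    // The naive implementation from above.
    static IEnumerable<int> Primes()
    {
        var ints = Enumerable.Range(2, Int32.MaxValue - 1);
        return ints.Where(x => !ints
                                    .TakeWhile(y => y < x)
                                    .Any(y => x % y == 0));
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        // ElementAt is zero-based, so index 9999 is the 10,000th prime (104729).
        int prime = Primes().ElementAt(9999);
        sw.Stop();
        Console.WriteLine("{0} in {1:F2}s", prime, sw.Elapsed.TotalSeconds);
    }
}
```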

It works, but we’re wasting cycles by going all the way to n-1. Consider the number 12: its factors include (2×6) and (3×4). We don’t have to test whether 12 is evenly divisible by 6 if we already tested 2, and the same is true for 4. It turns out that, in the worst case, we only have to try divisors up to the square root of the number – consider 25, whose smallest nontrivial divisor is exactly its square root, 5.

Revised, implementation-colored definition:

A number is prime if it cannot be evenly divided by any integer between 2 and its square root.

public IEnumerable<int> Primes()
{
    var ints = Enumerable.Range(2, Int32.MaxValue - 1);
    return ints.Where(x => !ints
                                .TakeWhile(y => y <= Math.Sqrt(x))
                                .Any(y => x % y == 0));
}

This implementation is substantially faster. It finds the 10,000th prime in a fraction of a second and getting the 100,000th prime takes only 3.42 seconds.

But with this implementation we’re paying a Math.Sqrt tax on every divisor test, so let’s hoist it out of the TakeWhile():

public IEnumerable<int> Primes()
{
    var ints = Enumerable.Range(2, Int32.MaxValue - 1);
    return ints.Where(x =>
        {
            var sqrt = Math.Sqrt(x);
            return !ints
                        .TakeWhile(y => y <= sqrt)
                        .Any(y => x % y == 0);
        });
}

This refactoring brings the time required to find the 100,000th prime down to 2.82 seconds.

Another optimization is to recognize that 2 is the only even prime, because any other even prime candidate would be divisible by 2:

public IEnumerable<int> Primes()
{
    return new[] { 2 }.Concat(OddInts().Where(x =>
        {
            var sqrt = Math.Sqrt(x);
            return !OddInts()
                        .TakeWhile(y => y <= sqrt)
                        .Any(y => x % y == 0);
        }));
}

public IEnumerable<int> OddInts()
{
    int start = 1;
    while (start > 0)
    {
        yield return start += 2; // 3, 5, 7, ...
    }
}

This brings the time down to 1.86 seconds on my box.

Another potential optimization is to change the definition to:

A number is prime if it cannot be evenly divided by any prime number between 2 and its square root.

public IEnumerable<int> Primes()
{
    return new[] { 2 }.Concat(OddInts().Where(x =>
        {
            var sqrt = Math.Sqrt(x);
            return !Primes()
                        .TakeWhile(y => y <= sqrt)
                        .Any(y => x % y == 0);
        }));
}

This implementation, however, incurs recursion overhead due to the repeated calls to Primes(), and as a result takes 3.88 seconds just to generate the 10,000th prime.

So, returning to the previous best… Another optimization is to take advantage of the fact that every integer can be expressed as (6k+i), where k is some integer and i is an integer in the range [-1, 4]. Half of those forms are divisible by 2 (6k+0, 6k+2, and 6k+4) and two are divisible by 3 (6k+0 and 6k+3). That leaves just 6k-1 and 6k+1. Sample sequence:

6*0-1  -> -1
6*0+1  -> 1
6*1-1  -> 5
6*1+1  -> 7
6*2-1  -> 11
6*2+1  -> 13
6*3-1  -> 17
6*3+1  -> 19
... 
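A quick empirical check of the 6k±1 claim (a sketch, not from the post): every prime above 3 leaves a remainder of 1 or 5 (i.e. -1) when divided by 6.

```csharp
using System;
using System.Linq;

class SixKCheck
{
    static void Main()
    {
        // Small trial-division primality test, just for this check.
        Func<int, bool> isPrime = n =>
            n >= 2 && !Enumerable.Range(2, n - 2).Any(d => d * d <= n && n % d == 0);

        // Every prime in 4..1000 should be congruent to 1 or 5 mod 6.
        bool holds = Enumerable.Range(4, 997)
                               .Where(n => isPrime(n))
                               .All(p => p % 6 == 1 || p % 6 == 5);
        Console.WriteLine(holds); // True
    }
}
```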

This means OddInts() can become PotentialPrimes():

public IEnumerable<int> PotentialPrimes()
{
	yield return 3;
	int k = 1;
	while (k > 0)
	{
		yield return 6*k-1;
		yield return 6*k+1;
		k++;
	}
}

public IEnumerable<int> Primes()
{
	return new[] { 2 }.Concat(PotentialPrimes().Where(x =>
	{
		var sqrt = Math.Sqrt(x);
		return !PotentialPrimes()
					.TakeWhile(y => y <= sqrt)
					.Any(y => x % y == 0);
	}));
}

Which finds the 100,000th prime in 1.68 seconds.

If we’re going to yield 3, we might as well yield 2 as well and clean up the implementation of Primes():

public IEnumerable<int> Primes()
{
	return PotentialPrimes().Where(x =>
	{
		var sqrt = Math.Sqrt(x);
		return !PotentialPrimes()
					.TakeWhile(y => y <= sqrt)
					.Any(y => x % y == 0);
	});
}

public IEnumerable<int> PotentialPrimes()
{
	yield return 2;
	yield return 3;
	int k = 1;
	while (k > 0)
	{
		yield return 6*k-1;
		yield return 6*k+1;
		k++;
	}
}

It takes a fraction of a second longer (1.71 seconds) to find the 100,000th prime but the clean implementation is worth the cost at this point.

9 Jan 2011 update:

The next optimization we can add, at the cost of memory, is to memoize the prime numbers and use them for checking the primality of potential primes instead of generating the potential prime list every time.

    public static IEnumerable<int> Primes()
    {
        var memoized = new List<int>();
        var primes = PotentialPrimes().Where(x =>
            {
                double sqrt = Math.Sqrt(x);
                return !memoized
                            .TakeWhile(y => y <= sqrt)
                            .Any(y => x % y == 0);
            });
        foreach(var prime in primes)
        {
            yield return prime;
            memoized.Add(prime);
        }
    }

This reduces the time to find the 100,000th prime to 0.69 seconds on my box and starts to put the millionth prime within reach.

Another improvement I’m conflicted about is to eliminate the test in the PotentialPrimes() loop by using a goto. I’m conflicted first because I never use goto, and second because we won’t get any warning if k wraps past Int32.MaxValue into negative values. Here it is for your evaluation:

    private static IEnumerable<int> PotentialPrimes()
    {
        yield return 2;
        yield return 3;
        int k = 1;
        loop:
        yield return k * 6 - 1;
        yield return k * 6 + 1;
        k++;
        goto loop;
    }

This reduces the time to get the 100,000th prime by almost 10% (0.63 seconds).
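For the goto-averse, a `while (true)` loop should compile down to the same unconditional branch and likewise drops the `k > 0` test. A sketch, with the same silent-overflow caveat as the goto version:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class GotoFreeGenerator
{
    private static IEnumerable<int> PotentialPrimes()
    {
        yield return 2;
        yield return 3;
        int k = 1;
        while (true) // unconditional; the caller stops enumerating when done
        {
            yield return 6 * k - 1;
            yield return 6 * k + 1;
            k++; // overflow is still silent here, as with the goto version
        }
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", PotentialPrimes().Take(8)));
        // prints: 2 3 5 7 11 13 17 19
    }
}
```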

1 Jul 2011 update:

The memoized non-goto version above averages 0.516 seconds over 100 runs on the .Net 4.0 runtime.

It occurred to me that Math.Sqrt is overkill for this function as it returns a precise (double) answer and I only need an integer. With that in mind I experimented with getting a close upper bound on the integer value of the square root.

	public static IEnumerable<int> Primes()
	{
		var memoized = new List<int>();
		int sqrt = 1;
		var primes = PotentialPrimes().Where(x =>
			{
				sqrt = GetSqrtCeiling(x, sqrt);
				return !memoized
					        .TakeWhile(y => y <= sqrt)
					        .Any(y => x % y == 0);
			});
		foreach (int prime in primes)
		{
			yield return prime;
			memoized.Add(prime);
		}
	}

	private static int GetSqrtCeiling(int value, int start)
	{
		while (start * start < value)
		{
			start++;
		}
		return start;
	}

This is ~2% faster, averaging 0.505 seconds over 100 runs.
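Passing the previous result back in as start works because the candidates are strictly increasing, so the integer square root never decreases and each call resumes where the last one stopped. A couple of illustrative calls (the wrapper class is just for demonstration):

```csharp
using System;

class SqrtCeilingDemo
{
    // Same helper as above.
    private static int GetSqrtCeiling(int value, int start)
    {
        while (start * start < value)
        {
            start++;
        }
        return start;
    }

    static void Main()
    {
        Console.WriteLine(GetSqrtCeiling(25, 1)); // 5 (5 * 5 == 25)
        Console.WriteLine(GetSqrtCeiling(26, 5)); // 6, resuming from the previous answer
    }
}
```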

Interestingly, the goto variant of PotentialPrimes() doesn’t appear to improve performance anymore under the .Net 4.0 runtime.

