Nature rewards survival with genetic replication. If a frog lives until it is mature enough to reproduce then it is considered successful. Its descendants will inherit some of its genes and some from its mate. If through this genetic variation a descendant develops a contact toxin on its skin, the frog species could, however, become overabundant due to lack of predators. Like the natural process for which they are named, genetic algorithms work through successive approximation towards a goal and breeding successful but not overly specialized descendants.
Problem to solve:
Without hard coding, generate the string “Hello World!”
Round 1: generate random strings until we match the expected value.
internal class Program
{
public static void Main()
{
new StringDuplicator().Duplicate("Hello world!");
}
}
public class StringDuplicator
{
public string Duplicate(string toMatch)
{
var result = new GeneticSolver().GetBest();
Console.WriteLine(result);
return result;
}
}
public class GeneticSolver
{
readonly Random _random = new Random();
public string GetBest(string toMatch)
{
//...
}
}
How does GetBest() know how many characters to generate? We could get that from the string itself with toMatch.Length. How does GetBest() know which characters to use? Does it get those from the string too? Does it use all possible characters or only printable ones? Let’s be more specific.
Round 1 revised: given # of genes, gene set, and expected result, generate random strings until we match the expected value.
public static class TExtensions
{
public static IEnumerable<T> Generate<T>(this T initial, Func<T> next)
{
var current = initial;
while (true)
{
yield return current;
current = next();
}
}
}
public class GeneticSolver
{
private readonly Random _random = new Random();
private string GenerateSequence(int length, string geneSet)
{
Func<char> next = () => geneSet[_random.Next(0, geneSet.Length)];
char initial = next();
return new String(initial.Generate(next).Take(length).ToArray());
}
public string GetBest(int length, string geneSet, string toMatch)
{
string generated = GenerateSequence(length, geneSet);
while (generated != toMatch)
{
generated = GenerateSequence(length, geneSet);
}
return generated;
}
}
public class StringDuplicator
{
public string Duplicate(string toMatch)
{
var solver = new GeneticSolver();
string geneSet = new String(toMatch.Distinct().ToArray());
string result = solver.GetBest(toMatch.Length,
geneSet,
toMatch);
Console.WriteLine(result);
return result;
}
}
Will it work? Probably not in our lifetime. We’re looking at 1 chance in 54^12 that it will generate the expected result. That is the problem with all or nothing fitness. Here we’ve said “get it exactly right or try again”. Few problems are likely to be simple enough that we can evolve a solution based solely on success or failure. Usually we’ll also need a measure of near success. This implies two scoring systems: one for unsuccessful individuals, and another for optimizing successful ones. These can often be combined.
This time, instead of all-or-nothing we’ll use a fitness function that gives points for the number of characters that are correct.
public class StringDuplicator
{
public string Duplicate(string toMatch)
{
var solver = new GeneticSolver();
int geneCount = toMatch.Length;
Func<string, int> getFitness = child =>
{
int matches = Enumerable.Range(0, geneCount)
.Count(x => child[x] == toMatch[x]);
return matches;
};
string geneSet = new String(toMatch.Distinct().ToArray());
string result = solver.GetBest(toMatch.Length,
geneSet,
getFitness);
Console.WriteLine(result);
return result;
}
}
The second thing we need is a way to breed better results from previous results. Instead of generating completely random children we’ll generate an initial parent and create a child by taking the parent’s genes and making one random change. When the child gets a better fitness result than the current parent we make it the new parent.
Round 2: given # of genes, gene set, and a fitness function, generate random strings until we match the expected value.
public string GetBest(int length, string geneSet, Func<string, int> getFitness)
{
string parent = GenerateSequence(length, geneSet);
int parentScore = getFitness(parent);
while (parentScore != length)
{
string child = Mutate(parent, geneSet);
int childScore = getFitness(child);
if (childScore > parentScore)
{
parentScore = childScore;
parent = child;
}
}
return parent;
}
private string Mutate(string parent, string geneSet)
{
int location = _random.Next(0, parent.Length);
var parentGenes = parent.ToCharArray();
parentGenes[location] = geneSet[_random.Next(0, geneSet.Length)];
return new String(parentGenes);
}
This time it runs in a fraction of a second… but how did it do it? Let’s pass in an action that we can call to display each successive child improvement. Let’s also track the number of generations and the elapsed time.
public string GetBest(int length, string geneSet, Func<string, int> getFitness, Action<int, int, string> displayChild)
{
int generationCount = 1;
var parent = GenerateSequence(length, geneSet);
var parentScore = getFitness(parent);
displayChild(generationCount, parentScore, parent);
while (parentScore != length)
{
var child = Mutate(parent, geneSet);
var childScore = getFitness(child);
if (childScore > parentScore)
{
parentScore = childScore;
parent = child;
displayChild(generationCount, parentScore, parent);
}
generationCount++;
}
return parent;
}
public string Duplicate(string toMatch)
{
var solver = new GeneticSolver();
int geneCount = toMatch.Length;
Func<string, int> getFitness = child =>
{
int matches = Enumerable.Range(0, geneCount)
.Count(x => child[x] == toMatch[x]);
return matches;
};
string geneSet = new String(toMatch.Distinct().ToArray());
var stopwatch = new Stopwatch();
stopwatch.Start();
Action<int, int, string> displayCurrentBest =
(generation, fitness, item) =>
Console.WriteLine("generation\t{0} fitness\t{1} {2}\telapsed: {3}",
generation.ToString().PadLeft(5, ' '),
fitness.ToString().PadLeft(TotalWidth(toMatch), ' '),
item,
stopwatch.Elapsed);
string result = solver.GetBest(toMatch.Length,
geneSet,
getFitness,
displayCurrentBest);
Console.WriteLine(result);
return result;
}
private static int TotalWidth(string expected)
{
return (int)Math.Floor(Math.Log10(Math.Abs(expected.Length != 0 ? expected.Length : 1))) + 1;
}
Run it now and we get output like the following:
generation 1 fitness 0 rl!dHdrwl r elapsed: 00:00:00.0093788 generation 4 fitness 1 re!dHdrwl r elapsed: 00:00:00.0110906 generation 18 fitness 2 He!dHdrwl r elapsed: 00:00:00.0112696 generation 20 fitness 3 He!lHdrwl r elapsed: 00:00:00.0115229 generation 24 fitness 4 He!lHdrol r elapsed: 00:00:00.0117862 generation 31 fitness 5 He!lHdroll r elapsed: 00:00:00.0120102 generation 52 fitness 6 He!lHdrorl r elapsed: 00:00:00.0122658 generation 97 fitness 7 He!lodrorl r elapsed: 00:00:00.0124217 generation 125 fitness 8 He!lodrorl ! elapsed: 00:00:00.0125791 generation 213 fitness 9 He!lodrorld! elapsed: 00:00:00.0127173 generation 234 fitness 10 Hellodrorld! elapsed: 00:00:00.0128413 generation 327 fitness 11 Hello rorld! elapsed: 00:00:00.0129618 generation 460 fitness 12 Hello world! elapsed: 00:00:00.0131015 Hello world!
You can download the final source code from my github repository.
Experiment with longer strings and larger gene sets.
Welcome to the world of genetic algorithms!
Move on to Part 2