Feeds:
Posts
Comments

Archive for the ‘Genetic Algorithms’ Category

In Part 1 we built a basic genetic solver that used mutation to solve problems. In this part we’re going to tackle a slightly more complex problem, the 8 Queens Puzzle, and then expand the solver as necessary.

Get the latest version of this post as a free chapter from my eBook Genetic Algorithms with Python

The 8 Queens Puzzle involves putting 8 queens on a standard chess board such that none are under attack. According to WikiPedia there are only 92 solutions to this puzzle and once we remove mirrorings and rotations there are only 12 unique solutions.

To start with we need to define a genome. The chess board conveniently has the same number of rows as columns (8) so we’ll use the digits 1-8 for our genes.

geneset = '12345678'

Array indexes are zero based however, so we’ll need to convert gene symbols to row and column values. To do that we’ll find the gene’s index into the set of gene symbols then use that index as a row or column, and combine those to make a Point.

class Point:
    Row = None
    Col = None

    def __init__(self, row, col):
        self.Row = row
        self.Col = col

def getPoint(rowGene, colGene, gene_set):
    rowIndex = gene_set.index(rowGene)
    if rowIndex == -1:
        raise ValueError("'" + rowGene + "' is an invalid gene")
    colIndex = gene_set.index(colGene)
    if colIndex == -1:
        raise ValueError("'" + colGene + "' is an invalid gene")
    return Point(rowIndex, colIndex)

We also need to be able plot the Points on a board.

def getBoard(candidate, gene_set):
    board = [['.'] * 8 for i in range(8)]
    for index in range(0, len(candidate), 2):
        point = getPoint(candidate[index], candidate[index + 1], gene_set)
        board[point.Row][point.Col] = 'Q'
    return board

And we want to be able to visualize the board.

def display(candidate, gene_set, startTime):
    timeDiff = datetime.datetime.now() - startTime
    board = getBoard(candidate.Genes, gene_set)
    for i in range(8):
        print(board[i][0],
              board[i][1],
              board[i][2],
              board[i][3],
              board[i][4],
              board[i][5],
              board[i][6],
              board[i][7]
              )
    print("%s\t%i\t%s" % (candidate.Genes, candidate.Fitness, str(timeDiff)))

This will give us output like the following

. Q . . . . . .
. . Q . . . . .
. . . . Q Q . .
. . . . . . . Q
. . . . . . . .
Q . . . . . . .
. . . . . . Q .
. . . . Q . . .
7761124823353685	29	0:00:00.013001

The row of digits under the board is the set of genes that created the board layout. The number to the right will be the fitness, a measure of how close this set of genes is to the desired result. To drive improvement we’ll want to increase the fitness value whenever the related board position lets more queens coexist on the board. Let’s think about how we can do that.

We’ll start with counting the number of columns that have a queen. Here’s a layout that gets an optimal score but is undesirable

Q Q Q Q Q Q Q Q
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

We’ll also count the number of rows that have queens. Here’s a revised board where both counts are optimal but the layout still allows queens to attack each other.

Q . . . . . . .
. Q . . . . . .
. . Q . . . . .
. . . Q . . . .
. . . . Q . . .
. . . . . Q . .
. . . . . . Q .
. . . . . . . Q

To fix this problem we’ll include the count of the number of southeast diagonals that have a queen. Again we can find a corner case as follows:

. . . . . . . Q
. . . . . . Q .
. . . . . Q . .
. . . . Q . . .
. . . Q . . . .
. . Q . . . . .
. Q . . . . . .
Q . . . . . . .

To fix this final problem we’ll include the count of the number of northeast diagonals that have a queen.

Our fitness value will be the sum of those four counts, 8+8+8+8 being optimal.

Side note on implementation, we can calculate the northeast diagonals in Excel using the formula =$A2+B$1 which results in a grid as follows

    0   1   2   3   4   5   6   7
0   0   1   2   3   4   5   6   7
1   1   2   3   4   5   6   7   8
2   2   3   4   5   6   7   8   9
3   3   4   5   6   7   8   9   10
4   4   5   6   7   8   9   10  11
5   5   6   7   8   9   10  11  12
6   6   7   8   9   10  11  12  13
7   7   8   9   10  11  12  13  14

The southeast diagonals can be calculated using =(8-1-$A2)+B$1 which we can visualize as follows:

    0   1   2   3   4   5   6   7
0   7   8   9   10  11  12  13  14
1   6   7   8   9   10  11  12  13
2   5   6   7   8   9   10  11  12
3   4   5   6   7   8   9   10  11
4   3   4   5   6   7   8   9   10
5   2   3   4   5   6   7   8   9
6   1   2   3   4   5   6   7   8
7   0   1   2   3   4   5   6   7

Using the above 2 formulas along with the row and column values lets us touch each board position exactly once, which makes our fitness function run fast.

def getFitness(candidate, gene_set):
    board = getBoard(candidate, gene_set)
    rowsWithQueens = {}
    colsWithQueens = {}
    northEastDiagonalsWithQueens = {}
    southEastDiagonalsWithQueens = {}
    for row in range(8):
        for col in range(8):
            if board[row][col] == 'Q':
                rowsWithQueens[row] = 1
                colsWithQueens[col] = 1
                northEastDiagonalsWithQueens[row + col] = 1
                southEastDiagonalsWithQueens[8 - 1 - row + col] = 1

    return len(rowsWithQueens) \
           + len(colsWithQueens) \
           + len(northEastDiagonalsWithQueens) \
           + len(southEastDiagonalsWithQueens)

Finally our main (integration test) method will bring all the parts together.

class EightQueensTests(unittest.TestCase):
    def test(self):
        geneset = '12345678'
        startTime = datetime.datetime.now()
        fnDisplay = lambda candidate: display(candidate, geneset, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, geneset)
        best = genetic.getBest(fnGetFitness, fnDisplay, 16, 8 + 8 + 8 + 8, geneset)
        self.assertEqual(best.Fitness, 8 + 8 + 8 + 8)

Note that we’ve added a parameter to getBest. This is because in Part 1 the solver determined the optimal fitness internally but now we want pass it in.

def getBest(get_fitness, display, targetLen, optimalFitness, geneSet):
...
    while bestParent.Fitness < optimalFitness:

We can run our test to see if the solver can find a solution.

Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . . . Q . . .
. Q . . Q . . .
. . . . . . . .
. . . Q . . . .
Q . Q . . . . .
4574811152355583	23	0:00:00
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . . . Q . . .
. . . . Q . . Q
. . . . . . . .
. . . Q . . . .
Q . Q . . . . .
4574811158355583	24	0:00:00
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . Q . Q . . .
. . . . Q . . Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158355543	25	0:00:00.001000
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . Q . Q . . .
. . . . . . Q Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158355743	26	0:00:00.002000
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158315743	27	0:00:00.003000
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811128315743	28	0:00:00.004000
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. Q . . . . . .
4574821128315743	29	0:00:00.005001
Q . . . . . . .
. . . . . Q . .
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. Q . . . . . .
4574821126315743	30	0:00:00.009001

After several test runs we find that most of the time it gets close but can’t get all the way to the optimal value of 32. We need to enhance the solver’s capabilities for it to be able to handle this problem.

If at first you don’t succeed, try, try again!

We’re going to do that by introducing a second genetic line. We’ll mutate the 2nd line as long as it is improving. If it ends up with a better fitness than bestParent then it will become the bestParent. Otherwise, we’ll start a new genetic line again with a random gene sequence. We repeat this process over and over until we find an optimal result. Here’s the updated solver loop:

def getBest(get_fitness, display, targetLen, optimalFitness, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet, get_fitness)
    display(bestParent)

    while bestParent.Fitness < optimalFitness:
        parent = generateParent(targetLen, geneSet, get_fitness)
        attemptsSinceLastImprovement = 0
        while attemptsSinceLastImprovement < 128:
            child = mutate(parent, geneSet, get_fitness)
            if child.Fitness > parent.Fitness:
                parent = child
                attemptsSinceLastImprovement = 0
            attemptsSinceLastImprovement += 1

        if bestParent.Fitness < parent.Fitness:
            bestParent, parent = parent, bestParent
            display(bestParent)

    return bestParent

Now when we run the 8 queens test we find an optimal solution every time.

. Q . . Q . . .
. . . . Q . . .
. . . . . . . Q
. . . . . . . Q
. . . . . . . .
. . . . . . . .
. . . . . . . .
Q . . . . . . .
8148381525122525	20	0:00:00
. . Q . . . . .
. . Q . . . . .
. . . . . . Q .
. . Q . . . . .
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
. . . . Q . . Q
2385884313647137	28	0:00:00.013001
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. . . . . Q . Q
Q . . . . . . .
. . . . Q . . .
. Q . . . . . .
. . . . Q . . .
3417486546857251	30	0:00:00.041003
. . Q . . . . .
. . . . Q . . .
. Q . . . . . .
. . . Q . . . .
Q . . . . . . .
. . Q . . . . .
. . . . . . . Q
. . . . . Q . .
5178258613324463	31	0:00:00.143008
. . . . Q . . .
. . Q . . . . .
Q . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
. . . . . Q . .
. . . Q . . . .
5247312368841576	32	0:00:00.308018

However, when we run the string duplication test from Part 1 it now struggles to find a solution. This is because we ignore our best line completely once we find it and only try to improve by mutating a new genetic line.

JoyEMlECsatqzfBVFkvIHduMg!HAFtnWcC	3	0:00:00
Nj!HBLwEHvdse wGGOKbJctg AriqlNDPP	8	0:00:00.022001
NBiBRlE thoHfFWhW !tnnFyuHAv wputM	11	0:00:00.062004
NUt alx xmusGiwTmEwprder are lKsdG	19	0:00:00.112007
Not allkthosbHnhoCHinvBr OQe losmR	21	0:00:00.852049
NotW!lT NhoJe VhRFwaDderiaCe most.	22	0:00:02.510144
Not alF tQose S!o wDndUZ yre lostA	25	0:00:05.847335
NotXLll those whopwWnder are lost.	30	0:01:33.040322

We need a way to continue to take advantage of the genes in bestParent. One way nature handles this is through crossbreeding so we’ll introduce a crossover strategy where we take one random gene from a 2nd (the best) parent.

def crossover(parent, parent2, get_fitness):
    index = random.randint(0, len(parent.Genes) - 1)
    genes = list(parent.Genes)
    genes[index] = parent2.Genes[index]
    childGenes = (''.join(genes))
    fitness = get_fitness(childGenes)
    return Individual(childGenes, fitness)

And update the main loop to randomly choose whether to use mutation or crossover to improve the 2nd genetic line.

    options = {
        0: lambda p: mutate(p, geneSet, get_fitness),
        1: lambda p: crossover(p, bestParent, get_fitness)
    }

    while bestParent.Fitness < optimalFitness:
        parent = generateParent(targetLen, geneSet, get_fitness)
        attemptsSinceLastImprovement = 0
        while attemptsSinceLastImprovement < 128:
            child = options[random.randint(0, len(options) - 1)](parent)

Now when we run the tests both are able to achieve optimal results every time.

 

Refactor

 

Now that everything works we’re going to do some code hygiene. In solving this problem we used genes “12345678” but it would have been more convenient, and faster, if we could have used raw integers 0-7. So let’s make that change. I’ll show selected changes below but you can get the full set from github.

class EightQueensTests(unittest.TestCase):
    def test(self):
        geneset = [0, 1, 2, 3, 4, 5, 6, 7]
    ...
        fnDisplay = lambda candidate: display(candidate, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate)
    ...

This means both Point and getPoint can go away, simplifying getBoard to:

def getBoard(candidate):
    board = [['.'] * 8 for i in range(8)]
    for index in range(0, len(candidate), 2):
        board[candidate[index]][candidate[index + 1]] = 'Q'
    return board

In display we go the other way so we can show all the genes as a string.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    board = getBoard(candidate.Genes)
    for i in range(8):
        print(board[i][0],
              board[i][1],
              board[i][2],
              board[i][3],
              board[i][4],
              board[i][5],
              board[i][6],
              board[i][7]
              )
    print("%s\t%i\t%s" % (''.join(map(str, candidate.Genes)), candidate.Fitness, str(timeDiff)))

To support the geneset being an array in the solver we only have to eliminate the line in crossover, mutate and generateParent that converts the child gene sequence to a string.

Read more of this series in my book Genetic Algorithms with Python

 

Read Full Post »

The goal of this, my first program in Python, is to reproduce a target string (like Hello World!) without looking directly at it. I’ll do this with a simple genetic algorithm that randomly generates an initial sequence of characters and then mutates one random character in that sequence at a time until it matches the target. Think of this like playing a Hangman variant where you pass a letter sequence to the person who knows the target word, and the only feedback you get is how many of your letters are correct. It is also reminiscent of the game Hotter Colder, except we’re doing it with code.

We start off with a standard set of letters for genes and a target string:

    geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
    target = "Not all those who wander are lost."

Next we need a way to generate a random gene sequence from the gene set.

def generateParent(length):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneset) -1);
        genes.append(geneset[geneIndex])
    return(''.join(genes))

There are many ways to calculate a fitness value (how close the guess is to the target) for the generated string. For this particular problem we’ll simply count the number of characters that are the same between the candidate string and the target string.

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

We also need a way to produce a new child gene sequence by mutating the existing (parent) string.  The point is to create a copy of the parent then replace 1 letter/gene in the copy with a randomly selected one from the set of all possible genes/letters.

def mutate(parent):
   geneIndex = random.randint(0, len(geneset) -1);
   index = random.randint(0, len(parent) - 1)
   genes = list(parent)
   genes[index] = geneset[geneIndex]
   return(''.join(genes))

Next we need a way to display a gene sequence, its fitness value and how much time has elapsed.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.

bestParent = generateParent(len(target))
bestFitness = getFitness(bestParent, target)
display(bestParent, startTime)

while bestFitness < len(bestParent):
   child = mutate(bestParent)
   childFitness = getFitness(child, target)

   if childFitness > bestFitness:
      bestFitness = childFitness
      bestParent = child
      display(bestParent, startTime)

Sample output:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
<<< ================================ RESTART ================================
<<<
iEjRmPrBUUDosXupcEw.Ji.qTCtRYawlry	1	0:00:00
iEjRmPrBUUDosXupcEw.Ji.qTCtRYaolry	2	0:00:00.004001
iEjRmPr UUDosXupcEw.Ji.qTCtRYaolry	3	0:00:00.007001
iEjRmPr UUDosXupcEw.JieqTCtRYaolry	4	0:00:00.011001
iEjRmPr UUDosXupcEw.nieqTCtRYaolry	5	0:00:00.014002
iEjRmPl UUDosXupcEw.nieqTCtRYaolry	6	0:00:00.016002
iEjRmPl UhDosXupcEw.nieqTCtRYaolry	7	0:00:00.023003
iEjRmPl UhoosXupcEw.nieqTCtRYaolry	8	0:00:00.026003
iEjRmPl UhossXupcEw.nieqTCtRYaolry	9	0:00:00.029003
iEjRmPl UhossXupcEw.nierTCtRYaolry	10	0:00:00.032003
iEj mPl UhossXupcEw.nierTCtRYaolry	11	0:00:00.036004
iEj mPl UhossXupcEw.nierTCrRYaolry	12	0:00:00.038004
iEj aPl UhossXupcEw.nierTCrRYaolry	13	0:00:00.041004
iEj aPl UhossXupcEw.nderTCrRYaolry	14	0:00:00.044005
iEj aPl UhossXupcEw.nderTCreYaolry	15	0:00:00.045005
iEj aPl UhossXupoEw.nderTCreYaolry	16	0:00:00.047005
iEj aPl UhoseXupoEw.nderTCreYaolry	17	0:00:00.051005
iEj aPl UhoseXupoEw.nderTCreYlolry	18	0:00:00.054006
iEj aPl UhoseXupoEw.nderTCreYlolr.	19	0:00:00.057006
iEj aPl thoseXupoEw.nderTCreYlolr.	20	0:00:00.059006
iEj aPl thoseXupoEwanderTCreYlolr.	21	0:00:00.065007
iEj aPl thoseXupo wanderTCreYlolr.	22	0:00:00.068007
NEj aPl thoseXupo wanderTCreYlolr.	23	0:00:00.073008
NEj aPl thoseXupo wander CreYlolr.	24	0:00:00.075008
NEj aPl thoseXupo wander CreYlolt.	25	0:00:00.081008
NEj aPl thoseXwpo wander CreYlolt.	26	0:00:00.083009
Noj aPl thoseXwpo wander CreYlolt.	27	0:00:00.090009
Noj aPl thoseXwho wander CreYlolt.	28	0:00:00.093010
Noj aPl thoseXwho wander CreYlost.	29	0:00:00.097010
Not aPl thoseXwho wander CreYlost.	30	0:00:00.106011
Not all thoseXwho wander CreYlost.	31	0:00:00.117012
Not all thoseXwho wander areYlost.	32	0:00:00.120012
Not all thoseXwho wander are lost.	33	0:00:00.123013
Not all those who wander are lost.	34	0:00:00.151015

Refactor

 

Good, it works. Now we need to separate the solver code from that specific to the string duplication problem so we can use it to solve other problems. We’ll start by moving our main loop into a function called getBest. That function’s parameters will include the functions it should call to get the candidate’s fitness and to display a new best sequence.

def getBest(get_fitness, display, targetLen, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet)
    bestFitness = get_fitness(bestParent)
    display(bestParent)

    while bestFitness < len(bestParent):
       child = mutate(bestParent, geneSet)
       childFitness = get_fitness(child)

       if childFitness > bestFitness:
          bestFitness = childFitness
          bestParent = child
          display(bestParent)
    return bestParent

We also need to pass the gene set to generateParent and mutate so they aren’t using global variables.

def mutate(parent, geneSet):
   geneIndex = random.randint(0, len(geneSet) -1);
   index = random.randint(0, len(parent) - 1)
   genes = list(parent)
   genes[index] = geneSet[geneIndex]
   return(''.join(genes))

def generateParent(length, geneSet):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneSet) -1);
        genes.append(geneSet[geneIndex])
    return(''.join(genes))

Now move the rest of the code to a new file named stringDuplicationTests.py. Notice that display and getFitness no longer have the same number of parameters as those above.

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

In our main test method we’ll pass lambdas that take the candidate gene sequence passed by the solver as a parameter and call the local methods with additional required parameters as necessary.

def test_string_duplication():
    geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
    startTime = datetime.datetime.now()
    target = "Not all those who wander are lost."
    fnDisplay = lambda candidate: display(candidate, target, startTime)
    fnGetFitness = lambda candidate: getFitness(candidate, target)
    best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)

Next we’ll make it so that executing this file calls our test function.

if __name__ == '__main__':
    test_string_duplication()

That’s it. We now have separation of concerns and when we run the test code the program still works. You can get the updated code from this checkin on Github.

Next, we’ll modify the test file to make it compatible with Python’s built in test framework.

import unittest
import datetime
import genetic

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

class StringDuplicationTests(unittest.TestCase):
    def test(self):
        geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
        startTime = datetime.datetime.now()
        target = "Not all those who wander are lost."
        fnDisplay = lambda candidate: display(candidate, target, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, target)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)
        self.assertEqual(best, target)

if __name__ == '__main__':
    unittest.main()

This allows us to run the tests from the commandline and without writing the output, which, by the way, makes it run twice as fast.

python -m unittest -b geneticTests.py
.
----------------------------------------------------------------------
Ran 1 test in 0.108s

OK

The final refactoring will be to introduce an Individual object that has both Genes and Fitness attributes.

class Individual:
   Genes = None
   Fitness = None
   def __init__(self, genes, fitness):
        self.Genes = genes
        self.Fitness = fitness

This lets us pass those values around as a unit and reduces some double work in the display function.

import random

def mutate(parent, geneSet, get_fitness):
   geneIndex = random.randint(0, len(geneSet) -1);
   index = random.randint(0, len(parent.Genes) - 1)
   genes = list(parent.Genes)
   genes[index] = geneSet[geneIndex]
   childGenes = (''.join(genes))
   fitness = get_fitness(childGenes)
   return Individual(childGenes, fitness)

def generateParent(length, geneSet, get_fitness):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneSet) -1);
        genes.append(geneSet[geneIndex])
    childGenes = (''.join(genes))
    fitness = get_fitness(childGenes)
    return Individual(childGenes,fitness)

def getBest(get_fitness, display, targetLen, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet, get_fitness)
    display(bestParent)

    while bestParent.Fitness < targetLen:
       child = mutate(bestParent, geneSet, get_fitness)

       if child.Fitness > bestParent.Fitness:
          bestParent = child
          display(bestParent)
    return bestParent

We also have to make compensating changes to the test file methods.

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    print ("%s\t%i\t%s" % (candidate.Genes, candidate.Fitness, str(timeDiff)))

class StringDuplicationTests(unittest.TestCase):
    def test(self):
        geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
        startTime = datetime.datetime.now()
        target = "Not all those who wander are lost."
        fnDisplay = lambda candidate: display(candidate, target, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, target)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)
        self.assertEqual(best.Genes, target)

Go on to Part 2 in which we evolve the solver to handle the 8 Queens Puzzle.

Learn more in my book Genetic Algorithms with Python

Read Full Post »

In Part 2 of this series we converted our code to a library, made our initial puzzle into an integration test, and extracted test related parameters and methods from the library.

Now we’re ready to try a new puzzle. This time we’ll expand our solver to handle a slightly more difficult problem – the 8 Queens puzzle.

In the 8 Queens puzzle we wan’t to place 8 chess Queens on a Chess board such that none of them are under attack.

7 Chess Queens in safe positions on a Chess board

According to WikiPedia there are only 92 solutions to this puzzle and once we remove mirrorings and rotations there are only 12 unique solutions.

We start off by figuring out how we’re going to map genes to the problem. One solution that I’ve used before is to assign each square on the 8×8 Chess board a symbol from the 64 symbol set ([a-z][A-Z][0-9]@#) as follows:

board positions labelled first rows then columns

We need to be able to convert a symbol (gene) to a board position. To do that we’ll find its index in the set of genes then convert that index to a row and column, or Point.

    fn to_point(gene: char, gene_set: &str) -> Point {
        let location = gene_set.find(gene);
        assert_eq!(location.is_some(), true);
        let index = location.unwrap() as i32;
        let row = index / 8i32;
        let column = index % 8i32;
        return Point{row: row, col: column};
    }

    struct Point {
        row: i32,
        col: i32
    }

We also need a way to visualize a set of Points (Queens) on a Chess board.

    fn get_board(candidate: &String, gene_set: &str) -> [[char; 8]; 8] {
        let mut board:[[char; 8]; 8] = [['.'; 8]; 8];
        
        for point in candidate.chars().map(|c| to_point(c, gene_set)) {
            board[point.row as usize][point.col as usize] = 'Q';
        }
        board
    }

    fn display_8_queens(candidate: &String, gene_set: &str, start: PreciseTime) {
        let now = PreciseTime::now();
        let elapsed = start.to(now);
        let board:[[char; 8]; 8] = get_board(candidate, gene_set);
        for i in 0..8 {
            let mut row = "".to_string();
            for j in 0..8 {
                row.push(board[i][j]);
                row.push(' ');
            }
            println!("{}", row);
        }
        
        println!("{}\t{}\t{}", candidate, get_8_queens_fitness(&candidate, gene_set), elapsed);
    }

The output of display_8_queens looks like this:

. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. Q . . . . . .
0ncJ5yUx        4096    PT0.192306457S

Next we need a way to see 1) if we have 8 Queens on the board, we might not if the same gene appears twice in the generated sequence, and 2) if each Queen is on its own row, column and diagonals.

First the row and column checks. We’ll keep a count of how many rows and columns have exactly one Queen.

    fn get_8_queens_fitness(candidate: &String, gene_set: &str) -> usize {
        let board = get_board(candidate, gene_set);
        
        // count rows with 1 queen
        let indexes: Vec<i32> = (0..8).collect();
        let mut correct_queens_in_row = 0; 
        for i in 0..8 {
            let row_count = indexes.iter()
                .cloned()
                .map(|col| board[i][col as usize])
                .filter(|ch|'Q' == *ch)
                .count();
            if row_count == 1 {
                correct_queens_in_row = correct_queens_in_row + 1;
            }
        }
        let mut correct_queens_in_column = 0; 
        for i in 0..8 {
            let column_count = indexes.iter()
                .cloned()
                .map(|row| board[row as usize][i])
                .filter(|ch|'Q' == *ch)
                .count();
            if column_count == 1 {
                correct_queens_in_column = correct_queens_in_column + 1;
            }
        }

To count the number of diagonals that have exactly one Queen we’ll introduce a generator that creates Points starting from an initial position and then moving by a given row and column offset. First the generator and some tests for it.

    struct Diagonal {
        row: i32,
        col: i32,
        row_offset: i32,
        col_offset: i32
    }

    impl Iterator for Diagonal {
        type Item = Point;
        fn next(&mut self) -> Option<Point> {
            let prev_row = self.row;
            let prev_col = self.col;
            self.row = prev_row + self.row_offset;
            self.col = prev_col + self.col_offset;
            
            // this is an infinite value generator
            Some(Point{row:prev_row,col:prev_col})
        }
    }

    #[test]
    fn test_diagonal_iterator_first_value_returned_should_be_the_start_state() {
        let mut diag = Diagonal {row:5,col:6,row_offset:1,col_offset:1};
        let first = diag.next();
        assert_eq!(true,first.is_some());
        let point = first.unwrap() as Point;
        assert_eq!(point.row,5);
        assert_eq!(point.col,6);
    }
    
    #[test]
    fn test_diagonal_iterator_second_value_returned_should_be_with_offsets_added_to_start_state() {
        let diag = Diagonal {row:5,col:4,row_offset:1,col_offset:-1};
        let first = diag.skip(1).next();
        assert_eq!(true,first.is_some());
        let point = first.unwrap() as Point;
        assert_eq!(point.row,6);
        assert_eq!(point.col,3);
    }

Note: As of this 5/29 the above code does not work with the 1.0.0 release of rustc due to a bug in the compiler… however the nightly build no longer has the bug. Try it on Playpen

And now the diagonal checks…

    let mut correct_queens_in_northeast_diagonal = 0; 
    for i in 0..15 {
        let diag = Diagonal {row:i,col:0,row_offset:-1,col_offset:1};
        let diagonal_count = diag
        .take_while(|point|point.row >= 0 && point.col >= 0)
        .skip_while(|point|point.row >=8 || point.col >=8)
        .take_while(|point|point.col < 8)
        .map(|point| board[point.row as usize][point.col as usize])
        .filter(|ch| 'Q' == *ch)
        .count();
        if diagonal_count == 1 {
        correct_queens_in_northeast_diagonal = correct_queens_in_northeast_diagonal + 1;
        }
    }    

    let mut correct_queens_in_southeast_diagonal = 0; 
    for i in -8..8 {
        let diag = Diagonal {row:i,col:0,row_offset:1,col_offset:1};
        let diagonal_count = diag
        .skip_while(|point|point.row < 0 || point.col < 0)
        .take_while(|point|point.col < 8 && point.row < 8)
        .map(|point| board[point.row as usize][point.col as usize])
        .filter(|ch| 'Q' == *ch)
        .count();
        if diagonal_count == 1 {
        correct_queens_in_southeast_diagonal = correct_queens_in_southeast_diagonal + 1;
        }
    }    

At the end of the method we’ll multiply all the row count values to get the fitness value. Given this the best possible solution would be 8*8*8*8.

    (if correct_queens_in_row == 0 { 1 } else { correct_queens_in_row })
    * (if correct_queens_in_column == 0 { 1 } else { correct_queens_in_column })
    * (if correct_queens_in_northeast_diagonal == 0 { 1 } else { correct_queens_in_northeast_diagonal })
    * (if correct_queens_in_southeast_diagonal == 0 { 1 } else { correct_queens_in_southeast_diagonal })
}

Language aside, rust doesn’t have a trinary operator but it does let you use an if statement inline.

Finally we need a main method to call the solver and verify the results. Once again we’ll create it as an integration test.

    #[test]
    fn test_8_queens() {
        let start = PreciseTime::now();
        let gene_set = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@#";

        let wrapped_display = |candidate: &String| {display_8_queens(&candidate, gene_set, start);};
        let wrapped_get_fitness = |candidate: &String| -> usize {get_8_queens_fitness(&candidate, gene_set)};
    
        let best = genetic::get_best(wrapped_get_fitness, wrapped_display, 8, 8*8*8*8, gene_set);
        println!("Total time: {}", start.to(PreciseTime::now()));
        let best_fitness = get_8_queens_fitness(&best, gene_set);
        assert_eq!(best_fitness,8*8*8*8);
    }

When we run that test we may get a successful result like the following.

cargo test test_8_queens -- --nocapture
     Running target\debug\genetic-debe16ff205aab6a.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target\debug\lib_8_queens_tests-afc3deed67cad86e.exe

running 1 test
. . Q . Q . . .
Q Q . . . . . .
Q . . . Q . . .
. . . . . . . .
. . . . . . . .
. . . . . . Q .
. . . . . . . .
. . Q . . . . .
ujcieqU6        120     PT0.003442522S
. . Q . Q . . .
Q . . . . . . .
Q . . . Q . . .
. . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . Q . . . . .
uXcieqU6        192     PT0.004552093S
. . Q . Q . . .
Q . . . . . . .
Q . . . Q . . .
. . . . . Q . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . .
uXcieqUD        480     PT0.005474583S
. . Q . Q . . .
Q . . . . . . .
. . . . Q . . .
Q . . . . Q . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . .
uXcieyUD        640     PT0.006346324S
. . Q . Q . . .
Q . . . . . . .
. . . . Q . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
Q . . . . . . .
uXcieyU4        648     PT0.007207255S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
Q . . . . . . .
eXcieyU4        700     PT0.009275650S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . Q Q .
. Q . . . . . .
. . . . . . . .
eXcieyUT        735     PT0.010128473S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. Q . . . Q Q .
. Q . . . . . .
. . . . . . . .
eXciPyUT        768     PT0.010938355S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . Q Q .
. Q . . . . . .
. . . . . . . Q
eXci#yUT        1152    PT0.011762050S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXci#yU#        1225    PT0.013189927S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXci#yUx        1536    PT0.014965841S
. . Q . Q . . .
. . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXcJ#yUx        1728    PT0.021601343S
. . Q . Q . . .
. . . . . . . Q
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . . . . .
. . . . . . . Q
epcJ#yUx        1920    PT0.023504280S
. . Q . . . . .
. . . . . . . Q
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. . . . . . . Q
0pcJ#yUx        2560    PT0.030906422S
. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. . . . . . . Q
0ncJ#yUx        3072    PT0.166375236S
. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. Q . . . . . .
0ncJ5yUx        4096    PT0.192306457S
Total time: PT0.193881176S
test tests::test_8_queens ... ok

But the odds are against it… That’s because the Mutation genetic strategy can’t always solve this problem. For our solver to be able to find a solution every time we’re going to have to introduce a new strategy. That is the subject of Part 4.

The source code to this point is available on Github if you want to experiment.

Read Full Post »

In this series I’ll be building out a simple genetic solver in Rust, a new programming language for me. Building a genetic solver is my preferred programming problem for learning new programming languages.  I’ve previously solved this problem in C# and used it to learn to code in golang. If you are new to genetic algorithms start with the C# post.

This was my first program in Rust and from scratch it took about four hours to work through issues.

Things I learned:

  • Rust’s compiler is particular about variable naming, preferring underscores to camel case.
  • There’s a lot of syntax, particularly around loaning items to called functions, but it isn’t that hard to follow if you’ve had experience with C/C++ and templated types.
  • The line that returns a value from a function cannot end with a semicolon.
  • The compiler does a great job of explaining what it is expecting.  Just take it slow and build up the program one function at a time.
  • The Rust install doesn’t necessarily contain everything you might need. On windows, I had to install MingW to get gcc so it can compile the code in the time crate.

OK, let’s go. We’ll start off with a standard set of genes and a target string:

    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";

Next we need a way to generate a random gene sequence (parent) from the gene set.

fn generate_parent(gene_set: &str, length: usize) -> String {
    let mut rng = thread_rng();
    let sample = sample(&mut rng, gene_set.chars(), length);
    sample.into_iter().collect()
}

Note, sample is a function provided by the rand crate.  It returns N items randomly selected from the input.

Next we need to calculate a fitness value based on the number of differences between a candidate string and the target string.

fn get_fitness(candidate: &String, target: &str) -> usize {
    let different_count = target.chars()
      .zip(candidate.chars())
      .filter(|&(a, b)| a != b)
      .count();
 
    target.len() - different_count
}

It is nice that map/reduce type functions are available out of the box, like in C#.

We also need a way to produce a new (child) string by mutating an existing (parent) string.  The point is to create a copy of the parent then replace 1 character (gene) in the copy with a randomly selected one from the set of valid genes.

fn mutate_parent(parent: &String, gene_set: &str) -> String {
    let mut rng = thread_rng();
    let gene_index = rng.gen::<usize>() % gene_set.len();
    let parent_index = rng.gen::<usize>() % parent.len();
    let mut candidate = String::with_capacity(parent.len());
 
    if parent_index > 0 {
        candidate.push_str(&parent[..parent_index]);
    }
    candidate.push_str(&gene_set[gene_index..(1+gene_index)]);
    if parent_index+1 < parent.len() {
        candidate.push_str(&parent[parent_index+1..]);
    }
    candidate
}

There’s probably a simpler way to do that in Rust along the lines of:

  1. copy the parent
  2. select a random location in the parent to mutate
  3. select a random gene to insert
  4. copy[random_location] = new_gene

Now we need a way to display a gene sequence, its fitness value and how much time has elapsed.

fn display(candidate: &String, target: &str, start: time::PreciseTime) {
    let now = PreciseTime::now();
    let elapsed = start.to(now);
    println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target), elapsed);
}

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.

fn get_best(get_fitness: fn(&String,&str) -> usize, 
  display: fn(&String, &str, start: time::PreciseTime), 
  target: &str, 
  gene_set: &str, 
  length: usize, 
  start: time::PreciseTime) -> String {

    let mut best_parent = generate_parent(gene_set, length);
    let mut best_fitness = get_fitness(&best_parent, target);
 
    while best_fitness < length {
        let child = mutate_parent(&best_parent, gene_set);
        let fitness = get_fitness(&child, target);
        if fitness > best_fitness {
            best_fitness = fitness;
            best_parent = child;
            display(&best_parent, target, start);
        }
    }

    best_parent 
}

Yes, I know we could just call the functions instead of passing them in but this demonstrates that capability in the language and it is a feature we will need in order to use different fitness functions and display methods in a more complex solver.

The main function brings all the parts together:

extern crate time;
extern crate rand;

use rand::{thread_rng, sample, Rng};
use time::PreciseTime;

fn main() {
    let start = PreciseTime::now();
    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";
    let best = get_best(get_fitness, display, target, gene_set, target.len(), start);
    println!("{}", best);
    println!("Total time: {}", start.to(PreciseTime::now()));
}

...

Get the source here.

Because the time crate is not pre-compiled you may (on Windows) need to install MingW as appropriate for your OS (32 or 64bit).

Then run the program from the commandline with:

cargo run

The first time it runs it will download and compile the dependencies it needs.  Nice feature.

Sample output:

>cargo run
 Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading rand v0.3.8
 Downloading time v0.1.25
 Downloading libc v0.1.7
 Downloading gcc v0.3.5
 Compiling gcc v0.3.5
 Compiling rand v0.3.8
 Compiling time v0.1.25
 Compiling genetic v0.0.1 (.../projects/rust/genetic)
 Running `target\debug\genetic.exe`
 aH.RefKhPjklmOopq!stuMwxSrAVCDEWG 1 PT0.004136558S
 aH.aefKhPjklmOopq!stuMwxSrAVCDEWG 2 PT0.006234092S
 aH.aefKhPjklmOopq!stdMwxSrAVCDEWG 3 PT0.006508939S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEWG 4 PT0.006637123S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEtG 5 PT0.007313462S
 aH.aefKhPoklmOopq!stdMwxSreVCDEtG 6 PT0.008119910S
 aH.aefKhPokemOopq!stdMwxSreVCDEtG 7 PT0.011582439S
 aH.aefKhPokemOopq!stdewxSreVCDEtG 8 PT0.012396586S
 aH.alfKhPokemOopq!stdewxSreVCDEtG 9 PT0.014519526S
 oH.alfKhPokemOopq!stdewxSreVCDEtG 10 PT0.015515364S
 oH.alfKhPokemwopq!stdewxSreVCDEtG 11 PT0.020154268S
 oH.alfKtPokemwopq!stdewxSreVCDEtG 12 PT0.020608496S
 oH.alfKthokemwopq!stdewxSreVCDEtG 13 PT0.021233638S
 oH.alfKthokemwooq!stdewxSreVCDEtG 14 PT0.021521572S
 oH.alfKthokemwooq!sndewxSreVCDEtG 15 PT0.023080962S
NoH.alfKthokemwooq!sndewxSreVCDEtG 16 PT0.023575224S
NoH.alfKthokemwooq!sndewxSre CDEtG 17 PT0.024708100S
NoH.alfKthokemwoo !sndewxSre CDEtG 18 PT0.025637729S
NoH alfKthokemwoo !sndewxSre CDEtG 19 PT0.028930115S
NoH alfKthokemwoo !sndew Sre CDEtG 20 PT0.029574888S
NoH alfKthokemwoo !sndew are CDEtG 21 PT0.030039510S
NoH alf thokemwoo !sndew are CDEtG 22 PT0.034410881S
Not alf thokemwoo !sndew are CDEtG 23 PT0.039302690S
Not alf thosemwoo !sndew are CDEtG 24 PT0.040969862S
Not alf thosemwoo !sndew are CDstG 25 PT0.041442952S
Not alf those woo !sndew are CDstG 26 PT0.043314527S
Not all those woo !sndew are CDstG 27 PT0.045133751S
Not all those woo !sndew are CDst. 28 PT0.047369478S
Not all those woo !sndew are lDst. 29 PT0.050106782S
Not all those woo wsndew are lDst. 30 PT0.057051472S
Not all those woo wsndew are lost. 31 PT0.062051064S
Not all those woo wsnder are lost. 32 PT0.066566787S
Not all those who wsnder are lost. 33 PT0.069746386S
Not all those who wander are lost. 34 PT0.081900460S
Not all those who wander are lost.
Total time: PT0.082663410S

Wow! That’s a lot faster than the go version!

Go on to Part 2.

Read Full Post »

In this project we’ll be solving a variant of John Koza‘s Lawnmower Problem. The previous projects in this series successively drove out the functionality of a genetic solver capable of handling this kind of problem. This project raises our understanding of genetic algorithms and their application to problem solving to a whole new level.

The Lawnmower Problem asks us to provide instructions to a lawnmower to make it mow a field of grass. The field wraps in all directions – if the mower goes off the top it ends up at the bottom and vice versa and the same side-to-side. Let’s say the mower begins in the middle of an 8×8 field facing south.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  M  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

Next let’s say the available instructions are: mow and turn. mow moves the mower forward one grid square in whatever direction it is facing and cuts the grass in that square. turn causes the mower to turn left 90 degrees.

Simple enough right? Using our previous experience in this series we know we can define a gene for each instruction and use hill climbing to find an optimal solution. I leave this as an exercise to the reader – check out the previous post if you need a refresher.

You should end up with a solution that looks something like the this:

mow mow mow mow mow mow mow mow turn mow mow mow mow mow mow mow turn mow mow mow mow mow mow mow turn mow mow mow mow mow mow turn mow mow mow mow mow mow turn mow mow mow mow mow turn mow mow mow mow mow turn mow mow mow mow turn mow mow mow mow turn mow mow mow turn mow mow mow turn mow mow turn mow mow turn mow turn mow

We can visualize it by numbering each grid square in the order it is mowed:

64  57  42  19  4   31  50  61

63  56  41  18  5   32  51  62

54  55  40  17  6   33  52  53

37  38  39  16  7   34  35  36

12  13  14  15  8*  9   10  11

25  24  23  22  1   28  27  26

46  45  44  21  2   29  48  47

59  58  43  20  3   30  49  60

* - starting location

Or by showing the route traveled graphically:

Mow-turn instructions visualized graphically

Cool, a spiral.

Now let’s expand the available instruction set because all we ever get is spiral shaped mowing patterns. The new instruction is: jump. jump has 2 non-negative arguments, forward and right. The mower will jump forward and to the right the specified number of squares and cut the grass where it lands.

For example, if the mower is facing south at the start location (4,4)

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  M  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

and is told to jump (2,3) it will end up at (1,6), still facing south.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  #  .  .  .
 .  .  .  .  .  .  .  .
 .  M  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

To implement this simply add jump to the set of genes that require special handling and treat the two genes that follow it as the arguments. If less than 2 genes follow it because it is at or near the end of the gene sequence then fill the missing arguments with zeros. Again, I leave the implementation as an exercise. Note: also make your implementation of getFitness() prefer shorter sequences if and only if the gene sequence mows the entire field.

You may end up with a result similar to the following:

jump (2,5) mow mow mow mow mow mow mow jump (4,3) mow mow mow mow mow mow mow jump (0,3) mow mow mow mow mow mow mow jump (0,3) mow mow mow mow mow mow mow jump (3,3) mow mow mow mow mow mow mow jump (5,3) mow mow mow mow mow jump (5,3) mow mow mow mow mow mow mow jump (1,3) mow mow mow mow mow mow mow jump (5,2) mow

Note that the genetic algorithm has completely abandoned the turn instruction in favor of the more powerful jump instruction because this results in a shorter program.

Here’s a visualization of the mowing route:

44  17  56  40  16  48  26  3

45  18  57  33  9   49  27  4

46  19  58  34  10  50  28  5

63  20  59  35  11  51  29  6

64  21  60  36  12* 52  30  7

41  22  61  37  13  53  31  8

42  23  62  38  14  54  32  1

43  24  55  39  15  47  25  2

* - starting location

Now back the Koza’s purpose for this problem. As interesting as this solution is, the sequence of instructions generated by the solver are completely different from the solution a human would use. Now think about how you’d tell another person to mow a toroidal field. You wouldn’t give them detailed instructions for every step right? You’d break it down into a set of repeatable sequences. In a non-toroid field you might say something like: start at the corner of the field and mow a strip along the edge of the field all the way to the other side.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 #  #  #  #  #  #  #  M>

Then turn the mower back this way and mow the strip right next to the one you just completed until you get back to where you started.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
<M  #  #  #  #  #  #  #
 #  #  #  #  #  #  #  #

Turn around again and repeat the process until you’ve mowed the whole field.

You automatically combine squares into strips and trips across-and-back into a repeatable pattern. How do we do that with the mower?

The best result we’ve generated so far requires 64 jump and mow instructions, one for each grid square, to tell the mower how to cut the grass in the field. How can we make it look more like the instructions you’d give a human? We have to introduce the ability to repeat a sequence of instructions by reference and make this an instruction too.

This is where things get interesting. We’re going from using the genes of the genetic sequence more-or-less one-to-one to solve a problem, to genetic programming.

Implementation-wise this means we need two more special genes: begin-numbered-reference and call-numbered-reference. begin-numbered-reference will increment an id and start a new instruction sequence, or block, if and only if the current block is non-empty. call-numbered-sequence, or more simply call, will take a parameter for the id of the sequence to execute. Once that sequence has completed, execution will return to the sequence that made the call – exactly like calling a subroutine.

Here’s a Go implementation of a gene decoder that builds a program for the mower as described above.

func parseProgram(genes string, f *field, m *mower) *program {
	p := NewProgram()

	instructionCodes := make(chan int)
	builders := make(chan builder)
	go streamInstructionCodes(genes, instructionCodes)
	go func() {
		offset := 0
		for instructionCode := range instructionCodes {
			switch instructionCode % numberOfInstructions {
			case 0:
				builders <- createMowBuilder()
			case 1:
				builders <- createTurnBuilder()
			case 2:
				builders <- createJumpBuilder(instructionCodes)
			case 3:
				builders <- createBlockBuilder()
			case 4:
				builders <- createCallBuilder(instructionCodes)
			default:
				panic(fmt.Sprint("No builder defined for instructionCode '", instructionCode, "' from gene '", genes[offset:offset+1], "'"))
			}
			offset++
		}
		builders <- builder{stop: true}
	}()

	currentBlockName := "main"
	blockId := -1
	instructions := make([]Instruction, 0, len(genes))

	for builder := range builders {
		if builder.stop {
			break
		}
		if builder.startNewBlock {
			if len(instructions) > 0 {
				p.addBlock(currentBlockName, instructions)
				blockId++
				currentBlockName = createBlockName(blockId)
				instructions = make([]Instruction, 0, len(genes))
			}
			continue
		}
		instructions = append(instructions, builder.create(f, m))
	}

	if len(instructions) > 0 {
		p.addBlock(currentBlockName, instructions)
	}

	return p
}

Note: Koza’s implementation prevents recursion. This implementation allows recursion but it isn’t difficult to modify it to work Koza’s way. I just find the recursive results more interesting.

Now when we move to from one-to-one gene evaluation to running a program, determining fitness becomes a problem in its own right. On the face it is the same: perform the instructions, determine how many field positions were mowed, switch evaluation strategies to program length if all field squares have been mowed. The problem is we need to handle the flow control involved in fetching subroutines, running them, and returning to the previous location upon completion. We also need the ability to track the number of instructions executed and optionally exit if we run beyond a pre-determined maximum – this prevents infinite loops from blocking evolution in the solver.

I implemented this functionality as a domain independent interpreter (GitHub) while coding my implementation of the Lawnmower Problem.

With the gene-to-program converter and interpreter in place we are free to continue our exploration of the Lawnmower Problem. Here’s a sample optimal solution found by my implementation:

Final: 10119     16.6669533s
main: block1
block0: mow mow mow jump (0,1)
block1: block0 block0 mow block1


52  15  44  5   35  60  26  25

17  16  45  6   36  61  27  53

18  46  8   7   37  62  28  54

19  47  9   39  38  63  29  55

20  48  10  40  64* 31  30  56

21  49  11  41  1   32  57  22

50  13  12  42  2   33  58  23

51  14  43  4   3   34  59  24

* - starting location

Note that the program is recursive in that block1 calls itself.

If we change the logic to prevent recursion we get a slightly longer optimal solution like this:

Final: 10115     15.4188819s
main: block0 block0 block0 block0
block0: jump (6,6) block1 block1 block1 block1
block1: jump (0,1) mow mow mow


34  37  24  10  9   60  48  50

39  38  25  11  62  61  49  51

40  27  26  12  63  2   1   52

41  28  14  13  64  3   54  53

42  29  15  19  18* 4   55  43

31  30  16  20  6   5   56  44

32  35  17  21  7   58  57  45

33  36  23  22  8   59  47  46

* - starting location

Welcome to the world of genetic programming!

You can get the code for my Lawnmower Problem solver from my GitHub repository.

Read Full Post »

The goal of knapsack problems is to put as much stuff into a container as it will hold, optimizing for constraints such as item weight and size and value. The standard Knapsack Problem adds the limitation that there is only one of each particular item available. In the unbounded variant of the Knapsack Problem there is no limit. This project solves the Unbounded Knapsack Problem (UKP) by improving upon the genetic solver used in the previous post. The code is written in Go.

We’ll start with an easy one so we can understand how it works. We’ll do that by using the item weights, sizes and values from the Rosetta Code page for the Unbounded Knapsack Problem, but with different names, as follows:

Name Value (each) Weight Volume (each)
Bark 3000 0.3 .025
Herb 1800 0.2 .015
Root 2500 2.0 .002

The knapsack contents cannot weigh more than 25 units and its maximum volume is 0.25 units. Our goal is to maximize the value of the contents of the knapsack without exceeding either of the weight or volume limits.

Let’s think about how we would solve this problem by hand. We want to maximize the value within the constraints. So we want a high ratio of value to weight and value to volume. And we want as many of those in the bag as we can get. When we can’t stuff any more of the top item into the bag, we fill in the remaining space with the next most valuable item, and so on. This process is known as hill climbing.

First we need a struct to hold the resource information.

type resource struct {
	name   string
	value  int
	weight float64
	volume float64
}

This project uses genes in a new way – as indexes and counts. Each chromosome will have two parts. The first will represent an index into the list of available resources. The second will be the quantity of that resource to take. The candidate gene sequence will be decoded as follows:

func decodeGenes(candidate string, resources []resource, geneSet string) map[resource]int {
	resourceCounts := make(map[resource]int, len(candidate)/2)
	for i := 0; i < len(candidate); i += 2 {
		chromosome := candidate[i : i+2]
		resourceId := scale(strings.Index(geneSet, chromosome[0:1]), len(geneSet), len(resources))
		resourceCount := strings.Index(geneSet, chromosome[1:2])
		resource := resources[resourceId]
		resourceCounts[resource] = resourceCounts[resource] + resourceCount
	}
	return resourceCounts
}

func scale(value, currentMax, newMax int) int {
	return value * newMax / currentMax
}

The fitness function will use the decoded resource counts and the resource constraints to determine an appropriate fitness for the candidate.

func getFitness(resourceCounts map[resource]int, maxWeight float64, maxVolume float64) int {
	weight := 0.0
	volume := 0.0
	value := 0

	for resource, count := range resourceCounts {
		weight += resource.weight * float64(count)
		volume += resource.volume * float64(count)
		value += resource.value * count
	}

	if weight > maxWeight || volume > maxVolume {
		return -value
	}
	return int(value)
}

Next is a display function that tells us which resources and how many of each to take and their value (fitness).

func display(resourceCounts map[resource]int, fitness int, elapsed time.Duration) {
	label := ""
	for resource, count := range resourceCounts {
		label += fmt.Sprint(count, " ", resource.name, " ")
	}
	fmt.Println(
		fitness,
		"\t",
		label,
		"\t",
		elapsed)
}

The only thing left is to define the resources and invoke the solver:

func main() {
	resources := []resource{
		{name: "Bark", value: 3000, weight: 0.3, volume: .025},
		{name: "Herb", value: 1800, weight: 0.2, volume: .015},
		{name: "Root", value: 2500, weight: 2.0, volume: .002},
	}

	const maxWeight = 25.0
	const maxVolume = .25

	geneSet := "0123456789ABCDEFGH"

	calc := func(candidate string) int {
		decoded := decodeGenes(candidate, resources, geneSet)
		return getFitness(decoded, maxWeight, maxVolume)
	}

	start := time.Now()

	disp := func(candidate string) {
		decoded := decodeGenes(candidate, resources, geneSet)
		fitness := getFitness(decoded, maxWeight, maxVolume)
		display(decoded, fitness, time.Since(start))
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = .1
	solver.MaxRoundsWithoutImprovement = 2

	var best = solver.GetBestUsingHillClimbing(calc, disp, geneSet, 10, 2, math.MaxInt32)

	fmt.Println("\nFinal:")
	disp(best)
}

Sample output:

$ go run samples/ukp/rosetta/rosetta.go
30000    12 Root                 0
33600    12 Root 2 Herb          2.0001ms
36000    2 Bark 12 Root          2.0001ms
37200    12 Root 4 Herb          3.0002ms
39600    2 Herb 2 Bark 12 Root   3.0002ms
44100    2 Herb 9 Root 6 Bark    5.0003ms
49500    9 Root 5 Herb 6 Bark    5.0003ms
50100    2 Herb 9 Root 8 Bark    5.0003ms
51500    8 Bark 11 Root          6.0003ms
52600    8 Bark 10 Root 2 Herb   6.0003ms
54500    11 Root 5 Herb 6 Bark   6.0003ms

Final: 
54500    11 Root 6 Bark 5 Herb   248.0142ms

Excellent. A success with a sub-second run time. Also, that result matches one of the four optimal combinations for this problem from the RosettaCode page.

Now, just as there are standard problem sets for the Travelling Salesperson Problem, there are also standard problem sets available for the Unbounded Knapsack Problem. One such set, named exnsd16, comes from the PYAsUKP site.

Let’s use the structure of the code above as a guide to implementing a solver for standard problem sets.

First, the resource data must be imported from the file. The file has the following format:

...
c: 1234  <-- constraint
...
begin data <-- resource data
55 32 <-- weight and value
91 3212
...
832 88
end data
...
sol: <-- begin optimal solution
	83	21	58	65 <-- item index, count, weight, value
	71	51	13	20
...

This means the parser will have several transitions because we need the constraint and resource data as a minimum. It would also be nice to have the optimal solution info for result verification.

The resource type doesn’t need a volume field:

type resource struct {
	name   string
	value  int
	weight int
}

Here’s the parser:

func loadResources(routeFileName string) ([]resource, int, map[resource]int) {
	parts := make(chan part)

	lineHandler := ukpResourceFileHeader
	go func() {
		for line := range File.EachLine(routeFileName) {
			lineHandler = lineHandler(line, parts)
		}
		close(parts)
	}()

	resources := make([]resource, 0, 10)
	solution := make(map[resource]int)

	maxWeight := -1
	for part := range parts {
		switch {
		case part.partType == constraintPart:
			maxWeight = parseConstraint(part.line)
		case part.partType == resourcePart:
			resources = append(resources, parseResource(part.line, len(resources)))
		case part.partType == solutionPart:
			resourceId, count := parseSolutionResource(part.line)
			solution[resources[resourceId-1]] = count
		}
	}

	return resources, maxWeight, solution
}

func parseConstraint(line string) int {
	parts := strings.Fields(line)
	constraint, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse constraint from '" + line + "'")
	}
	return constraint
}

func parseResource(line string, totalResources int) resource {
	parts := strings.Fields(line)
	weight, err := strconv.Atoi(parts[0])
	if err != nil {
		panic("failed to parse weight from '" + line + "'")
	}
	value, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse value from '" + line + "'")
	}

	return resource{
		name:   fmt.Sprint("Item_" + strconv.Itoa(1+totalResources)),
		weight: weight,
		value:  value,
	}
}

func parseSolutionResource(line string) (int, int) {
	parts := strings.Fields(line)
	resourceId, err := strconv.Atoi(parts[0])
	if err != nil {
		panic("failed to parse resourceId from '" + line + "'")
	}
	count, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse count from '" + line + "'")
	}

	return resourceId, count
}

type ukpLineFn func(line string, parts chan part) ukpLineFn

func ukpResourceFileHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "c:") != 0 {
		return ukpResourceFileHeader
	}
	parts <- part{line: line, partType: constraintPart}
	return ukpDataHeader
}

func ukpDataHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "begin data") != 0 {
		return ukpDataHeader
	}
	return ukpData
}

func ukpData(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "end data") != 0 {
		parts <- part{line: line, partType: resourcePart}
		return ukpData
	}
	return ukpSolutionHeader
}

func ukpSolutionHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "sol:") != 0 {
		return ukpSolutionHeader
	}
	return ukpSolution
}

func ukpSolution(line string, parts chan part) ukpLineFn {
	if len(line) > 0 {
		parts <- part{line: line, partType: solutionPart}
		return ukpSolution
	}
	return ukpFooter
}

func ukpFooter(line string, parts chan part) ukpLineFn {
	return ukpFooter
}

type partType int

type part struct {
	line     string
	partType partType
}

const (
	constraintPart partType = 1 + iota
	resourcePart
	solutionPart
)

A chromosome will still have two parts, the index and count, but this time it will need more genes because the problem we’re solving has 2000 possible resources and the quantities may need to go as high as 160.

spreadsheet demonstrating picking best first item

If we use hexadecimal values for the geneSet we’ll need 3 genes for the resource index and 2 for the quantity.

const hexLookup = "0123456789ABCDEF"
const numberOfGenesPerChromosome = 5

Here is the decoder:

func decodeGenes(candidate string, resources []resource) map[resource]int {
	resourceCounts := make(map[resource]int, len(candidate)/numberOfGenesPerChromosome)
	const maxHexValue = 16 * 16 * 16
	for i := 0; i < len(candidate); i += numberOfGenesPerChromosome {
		chromosome := candidate[i : i+numberOfGenesPerChromosome]
		resourceId := scale(hexToInt(chromosome[0:3]), maxHexValue, len(resources))
		resourceCount := hexToInt(chromosome[3:numberOfGenesPerChromosome])
		resource := resources[resourceId]
		resourceCounts[resource] = resourceCounts[resource] + resourceCount
	}
	return resourceCounts
}

func hexToInt(hex string) int {
	value := 0
	multiplier := 1
	for i := len(hex) - 1; i >= 0; i-- {
		value += multiplier * strings.Index(hexLookup, hex[i:i+1])
		multiplier *= len(hexLookup)
	}
	return value
}

func scale(value, currentMax, newMax int) int {
	return value * newMax / currentMax
}

Once we’ve decoded the candidate’s gene sequence we can get its fitness:

func getFitness(resourceCounts map[resource]int, maxWeight, optimalFitness int) int {
	weight := 0
	value := 0

	for resource, count := range resourceCounts {
		weight += resource.weight * count
		value += resource.value * count
	}

	if weight > maxWeight {
		if value == optimalFitness {
			return optimalFitness + weight - maxWeight
		}
		return -value
	}

	return int(value)
}

And the last utility function is one to display the decoded genes and quantities in a useful format:

func display(resourceCounts map[resource]int, fitness int, elapsed time.Duration, shorten bool) {
	label := ""
	for resource, count := range resourceCounts {
		if count == 0 {
			continue
		}
		if len(label) > 0 {
			label += ", "
		}
		label += fmt.Sprint(count, " of ", resource.name)
	}
	if shorten && len(label) > 33 {
		label = label[:33] + " ..."
	}
	fmt.Println(
		fitness,
		"   ",
		label,
		"\t",
		elapsed)
}

Finally, here’s the code that glues all the parts together to try to solve the problem.

func main() {
	flag.Parse()
	if flag.NArg() != 1 {
		fmt.Println("Usage: go run standard.go RESOURCEFILEPATH")
		return
	}
	var resourceFileName = flag.Arg(0)
	if !File.Exists(resourceFileName) {
		fmt.Println("file " + resourceFileName + " does not exist.")
		return
	}
	fmt.Println("using resource file: " + resourceFileName)

	resources, maxWeight, solution := loadResources(resourceFileName)

	optimalFitness := 0
	for resource, count := range solution {
		optimalFitness += resource.value * count
	}

	calc := func(candidate string) int {
		decoded := decodeGenes(candidate, resources)
		return getFitness(decoded, maxWeight, optimalFitness)
	}

	start := time.Now()

	disp := func(candidate string) {
		decoded := decodeGenes(candidate, resources)
		fitness := getFitness(decoded, maxWeight, optimalFitness)
		display(decoded, fitness, time.Since(start), true)
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = 5
	solver.MaxRoundsWithoutImprovement = 3

	var best = solver.GetBestUsingHillClimbing(calc, disp, hexLookup, 10, numberOfGenesPerChromosome, optimalFitness)

	fmt.Print("\nFinal: ")
	decoded := decodeGenes(best, resources)
	fitness := getFitness(decoded, maxWeight, optimalFitness)
	display(decoded, fitness, time.Since(start), false)
	if fitness == optimalFitness {
		fmt.Println("-- that's the optimal solution!")
	} else {
		percentOptimal := float32(100) * float32(fitness) / float32(optimalFitness)
		fmt.Printf("-- that's %f%% optimal\n", percentOptimal)
	}
}

Sample output:

using resource file: samples/ukp/data/exnsd16.ukp
1001881     137 of Item_730      3.0002ms
1016358     147 of Item_1698     5.0003ms
1023272     148 of Item_1698     5.0003ms
1023775     155 of Item_480      283.0162ms
1025480     155 of Item_1227     1.2450713s
1025681     157 of Item_1331     2.7771589s
1027260     156 of Item_288      3.4291962s
1027500     156 of Item_288, 6 of Item_1567      3.9722272s
1027540     7 of Item_1567, 156 of Item_288      3.9732273s
1027860     156 of Item_288, 15 of Item_1567     3.9732273s
1028783     156 of Item_288, 1 of Item_1692      4.212241s
1028950     156 of Item_288, 1 of Item_1801      4.212241s
1029261     1 of Item_1810, 156 of Item_288      4.213241s
1029581     156 of Item_288, 1 of Item_1790      4.2452429s
1029663     156 of Item_288, 1 of Item_1248      4.2492431s
1029664     1 of Item_1767, 156 of Item_288      7.4734275s
1029680     156 of Item_288, 1 of Item_987       8.8265049s

Final: 1029680     1 of Item_987, 156 of Item_288        14.0928061s
-- that's the optimal solution!

It doesn’t achieve an optimal solution every time but it does find a solution that’s better than 99 percent optimal almost every time. Here’s the fitness results across 100 runs with 5 seconds to run and a maximum of 3 rounds without improvement:

1029680 21
1029582 3
1029504 1
1029477 1
1029459 1
1029356 1
1029094 1
1028950 2
1028926 1
1028911 1
1028905 2
1028891 1
1028890 1
1028888 1
1028872 1
1028603 1
1028209 1
1028205 1
1028198 1
1028183 10  <- median
1028176 1
1028166 4
1028127 1
1028101 14
1028100 1
1028058 1
1028050 2
1027991 1
1027776 1
1027416 3
1027311 1
1027042 1
1026972 1
1026799 5
1026747 1
1026721 1
1026345 1
1026285 1
1026204 1
1025834 2
1025681 3 <- 99.611626% optimal

The source code for this project is available from my GitHub repository.

Go on to part 6.

Read Full Post »

This Go project uses the genetic solver from my previous post to evolve a regular expression that matches all items in a set of wanted strings without matching any items in the set of unwanted strings. For example:

	wanted := []string{"00", "01", "10"}
	unwanted := []string{"11"}

We desire a regular expression that finds strings from wanted without finding those from unwanted. We’ll start by determining the unique set of characters in the wanted strings:

func getUniqueCharacters(wanted []string) string {
	uniqueCharacters := make(map[string]bool)

	characters := ""
	for _, item := range wanted {
		for i := 0; i < len(item); i++ {
			token := item[i : i+1]
			if !uniqueCharacters[token] {
				characters += token
				uniqueCharacters[token] = true
			}
		}
	}
	return characters
}

And we’ll combine those with a set of regex special characters to get our gene set:

const regexSpecials = "[]()|?*+"

geneSet := getUniqueCharacters(wanted) + regexSpecials

Next, when we receive a candidate regex from the solver we’ll verify that it is a valid regex by trying to compile it.

func isValidRegex(candidate string) bool {
	if strings.Contains(candidate, "()") || strings.Contains(candidate, "??") {
		return false
	}

	_, err := regexp.Compile(candidate)
	return err == nil
}

We also need a function to calculate the fitness of a given candidate based on the number of wanted and unwanted strings it matches.

func calculate(wanted, unwanted []string, geneSet string, candidate string) int {
	if !isValidRegex(candidate) {
		return math.MaxInt32
	}

	regex := regexp.MustCompile("^(" + candidate + ")$")
	fitness := 0
	for _, item := range wanted {
		if !regex.MatchString(item) {
			fitness++
		}
	}

	for _, item := range unwanted {
		if regex.MatchString(item) {
			fitness += 10
		}
	}

	return fitness
}

Next we’ll need a way to visualize what’s happening:

	disp := func(candidate string) {
		fmt.Println(candidate,
			"\t",
			calc(candidate),
			"\t",
			time.Since(start))
	}

Finally we glue it all together as follows:

package main

import (
	"github.com/handcraftsman/GeneticGo"
	"fmt"
	"math"
	"regexp"
	"strings"
	"time"
)

const regexSpecials = "[]()|?*+"

func main() {
	wanted := []string{"AL", "AK", "AS", "AZ", "AR"}
	unwanted := []string{"AA"}

	geneSet := getUniqueCharacters(wanted) + regexSpecials

	calc := func(candidate string) int {
		return calculate(wanted, unwanted, geneSet, candidate)
	}
	start := time.Now()

	disp := func(candidate string) {
		fmt.Println(candidate,
			"\t",
			calc(candidate),
			"\t",
			time.Since(start))
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = 3
	solver.LowerFitnessesAreBetter = true
	solver.MaxRoundsWithoutImprovement = 10

	var best = solver.GetBestUsingHillClimbing(calc, disp, geneSet, 10, 1, 0)
	fitness := calc(best)

	if fitness == 0 {
		println("\nsolved with: " + best)
	} else {
		println("\nfailed to find a solution")
	}

	fmt.Print("Total time: ")
	fmt.Println(time.Since(start))
}

Note: depending on the speed of your processor you may be able to reduce MaxSecondsToRunWithoutImprovement. For this problem I’ve set it to 1/10 second.

solver.MaxSecondsToRunWithoutImprovement = .1

Output:

$ go run samples/regex.go
|        3       0
10       2       1.0001ms
1*0+     1       131.0075ms
0*1?0*   0       335.0192ms

solved with: 0*1?0*
Total time: 335.0192ms

Cool!

Now, here’s a somewhat more interesting problem – repetition:

	wanted := []string{"01", "0101", "010101"}
	unwanted := []string{"0011"}

	solver.MaxSecondsToRunWithoutImprovement = .1

Output:

$ go run samples/regex.go
0        3       1ms
01       2       1ms
(01)+    0       215.0123ms

solved with: (01)+
Total time: 216.0123ms

Nailed it! And one more based on U.S. state mail codes:

	wanted := []string{"AL","AK","AS","AZ","AR"}
	unwanted := []string{"AA"}

	solver.MaxSecondsToRunWithoutImprovement = .5

Output:

$ go run samples/regex.go
R        5       0
AS       4       1.0001ms
AK?L?    3       1.0560604s
AS*L?K*          2       2.1301218s
A[SKLZ]          1       2.62215s
A[LRSZK]         0       3.1391795s

solved with: A[LRSZK]
Total time: 3.6392081s

The source code for this project is available from my GitHub repository.

Go on to part 5.

Read Full Post »

Older Posts »

%d bloggers like this: