Feeds:
Posts
Comments

Graph coloring is an interesting problem. Variations involve using the fewest number of colors while making each node a unique color, trying to use an equal number of each color, etc. Take for example this map of the United States.

multi colored U.S. map

How can we keep the constraint that adjacent states do not have the same color while reducing the number of colors to only four? If you try this by hand you start by picking a state and coloring the states around it alternating colors. You can do this with 3 colors if there an even number of adjacent states.

Tennessee and adjacent states 3-colored

But if there are an odd number then you need 4 colors.

Kentucky and adjacent states 4-colored

Then you continue the pattern to other states and introduce a new color only when necessary.

We’re going to do this for the map of 50 U.S. states using the genetic solver from the previous post.

We don’t care about the visual representation per se. We only care about the physical relationships between the states, which we can represent with a graph, or more simply as a set of rules indicating which states cannot have the same color.

We start off with a file containing a list of states and the set of states that are adjacent to each. The row for Tennessee might be as follows:

TN,AL;AR;GA;KY;MO;MS;NC;VA

Next we need to read that file.

def loadData(localFileName):
    # expects: AA,BB;CC;DD where BB, CC and DD are the initial column values in other rows
    with open(localFileName, mode='r') as infile:
        reader = csv.reader(infile)
        mydict = {row[0]: row[1].split(';') for row in reader if row}
        return mydict

Then we need to build the rules. A Rule connects two states indicating that they are adjacent. We want to be able to put rules in a dictionary and find them in a list so we need to define __hash__ and __eq__. We might also want to be able to display a rule so we’ll add a __str__ implementation.

class Rule:
    Item = None
    Other = None
    Stringified = None

    def __init__(self, item, other, stringified):
        self.Item = item
        self.Other = other
        self.Stringified = stringified

    def __eq__(self, another):
        return hasattr(another, 'Item') and \
               hasattr(another, 'Other') and \
               self.Item == another.Item and \
               self.Other == another.Other

    def __hash__(self):
        return hash(self.Item) * 397 ^ hash(self.Other)

    def __str__(self):
        return self.Stringified

Next we’re going to build the set of rules. While we’re doing so we’re going to perform a sanity check on the data. Whenever a state says it is adjacent to another state, the adjacent state should also say it is adjacent to the first state.

def buildLookup(items):
    itemToIndex = {}
    index = 0
    for key in sorted(items):
        itemToIndex[key] = index
        index += 1
    return itemToIndex

def buildRules(items):
    itemToIndex = buildLookup(items.keys())
    rulesAdded = {}
    rules = []
    keys = sorted(list(items.keys()))

    for key in sorted(items.keys()):
        keyIndex = itemToIndex[key]
        adjacentKeys = items[key]
        for adjacentKey in adjacentKeys:
            if adjacentKey == '':
                continue
            adjacentIndex = itemToIndex[adjacentKey]
            temp = keyIndex
            if adjacentIndex < temp:
                temp, adjacentIndex = adjacentIndex, temp
            ruleKey = keys[temp] + "->" + keys[adjacentIndex]
            rule = Rule(temp, adjacentIndex, ruleKey)
            if rule in rulesAdded:
                rulesAdded[rule] += 1
            else:
                rulesAdded[rule] = 1
                rules.append(rule)

    for k, v in rulesAdded.items():
        if v == 1:
            print("rule %s is not bidirectional" % k)

    return rules

Now we have the ability to convert a file of node relationships to a set of adjacency rules. Next we need to build the code used by the genetic solver. We’ll start by determining what our genes will be. In this case since we want to four-color the 50 states our geneset will be four color codes.

        colors = ["Orange", "Yellow", "Green", "Blue"]
        colorLookup = {}
        for color in colors:
            colorLookup[color[0]] = color
        geneset = list(colorLookup.keys())

Our Individuals will have 50 genes, one for each state, in alphabetical order. This lets us use the index into the genes as an index into the set of sorted state codes.

Since the expected optimal situation will be that all adjacent states have different colors we can set the optimal value to the number of rules.

At the end we’ll write out the color of each state.

class GraphColoringTests(unittest.TestCase):
    def test(self):
        states = loadData("adjacent_states.csv")
        rules = buildRules(states)
        colors = ["Orange", "Yellow", "Green", "Blue"]
        colorLookup = {}
        for color in colors:
            colorLookup[color[0]] = color
        geneset = list(colorLookup.keys())
        optimalValue = len(rules)
        startTime = datetime.datetime.now()
        fnDisplay = lambda candidate: display(candidate, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, rules)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(states), optimalValue, geneset)
        self.assertEqual(best.Fitness, optimalValue)

        keys = sorted(states.keys())

        for index in range(len(states)):
            print(keys[index] + " is " + colorLookup[best.Genes[index]])

As for display, it should be sufficient to output the color codes.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    print("%s\t%i\t%s" % (''.join(map(str, candidate.Genes)), candidate.Fitness, str(timeDiff)))

This gets output like the following. The number to the right of the gene sequence will indicate how many rules this gene sequence satisfies.

YGGBOOGOOBBYGGYYYYGBGYOOGBOYGGOOOYBOYBBGGOBYOGOGOGG	74	0:00:00.001000

Finally we need a fitness function that checks all the rules assuming the states are colored according to the gene sequence.

def getFitness(candidate, rules):
    rulesThatPass = 0
    for rule in rules:
        if candidate[rule.Item] != candidate[rule.Other]:
            rulesThatPass += 1

    return rulesThatPass

That’s it. Now when we run our main test function we get the following output:

OOYYOBBGYOOYGBBYOOOBOGYGYGGBBYGOGGOYOYGYBBOBOBGOBBG	82	0:00:00
YYBOYGGGGOBYOYBGBOOBOOBBGGBGGYGBBGOBOBYYOGYYBBYOYGO	102	0:00:00.016001
BOOGOGGOGBGYGGBGOOYBYOBYGBBOGBGBBBOYYYGYYOYOOGYBOBY	103	0:00:00.316018
GOBOGGOYGBGOBGOGYBYBOOYYGGBBGOYBYYYOOBGYYOYGBGGOGYY	104	0:00:01.602092
BBBBGGBYGOGYBGOGBBYGOGYYYGYBBOOBYYYOOOGOYGOGOGBBGYB	105	0:00:04.933282
AK is Blue
AL is Blue
AR is Blue
AZ is Blue
CA is Green
CO is Green
CT is Blue
DC is Yellow
DE is Green
FL is Orange
GA is Green
HI is Yellow
IA is Blue
ID is Green
IL is Orange
IN is Green
KS is Blue
KY is Blue
LA is Yellow
MA is Green
MD is Orange
ME is Green
MI is Yellow
MN is Yellow
MO is Yellow
MS is Green
MT is Yellow
NC is Blue
ND is Blue
NE is Orange
NH is Orange
NJ is Blue
NM is Yellow
NV is Yellow
NY is Yellow
OH is Orange
OK is Orange
OR is Orange
PA is Green
RI is Orange
SC is Yellow
SD is Green
TN is Orange
TX is Green
UT is Orange
VA is Green
VT is Blue
WA is Blue
WI is Green
WV is Yellow
WY is Blue

Which looks like this:

4-color us states

Read more of this series in my eBook Genetic Algorithms with Python

In Part 1 we built a basic genetic solver that used mutation to solve problems. In this part we’re going to tackle a slightly more complex problem, the 8 Queens Puzzle, and then expand the solver as necessary.

Get the latest version of this post as a free chapter from my eBook Genetic Algorithms with Python

The 8 Queens Puzzle involves putting 8 queens on a standard chess board such that none are under attack. According to WikiPedia there are only 92 solutions to this puzzle and once we remove mirrorings and rotations there are only 12 unique solutions.

To start with we need to define a genome. The chess board conveniently has the same number of rows as columns (8) so we’ll use the digits 1-8 for our genes.

geneset = '12345678'

Array indexes are zero based however, so we’ll need to convert gene symbols to row and column values. To do that we’ll find the gene’s index into the set of gene symbols then use that index as a row or column, and combine those to make a Point.

class Point:
    Row = None
    Col = None

    def __init__(self, row, col):
        self.Row = row
        self.Col = col

def getPoint(rowGene, colGene, gene_set):
    rowIndex = gene_set.index(rowGene)
    if rowIndex == -1:
        raise ValueError("'" + rowGene + "' is an invalid gene")
    colIndex = gene_set.index(colGene)
    if colIndex == -1:
        raise ValueError("'" + colGene + "' is an invalid gene")
    return Point(rowIndex, colIndex)

We also need to be able plot the Points on a board.

def getBoard(candidate, gene_set):
    board = [['.'] * 8 for i in range(8)]
    for index in range(0, len(candidate), 2):
        point = getPoint(candidate[index], candidate[index + 1], gene_set)
        board[point.Row][point.Col] = 'Q'
    return board

And we want to be able to visualize the board.

def display(candidate, gene_set, startTime):
    timeDiff = datetime.datetime.now() - startTime
    board = getBoard(candidate.Genes, gene_set)
    for i in range(8):
        print(board[i][0],
              board[i][1],
              board[i][2],
              board[i][3],
              board[i][4],
              board[i][5],
              board[i][6],
              board[i][7]
              )
    print("%s\t%i\t%s" % (candidate.Genes, candidate.Fitness, str(timeDiff)))

This will give us output like the following

. Q . . . . . .
. . Q . . . . .
. . . . Q Q . .
. . . . . . . Q
. . . . . . . .
Q . . . . . . .
. . . . . . Q .
. . . . Q . . .
7761124823353685	29	0:00:00.013001

The row of digits under the board is the set of genes that created the board layout. The number to the right will be the fitness, a measure of how close this set of genes is to the desired result. To drive improvement we’ll want to increase the fitness value whenever the related board position lets more queens coexist on the board. Let’s think about how we can do that.

We’ll start with counting the number of columns that have a queen. Here’s a layout that gets an optimal score but is undesirable

Q Q Q Q Q Q Q Q
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

We’ll also count the number of rows that have queens. Here’s a revised board where both counts are optimal but the layout still allows queens to attack each other.

Q . . . . . . .
. Q . . . . . .
. . Q . . . . .
. . . Q . . . .
. . . . Q . . .
. . . . . Q . .
. . . . . . Q .
. . . . . . . Q

To fix this problem we’ll include the count of the number of southeast diagonals that have a queen. Again we can find a corner case as follows:

. . . . . . . Q
. . . . . . Q .
. . . . . Q . .
. . . . Q . . .
. . . Q . . . .
. . Q . . . . .
. Q . . . . . .
Q . . . . . . .

To fix this final problem we’ll include the count of the number of northeast diagonals that have a queen.

Our fitness value will be the sum of those four counts, 8+8+8+8 being optimal.

Side note on implementation, we can calculate the northeast diagonals in Excel using the formula =$A2+B$1 which results in a grid as follows

    0   1   2   3   4   5   6   7
0   0   1   2   3   4   5   6   7
1   1   2   3   4   5   6   7   8
2   2   3   4   5   6   7   8   9
3   3   4   5   6   7   8   9   10
4   4   5   6   7   8   9   10  11
5   5   6   7   8   9   10  11  12
6   6   7   8   9   10  11  12  13
7   7   8   9   10  11  12  13  14

The southeast diagonals can be calculated using =(8-1-$A2)+B$1 which we can visualize as follows:

    0   1   2   3   4   5   6   7
0   7   8   9   10  11  12  13  14
1   6   7   8   9   10  11  12  13
2   5   6   7   8   9   10  11  12
3   4   5   6   7   8   9   10  11
4   3   4   5   6   7   8   9   10
5   2   3   4   5   6   7   8   9
6   1   2   3   4   5   6   7   8
7   0   1   2   3   4   5   6   7

Using the above 2 formulas along with the row and column values lets us touch each board position exactly once, which makes our fitness function run fast.

def getFitness(candidate, gene_set):
    board = getBoard(candidate, gene_set)
    rowsWithQueens = {}
    colsWithQueens = {}
    northEastDiagonalsWithQueens = {}
    southEastDiagonalsWithQueens = {}
    for row in range(8):
        for col in range(8):
            if board[row][col] == 'Q':
                rowsWithQueens[row] = 1
                colsWithQueens[col] = 1
                northEastDiagonalsWithQueens[row + col] = 1
                southEastDiagonalsWithQueens[8 - 1 - row + col] = 1

    return len(rowsWithQueens) \
           + len(colsWithQueens) \
           + len(northEastDiagonalsWithQueens) \
           + len(southEastDiagonalsWithQueens)

Finally our main (integration test) method will bring all the parts together.

class EightQueensTests(unittest.TestCase):
    def test(self):
        geneset = '12345678'
        startTime = datetime.datetime.now()
        fnDisplay = lambda candidate: display(candidate, geneset, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, geneset)
        best = genetic.getBest(fnGetFitness, fnDisplay, 16, 8 + 8 + 8 + 8, geneset)
        self.assertEqual(best.Fitness, 8 + 8 + 8 + 8)

Note that we’ve added a parameter to getBest. This is because in Part 1 the solver determined the optimal fitness internally but now we want pass it in.

def getBest(get_fitness, display, targetLen, optimalFitness, geneSet):
...
    while bestParent.Fitness < optimalFitness:

We can run our test to see if the solver can find a solution.

Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . . . Q . . .
. Q . . Q . . .
. . . . . . . .
. . . Q . . . .
Q . Q . . . . .
4574811152355583	23	0:00:00
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . . . Q . . .
. . . . Q . . Q
. . . . . . . .
. . . Q . . . .
Q . Q . . . . .
4574811158355583	24	0:00:00
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . Q . Q . . .
. . . . Q . . Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158355543	25	0:00:00.001000
Q . . . . . . .
. . . . . . . .
. . . . Q . . .
. . Q . Q . . .
. . . . . . Q Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158355743	26	0:00:00.002000
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q Q
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811158315743	27	0:00:00.003000
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
4574811128315743	28	0:00:00.004000
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. Q . . . . . .
4574821128315743	29	0:00:00.005001
Q . . . . . . .
. . . . . Q . .
Q . . . . . . .
. . Q . Q . . .
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. Q . . . . . .
4574821126315743	30	0:00:00.009001

After several test runs we find that most of the time it gets close but can’t get all the way to the optimal value of 32. We need to enhance the solver’s capabilities for it to be able to handle this problem.

If at first you don’t succeed, try, try again!

We’re going to do that by introducing a second genetic line. We’ll mutate the 2nd line as long as it is improving. If it ends up with a better fitness than bestParent then it will become the bestParent. Otherwise, we’ll start a new genetic line again with a random gene sequence. We repeat this process over and over until we find an optimal result. Here’s the updated solver loop:

def getBest(get_fitness, display, targetLen, optimalFitness, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet, get_fitness)
    display(bestParent)

    while bestParent.Fitness < optimalFitness:
        parent = generateParent(targetLen, geneSet, get_fitness)
        attemptsSinceLastImprovement = 0
        while attemptsSinceLastImprovement < 128:
            child = mutate(parent, geneSet, get_fitness)
            if child.Fitness > parent.Fitness:
                parent = child
                attemptsSinceLastImprovement = 0
            attemptsSinceLastImprovement += 1

        if bestParent.Fitness < parent.Fitness:
            bestParent, parent = parent, bestParent
            display(bestParent)

    return bestParent

Now when we run the 8 queens test we find an optimal solution every time.

. Q . . Q . . .
. . . . Q . . .
. . . . . . . Q
. . . . . . . Q
. . . . . . . .
. . . . . . . .
. . . . . . . .
Q . . . . . . .
8148381525122525	20	0:00:00
. . Q . . . . .
. . Q . . . . .
. . . . . . Q .
. . Q . . . . .
. . . . . . . .
. . . Q . . . .
Q . . . . . . .
. . . . Q . . Q
2385884313647137	28	0:00:00.013001
. . . . . . Q .
. . . . . . . .
. . . Q . . . .
. . . . . Q . Q
Q . . . . . . .
. . . . Q . . .
. Q . . . . . .
. . . . Q . . .
3417486546857251	30	0:00:00.041003
. . Q . . . . .
. . . . Q . . .
. Q . . . . . .
. . . Q . . . .
Q . . . . . . .
. . Q . . . . .
. . . . . . . Q
. . . . . Q . .
5178258613324463	31	0:00:00.143008
. . . . Q . . .
. . Q . . . . .
Q . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
. . . . . Q . .
. . . Q . . . .
5247312368841576	32	0:00:00.308018

However, when we run the string duplication test from Part 1 it now struggles to find a solution. This is because we ignore our best line completely once we find it and only try to improve by mutating a new genetic line.

JoyEMlECsatqzfBVFkvIHduMg!HAFtnWcC	3	0:00:00
Nj!HBLwEHvdse wGGOKbJctg AriqlNDPP	8	0:00:00.022001
NBiBRlE thoHfFWhW !tnnFyuHAv wputM	11	0:00:00.062004
NUt alx xmusGiwTmEwprder are lKsdG	19	0:00:00.112007
Not allkthosbHnhoCHinvBr OQe losmR	21	0:00:00.852049
NotW!lT NhoJe VhRFwaDderiaCe most.	22	0:00:02.510144
Not alF tQose S!o wDndUZ yre lostA	25	0:00:05.847335
NotXLll those whopwWnder are lost.	30	0:01:33.040322

We need a way to continue to take advantage of the genes in bestParent. One way nature handles this is through crossbreeding so we’ll introduce a crossover strategy where we take one random gene from a 2nd (the best) parent.

def crossover(parent, parent2, get_fitness):
    index = random.randint(0, len(parent.Genes) - 1)
    genes = list(parent.Genes)
    genes[index] = parent2.Genes[index]
    childGenes = (''.join(genes))
    fitness = get_fitness(childGenes)
    return Individual(childGenes, fitness)

And update the main loop to randomly choose whether to use mutation or crossover to improve the 2nd genetic line.

    options = {
        0: lambda p: mutate(p, geneSet, get_fitness),
        1: lambda p: crossover(p, bestParent, get_fitness)
    }

    while bestParent.Fitness < optimalFitness:
        parent = generateParent(targetLen, geneSet, get_fitness)
        attemptsSinceLastImprovement = 0
        while attemptsSinceLastImprovement < 128:
            child = options[random.randint(0, len(options) - 1)](parent)

Now when we run the tests both are able to achieve optimal results every time.

 

Refactor

 

Now that everything works we’re going to do some code hygiene. In solving this problem we used genes “12345678” but it would have been more convenient, and faster, if we could have used raw integers 0-7. So let’s make that change. I’ll show selected changes below but you can get the full set from github.

class EightQueensTests(unittest.TestCase):
    def test(self):
        geneset = [0, 1, 2, 3, 4, 5, 6, 7]
    ...
        fnDisplay = lambda candidate: display(candidate, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate)
    ...

This means both Point and getPoint can go away, simplifying getBoard to:

def getBoard(candidate):
    board = [['.'] * 8 for i in range(8)]
    for index in range(0, len(candidate), 2):
        board[candidate[index]][candidate[index + 1]] = 'Q'
    return board

In display we go the other way so we can show all the genes as a string.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    board = getBoard(candidate.Genes)
    for i in range(8):
        print(board[i][0],
              board[i][1],
              board[i][2],
              board[i][3],
              board[i][4],
              board[i][5],
              board[i][6],
              board[i][7]
              )
    print("%s\t%i\t%s" % (''.join(map(str, candidate.Genes)), candidate.Fitness, str(timeDiff)))

To support the geneset being an array in the solver we only have to eliminate the line in crossover, mutate and generateParent that converts the child gene sequence to a string.

Read more of this series in my eBook Genetic Algorithms with Python

 

The goal of this, my first program in Python, is to reproduce a target string (like Hello World!) without looking directly at it. I’ll do this with a simple genetic algorithm that randomly generates an initial sequence of characters and then mutates one random character in that sequence at a time until it matches the target. Think of this like playing a Hangman variant where you pass a letter sequence to the person who knows the target word, and the only feedback you get is how many of your letters are correct. It is also reminiscent of the game Hotter Colder, except we’re doing it with code.

We start off with a standard set of letters for genes and a target string:

    geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
    target = "Not all those who wander are lost."

Next we need a way to generate a random gene sequence from the gene set.

def generateParent(length):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneset) -1);
        genes.append(geneset[geneIndex])
    return(''.join(genes))

There are many ways to calculate a fitness value (how close the guess is to the target) for the generated string. For this particular problem we’ll simply count the number of characters that are the same between the candidate string and the target string.

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

We also need a way to produce a new child gene sequence by mutating the existing (parent) string.  The point is to create a copy of the parent then replace 1 letter/gene in the copy with a randomly selected one from the set of all possible genes/letters.

def mutate(parent):
   geneIndex = random.randint(0, len(geneset) -1);
   index = random.randint(0, len(parent) - 1)
   genes = list(parent)
   genes[index] = geneset[geneIndex]
   return(''.join(genes))

Next we need a way to display a gene sequence, its fitness value and how much time has elapsed.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.

bestParent = generateParent(len(target))
bestFitness = getFitness(bestParent, target)
display(bestParent, startTime)

while bestFitness < len(bestParent):
   child = mutate(bestParent)
   childFitness = getFitness(child, target)

   if childFitness > bestFitness:
      bestFitness = childFitness
      bestParent = child
      display(bestParent, startTime)

Sample output:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
<<< ================================ RESTART ================================
<<<
iEjRmPrBUUDosXupcEw.Ji.qTCtRYawlry	1	0:00:00
iEjRmPrBUUDosXupcEw.Ji.qTCtRYaolry	2	0:00:00.004001
iEjRmPr UUDosXupcEw.Ji.qTCtRYaolry	3	0:00:00.007001
iEjRmPr UUDosXupcEw.JieqTCtRYaolry	4	0:00:00.011001
iEjRmPr UUDosXupcEw.nieqTCtRYaolry	5	0:00:00.014002
iEjRmPl UUDosXupcEw.nieqTCtRYaolry	6	0:00:00.016002
iEjRmPl UhDosXupcEw.nieqTCtRYaolry	7	0:00:00.023003
iEjRmPl UhoosXupcEw.nieqTCtRYaolry	8	0:00:00.026003
iEjRmPl UhossXupcEw.nieqTCtRYaolry	9	0:00:00.029003
iEjRmPl UhossXupcEw.nierTCtRYaolry	10	0:00:00.032003
iEj mPl UhossXupcEw.nierTCtRYaolry	11	0:00:00.036004
iEj mPl UhossXupcEw.nierTCrRYaolry	12	0:00:00.038004
iEj aPl UhossXupcEw.nierTCrRYaolry	13	0:00:00.041004
iEj aPl UhossXupcEw.nderTCrRYaolry	14	0:00:00.044005
iEj aPl UhossXupcEw.nderTCreYaolry	15	0:00:00.045005
iEj aPl UhossXupoEw.nderTCreYaolry	16	0:00:00.047005
iEj aPl UhoseXupoEw.nderTCreYaolry	17	0:00:00.051005
iEj aPl UhoseXupoEw.nderTCreYlolry	18	0:00:00.054006
iEj aPl UhoseXupoEw.nderTCreYlolr.	19	0:00:00.057006
iEj aPl thoseXupoEw.nderTCreYlolr.	20	0:00:00.059006
iEj aPl thoseXupoEwanderTCreYlolr.	21	0:00:00.065007
iEj aPl thoseXupo wanderTCreYlolr.	22	0:00:00.068007
NEj aPl thoseXupo wanderTCreYlolr.	23	0:00:00.073008
NEj aPl thoseXupo wander CreYlolr.	24	0:00:00.075008
NEj aPl thoseXupo wander CreYlolt.	25	0:00:00.081008
NEj aPl thoseXwpo wander CreYlolt.	26	0:00:00.083009
Noj aPl thoseXwpo wander CreYlolt.	27	0:00:00.090009
Noj aPl thoseXwho wander CreYlolt.	28	0:00:00.093010
Noj aPl thoseXwho wander CreYlost.	29	0:00:00.097010
Not aPl thoseXwho wander CreYlost.	30	0:00:00.106011
Not all thoseXwho wander CreYlost.	31	0:00:00.117012
Not all thoseXwho wander areYlost.	32	0:00:00.120012
Not all thoseXwho wander are lost.	33	0:00:00.123013
Not all those who wander are lost.	34	0:00:00.151015

Refactor

 

Good, it works. Now we need to separate the solver code from that specific to the string duplication problem so we can use it to solve other problems. We’ll start by moving our main loop into a function called getBest. That function’s parameters will include the functions it should call to get the candidate’s fitness and to display a new best sequence.

def getBest(get_fitness, display, targetLen, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet)
    bestFitness = get_fitness(bestParent)
    display(bestParent)

    while bestFitness < len(bestParent):
       child = mutate(bestParent, geneSet)
       childFitness = get_fitness(child)

       if childFitness > bestFitness:
          bestFitness = childFitness
          bestParent = child
          display(bestParent)
    return bestParent

We also need to pass the gene set to generateParent and mutate so they aren’t using global variables.

def mutate(parent, geneSet):
   geneIndex = random.randint(0, len(geneSet) -1);
   index = random.randint(0, len(parent) - 1)
   genes = list(parent)
   genes[index] = geneSet[geneIndex]
   return(''.join(genes))

def generateParent(length, geneSet):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneSet) -1);
        genes.append(geneSet[geneIndex])
    return(''.join(genes))

Now move the rest of the code to a new file named stringDuplicationTests.py. Notice that display and getFitness no longer have the same number of parameters as those above.

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

In our main test method we’ll pass lambdas that take the candidate gene sequence passed by the solver as a parameter and call the local methods with additional required parameters as necessary.

def test_string_duplication():
    geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
    startTime = datetime.datetime.now()
    target = "Not all those who wander are lost."
    fnDisplay = lambda candidate: display(candidate, target, startTime)
    fnGetFitness = lambda candidate: getFitness(candidate, target)
    best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)

Next we’ll make it so that executing this file calls our test function.

if __name__ == '__main__':
    test_string_duplication()

That’s it. We now have separation of concerns and when we run the test code the program still works. You can get the updated code from this checkin on Github.

Next, we’ll modify the test file to make it compatible with Python’s built in test framework.

import unittest
import datetime
import genetic

def getFitness(candidate, target):
   fitness = 0
   for i in range(0, len(candidate)):
       if target[i] == candidate[i]:
           fitness += 1
   return(fitness)

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    fitness = getFitness(candidate, target)
    print ("%s\t%i\t%s" % (candidate, fitness, str(timeDiff)))

class StringDuplicationTests(unittest.TestCase):
    def test(self):
        geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
        startTime = datetime.datetime.now()
        target = "Not all those who wander are lost."
        fnDisplay = lambda candidate: display(candidate, target, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, target)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)
        self.assertEqual(best, target)

if __name__ == '__main__':
    unittest.main()

This allows us to run the tests from the commandline and without writing the output, which, by the way, makes it run twice as fast.

python -m unittest -b geneticTests.py
.
----------------------------------------------------------------------
Ran 1 test in 0.108s

OK

The final refactoring will be to introduce an Individual object that has both Genes and Fitness attributes.

class Individual:
   Genes = None
   Fitness = None
   def __init__(self, genes, fitness):
        self.Genes = genes
        self.Fitness = fitness

This lets us pass those values around as a unit and reduces some double work in the display function.

import random

def mutate(parent, geneSet, get_fitness):
   geneIndex = random.randint(0, len(geneSet) -1);
   index = random.randint(0, len(parent.Genes) - 1)
   genes = list(parent.Genes)
   genes[index] = geneSet[geneIndex]
   childGenes = (''.join(genes))
   fitness = get_fitness(childGenes)
   return Individual(childGenes, fitness)

def generateParent(length, geneSet, get_fitness):
    genes = list("")
    for i in range(0,length):
        geneIndex = random.randint(0, len(geneSet) -1);
        genes.append(geneSet[geneIndex])
    childGenes = (''.join(genes))
    fitness = get_fitness(childGenes)
    return Individual(childGenes,fitness)

def getBest(get_fitness, display, targetLen, geneSet):
    random.seed()
    bestParent = generateParent(targetLen, geneSet, get_fitness)
    display(bestParent)

    while bestParent.Fitness < targetLen:
       child = mutate(bestParent, geneSet, get_fitness)

       if child.Fitness > bestParent.Fitness:
          bestParent = child
          display(bestParent)
    return bestParent

We also have to make compensating changes to the test file methods.

def display(candidate, target, startTime):
    timeDiff = datetime.datetime.now() - startTime
    print ("%s\t%i\t%s" % (candidate.Genes, candidate.Fitness, str(timeDiff)))

class StringDuplicationTests(unittest.TestCase):
    def test(self):
        geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
        startTime = datetime.datetime.now()
        target = "Not all those who wander are lost."
        fnDisplay = lambda candidate: display(candidate, target, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, target)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(target), geneset)
        self.assertEqual(best.Genes, target)

Go on to Part 2 in which we evolve the solver to handle the 8 Queens Puzzle.

Learn more in my eBook Genetic Algorithms with Python

In Part 3 we added an integration test to verify that the solver can find a solution to the 8 Queens puzzle. However, Mutation is the only strategy currently implemented in the solver and it fails to find a solution to that problem 80% of the time. We need to introduce a new strategy from nature – crossover from another successful genetic line.

But first, to simplify some code sections I’m going to introduce a container so we can pass both the genes and their fitness as a single object.

    pub struct Individual {
        pub genes: String,
        pub fitness: usize
    }

This means we’ll call the display function with an Individual instead of a String, but also that we don’t have to call get_fitness in our tests in order to display the interim results. There are no surprises in this change so I won’t repeat them here. You can browse these code changes on Github.

We need to make two improvements to the solver in order for it to be able to solve this problem every time. The first is to add crossover as a genetic strategy. This is similar to mutation in that we’re going to replace 1 gene at a specific location. But this time instead of substituting a random gene we’re going to get the same gene location from a second successful parent.

    fn crossover<F>(parent1: &Individual, parent2: &Individual, get_fitness: F) -> Individual
    where F : Fn(&String)->usize
    {
        let mut rng = thread_rng();
        let parent_index = rng.gen::<usize>() % parent1.genes.len();
        let mut candidate = String::with_capacity(parent1.genes.len());

        if parent_index > 0 {
            candidate.push_str(&parent1.genes[..parent_index]);
        }
        candidate.push_str(&parent2.genes[parent_index..(1+parent_index)]); // replace 1 gene
        if parent_index+1 < parent1.genes.len() {
            candidate.push_str(&parent1.genes[parent_index+1..]);
        }

        let fitness = get_fitness(&candidate);
        Individual { genes: candidate, fitness: fitness }
    } 

The second improvement is to have an additional genetic line. We start with a random gene sequence and use our strategies to improve it until we fail to improve it 100 times in a row. Then we compare it to the current best parent and keep the better of the two. This process repeats until we hit the optimal fitness value.

    pub fn get_best<F,D>(
        get_fitness: F, 
        display: D, 
        length: usize, 
        optimal_fitness_value: usize,
        gene_set: &str) -> Individual
        where F : Fn(&String)->usize,
        D : Fn(&Individual)
    {
        let mut best_parent = generate_parent(&get_fitness, gene_set, length);
        display(&best_parent);

        while best_parent.fitness < optimal_fitness_value {
            let mut candidate = generate_parent(&get_fitness, gene_set, length);
            let mut attempts_since_last_improvement = 0;
            while attempts_since_last_improvement < 100 {
                let child = match attempts_since_last_improvement % 3 {
                    0 => generate_parent(&get_fitness, gene_set, length),
                    1 => mutate_parent(&candidate, &get_fitness, gene_set),
                    _ => crossover(&candidate, &best_parent, &get_fitness)
                };

                if child.fitness > candidate.fitness { 
                    candidate = child;
                    attempts_since_last_improvement = 0;
                }
                attempts_since_last_improvement = attempts_since_last_improvement + 1;
            }
            if candidate.fitness > best_parent.fitness { 
                best_parent = candidate;
                display(&best_parent);
            }
        }

        best_parent 
    }

Now when we run the integration tests they all pass 100% of the time, though some rounds take longer than others. Explore this code change on Github.

In Part 2 of this series we converted our code to a library, made our initial puzzle into an integration test, and extracted test related parameters and methods from the library.

Now we’re ready to try a new puzzle. This time we’ll expand our solver to handle a slightly more difficult problem – the 8 Queens puzzle.

In the 8 Queens puzzle we wan’t to place 8 chess Queens on a Chess board such that none of them are under attack.

7 Chess Queens in safe positions on a Chess board

According to WikiPedia there are only 92 solutions to this puzzle and once we remove mirrorings and rotations there are only 12 unique solutions.

We start off by figuring out how we’re going to map genes to the problem. One solution that I’ve used before is to assign each square on the 8×8 Chess board a symbol from the 64 symbol set ([a-z][A-Z][0-9]@#) as follows:

board positions labelled first rows then columns

We need to be able to convert a symbol (gene) to a board position. To do that we’ll find its index in the set of genes then convert that index to a row and column, or Point.

    fn to_point(gene: char, gene_set: &str) -> Point {
        let location = gene_set.find(gene);
        assert_eq!(location.is_some(), true);
        let index = location.unwrap() as i32;
        let row = index / 8i32;
        let column = index % 8i32;
        return Point{row: row, col: column};
    }

    struct Point {
        row: i32,
        col: i32
    }

We also need a way to visualize a set of Points (Queens) on a Chess board.

    fn get_board(candidate: &String, gene_set: &str) -> [[char; 8]; 8] {
        let mut board:[[char; 8]; 8] = [['.'; 8]; 8];
        
        for point in candidate.chars().map(|c| to_point(c, gene_set)) {
            board[point.row as usize][point.col as usize] = 'Q';
        }
        board
    }

    fn display_8_queens(candidate: &String, gene_set: &str, start: PreciseTime) {
        let now = PreciseTime::now();
        let elapsed = start.to(now);
        let board:[[char; 8]; 8] = get_board(candidate, gene_set);
        for i in 0..8 {
            let mut row = "".to_string();
            for j in 0..8 {
                row.push(board[i][j]);
                row.push(' ');
            }
            println!("{}", row);
        }
        
        println!("{}\t{}\t{}", candidate, get_8_queens_fitness(&candidate, gene_set), elapsed);
    }

The output of display_8_queens looks like this:

. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. Q . . . . . .
0ncJ5yUx        4096    PT0.192306457S

Next we need a way to see 1) if we have 8 Queens on the board, we might not if the same gene appears twice in the generated sequence, and 2) if each Queen is on its own row, column and diagonals.

First the row and column checks. We’ll keep a count of how many rows and columns have exactly one Queen.

    fn get_8_queens_fitness(candidate: &String, gene_set: &str) -> usize {
        let board = get_board(candidate, gene_set);
        
        // count rows with 1 queen
        let indexes: Vec<i32> = (0..8).collect();
        let mut correct_queens_in_row = 0; 
        for i in 0..8 {
            let row_count = indexes.iter()
                .cloned()
                .map(|col| board[i][col as usize])
                .filter(|ch|'Q' == *ch)
                .count();
            if row_count == 1 {
                correct_queens_in_row = correct_queens_in_row + 1;
            }
        }
        let mut correct_queens_in_column = 0; 
        for i in 0..8 {
            let column_count = indexes.iter()
                .cloned()
                .map(|row| board[row as usize][i])
                .filter(|ch|'Q' == *ch)
                .count();
            if column_count == 1 {
                correct_queens_in_column = correct_queens_in_column + 1;
            }
        }

To count the number of diagonals that have exactly one Queen we’ll introduce a generator that creates Points starting from an initial position and then moving by a given row and column offset. First the generator and some tests for it.

    struct Diagonal {
        row: i32,
        col: i32,
        row_offset: i32,
        col_offset: i32
    }

    impl Iterator for Diagonal {
        type Item = Point;
        fn next(&mut self) -> Option<Point> {
            let prev_row = self.row;
            let prev_col = self.col;
            self.row = prev_row + self.row_offset;
            self.col = prev_col + self.col_offset;
            
            // this is an infinite value generator
            Some(Point{row:prev_row,col:prev_col})
        }
    }

    #[test]
    fn test_diagonal_iterator_first_value_returned_should_be_the_start_state() {
        let mut diag = Diagonal {row:5,col:6,row_offset:1,col_offset:1};
        let first = diag.next();
        assert_eq!(true,first.is_some());
        let point = first.unwrap() as Point;
        assert_eq!(point.row,5);
        assert_eq!(point.col,6);
    }
    
    #[test]
    fn test_diagonal_iterator_second_value_returned_should_be_with_offsets_added_to_start_state() {
        let diag = Diagonal {row:5,col:4,row_offset:1,col_offset:-1};
        let first = diag.skip(1).next();
        assert_eq!(true,first.is_some());
        let point = first.unwrap() as Point;
        assert_eq!(point.row,6);
        assert_eq!(point.col,3);
    }

Note: As of this 5/29 the above code does not work with the 1.0.0 release of rustc due to a bug in the compiler… however the nightly build no longer has the bug. Try it on Playpen

And now the diagonal checks…

    let mut correct_queens_in_northeast_diagonal = 0; 
    for i in 0..15 {
        let diag = Diagonal {row:i,col:0,row_offset:-1,col_offset:1};
        let diagonal_count = diag
        .take_while(|point|point.row >= 0 && point.col >= 0)
        .skip_while(|point|point.row >=8 || point.col >=8)
        .take_while(|point|point.col < 8)
        .map(|point| board[point.row as usize][point.col as usize])
        .filter(|ch| 'Q' == *ch)
        .count();
        if diagonal_count == 1 {
        correct_queens_in_northeast_diagonal = correct_queens_in_northeast_diagonal + 1;
        }
    }    

    let mut correct_queens_in_southeast_diagonal = 0; 
    for i in -8..8 {
        let diag = Diagonal {row:i,col:0,row_offset:1,col_offset:1};
        let diagonal_count = diag
        .skip_while(|point|point.row < 0 || point.col < 0)
        .take_while(|point|point.col < 8 && point.row < 8)
        .map(|point| board[point.row as usize][point.col as usize])
        .filter(|ch| 'Q' == *ch)
        .count();
        if diagonal_count == 1 {
        correct_queens_in_southeast_diagonal = correct_queens_in_southeast_diagonal + 1;
        }
    }    

At the end of the method we’ll multiply all the row count values to get the fitness value. Given this the best possible solution would be 8*8*8*8.

    (if correct_queens_in_row == 0 { 1 } else { correct_queens_in_row })
    * (if correct_queens_in_column == 0 { 1 } else { correct_queens_in_column })
    * (if correct_queens_in_northeast_diagonal == 0 { 1 } else { correct_queens_in_northeast_diagonal })
    * (if correct_queens_in_southeast_diagonal == 0 { 1 } else { correct_queens_in_southeast_diagonal })
}

Language aside, rust doesn’t have a trinary operator but it does let you use an if statement inline.

Finally we need a main method to call the solver and verify the results. Once again we’ll create it as an integration test.

    #[test]
    fn test_8_queens() {
        let start = PreciseTime::now();
        let gene_set = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@#";

        let wrapped_display = |candidate: &String| {display_8_queens(&candidate, gene_set, start);};
        let wrapped_get_fitness = |candidate: &String| -> usize {get_8_queens_fitness(&candidate, gene_set)};
    
        let best = genetic::get_best(wrapped_get_fitness, wrapped_display, 8, 8*8*8*8, gene_set);
        println!("Total time: {}", start.to(PreciseTime::now()));
        let best_fitness = get_8_queens_fitness(&best, gene_set);
        assert_eq!(best_fitness,8*8*8*8);
    }

When we run that test we may get a successful result like the following.

cargo test test_8_queens -- --nocapture
     Running target\debug\genetic-debe16ff205aab6a.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target\debug\lib_8_queens_tests-afc3deed67cad86e.exe

running 1 test
. . Q . Q . . .
Q Q . . . . . .
Q . . . Q . . .
. . . . . . . .
. . . . . . . .
. . . . . . Q .
. . . . . . . .
. . Q . . . . .
ujcieqU6        120     PT0.003442522S
. . Q . Q . . .
Q . . . . . . .
Q . . . Q . . .
. . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . Q . . . . .
uXcieqU6        192     PT0.004552093S
. . Q . Q . . .
Q . . . . . . .
Q . . . Q . . .
. . . . . Q . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . .
uXcieqUD        480     PT0.005474583S
. . Q . Q . . .
Q . . . . . . .
. . . . Q . . .
Q . . . . Q . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . .
uXcieyUD        640     PT0.006346324S
. . Q . Q . . .
Q . . . . . . .
. . . . Q . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
Q . . . . . . .
uXcieyU4        648     PT0.007207255S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
Q . . . . . . .
eXcieyU4        700     PT0.009275650S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . Q Q .
. Q . . . . . .
. . . . . . . .
eXcieyUT        735     PT0.010128473S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. Q . . . Q Q .
. Q . . . . . .
. . . . . . . .
eXciPyUT        768     PT0.010938355S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . Q Q .
. Q . . . . . .
. . . . . . . Q
eXci#yUT        1152    PT0.011762050S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . .
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXci#yU#        1225    PT0.013189927S
. . Q . Q . . .
Q . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . . . . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXci#yUx        1536    PT0.014965841S
. . Q . Q . . .
. . . . . . . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. Q . . . . . .
. . . . . . . Q
eXcJ#yUx        1728    PT0.021601343S
. . Q . Q . . .
. . . . . . . Q
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . . . . .
. . . . . . . Q
epcJ#yUx        1920    PT0.023504280S
. . Q . . . . .
. . . . . . . Q
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. . . . . . . Q
0pcJ#yUx        2560    PT0.030906422S
. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. . . . . . . Q
0ncJ#yUx        3072    PT0.166375236S
. . Q . . . . .
. . . . . Q . .
. . . . . . . Q
Q . . . . . . .
. . . Q . . . .
. . . . . . Q .
. . . . Q . . .
. Q . . . . . .
0ncJ5yUx        4096    PT0.192306457S
Total time: PT0.193881176S
test tests::test_8_queens ... ok

But the odds are against it… That’s because the Mutation genetic strategy can’t always solve this problem. For our solver to be able to find a solution every time we’re going to have to introduce a new strategy. That is the subject of Part 4.

The source code to this point is available on Github if you want to experiment.

In Part 1 we saw how a no-knowledge genetic string duplicator could be written in Rust. One problem with that implementation is the solver received parameters that it didn’t care about (start time) and shouldn’t have access to (target) to be truly no-knowledge.  In this post we’ll solve fix those issues and, in addition, turn our string duplication implementation into an integration test.

Things learned:

  • how to wrap a library in a module
  • how to setup integration tests
  • how to pass closures (lambdas) as functions to another function

First, we’ll wrap the code in a module.

pub mod genetic {
    use rand::{thread_rng, sample, Rng};
    use time::PreciseTime;

    pub fn get_best(

Two things to note here. First, we explicitly mark the module and exposed methods as public with pub and second, the use statements belong inside the module.

Next we’ll create a tests folder parallel to the src folder. This is where cargo expects to find integration tests. Create a file called lib_tests.rs and move the string duplication specific code from main.rs to it.

extern crate genetic;
extern crate time;

#[cfg(test)]
mod tests {

    use time::PreciseTime;    
    use genetic::*;

    #[test]
    fn test_duplicate_string() {
        let start = PreciseTime::now();
        let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
        let target = "Not all those who wander are lost.";
        let best = genetic::get_best(get_fitness, display, target, gene_set, target.len(), start);
        println!("{}", best);
        println!("Total time: {}", start.to(PreciseTime::now()));
        assert!(best == target);
    }
    
    fn get_fitness(candidate: &String, target: &str) -> usize {
        let different_count = target.chars()
            .zip(candidate.chars())
            .filter(|&(a, b)| a != b)
            .count();

        target.len() - different_count
    }

    fn display(candidate: &String, target: &str, start: PreciseTime) {
        let now = PreciseTime::now();
        let elapsed = start.to(now);
        println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target),elapsed);
    }
}

Notice that we’re referencing our genetic module as an external crate (1) because our integration tests won’t be compiled into our library binary, that we’ve added a module with a conditional compile attribute (4,5), and added a test attribute to our main method (10), which as been renamed to test_duplicate_string (11). Also note that we now have an assertion to verify that the solver was able to duplicate the target string (18).

Next we need to rename main.rs to lib.rs because we’re changing what we’re building from a executable to a library.

The code change above is available on Github.

The tests can be run from the commandline via cargo test as follows:

cargo test
   Compiling genetic v0.0.2 (file:.../genetic)
     Running target\debug\genetic-d3350e5be5a2acc2.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target\debug\lib_tests-30b315b873936a67.exe

running 1 test
test tests::test_duplicate_string ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests genetic

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

You can see that it ran our test but you can’t see the output. If you want to force cargo to write the output use cargo test -- --nocapture

The next problem we want to tackle in the library code is it has access to things it shouldn’t (the target string), and receives parameters it doesn’t care about (start time). To resolve this we’re going to introduce closures around our functions so that the methods in the test module still have access to the parameters they need, but the genetic library doesn’t need to know about them.

We start off by creating closures for the display and get_fitness functions in test_duplicate_string.

    let wrapped_display = |candidate: &String| {
        display(&candidate, target, start);
    };
    let wrapped_get_fitness = |candidate: &String| -> usize {
        get_fitness(&candidate, target)
    };

This allows us to stop passing the target and start time to get_best.

    let best = genetic::get_best(
        wrapped_get_fitness, 
        wrapped_display, 
        target.len(), 
        gene_set);

On the library side we have to change the function parameters so that it can receive the closure.

    pub fn get_best<F,D>(
        get_fitness: F, 
        display: D, 
        length: usize, 
        gene_set: &str) -> String
        where F : Fn(&String)->usize,
        D : Fn(&String)
    {

The F and D aliases let us pass closures as functions. The syntax is a bit more convoluted than other languages I’ve used but not difficult to follow once you’ve seen a working example.

The updated source code is available on Github.

Go on to Part 3.

In this series I’ll be building out a simple genetic solver in Rust, a new programming language for me. Building a genetic solver is my preferred programming problem for learning new programming languages.  I’ve previously solved this problem in C# and used it to learn to code in golang. If you are new to genetic algorithms start with the C# post.

This was my first program in Rust and from scratch it took about four hours to work through issues.

Things I learned:

  • Rust’s compiler is particular about variable naming, preferring underscores to camel case.
  • There’s a lot of syntax, particularly around loaning items to called functions, but it isn’t that hard to follow if you’ve had experience with C/C++ and templated types.
  • The line that returns a value from a function cannot end with a semicolon.
  • The compiler does a great job of explaining what it is expecting.  Just take it slow and build up the program one function at a time.
  • The Rust install doesn’t necessarily contain everything you might need. On windows, I had to install MingW to get gcc so it can compile the code in the time crate.

OK, let’s go. We’ll start off with a standard set of genes and a target string:

    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";

Next we need a way to generate a random gene sequence (parent) from the gene set.

fn generate_parent(gene_set: &str, length: usize) -> String {
    let mut rng = thread_rng();
    let sample = sample(&mut rng, gene_set.chars(), length);
    sample.into_iter().collect()
}

Note, sample is a function provided by the rand crate.  It returns N items randomly selected from the input.

Next we need to calculate a fitness value based on the number of differences between a candidate string and the target string.

fn get_fitness(candidate: &String, target: &str) -> usize {
    let different_count = target.chars()
      .zip(candidate.chars())
      .filter(|&(a, b)| a != b)
      .count();
 
    target.len() - different_count
}

It is nice that map/reduce type functions are available out of the box, like in C#.

We also need a way to produce a new (child) string by mutating an existing (parent) string.  The point is to create a copy of the parent then replace 1 character (gene) in the copy with a randomly selected one from the set of valid genes.

fn mutate_parent(parent: &String, gene_set: &str) -> String {
    let mut rng = thread_rng();
    let gene_index = rng.gen::<usize>() % gene_set.len();
    let parent_index = rng.gen::<usize>() % parent.len();
    let mut candidate = String::with_capacity(parent.len());
 
    if parent_index > 0 {
        candidate.push_str(&parent[..parent_index]);
    }
    candidate.push_str(&gene_set[gene_index..(1+gene_index)]);
    if parent_index+1 < parent.len() {
        candidate.push_str(&parent[parent_index+1..]);
    }
    candidate
}

There’s probably a simpler way to do that in Rust along the lines of:

  1. copy the parent
  2. select a random location in the parent to mutate
  3. select a random gene to insert
  4. copy[random_location] = new_gene

Now we need a way to display a gene sequence, its fitness value and how much time has elapsed.

fn display(candidate: &String, target: &str, start: time::PreciseTime) {
    let now = PreciseTime::now();
    let elapsed = start.to(now);
    println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target), elapsed);
}

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.

fn get_best(get_fitness: fn(&String,&str) -> usize, 
  display: fn(&String, &str, start: time::PreciseTime), 
  target: &str, 
  gene_set: &str, 
  length: usize, 
  start: time::PreciseTime) -> String {

    let mut best_parent = generate_parent(gene_set, length);
    let mut best_fitness = get_fitness(&best_parent, target);
 
    while best_fitness < length {
        let child = mutate_parent(&best_parent, gene_set);
        let fitness = get_fitness(&child, target);
        if fitness > best_fitness {
            best_fitness = fitness;
            best_parent = child;
            display(&best_parent, target, start);
        }
    }

    best_parent 
}

Yes, I know we could just call the functions instead of passing them in but this demonstrates that capability in the language and it is a feature we will need in order to use different fitness functions and display methods in a more complex solver.

The main function brings all the parts together:

extern crate time;
extern crate rand;

use rand::{thread_rng, sample, Rng};
use time::PreciseTime;

fn main() {
    let start = PreciseTime::now();
    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";
    let best = get_best(get_fitness, display, target, gene_set, target.len(), start);
    println!("{}", best);
    println!("Total time: {}", start.to(PreciseTime::now()));
}

...

Get the source here.

Because the time crate is not pre-compiled you may (on Windows) need to install MingW as appropriate for your OS (32 or 64bit).

Then run the program from the commandline with:

cargo run

The first time it runs it will download and compile the dependencies it needs.  Nice feature.

Sample output:

>cargo run
 Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading rand v0.3.8
 Downloading time v0.1.25
 Downloading libc v0.1.7
 Downloading gcc v0.3.5
 Compiling gcc v0.3.5
 Compiling rand v0.3.8
 Compiling time v0.1.25
 Compiling genetic v0.0.1 (.../projects/rust/genetic)
 Running `target\debug\genetic.exe`
 aH.RefKhPjklmOopq!stuMwxSrAVCDEWG 1 PT0.004136558S
 aH.aefKhPjklmOopq!stuMwxSrAVCDEWG 2 PT0.006234092S
 aH.aefKhPjklmOopq!stdMwxSrAVCDEWG 3 PT0.006508939S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEWG 4 PT0.006637123S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEtG 5 PT0.007313462S
 aH.aefKhPoklmOopq!stdMwxSreVCDEtG 6 PT0.008119910S
 aH.aefKhPokemOopq!stdMwxSreVCDEtG 7 PT0.011582439S
 aH.aefKhPokemOopq!stdewxSreVCDEtG 8 PT0.012396586S
 aH.alfKhPokemOopq!stdewxSreVCDEtG 9 PT0.014519526S
 oH.alfKhPokemOopq!stdewxSreVCDEtG 10 PT0.015515364S
 oH.alfKhPokemwopq!stdewxSreVCDEtG 11 PT0.020154268S
 oH.alfKtPokemwopq!stdewxSreVCDEtG 12 PT0.020608496S
 oH.alfKthokemwopq!stdewxSreVCDEtG 13 PT0.021233638S
 oH.alfKthokemwooq!stdewxSreVCDEtG 14 PT0.021521572S
 oH.alfKthokemwooq!sndewxSreVCDEtG 15 PT0.023080962S
NoH.alfKthokemwooq!sndewxSreVCDEtG 16 PT0.023575224S
NoH.alfKthokemwooq!sndewxSre CDEtG 17 PT0.024708100S
NoH.alfKthokemwoo !sndewxSre CDEtG 18 PT0.025637729S
NoH alfKthokemwoo !sndewxSre CDEtG 19 PT0.028930115S
NoH alfKthokemwoo !sndew Sre CDEtG 20 PT0.029574888S
NoH alfKthokemwoo !sndew are CDEtG 21 PT0.030039510S
NoH alf thokemwoo !sndew are CDEtG 22 PT0.034410881S
Not alf thokemwoo !sndew are CDEtG 23 PT0.039302690S
Not alf thosemwoo !sndew are CDEtG 24 PT0.040969862S
Not alf thosemwoo !sndew are CDstG 25 PT0.041442952S
Not alf those woo !sndew are CDstG 26 PT0.043314527S
Not all those woo !sndew are CDstG 27 PT0.045133751S
Not all those woo !sndew are CDst. 28 PT0.047369478S
Not all those woo !sndew are lDst. 29 PT0.050106782S
Not all those woo wsndew are lDst. 30 PT0.057051472S
Not all those woo wsndew are lost. 31 PT0.062051064S
Not all those woo wsnder are lost. 32 PT0.066566787S
Not all those who wsnder are lost. 33 PT0.069746386S
Not all those who wander are lost. 34 PT0.081900460S
Not all those who wander are lost.
Total time: PT0.082663410S

Wow! That’s a lot faster than the go version!

Go on to Part 2.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: