Feeds:
Posts
Comments

Archive for the ‘Uncategorized’ Category

Graph coloring is an interesting problem. Variations involve using the fewest number of colors while making each node a unique color, trying to use an equal number of each color, etc. Take for example this map of the United States.

multi colored U.S. map

How can we keep the constraint that adjacent states do not have the same color while reducing the number of colors to only four? If you try this by hand you start by picking a state and coloring the states around it alternating colors. You can do this with 3 colors if there an even number of adjacent states.

Tennessee and adjacent states 3-colored

But if there are an odd number then you need 4 colors.

Kentucky and adjacent states 4-colored

Then you continue the pattern to other states and introduce a new color only when necessary.

We’re going to do this for the map of 50 U.S. states using the genetic solver from the previous post.

We don’t care about the visual representation per se. We only care about the physical relationships between the states, which we can represent with a graph, or more simply as a set of rules indicating which states cannot have the same color.

We start off with a file containing a list of states and the set of states that are adjacent to each. The row for Tennessee might be as follows:

TN,AL;AR;GA;KY;MO;MS;NC;VA

Next we need to read that file.

def loadData(localFileName):
    # expects: AA,BB;CC;DD where BB, CC and DD are the initial column values in other rows
    with open(localFileName, mode='r') as infile:
        reader = csv.reader(infile)
        mydict = {row[0]: row[1].split(';') for row in reader if row}
        return mydict

Then we need to build the rules. A Rule connects two states indicating that they are adjacent. We want to be able to put rules in a dictionary and find them in a list so we need to define __hash__ and __eq__. We might also want to be able to display a rule so we’ll add a __str__ implementation.

class Rule:
    Item = None
    Other = None
    Stringified = None

    def __init__(self, item, other, stringified):
        self.Item = item
        self.Other = other
        self.Stringified = stringified

    def __eq__(self, another):
        return hasattr(another, 'Item') and \
               hasattr(another, 'Other') and \
               self.Item == another.Item and \
               self.Other == another.Other

    def __hash__(self):
        return hash(self.Item) * 397 ^ hash(self.Other)

    def __str__(self):
        return self.Stringified

Next we’re going to build the set of rules. While we’re doing so we’re going to perform a sanity check on the data. Whenever a state says it is adjacent to another state, the adjacent state should also say it is adjacent to the first state.

def buildLookup(items):
    itemToIndex = {}
    index = 0
    for key in sorted(items):
        itemToIndex[key] = index
        index += 1
    return itemToIndex

def buildRules(items):
    itemToIndex = buildLookup(items.keys())
    rulesAdded = {}
    rules = []
    keys = sorted(list(items.keys()))

    for key in sorted(items.keys()):
        keyIndex = itemToIndex[key]
        adjacentKeys = items[key]
        for adjacentKey in adjacentKeys:
            if adjacentKey == '':
                continue
            adjacentIndex = itemToIndex[adjacentKey]
            temp = keyIndex
            if adjacentIndex < temp:
                temp, adjacentIndex = adjacentIndex, temp
            ruleKey = keys[temp] + "->" + keys[adjacentIndex]
            rule = Rule(temp, adjacentIndex, ruleKey)
            if rule in rulesAdded:
                rulesAdded[rule] += 1
            else:
                rulesAdded[rule] = 1
                rules.append(rule)

    for k, v in rulesAdded.items():
        if v == 1:
            print("rule %s is not bidirectional" % k)

    return rules

Now we have the ability to convert a file of node relationships to a set of adjacency rules. Next we need to build the code used by the genetic solver. We’ll start by determining what our genes will be. In this case since we want to four-color the 50 states our geneset will be four color codes.

        colors = ["Orange", "Yellow", "Green", "Blue"]
        colorLookup = {}
        for color in colors:
            colorLookup[color[0]] = color
        geneset = list(colorLookup.keys())

Our Individuals will have 50 genes, one for each state, in alphabetical order. This lets us use the index into the genes as an index into the set of sorted state codes.

Since the expected optimal situation will be that all adjacent states have different colors we can set the optimal value to the number of rules.

At the end we’ll write out the color of each state.

class GraphColoringTests(unittest.TestCase):
    def test(self):
        states = loadData("adjacent_states.csv")
        rules = buildRules(states)
        colors = ["Orange", "Yellow", "Green", "Blue"]
        colorLookup = {}
        for color in colors:
            colorLookup[color[0]] = color
        geneset = list(colorLookup.keys())
        optimalValue = len(rules)
        startTime = datetime.datetime.now()
        fnDisplay = lambda candidate: display(candidate, startTime)
        fnGetFitness = lambda candidate: getFitness(candidate, rules)
        best = genetic.getBest(fnGetFitness, fnDisplay, len(states), optimalValue, geneset)
        self.assertEqual(best.Fitness, optimalValue)

        keys = sorted(states.keys())

        for index in range(len(states)):
            print(keys[index] + " is " + colorLookup[best.Genes[index]])

As for display, it should be sufficient to output the color codes.

def display(candidate, startTime):
    timeDiff = datetime.datetime.now() - startTime
    print("%s\t%i\t%s" % (''.join(map(str, candidate.Genes)), candidate.Fitness, str(timeDiff)))

This gets output like the following. The number to the right of the gene sequence will indicate how many rules this gene sequence satisfies.

YGGBOOGOOBBYGGYYYYGBGYOOGBOYGGOOOYBOYBBGGOBYOGOGOGG	74	0:00:00.001000

Finally we need a fitness function that checks all the rules assuming the states are colored according to the gene sequence.

def getFitness(candidate, rules):
    rulesThatPass = 0
    for rule in rules:
        if candidate[rule.Item] != candidate[rule.Other]:
            rulesThatPass += 1

    return rulesThatPass

That’s it. Now when we run our main test function we get the following output:

OOYYOBBGYOOYGBBYOOOBOGYGYGGBBYGOGGOYOYGYBBOBOBGOBBG	82	0:00:00
YYBOYGGGGOBYOYBGBOOBOOBBGGBGGYGBBGOBOBYYOGYYBBYOYGO	102	0:00:00.016001
BOOGOGGOGBGYGGBGOOYBYOBYGBBOGBGBBBOYYYGYYOYOOGYBOBY	103	0:00:00.316018
GOBOGGOYGBGOBGOGYBYBOOYYGGBBGOYBYYYOOBGYYOYGBGGOGYY	104	0:00:01.602092
BBBBGGBYGOGYBGOGBBYGOGYYYGYBBOOBYYYOOOGOYGOGOGBBGYB	105	0:00:04.933282
AK is Blue
AL is Blue
AR is Blue
AZ is Blue
CA is Green
CO is Green
CT is Blue
DC is Yellow
DE is Green
FL is Orange
GA is Green
HI is Yellow
IA is Blue
ID is Green
IL is Orange
IN is Green
KS is Blue
KY is Blue
LA is Yellow
MA is Green
MD is Orange
ME is Green
MI is Yellow
MN is Yellow
MO is Yellow
MS is Green
MT is Yellow
NC is Blue
ND is Blue
NE is Orange
NH is Orange
NJ is Blue
NM is Yellow
NV is Yellow
NY is Yellow
OH is Orange
OK is Orange
OR is Orange
PA is Green
RI is Orange
SC is Yellow
SD is Green
TN is Orange
TX is Green
UT is Orange
VA is Green
VT is Blue
WA is Blue
WI is Green
WV is Yellow
WY is Blue

Which looks like this:

4-color us states

Read more of this series in my eBook Genetic Algorithms with Python

Read Full Post »

In Part 3 we added an integration test to verify that the solver can find a solution to the 8 Queens puzzle. However, Mutation is the only strategy currently implemented in the solver and it fails to find a solution to that problem 80% of the time. We need to introduce a new strategy from nature – crossover from another successful genetic line.

But first, to simplify some code sections I’m going to introduce a container so we can pass both the genes and their fitness as a single object.

    pub struct Individual {
        pub genes: String,
        pub fitness: usize
    }

This means we’ll call the display function with an Individual instead of a String, but also that we don’t have to call get_fitness in our tests in order to display the interim results. There are no surprises in this change so I won’t repeat them here. You can browse these code changes on Github.

We need to make two improvements to the solver in order for it to be able to solve this problem every time. The first is to add crossover as a genetic strategy. This is similar to mutation in that we’re going to replace 1 gene at a specific location. But this time instead of substituting a random gene we’re going to get the same gene location from a second successful parent.

    fn crossover<F>(parent1: &Individual, parent2: &Individual, get_fitness: F) -> Individual
    where F : Fn(&String)->usize
    {
        let mut rng = thread_rng();
        let parent_index = rng.gen::<usize>() % parent1.genes.len();
        let mut candidate = String::with_capacity(parent1.genes.len());

        if parent_index > 0 {
            candidate.push_str(&parent1.genes[..parent_index]);
        }
        candidate.push_str(&parent2.genes[parent_index..(1+parent_index)]); // replace 1 gene
        if parent_index+1 < parent1.genes.len() {
            candidate.push_str(&parent1.genes[parent_index+1..]);
        }

        let fitness = get_fitness(&candidate);
        Individual { genes: candidate, fitness: fitness }
    } 

The second improvement is to have an additional genetic line. We start with a random gene sequence and use our strategies to improve it until we fail to improve it 100 times in a row. Then we compare it to the current best parent and keep the better of the two. This process repeats until we hit the optimal fitness value.

    pub fn get_best<F,D>(
        get_fitness: F, 
        display: D, 
        length: usize, 
        optimal_fitness_value: usize,
        gene_set: &str) -> Individual
        where F : Fn(&String)->usize,
        D : Fn(&Individual)
    {
        let mut best_parent = generate_parent(&get_fitness, gene_set, length);
        display(&best_parent);

        while best_parent.fitness < optimal_fitness_value {
            let mut candidate = generate_parent(&get_fitness, gene_set, length);
            let mut attempts_since_last_improvement = 0;
            while attempts_since_last_improvement < 100 {
                let child = match attempts_since_last_improvement % 3 {
                    0 => generate_parent(&get_fitness, gene_set, length),
                    1 => mutate_parent(&candidate, &get_fitness, gene_set),
                    _ => crossover(&candidate, &best_parent, &get_fitness)
                };

                if child.fitness > candidate.fitness { 
                    candidate = child;
                    attempts_since_last_improvement = 0;
                }
                attempts_since_last_improvement = attempts_since_last_improvement + 1;
            }
            if candidate.fitness > best_parent.fitness { 
                best_parent = candidate;
                display(&best_parent);
            }
        }

        best_parent 
    }

Now when we run the integration tests they all pass 100% of the time, though some rounds take longer than others. Explore this code change on Github.

Read Full Post »

In Part 1 we saw how a no-knowledge genetic string duplicator could be written in Rust. One problem with that implementation is the solver received parameters that it didn’t care about (start time) and shouldn’t have access to (target) to be truly no-knowledge.  In this post we’ll solve fix those issues and, in addition, turn our string duplication implementation into an integration test.

Things learned:

  • how to wrap a library in a module
  • how to setup integration tests
  • how to pass closures (lambdas) as functions to another function

First, we’ll wrap the code in a module.

pub mod genetic {
    use rand::{thread_rng, sample, Rng};
    use time::PreciseTime;

    pub fn get_best(

Two things to note here. First, we explicitly mark the module and exposed methods as public with pub and second, the use statements belong inside the module.

Next we’ll create a tests folder parallel to the src folder. This is where cargo expects to find integration tests. Create a file called lib_tests.rs and move the string duplication specific code from main.rs to it.

extern crate genetic;
extern crate time;

#[cfg(test)]
mod tests {

    use time::PreciseTime;    
    use genetic::*;

    #[test]
    fn test_duplicate_string() {
        let start = PreciseTime::now();
        let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
        let target = "Not all those who wander are lost.";
        let best = genetic::get_best(get_fitness, display, target, gene_set, target.len(), start);
        println!("{}", best);
        println!("Total time: {}", start.to(PreciseTime::now()));
        assert!(best == target);
    }
    
    fn get_fitness(candidate: &String, target: &str) -> usize {
        let different_count = target.chars()
            .zip(candidate.chars())
            .filter(|&(a, b)| a != b)
            .count();

        target.len() - different_count
    }

    fn display(candidate: &String, target: &str, start: PreciseTime) {
        let now = PreciseTime::now();
        let elapsed = start.to(now);
        println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target),elapsed);
    }
}

Notice that we’re referencing our genetic module as an external crate (1) because our integration tests won’t be compiled into our library binary, that we’ve added a module with a conditional compile attribute (4,5), and added a test attribute to our main method (10), which as been renamed to test_duplicate_string (11). Also note that we now have an assertion to verify that the solver was able to duplicate the target string (18).

Next we need to rename main.rs to lib.rs because we’re changing what we’re building from a executable to a library.

The code change above is available on Github.

The tests can be run from the commandline via cargo test as follows:

cargo test
   Compiling genetic v0.0.2 (file:.../genetic)
     Running target\debug\genetic-d3350e5be5a2acc2.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target\debug\lib_tests-30b315b873936a67.exe

running 1 test
test tests::test_duplicate_string ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests genetic

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

You can see that it ran our test but you can’t see the output. If you want to force cargo to write the output use cargo test -- --nocapture

The next problem we want to tackle in the library code is it has access to things it shouldn’t (the target string), and receives parameters it doesn’t care about (start time). To resolve this we’re going to introduce closures around our functions so that the methods in the test module still have access to the parameters they need, but the genetic library doesn’t need to know about them.

We start off by creating closures for the display and get_fitness functions in test_duplicate_string.

    let wrapped_display = |candidate: &String| {
        display(&candidate, target, start);
    };
    let wrapped_get_fitness = |candidate: &String| -> usize {
        get_fitness(&candidate, target)
    };

This allows us to stop passing the target and start time to get_best.

    let best = genetic::get_best(
        wrapped_get_fitness, 
        wrapped_display, 
        target.len(), 
        gene_set);

On the library side we have to change the function parameters so that it can receive the closure.

    pub fn get_best<F,D>(
        get_fitness: F, 
        display: D, 
        length: usize, 
        gene_set: &str) -> String
        where F : Fn(&String)->usize,
        D : Fn(&String)
    {

The F and D aliases let us pass closures as functions. The syntax is a bit more convoluted than other languages I’ve used but not difficult to follow once you’ve seen a working example.

The updated source code is available on Github.

Go on to Part 3.

Read Full Post »

%d bloggers like this: