Feeds:
Posts
Comments

In Part 1 we saw how a no-knowledge genetic string duplicator could be written in Rust. One problem with that implementation is the solver received parameters that it didn’t care about (start time) and shouldn’t have access to (target) to be truly no-knowledge.  In this post we’ll solve fix those issues and, in addition, turn our string duplication implementation into an integration test.

Things learned:

  • how to wrap a library in a module
  • how to setup integration tests
  • how to pass closures (lambdas) as functions to another function

First, we’ll wrap the code in a module.

pub mod genetic {
    use rand::{thread_rng, sample, Rng};
    use time::PreciseTime;

    pub fn get_best(

Two things to note here. First, we explicitly mark the module and exposed methods as public with pub and second, the use statements belong inside the module.

Next we’ll create a tests folder parallel to the src folder. This is where cargo expects to find integration tests. Create a file called lib_tests.rs and move the string duplication specific code from main.rs to it.

extern crate genetic;
extern crate time;

#[cfg(test)]
mod tests {

    use time::PreciseTime;    
    use genetic::*;

    #[test]
    fn test_duplicate_string() {
        let start = PreciseTime::now();
        let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
        let target = "Not all those who wander are lost.";
        let best = genetic::get_best(get_fitness, display, target, gene_set, target.len(), start);
        println!("{}", best);
        println!("Total time: {}", start.to(PreciseTime::now()));
        assert!(best == target);
    }
    
    fn get_fitness(candidate: &String, target: &str) -> usize {
        let different_count = target.chars()
            .zip(candidate.chars())
            .filter(|&(a, b)| a != b)
            .count();

        target.len() - different_count
    }

    fn display(candidate: &String, target: &str, start: PreciseTime) {
        let now = PreciseTime::now();
        let elapsed = start.to(now);
        println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target),elapsed);
    }
}

Notice that we’re referencing our genetic module as an external crate (1) because our integration tests won’t be compiled into our library binary, that we’ve added a module with a conditional compile attribute (4,5), and added a test attribute to our main method (10), which as been renamed to test_duplicate_string (11). Also note that we now have an assertion to verify that the solver was able to duplicate the target string (18).

Next we need to rename main.rs to lib.rs because we’re changing what we’re building from a executable to a library.

The code change above is available on Github.

The tests can be run from the commandline via cargo test as follows:

cargo test
   Compiling genetic v0.0.2 (file:.../genetic)
     Running target\debug\genetic-d3350e5be5a2acc2.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target\debug\lib_tests-30b315b873936a67.exe

running 1 test
test tests::test_duplicate_string ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests genetic

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

You can see that it ran our test but you can’t see the output. If you want to force cargo to write the output use cargo test -- --nocapture

The next problem we want to tackle in the library code is it has access to things it shouldn’t (the target string), and receives parameters it doesn’t care about (start time). To resolve this we’re going to introduce closures around our functions so that the methods in the test module still have access to the parameters they need, but the genetic library doesn’t need to know about them.

We start off by creating closures for the display and get_fitness functions in test_duplicate_string.

    let wrapped_display = |candidate: &String| {
        display(&candidate, target, start);
    };
    let wrapped_get_fitness = |candidate: &String| -> usize {
        get_fitness(&candidate, target)
    };

This allows us to stop passing the target and start time to get_best.

    let best = genetic::get_best(
        wrapped_get_fitness, 
        wrapped_display, 
        target.len(), 
        gene_set);

On the library side we have to change the function parameters so that it can receive the closure.

    pub fn get_best<F,D>(
        get_fitness: F, 
        display: D, 
        length: usize, 
        gene_set: &str) -> String
        where F : Fn(&String)->usize,
        D : Fn(&String)
    {

The F and D aliases let us pass closures as functions. The syntax is a bit more convoluted than other languages I’ve used but not difficult to follow once you’ve seen a working example.

The updated source code is available on Github.

In this series I’ll be building out a simple genetic solver in Rust, a new programming language for me. Building a genetic solver is my preferred programming problem for learning new programming languages.  I’ve previously solved this problem in C# and used it to learn to code in golang. If you are new to genetic algorithms start with the C# post.

This was my first program in Rust and from scratch it took about four hours to work through issues.

Things I learned:

  • Rust’s compiler is particular about variable naming, preferring underscores to camel case.
  • There’s a lot of syntax, particularly around loaning items to called functions, but it isn’t that hard to follow if you’ve had experience with C/C++ and templated types.
  • The line that returns a value from a function cannot end with a semicolon.
  • The compiler does a great job of explaining what it is expecting.  Just take it slow and build up the program one function at a time.
  • The Rust install doesn’t necessarily contain everything you might need. On windows, I had to install MingW to get gcc so it can compile the code in the time crate.

OK, let’s go. We’ll start off with a standard set of genes and a target string:

    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";

Next we need a way to generate a random gene sequence (parent) from the gene set.

fn generate_parent(gene_set: &str, length: usize) -> String {
    let mut rng = thread_rng();
    let sample = sample(&mut rng, gene_set.chars(), length);
    sample.into_iter().collect()
}

Note, sample is a function provided by the rand crate.  It returns N items randomly selected from the input.

Next we need to calculate a fitness value based on the number of differences between a candidate string and the target string.

fn get_fitness(candidate: &String, target: &str) -> usize {
    let different_count = target.chars()
      .zip(candidate.chars())
      .filter(|&(a, b)| a != b)
      .count();
 
    target.len() - different_count
}

It is nice that map/reduce type functions are available out of the box, like in C#.

We also need a way to produce a new (child) string by mutating an existing (parent) string.  The point is to create a copy of the parent then replace 1 character (gene) in the copy with a randomly selected one from the set of valid genes.

fn mutate_parent(parent: &String, gene_set: &str) -> String {
    let mut rng = thread_rng();
    let gene_index = rng.gen::<usize>() % gene_set.len();
    let parent_index = rng.gen::<usize>() % parent.len();
    let mut candidate = String::with_capacity(parent.len());
 
    if parent_index > 0 {
        candidate.push_str(&parent[..parent_index]);
    }
    candidate.push_str(&gene_set[gene_index..(1+gene_index)]);
    if parent_index+1 < parent.len() {
        candidate.push_str(&parent[parent_index+1..]);
    }
    candidate
}

There’s probably a simpler way to do that in Rust along the lines of:

  1. copy the parent
  2. select a random location in the parent to mutate
  3. select a random gene to insert
  4. copy[random_location] = new_gene

Now we need a way to display a gene sequence, its fitness value and how much time has elapsed.

fn display(candidate: &String, target: &str, start: time::PreciseTime) {
    let now = PreciseTime::now();
    let elapsed = start.to(now);
    println!("{}\t{}\t{}", candidate, get_fitness(&candidate, target), elapsed);
}

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.

fn get_best(get_fitness: fn(&String,&str) -> usize, 
  display: fn(&String, &str, start: time::PreciseTime), 
  target: &str, 
  gene_set: &str, 
  length: usize, 
  start: time::PreciseTime) -> String {

    let mut best_parent = generate_parent(gene_set, length);
    let mut best_fitness = get_fitness(&best_parent, target);
 
    while best_fitness < length {
        let child = mutate_parent(&best_parent, gene_set);
        let fitness = get_fitness(&child, target);
        if fitness > best_fitness {
            best_fitness = fitness;
            best_parent = child;
            display(&best_parent, target, start);
        }
    }

    best_parent 
}

Yes, I know we could just call the functions instead of passing them in but this demonstrates that capability in the language and it is a feature we will need in order to use different fitness functions and display methods in a more complex solver.

The main function brings all the parts together:

extern crate time;
extern crate rand;

use rand::{thread_rng, sample, Rng};
use time::PreciseTime;

fn main() {
    let start = PreciseTime::now();
    let gene_set = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.";
    let target = "Not all those who wander are lost.";
    let best = get_best(get_fitness, display, target, gene_set, target.len(), start);
    println!("{}", best);
    println!("Total time: {}", start.to(PreciseTime::now()));
}

...

Get the source here.

Because the time crate is not pre-compiled you may (on Windows) need to install MingW as appropriate for your OS (32 or 64bit).

Then run the program from the commandline with:

cargo run

The first time it runs it will download and compile the dependencies it needs.  Nice feature.

Sample output:

>cargo run
 Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading rand v0.3.8
 Downloading time v0.1.25
 Downloading libc v0.1.7
 Downloading gcc v0.3.5
 Compiling gcc v0.3.5
 Compiling rand v0.3.8
 Compiling time v0.1.25
 Compiling genetic v0.0.1 (.../projects/rust/genetic)
 Running `target\debug\genetic.exe`
 aH.RefKhPjklmOopq!stuMwxSrAVCDEWG 1 PT0.004136558S
 aH.aefKhPjklmOopq!stuMwxSrAVCDEWG 2 PT0.006234092S
 aH.aefKhPjklmOopq!stdMwxSrAVCDEWG 3 PT0.006508939S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEWG 4 PT0.006637123S
 aH.aefKhPoklmOopq!stdMwxSrAVCDEtG 5 PT0.007313462S
 aH.aefKhPoklmOopq!stdMwxSreVCDEtG 6 PT0.008119910S
 aH.aefKhPokemOopq!stdMwxSreVCDEtG 7 PT0.011582439S
 aH.aefKhPokemOopq!stdewxSreVCDEtG 8 PT0.012396586S
 aH.alfKhPokemOopq!stdewxSreVCDEtG 9 PT0.014519526S
 oH.alfKhPokemOopq!stdewxSreVCDEtG 10 PT0.015515364S
 oH.alfKhPokemwopq!stdewxSreVCDEtG 11 PT0.020154268S
 oH.alfKtPokemwopq!stdewxSreVCDEtG 12 PT0.020608496S
 oH.alfKthokemwopq!stdewxSreVCDEtG 13 PT0.021233638S
 oH.alfKthokemwooq!stdewxSreVCDEtG 14 PT0.021521572S
 oH.alfKthokemwooq!sndewxSreVCDEtG 15 PT0.023080962S
NoH.alfKthokemwooq!sndewxSreVCDEtG 16 PT0.023575224S
NoH.alfKthokemwooq!sndewxSre CDEtG 17 PT0.024708100S
NoH.alfKthokemwoo !sndewxSre CDEtG 18 PT0.025637729S
NoH alfKthokemwoo !sndewxSre CDEtG 19 PT0.028930115S
NoH alfKthokemwoo !sndew Sre CDEtG 20 PT0.029574888S
NoH alfKthokemwoo !sndew are CDEtG 21 PT0.030039510S
NoH alf thokemwoo !sndew are CDEtG 22 PT0.034410881S
Not alf thokemwoo !sndew are CDEtG 23 PT0.039302690S
Not alf thosemwoo !sndew are CDEtG 24 PT0.040969862S
Not alf thosemwoo !sndew are CDstG 25 PT0.041442952S
Not alf those woo !sndew are CDstG 26 PT0.043314527S
Not all those woo !sndew are CDstG 27 PT0.045133751S
Not all those woo !sndew are CDst. 28 PT0.047369478S
Not all those woo !sndew are lDst. 29 PT0.050106782S
Not all those woo wsndew are lDst. 30 PT0.057051472S
Not all those woo wsndew are lost. 31 PT0.062051064S
Not all those woo wsnder are lost. 32 PT0.066566787S
Not all those who wsnder are lost. 33 PT0.069746386S
Not all those who wander are lost. 34 PT0.081900460S
Not all those who wander are lost.
Total time: PT0.082663410S

Wow! That’s a lot faster than the go version!

Go on to Part 2.

Flaky user interface tests don’t generally flake a second time unless there is a real problem.  Unfortunately the standard unit test runners don’t offer an option to double-check test failures.  This can result in a high build failure rate.  After being hit with a large number of suddenly flaky UI tests when a new version of Chrome came out we wrote our own test runner to double-check test failures.

Scooter is a simple NUnit console replacement.

It is compatible with NUnit attribute names out of the box but the attribute names can be replaced via command line configuration switches if your tests are marked with different attributes.

Unique features:

– Scooter double checks all test failures/errors by running the test a second time and only counting a failure if the second attempt fails.

– You can attribute an alternative method (via the -afta command line switch, e.g. -afta=AfterEachFailedTestAttribute) to be called after each test failure. This can be used for capturing screenshots, test failure notification, etc.

	[AfterEachFailedTest] 
	public void After_each_failed_test(string testFixtureName, string testMethodName) {}

– You can also create TestSuiteSetUpAttribute and TestSuiteTearDownAttribute (or equivalent) attributes and make a method for each to perform an action at the beginning and end of the test run respectively.

	[AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = true)]
	public class TestSuiteTearDownAttribute : Attribute
	{
	}
	
	[Test]
	[TestSuiteTearDownAttribute]
	public void Close_all_the_browsers()
	{

– Scooter can optionally run the tests in parallel (-parallel) and/or shuffled (-shuffle).

Usage:

Replace the call to NUnit console in your build script with one that calls Scooter.Console.exe instead.
All TestFixtures in your assemblies will be run unless marked with an Ignore or Explicit attributes.

License: MIT License.

We were introduced to Settlers of Catan by some friends around 2004 and it has become one of our favorite family games. Over the years we have evolved our own variant that is much more cooperative and positive – features that we look for in all games. Here’s a list of the changes we’ve made:

  • Monopoly cards are treated like a Year of Plenty – the original was a source of contention and could make for a really unhappy experience in a resource-starved game.
  • We eliminated the concept of robbing and stealing resources for the same reason.
  • Once a settlement or city is built on a port location, any player can use its 2-to-1 or 3-to-1 resource conversion feature. This can result in some strategic trading in order to encourage/reward a particular player to build a port for which resources are plentiful.
  • Soldiers don’t steal.  They’re not brigands; they’re professionals. They allow the player to put a soldier piece (we use horsemen from a Risk set) on the board adjacent to one of his/her structures. A soldier piece on the board acts like a mobile settlement. It may move to (but is not required to) an adjacent land, whether owned by the player or not, but only on the owner’s turn. When the number of the land tile the soldier is on is rolled, the owner of the solder also receives a resource card for that land. Any number of soldiers may occupy a land tile at the same time. The concept is that soldiers make the inhabitants feel safe and assist in resource gathering, so the land is more productive.
  • A soldier can decide to settle down and establish a settlement if there is in a legal building spot where the soldier is location and the player pays the cost of a settlement and a road. The soldier (horseman) is removed from the board and the player gives up the card. The player places a settlement and road on a valid intersection of the tile formerly occupied by the soldier.  This is based on the historical idea that soldiers would travel out into wilderness areas, build forts and trade with the locals. Eventually individual soldiers might get out of the military and decide to build a farm, store, etc nearby.  This allows a player to project power to a new area of the board to overcome a resource problem or road-block that is preventing them from earning points.
  • At the start of the game, resource numbers are randomized and placed face-down on the board. This means players must make blind choices for their initial settlement locations. Once all players have chosen settlement locations, only numbers on lands adjacent to settlements are turned face-up. As players build additional settlements  the numbers next to the new settlements are turned face-up. As soldiers move onto lands with face-down numbers they also cause the numbers for those lands to be turned face-up. Reasoning: you don’t know how good a location is going to be for mining, grazing, etc until you go there.
  • Rolling 7 does not cause players to discard resources.
  • When a 7 is rolled, the player who rolled it may swap the numbers on two of his/her land tiles, or between one of his/her tiles and a land with a face-down number, turning that number face-up in the process. The player may not move a number from a land that is shared with another player unless that player agrees.  This gives the player the ability to concentrate his/her  resource gathering activities (e.g. use fertilizer on crops, establish a breeding program, innovate better mining practices, etc) based on current needs and how the dice appear to be rolling.

picture of one of our game boards

 

In this project we’ll be solving a variant of John Koza‘s Lawnmower Problem. The previous projects in this series successively drove out the functionality of a genetic solver capable of handling this kind of problem. This project raises our understanding of genetic algorithms and their application to problem solving to a whole new level.

The Lawnmower Problem asks us to provide instructions to a lawnmower to make it mow a field of grass. The field wraps in all directions – if the mower goes off the top it ends up at the bottom and vice versa and the same side-to-side. Let’s say the mower begins in the middle of an 8×8 field facing south.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  M  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

Next let’s say the available instructions are: mow and turn. mow moves the mower forward one grid square in whatever direction it is facing and cuts the grass in that square. turn causes the mower to turn left 90 degrees.

Simple enough right? Using our previous experience in this series we know we can define a gene for each instruction and use hill climbing to find an optimal solution. I leave this as an exercise to the reader – check out the previous post if you need a refresher.

You should end up with a solution that looks something like the this:

mow mow mow mow mow mow mow mow turn mow mow mow mow mow mow mow turn mow mow mow mow mow mow mow turn mow mow mow mow mow mow turn mow mow mow mow mow mow turn mow mow mow mow mow turn mow mow mow mow mow turn mow mow mow mow turn mow mow mow mow turn mow mow mow turn mow mow mow turn mow mow turn mow mow turn mow turn mow

We can visualize it by numbering each grid square in the order it is mowed:

64  57  42  19  4   31  50  61

63  56  41  18  5   32  51  62

54  55  40  17  6   33  52  53

37  38  39  16  7   34  35  36

12  13  14  15  8*  9   10  11

25  24  23  22  1   28  27  26

46  45  44  21  2   29  48  47

59  58  43  20  3   30  49  60

* - starting location

Or by showing the route traveled graphically:

Mow-turn instructions visualized graphically

Cool, a spiral.

Now let’s expand the available instruction set because all we ever get is spiral shaped mowing patterns. The new instruction is: jump. jump has 2 non-negative arguments, forward and right. The mower will jump forward and to the right the specified number of squares and cut the grass where it lands.

For example, if the mower is facing south at the start location (4,4)

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  M  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

and is told to jump (2,3) it will end up at (1,6), still facing south.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  #  .  .  .
 .  .  .  .  .  .  .  .
 .  M  .  .  .  .  .  .
 .  .  .  .  .  .  .  .

To implement this simply add jump to the set of genes that require special handling and treat the two genes that follow it as the arguments. If less than 2 genes follow it because it is at or near the end of the gene sequence then fill the missing arguments with zeros. Again, I leave the implementation as an exercise. Note: also make your implementation of getFitness() prefer shorter sequences if and only if the gene sequence mows the entire field.

You may end up with a result similar to the following:

jump (2,5) mow mow mow mow mow mow mow jump (4,3) mow mow mow mow mow mow mow jump (0,3) mow mow mow mow mow mow mow jump (0,3) mow mow mow mow mow mow mow jump (3,3) mow mow mow mow mow mow mow jump (5,3) mow mow mow mow mow jump (5,3) mow mow mow mow mow mow mow jump (1,3) mow mow mow mow mow mow mow jump (5,2) mow

Note that the genetic algorithm has completely abandoned the turn instruction in favor of the more powerful jump instruction because this results in a shorter program.

Here’s a visualization of the mowing route:

44  17  56  40  16  48  26  3

45  18  57  33  9   49  27  4

46  19  58  34  10  50  28  5

63  20  59  35  11  51  29  6

64  21  60  36  12* 52  30  7

41  22  61  37  13  53  31  8

42  23  62  38  14  54  32  1

43  24  55  39  15  47  25  2

* - starting location

Now back the Koza’s purpose for this problem. As interesting as this solution is, the sequence of instructions generated by the solver are completely different from the solution a human would use. Now think about how you’d tell another person to mow a toroidal field. You wouldn’t give them detailed instructions for every step right? You’d break it down into a set of repeatable sequences. In a non-toroid field you might say something like: start at the corner of the field and mow a strip along the edge of the field all the way to the other side.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 #  #  #  #  #  #  #  M>

Then turn the mower back this way and mow the strip right next to the one you just completed until you get back to where you started.

 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
 .  .  .  .  .  .  .  .
<M  #  #  #  #  #  #  #
 #  #  #  #  #  #  #  #

Turn around again and repeat the process until you’ve mowed the whole field.

You automatically combine squares into strips and trips across-and-back into a repeatable pattern. How do we do that with the mower?

The best result we’ve generated so far requires 64 jump and mow instructions, one for each grid square, to tell the mower how to cut the grass in the field. How can we make it look more like the instructions you’d give a human? We have to introduce the ability to repeat a sequence of instructions by reference and make this an instruction too.

This is where things get interesting. We’re going from using the genes of the genetic sequence more-or-less one-to-one to solve a problem, to genetic programming.

Implementation-wise this means we need two more special genes: begin-numbered-reference and call-numbered-reference. begin-numbered-reference will increment an id and start a new instruction sequence, or block, if and only if the current block is non-empty. call-numbered-sequence, or more simply call, will take a parameter for the id of the sequence to execute. Once that sequence has completed, execution will return to the sequence that made the call – exactly like calling a subroutine.

Here’s a Go implementation of a gene decoder that builds a program for the mower as described above.

func parseProgram(genes string, f *field, m *mower) *program {
	p := NewProgram()

	instructionCodes := make(chan int)
	builders := make(chan builder)
	go streamInstructionCodes(genes, instructionCodes)
	go func() {
		offset := 0
		for instructionCode := range instructionCodes {
			switch instructionCode % numberOfInstructions {
			case 0:
				builders <- createMowBuilder()
			case 1:
				builders <- createTurnBuilder()
			case 2:
				builders <- createJumpBuilder(instructionCodes)
			case 3:
				builders <- createBlockBuilder()
			case 4:
				builders <- createCallBuilder(instructionCodes)
			default:
				panic(fmt.Sprint("No builder defined for instructionCode '", instructionCode, "' from gene '", genes[offset:offset+1], "'"))
			}
			offset++
		}
		builders <- builder{stop: true}
	}()

	currentBlockName := "main"
	blockId := -1
	instructions := make([]Instruction, 0, len(genes))

	for builder := range builders {
		if builder.stop {
			break
		}
		if builder.startNewBlock {
			if len(instructions) > 0 {
				p.addBlock(currentBlockName, instructions)
				blockId++
				currentBlockName = createBlockName(blockId)
				instructions = make([]Instruction, 0, len(genes))
			}
			continue
		}
		instructions = append(instructions, builder.create(f, m))
	}

	if len(instructions) > 0 {
		p.addBlock(currentBlockName, instructions)
	}

	return p
}

Note: Koza’s implementation prevents recursion. This implementation allows recursion but it isn’t difficult to modify it to work Koza’s way. I just find the recursive results more interesting.

Now when we move to from one-to-one gene evaluation to running a program, determining fitness becomes a problem in its own right. On the face it is the same: perform the instructions, determine how many field positions were mowed, switch evaluation strategies to program length if all field squares have been mowed. The problem is we need to handle the flow control involved in fetching subroutines, running them, and returning to the previous location upon completion. We also need the ability to track the number of instructions executed and optionally exit if we run beyond a pre-determined maximum – this prevents infinite loops from blocking evolution in the solver.

I implemented this functionality as a domain independent interpreter (GitHub) while coding my implementation of the Lawnmower Problem.

With the gene-to-program converter and interpreter in place we are free to continue our exploration of the Lawnmower Problem. Here’s a sample optimal solution found by my implementation:

Final: 10119     16.6669533s
main: block1
block0: mow mow mow jump (0,1)
block1: block0 block0 mow block1


52  15  44  5   35  60  26  25

17  16  45  6   36  61  27  53

18  46  8   7   37  62  28  54

19  47  9   39  38  63  29  55

20  48  10  40  64* 31  30  56

21  49  11  41  1   32  57  22

50  13  12  42  2   33  58  23

51  14  43  4   3   34  59  24

* - starting location

Note that the program is recursive in that block1 calls itself.

If we change the logic to prevent recursion we get a slightly longer optimal solution like this:

Final: 10115     15.4188819s
main: block0 block0 block0 block0
block0: jump (6,6) block1 block1 block1 block1
block1: jump (0,1) mow mow mow


34  37  24  10  9   60  48  50

39  38  25  11  62  61  49  51

40  27  26  12  63  2   1   52

41  28  14  13  64  3   54  53

42  29  15  19  18* 4   55  43

31  30  16  20  6   5   56  44

32  35  17  21  7   58  57  45

33  36  23  22  8   59  47  46

* - starting location

Welcome to the world of genetic programming!

You can get the code for my Lawnmower Problem solver from my GitHub repository.

The goal of knapsack problems is to put as much stuff into a container as it will hold, optimizing for constraints such as item weight and size and value. The standard Knapsack Problem adds the limitation that there is only one of each particular item available. In the unbounded variant of the Knapsack Problem there is no limit. This project solves the Unbounded Knapsack Problem (UKP) by improving upon the genetic solver used in the previous post. The code is written in Go.

We’ll start with an easy one so we can understand how it works. We’ll do that by using the item weights, sizes and values from the Rosetta Code page for the Unbounded Knapsack Problem, but with different names, as follows:

Name Value (each) Weight Volume (each)
Bark 3000 0.3 .025
Herb 1800 0.2 .015
Root 2500 2.0 .002

The knapsack contents cannot weigh more than 25 units and its maximum volume is 0.25 units. Our goal is to maximize the value of the contents of the knapsack without exceeding either of the weight or volume limits.

Let’s think about how we would solve this problem by hand. We want to maximize the value within the constraints. So we want a high ratio of value to weight and value to volume. And we want as many of those in the bag as we can get. When we can’t stuff any more of the top item into the bag, we fill in the remaining space with the next most valuable item, and so on. This process is known as hill climbing.

First we need a struct to hold the resource information.

type resource struct {
	name   string
	value  int
	weight float64
	volume float64
}

This project uses genes in a new way – as indexes and counts. Each chromosome will have two parts. The first will represent an index into the list of available resources. The second will be the quantity of that resource to take. The candidate gene sequence will be decoded as follows:

func decodeGenes(candidate string, resources []resource, geneSet string) map[resource]int {
	resourceCounts := make(map[resource]int, len(candidate)/2)
	for i := 0; i < len(candidate); i += 2 {
		chromosome := candidate[i : i+2]
		resourceId := scale(strings.Index(geneSet, chromosome[0:1]), len(geneSet), len(resources))
		resourceCount := strings.Index(geneSet, chromosome[1:2])
		resource := resources[resourceId]
		resourceCounts[resource] = resourceCounts[resource] + resourceCount
	}
	return resourceCounts
}

func scale(value, currentMax, newMax int) int {
	return value * newMax / currentMax
}

The fitness function will use the decoded resource counts and the resource constraints to determine an appropriate fitness for the candidate.

func getFitness(resourceCounts map[resource]int, maxWeight float64, maxVolume float64) int {
	weight := 0.0
	volume := 0.0
	value := 0

	for resource, count := range resourceCounts {
		weight += resource.weight * float64(count)
		volume += resource.volume * float64(count)
		value += resource.value * count
	}

	if weight > maxWeight || volume > maxVolume {
		return -value
	}
	return int(value)
}

Next is a display function that tells us which resources and how many of each to take and their value (fitness).

func display(resourceCounts map[resource]int, fitness int, elapsed time.Duration) {
	label := ""
	for resource, count := range resourceCounts {
		label += fmt.Sprint(count, " ", resource.name, " ")
	}
	fmt.Println(
		fitness,
		"\t",
		label,
		"\t",
		elapsed)
}

The only thing left is to define the resources and invoke the solver:

func main() {
	resources := []resource{
		{name: "Bark", value: 3000, weight: 0.3, volume: .025},
		{name: "Herb", value: 1800, weight: 0.2, volume: .015},
		{name: "Root", value: 2500, weight: 2.0, volume: .002},
	}

	const maxWeight = 25.0
	const maxVolume = .25

	geneSet := "0123456789ABCDEFGH"

	calc := func(candidate string) int {
		decoded := decodeGenes(candidate, resources, geneSet)
		return getFitness(decoded, maxWeight, maxVolume)
	}

	start := time.Now()

	disp := func(candidate string) {
		decoded := decodeGenes(candidate, resources, geneSet)
		fitness := getFitness(decoded, maxWeight, maxVolume)
		display(decoded, fitness, time.Since(start))
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = .1
	solver.MaxRoundsWithoutImprovement = 2

	var best = solver.GetBestUsingHillClimbing(calc, disp, geneSet, 10, 2, math.MaxInt32)

	fmt.Println("\nFinal:")
	disp(best)
}

Sample output:

$ go run samples/ukp/rosetta/rosetta.go
30000    12 Root                 0
33600    12 Root 2 Herb          2.0001ms
36000    2 Bark 12 Root          2.0001ms
37200    12 Root 4 Herb          3.0002ms
39600    2 Herb 2 Bark 12 Root   3.0002ms
44100    2 Herb 9 Root 6 Bark    5.0003ms
49500    9 Root 5 Herb 6 Bark    5.0003ms
50100    2 Herb 9 Root 8 Bark    5.0003ms
51500    8 Bark 11 Root          6.0003ms
52600    8 Bark 10 Root 2 Herb   6.0003ms
54500    11 Root 5 Herb 6 Bark   6.0003ms

Final: 
54500    11 Root 6 Bark 5 Herb   248.0142ms

Excellent. A success with a sub-second run time. Also, that result matches one of the four optimal combinations for this problem from the RosettaCode page.

Now, just as there are standard problem sets for the Travelling Salesperson Problem, there are also standard problem sets available for the Unbounded Knapsack Problem. One such set, named exnsd16, comes from the PYAsUKP site.

Let’s use the structure of the code above as a guide to implementing a solver for standard problem sets.

First, the resource data must be imported from the file. The file has the following format:

...
c: 1234  <-- constraint
...
begin data <-- resource data
55 32 <-- weight and value
91 3212
...
832 88
end data
...
sol: <-- begin optimal solution
	83	21	58	65 <-- item index, count, weight, value
	71	51	13	20
...

This means the parser will have several transitions because we need the constraint and resource data as a minimum. It would also be nice to have the optimal solution info for result verification.

The resource type doesn’t need a volume field:

type resource struct {
	name   string
	value  int
	weight int
}

Here’s the parser:

func loadResources(routeFileName string) ([]resource, int, map[resource]int) {
	parts := make(chan part)

	lineHandler := ukpResourceFileHeader
	go func() {
		for line := range File.EachLine(routeFileName) {
			lineHandler = lineHandler(line, parts)
		}
		close(parts)
	}()

	resources := make([]resource, 0, 10)
	solution := make(map[resource]int)

	maxWeight := -1
	for part := range parts {
		switch {
		case part.partType == constraintPart:
			maxWeight = parseConstraint(part.line)
		case part.partType == resourcePart:
			resources = append(resources, parseResource(part.line, len(resources)))
		case part.partType == solutionPart:
			resourceId, count := parseSolutionResource(part.line)
			solution[resources[resourceId-1]] = count
		}
	}

	return resources, maxWeight, solution
}

func parseConstraint(line string) int {
	parts := strings.Fields(line)
	constraint, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse constraint from '" + line + "'")
	}
	return constraint
}

func parseResource(line string, totalResources int) resource {
	parts := strings.Fields(line)
	weight, err := strconv.Atoi(parts[0])
	if err != nil {
		panic("failed to parse weight from '" + line + "'")
	}
	value, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse value from '" + line + "'")
	}

	return resource{
		name:   fmt.Sprint("Item_" + strconv.Itoa(1+totalResources)),
		weight: weight,
		value:  value,
	}
}

func parseSolutionResource(line string) (int, int) {
	parts := strings.Fields(line)
	resourceId, err := strconv.Atoi(parts[0])
	if err != nil {
		panic("failed to parse resourceId from '" + line + "'")
	}
	count, err := strconv.Atoi(parts[1])
	if err != nil {
		panic("failed to parse count from '" + line + "'")
	}

	return resourceId, count
}

type ukpLineFn func(line string, parts chan part) ukpLineFn

func ukpResourceFileHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "c:") != 0 {
		return ukpResourceFileHeader
	}
	parts <- part{line: line, partType: constraintPart}
	return ukpDataHeader
}

func ukpDataHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "begin data") != 0 {
		return ukpDataHeader
	}
	return ukpData
}

func ukpData(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "end data") != 0 {
		parts <- part{line: line, partType: resourcePart}
		return ukpData
	}
	return ukpSolutionHeader
}

func ukpSolutionHeader(line string, parts chan part) ukpLineFn {
	if strings.Index(line, "sol:") != 0 {
		return ukpSolutionHeader
	}
	return ukpSolution
}

func ukpSolution(line string, parts chan part) ukpLineFn {
	if len(line) > 0 {
		parts <- part{line: line, partType: solutionPart}
		return ukpSolution
	}
	return ukpFooter
}

func ukpFooter(line string, parts chan part) ukpLineFn {
	return ukpFooter
}

type partType int

type part struct {
	line     string
	partType partType
}

const (
	constraintPart partType = 1 + iota
	resourcePart
	solutionPart
)

A chromosome will still have two parts, the index and count, but this time it will need more genes because the problem we’re solving has 2000 possible resources and the quantities may need to go as high as 160.

spreadsheet demonstrating picking best first item

If we use hexadecimal values for the geneSet we’ll need 3 genes for the resource index and 2 for the quantity.

const hexLookup = "0123456789ABCDEF"
const numberOfGenesPerChromosome = 5

Here is the decoder:

func decodeGenes(candidate string, resources []resource) map[resource]int {
	resourceCounts := make(map[resource]int, len(candidate)/numberOfGenesPerChromosome)
	const maxHexValue = 16 * 16 * 16
	for i := 0; i < len(candidate); i += numberOfGenesPerChromosome {
		chromosome := candidate[i : i+numberOfGenesPerChromosome]
		resourceId := scale(hexToInt(chromosome[0:3]), maxHexValue, len(resources))
		resourceCount := hexToInt(chromosome[3:numberOfGenesPerChromosome])
		resource := resources[resourceId]
		resourceCounts[resource] = resourceCounts[resource] + resourceCount
	}
	return resourceCounts
}

func hexToInt(hex string) int {
	value := 0
	multiplier := 1
	for i := len(hex) - 1; i >= 0; i-- {
		value += multiplier * strings.Index(hexLookup, hex[i:i+1])
		multiplier *= len(hexLookup)
	}
	return value
}

func scale(value, currentMax, newMax int) int {
	return value * newMax / currentMax
}

Once we’ve decoded the candidate’s gene sequence we can get its fitness:

func getFitness(resourceCounts map[resource]int, maxWeight, optimalFitness int) int {
	weight := 0
	value := 0

	for resource, count := range resourceCounts {
		weight += resource.weight * count
		value += resource.value * count
	}

	if weight > maxWeight {
		if value == optimalFitness {
			return optimalFitness + weight - maxWeight
		}
		return -value
	}

	return int(value)
}

And the last utility function is one to display the decoded genes and quantities in a useful format:

func display(resourceCounts map[resource]int, fitness int, elapsed time.Duration, shorten bool) {
	label := ""
	for resource, count := range resourceCounts {
		if count == 0 {
			continue
		}
		if len(label) > 0 {
			label += ", "
		}
		label += fmt.Sprint(count, " of ", resource.name)
	}
	if shorten && len(label) > 33 {
		label = label[:33] + " ..."
	}
	fmt.Println(
		fitness,
		"   ",
		label,
		"\t",
		elapsed)
}

Finally, here’s the code that glues all the parts together to try to solve the problem.

func main() {
	flag.Parse()
	if flag.NArg() != 1 {
		fmt.Println("Usage: go run standard.go RESOURCEFILEPATH")
		return
	}
	var resourceFileName = flag.Arg(0)
	if !File.Exists(resourceFileName) {
		fmt.Println("file " + resourceFileName + " does not exist.")
		return
	}
	fmt.Println("using resource file: " + resourceFileName)

	resources, maxWeight, solution := loadResources(resourceFileName)

	optimalFitness := 0
	for resource, count := range solution {
		optimalFitness += resource.value * count
	}

	calc := func(candidate string) int {
		decoded := decodeGenes(candidate, resources)
		return getFitness(decoded, maxWeight, optimalFitness)
	}

	start := time.Now()

	disp := func(candidate string) {
		decoded := decodeGenes(candidate, resources)
		fitness := getFitness(decoded, maxWeight, optimalFitness)
		display(decoded, fitness, time.Since(start), true)
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = 5
	solver.MaxRoundsWithoutImprovement = 3

	var best = solver.GetBestUsingHillClimbing(calc, disp, hexLookup, 10, numberOfGenesPerChromosome, optimalFitness)

	fmt.Print("\nFinal: ")
	decoded := decodeGenes(best, resources)
	fitness := getFitness(decoded, maxWeight, optimalFitness)
	display(decoded, fitness, time.Since(start), false)
	if fitness == optimalFitness {
		fmt.Println("-- that's the optimal solution!")
	} else {
		percentOptimal := float32(100) * float32(fitness) / float32(optimalFitness)
		fmt.Printf("-- that's %f%% optimal\n", percentOptimal)
	}
}

Sample output:

using resource file: samples/ukp/data/exnsd16.ukp
1001881     137 of Item_730      3.0002ms
1016358     147 of Item_1698     5.0003ms
1023272     148 of Item_1698     5.0003ms
1023775     155 of Item_480      283.0162ms
1025480     155 of Item_1227     1.2450713s
1025681     157 of Item_1331     2.7771589s
1027260     156 of Item_288      3.4291962s
1027500     156 of Item_288, 6 of Item_1567      3.9722272s
1027540     7 of Item_1567, 156 of Item_288      3.9732273s
1027860     156 of Item_288, 15 of Item_1567     3.9732273s
1028783     156 of Item_288, 1 of Item_1692      4.212241s
1028950     156 of Item_288, 1 of Item_1801      4.212241s
1029261     1 of Item_1810, 156 of Item_288      4.213241s
1029581     156 of Item_288, 1 of Item_1790      4.2452429s
1029663     156 of Item_288, 1 of Item_1248      4.2492431s
1029664     1 of Item_1767, 156 of Item_288      7.4734275s
1029680     156 of Item_288, 1 of Item_987       8.8265049s

Final: 1029680     1 of Item_987, 156 of Item_288        14.0928061s
-- that's the optimal solution!

It doesn’t achieve an optimal solution every time but it does find a solution that’s better than 99 percent optimal almost every time. Here’s the fitness results across 100 runs with 5 seconds to run and a maximum of 3 rounds without improvement:

1029680 21
1029582 3
1029504 1
1029477 1
1029459 1
1029356 1
1029094 1
1028950 2
1028926 1
1028911 1
1028905 2
1028891 1
1028890 1
1028888 1
1028872 1
1028603 1
1028209 1
1028205 1
1028198 1
1028183 10  <- median
1028176 1
1028166 4
1028127 1
1028101 14
1028100 1
1028058 1
1028050 2
1027991 1
1027776 1
1027416 3
1027311 1
1027042 1
1026972 1
1026799 5
1026747 1
1026721 1
1026345 1
1026285 1
1026204 1
1025834 2
1025681 3 <- 99.611626% optimal

The source code for this project is available from my GitHub repository.

Go on to part 6.

This Go project uses the genetic solver from my previous post to evolve a regular expression that matches all items in a set of wanted strings without matching any items in the set of unwanted strings. For example:

	wanted := []string{"00", "01", "10"}
	unwanted := []string{"11"}

We desire a regular expression that finds strings from wanted without finding those from unwanted. We’ll start by determining the unique set of characters in the wanted strings:

func getUniqueCharacters(wanted []string) string {
	uniqueCharacters := make(map[string]bool)

	characters := ""
	for _, item := range wanted {
		for i := 0; i < len(item); i++ {
			token := item[i : i+1]
			if !uniqueCharacters[token] {
				characters += token
				uniqueCharacters[token] = true
			}
		}
	}
	return characters
}

And we’ll combine those with a set of regex special characters to get our gene set:

const regexSpecials = "[]()|?*+"

geneSet := getUniqueCharacters(wanted) + regexSpecials

Next, when we receive a candidate regex from the solver we’ll verify that it is a valid regex by trying to compile it.

func isValidRegex(candidate string) bool {
	if strings.Contains(candidate, "()") || strings.Contains(candidate, "??") {
		return false
	}

	_, err := regexp.Compile(candidate)
	return err == nil
}

We also need a function to calculate the fitness of a given candidate based on the number of wanted and unwanted strings it matches.

func calculate(wanted, unwanted []string, geneSet string, candidate string) int {
	if !isValidRegex(candidate) {
		return math.MaxInt32
	}

	regex := regexp.MustCompile("^(" + candidate + ")$")
	fitness := 0
	for _, item := range wanted {
		if !regex.MatchString(item) {
			fitness++
		}
	}

	for _, item := range unwanted {
		if regex.MatchString(item) {
			fitness += 10
		}
	}

	return fitness
}

Next we’ll need a way to visualize what’s happening:

	disp := func(candidate string) {
		fmt.Println(candidate,
			"\t",
			calc(candidate),
			"\t",
			time.Since(start))
	}

Finally we glue it all together as follows:

package main

import (
	"github.com/handcraftsman/GeneticGo"
	"fmt"
	"math"
	"regexp"
	"strings"
	"time"
)

const regexSpecials = "[]()|?*+"

func main() {
	wanted := []string{"AL", "AK", "AS", "AZ", "AR"}
	unwanted := []string{"AA"}

	geneSet := getUniqueCharacters(wanted) + regexSpecials

	calc := func(candidate string) int {
		return calculate(wanted, unwanted, geneSet, candidate)
	}
	start := time.Now()

	disp := func(candidate string) {
		fmt.Println(candidate,
			"\t",
			calc(candidate),
			"\t",
			time.Since(start))
	}

	var solver = new(genetic.Solver)
	solver.MaxSecondsToRunWithoutImprovement = 3
	solver.LowerFitnessesAreBetter = true
	solver.MaxRoundsWithoutImprovement = 10

	var best = solver.GetBestUsingHillClimbing(calc, disp, geneSet, 10, 1, 0)
	fitness := calc(best)

	if fitness == 0 {
		println("\nsolved with: " + best)
	} else {
		println("\nfailed to find a solution")
	}

	fmt.Print("Total time: ")
	fmt.Println(time.Since(start))
}

Note: depending on the speed of your processor you may be able to reduce MaxSecondsToRunWithoutImprovement. For this problem I’ve set it to 1/10 second.

solver.MaxSecondsToRunWithoutImprovement = .1

Output:

$ go run samples/regex.go
|        3       0
10       2       1.0001ms
1*0+     1       131.0075ms
0*1?0*   0       335.0192ms

solved with: 0*1?0*
Total time: 335.0192ms

Cool!

Now, here’s a somewhat more interesting problem – repetition:

	wanted := []string{"01", "0101", "010101"}
	unwanted := []string{"0011"}

	solver.MaxSecondsToRunWithoutImprovement = .1

Output:

$ go run samples/regex.go
0        3       1ms
01       2       1ms
(01)+    0       215.0123ms

solved with: (01)+
Total time: 216.0123ms

Nailed it! And one more based on U.S. state mail codes:

	wanted := []string{"AL","AK","AS","AZ","AR"}
	unwanted := []string{"AA"}

	solver.MaxSecondsToRunWithoutImprovement = .5

Output:

$ go run samples/regex.go
R        5       0
AS       4       1.0001ms
AK?L?    3       1.0560604s
AS*L?K*          2       2.1301218s
A[SKLZ]          1       2.62215s
A[LRSZK]         0       3.1391795s

solved with: A[LRSZK]
Total time: 3.6392081s

The source code for this project is available from my GitHub repository.

Go on to part 5.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: