sferes2 / sferes2

A lightweight, generic C++11 framework for evolutionary computation

Feature request: more control for writing gen files

JoostHuizinga opened this issue

These suggestions are not essential, but I have found them useful in the past.

  1. Dump gen files at times other than fixed intervals. For example, I would like to be able to skip the first gen file (gen_0 is not that useful, yet can take up a lot of space), or to dump checkpoints based on wall-clock time rather than on a generation interval (so I don't have to tune checkpointing to the speed of the machine I am on).
  2. Write gen files with a different name from gen_[generation]. This is mostly so I can distinguish between files I created just for checkpointing (which can be deleted at the end of a run), and files used for analysis (which certainly should not be deleted).
  3. Call arbitrary code after a gen file is written. This functionality has mostly been important on the OSG cluster, a heterogeneous cluster without a shared filesystem. On OSG, every file you write has to be transferred manually to your local account, because all files written locally are lost if the job gets preempted (which happens a lot). Another use case is that I like to clean up checkpoints during a run, just to minimize disk usage (we, as a lab, have lots of trouble remaining within our allocated quota).
  4. A quick way to test gen file integrity. Since writing to disk is often one of the slower processes, preemption can quite often halt your program in the middle of writing a checkpoint, leaving a corrupt gen file in its wake. This makes it hard to continue automatically, since continue scripts will usually just load the most recent gen file available.

For points 1 to 3, I would suggest adding (yet) another object to the Ea class, which would be something like a gen-file manager. The manager would decide when to write a gen file, what to name the gen file, and when to call additional code.
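To make the idea concrete, here is a minimal, standalone sketch of what such a gen-file manager could look like. Everything in it (the class name `GenFileManager`, its methods, the `checkpoint_` prefix) is hypothetical and not part of sferes2; it only illustrates the three decisions described above: when to write, what name to use, and what to call afterwards.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical policy object; none of these names exist in sferes2.
class GenFileManager {
public:
    using Clock = std::chrono::steady_clock;
    using Hook = std::function<void(const std::string&)>;

    GenFileManager(std::chrono::seconds checkpoint_interval,
                   std::size_t analysis_period,
                   Hook after_write)
        : interval_(checkpoint_interval),
          analysis_period_(analysis_period),
          after_write_(std::move(after_write)),
          last_write_(Clock::now()) {}

    // Point 1a: analysis files every `analysis_period` generations, skipping gen 0.
    bool is_analysis_gen(std::size_t gen) const {
        return gen > 0 && analysis_period_ > 0 && gen % analysis_period_ == 0;
    }

    // Point 1b: a checkpoint is due once enough wall-clock time has passed
    // since the last write, regardless of how many generations that took.
    bool checkpoint_due(Clock::time_point now) const {
        return now - last_write_ >= interval_;
    }

    // Point 2: different names for analysis files and disposable checkpoints.
    std::string file_name(std::size_t gen, bool analysis) const {
        return (analysis ? "gen_" : "checkpoint_") + std::to_string(gen);
    }

    // Point 3: run user code (copy the file elsewhere, delete old checkpoints, ...).
    void notify_written(const std::string& path, Clock::time_point now) {
        last_write_ = now;
        if (after_write_) after_write_(path);
    }

private:
    std::chrono::seconds interval_;
    std::size_t analysis_period_;
    Hook after_write_;
    Clock::time_point last_write_;
};
```

The Ea loop would ask the manager once per generation whether to write, use `file_name` for the path, and call `notify_written` once the file is on disk. Passing the current time in explicitly keeps the policy easy to test.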

For point 4, I solved that by simply writing a string of special characters to the end of every gen file. To be precise, I added the following code:

        // If our archive is binary, explicitly write these characters at the end of the file
        // so we can check (or at least be reasonably certain) that the archive was written
        // successfully and entirely.
        // The 24 characters plus the terminating '\0' make 25 bytes.
        char end[25] = "\n</boost_serialization>\n";
        boost::serialization::binary_object object(end, 25);
        oa << object;

You can then read the last 25 bytes of a gen file and check that they match this magic string. This method is technically not foolproof, because that string of characters may happen to be written somewhere in the middle of a file right at the moment a job gets preempted, but the chance of that happening is so low (about the chance of guessing a 25-character password) that I think it is good enough.
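The check itself could look roughly like the sketch below. The function name `gen_file_is_complete` is made up for illustration, and the code assumes the marker really is the last thing in the file, written as 25 raw bytes (24 characters plus the terminating `'\0'`) as in the snippet above.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file ends with the 25-byte marker.
bool gen_file_is_complete(const std::string& path) {
    static const char marker[25] = "\n</boost_serialization>\n";
    const std::streamoff n = sizeof(marker);

    // Open at the end so tellg() reports the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;                  // missing or unreadable file
    const std::streamoff size = in.tellg();
    if (size < n) return false;             // too short to contain the marker

    // Read the last n bytes and compare them to the marker, '\0' included.
    in.seekg(-n, std::ios::end);
    char tail[sizeof(marker)];
    in.read(tail, n);
    return in.gcount() == n && std::memcmp(tail, marker, sizeof(marker)) == 0;
}
```

A continue script could walk the gen files from newest to oldest and resume from the first one for which this returns true.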

P.S. While it may not seem like it, given the number of issues I am creating, I think Sferes is a very nice and robust framework, and I thank all contributors for creating and maintaining it.

About point 4, we should add a checksum at the end of the file. I will have a look at this soon. About 1-3, we could indeed have a functor for this (but it will have to come with a default value). We have a better mechanism for default functors in Limbo, but adding this in sferes will mean a lot of changes.
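As a rough illustration of the checksum idea (not the actual sferes2 implementation, which is not shown in this thread), the sketch below appends a 64-bit FNV-1a hash of the payload as 8 little-endian bytes and verifies it on load. FNV-1a is chosen here only because it fits in a few lines; a CRC or a cryptographic hash would work the same way. The function names are hypothetical, and the payload is kept in memory to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over a byte buffer.
std::uint64_t fnv1a64(const char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Returns the payload followed by its checksum (8 bytes, little-endian).
std::string with_checksum(const std::string& payload) {
    const std::uint64_t h = fnv1a64(payload.data(), payload.size());
    std::string out = payload;
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<char>((h >> (8 * i)) & 0xFF));
    return out;
}

// Recomputes the hash of everything but the last 8 bytes and compares it
// with the stored value. Truncated files are expected to fail this check.
bool checksum_ok(const std::string& file_bytes) {
    if (file_bytes.size() < 8) return false;
    const std::size_t n = file_bytes.size() - 8;
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i) {
        const unsigned char byte = static_cast<unsigned char>(file_bytes[n + i]);
        stored |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return stored == fnv1a64(file_bytes.data(), n);
}
```

Compared with a fixed magic string, a checksum also catches corruption in the middle of the file, at the cost of reading the whole file to verify it.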