davidrmiller / biosim4

Biological evolution simulator

Installation and compilation instructions

eabase opened this issue · comments

It would be fabulous if you could provide some instructions on how to compile and install this awesome simulation.

What would be most useful are:

  • What OS packages are needed?
  • What libraries are needed?
  • How to install them using a typical package manager such as apt-get.
  • Compilation instructions, what tools and flags are needed?

Hi @eabase, on a distro based on Ubuntu 21.04 and configured for C++ development, the core dependencies of the simulator should be satisfied by running "sudo apt install cimg-dev". The readme file offers two ways to compile -- either by using Code::Blocks IDE or using a Makefile. Let us know if you have more specific questions about those steps.

Hi David,
I think the request was for instructions aimed at someone who doesn't know Linux very well: the exact commands to install everything (including required packages) and then run the program. I fell into the trap of including the square brackets around the ini file when launching, which meant the program wouldn't work. I spent a good 30 minutes investigating before trying without the square brackets. As soon as I did that, it worked fine (so far).
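The bracket trap described above comes from the usual usage notation, where square brackets mark an optional argument and are not meant to be typed. A minimal sketch of a pre-launch check, assuming the config file is passed as a single argument; the function name `check_config_arg` is hypothetical and not part of biosim4:

```sh
#!/bin/sh
# Hypothetical helper (not part of biosim4): validate the config-file
# argument before launching the simulator.
check_config_arg() {
    cfg=$1
    case $cfg in
        \[*\])
            # The argument was typed with literal brackets, e.g. "[biosim4.ini]".
            stripped=${cfg#\[}
            stripped=${stripped%\]}
            echo "error: drop the square brackets, use '$stripped'" >&2
            return 1
            ;;
    esac
    if [ ! -f "$cfg" ]; then
        echo "error: config file not found: $cfg" >&2
        return 1
    fi
    return 0
}

# Example use inside a launcher script:
#   check_config_arg "$1" && ./bin/Debug/biosim4 "$1"
```

The check only reports the problem; it does not try to guess the intended file name and launch anyway.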

I'll try to get this dockerized (see #8)

Thank you for the docker image @mathematicalmichael 🥇

@eabase, I saw the instructions in the README for the first time 2 days ago i.e. after you created this issue. Do you think the current README fulfills your list? I was able to install required packages and compile from the current README.

@venzen no problem. do you think I should point to the published image in the README?

@mathematicalmichael , that's a great idea. Thanks for making the image available.

@mathematicalmichael my first reaction was "sure, why not?" but then I considered the consequences: some folks might see the link to the docker file and just go grab it without appreciating the full scope and context of the build requirements and components. You might get a lot more support requests if the docker image and a tempting link is free-floating out there without the mandatory walkthrough in the README :)

That being said, I think the Docker image option for compilation should be given more emphasis in the README. For example, in my case, I did not understand at first that the docker image is a complete compilation environment for this project. I slogged through getting all the requirements installed on Ubuntu 18.04, installed docker-ce and finally ran the container. Only then did I realize that the docker image is all I needed to compile biosim.

Highlighting the latter fact - and pointing out that a Linux host is more or less the only way to run biosim - will be useful for most users, I think.

Do you think linking to the docker image might have some other benefit?

@davidrmiller no problem! thanks for actually making your work open source and being open to contributions!

@venzen thanks for that feedback, I think that's a great point re: the lack of clarity. I can try to update the README to be more clear that the docker image is a "compilation and runtime environment" (meaning, it can pretty much stay the same despite new features of biosim4 being added).

Regarding upsides:
It took a long time to build the image on my m1 mac mini and even more so on my raspberry pi 4 (kicked them off around the same time). I ended up building it on the m1, publishing, and then using the pi to pull the image and run simulations in the background overnight.
So in short, compilation was quick, building the image was slow. Therefore the only upside is just a bit of time being saved. I think a lot of people mess with Linux on lower-power hardware so this helps get things up faster there.

Some notes:

  • I took this project as an opportunity to learn how to build multi-arch images (as well as how to get the makefile to support compiling on both platforms! that was new for me and part of the fun of submitting the PR).
  • Later I learned to use GitHub Actions + docker buildx to publish multi-arch images directly to DockerHub, so I can offer to help maintain the publishing of these images.
  • Re: the latter point, perhaps under a better namespace. Though the one I have now is not entirely inappropriate, it would be nice if it were under @davidrmiller's docker username instead, but I'm open to continuing to maintain the image I published.
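For readers unfamiliar with the multi-arch publishing mentioned in these notes, here is a rough sketch of the kind of `docker buildx` command involved. It is not the command used for the published image: the default image name is a placeholder, and the `DRY_RUN` switch is an assumption added so the command can be inspected before anything is built or pushed.

```sh
#!/bin/sh
# Sketch: assemble a multi-arch build-and-push command for docker buildx.
# build_multiarch [IMAGE] [PLATFORMS]
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
build_multiarch() {
    image=${1:-example/biosim4}               # placeholder image name
    platforms=${2:-linux/amd64,linux/arm64}   # the two architectures named in the thread
    set -- docker buildx build --platform "$platforms" -t "$image" --push .
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it for real (`DRY_RUN=0 build_multiarch <image>`) needs a registry login and, in most setups, a buildx builder that supports multi-platform output.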

So, I haven't landed on whether or not to link the mindthegrow image in the README, but I think if worded right it could be of help.

Something like

# Quickstart
If you are running any unix/linux-based distribution, the easiest way to get started with
reproducing this work is to install `docker` on your system (link here) and run the following in your terminal:

```sh
$ git clone https://github.com/davidrmiller/biosim4
$ cd biosim4
$ docker build -t biosim4 .
$ docker run --rm -ti -v "$(pwd)":/app -w /app biosim4 make
```

If you want to skip building the `biosim4` image yourself, you can use a community-contributed image by replacing the third step above with

```sh
$ docker pull mindthegrow/biosim4 && docker tag mindthegrow/biosim4 biosim4
```

## What is the docker image for?
The docker image provides all the required dependencies to compile and run biosim4 and avoids you needing to install any packages related to it on your system. 
<talk more about instructions linked below>

Thoughts on the initial pass? No mentions of architecture but I can make it clear that it'll work on both arm64 and amd64

@mathematicalmichael your explanation changed my mind! :)

Considering how useful your docker image is, and the effort it takes to create it (them), making it available will be a great service to BioSim enthusiasts. Regarding namespace at DockerHub I want to raise the same concern: placing the images under David's namespace, there, might lead users to direct support issues to him, rather than you, and result in an awkward resolution process. I understand that you probably propose the namespace out of respect for David's work in BioSim, but the docker image is your hard work and merit. Consider this too. :)

The README "Quickstart" contents you propose above are clear. I want to suggest:

  1. Under "System Requirements" point out that building and compiling under Windows is problematic and not supported. Keep the caveat about 20.04 LTS and direct users to the "Bugs" section further down the page.
  2. "Quickstart": the text you posted above is good. I have this link to a guide for installing Docker in Ubuntu 20.04: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-20-04 There might be better guides out there, but it worked for me on 18.04.
  3. The option to build the docker image oneself is great! Link to your Docker images is unavoidable. Perhaps create a new namespace at DockerHub or include under David's as you see fit.
  4. Subsection "What is the docker image for?". Useful and insightful for less experienced users. Mention that an ARM image is available and working on Pi. This will be quite popular, I imagine.

Once again, Good Job @mathematicalmichael ! The presence of your docker image inspired me to get my local docker install working and this is the first image I have run. Of course, it enabled me to compile and run biosim4 quickly and effortlessly - which was the reason for coming here in the first place. Yay! \o/

@mathematicalmichael Issue #25 has discussion about a testing methodology using Python3. Some additional modules will have to be installed during the docker build. Specifically, python3-dev, configparser and pybind11. Other modules may join the list as we progress.

Should we create a separate Dockerfile to build a container for tests? Is the existing image at mindthegrow/biosim4 suitable for such a build? I'm new to Docker so your advice will be useful.

@davidrmiller do you have some input about direction with this?

For a testing program, I might have done something like the shell script below or a Python equivalent that has no dependencies other than a standard *nix installation. This is just a quickly-written example of a concept. It invokes the simulator with a pre-prepared config file, then checks stderr and the epoch log to see if the output looks reasonable. This concept example could be expanded to loop over multiple test configs and kill any simulator processes that run too long. Instead of hardcoded expected results like in the example below, it could extract the expected results for each test from a table or from specially-formatted comments inside each config file. It's just a shell script and a config file that ship inside a test directory -- no changes are needed to the simulator or to the config file format, no special compilation, and no outside dependencies. The corresponding config file is attached here with a .txt extension:
biosim4-test-1.ini.txt

#!/bin/sh

# This script tests the biosim4 simulator using the config file named "biosim4-test-1.ini".
# Criteria for a successful test are no stderr output and certain values in the last line
# of logs/epoch-log.txt.

./bin/Debug/biosim4 ./biosim4-test-1.ini  > out  2> err

success=1

# Any stderr output counts as a failure; show the first few lines.
if test -s err; then
    echo "The simulator output errors:"
    head -10 err
    success=0
fi

# The last line of the epoch log holds space-separated fields:
# generation, survivors, diversity, genome size, kills.
result=$(tail -1 ./logs/epoch-log.txt)
generation=$(echo "$result" | cut -d" " -f 1)
survivors=$(echo "$result" | cut -d" " -f 2)
diversity=$(echo "$result" | cut -d" " -f 3)
genomeSize=$(echo "$result" | cut -d" " -f 4)
kills=$(echo "$result" | cut -d" " -f 5)

if [ "$generation" -ne 100 ]; then
    echo "Error: generation number, expected 100, got $generation"
    success=0
fi

# Accept 266 through 284 inclusive, matching the message below.
if [ "$survivors" -lt 266 ] || [ "$survivors" -gt 284 ]; then
    echo "Error: survivors, expected 266 to 284, got $survivors"
    success=0
fi

# Note: expr compares non-integers as strings, which is adequate here only
# while the values have the form 0.xxx.
if [ `expr "$diversity" \< 0.314` -eq 1 ] || [ `expr "$diversity" \> 0.551` -eq 1 ]; then
    echo "Error: diversity, expected 0.314 to 0.551, got $diversity"
    success=0
fi

if [ "$genomeSize" -ne 64 ]; then
    echo "Error: genome size, expected 64, got $genomeSize"
    success=0
fi

if [ "$kills" -ne 0 ]; then
    echo "Error: number of kills, expected 0, got $kills"
    success=0
fi

if [ "$success" -eq 1 ]; then
    echo "Pass."
else
    echo "Fail."
fi
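One way the idea of reading expected results from specially-formatted comments might look is sketched below. The `# expect: NAME MIN MAX` comment convention and all three function names are hypothetical, invented here for illustration, and the sketch assumes that `#` starts a comment in the config format. The range check uses awk so that decimal values such as the diversity figure are compared numerically.

```sh
#!/bin/sh
# Hypothetical convention (not part of biosim4): a config file carries lines like
#   # expect: diversity 0.314 0.551
# giving an inclusive MIN and MAX for a named result.

# in_range VALUE MIN MAX: succeed if MIN <= VALUE <= MAX, compared numerically.
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# expected_range FILE NAME: print "MIN MAX" for NAME, or nothing if absent.
expected_range() {
    awk -v name="$2" \
        '$1 == "#" && $2 == "expect:" && $3 == name { print $4, $5; exit }' "$1"
}

# check_result FILE NAME VALUE: compare VALUE with the range declared in FILE.
check_result() {
    range=$(expected_range "$1" "$2")
    if [ -z "$range" ]; then
        echo "No expectation for $2 in $1"
        return 1
    fi
    set -- "$1" "$2" "$3" $range   # split "MIN MAX" into $4 and $5
    if in_range "$3" "$4" "$5"; then
        return 0
    fi
    echo "Error: $2, expected $4 to $5, got $3"
    return 1
}
```

With this in place, a test script could call `check_result ./biosim4-test-1.ini diversity "$diversity"` for each field instead of hardcoding the bounds.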

This shell script works, as is. Shell output:

darwin@45c0399df3a1:/app$ ./tests/simtest.sh
Pass.

Using biosim4-test-1.ini it took about 7 mins to run on my AMD 4-core laptop for this non-deterministic simulation.

my advice would be to have testing dependencies in a separate container. It can inherit from (use as its base image) the existing published image and add the couple of extra requirements on top.
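A rough sketch of that layering, written as a shell function that generates the test Dockerfile. It rests on several assumptions: that the published image is Debian/Ubuntu-based so `apt-get` is available, that the distribution packages `python3-dev` and `python3-pybind11` cover the modules listed earlier, and that `Dockerfile.test` is an acceptable placeholder file name. `configparser` ships with the Python 3 standard library, so it is not installed separately here.

```sh
#!/bin/sh
# Sketch: write a test-only Dockerfile that uses the published image as its base.
# write_test_dockerfile [OUTPUT_FILE]
write_test_dockerfile() {
    out=${1:-Dockerfile.test}   # placeholder file name
    cat > "$out" <<'EOF'
FROM mindthegrow/biosim4
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3-dev python3-pybind11 \
    && rm -rf /var/lib/apt/lists/*
EOF
}
```

The generated file could then be built with `docker build -f Dockerfile.test -t biosim4-test .`, leaving the main image untouched.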

I can help with the GitHub action. It would be something like a copy/paste of the existing one but referencing a different Dockerfile, with the workflow changed to "build the test image, then run docker run <same as now> <shell script>" instead of make at the end. This Dockerfile-testing and the new .github/workflows/test.yml file should be part of the PR that introduces the testing scripts.

Seven minutes, however, is much too long for testing; the GitHub Actions servers are not really any more powerful than your laptop. Run time should fall roughly linearly with the number of generations, and likewise with the number of timesteps per generation. So if you cut both in half, you'd expect about a quarter of the time, something on the order of < 2 minutes, which is long but still tolerable. This can be accomplished by running sed on the biosim4.ini file prior to testing, or by using a dedicated test file as you have shown (I'm in favor of running sed and avoiding the clutter of a separate .ini, but it may be too many changes).
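The sed approach might look something like the sketch below. It assumes the config uses `key = value` lines; the function name `set_ini_value` is hypothetical, and the key names in the usage note (`maxGenerations`, `stepsPerGeneration`) are assumed to match biosim4.ini and should be checked against the actual file.

```sh
#!/bin/sh
# Sketch: override one "key = value" setting in a config read from stdin.
# set_ini_value KEY VALUE < input > output
# Only lines that start with KEY (after optional whitespace) are rewritten,
# so comment lines mentioning the key are left alone.
# KEY and VALUE must not contain "/" or "&", which are special to sed here.
set_ini_value() {
    sed "s/^[[:space:]]*$1[[:space:]]*=.*/$1 = $2/"
}
```

A shortened test config could then be produced without editing the original, for example `set_ini_value maxGenerations 50 < biosim4.ini | set_ini_value stepsPerGeneration 150 > biosim4-test.ini`, where the numbers are purely illustrative.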

I appreciate the thoughts about testing. While completely automatic testing would be nice, perhaps GitHub actions are not well suited for this project. They seem to require maintaining non-trivial actions, containers, libraries, and other dependencies for tests that could only run for a limited duration.

I'd like to avoid over-engineering something that could be simple. I propose that testing is more easily done locally by a user as needed by invoking a script (cf. the shell script example above). It would execute in the same directory structure in the same container with no additional dependencies.

Testing locally would allow any user to invoke only the tests needed and only when needed (single/multi-threaded, fast, long, [non]deterministic, etc). It would allow some tests to run for an extended time, which is valuable. The tests would be easy to invoke, easy to develop, and easy to maintain by anyone.
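For local runs where some tests are allowed to take a long time, a per-test time limit keeps a hung simulator from blocking everything else. Below is a minimal sketch using the `timeout` utility from GNU coreutils; the function name `run_limited` is hypothetical, and the limit is whatever the person running the test chooses.

```sh
#!/bin/sh
# Sketch: run a command with a wall-clock limit.
# run_limited SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if the limit was reached
# (124 is the status GNU timeout uses for a timed-out command).
run_limited() {
    limit=$1
    shift
    if timeout "$limit" "$@"; then
        return 0
    else
        status=$?
    fi
    if [ "$status" -eq 124 ]; then
        echo "Timed out after ${limit}s: $*"
    fi
    return "$status"
}
```

A quick test might use `run_limited 600 ./bin/Debug/biosim4 ./biosim4-test-1.ini`, while a long-running one simply passes a larger limit.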

As @mathematicalmichael pointed out, the script could generate the necessary config files when needed so that the entire testing framework could consist of just a single executable script.

I mean, adding a few libraries would only add a couple dozen MB or so to the image, so if you want to just jam em in to the image, that's really not a problem, I can update the one I have published to mindthegrow if you ping me when main is updated with a new Dockerfile. But I do like that your shell script doesn't require anything else, that's really appealing.

A local testing approach is fine, just have to validate the work of PRs locally. If you don't mind, then yah no use in setting up github actions for testing, having a compilation check is good enough.

The discussion about testing is valuable. Digging deeper into the code and having re-watched the biosim4 video a few times, it is evident that @davidrmiller has put a lot of thought into this project. That also implies a vision for the project, as well as a philosophy that steers it.

Most of the participants are enthusiastic to contribute because biosim4 inspires thought and creativity. I am slowly coming round to David's values and approach - which is something that all contributors can achieve by reading his comments and posts - they are well-worded and convey his philosophy for the project.

Good job by everybody who has contributed, so far.

@mathematicalmichael I agree with your approach of separating out tests. @davidrmiller's idea of simple single-script tests should be the norm, and then those of us who want to run more extensive local tests can have another Dockerfile which builds an image with all the arbitrary libraries we require.

Such a Dockerfile and contributed files/scripts can live in a separate tests directory. The official documented test script could be in the top-level directory or could reside in the tests directory too.

Hi Guys!
Thanks for all the feedback. Sorry I have not been available to see the whole discussion earlier.
All great work, and I just wanted to add my silly 2 cents in general about docker.

To me, I was never a great fan of docker. Although it's extremely easy and useful, the drawback (as implied by someone earlier) is that it masks all the grit needed to build or customize your own system. For me using docker is a bit like saying "Here is the ISO image, just install on your VM and run.", which means you have learned absolutely nothing about the project you're running, nor about any of the tools or technology it uses. Then someone may argue, "You can always go into the config files and see what the image is doing." Well, can you? Probably not without learning docker.

(But if you think I'm lost out biking in the forest, I am happy to be corrected!)

Just to sketch another use case and give +1 support for docker...

@eabase I get your point about andragogy; however, in my case docker is more than a convenience or cop-out. Without it I would not be able to compile or run biosim4. This is due to practical (and resource) limitations: if I upgrade to Ubuntu 21.04 then I will break dependencies for other projects and destroy a finely-tuned desktop environment.

I think that this project is of interest to a wide audience: evolutionary biologists, coders, philosophers, laypeople, etc. Yet those incidental users who arrive here, but who lack experience of open source and C code compilation, will be out of their depth. The docker image does not give the user a short-cut, because installing docker, setting it up and actually running the image implies that the user already has the skill-set to install dependencies and compile from the command line anyway.

I was actually considering, for the past week, that a few architecture specific docker images with already-compiled binaries might be useful to noobs, but (considering your points and my response) I just convinced myself why that would probably be a futile gesture. :)
