harness / gitness

Gitness is an Open Source developer platform with Source Control management, Continuous Integration and Continuous Delivery.

Home Page: https://gitness.com


Implement proper Caching

skorfmann opened this issue · comments

I was playing around with Drone.io over the weekend and I'm really impressed.

However, there is one big issue for our Rails project: installing all the gems (~250, about 10 of which are git checkouts) takes about 10 minutes for each build, as I've found no way to point Bundler at a cacheable directory.

I've seen issues #43 and #143, but as far as I understand the solution proposed in #43, the cache would only be invalidated when the actual setup commands change. In my case, the commands would need to re-run when the content of the project's Gemfile.lock changes.

Furthermore, it would be neat to be able to share a cache directory between different projects. In our current Jenkins setup, we're sharing a global Bundler directory, which speeds up new project builds enormously.

Here is an excerpt of our build file:

BUNDLE_PATH="$JENKINS_HOME/shared/bundle"
bundle install --path="$BUNDLE_PATH"

I like the proposal in #143.

We can cache folders using volumes. Maybe something like this in the yaml:

cache:
  - /home/user/bundler
  - /home/user/.m2

On the host machine, the directories would need to follow some sort of naming convention that includes the repository and branch, for example /var/cache/drone/github.com/repo/name/branch. Or maybe /tmp, so that a reboot can flush the cache?

When we build a new branch, and no cache exists, we could copy the cache from master.
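A rough host-side sketch of that convention (the paths and variable names here are made up for illustration):

# one cache directory per repository and branch on the host
CACHE_ROOT=/var/cache/drone
BRANCH_CACHE="$CACHE_ROOT/github.com/$REPO/$BRANCH"

# when a new branch has no cache yet, seed it from master
if [ ! -d "$BRANCH_CACHE" ] && [ -d "$CACHE_ROOT/github.com/$REPO/master" ]; then
  cp -a "$CACHE_ROOT/github.com/$REPO/master" "$BRANCH_CACHE"
fi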

I have this working in a local branch; however, there are a few minor gotchas I found:

Permissions
Docker automatically mounts volumes with root permissions. This is an issue because many of the pre-built images we provide use the default ubuntu user. I should probably just use root in our default images.

Paths
Docker requires absolute paths for mounting volumes. This works well:

cache:
  - /usr/local/go/pkg

I've added code to resolve relative paths against the directory where the code is cloned in the container, so this also works:

cache:
  - .npm

However, the following examples will FAIL, since $HOME and ~ are not expanded when the volume is mounted:

cache:
  - $HOME/bundle
  - ~/bundle

I've added docs to the README:
https://github.com/drone/drone#caching

This is still alpha quality given the above issues. The biggest issue is permission-related, when the container USER is not root. The workaround is to chown the directory in the container as part of your build script. That being said, feel free to play around with it and add your feedback to this thread.

Great work @bradrydzewski, I'll give it a whirl today and test it.

Thanks @bradrydzewski, I'll give it a try later today.

I added caching to the .drone.yml:

cache:
  - .m2/repository

and ran into the following error in the drone console log when building:

$ git clone --depth=50 --recursive --branch=story-66328504-drone-docker git@github.com:user/repo.git /var/cache/drone/src/github.com/user/repo
fatal: destination path '/var/cache/drone/src/github.com/user/repo' already exists and is not an empty directory.

I'm using my own java docker image since I needed Maven 3.1.1.

@ralfschimmel I ran into the same problem, but everything worked fine after I moved the cached directory out of my repo into /tmp (which makes sense, because the cache is mounted before the repo is cloned, and the clone needs to happen into an empty directory).

Interesting... @ralfschimmel, thanks for testing, and @mnutt, thanks for troubleshooting and finding the root cause. Does anyone know if there is a command-line flag we can use to force a clone into a non-empty folder?

Indeed, using absolute paths works just fine!

@bradrydzewski Two options I can think of off the top of my head:

  1. clone into another folder and mv the repo into the existing folder
  2. git init in the existing folder, add the repo as a remote, and check out the required branch (sketched below)
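A sketch of option 2, reusing the clone parameters from the error above:

# initialize in place instead of cloning, so the target directory may
# already contain the mounted cache
git init /var/cache/drone/src/github.com/user/repo
cd /var/cache/drone/src/github.com/user/repo
git remote add origin git@github.com:user/repo.git
git fetch --depth=50 origin story-66328504-drone-docker
git checkout -qf FETCH_HEAD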

I just tried with:

cache:
  - /tmp/something

and ls -l /tmp shows the something directory is owned by root with drwxr-xr-x permissions, so I cannot chown it (as ubuntu), nor can I create anything inside it as the ubuntu user.

According to:

https://github.com/drone/drone/blob/master/pkg/build/build.go#L360

it should be 0777, so I'm not sure what's going on. Is the .deb package available at http://downloads.drone.io/latest/drone.deb up to date with what's in master, or do I need to build the .deb myself?

@Propheris are you using the ruby1.9.3 image? For me, /tmp is 0777.

The drone.deb package is automatically built on every commit to master (that drone can successfully build).

I also found that the ubuntu user can sudo without a password, so the following works (might help someone):

image: ruby2.0.0
script:
  - sudo chown ubuntu:ubuntu /tmp/bundler
  - bundle install --path=/tmp/bundler
  - RAILS_ENV=test bundle exec rake db:create
  - RAILS_ENV=test bundle exec rake db:schema:load
  - RAILS_ENV=test bundle exec rake db:seed
  - bundle exec rspec
cache:
  - /tmp/bundler
services:
  - mysql

@Propheris this is because the default Drone images run as USER ubuntu; however, Docker mounts the volumes as root. This is the primary reason I haven't marked this enhancement as complete yet.

The solution is pretty simple, although a bit of a pain. I need to re-create and re-test all the Drone images (at github.com/drone/images) to run as root instead of ubuntu.

👍 The cache feature is awesome! chown works fine as a workaround but of course it would be great to have it working out of the box. Here's what I'm using to cache the npm packages:

image: node0.10
script:
  # workaround for cache, see https://github.com/drone/drone/issues/147
  - mkdir -p /tmp/npm
  - sudo chown -R ubuntu:ubuntu /tmp/npm
  - npm config set cache /tmp/npm
  # actual script:
  - npm test
cache:
  - /tmp/npm

I think we're going to need to alter our caching approach, and I wanted to describe my thoughts here.

So why change the existing approach? There are a few issues, but I'm going to focus on the most critical: our current approach requires us to have physical access to the machine that is running the build (to create the cache folders, remove them, etc.).

What if we want to spread builds across multiple servers? We can do almost everything via Docker's remote API, over TCP, with the exception of creating and managing our cache directories. This means we have two options: 1) create an agent that is installed on each machine to execute filesystem commands, or 2) come up with a caching solution that works with the Docker remote API.

I'd like to explore the latter option.

I'm going to experiment with snapshotting container images. We can split the .drone.yml into sections, defining setup and script. We could optionally snapshot the container after the setup commands are run.

The .drone.yml might look something like this:

image: go1.1
cache: enabled
setup:
  - apt-get install sqlite3 libsqlite3-dev
  - go get
script:
  - go build
  - go test

As mentioned, we could snapshot the container after the setup commands are executed. This would have advantages, including caching things like apt-get installations, which the current implementation doesn't support.

I'm hoping to get some feedback or ideas for alternate approaches. I'll create an experimental branch for this and comment on the thread when it is ready for review.

@bradrydzewski The approach you describe would be really awesome because it would unlock the real advantages of Docker for a CI environment.

I just wonder how to make the snapshotting work with things like npm install, which typically needs to be executed for node.js projects. There you would have to do something like ADD package.json in a Dockerfile in order to detect changes to package.json and re-trigger the build of the corresponding snapshot. How would you handle cases like this?

I still don't see, however, how the setup approach could entirely replace the current way of caching in all cases. I have ccache in mind, where what you really need is just some shared storage location where files can be stored and updated and will survive until the next build... Of course, parallel builds might become problematic here as well.

I think it would work well with npm install. The configuration would look like this:

image: node0.10
setup:
  - npm install
script:
  - npm test

We would split the build into two parts. First we would:

  1. start with base image node0.10
  2. inject a shell script with git checkout + the setup commands in the yaml
  3. start the container and run the script
  4. if successful, snapshot the container (let's pretend it's assigned hash 3da541559918)

And then we would:

  1. start with the above snapshotted image, 3da541559918
  2. inject a shell script generated using the script command in the yaml
  3. start the container and run the script

The next time we run the build, we use 3da541559918 as the base image. We still run the git checkout and setup commands, but this time npm install would already have the files installed locally, in the cache.

Bonus: since Docker uses unique hashes and overlay filesystems, we won't have to worry about two builds altering the same cache.

I think this could work; of course, it is just an idea in my head right now. It will also be kind of a pain to implement, but we do have very good mock testing at that layer...
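To make the flow concrete, here is a rough equivalent using the Docker CLI (the real implementation would go through the remote API, and the names here are made up):

# phase 1: run the injected clone + setup script in a fresh container
docker run --name setup_run node0.10 /bin/sh /tmp/setup.sh

# on success, snapshot the container as an image for this repository
docker commit setup_run drone-cache/my-repo

# phase 2: run the script commands on top of the snapshot
docker run --rm drone-cache/my-repo /bin/sh /tmp/script.sh

# the next build starts from drone-cache/my-repo instead of node0.10,
# so npm install finds its files already in place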

Just to confirm I understand you correctly: you always want to snapshot after successful setup runs and use these snapshots the next time a build starts?

Concerning the node.js example: so in the second build, when you use 3da541559918 as the base image, you would still:

  1. clone
  2. checkout
  3. run setup
  4. run script

So the second build will make use of the npm cache in ~/.npm inside the docker image, but will not use any node_modules directory installed by previous builds, correct?

My initial idea was to be able to re-use the state of the image after a successful npm install in case the dependencies don't change (to get even faster builds); however, thinking about it again, I think that would be too risky anyway, since we wouldn't be building in a clean environment.

Thinking of the ccache case again: I would then put the compilation of a C++ project into the setup part, and the test and packaging commands into the script part, right?

OK, so far this was just to make sure I understand your idea, and for the cases I can think of the solution sounds reasonable ;).


One more thing I would like to understand better is about parallel and distributed builds.

Assume we have two parallel builds that start off the same base image. They will produce two different snapshots after the setup phase. Which one will be used for the next build? Will docker take care of this?

My concern with snapshotting is how we would revert back to the base image if we screwed something up.

Say we do something in setup that breaks expectations but doesn't generate an error (e.g. deleting a service or changing a password to an unknown value). In this case, we'd want to roll back to the base image. How would that happen?

I really like the idea of snapshotting. It's elegant. But I also want protection from shooting myself in the foot.

Fair point. I think we could provide various mechanisms to flush the cache. These are just some ideas that I can think of off the top of my head:

  1. Flush the cache using our command line utility. You could run drone flush github.com/foo/bar and it would remove any snapshotted images (sketched below).
  2. Place some keyword in your commit message, like DRONE:FLUSH, which would instruct the system to flush the cache prior to the build executing.
  3. Specify a cache expiration in the .drone.yml to force the cache to get flushed every so often.
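At the Docker level, option 1 could be as simple as untagging the snapshot (the tag naming here is hypothetical):

# removing the snapshotted image forces the next build to start
# from the clean base image again
docker rmi drone-cache/my-repo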

Another idea I just want to throw in: steps in the setup phase could define make-style dependencies on files, which would result in Dockerfile ADD statements.

If I understand the Docker ADD mechanism correctly, this could work for cases like node.js, where npm install should be re-run whenever package.json has changed.

Of course that won't help for setup steps that do not define a file as a dependency...
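For illustration, if the setup phase were compiled down to a generated Dockerfile, it might look like this (all names and paths here are invented):

# build a context containing only the declared file dependency
mkdir -p /tmp/setup-ctx
cp package.json /tmp/setup-ctx/

# ADD package.json before npm install, so Docker's build cache re-runs
# the install step only when package.json changes
cat > /tmp/setup-ctx/Dockerfile <<'EOF'
FROM node:0.10
ADD package.json /build/package.json
WORKDIR /build
RUN npm install
EOF

docker build -t drone-setup /tmp/setup-ctx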

We could also use the SHA value of the .drone.yml file. If the .drone.yml file changes, we'll know to invalidate the cache.
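For example, assuming snapshots were tagged by that checksum (the tag scheme is invented):

# any edit to .drone.yml changes the checksum and misses the old tag
YML_SHA=$(sha1sum .drone.yml | awk '{print $1}')
if ! docker inspect "drone-cache:$YML_SHA" > /dev/null 2>&1; then
  echo "cache miss: run setup and snapshot as drone-cache:$YML_SHA"
fi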

Having thought it over for a while, my gut feeling about the recovery suggestions is this:

  1. Having an explicit drone command is a great idea, and the right sort of thing to handle at that level.
  2. Having the cache clear when a .drone.yml file changes is also the right thing to do (and pretty much matches the theory that Docker employs with Dockerfile changes)

I have also been mulling over one other possibility, which would be to add a feature that runs a Dockerfile as a setup step. Something like dockerfile: path/to/Dockerfile. My thought is that this might provide the caching layer (by utilizing Docker's built-in cache feature), and would also make it easier for people to extend base images. (I, for example, have an internally used image called docker/gorilla that extends the go1.2 image to add gpm and fabric.)

What I don't like about my own proposal is that there is no particularly straightforward way to control the Docker cache.
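To be concrete, the cache control that docker build offers is all-or-nothing (the image name is just an example):

# docker build reuses cached layers for unchanged instructions...
docker build -t docker/gorilla path/to/build/context
# ...but the only knob for invalidation is disabling the cache entirely
docker build --no-cache -t docker/gorilla path/to/build/context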

I'm using the cache to store bundler files between builds, but it would be nice to have the cache persist across branches. That way it would work for building pull requests.

Unless I misunderstand Brad's last proposal, the effective sequence of steps executed would be:

  • git checkout commit1
  • setup
  • git checkout commit2
  • setup
    ...
  • git checkout commitN
  • setup
  • script

(Note that commitN might actually be earlier than commit{N-1}, in case commitN is being rebuilt.)

This can cause subtle bugs: assume you've accidentally removed a dependency from the project and setup doesn't install it anymore. In that case you wouldn't notice the problem until you've flushed the cache. @ctavan's proposal is free of this problem; there, the effective sequence would be:

  • setup (interspersed with adding single files from the repository)
  • git checkout commitN
  • script

This would provide a truly stateless build and, if we used docker's build mechanism for the setup phase, would give us correct cache invalidation for free. A downside of this proposal is that we'd need to check out the files required in the setup phase somewhere outside of the container. If this problem can be overcome without large complications, I'd be much in favour of @ctavan's proposal.

I think the discussion got sidetracked.

For rubygems caching, the current solution of having

cache:
  - /tmp/bundler
script:
  - bundle install --quiet --path /tmp/bundler

does the job pretty well.

The only issue I found is that this cache doesn't persist between builds for different branches, so it is almost never used when running builds on pull requests.

@grk This approach is hard to use when the docker host and the host that drone runs on are different, since we can't rely on the contents of any directories on the docker host. Please correct me if I'm wrong, but doesn't that approach also allow the subtle bugs I mentioned above to appear?

Hi. Is there any news regarding this issue?

I would really like my builds to run faster; all the npm install and bundle install steps slow them down considerably.
Could drone have a local HTTP cache/mirror or something similar for recurring requests to known services?

I am doing an ls -la before bundle install in the script phase, and the cache directory is always empty.

It looks like the cache is currently branch-specific, which makes it awkward for feature-branch-based PRs: the cache gets duplicated for every feature branch without the advantage of faster builds.

Per-branch caching was removed in #912 (thanks @nathwill).

Note that #902 will expose much more of the underlying Docker implementation and will allow mounting volumes from the yaml file. So #902 will end up replacing the cache section with a volumes section.

We're also working on modularity and plugins. The git clone functionality is being moved to a plugin:
https://github.com/drone-plugins/drone-git

This will give us much more flexibility and should allow us to perform a git checkout instead of a git clone if a .git directory already exists. This would allow us to start caching the repository root. Stay tuned.
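A sketch of the checkout-instead-of-clone logic such a plugin could use (the variable names are invented):

# reuse the cached working copy when present, otherwise do a fresh clone
if [ -d "$REPO_DIR/.git" ]; then
  git -C "$REPO_DIR" fetch origin "$BRANCH"
  git -C "$REPO_DIR" checkout -qf FETCH_HEAD
else
  git clone --depth=50 --branch="$BRANCH" "$REMOTE" "$REPO_DIR"
fi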

Consider this: you have cached the installed dependencies and run a build again, but the dependencies have changed since the last build.

So we have to run the commands from the setup section again to update the dependencies. That would still be faster than installing everything from scratch.

@nathwill #912 does not seem to change anything for me. I'm a little confused now -- is this still a work in progress, or should the simple case of sharing one folder (for rubygems) basically work?

As remarked above, ls -la on the cache folder shows it empty on each and every build.

@glaszig #912 changes the cached folder from per-repo-branch to per-repo, but it is still repo-specific, not generic to the build box (maybe you're testing different repos?). You can also poke around under /tmp/drone on the build host and find the cached directory for direct inspection.

In any case, it's definitely working on our system; maybe you can share your .drone.yml and drone version?

#912 changes the cached folder from per-repo-branch to per-repo, but it is still repo-specific, not generic to the build box

Alright, that's what I read from the code; what I expected.

maybe you're testing different repos?

No, always the same one, only different branches. So I should see your changes having an effect.

You can also poke around under /tmp/drone on the build host and find the cached directory for direct inspection.

Yeah, there's a folder structure there, and also my cache folder, but it is empty.

maybe you can share your .drone.yml and drone version?

Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-24-generic x86_64)

# docker -v
Docker version 1.4.0, build 4595d4f

# drone -v
drone version 0.3.0-alpha

.drone.yml:

image: drone/ruby
env:
  - RAILS_ENV=test
cache:
  - /tmp/bundler
script:
  - rbenv versions
  - pwd
  - ls -la /tmp/bundler
  - cp config/database.drone.yml config/database.yml
  - sudo chown -R ubuntu:ubuntu /tmp/bundler
  - sudo chmod -R ug+rw /tmp/bundler
  - bundle install --path /tmp/bundler
  - bundle exec rake db:create
  - bundle exec rake db:schema:load > /dev/null
  - bundle exec rake db:migrate
  - bundle exec rake db:seed
  - bundle exec rspec
services:
  - mysql

Seems right to me, but I noticed that the drone version didn't update when my patch went in... the version I have installed is:

[root@drone01.prod ~]$ rpm -q drone
drone-0.3.0_alpha-1427045373.x86_64

Outside of that, I've no idea why it might not be caching for you.

Same version. Somehow I can't get the cache working; giving up for now.

# apt-cache showpkg drone
Package: drone
Versions:
0.3.0-alpha-1427045373

Follow-up: during a build today I ran docker inspect on the container.

# docker inspect 5c6bc2baaa1b
[{
...
    "HostConfig": {
        "Binds": [
            "/tmp/drone/github.com/glaszig/myproject/tmp/bundler:/tmp/bundler"
        ],
...
    "Volumes": {
        "/tmp/bundler": "/var/lib/docker/vfs/dir/03b72641890f9e5fb837db1adc74d2519fc2c06203aa99f4b5d7d132edeb6b4b"
    },
    "VolumesRW": {
        "/tmp/bundler": true
    }
}
]

What I see there is a presumably correct HostConfig.Binds entry.
What looks suspicious to me (as someone not knowing enough about Docker internals) is the Volumes entry pointing to a folder in /var/lib/docker/vfs/dir. I then took a look into that folder on the host machine and found 136 such folders, presumably one per drone build, which ended up containing the gems installed during the builds.

So drone/docker is writing the content of my cache folder to a new directory on every build. That's why the cache is always empty.

Any idea what is wrong here?

I think VOLUMES are not used in docker build only in docker run.

Update: @donny-dont has been working on proper caching, including the ability to cache portions of the git directory (prior to his changes, git complains if you clone into a non-empty directory). I think this will take time to perfect, but it will be a really good start.

Hoping to get through the pull request process today and then this should be closed. Will write some docs around it too.

drone-plugins/drone-git#1 is needed for caching, as git clone cannot happen in a non-empty directory. It changes the behavior to use git init, which allows the caching volumes to be present.

Alrighty drone-plugins/drone-git#1 is merged just need to write docs and this can close.