Unidata / thredds-docker

Dockerized THREDDS

Home Page: https://hub.docker.com/r/unidata/thredds-docker


Failed to create work directory [/usr/local/tomcat/work/Catalina/localhost/thredds]

gajowi opened this issue · comments

I'm having trouble getting permissions right when layering over the existing thredds-docker containers. I'm hitting similar problems to #158 and taking a similar approach to @rsignell-usgs. To investigate, I've been delving into the log files, and in my environment I get warning and severe log messages for 4.6.10, in particular the one in the subject of this issue.

I've been able to overcome the warning by mapping a volume (to /usr/local/tomcat/work/Catalina) that already has localhost/thredds created with tomcat write permission. I've had similar problems with /usr/local/tomcat/cache/Catalina/localhost/thredds.
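Roughly, the workaround looks like this (a sketch only; the host paths are placeholders, and the mapped directories must already contain localhost/thredds writable by the container's tomcat user):

# Sketch of the volume-mapping workaround; host paths are placeholders.
docker run -d --name thredds -p 8080:8080 \
  -v /host/thredds/work:/usr/local/tomcat/work/Catalina \
  -v /host/thredds/cache:/usr/local/tomcat/cache/Catalina \
  unidata/thredds-docker:4.6.10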

There are a few related issues, including getting WMS to work.

A desirable general action to avoid these problems would be to enhance the CI test suite. To that end I've forked and branched (off master) and drafted some changes at gajowi#1. It took me a few goes to get the tests working (and failing properly!). I think you should be able to view the latest failure at https://travis-ci.org/gajowi/thredds-docker/jobs/262875836 (view the raw log to see the test success/failure at the end).

Would you like a PR, or would you prefer to just grab/rewrite the extra tests? The tests will need some rewriting, as there are known warnings that need to be ignored. The one-line test:
docker logs thredds 2>&1 >/dev/null | grep -i warn && exit 1 || echo no warn docker log stderr entries
should probably be moved to a script that saves intermediate files, both to make the logic simpler to understand and to allow filtering of ignorable warnings (rough sketch below).
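Something like this untested sketch, where allowed-warnings.txt is a hypothetical file listing expected/ignorable warning lines:

#!/bin/sh
# Untested sketch: capture the container's stderr, drop expected warnings,
# and fail if any other warn entries remain.
docker logs thredds 2>stderr.log >/dev/null
grep -i warn stderr.log > warnings.log || true
# allowed-warnings.txt holds one expected/ignorable warning (sub)string per line.
grep -v -F -f allowed-warnings.txt warnings.log > unexpected.log || true
if [ -s unexpected.log ]; then
  echo "unexpected warn entries in docker log stderr:"
  cat unexpected.log
  exit 1
fi
echo "no unexpected warn entries in docker log stderr"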

BTW, in my own tests building on 4.6.10 I see problems in serverStartup.log:
WARN serverStartup: Nc4Iosp: NetCDF-4 C library not present (jna_path='/usr/local/lib/', libname='netcdf').
java.lang.UnsatisfiedLinkError: Failed to create temporary file for /com/sun/jna/linux-x86-64/libjnidispatch.so library: Permission denied
    at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:866) ~[jna-4.2.2.jar:4.2.2 (b0)]
However, I've only applied the new Travis tests to my fork of master (which does not have this problem in that environment), so I can't yet confirm whether the problem applies to the 4.6.10 release (as it would be tested in a Travis CI context) or only to my layering and/or my Docker environment and its storage-layer setup. I may open a new issue for that.

Following up, I just ran 4.6.10 from Docker Hub with no options:
docker run --rm unidata/thredds-docker:4.6.10
I get quite a few errors on the console (via docker logs), like:
java.io.FileNotFoundException: /usr/local/tomcat/logs/catalina.2017-08-10.log (Permission denied)
java.io.FileNotFoundException: /usr/local/tomcat/logs/localhost.2017-08-10.log (Permission denied)
java.io.FileNotFoundException: /usr/local/tomcat/logs/manager.2017-08-10.log (Permission denied)
java.io.FileNotFoundException: /usr/local/tomcat/logs/host-manager.2017-08-10.log (Permission denied)
10-Aug-2017 01:37:58.099 SEVERE [Catalina-startStop-1] org.apache.catalina.startup.HostConfig.beforeStart Unable to create directory for deployment: [/usr/local/tomcat/conf/Catalina/localhost]
10-Aug-2017 01:37:58.102 SEVERE [Catalina-startStop-1] org.apache.catalina.valves.AccessLogValve.open Failed to open access log file [/usr/local/tomcat/logs/localhost_access_log.2017-08-10.txt]
java.io.FileNotFoundException: /usr/local/tomcat/logs/localhost_access_log.2017-08-10.txt (Permission denied)

It would be highly desirable for the published containers to run without such issues.

Not all of these would be identified by a simplistic grep filter for 'warn'(ing) or 'severe' (matching them in CAPS may be better); I realized I should also look for 'ERROR'. Of course any of these terms could appear in dataset names or metadata, so the tests aren't very robust, but they may be good enough for CI. Some of the errors don't get a log-level label at all, but one could search for a specific variant of 'Exception', perhaps 'java.io.FileNotFoundException'.

Following on, I updated my PR to report which log files are checked, and either I was looking in the wrong place for the Tomcat logs or there are none: https://travis-ci.org/gajowi/thredds-docker/builds/262910510

This seems like a useful check for errors:
grep -r ' ERROR \| WARN \| WARNING \| SEVERE \|^java.*Except'

It could be followed up by filtering out acceptable/expected warnings/errors, perhaps using a pattern like the one in https://unix.stackexchange.com/questions/299462/how-to-filter-out-lines-of-a-command-output-that-occur-in-a-text-file, with:
grep -v -F -f allowed.txt
(no -x, so that a timestamp prefix on the log line doesn't prevent a match against the allowed pattern).
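Combined, and untested, with the log directories just guesses for this image:

# Untested sketch: the log directory paths are guesses for this image.
grep -r ' ERROR \| WARN \| WARNING \| SEVERE \|^java.*Except' \
  /usr/local/tomcat/logs /usr/local/tomcat/content/thredds/logs > found.log || true
grep -v -F -f allowed.txt found.log && exit 1
echo "no unexpected ERROR/WARN/SEVERE/Exception entries"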

Untested... I'll wait for feedback before looking at this any further.

Just a quick comment about the JNA problem you ran into: this was resolved with #164 and #165.

Ugh! Thanks for finding this. I can confirm what you are seeing with

docker run --rm unidata/thredds-docker:4.6.10

I am still getting to the bottom of this, but one problem is that in the Docker Hub automated-builds feature I set a "linked repository" to unidata/tomcat-docker. This was bad and I wish I had not done it; it has unintended consequences. It means that whenever unidata/tomcat-docker is updated (which I did recently when going from Tomcat 8.0 to 8.5), the build cascades through to everything that depends on it, including what I thought were frozen-in-time versions of thredds-docker (e.g., unidata/thredds-docker:4.6.10).

This still leaves the problem you are seeing unanswered, but the point I am making is that at one time this container worked and then it didn't, and that is because of the linked-repository feature, which I have now disabled.

Complicating matters, I cannot simply go back and redo the Docker build, because it no longer works: the HDF library has changed URLs.

I am in a bit of a mess here, as you can see, and I still don't know what those permission errors are all about. They make absolutely no sense to me given everything I've learned about Unix in the last quarter century.

Stay tuned.

Thanks for the note on JNA, @julienchastang. Will you make another release of 4.6.10 on Docker Hub? I guess that would force the issue of establishing a thredds-docker versioning scheme. I have no great ideas for such a scheme, but I see major projects with prolific (if not comprehensive) versioning/tagging schemes; e.g., https://hub.docker.com/_/tomcat/ has many tags of the form v1-v2-v3 (plus multiple abbreviated forms and 'latest' tags). I'm not saying you should follow that model closely (with subdirectories and many Dockerfiles in the repo and links to particular git hashes), but perhaps the v1-v2-... pattern is useful, with:
v1 = thredds release version
v2 = thredds-docker sequence version (tricky - I'll write more below)
v3 = base image signifier (optional - but you may consider varying tomcat/java at some stage)

v2 seems to be the trickiest component. Semantic versioning does not seem appropriate, as the published Docker image is likely to have limited reproducibility (it will be hard to pin down whether all historic resources remain available to rebuild a container, but at least the recipe is clear). A sequence version might make more sense (possibly as simple as ((a1, a2, ..., b1, ...,) rc1, ...,) 1, 2, 3, ...), or just a descriptive tag (feature branch name). The main point is to be able to release a new container with an unchanged thredds war but a different container build (such as a different HDF/netCDF, Java, or base image). Unfortunately v2 will probably mean different things for different v1 values, and it might need to increment quite high with only some versions being released, or increment separately for each thredds release version.
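Purely for illustration (these tags are hypothetical, not proposals for actual releases), the pattern might produce something like:

unidata/thredds-docker:4.6.10              (thredds release only, as now)
unidata/thredds-docker:4.6.10-1            (thredds release plus a container build sequence)
unidata/thredds-docker:4.6.10-1-tomcat8.5  (with an optional base-image signifier)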

As I stated before, I don't think I have a solid suggestion for you - but hopefully these thoughts are useful.

So, will you make another release of 4.6.10? Otherwise I need to think about how to build the containers myself instead of pulling from Docker Hub (and layering on top).

I believe this is related to docker-library/tomcat#35 and also moby/moby#783.

I can confirm switching from aufs to devicemapper addresses this issue.
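For anyone else hitting this, switching the storage driver is roughly as follows (a sketch assuming a systemd host; it would overwrite any existing /etc/docker/daemon.json, and existing images/containers are not visible under the new driver):

# Sketch: set the storage driver in daemon.json and restart the daemon.
sudo mkdir -p /etc/docker
echo '{ "storage-driver": "devicemapper" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker info | grep -i 'storage driver'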

Good find!
I think overlay2 may be a better choice on Ubuntu and Debian (which we use), and I'll try switching to it.
I'm mildly worried about bloating the image size if the chown/chmod causes content to be copied into upper layers of the storage driver. It may be prudent to chown/chmod less aggressively: leave root owning most content and only change ownership of the directories Tomcat needs to write to (rough Dockerfile sketch below). Some permissions may also need to be set so the tomcat user can read (or traverse) all the directories it needs. Potentially the tomcat user could be added to the group that owns the content before the current chown (but I don't know what group that is, and it may be less than optimal security-wise).
In any case, I'm thinking that in my production setup, mapping basically all the directories THREDDS writes to into the container from the host filesystem will suit me. In our setup, having a uid/gid on the host system to match the tomcat:tomcat user in the container is not a big problem.
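Something along these lines in a derived image, as an untested sketch only (the tomcat user/group name and the exact set of writable directories are assumptions, not checked against the image):

# Untested sketch of a less aggressive chown in a derived image.
FROM unidata/thredds-docker:4.6.10
USER root
# Only chown the directories Tomcat actually writes to; leave everything else
# owned by root so the storage driver does not copy whole layers upward.
RUN mkdir -p /usr/local/tomcat/conf/Catalina && \
    chown -R tomcat:tomcat \
      /usr/local/tomcat/logs \
      /usr/local/tomcat/temp \
      /usr/local/tomcat/work \
      /usr/local/tomcat/cache \
      /usr/local/tomcat/conf/Catalina
# Assumes the image should run as the tomcat user at runtime.
USER tomcat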

Overlay2 seems to be affected by these sorts of problems too: moby/moby#20240

Sorry, it looks like I'm not going to test overlay2 or devicemapper any time soon. The workaround I have with extra volume mappings is proving sufficient for me.

I am going to close this issue since it pertains to problems in Docker itself, not this repository.