influxdata / influxdata-docker

Official docker images for the influxdata stack

Permission problems due to user-ID mapping discrepancy

holiman opened this issue · comments

I see that there are a number of previous issues.

Maybe one or all of them are the same as this one? Anyway, the platform I'm using is Debian 5.10.149-1 (2022-10-17) x86_64 GNU/Linux, running the container telegraf:1.24.

Here's my config file, owned by telegraf:

root@bench-cl:/home/admin# ls -la /data/etc_telegraf
total 20
drwx------ 2 telegraf telegraf  4096 Oct 26 17:22 .
drwx------ 4 telegraf root      4096 Oct 26 17:23 ..
-rwx------ 1 telegraf telegraf 11076 Oct 26 17:22 telegraf.conf

I start the Docker container like the Dockerfile does, but entering via bash and then calling the setpriv:

root@bench-cl:/home/admin# docker run -it -v /data/etc_telegraf:/etc/telegraf --entrypoint /bin/bash telegraf 
root@01a71b1401d4:/# exec setpriv --reuid telegraf /bin/bash

Then try to read the config:

telegraf@01a71b1401d4:/$ ls -ls /etc/telegraf/telegraf.conf
ls: cannot access '/etc/telegraf/telegraf.conf': Permission denied

Permission denied. Let's check the config dir:

telegraf@01a71b1401d4:/$ ls -ls /etc/ | grep telegraf
 4 drwx------ 2 1002   1002  4096 Oct 26 17:22 telegraf

It's owned by 1002:1002. Who am I then?

telegraf@01a71b1401d4:/$ whoami
telegraf
telegraf@01a71b1401d4:/$ id
uid=999(telegraf) gid=0(root) groups=0(root)

Apparently, the name telegraf maps to 999 inside the container. On the host-system, telegraf maps to 1002.

root@bench-cl:/home/admin# cat /etc/passwd | grep telegraf
telegraf:x:1002:1002::/home/telegraf:/bin/sh
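
A minimal sketch of how to check this kind of mismatch, assuming standard passwd-format files. The helper name uid_of is made up for illustration; it just prints the numeric UID that a user name maps to in a given passwd file:

```shell
# Hypothetical helper (not part of the image or of Docker):
# print the numeric UID that NAME maps to in a passwd-format FILE.
# Fields in passwd are colon-separated: name:password:UID:GID:...
uid_of() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "$2"
}
```

On the host, `uid_of telegraf /etc/passwd` reports the host's UID for telegraf. For the container side, one option (an assumption on my part, untested here) is to dump the image's passwd file with `docker run --rm --entrypoint cat telegraf /etc/passwd > /tmp/container_passwd` and run the helper against that file. If the two numbers differ, files owned by the host's telegraf user will show up under a foreign numeric owner inside the container.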

I have no idea how to solve this. I mean, I can solve this for this particular machine, in this particular setting. What I'm wondering is whether there is a general solution to this problem. This problem is, as far as I can tell, the strongest reason why it's better not to switch users in the Dockerfile, but instead to run as whatever user the operator is using.

Perhaps I'm missing something, or have done something wrong in my setup?

but entering via bash and then calling the setpriv

Can I ask why you are doing it this way and not calling telegraf directly?

❯ ls -l config/
total 4
-rw-r--r-- 1 powersj powersj 33 Oct 27 07:55 telegraf.conf
❯ cat config/telegraf.conf 
[[inputs.mem]]
[[outputs.file]]
❯ docker run -it -v /home/powersj/config:/etc/telegraf telegraf
2022-10-27T13:56:13Z I! Using config file: /etc/telegraf/telegraf.conf
2022-10-27T13:56:13Z I! Starting Telegraf 1.24.2
2022-10-27T13:56:13Z I! Available plugins: 222 inputs, 9 aggregators, 26 processors, 20 parsers, 57 outputs
2022-10-27T13:56:13Z I! Loaded inputs: mem
2022-10-27T13:56:13Z I! Loaded aggregators: 
2022-10-27T13:56:13Z I! Loaded processors: 
2022-10-27T13:56:13Z I! Loaded outputs: file
2022-10-27T13:56:13Z I! Tags enabled: host=b40d85336bd5
2022-10-27T13:56:13Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"b40d85336bd5", Flush Interval:10s
mem,host=b40d85336bd5 available=29317738496i,swap_cached=0i,total=33566760960i,used=3642249216i,sunreclaim=300638208i,cached=9188962304i,dirty=2101248i,free=20732760064i,high_total=0i,huge_pages_free=0i,buffered=2789376i,high_free=0i,low_free=0i,mapped=894029824i,shared=131170304i,available_percent=87.34157737452425,huge_pages_total=0i,inactive=7571947520i,swap_free=4294963200i,vmalloc_total=35184372087808i,write_back=0i,huge_page_size=2097152i,page_tables=38690816i,slab=643235840i,sreclaimable=342597632i,vmalloc_used=133713920i,used_percent=10.850761622011444,committed_as=12482924544i,vmalloc_chunk=0i,active=4255858688i,commit_limit=21078343680i,low_total=0i,swap_total=4294963200i,write_back_tmp=0i 1666878980000000000
^C2022-10-27T13:56:27Z I! [agent] Hang on, flushing any cached metrics before shutdown
2022-10-27T13:56:27Z I! [agent] Stopping running outputs

As far as the previous issues:

#646

This does possibly look similar, but is using podman. Are you?

#645

This one mentions that they needed to bind on a different port and the issues went away. I don't think that is applicable in this case.

#644

This one actually looks like #631 which is related to InfluxDB and specifying paths. I don't think that is applicable in this case.

Can I ask why you are doing it this way and not calling telegraf directly?

That's a very good question -- apparently I missed writing a proper intro, jumping straight into the investigation-part. So my initial problem was this:

# docker run -it -v /data/etc_telegraf:/etc/telegraf telegraf
2022-10-28T08:04:09Z E! [telegraf] Error running agent: No config file specified, and could not find one in $TELEGRAF_CONFIG_PATH, /root/.telegraf/telegraf.conf, or /etc/telegraf/telegraf.conf

Oddly, it could be bypassed with this simple trick (calling the telegraf binary directly instead of using the entrypoint script):

# docker run -it -v /data/etc_telegraf:/etc/telegraf --entrypoint telegraf telegraf 
2022-10-28T08:04:45Z I! Using config file: /etc/telegraf/telegraf.conf

And that's what led me to investigate further, the text in my description above.

As for your case

-rw-r--r-- 1 powersj powersj 33 Oct 27 07:55 telegraf.conf

You have the file world-readable, and I suppose you also have the directory world-readable. That is one possible way to solve/avoid this issue.

Some ways I can think of:

  • Make it world-readable
  • Skip the script and use a custom entrypoint to invoke the binary
  • Make it owned by UID 999 on the host; in my case, on a fresh Debian, that UID belongs to systemd-timesync
  • As per the instructions here, https://www.influxdata.com/blog/docker-run-telegraf-as-non-root/, use --user to assign a group with read privileges
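
A rough sketch of the first and last options above, under a few assumptions: the function names are made up, the config file is called telegraf.conf, and the second function only prints a docker command rather than running it:

```shell
# Option 1 (world-readable): let "other" users traverse the config
# directory and read the config file. $1 is the host config directory.
make_world_readable() {
    chmod o+rx "$1"                 # directory: list + traverse
    chmod o+r "$1"/telegraf.conf    # file: read
}

# Option 4 (--user): print, but do not run, a docker command that keeps
# the telegraf user and sets its group to the numeric GID that owns the
# config directory. For this to help, that group also needs g+rx on the
# directory and g+r on the file.
print_run_with_dir_group() {
    # ls -n prints numeric owner IDs; the group ID is the 4th field
    gid=$(ls -ldn "$1" | awk '{ print $4 }')
    printf 'docker run -it -v %s:/etc/telegraf --user telegraf:%s telegraf\n' "$1" "$gid"
}
```

For example, `print_run_with_dir_group /data/etc_telegraf` prints a command ending in `--user telegraf:<gid> telegraf`, which can be reviewed before running it.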

None of these seems great, but setting group membership via --user seems the least hacky. It's a bit ironic though, because I also use the docker plugin, and the guide says:

If a user passes in the Docker socket for Telegraf to monitor Docker itself, then they will need to add the telegraf user to the group that owns the Docker socket

Essentially, set the GID to that of the docker group. However, as I understand it, being a member of the docker group effectively gives you root privileges, so it's back where it started.

This was news to me, but it seems like user namespace remap has been implemented in order to handle these precise issues.

That's good news -- seems to be a bit more hassle to set up on the host, but at least there's an official way to handle these issues.
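
As I understand it, with user namespace remapping a UID inside the container is shifted into a subordinate range on the host, configured in /etc/subuid as name:start:count. A small sketch of that arithmetic follows. The helper name is hypothetical, and the range dockremap:165536:65536 in the example is only an illustrative value, not something taken from my machine:

```shell
# Hypothetical helper: given a subordinate-ID entry in /etc/subuid
# format ("name:start:count") and a UID inside the container, print
# the host UID it would map to (start + container UID).
remapped_uid() {
    entry=$1
    cuid=$2
    start=$(printf '%s\n' "$entry" | cut -d: -f2)
    count=$(printf '%s\n' "$entry" | cut -d: -f3)
    # UIDs at or beyond the range size have no mapping
    if [ "$cuid" -ge "$count" ]; then
        echo "uid $cuid is outside the mapped range" >&2
        return 1
    fi
    echo $((start + cuid))
}
```

With the illustrative range, `remapped_uid dockremap:165536:65536 999` gives the host UID that container UID 999 would run as. It is worth checking how those shifted IDs line up with the ownership of bind-mounted files on the host before relying on this.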