tiredofit / docker-traefik-cloudflare-companion

Automatically Create CNAME records for containers served by Traefik

Container stalls after "Starting Zabbix Agent"

zombielinux opened this issue

I've got the following log:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] 00-functions: applying... 
[fix-attrs.d] 00-functions: exited 0.
[fix-attrs.d] 01-s6: applying... 
[fix-attrs.d] 01-s6: exited 0.
[fix-attrs.d] 02-zabbix: applying... 
[fix-attrs.d] 02-zabbix: exited 0.
[fix-attrs.d] 03-logrotate: applying... 
[fix-attrs.d] 03-logrotate: exited 0.
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-startup: executing... 
[cont-init.d] 00-startup: exited 0.
[cont-init.d] 01-timezone: executing... 
[NOTICE] ** [timezone] Timezone: Setting to 'America/New_York' from 'Etc/GMT'
[cont-init.d] 01-timezone: exited 0.
[cont-init.d] 02-permissions: executing... 
[cont-init.d] 02-permissions: exited 0.
[cont-init.d] 03-zabbix: executing... 
[cont-init.d] 03-zabbix: exited 0.
[cont-init.d] 04-cron: executing... 
[NOTICE] ** [cron] Disabling Cron
[cont-init.d] 04-cron: exited 0.
[cont-init.d] 05-smtp: executing... 
[NOTICE] ** [smtp] Disabling SMTP Features
[cont-init.d] 05-smtp: exited 0.
[cont-init.d] 10-cloudflare-companion: executing... 
[NOTICE] ** [traefik-cloudflare-companion] Setting Traefik 2.x Mode
[cont-init.d] 10-cloudflare-companion: exited 0.
[cont-init.d] 99-container: executing... 
[cont-init.d] 99-container: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[INFO] ** [traefik-cloudflare-companion] Starting Traefik Cloudflare Companion
[INFO] ** [zabbix] Starting Zabbix Agent

My docker-compose looks like this:

    image: tiredofit/traefik-cloudflare-companion:latest
    container_name: cloudflare-companion
    networks:
     - traefik_proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - TIMEZONE=$TZ
      - TRAEFIK_VERSION=2
      - CF_EMAIL=$CLOUDFLARE_EMAIL
      - CF_TOKEN=$CLOUDFLARE_API_KEY
      - TARGET_DOMAIN=$DOMAINNAME
      - DOMAIN1=$DOMAINNAME
      - DOMAIN1_ZONE_ID=$CLOUDFLARE_ZONEID
      - DOMAIN1_PROXIED=FALSE
    restart: always
    deploy:
      placement:
        constraints:
          - "node.role==manager"

Logging into the container and executing the scripts in /etc/cont-init.d/ shows only a single issue, with "03-zabbix", as shown below:

mkdir: can't create directory '': No such file or directory
chown: unknown user 
chown: unknown user 

At a cursory glance, it's failing to create a logfile somewhere along the line and then dropping out of the whole thing.

You can turn Zabbix off with ENABLE_ZABBIX=FALSE.

All looks normal to me. Can you give me the output of a ps -ef?
Thanks

Sure can. See below

    1 root      0:00 s6-svscan -t0 /var/run/s6/services
   31 root      0:00 s6-supervise s6-fdholderd
  758 root      0:00 s6-supervise 03-zabbix
  760 root      0:00 s6-supervise 10-cloudflare-companion
  762 zabbix    0:00 zabbix_agentd -f
  764 root      0:01 python -u /usr/sbin/cloudflare-companion
  798 zabbix    0:00 zabbix_agentd: collector [idle 1 sec]
  799 zabbix    0:00 zabbix_agentd: listener #1 [waiting for connection]
  800 zabbix    0:00 zabbix_agentd: listener #2 [waiting for connection]
  801 zabbix    0:00 zabbix_agentd: active checks #1 [idle 1 sec]
  841 root      0:00 bash
  846 root      0:00 ps -ef

PID 764 shows that the container is running, as is Zabbix.
I find that with some of the changes I've made to the base images lately, just running the scripts inside cont-init.d won't give you the expected output, as it's hardcoded to look for a different path (I believe /var/run/s6/cont-init.d).

But back to the matter at hand, you'll only get output from the python script if it can find a matching rule in your Traefik labels section.

Try this on a sample container:

    labels:
      - traefik.enable=true
      - traefik.http.routers.example.rule=Host(`dns.example.com`)
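For illustration, the matching step boils down to pulling the backtick-quoted hostnames out of the rule label. This is a sketch, not the script's actual code; the regex and function name are assumptions:

```python
import re

# Sketch: extract hostnames from a Traefik rule label,
# e.g. Host(`dns.example.com`) or HostHeader(`dns.example.com`).
def extract_domains(rule: str) -> list:
    return re.findall(r"Host(?:Header)?\(`([^`]+)`\)", rule)
```

A label whose rule contains no Host()/HostHeader() matcher yields nothing, which is why a container without such a rule produces no output from the companion.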

I have a helloworld container running with the following labels:

      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.helloworld.rule=HostHeader(`helloworld2.$DOMAINNAME`)"
        #- "traefik.http.routers.helloworld.rule=Host(`helloworld2.$DOMAINNAME`)"
        - "traefik.http.routers.helloworld.rule=Host(`helloworld2.tld.org`)"
        - "traefik.http.routers.helloworld.entrypoints=websecure"
        - "traefik.http.routers.helloworld.tls.certresolver=dns-cloudflare"
        - "traefik.http.services.helloworld.loadbalancer.server.port=80"
        #HTTPS Redirect Code
        - "traefik.http.middlewares.helloworld-https.redirectscheme.scheme=https"
        - "traefik.http.routers.helloworld-insecure.middlewares=helloworld-https@docker"
        - "traefik.http.routers.helloworld-insecure.rule=Host(`helloworld2.$DOMAINNAME`)"
        - "traefik.http.routers.helloworld-insecure.entrypoints=web"

Where $DOMAINNAME = tld.org

That should definitely do it.
It should be monitoring the docker socket and showing some sort of response like so:


cloudflare-companion    | container rule value:  Host(`dns.example.com`)
cloudflare-companion    | extracted_domains from rule:  [u'dns.example.com']
cloudflare-companion    | Found Container: 670c82dc337067c35c7603969211e701b5d0fe6f28c60e4c92a7f77a038739e2 with Hostname dns.example.com

FWIW, I'm running docker-swarm across a few machines, but the cloudflare-companion container is able to ping the traefik container with ease.

But I'm a bit befuddled as to why it doesn't work.

Typically when running swarm, your socket would be network based. How do you have the socket connected with Traefik? We're assuming that you are using /var/run/docker.sock on your system, which is where I believe this issue is occurring.

You can change the socket entry point to something TCP oriented by setting the environment variable DOCKER_ENTRYPOINT.
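In Python terms, that selection could look like the sketch below. Only the variable name DOCKER_ENTRYPOINT comes from the image; the helper name and default are illustrative, with the default mirroring the /var/run/docker.sock mount from the compose file above:

```python
import os

# Sketch: pick the Docker API endpoint the companion should talk to.
# DOCKER_ENTRYPOINT (from the image's docs) overrides the default
# unix socket with a TCP endpoint such as tcp://192.0.2.10:2376.
def docker_base_url(env=None) -> str:
    env = os.environ if env is None else env
    return env.get("DOCKER_ENTRYPOINT", "unix://var/run/docker.sock")
```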

Hopefully someone else can speak up on this one; there are an awful lot of users of this image (and its companion, nginx-proxy-cloudflare-companion), so I would think someone must have figured this out.

Yep. That's how I'm doing it.

      - /var/run/docker.sock:/var/run/docker.sock:ro

Both containers are bound to the same manager host as well.

I'm honestly pretty new to the whole docker ecosystem, so I need to find some good documentation on choosing socket entry points.

From inside your cloudflare companion container, can you make sure you can talk to the socket?
This command will show all the images pulled onto your Docker host:
curl --unix-socket /var/run/docker.sock http://localhost/v1.24/images/json

I don't really need to see the output, but let's just make sure that it's showing a JSON list of images. (It should be a series of IDs and parent IDs.)
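The same check can be scripted against the socket with Python's stdlib. This is a sketch under the same assumptions as the curl command (API version v1.24, socket mounted at /var/run/docker.sock); the function names are illustrative:

```python
import json
import socket

# Sketch: GET /v1.24/images/json over the mounted unix socket,
# the same request the curl command above makes.
def fetch_images_raw(sock_path="/var/run/docker.sock") -> bytes:
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(b"GET /v1.24/images/json HTTP/1.0\r\nHost: localhost\r\n\r\n")
    chunks = []
    while True:
        data = s.recv(65536)
        if not data:
            break
        chunks.append(data)
    s.close()
    # Drop the HTTP response headers, keep the JSON body
    return b"".join(chunks).split(b"\r\n\r\n", 1)[1]

# Each element of the JSON list should carry an "Id" beginning with "sha256:"
def image_ids(body: bytes) -> list:
    return [img["Id"] for img in json.loads(body)]
```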

I've got a big JSON string; each element has an "Id" (starting with sha256:). There is a "ParentId" key as well, but its value is null for all of them.

I DO see all my running and expected containers though.

OK, good enough. So at least we can talk to the socket, which is where all the info comes from. I truly am stumped here as to what could be happening. The image itself is pretty basic in what it does, in a very hackish way. I'm going to reach out to some of my clients and see if any are running in a swarm environment, to see if there is something being missed here.

Any word from your clients?

I have a few who are running swarm and on the TCP socket, yet none are reporting issues. I'll test on a burner system today to see if I can recreate it. We did have some nasty stuff happening in the past week with Traefik 2.2.2+, which was finally resolved when we moved to 2.2.6 - but the cloudflare container wouldn't have been affected by it.

I set up an Ubuntu 20.04 server last night, exposed the Docker socket via TCP, and ran a few tests from both the local machine and a remote machine. I used the value DOCKER_ENTRYPOINT=tcp://(host_ip):2376. My Docker socket was listening on 0.0.0.0 and I turned off any firewalls that would limit access.

On test #1 (same machine to Docker socket) it worked as expected.
On test #2 (remote cloudflare-companion image pointing at the remote IP host) it again worked as expected.

There has to be something else that is blocking this. I didn't do anything fancy with my setup; I had it up and running within 10 minutes of first boot. I'm back to being stumped and hoping someone is able to step in.

Have a peek at this PR.

The submitter added TLS support along with some additional variables, which might solve your problem, assuming it is TLS related. Note that you'd probably have to export a data volume for your Docker certificates as well; there is an environment variable built to support that. Let me know if that changes anything.

Edited to add reference links

Hi there,

I am also having this issue. I am running Traefik v2.2.7 in swarm mode on a single node. The "manual" entry for a non-docker service works, and the CNAME in CF has been added correctly. The dockerized services are not being found.

I've been trying to figure out what's happening by doing a straight comparison of the differences between the cf-companion service and all my other services...

I am not too well versed in the code, but how/where is cf-companion looking for the labels? My 30-second google-fu shows that there might be a difference between service labels and container labels. In Portainer, all of my services have service labels, not container labels. I noticed that the manual entry on the cf-companion service shows up as a container label, not a service label.

cf-companion labels:
[screenshots omitted]

random-other-service labels:
[screenshots omitted]

The only difference I see between the yml for cf-companion and all my services is the deploy key:

#My regular services yml
deploy:
  labels:
    - "LABELS"

versus

#CF-Companion yml
labels:
  - "LABELS"

Can you give me the output (fuzz it if you need to) of one of your containers with labels?
docker inspect (containername)
You may be onto something here... We're pulling them from the JSON keys Config: , Labels:

[screenshot omitted]

I navigated down to the Config: Labels area. I see no mention of the traefik labels... or anything that would show the Host(`etc....

So I did docker service inspect, and tada!
[screenshot omitted]

OK then. This helps tremendously. Going to think about it for a night, and let's see what I can come up with.

I don't know much about it, but perhaps SWARM_MODE=TRUE might be good if it can be directed to the service labels versus the container labels. And then if it is swarm mode, the "non-dockerized services" could go under deploy in the cf-companion yml as well. Unfortunately I don't know enough about coding to do a PR!
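In docker SDK terms, the distinction the inspect output shows can be sketched like this. The attribute paths (Spec.Labels for services, Config.Labels for containers) come from the inspect output discussed above; the helper name and guards are illustrative, not the companion's actual code:

```python
# Sketch: where labels live in `docker inspect` output.
# Swarm services keep them under Spec.Labels; plain containers
# under Config.Labels. The `or {}` guards avoid the
# "'NoneType' object has no attribute 'get'" error that appears
# when a service defines no labels at all.
def get_labels(attrs: dict, swarm_mode: bool = False) -> dict:
    if swarm_mode:
        return (attrs.get("Spec") or {}).get("Labels") or {}
    return (attrs.get("Config") or {}).get("Labels") or {}
```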

I like the SWARM_MODE idea. I've put together a test version on Docker Hub. Can you try pulling tiredofit/traefik-cloudflare-companion:develop with SWARM_MODE=TRUE as an environment variable?

Got this in the logs:

[INFO] ** [traefik-cloudflare-companion] Starting Traefik Cloudflare Companion
Traceback (most recent call last):
  File "/usr/sbin/cloudflare-companion", line 59, in <module>
    init()
  File "/usr/sbin/cloudflare-companion", line 49, in init
    check_container(c)
  File "/usr/sbin/cloudflare-companion", line 29, in check_container
    for prop in c.attrs.get(u'Spec').get(u'Labels'):
AttributeError: 'NoneType' object has no attribute 'get'

Can you send me privately the entire inspect output? Based on the indenting of the output above I may be missing a key. I'm dave at tiredofit dot ca .

Sent!

Odd. Still haven't seen it. No sign on my MTA either.

Sent it again. It might be because I pasted it as plain text in the email... so this time I sent it as a .txt attachment.

Received (oddly, at the same time). I'm going to need a few days (a week max) to parse it and set up a test environment. My 'real world' role has just required more time than I anticipated. I'll be in touch.

See tiredofit/traefik-cloudflare-companion:6.0.0 for a working SWARM_MODE=TRUE.