stefanprodan / swarmprom

Docker Swarm instrumentation with Prometheus, Grafana, cAdvisor, Node Exporter and Alert Manager

Mountpoint set to "/", swap file returns NaN and upgrade of containers ?

iangregsondev opened this issue · comments

Hi,

I am getting odd results for available disk space. The dashboard query is this:

sum((node_filesystem_free_bytes{mountpoint="/rootfs"} / node_filesystem_size_bytes{mountpoint="/rootfs"}) * on(instance) group_left(node_name) node_meta{node_id=~".+"} * 100) / count(node_meta * on(instance) group_left(node_name) node_meta{node_id=~".+"})

But on my system there is no /rootfs mountpoint. I checked node_filesystem_size_bytes, which the query uses, and it outputs:

node_filesystem_size_bytes{device="/dev/mapper/ubuntu--vg-ubuntu--lv",fstype="ext4",instance="10.0.8.3:9100",job="node-exporter",mountpoint="/"}
node_filesystem_size_bytes{device="/dev/mapper/ubuntu--vg-ubuntu--lv",fstype="ext4",instance="10.0.8.16:9100",job="node-exporter",mountpoint="/"}

As you can see, the mountpoint is "/".
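
To illustrate why the panel comes back empty, here is a minimal Python sketch of exact-match label selection, assuming it behaves like the `=` matcher in the query. The `select` helper and the trimmed label sets are hypothetical, written only for this example; the instances and mountpoints are copied from the output above.

```python
# Label sets trimmed from the node_filesystem_size_bytes output above.
series = [
    {"instance": "10.0.8.3:9100", "mountpoint": "/"},
    {"instance": "10.0.8.16:9100", "mountpoint": "/"},
]


def select(series, **matchers):
    """Return the series whose labels equal every matcher (like an `=` matcher)."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]


# The dashboard query asks for mountpoint="/rootfs"; the exporter reports "/".
rootfs_matches = select(series, mountpoint="/rootfs")
root_matches = select(series, mountpoint="/")

print(len(rootfs_matches), len(root_matches))
```

If that is what is happening here, the `/rootfs` selector matches no series at all, so pointing the query at `mountpoint="/"` would be the first thing to try.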

This is what I have set for node-exporter in my Docker Compose file (nothing changed as far as variables are concerned, although I did upgrade the version; see below):

    environment:
      - NODE_ID={{.Node.ID}}
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /etc/hostname:/etc/nodename
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      - '--no-collector.ipvs'

Another problem is the "used swap memory" panel, which returns NaN. This is the query:

sum(((node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes) * on(instance) group_left(node_name) node_meta{node_id=~".+"} * 100) / count(node_meta * on(instance) group_left(node_name) node_meta{node_id=~".+"})
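
One possible cause, offered as a guess rather than a diagnosis: if any node has no swap configured, SwapTotal is 0 there and the per-node ratio becomes 0/0, which is NaN in floating-point arithmetic, and a NaN term makes the whole sum NaN. The Python sketch below mimics that arithmetic. The `ieee_div` and `used_swap_percent` helpers and the two sample nodes are hypothetical.

```python
import math


def ieee_div(a, b):
    """Divide like IEEE 754 floats: 0/0 is NaN, nonzero/0 is +/-Inf.

    Simplified: the sign of a zero denominator is ignored.
    """
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a)
    return a / b


def used_swap_percent(total, free):
    """Same shape as the dashboard expression: (total - free) / total * 100."""
    return ieee_div(total - free, total) * 100


# Hypothetical nodes as (SwapTotal, SwapFree): one with 2 GiB swap half used,
# one with no swap at all.
nodes = [(2 * 1024**3, 1024**3), (0, 0)]

per_node = [used_swap_percent(total, free) for total, free in nodes]
average = sum(per_node) / len(per_node)

print(per_node, average)
```

Checking node_memory_SwapTotal_bytes on its own for each instance would confirm or rule this out.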

Can anybody help me debug it?

I need to be upfront: I felt the containers were outdated, so I upgraded them, and I also swapped unsee for karma (karma is a fork by the original developer; unsee is deprecated. It is technically the same, just nicer and written in React).

I will leave the compose file and Dockerfiles here. I would be interested to know whether anybody else has tried this and run into issues :-) All I really did was edit the Dockerfiles to add the ":latest" tag and update docker-compose to build the images.

docker compose

version: "3.3"

networks:
  internal:
    external: false
  traefik:
    external: true

configs:
  dockerd_config:
    file: ./dockerd-exporter/Caddyfile
  node_rules:
    file: ./prometheus/rules/swarm_node.rules.yml
  task_rules:
    file: ./prometheus/rules/swarm_task.rules.yml

services:
  dockerd-exporter:
    image: stefanprodan/caddy
    networks:
      - internal
    environment:
      - DOCKER_GWBRIDGE_IP=172.18.0.1
    configs:
      - source: dockerd_config
        target: /etc/caddy/Caddyfile
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  cadvisor:
    image: google/cadvisor
    networks:
      - internal
    command: -logtostderr -docker_only
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  grafana:
    image: iangregsondev/swarmprom-grafana:latest
    build:
      context: ./grafana
      dockerfile: Dockerfile
    networks:
      - default
      - internal
      - traefik
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      #- GF_SERVER_ROOT_URL=${GF_SERVER_ROOT_URL:-localhost}
      #- GF_SMTP_ENABLED=${GF_SMTP_ENABLED:-false}
      #- GF_SMTP_FROM_ADDRESS=${GF_SMTP_FROM_ADDRESS:-grafana@test.com}
      #- GF_SMTP_FROM_NAME=${GF_SMTP_FROM_NAME:-Grafana}
      #- GF_SMTP_HOST=${GF_SMTP_HOST:-smtp:25}
      #- GF_SMTP_USER=${GF_SMTP_USER}
      #- GF_SMTP_PASSWORD=${GF_SMTP_PASSWORD}
    volumes:
      - /mnt/grafana:/var/lib/grafana
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik
        - traefik.http.routers.grafana.entrypoints=https
        - traefik.http.routers.grafana.tls.certresolver=le
        - traefik.http.services.grafana.loadbalancer.server.port=3000
        - traefik.http.routers.grafana.rule=Host(`grafana.somedomain.dev`)
        - traefik.http.middlewares.grafana-ipwhitelist.ipwhitelist.sourcerange=192.168.1.0/24        
        - traefik.http.routers.grafana.middlewares=grafana-ipwhitelist@docker  

  alertmanager:
    image: iangregsondev/swarmprom-alertmanager:latest
    build:
      context: ./alertmanager
      dockerfile: Dockerfile
    networks:
      - default
      - internal
      - traefik
    environment:
      - SLACK_URL=${SLACK_URL:-https://hooks.slack.com/services/TOKEN}
      - SLACK_CHANNEL=${SLACK_CHANNEL:-general}
      - SLACK_USER=${SLACK_USER:-alertmanager}
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    volumes:
      - /mnt/alertmanager:/alertmanager
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik
        - traefik.http.routers.alertmanager.entrypoints=https
        - traefik.http.routers.alertmanager.tls.certresolver=le
        - traefik.http.services.alertmanager.loadbalancer.server.port=9093
        - traefik.http.routers.alertmanager.rule=Host(`alertmanager.somedomain.dev`)
        - traefik.http.middlewares.alertmanager-ipwhitelist.ipwhitelist.sourcerange=192.168.1.0/24        
        - traefik.http.routers.alertmanager.middlewares=alertmanager-ipwhitelist@docker  

  karma:
    image: lmierzwa/karma:latest
    networks:
      - default
      - internal
      - traefik
    environment:
      - "ALERTMANAGER_URI=http://alertmanager:9093"
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik
        - traefik.http.routers.karma.entrypoints=https
        - traefik.http.routers.karma.tls.certresolver=le
        - traefik.http.services.karma.loadbalancer.server.port=8080
        - traefik.http.routers.karma.rule=Host(`karma.somedomain.dev`)
        - traefik.http.middlewares.karma-ipwhitelist.ipwhitelist.sourcerange=192.168.1.0/24        
        - traefik.http.routers.karma.middlewares=karma-ipwhitelist@docker  

  node-exporter:
    image: iangregsondev/swarmprom-node-exporter:latest
    build:
      context: ./node-exporter
      dockerfile: Dockerfile     
    networks:
      - internal
    environment:
      - NODE_ID={{.Node.ID}}
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /etc/hostname:/etc/nodename
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      - '--no-collector.ipvs'
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  prometheus:
    image: iangregsondev/swarmprom-prometheus:latest
    build:
      context: ./prometheus
      dockerfile: Dockerfile    
    networks:
      - default
      - internal
      - traefik
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention=24h'
    volumes:
      - /mnt/prometheus:/prometheus
    configs:
      - source: node_rules
        target: /etc/prometheus/swarm_node.rules.yml
      - source: task_rules
        target: /etc/prometheus/swarm_task.rules.yml
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 2048M
        reservations:
          memory: 128M
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik
        - traefik.http.routers.prometheus.entrypoints=https
        - traefik.http.routers.prometheus.tls.certresolver=le
        - traefik.http.services.prometheus.loadbalancer.server.port=9090
        - traefik.http.routers.prometheus.rule=Host(`prometheus.somedomain.dev`)
        - traefik.http.middlewares.prometheus-ipwhitelist.ipwhitelist.sourcerange=192.168.1.0/24        
        - traefik.http.routers.prometheus.middlewares=prometheus-ipwhitelist@docker  
        

and

FROM prom/alertmanager:latest

COPY conf /etc/alertmanager/

ENTRYPOINT  [ "/etc/alertmanager/docker-entrypoint.sh" ]
CMD        [ "--config.file=/etc/alertmanager/alertmanager.yml", \
             "--storage.path=/alertmanager" ]

and

FROM grafana/grafana:latest
# https://hub.docker.com/r/grafana/grafana/tags/

COPY datasources /etc/grafana/provisioning/datasources/
COPY swarmprom_dashboards.yml /etc/grafana/provisioning/dashboards/
COPY dashboards /etc/grafana/dashboards/

ENV GF_SECURITY_ADMIN_PASSWORD=admin \
    GF_SECURITY_ADMIN_USER=admin \
    GF_PATHS_PROVISIONING=/etc/grafana/provisioning/

and

FROM prom/node-exporter:latest

ENV NODE_ID=none

USER root

COPY conf /etc/node-exporter/

ENTRYPOINT  [ "/etc/node-exporter/docker-entrypoint.sh" ]
CMD [ "/bin/node_exporter" ]

and

FROM prom/prometheus:latest
# https://hub.docker.com/r/prom/prometheus/tags/

ENV WEAVE_TOKEN=none

COPY conf /etc/prometheus/

ENTRYPOINT [ "/etc/prometheus/docker-entrypoint.sh" ]
CMD        [ "--config.file=/etc/prometheus/prometheus.yml", \
             "--storage.tsdb.path=/prometheus" ]

It seems everything else is working great in the dashboards.

Only the disk space and memory panels are affected.
