xnuinside / airflow_in_docker_compose

Apache Airflow in Docker Compose (for both versions 1.10.* and 2.*)

Permission Denied Error

hasher1705 opened this issue

When I run docker-compose up --build, I get
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

Full log below -

flower_1     | Unable to load the config, contains a configuration error.
flower_1     | Traceback (most recent call last):
flower_1     |   File "/usr/local/lib/python3.6/logging/config.py", line 565, in configure
flower_1     |     handler = self.configure_handler(handlers[name])
flower_1     |   File "/usr/local/lib/python3.6/logging/config.py", line 738, in configure_handler
flower_1     |     result = factory(**kwargs)
flower_1     |   File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 50, in __init__
flower_1     |     os.makedirs(self._get_log_directory())
flower_1     |   File "/usr/local/lib/python3.6/os.py", line 210, in makedirs
flower_1     |     makedirs(head, mode, exist_ok)
flower_1     |   File "/usr/local/lib/python3.6/os.py", line 220, in makedirs
flower_1     |     mkdir(name, mode)
flower_1     | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

I'm using /usr/local/airflow/ instead of /opt/ as a workaround in volumes.

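The thread doesn't show the exact mapping, but the workaround presumably looks something like this in docker-compose.yml (AIRFLOW__CORE__BASE_LOG_FOLDER is my assumption for pointing Airflow 1.10 at the remapped path; it isn't mentioned above):

volumes:
  # assumption: bind-mount the host ./logs folder under /usr/local/airflow instead of /opt/airflow
  - ./logs:/usr/local/airflow/logs
environment:
  # assumption: point Airflow 1.10 at the remapped log directory
  - AIRFLOW__CORE__BASE_LOG_FOLDER=/usr/local/airflow/logs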

Does this work for you? Did this fix the error?

Yes. This fixed the permission denied error.

Interesting, I don't have this error on macOS. What OS do you use?

Ubuntu 18.04.4 LTS

@hasher1705 but dags and other files in the image are placed in the /opt/airflow/ folder, so how does that work for you? The error isn't raised anymore, correct, but then the volumes don't work either.

hm, the problem is only with the logs folder in Airflow; if I change - ./logs:/opt/airflow/logs to - /opt/airflow/logs, everything works
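For context, the difference between the two forms (standard Docker Compose behavior, shown here only as a sketch):

volumes:
  # bind mount: the host's ./logs must be writable by the container's airflow user (UID 50000),
  # otherwise you get the PermissionError above
  # - ./logs:/opt/airflow/logs
  # anonymous volume: managed by Docker, so there is no host UID mismatch,
  # but the logs are no longer visible in ./logs on the host
  - /opt/airflow/logs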

@hasher1705 are you using pure Ubuntu or WSL?

@hasher1705 so for Ubuntu I think you need to make sure that the UID and GID match the ones on the Docker host. I use a Mac for local development, but I had to make that tweak so I could run on a production EC2 instance.

Here's my Dockerfile

FROM apache/airflow:1.10.11

USER root

RUN apt-get update -yqq \
    && apt-get install -y gcc freetds-dev \
    && apt-get install -y git procps \
    && apt-get install -y vim

RUN pip install apache-airflow[mssql,ssh,s3,slack,password,crypto,sentry]

RUN pip install azure-storage-blob sshtunnel google-api-python-client oauth2client docker

# https://github.com/ufoscout/docker-compose-wait
ADD https://github.com/ufoscout/docker-compose-wait/releases/download/2.7.3/wait /wait
RUN chmod +x /wait

ARG DOCKER_UID
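# Remap the airflow user (UID 50000 in the apache/airflow image) to the host user's UID
# so that files written to the bind-mounted volumes stay writable on the host.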
RUN \
    : "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use './build.sh' to set it automatically.}" \
    && usermod -u ${DOCKER_UID} airflow \
    && find / -path /proc -prune -o -user 50000 -exec chown -h airflow {} \; \
    && echo "Set airflow's UID to ${DOCKER_UID}"

USER airflow

My build.sh looks like this:

#!/usr/bin/env bash

docker-compose build --build-arg DOCKER_UID=$(id -u)

And then your docker-compose.yml would look like:

version: "3.7"

networks:
  airflow:

x-airflow: &airflow
  build:
    context: .
    dockerfile: Dockerfile
    args:
      - DOCKER_UID=${DOCKER_UID-1000}
  env_file:
    - .env.development
  volumes:
    - ./dags:/opt/airflow/dags
    - ./volumes/airflow_data_dump:/opt/airflow/data_dump
    - ./volumes/airflow_logs:/opt/airflow/logs
    - /var/run/docker.sock:/var/run/docker.sock

services:
  postgres:
    image: postgres:12.4
    container_name: airflow_postgres
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - ./volumes/postgres_data:/var/lib/postgresql/data/pgdata:Z
      - ./volumes/postgres_logs:/var/lib/postgresql/data/log:Z
    networks:
      - airflow

  webserver:
    <<: *airflow
    restart: always
    container_name: airflow_webserver
    command: webserver --pid /opt/airflow/airflow-webserver.pid
    ports:
      - 8080:8080
    depends_on:
      - postgres
    networks:
      - airflow
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

  scheduler:
    <<: *airflow
    restart: always
    container_name: airflow_scheduler
    command: scheduler --pid /opt/airflow/airflow-scheduler.pid
    depends_on:
      - postgres
    networks:
      - airflow

  upgradedb:
    <<: *airflow
    container_name: airflow_upgradedb
    entrypoint: /bin/bash
    command: -c "/wait && airflow upgradedb"
    environment:
      WAIT_HOSTS: postgres:5432
    depends_on:
      - postgres
    networks:
      - airflow

.env.development:

# https://airflow.apache.org/docs/stable/configurations-ref.html#webserver
AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=tree
AIRFLOW__WEBSERVER__HIDE_PAUSED_DAGS_BY_DEFAULT=False
AIRFLOW__WEBSERVER__RBAC=False
AIRFLOW__WEBSERVER__WORKERS=2
AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT=300
AIRFLOW__WEBSERVER__NAVBAR_COLOR=#2d5d4e
AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX=True
AIRFLOW__WEBSERVER__X_FRAME_ENABLED=False
AIRFLOW__WEBSERVER__SESSION_LIFETIME_DAYS=31
AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE=America/Sao_Paulo
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080
AIRFLOW__WEBSERVER__EXPOSE_CONFIG=False
AIRFLOW__WEBSERVER__DAG_ORIENTATION=TB

# https://airflow.apache.org/docs/stable/configurations-ref.html#scheduler
AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=0
AIRFLOW__SCHEDULER__MAX_THREADS=4
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=3
AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC=5
AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD=300
AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT=False
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=5

# https://airflow.apache.org/docs/stable/configurations-ref.html#core
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:airflow@postgres:5432/airflow
AIRFLOW__CORE__FERNET_KEY=khRCuLpMHLVycUvbabsHy5diUV2NUygLG47auGx29VY=
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__LOGGING_LEVEL=info
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=30
AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=50
AIRFLOW__CORE__WORKER_PRECHECK=True
AIRFLOW__CORE__DAG_DISCOVERY_SAFE_MODE=True
AIRFLOW__CORE__SECURE_MODE=True
AIRFLOW__CORE__CHECK_SLAS=True

# https://airflow.apache.org/docs/stable/configurations-ref.html#operators
AIRFLOW__OPERATORS__DEFAULT_OWNER=airflow

This setup could be a bit overwhelming... but that's how I managed to get it working as cleanly as possible.

chown -R 50000.50000 logs worked. Probably not very pretty, though.
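For reference, on the host that is roughly the following (assuming ./logs is the folder bind-mounted to /opt/airflow/logs; 50000 is the UID of the airflow user in the apache/airflow images):

# create the bind-mounted logs folder and hand it over to the container's airflow user
mkdir -p ./logs
sudo chown -R 50000:50000 ./logs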

@hasher1705 Thanks to you, I solved this problem.
I am using Ubuntu 20.04.
I modified my log path to "/usr/local/airflow/log".

I added a link to this issue in the readme (#13); maybe it will be easier to find if someone else runs into the same issue. @jualvarez, thanks for the solution.