feluelle / airflow-diagrams

Auto-generated Diagrams from Airflow DAGs. 🔮 🪄

TaskIds that contain numbers or dashes are not supported

nathadfield opened this issue

If a DAG contains tasks whose IDs contain numbers or dashes, the diagram Python file is still created, but it cannot be run to generate the diagram image.

e.g.

from diagrams import Diagram
from diagrams.generic.blank import Blank

with Diagram("airbyte", show=False):
    airbyte-job_sensor = Blank("airbyte_job_sensor")
    airbyte_trigger_async = Blank("airbyte_trigger_async")
    
    airbyte_trigger_async >> [airbyte_job_sensor]

➜  airflow-diagrams git:(master) ✗ python3 examples/airbyte_diagrams.py
  File "/Users/nathan/Projects/airflow-diagrams/examples/airbyte_diagrams.py", line 5
    airbyte-job_sensor = Blank("airbyte_job_sensor")
    ^
SyntaxError: cannot assign to operator

Thank you @nathadfield for reporting this.

This is indeed an issue. I will try to fix it in the next patch. Or, if you want, feel free to open a PR :)

@feluelle No worries. I will definitely aim to take a look over the next couple of days.

You can see here how the variables are generated. We could simply use node["task_id"]|replace("-", "_") and edge["task_id"]|replace("-", "_").
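For illustration, here is a rough plain-Python equivalent of what that replace filter would do to a task ID. The helper name dashes_to_underscores is hypothetical and not part of airflow-diagrams; this is only a sketch of the idea, not the project's template code.

```python
# Hypothetical helper: mirrors the effect of the suggested replace("-", "_") filter.
def dashes_to_underscores(task_id: str) -> str:
    return task_id.replace("-", "_")


name = dashes_to_underscores("airbyte-job_sensor")
# str.isidentifier() reports whether the result is a legal Python variable name.
print(name, name.isidentifier())

# Replacing dashes alone does not help when the ID starts with a digit.
print(dashes_to_underscores("1-load").isidentifier())
```

The second call shows why dashes are only part of the problem: a name that starts with a digit is still not a valid identifier.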

Digits in the task_id would still be a problem though, right?

Yes, we can easily remove or replace them with a regex as well.
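A minimal sketch of what such a regex-based cleanup could look like, assuming we want a readable variable name. The function to_identifier is a hypothetical name, not something in airflow-diagrams. Note that digits are only a problem at the start of a Python identifier, so they only need a prefix rather than removal.

```python
import keyword
import re


# Hypothetical helper, not part of airflow-diagrams.
def to_identifier(task_id: str) -> str:
    # Replace every character that is not an ASCII letter, digit, or underscore.
    name = re.sub(r"\W", "_", task_id, flags=re.ASCII)
    # An identifier cannot be empty or start with a digit, so add a prefix.
    if not name or name[0].isdigit():
        name = "_" + name
    # Reserved words such as "class" are not usable as variable names either.
    if keyword.iskeyword(name):
        name += "_"
    return name


for task_id in ["airbyte-job_sensor", "1-load", "group.task", "class"]:
    print(task_id, "->", to_identifier(task_id))
```

One caveat with this approach: distinct task IDs such as "a-b" and "a_b" map to the same variable name, so it is not collision-free.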

FYI - I also noticed that DAGs which use TaskGroups also generate invalid diagram variables.

Because of the .?

If that's the case, the PR should also fix this, since it hashes the task ID, and any hash prefixed with an underscore should be a valid Python variable name.
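To illustrate the hashing idea, here is a small sketch. The choice of SHA-256 and the helper name hashed_variable_name are assumptions made for this example; the thread does not say which hash function the PR actually uses.

```python
import hashlib


# Hypothetical helper; the hash function is an assumption, not taken from the PR.
def hashed_variable_name(task_id: str) -> str:
    digest = hashlib.sha256(task_id.encode("utf-8")).hexdigest()
    # A hex digest may start with a digit, so the underscore prefix is what
    # makes the result a valid identifier in every case.
    return "_" + digest


for task_id in ["airbyte-job_sensor", "1-load", "group.task"]:
    name = hashed_variable_name(task_id)
    print(task_id, "->", name, name.isidentifier())
```

Because the digest only contains hexadecimal characters, dashes, dots, and leading digits in the task ID no longer matter, and different task IDs get different variable names. The trade-off is that the variable names in the generated file are no longer human-readable.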

@nathadfield could you please test whether this works now? You can install the latest release candidate by running pip install --pre airflow-diagrams. It should install 1.0.1rc1.

> Because of the .?

Yep! I'll certainly take a look at the RC tomorrow.

@feluelle So, your changes do look like they fix the issues with task_ids, although the node labelling is a bit of a mess.

[Image: Screenshot 2022-01-21 at 09 02 41]

This is perhaps for another version, but might it be a good idea to put task groups into clusters?

Great idea. Let's move this to a separate issue, please. I would also put that in a minor release rather than a patch release, so I will continue today with releasing 1.0.1.