astronomer / airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Make code more idiomatic, clean, and better enforced

dwreeves opened this issue · comments

I was working on a separate issue and I started cleaning up the code. I realized that the code needs a little more love to be in tip-top shape and it would be inappropriate to combine a PR that both adds a feature and does a lot of cleanup, so I believe this should be a separate issue.

Example issues with existing code style

  • Incomplete typing, most notably in hooks.py.
  • Mypy should be part of pre-commit
  • pre-commit is not run as part of CI, but should be
  • When I substitute some types from the docstrings into Python types, I get a couple type errors in Mypy. Two examples:
    • FivetranHook.set_schedule_type says it returns a string, but it clearly returns a dict, and mypy makes this clear. (Upon closer inspection, the return type doesn't actually matter because the return value is never used.)
    • FivetranHook._do_api_call's json arg says it takes a dict, but it clearly takes a string, or more specifically a string representation of a JSON.
  • Some unnecessary calls to the API. For example, FivetranHook.prep_connector makes 3 total API calls when it really only needs to make 2.
  • Some un-idiomatic Airflow patterns. For example, _get_hook() instead of hook as a cached_property.
  • FivetranHook._do_api_call has some unnecessary weird behavior. Notably, it has a parameter called json that maps to either the data=json or params=json kwargs in the requests.request() call. This seems less desirable than just passing data=data, params=params, or json=json. Specifically, the json kwarg is confusing from the perspective of the requests and aiohttp APIs: in those APIs, json= expects a dict and data= expects a string, but in the FivetranHook API, json= is a string that gets mapped to data=.

API Breakage?

It's unclear how much API breakage would be tolerated here. We can discuss in a PR, but TLDR I believe for a minor version bump that:

  • absolutely zero breakage should be tolerated for the FivetranSensor and FivetranOperator.
  • very minor breakage can be tolerated for FivetranHook. Specifically for private methods, and methods where the return type is never actually used by anything in FivetranSensor or FivetranOperator:
    • _do_api_call and _do_api_call_async should be changed to look and feel like the requests.request and aiohttp.ClientSession.request APIs.
      • That said, an error message should clearly state to other users what the issue is by checking whether the first arg to these calls is a tuple. I imagine there are some people who are poking under the hood of this code and are calling _do_api_call() directly.
    • FivetranHook.check_connector should return the connector_details dict object. (So that prep_connector can avoid making an unnecessary 3rd API call.)
    • FivetranHookprep_connector should return either None or dict[str, Any] | None. (Note that the returned value is never used anywhere in the code).