MrThearMan / dynamics-client

Client for making Web API requests from a Microsoft Dynamics 365 Database

Home Page:https://pypi.org/project/dynamics-client/

Feature request: async flavour of client

cmcconomyfwig opened this issue

Would you be interested in a PR contribution to create a parallel async-powered implementation?

Hi! DynamicsClient already has the option of creating async tasks: https://mrthearman.github.io/dynamics-client/docs/#clientcreate_task-asynciotask. Is this what you are looking for, or is something missing from it?

Geez - chalk this up to RTFM - thank you for the prompt response.
I had searched the code for async which was the wrong approach to discover async capability.

I just want to say - I audited a few different libraries and am very impressed with your thoroughness and approach. Lovely module.

Actually, there is one thing I noticed. There appears to be a side-effect during the construction of the DynamicsClient (init) that performs a synchronous API call; if so, in an async context, this would be blocking which would be discouraged.

The ability to defer the oauth activity to when we try to run a get/post/patch etc (and invoke that activity async) would be a welcome enhancement

(I say defer, because I'm not clear on how to create a class in an async context; in the past I would create such a class without invoking any I/O, and then later establish a session etc from an async context using async communication modules)

Please bear with me as I work through your examples - I may be jumping the gun here.

(separately - I provided feedback on closed issue #2 you may want to see)

First of all, thank you for your nice words! 😊

Indeed, DynamicsClient tries to fetch the token right when it initializes. This was done because the token is necessary for making any requests with the client, so every request made with it would need to wait for the token to be fetched anyway, no matter where it is fetched. On subsequent client initializations, the token is then read from the cache.

The question is, is there any meaningful work that an application could be doing while the token is being fetched, and I think it is possible. While all DynamicsClient methods would need to wait for the token, other async operations could take place while the token is being fetched, although these issues are somewhat mitigated by caching the token.

The issue is then with how the token-fetching could be made async. Unfortunately, the OAuth2Session class used to make the token request inherits from requests.Session, which makes sync requests, so parts of the OAuth2Session class would need to be rewritten to use httpx, and due to inheritance I don't think this would be that simple. Another option would be to use a different library, like lepture/authlib. I think this would be the better option.

Unfortunately, I don't have access to a Dynamics 365 Database at the moment, so I can't really test any solutions myself 😅 As mentioned above, I think deferring the token-fetch can have a performance impact, so I'll welcome any PRs if you can test your solution end-to-end.

Interesting tip on authlib, I will look into it when I have a chance.

In terms of the initialization, I would probably include a behaviour flag of whether to get the token during init - then during any call you check if the token is null and if so (ie. behaviour flag was set to defer), jump in and initialize it then pass through to the get/post/etc.
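A minimal sketch of the behaviour flag described above. The class name `LazyTokenClient`, the `fetch_on_init` flag, and the injected `fetch_token` callable are all hypothetical names used for illustration; none of them come from dynamics-client.

```python
from typing import Callable, Optional


class LazyTokenClient:
    """Hypothetical stand-in for a client; not the dynamics-client API."""

    def __init__(self, fetch_token: Callable[[], str], *, fetch_on_init: bool = True) -> None:
        self._fetch_token = fetch_token
        self._token: Optional[str] = None
        if fetch_on_init:
            # Eager mode: the token request happens inside the constructor.
            self._token = self._fetch_token()

    def _ensure_token(self) -> str:
        # Deferred mode: the first request pays for the token fetch.
        if self._token is None:
            self._token = self._fetch_token()
        return self._token

    def get(self, path: str) -> str:
        # Every HTTP-style method passes through _ensure_token first.
        token = self._ensure_token()
        return f"GET {path} with {token}"
```

With `fetch_on_init=False` the constructor performs no I/O, which is the property that matters when the object is created from an async context.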

Very good news:
with an inclusion of authlib and httpx in the requirements, it's nearly a drop-in replacement:

Before:

self._session = OAuth2Session(client=BackendApplicationClient(client_id=client_id))
token = self._session.fetch_token(token_url=token_url, client_secret=client_secret, scope=scope)

After:

self._session = OAuth2Client(client_id, client_secret, scope=scope)
token = self._session.fetch_token(token_url, grant_type='client_credentials')

I validated this against our DynamicsCRM instance;
This means we can reach parity on the synchronous structure and I can look at a second PR for the async extension.

Second piece of great news - scope and resource are NOT in contention; it's by omitting resource in the first place that Microsoft decided to assume the resource we are requesting access to is Azure AD.
So - we can provide scope AND resource, meaning no funny "XOR" logic. I'll update the PR accordingly.

Client swap released in 0.6.0

Now comes the hard part - implementing async pattern. I'd like to share a design approach before trying to implement and get some feedback.

Here are my thoughts so far:

  • All I/O must convert to async
    • Cache use, if on disk
    • Initial auth token retrieval
    • HTTP interactions with Dynamics
  • Common async patterns
    • Create object -> connect() -> do work -> explicitly close()
    • async with (async context manager)
    • I wish to implement the ability to use either approach
  • Microsoft Azure typically publishes sync and async versions of their classes with the same name under a .aio subpackage

Any thoughts?
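To make the two usage patterns from the list concrete, here is a rough sketch in which one class supports both explicit connect()/close() and `async with`. `AsyncClientSketch` and its method names are assumptions for illustration only, and `asyncio.sleep(0)` stands in for real network I/O.

```python
import asyncio


class AsyncClientSketch:
    """Hypothetical client; the method names are assumptions."""

    def __init__(self) -> None:
        self.connected = False  # no I/O in the constructor

    async def connect(self) -> None:
        await asyncio.sleep(0)  # placeholder for token fetch / session setup
        self.connected = True

    async def close(self) -> None:
        await asyncio.sleep(0)  # placeholder for releasing the HTTP session
        self.connected = False

    async def __aenter__(self) -> "AsyncClientSketch":
        await self.connect()
        return self

    async def __aexit__(self, exc_type: object, exc: object, tb: object) -> None:
        await self.close()


async def explicit_style() -> bool:
    # Pattern 1: create object -> connect() -> do work -> explicitly close()
    client = AsyncClientSketch()
    await client.connect()
    try:
        return client.connected
    finally:
        await client.close()


async def context_manager_style() -> bool:
    # Pattern 2: async context manager, which wraps the same connect()/close() calls
    async with AsyncClientSketch() as client:
        return client.connected
```

Because `__aenter__` and `__aexit__` only delegate to `connect()` and `close()`, supporting both styles costs very little extra code.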

Here are my thoughts so far:

Publishing the async version under a submodule seems like a great idea! I would suggest that we convert client.py to a client-module, with a folder structure like this (names are tentative):

dynamics/
├─ client/
│  ├─ __init__.py
│  ├─ base.py
│  ├─ async.py
│  ├─ sync.py

__init__.py would import the sync client from sync.py so that from dynamics.client import DynamicsClient is backwards compatible.

In base.py, have a BaseDynamicsClient abstract base class, and in sync.py and async.py the sync and async implementations respectively.

The base class should probably use dependency injection to inject the sync and async handlers for IO, e.g. OAuth2Client/AsyncOAuth2Client. Abstract methods would include things like the HTTP methods (get, post, patch, delete), token caching methods (get_token, set_token), and the client token fetching (_init_client).

Also, if we want to delay token-fetching until the client is actually used, _init_client should be called inside the HTTP methods. I think this could be made the only way the client works from now on, since implementing the async version might be harder if we leave _init_client inside __init__ like it is now.

Looking at the OAuth2Client implementation, it has a mechanism for refreshing the access token before it makes requests (OAuth2Client.ensure_active_token(token: OAuth2Token)), which includes an update hook (OAuth2Client.update_token(token: OAuth2Token, access_token: str)) that can be set on init, e.g. OAuth2Client(client_id, client_secret, scope=scope, update_token=self.set_token). set_token would need to be changed to accept the access_token, or just **kwargs.

_init_client could be named _ensure_token, and it would try things in this order:

  1. Check if we have a token in the _oauth_client.token. If we do, we should have it in cache too. Return early.
  2. Check if we have a token in cache via get_token(). If we do, set it to _oauth_client.token. Return early.
  3. Invoke _oauth_client.fetch_token(), cache the token with set_token() and save it to _oauth_client.token.
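The three steps above can be sketched roughly as follows. This is only an illustration: `TokenFlowSketch` is a hypothetical class, a plain dict stands in for the persistent cache, `client_token` stands in for `_oauth_client.token`, and the injected `fetch` callable stands in for `_oauth_client.fetch_token()`. The method returns the name of the step that supplied the token purely to make the order visible.

```python
from typing import Callable, Dict, Optional


class TokenFlowSketch:
    """Hypothetical illustration of the proposed _ensure_token order."""

    def __init__(self, fetch: Callable[[], str], cache: Dict[str, str]) -> None:
        self._fetch = fetch                       # stands in for _oauth_client.fetch_token()
        self._cache = cache                       # stands in for the persistent cache
        self.client_token: Optional[str] = None   # stands in for _oauth_client.token

    def get_token(self) -> Optional[str]:
        return self._cache.get("token")

    def set_token(self, token: str) -> None:
        self._cache["token"] = token

    def _ensure_token(self) -> str:
        # 1. Token already on the OAuth client: nothing to do.
        if self.client_token is not None:
            return "client"
        # 2. Token found in the cache: copy it onto the OAuth client.
        cached = self.get_token()
        if cached is not None:
            self.client_token = cached
            return "cache"
        # 3. Otherwise fetch a new token, cache it, and store it on the client.
        token = self._fetch()
        self.set_token(token)
        self.client_token = token
        return "fetched"
```

A second client that shares the same cache would stop at step 2, which is the case the disk cache is meant to serve.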

Lastly, DynamicsClient already has the create_task method in the sync client to enable creating asyncio.Task objects, and uses async with to create an asyncio.TaskGroup for handling them (python >=3.11). Leaving these in the sync client might be confusing for users, so they should be moved to the async client. The sync client can still have the functionality for now, but it should issue a warning that it will be removed in the future.
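One common way to issue such a warning is the standard library's `warnings.warn` with `DeprecationWarning`. The sketch below uses a hypothetical `SyncClientSketch` class and a simplified `create_task` signature; the real method's signature and the warning text are not taken from the library.

```python
import warnings


class SyncClientSketch:
    """Hypothetical sync client keeping a deprecated method for now."""

    def create_task(self, name: str) -> str:
        warnings.warn(
            "create_task on the sync client is deprecated; use the async client instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this method
        )
        # The existing behaviour would continue here unchanged.
        return f"task:{name}"
```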

There is likely something I didn't consider, let me know your thoughts!

I'm thinking the shared base class could contain the implementation for all the paging, metadata handling etc but wouldn't have abstract function definitions (since I don't think you can have a single @abc for a sync + async flavour of the same function anyway). These sorts of operations are in-memory/processing and not I/O intensive so they can be shared by the two IO flavours.

I like the ensure_token concept, and I would add behavioural flags for initialization as well as for cache use going forward.

Finally, I agree it's a good idea to drop any existing async affordances baked into the 'sync' class flavour and instead provide two clear and separate interaction models.

Yes, the base class should slice the HTTP methods so that it can do paging, json conversion, error handling, etc, and only abstract the actual IO part.
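As a rough illustration of that split, the sketch below keeps the in-memory processing in a shared base and leaves only the request itself to each flavour. All class names and the `FAKE_PAGES` data are hypothetical; the payload shape (rows under `value`, next page under `@odata.nextLink`) follows the usual OData paging convention and is assumed here rather than taken from the library's code.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple

Page = Dict[str, Any]

# Fake two-page response keyed by URL, standing in for the real API.
FAKE_PAGES: Dict[str, Page] = {
    "page1": {"value": [{"id": 1}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"id": 2}]},
}


class BaseSketch:
    """Shared, I/O-free logic: pull rows and the next link out of a page."""

    @staticmethod
    def _process(page: Page) -> Tuple[List[Dict[str, Any]], Optional[str]]:
        return page.get("value", []), page.get("@odata.nextLink")


class SyncSketch(BaseSketch):
    def _request(self, url: str) -> Page:
        return FAKE_PAGES[url]  # placeholder for a blocking HTTP call

    def get(self, url: str) -> List[Dict[str, Any]]:
        rows: List[Dict[str, Any]] = []
        next_url: Optional[str] = url
        while next_url is not None:
            items, next_url = self._process(self._request(next_url))
            rows.extend(items)
        return rows


class AsyncSketch(BaseSketch):
    async def _request(self, url: str) -> Page:
        await asyncio.sleep(0)  # placeholder for an awaited HTTP call
        return FAKE_PAGES[url]

    async def get(self, url: str) -> List[Dict[str, Any]]:
        rows: List[Dict[str, Any]] = []
        next_url: Optional[str] = url
        while next_url is not None:
            items, next_url = self._process(await self._request(next_url))
            rows.extend(items)
        return rows
```

The paging loop still appears twice because one version awaits and the other does not; only `_process` is truly shared, which is the trade-off being discussed in this thread.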

I made a small example to confirm that the ABC can be used for both sync and async versions, since I wasn't too sure myself. The code is typed correctly according to mypy as well.

import asyncio
from abc import ABC, abstractmethod
from typing import Optional, List, Dict, Coroutine, Any, Union


class Base(ABC):

    @abstractmethod
    def get(self, *, not_found_ok: bool = False, query: Optional[str] = None) -> Union[List[Dict[str, Any]], Coroutine[Any, Any, List[Dict[str, Any]]]]:
        raise NotImplementedError


class Sync(Base):

    def get(self, *, not_found_ok: bool = False, query: Optional[str] = None) -> List[Dict[str, Any]]:
        return [{"foo": 1}]


class Async(Base):

    async def get(self, *, not_found_ok: bool = False, query: Optional[str] = None) -> List[Dict[str, Any]]:
        return [{"bar": 2}]


a = Sync()
b = Async()

print(a.get())  # [{"foo": 1}]
print(asyncio.run(b.get()))  # [{"bar": 2}]

I'm open for behavioural flags, though I'm not sure how you would implement it, since _ensure_token would need to be a coroutine in the async client, since you'd need to call await self._oauth_client.fetch_token() inside to fetch the token when needed. Examples below.

class Async:

    def __init__(self):
        await self._ensure_token()  # SyntaxError: 'await' outside async function

    async def _ensure_token(self):
        ...


class Async:

    async def __init__(self):  # TypeError: __init__() should return None, not 'coroutine'
        await self._ensure_token()

    async def _ensure_token(self):
        ...
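One common workaround for both errors, sketched here under assumptions, is to keep `__init__` free of awaits and offer an async factory classmethod for callers who want the token fetched up front. `AsyncInitSketch` and `create` are hypothetical names, and `asyncio.sleep(0)` stands in for the awaited token request.

```python
import asyncio
from typing import Optional


class AsyncInitSketch:
    """Hypothetical async client with an optional eager factory."""

    def __init__(self) -> None:
        self.token: Optional[str] = None  # plain attribute setup, nothing awaited

    @classmethod
    async def create(cls) -> "AsyncInitSketch":
        # Eager flavour: build the object, then await the token fetch.
        self = cls()
        await self._ensure_token()
        return self

    async def _ensure_token(self) -> None:
        if self.token is None:
            await asyncio.sleep(0)  # placeholder for the awaited token request
            self.token = "token"
```

Calling `AsyncInitSketch()` directly gives the deferred behaviour, while `await AsyncInitSketch.create()` gives the eager one, so the behaviour flag becomes a choice of constructor rather than a parameter.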

I had a bit of time so I made a WIP version of this to the async-client branch. Feel free to continue from it or reference it.

Thanks Matti - sorry I fell off the face of the earth, this took a backseat while other engineering priorities jumped up. I'm tenacious, I won't abandon this.

Based on my first view, here are my comments:
First, I would remove the @abc on the shared i/o functions, as the function signatures are different (returns a value versus returns an awaitable, which is what I understand the async keyword really gets at).
Instead, I would not define these functions at all in base and only define the sync and the async versions of get, post, delete, put, get_token, etc in their concrete classes.

I would go even further and attempt to pare base down to the logic of how to handle responses, parse dict contents etc, and fully divest the I/O to each class flavour (including the OAuthClient class etc).

One thing I would want to know is how sensitive we are to init signature backward compatibility, i.e. are you okay with changing the DynamicsClient(...) init signature and order of params? Or do you strongly prefer that we keep them the same so that previous library users don't get clobbered? I ask because the client id & secret are sandwiched in the middle of the init params (in args order), and they are only used during the construction of the OAuth client (construction which I would take out of the base class). That order is also referenced by the from_environment call (when cls init is invoked).

I guess what I'm saying is I would make further structural changes if you approved but I don't want to jump ahead if it goes counter to your vision; let me know what you think of my comments above

And finally, thank you for taking the time to seed this code.

Craig

> First, I would remove the @abc on the shared i/o functions...

I would advise against that. The meaning behind the abstract methods is that subclasses should implement them to function with the abstract class, i.e. get, post, patch, delete, etc. should be implemented by sync and async implementations. The difference in typing methods vs coroutines is not ideal, but it's still better than not having any expected typing.

> I would go even further and attempt to pare base down...

I believe my proposed implementation does this. Can you elaborate on what you mean by this?

> One thing I would want to know is how sensitive we are to init signature backward compatibility...

I'd like to keep the client interface as backwards compatible as possible, but if there is good reason to change it then we can change it. Changing the order of the params is not a good enough reason to break backwards compatibility in my opinion, although I mainly use the from_environment class method to construct instances, since it's the most convenient way to do it.

In my implementation, the client is injected to the class with a class variable (required with an abstract property), while the init signature remains the same. Still, if you think you have a good reason that would improve usability, maintainability, extensibility, etc. of the code somehow, I'm open to reconsider my opinion.

I'm happy to stick with your intended implementation approach - I'll explain my thinking below.

I ask myself - what is the intended purpose for the base class?
Nobody is actually importing the base class, so I see it purely as a way to avoid replicating unnecessary code across the sync and aio class definition instances.
In addition, the implementation is fairly stateful (~20 self params) so we disturb fewer existing tests etc by keeping the current class approach; to make it more 'functional' would require keeping, manipulating, and passing around some kind of state object.

> I would go even further and attempt to pare base down...
>
> I believe my proposed implementation does this. Can you elaborate on what you mean by this?

This was in concert with the suggestion that i/o functions be defined in the sync/async only. I would have wanted to excise all the Oauth client information etc from the base class and purely focus on the logical processing of the dicts returned by dynamics.

Look at base.__init__() - it's fully portable, with the exception of the instantiation of the (Async)OAuthClient.
sync.DynamicsClient and aio.DynamicsClient could call super().__init__() to leverage that base init.

I would move the instantiation of the oauth_client to _ensure_token(), which would require storing the client id/secret in the class to support deferred instantiation.
Alternatively, the sync/aio __init__() would instantiate the client themselves if we don't want to carry the client id/secret information in the class.

Depending on how we approach the above refactor, it could affect the cls() init call params (whether we continue to pass the client id/secret to the base init fn).

In conclusion - I am content to follow your lead on the overall organization and approach to this module.

Unless you comment to the contrary, I'll assume I should move forward following along with the current design + approach.

I see you've been looking at this a little differently, but I see your point now. My concern was to make BaseDynamicsClient a "blueprint" for implementations with different HTTP libraries, if someone wanted to use some other library in the future. However, I think keeping the OAuthClient in the base class runs counter to this, and moving it to the sync/async classes could allow for other implementations that don't use a class-based OAuthClient.

I think if we wanted to move all OAuth information out of the base class, we'd also move all __init__ code to the implementations, and leave the query building stuff to the base class. Maybe break down base even more, so it's just a "QueryBuilder", and move all HTTP logic to the child classes. Well, maybe there should be an additional "ResponseHandler" mixin to reduce code duplication on things like pagination and response processing.

So in conclusion, I approve your suggestion to move the OAuthClient from the base class to the implementations and the overall direction you've outlined in your previous comment. Go ahead with the refactor, unless you have additional points to discuss 👍

Great! I'm looking into it now.

One other question for now - are you married to this form of caching? I'd prefer to use a cache decorator and not write to disk at all; failing that, would it be okay to make the disk-dependent caching an option?

The main reason for disk-dependent caching is multi-process environments (e.g. webservers, serverless), where each process would need to request its own token if we relied on just in-process memory for storing it, like functools.lru_cache does. Storing the token in cache is ultimately faster than making separate token API requests per-process, and these types of environments are pretty common, so it makes sense to me to keep this as the default behavior. Users can always choose to implement process-based storing with the get_token and set_token methods, if they so wish.

Edit: Serverless would need to use Django's cache, since you'd need a distributed cache there.

I've got a first draft prepared:

https://github.com/FortressEngineering/dynamics-client-allbranches/tree/async-client

Major changes:

  • split folders into common, sync, and aio
  • created a query_context object that handles the processing logic (note: this could open up the possibility of having a client maintain a suite of query contexts they can run and reset independently)
  • added an opt-in config option for non-file-based caching

I tried to run this async, but it hangs when running async with sqlite cache (but doesn't hang on running with the new simple cache).. I have some troubleshooting to do there.

I'd love to hear your feedback on this so far.

(Testing will be next..)

I don't have too much time right now to go over it, but I'll give a few thoughts:

  • I don't think you need to change the folder structure this radically. It will cause unnecessary backwards compatibility problems for people upgrading from previous versions. Try to minimize changes in this regard.
  • You don't need separate implementations for the actions and function classes, as those will return coroutines when used inside async implementations, which can then be awaited. See my test implementation.
  • You don't need py.typed inside subdirectories
  • There is a lot of duplication of code between DynamicsClientBase and QueryContext. The former should probably inherit from the latter. If you need multiple contexts, I think you should use multiple instances of the client.

In my implementation, I had issues with SQLite being locked due to connections being left open. I did manage to solve the issue for my test cases by checking whether there are open connections for a given thread and closing them before trying to open new ones, but this is not ideal. It seems like SQLite might not be the best choice for async operations due to its limited concurrency options.
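For comparison, one alternative way to keep SQLite usable from async code is to run each short-lived query in a worker thread and always close the connection before returning. This is a sketch of a different technique, not the fix described above: the table layout and the function names `get_token`/`set_token` with a `path` argument are assumptions, and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import sqlite3
from contextlib import closing
from typing import Optional

_SCHEMA = "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"


def _read_token(path: str) -> Optional[str]:
    # closing() guarantees the connection is closed even if a query raises.
    with closing(sqlite3.connect(path)) as conn:
        conn.execute(_SCHEMA)
        row = conn.execute("SELECT value FROM cache WHERE key = 'token'").fetchone()
        return row[0] if row else None


def _write_token(path: str, token: str) -> None:
    with closing(sqlite3.connect(path)) as conn:
        conn.execute(_SCHEMA)
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES ('token', ?)", (token,)
        )
        conn.commit()


async def get_token(path: str) -> Optional[str]:
    # The blocking SQLite work runs in a worker thread, so the event loop stays free.
    return await asyncio.to_thread(_read_token, path)


async def set_token(path: str, token: str) -> None:
    await asyncio.to_thread(_write_token, path, token)
```

Each call opens and closes its own connection inside the same thread, so no connection outlives the call that created it. The cost is one connection setup per cache access.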

use_disk_for_token_cache is a good addition, though I think the name could be changed to something like use_persistant_cache to reflect that the token will outlive the client.

Thanks;

  • the base package exports remain the same (DynamicsClient, ftr, apl) with the additional ability to import a DynamicsClient from .aio; aren't any other changes simply internal housekeeping?
  • actions and functions are directly returned in sync contexts and awaited in async contexts; I would still argue to keep their I/O implementations separate (however the underlying prep logic is not duplicated)
  • I'll remove the py.typed duplicates - this was my mistake as I've never used this hint directly and misread its requirements.
  • That's a good idea, I will do that
  • I'll rename the flag as requested

How would you feel about a NotImplementedError if you try to access a persistent async cache and the django implementation fails?

> the base package exports remain the same (DynamicsClient, ftr, apl) with the additional ability to import a DynamicsClient from .aio; aren't any other changes simply internal housekeeping?

Not necessarily. While I agree that the client and its helper classes are probably the most used components, the library also contains other tools and definitions, like the datetime utils, normalizers, enums, exceptions, etc. that users would import from the submodules directly. Moving these would break existing code and create a bad library update experience for not much gain. Also, following python's conventions, these are public modules, with private modules being prefixed by an underscore, so users are not expecting them to change without warning.

> actions and functions are directly returned in sync contexts and awaited in async contexts; I would still argue to keep their I/O implementations separate (however the underlying prep logic is not duplicated)

What I'm saying is that you don't need to await the I/O coroutines of client.get/post/patch/delete inside the functions class. See the sample code below. Functions class' method function is not a coroutine, but it returns the Client's coroutine get, which is awaited in the main coroutine. This way you don't need separate implementations for sync and async Functions, increasing maintainability.

import asyncio

class Functions:
    def __get__(self, instance, owner):
        self.client = instance
        return self

    def function(self):
        return self.client.get()

class Client:
    functions = Functions()

    async def get(self):
        return "foo"

client = Client()

async def main():
    coroutine = client.functions.function()
    print(coroutine)
    result = await coroutine
    print(result)

asyncio.run(main())

> How would you feel about a NotImplementedError if you try to access a persistent async cache and the django implementation fails?

This is an interesting idea. I think you could leave it like this for your part of the implementation, but I might go and implement it later for completeness. It makes the PR a little smaller as well.

So I had some time to work on this and merged my implementation of the sync/async client split. I think I resolved all the issues with the sqlite connections not being closed correctly when a sync connection in the same thread is opened after an async thread has already been opened. I didn't implement any non-disk based caching, but if it's still required you can open another issue for it. Released in 0.7.0.