graphql-python / gql

A GraphQL client in Python

Home Page: https://gql.readthedocs.io


Streaming file upload is not working

myscfwork opened this issue · comments

from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport
import aiofiles


gql_query = gql('''
  mutation create($input: ProductDocumentCreateMutationInput) {
    createProductDocument(input: $input) {
      productDocument {
        id
      }
    }
  }

  query getProductDoc($id: ID!) {
    productDocument(id: $id) {
      id
      file_category
      file
    }
  }
''')

async def stream_file(filepath):
    async with aiofiles.open(filepath, "rb") as f:
        while True:
            chunk = await f.read(64 * 1024)
            if not chunk:
                break
            yield chunk

transport = AIOHTTPTransport(
    url='url',
    headers={"Authorization": f"Bearer {auth_token}"},
)

async with Client(transport=transport, fetch_schema_from_transport=True) as gql_session:
    filepath = 'path/to/product_doc.pdf'
    data = {
        "user": 'user_id',
        "file_category": 'product_doc',
        "file": stream_file(filepath),
    }

    result = await gql_session.execute(
        gql_query,
        operation_name="create",
        variable_values={"input": data},
        upload_files=True,
    )

In the above I am trying to stream a file upload as shown in the gql documentation, but I get the following error:

[ERROR] Exception: Failed to upload {'message': 'Must provide query string.'}
Traceback (most recent call last):
  File "file_upload.py", line 143, in main
    result = await gql_session.execute(
  File "/usr/local/lib/python3.10/site-packages/gql/client.py", line 1231, in execute
    raise TransportQueryError(
gql.transport.exceptions.TransportQueryError: {'message': 'Must provide query string.'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "file_upload.py", line 148, in main
    raise Exception(f"Failed to upload {exp}")
Exception: Failed to upload {'message': 'Must provide query string.'}

System info:

  • OS: Ubuntu 22.04
  • Python version: 3.10
  • gql version: 3.4
  • graphql-core version: 3.2.3
  • Does it work with a normal file upload without streaming?
  • Could you try with only the create mutation inside the gql method call?
  • Please post the relevant part of the schema (ProductDocumentation, productDocumentation, ...)
  • Please post the debug logs
  • Please try expanding the input argument in the mutation. Something like this:
  mutation create($file: Upload!, $file_category: String, $user_id: ID) {
    productDocumentation(input: {file: $file, user: $user_id, file_category: $file_category}) {
      id
    }
  }

I have enabled debug logs. Expanding the input argument does not work because the create mutation only accepts input as ProductDocumentCreateMutationInput. I have shared the schema below. Also, the file upload works when only the create mutation is used inside the gql method without streaming, but it fails when multiple queries are used inside the gql call or when streaming is used.

So I tried following 3 things:

  • File upload works when a single query is used without a streaming file upload:
gql_query = gql('''
  mutation create($input: ProductDocumentCreateMutationInput) {
    createProductDocument(input: $input) {
      productDocument {
        id
      }
    }
  }
''')

file = io.BytesIO(open(filepath, "rb").read())
file.name = attachment.name
data = {
    "user": 'user_id',
    "file_category": 'product_doc',
    "file": file,
}

result = await gql_session.execute(
    gql_query,
    operation_name="create",
    variable_values={"input": data},
    upload_files=True,
)


Log below:

operations {"query": "mutation create($input: ProductDocumentCreateMutationInput!) {\n  createProductDocument(input: $input) {\n    errors {\n      field\n      message\n    }\n    productDocument {\n      id\n }\n  }\n}", "operationName": "create", "variables": {"input": {"user": "UmRUZXN0VHlwZTplZTQ1ZjNhMC0zNWM5LTRjMWUtOTZjZS1kYjExNjRhMjIxN2U=", "file_category": "product_doc", "file": null}}}
04:57:56file_map {"0": ["variables.input.file"]}
04:57:56<<< {"data":{"createProductDocument":{"errors":[],"productDocument":{"id":"UmRUZXN0QXR0YWNobWVudFR5cGU6MTJmOTVjYTAtMjNkYS00NjYwLThhZDAtNGFhNGE4OWRkOTc5"}}}}
  • When multiple queries are added and the same request is made as above, it gives an error:
gql_query = gql('''
  mutation create($input: ProductDocumentCreateMutationInput) {
    createProductDocument(input: $input) {
      productDocument {
        id
      }
    }
  }

  # added another query
  query getProductDoc($id: ID!) {
    productDocument(id: $id) {
      id
      file_category
      file
    }
  }
''')

Error Logs below:

operations {"query": "mutation create($input: ProductDocumentCreateMutationInput!) {\n  createProductDocument(input: $input) {\n    errors {\n      field\n      message\n    }\n    productDocument {\n      id\n    }\n  }\n}\n\nquery productDocument($id: ID!) {\n  productDocument(id: $id) {\n    id\n  }\n}", "operationName": "create", "variables": {"input": {"user": "UmRUZXN0VHlwZTplZTQ1ZjNhMC0zNWM5LTRjMWUtOTZjZS1kYjExNjRhMjIxN2U=", "file_category": "product_doc", "file": null}}}
file_map {"0": ["variables.input.file"]}
<<< {"errors":[{"message":"Must provide operation name if query contains multiple operations."}]}
Closing transport
[ERROR] Exception: Failed to upload {'message': 'Must provide operation name if query contains multiple operations.'}
Traceback (most recent call last):
  File "file_upload.py", line 145, in main
    result = await gql_session.execute(
  File "/usr/local/lib/python3.10/site-packages/gql/client.py", line 1231, in execute
    raise TransportQueryError(
gql.transport.exceptions.TransportQueryError: {'message': 'Must provide operation name if query contains multiple operations.'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "file_upload.py", line 150, in main
    raise Exception(f"Failed to upload {exp}")
Exception: Failed to upload {'message': 'Must provide operation name if query contains multiple operations.'}
  • When only the create mutation query and streaming file upload are used:
gql_query = gql('''
  mutation create($input: ProductDocumentCreateMutationInput) {
    createProductDocument(input: $input) {
      productDocument {
        id
      }
    }
  }
''')

Error Logs below:


operations {"query": "mutation create($input: ProductDocumentCreateMutationInput!) {\n  createProductDocument(input: $input) {\n    errors {\n      field\n      message\n    }\n    productDocument {\n      id\n    }\n  }\n}", "operationName": "create", "variables": {"input": {"user": "UmRUZXN0VHlwZTplZTQ1ZjNhMC0zNWM5LTRjMWUtOTZjZS1kYjExNjRhMjIxN2U=", "file_category": "product_doc", "file": null}}}
file_map {"0": ["variables.input.file"]}
<<< {"errors":[{"message":"Must provide query string."}]}
Closing transport
[ERROR] Exception: Failed to upload {'message': 'Must provide query string.'}
Traceback (most recent call last):
  File "file_upload.py", line 145, in main
    result = await gql_session.execute(
  File "/usr/local/lib/python3.10/site-packages/gql/client.py", line 1231, in execute
    raise TransportQueryError(
gql.transport.exceptions.TransportQueryError: {'message': 'Must provide query string.'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "file_upload.py", line 150, in main
    raise Exception(f"Failed to upload {exp}")
Exception: Failed to upload {'message': 'Must provide query string.'}

Relevant Schema:

input ProductDocumentCreateMutationInput {
    clientMutationId: String
    file: MultiFileScalar
    user: ID!
    file_category: ProductDocumentTypeChoices
}

Expanding the input argument does not work because create mutation only accepts input as ProductDocumentCreateMutationInput.

It should always be possible to expand input types into GraphQL basic types and scalars.
I don't think it's going to solve your problem, but you should be able to use a query like this:

  mutation create($file: MultiFileScalar, $file_category: ProductDocumentTypeChoices, $user_id: ID!) {
    createProductDocument(input: {file: $file, user: $user_id, file_category: $file_category}) {
      productDocument {
        id
      }
    }
  }

with variable_values now containing the variables directly instead of a single input variable.
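For illustration, here is a minimal sketch of how the variables would look with the expanded mutation (the user id and filename are placeholder values, not from a real backend):

```python
import io

# Placeholder in-memory file standing in for the real upload.
file_obj = io.BytesIO(b"example file contents")
file_obj.name = "product_doc.pdf"  # gql uses .name as the upload filename

# Instead of one nested {"input": {...}} variable, each field becomes
# its own top-level variable matching the expanded mutation's parameters.
flat_variables = {
    "user_id": "user_id",
    "file_category": "product_doc",
    "file": file_obj,
}

# The execute call would then pass the flat dict directly (sketch only):
# result = await gql_session.execute(
#     gql_query, variable_values=flat_variables, upload_files=True,
# )
print(sorted(flat_variables))
```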

That being said, your problem seems quite strange.

  • which backend are you using? Is it publicly available?
  • I noticed that the file scalar is named MultiFileScalar. What happens if you provide a list of files instead of a single file?

Could you share a working code for streaming file upload if you have?
aiohttp makes it possible to use FormData() and set a filename for the uploaded file:
https://docs.aiohttp.org/en/stable/client_quickstart.html#post-a-multipart-encoded-file

data = FormData()
data.add_field("user", 'user_id', content_type="multipart/form-data")
data.add_field("file", open("filepath", "rb"), filename="example.zip", content_type="multipart/form-data")

With gql, the following format works, but the filename is set to the whole file path, i.e. /home/username/.... Is it possible to set the filename in the following request?

data = {
    "user": 'user_id',
    "file_category": 'product_doc',
    "file": open('filepath', 'rb'),
}
result = await gql_session.execute(
    gql_query,
    operation_name="create",
    variable_values={"input": data},
    upload_files=True,
)

You can find some examples in the tests/test_aiohttp.py file. Search for upload_files to find the relevant tests.

You can run specific tests by running a pytest command like this:

pytest tests/test_aiohttp.py::test_aiohttp_file_upload -s

Could you share a working code for streaming file upload if you have?

Check out the test_aiohttp_async_generator_upload test.

aiohttp has the possibility to use FormData() and set filename for the uploaded file: https://docs.aiohttp.org/en/stable/client_quickstart.html#post-a-multipart-encoded-file

That is what we are doing. gql is open-source you know, you can check the code.

With gql, following format works but the filename is set as the whole file path i.e /home/username/.... Is it possible to set the filename in the following request?

gql uses the name attribute of the provided file object, if present, as the filename parameter.

I thought you could do something like:

f = open('filepath', 'rb')
f.name = "your_name"

data = {
    "user": 'user_id',
    "file_category": 'product_doc',
    "file":  f,
}

but in that case I got the error:

AttributeError: attribute 'name' of '_io.BufferedReader' objects is not writable

So one way to change the filename would be what you did above, even if it's a bit inefficient:

file = io.BytesIO(open(filepath, "rb").read())
file.name = "your_name"

In the case of the streaming uploads, we have the same kind of problem.
Doing something like this:

async_generator = file_sender(file_path)
async_generator.name = "your_name"

would generate the following error:

AttributeError: 'async_generator' object has no attribute 'name'
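This can be reproduced standalone; async generator objects simply don't accept new attributes (minimal sketch, no gql dependency):

```python
import asyncio

async def file_sender():
    # Stand-in for the real chunked file reader.
    yield b"chunk"

gen = file_sender()
try:
    gen.name = "your_name"  # async generators have no __dict__
    error = None
except AttributeError as exc:
    error = str(exc)  # matches the error quoted above

print(error)
asyncio.run(gen.aclose())  # tidy up the unconsumed generator
```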

We can get around this by making a new class inheriting from AsyncGenerator, but it's kind of hackish:

class NamedAsyncGenerator(collections.abc.AsyncGenerator):

    name = None
    inner_generator = None

    def __init__(self, inner_generator: collections.abc.AsyncGenerator, name=None):
        self.inner_generator = inner_generator
        self.name = name

    def asend(self, val):
        return self.inner_generator.asend(val)

    def athrow(self, typ, val=None, tb=None):
        return self.inner_generator.athrow(typ, val, tb)

that you would use like this:

async def file_sender(file_name):
    async with aiofiles.open(file_name, "rb") as f:
        chunk = await f.read(64 * 1024)
        while chunk:
            yield chunk
            chunk = await f.read(64 * 1024)

async_generator = file_sender(file_path)

named_async_generator = NamedAsyncGenerator(async_generator, "your_filename")

data = {
    "user": 'user_id',
    "file_category": 'product_doc',
    "file": named_async_generator,
}

I agree that it's not really clean and we should consider changing the interface to make this simpler.
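As a quick sanity check, the wrapper can be exercised without gql or aiofiles at all (the class is re-declared here so the snippet runs standalone; chunks come from an in-memory list instead of a file):

```python
import asyncio
import collections.abc

class NamedAsyncGenerator(collections.abc.AsyncGenerator):
    """Proxy that adds a .name attribute to an async generator."""

    def __init__(self, inner_generator, name=None):
        self.inner_generator = inner_generator
        self.name = name

    def asend(self, val):
        return self.inner_generator.asend(val)

    def athrow(self, typ, val=None, tb=None):
        return self.inner_generator.athrow(typ, val, tb)

async def chunk_sender(chunks):
    # Stand-in for the aiofiles-based file_sender above.
    for chunk in chunks:
        yield chunk

async def main():
    named = NamedAsyncGenerator(chunk_sender([b"abc", b"def"]), "your_filename")
    # The ABC supplies __aiter__/__anext__ in terms of asend, so
    # async-for works and every chunk is proxied through the wrapper.
    received = [chunk async for chunk in named]
    return named.name, received

name, received = asyncio.run(main())
print(name, received)  # your_filename [b'abc', b'def']
```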

@leszekhanusz FYI, your workaround gives the error
gql.transport.exceptions.TransportQueryError: {'message': 'Must provide query string.'}.

Whenever I add the generator "file": named_async_generator in the input parameter, it gives this error, but it works when I use it like this: "file": io.BytesIO(open(filepath, "rb").read()). Unfortunately I upload large files and need to use the generator, which is not working.

That error message is coming from the backend.
If you still think there is a problem with gql, either provide a public backend showing the issue, or add another test to gql showing the issue.