psf / requests

A simple, yet elegant, HTTP library.

Home Page: https://requests.readthedocs.io/en/latest/

POST a Multipart-Encoded File with streaming

robi-wan opened this issue

We have a web application which accepts (large) files together with some metadata.
We built a form for uploading this kind of file, and then we added a REST interface to this form.
I built an upload task in our fabfile which essentially does this:

    with open(filename, 'rb') as f:
        response = requests.post(url, data=values, files={'file': f})

It seems that streaming does not work for multipart-encoded POST requests, because I get this error (since our files exceeded 128 MB in size):

Upload artifact 'artifact.tar.gz' (group, SNAPSHOT) running on http://<ip> on behalf of user 'fabric'.
2aeaa77d0ac917a28e15ec73fe92e060 *artifact.tar.gz
Traceback (most recent call last):
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\fabric\main.py", line 743, in main
    *args, **kwargs
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\fabric\tasks.py", line 405, in execute
    results['<local-only>'] = task.run(*args, **new_kwargs)
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\fabric\tasks.py", line 171, in run
    return self.wrapped(*args, **kwargs)
  File "C:\development\work\avatar_herbsting_2013\fabfile.py", line 480, in upload
    response = upload_artifact(**data)
  File "C:\development\work\avatar_herbsting_2013\scripts\fabfile\deploy.py", line 106, in upload_artifact
    response = requests.post(url, data=values, files={'file': f})
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\requests\api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\requests\api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\requests\sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\requests\sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "C:\Documents and Settings\Administrator\.virtualenvs\avatar\lib\site-packages\requests\adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='<ip>', port=80): Max retries exceeded with url: /deliver/upload/ (Caused by <class 'socket.error'>: [Errno 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full)

Happens on Windows XP SP3 32bit with 2 GB RAM and Windows Server 2003 R2 SP2 with 3 GB RAM.
Python 2.7.5 32bit.
requests 1.2.3

Full code (filename contains the path to the large file I want to upload):

import urlparse  # Python 2; on Python 3 use urllib.parse

import requests


def upload_artifact(address, filename, version, group, username):
    """Upload an artifact."""
    path = 'deliver/upload/'
    url = urlparse.urljoin(address, path)

    # get id for group
    url_group_id = urlparse.urljoin(address, 'deliver/groupidbyname/?groupname={}'.format(group))
    response = requests.get(url_group_id)
    group_id = response.text

    # upload file
    values = {'md5hash': md5sum(filename),
              'group': group_id,
              'version': version,
              'username': username,
              }

    with open(filename, 'rb') as f:
        response = requests.post(url, data=values, files={'file': f})

    return response
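The `md5sum` helper called above is not shown in this thread. A minimal sketch of what it might look like, assuming it returns the hex MD5 digest of the file and reads it in chunks so that a large artifact is never held in memory at once (the name, signature, and chunk size are assumptions):

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```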

Is there a way to enable streaming of the large file for this case?

Hi @robi-wan, thanks for raising this issue!

Firstly, I'll address the question you actually asked. Currently Requests does not support streaming multipart-encoded files. If you want to do this you'll need to provide a file-like-object wrapper for your file that does the multipart encoding itself, and then pass that to the data object as described here.
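To illustrate the idea of doing the multipart encoding yourself, here is a minimal sketch. It uses a generator rather than a file-like wrapper, it is written for Python 3, and it is not part of Requests. The helper name `multipart_stream` and its parameters are hypothetical, and it assumes field names and the filename need no escaping.

```python
import uuid


def multipart_stream(fields, file_field, filename, fileobj, chunk_size=64 * 1024):
    """Return (content_type, body_iterator) for a multipart/form-data body.

    Hypothetical helper: names are assumed to need no quoting or escaping.
    """
    boundary = uuid.uuid4().hex
    content_type = "multipart/form-data; boundary=" + boundary

    def body():
        # Plain form fields first, one part each.
        for name, value in fields.items():
            yield ("--%s\r\n" % boundary).encode("ascii")
            yield ('Content-Disposition: form-data; name="%s"\r\n\r\n'
                   % name).encode("utf-8")
            yield ("%s\r\n" % value).encode("utf-8")
        # The file part, read and yielded in fixed-size chunks.
        yield ("--%s\r\n" % boundary).encode("ascii")
        yield ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
               % (file_field, filename)).encode("utf-8")
        yield b"Content-Type: application/octet-stream\r\n\r\n"
        for chunk in iter(lambda: fileobj.read(chunk_size), b""):
            yield chunk
        # Closing delimiter that ends the whole body.
        yield ("\r\n--%s--\r\n" % boundary).encode("ascii")

    return content_type, body()
```

The pair could then be passed along the lines of `requests.post(url, data=body, headers={'Content-Type': content_type})`. Because an iterator has no length, Requests would send it with chunked transfer encoding, so this only helps if the receiving server accepts chunked request bodies.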

Secondly, I'll address your actual problem. Your specific error is the Winsock error WSAENOBUFS. It should not be easily possible to hit this error in Requests because we use blocking sockets, which ought to block until there is sufficient buffer space available. You don't appear to be running out of memory in your process, so I don't think the file size itself has anything to do with this problem.

I'm going to take an educated guess and say that you're running out of ephemeral ports. By default, Windows only exposes 5000 ephemeral ports: sufficiently many long-running uploads could exhaust the supply and cause this error. Does that sound possible in your case? If so, take a look here.

Hi @Lukasa and @robi-wan, I'm delighted to see this question and to find it already answered. I have just hit the same issue with missing streaming functionality for multipart file uploads.

I suggest you look at the poster module. I used it to implement the upload in a script that otherwise uses requests. I implemented the whole upload request with poster and urllib2, but it might also be possible to use it to prepare the file-like data argument for requests.

Support for this kind of streaming upload was a prime reason for choosing requests, so I was disappointed to hit the NotImplementedError raised by PreparedRequest.prepare_body when both the files and the data arguments are provided. This could be made clearer in the documentation.

I agree that we could better document this behaviour. =)

I'm also wondering whether it's worth having a semi-official Requests-y way of better handling complex file behaviours.

Hi @Lukasa, thanks for your quick response. I read the Microsoft Knowledge Base article and tried the suggested solution, without success.

@avallen In the meantime I found poster and used it to solve this problem:

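import urllib2  # Python 2 only
import urlparse  # Python 2 only

import poster.encode
import poster.streaminghttp
import requests


def upload_artifact(address, filename, version, group, username):
    """Upload an artifact."""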
    path = 'deliver/upload/'
    url = urlparse.urljoin(address, path)

    # get id for group
    url_group_id = urlparse.urljoin(address, 'deliver/groupidbyname/?groupname={}'.format(group))
    response = requests.get(url_group_id)
    group_id = response.text

    # upload file
    values = {'md5hash': md5sum(filename),
              'group': group_id,
              'version': version,
              'username': username,
              'file': open(filename, 'rb')
              }

    # Register the streaming http handlers with urllib2
    poster.streaminghttp.register_openers()

    # Start the multipart/form-data encoding of the file filename.
    # headers contains the necessary Content-Type and Content-Length
    # datagen is a generator object that yields the encoded parameters
    datagen, headers = poster.encode.multipart_encode(values)
    # Create the Request object
    request = urllib2.Request(url, datagen, headers)
    resp = None
    try:
        # Actually do the request, and get the response
        resp = urllib2.urlopen(request)
    except urllib2.HTTPError as error:
        print(error)
        print(error.fp.read())

    return resp

This works... but I would like to use requests for tasks like this.

It's difficult to answer that question because we don't know what you're trying to do. What are you hoping to achieve?

@bernardolima how is your question relevant to this issue? Your question should be asked on StackOverflow. I have a strong suspicion as to why your POST is not working. I'll answer you either on StackOverflow or by email (if you choose to email me privately).

@sigmavirus24 sorry, you're right, I will email you, if you don't mind.
Thank you very much.

I don't mind. That's why I suggested it. 😉

I think the semi-official way to do this is to use sigmavirus24/requests-toolbelt, so I'm going to close this now. =)
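For readers landing here, a minimal sketch of what a streaming upload with requests-toolbelt might look like, using its `MultipartEncoder`. The function names, the `'file'` field name, and the `application/octet-stream` content type are assumptions for illustration, and both `requests` and `requests_toolbelt` must be installed for the upload itself to run.

```python
import os


def build_fields(values, filename, fileobj):
    """Build the `fields` mapping for MultipartEncoder (hypothetical helper)."""
    # MultipartEncoder expects plain field values to be strings.
    fields = dict((key, str(value)) for key, value in values.items())
    # A file part is given as a (filename, file object, content type) tuple.
    fields["file"] = (os.path.basename(filename), fileobj, "application/octet-stream")
    return fields


def upload_artifact_streaming(url, values, filename):
    """POST `values` plus the file at `filename` without loading it into memory."""
    # Imported here so the sketch can be loaded without the third-party packages.
    import requests
    from requests_toolbelt.multipart.encoder import MultipartEncoder

    with open(filename, "rb") as f:
        encoder = MultipartEncoder(fields=build_fields(values, filename, f))
        # The encoder generates the boundary, so the header must come from it.
        return requests.post(url, data=encoder,
                             headers={"Content-Type": encoder.content_type})
```

The encoder is passed as `data`, not `files`, so Requests reads the multipart body from it piece by piece instead of building it all in memory first.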

"semi-official" is very accurate.

Maybe it's faster than Django, but docs ...

from vibora import Vibora, JsonResponse
Traceback (most recent call last):
. . .
ImportError: cannot import name 'JsonResponse'

And Websockets same! Sad

@Cyxapic wrong repo =)