okigan / awscurl

curl-like access to AWS resources with AWS Signature Version 4 request signing.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PR #77 returns raw bytes as output, resulting output that is not parsable as json in current(0.19) release when using python 3

p5k6 opened this issue · comments

When using 0.19 awscurl with python 3, encoding response.text results in a byte object being returned. Printing that directly results in the output being prefixed with b' (see screenshots below), and thus json output cannot be parsed directly from the response. In Python 2 this is not an issue. I could also see this being an issue for (say) an xml based response as well (though I have not tested that).

My use case: I'm currently invoking awscurl in a unix pipe, where the result of awscurl is piped to jq for further processing (in CI); updating to the latest version broke the pipe. For the moment I've pinned the version in CI at 0.17

Python 3 screenshot

Screen Shot 2019-12-18 at 4 50 09 PM

Python 2 screenshot

Screen Shot 2019-12-18 at 4 50 47 PM

hat 'b' would be annoying - thanks for identifying the location of the problem

@p5k6 I this topic requires more info, horror stories about mercurial's journey concerns me (https://news.ycombinator.com/item?id=22037905) and I am not versed enough on pythons handling of this -- so need some help on what is more holistic approach (vs squashing the b"")

IMO, this change should be reverted. I can't speak to what is/was really going on with the contributor, but it seems that if the Content-Encoding header in the response was set correctly, that change would not be necessary because the requests.Response.text method is already trying to do the right thing. It states specifically in the docstring of the requests.Response.text method, that implementers should set the encoding value to something specific if the HTTP Content-Encoding header is not determining the correct encoding.

Also, this change is a little odd in general because you're taking a string that was converted from bytes (by requests.Response.text) and then converting it to bytes to send to the print function that converts everything to a string. This accidentally works in Python 2 (read up on Python 2/3 bytes and string differences) but breaks Python 3. The result is that this change breaks Python 3 to support Python 2 which is likely not intended.

FWIW, I don't think you have the same problems as they had with Mercurial. You're simply trying to print a string. Mercurial was trying to solve the problem for myriad interfaces. I think you can safely revert the change; ask the contributor for more information and if the Content-Encoding header is set properly. Also, an additional argument to awscurl that could be used to explicitly set the response encoding.

@p5k6 reading @marklap comment and reviewing your snapshot -- could it be the issue is with your content encoding that returned from your api gateway?

Also i've setup a small experiment here: https://repl.it/@okigan/py-string-test

@p5k6 please review/advise or we'll start moving to revert this

Just to clarify, @p5k6 identified the bug and @muziyoshiz is the contributor of the MR that is likely the root cause.

I bumped into this today. It would be great if the leading 'b' was removed from the output for curl compatibility and ease of use.

I can open a different ticket if needed, but is there a way to tell the version of awscurl installed? Today I ran brew install awscurl and awscurl -v:

$ awscurl -v
usage: awscurl [-h] [-v] [-i] [-X REQUEST] [-d DATA] [-H HEADER] [-k] [--data-binary] [--region REGION] [--profile PROFILE] [--service SERVICE]
               [--access_key ACCESS_KEY] [--secret_key SECRET_KEY] [--security_token SECURITY_TOKEN] [--session_token SESSION_TOKEN]
               uri
awscurl: error: the following arguments are required: uri

While I'm at it, my AWS CLI is broken after installing awscurl with brew. Kind of a bummer. I think it required a different version of python than I had on my system. Noted over here #87

@cyrfer did you run into that while calling a labmda/api gateway? what is the content type your endpoint is returning?

@okigan I was calling AppSync (like API GW, but for GraphQL).

In Postman I sign requests with service = appsync

Content-Type: application/json;charset=UTF-8

FYI - If you're following some other troubleshooting path then ignore this. If not, I'll offer this correction: it's Content-Encoding that changes the way the response body is handled with respect to this issue. Of course, Content-Type may also but not for the purposes of this discussion.

I am also running into this issue. Any plans on resolving this?

Sure,

We have a pretty straight forward setup:
Setup using Serverless Framework using API Gateway to invoke a Lambda that has aws_iam authentication set up. The lambda returns a json document.

In case it helps, my repro:

  1. Running on Debian testing w/ python2 and 3 installed:
$ python3 --version
Python 3.8.5
  1. Installing awscurl:
$ pip3 install --user awscurl
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Collecting awscurl
  Downloading awscurl-0.20-py3-none-any.whl (8.3 kB)
Requirement already satisfied: urllib3[secure] in /usr/lib/python3/dist-packages (from awscurl) (1.25.9)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from awscurl) (2.23.0)
Collecting configparser
  Downloading configparser-5.0.0-py3-none-any.whl (22 kB) 
Collecting configargparse
  Downloading ConfigArgParse-1.2.3.tar.gz (42 kB)
     |████████████████████████████████| 42 kB 765 kB/s 
Requirement already satisfied: certifi in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (2020.4.5.1)
Requirement already satisfied: cryptography>=1.3.4 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (3.0)
Requirement already satisfied: idna>=2.0.0 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (2.10) 
Requirement already satisfied: pyOpenSSL>=0.14 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (19.1.0)
Building wheels for collected packages: configargparse
  Building wheel for configargparse (setup.py) ... done
  Created wheel for configargparse: filename=ConfigArgParse-1.2.3-py3-none-any.whl size=19326 sha256=f2b555b0615b925af1e2f599a0919d73fe791fe88e77fa092a2fd6ca65d8085f
  Stored in directory: /home/[REDACTED]/.cache/pip/wheels/b4/12/fe/1e4628f6ec22a9580d9e8fbef23df7a03948a1edde0821d7f5
Successfully built configargparse
Installing collected packages: configparser, configargparse, awscurl
Successfully installed awscurl-0.20 configargparse-1.2.3 configparser-5.0.0

I'm then attempting to make a request to ElasticSearch using signed requests:

$ awscurl --service es --region us-east-2 -v https://[REDACTED].us-east-2.es.amazonaws.com                                                                          
{'access_key': None,                                                                                                                                                                                                                  
 'data': '',                                                                                                                                                                                                                          
 'data_binary': False,                                                                                                                                                                                                                
 'header': None,                                                                                                                                                                                                                      
 'include': False,                                                                                                                                                                                                                    
 'insecure': True,                                                                                                                                                                                                                    
 'profile': 'default',                                                                                                                                                                                                                
 'region': 'us-east-2',                                                                                                                                                                                                               
 'request': 'GET',                                                                                                                                                                                                                    
 'secret_key': None,                                                                                                                                                                                                                  
 'security_token': None,                                                                                                                                                                                                              
 'service': 'es',                                                                                                                                                                                                                     
 'session_token': None,                                                                                                                                                                                                               
 'uri': 'https://[REDACTED].us-east-2.es.amazonaws.com',                                                                                                                                               
 'verbose': True}                                                                                                                                                                                                                     
"Credentials file '/home/[REDACTED]/.aws/credentials' exists 'True'"                                                                                                                                                                      
''                                                                                                                                                                                                                                    
('\n'
 'CANONICAL REQUEST = GET\n'
 '/\n'
 '\n'
 'host:[REDACTED].us-east-2.es.amazonaws.com\n'
 'x-amz-date:20200902T143515Z\n'
 '\n'
 'host;x-amz-date\n'
 '[REDACTED]')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20200902T143515Z\n'
 '20200902/us-east-2/es/aws4_request\n'
 '[REDACTED]')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Accept': 'application/xml',
 'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=[REDACTED]/us-east-2/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date, '
                  'Signature=[REDACTED]',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '[REDACTED]',
 'x-amz-date': '20200902T143515Z',
 'x-amz-security-token': None}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://[REDACTED].us-east-2.es.amazonaws.com')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Wed, 02 Sep 2020 14:35:15 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Acce
pt-Encoding, User-Agent'}

b'{\n  "name" : "78e111028e3351ec006b1bd84d38c02b",\n  "cluster_name" : "[REDACTED]]",\n  "cluster_uuid" : "icdoqUEBQAiDYiXhzpvWXw",\n  "version" : {\n    "number" : "7.7.0",\n    "build_flavor" : "oss",\n    "build_ty
pe" : "tar",\n    "build_hash" : "unknown",\n    "build_date" : "2020-07-06T09:41:09.176473Z",\n    "build_snapshot" : false,\n    "lucene_version" : "8.5.1",\n    "minimum_wire_compatibility_version" : "6.8.0",\n    "minimum_inde
x_compatibility_version" : "6.0.0-beta1"\n  },\n  "tagline" : "You Know, for Search"\n}\n'

As you can see, a charset is specified in the Content-Type header. I tend to agree with the above, taking the string output of response.text, converting it to a bytes object (.encode()) and then printing the bytes object seems to be suspect.

While I can see the output of the request, I was hoping to use jq to pretty print the output of awscurl.

awscurl --service es https://XXXXXXXX.us-east-1.es.amazonaws.com/_cluster/health | cut -d\' -f2 | jq

FYI - I've added a PR (#92) to address this issue, including a unit test which addresses this issue.

Note that - the response should have the info needed re: the encoding via the api request (response.encoding should be set per the data included in the headers in the response object). By adding .encode(), awscurl forces us to have a bytes output, instead of relying upon how python would natively interpret the encoding provided by the response encoding.

I believe that my PR includes the correct way to handle this issue. The issue raised in #77 was specific to python 2, which is no longer maintained (in fact, I couldn't get a development environment using python 2 to spin up).

Fixed in version 0.21 - thank you @okigan!