boyter / searchcode-server

The offical home of searchcode-server where you can run searchcode locally. Note that master is generally unstable in the sense that it is not a release. Check releases for release versions https://github.com/boyter/searchcode-server/releases

Home Page:https://searchcodeserver.com/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Search Code HMAC issue error

NathanaelA opened this issue · comments

When adding a repo via the API that doesn't exist; the api returns invalid signature. as the error message. Very minor corner case issue...

Thats not brilliant. Corner/edge cases are where you spend 90% of time programming anyway so lets see if we can resolve this.

So are you saying you are adding a link to a git repository that does not exist? For example, using "https://www.google.com/" as the source? Or did you mean to an API endpoint that doesn't exist?

I have a site that pulls down different npm packages, if they match a criteria then I read the package.json file from the .tgz I downloaded. The last part of this process is to take what they put in as the github repo that they specified in the package json file and submit it to my awesome searchcode server. :-)

Well, last night I noticed the script was throwing an error about invalid sig, and so I started debugging it. Took the entire data and ran it through a hmac generator online and everything matched. So I looked at the data being submitted and it submitted a repo like this:
https://github.com/somename / somerepo.git (It was properly url encoded)

Notice the spaces between the repo username and the repo itself.

Apparently the author of that npm package put spaces into there link in the packages.json file. So when I passed that into your API to add as a repo; it failed with the sig error.

This really is a corner case; and it is also a totally invalid url. I'm going to add more sanity checks to my script to ignore that type of stupid url. But I figured you might want to know about this error just so if someone else runs into a invalid sig, might be the data being passed in causing it.

Ah... well it should come back with an error that says that it is invalid.

TODO Add logic to validate the URL somewhat

Actually assuming that it is encoded isn't that a legitimate URL?

Odd that its throwing the exception actually. Setting up an integration test to cover this now.

I don't believe github allows spaces in the username or the name of the repo. So I don't believe it is a "Valid" url.

it was encoded like:
http%3a%2f%2fgithub.com%2fblah%20%2f%20blah which is correct encoding, and I almost didn't notice the two %20 in the string as it kinda merges in with the rest of them...

Thats interesting. I can understand from a github point of view. Anyway I will have a play around with this. From my point of view it should be acceptable as it is a valid url (just invalid from github) and should add without error but then never index anything.

So I am unable to replicate this... I had a play and could not reproduce.

I have included the test case below which I am using to try and resolve it. I have modified it quite a lot and have been unable to replicate this.

A check through the code base and there is no string that should be returning with the error 'invalid signature'. The only thing I can think of at this point is maybe you were hitting the 'invalid signed url' issue because maybe it was signed using the un-urlencoded string which was then encoded silently?

Not sure. Would you be able to try the below against your instance with the problematic URL?

from hashlib import sha1
from hmac import new as hmac
import urllib2
import json
import urllib

'''Simple usage of the signed key API endpoints.'''
publickey = "APIK-xFvO7WDb4oT4C1FoT6Hx1pLOIZm"
privatekey = "nSA1bZQDObUgaEznchZEFqxW17jK2MaS"


reponame = "myrepo"
repourl = "https://github.com/somename / somerepo.git"
repotype = "git"
repousername = ""
repopassword = ""
reposource = ""
repobranch = "master"

message = "pub=%s&reponame=%s&repourl=%s&repotype=%s&repousername=%s&repopassword=%s&reposource=%s&repobranch=%s" % (
        urllib.quote_plus(publickey),
        urllib.quote_plus(reponame),
        urllib.quote_plus(repourl),
        urllib.quote_plus(repotype),
        urllib.quote_plus(repousername),
        urllib.quote_plus(repopassword),
        urllib.quote_plus(reposource),
        urllib.quote_plus(repobranch)
    )

sig = hmac(privatekey, message, sha1).hexdigest()

url = "http://localhost:8080/api/repo/add/?sig=%s&%s" % (urllib.quote_plus(sig), message)

data = urllib2.urlopen(url)
data = data.read()

data = json.loads(data)
print data['sucessful'], data['message']

@boyter - Sorry for the delay; here is the exact line I am sending, I can duplicate this as often as you want:

http://SERVERNAME/api/repo/add/?sig=<SIGNATURE>&pub=<PUBKEY>&reponame=%20testplugin&repourl=https%3A%2F%2Fgithub.com%2Fshailymishra%20%2F%20testplugin.git&repotype=git&repousername=&repopassword=&reposource=https%3A%2F%2Fgithub.com%2Fshailymishra%20%2F%20testplugin&repobranch=master

The exact error is: {"sucessful":false,"message":"invalid signed url"}

The signature is correct, however, after doing a little bit of testing with your code above I have narrowed down the issue. ;-)

It isn't actually the repourl with spaces in it causing issues.. I sent you down the wrong bunny trail.

It is the reponame with a space in it being encoded as %20. It appears to work if you do the encoding with the + encoding instead of the normal percent urlencoding.

So if you change your program above to
reponame = " myrepo" (i.e. add the space)
and
urllib.quote(reponame), (i.e. remove the _plus)

You will see the failure.

Ah fantastic. Thanks for this. Since I can replicate it I can fix it. Will do so.

Indeed it is the quote_plus portion that causes it. An example that triggers the error,

pub=APIK-uUcJlk5nTI1tr6kQBQ343lR7URH&reponame=+myrepo&repourl=https%3A%2F%2Fgithub.com%2Fsomename+%2F+somerepo.git&repotype=git&repousername=&repopassword=&reposource=&repobranch=master

Thats what actually gets validated. Note that reponame=+myrepo where internally it uses the plus quote logic.

Technically its not a bug I guess because it is doing the right thing, the escaping between them both is different so the signed URL is different.

Checking the Python documentation https://docs.python.org/2/library/urllib.html#urllib.quote_plus

Like quote(), but also replaces spaces by plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped unless they are included in safe. It also does not have safe default to '/'.

It is a form value going into the query string for the URL (thats arguably an issue as it should be post, but something I can fix later).

Quote by contrast https://docs.python.org/2/library/urllib.html#urllib.quote

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL. The optional safe parameter specifies additional characters that should not be quoted — its default value is '/'.

The interesting bit being "intended for quoting the path section of the url".

Based on the above it looks like everything is working as designed. Are you using Python to add repositories? If so I would say the best fix would be to move to quote_plus to resolve the issue. Later I will make the endpoint more restful as this should have been a post from the beginning.

Let me know if you disagree with the above as its possible I am misreading it.

Well there is two things here. 😀

URL encoding can be either fully % based, where %20 is space. This is the standard. Or the + is used for a space AND you are sending application/x-www-form-urlencoded as the encoding type, . BOTH are valid encoding schemes. The %20 is more standardized as it is valid all the time and does not required the x-www-form-urlencoded encoding type. (Which I didn't send in my request, btw)

So my take is that you really should support %20 based encoding as it is the more recommended standard as you eliminate the ambiguity of a + ( is the + a space or a + ) when no encoding type is sent.

So since your internal parser is using plus instead of %20, you might want to make sure you document it; because it isn't the expected behavior; when I pass&somename= blah; I expect it to be encoded as &somename=%20blah and so that is how it signs it. :-)

The other choice is that you could in your parse add a second check if their is any + in it to replace those with %20 and try to resign and see if either sigs match.

Your argument has moved me. Just going to add logic to deal with both. Its not a huge burden and will solve any ambiguous documentation.