cscorley / whatthepatch

What The Patch!? -- A Python patch parsing library

Header matching fails on unicode filenames (git diff)

thoward opened this issue:

I came across a git diff header whose filenames contain non-ASCII (Unicode) characters. Git wraps such filenames in double quotes and writes the non-ASCII bytes as octal escapes. The regex git_diffcmd_header in patch.py does not expect the quotes, so it fails to parse the filenames out of the header.

I monkey-patched this in my code with the following line:

whatthepatch.patch.git_diffcmd_header = re.compile('^diff --git "?a/(.+)"? "?b/(.+)"?$')

That seems to work!
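
One caveat with the pattern above: because `(.+)` is greedy and every quote is optional, the header matches, but both captured groups keep the trailing `"` when the paths are quoted. A minimal sketch of a stricter pattern follows. The names `_side`, `GIT_HEADER`, and `parse_git_header` are hypothetical and not part of whatthepatch; the sketch only assumes that each side of the header is either fully double-quoted or bare.

```python
import re


def _side(prefix):
    """Regex for one path: "prefix/<escaped text>" in quotes, or bare prefix/<text>."""
    # Quoted form: any run of non-quote, non-backslash characters or backslash escapes.
    # Bare form: the same greedy capture as the one-line pattern in the issue.
    return r'(?:"{0}/((?:[^"\\]|\\.)+)"|{0}/(.+))'.format(prefix)


# Hypothetical replacement pattern; the quotes stay outside the capture groups.
GIT_HEADER = re.compile(r'^diff --git ' + _side('a') + ' ' + _side('b') + '$')


def parse_git_header(line):
    """Return (old_path, new_path) without quotes, or None if the line is not a header."""
    m = GIT_HEADER.match(line)
    if m is None:
        return None
    old_quoted, old_bare, new_quoted, new_bare = m.groups()
    old = old_quoted if old_quoted is not None else old_bare
    new = new_quoted if new_quoted is not None else new_bare
    return old, new
```

The returned paths still contain git's backslash escapes; they are only stripped of the surrounding quotes.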

Here's an example of the offending diff header (could be used as a fixture):

diff --git "a/somedata\340\270\202\340\270\233.json" "b/somedata\340\270\202\340\270\233.json"
deleted file mode 100644
index 817de57..0000000
--- "a/somedata\340\270\202\340\270\233.json"
+++ /dev/null
@@ -1 +0,0 @@
-[...snip...]
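
For anyone who wants the real filename rather than the escaped form, here is a rough sketch of decoding the text between the quotes. It assumes git's C-style quoting (octal byte escapes plus the usual `\t`, `\n`, `\"`, `\\` and similar) and that the underlying bytes are UTF-8. `unquote_git_path` is a hypothetical helper, not something whatthepatch provides.

```python
import re

# Single-character escapes mapped to their byte values.
_SIMPLE = {'a': 7, 'b': 8, 't': 9, 'n': 10, 'v': 11, 'f': 12, 'r': 13,
           '"': 34, '\\': 92}

# A backslash followed by one to three octal digits, or by any single character.
_ESCAPE = re.compile(r'\\([0-7]{1,3}|.)')


def unquote_git_path(escaped, encoding='utf-8'):
    """Decode the body of a git-quoted path (surrounding quotes already removed)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(escaped):
        # Copy the literal text that precedes this escape.
        out += escaped[pos:m.start()].encode(encoding)
        token = m.group(1)
        if token[0] in '01234567':
            out.append(int(token, 8))   # octal escape: one raw byte
        elif token in _SIMPLE:
            out.append(_SIMPLE[token])
        else:
            raise ValueError('unknown escape: \\' + token)
        pos = m.end()
    out += escaped[pos:].encode(encoding)
    return out.decode(encoding)


# The fixture's filename decodes to two Thai characters, U+0E02 and U+0E1B.
print(unquote_git_path(r'somedata\340\270\202\340\270\233.json'))
```

Git also documents a `core.quotePath` setting; with it set to false, bytes above 0x80 are left unquoted, which sidesteps the problem when you control how the diff is generated.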

@thoward can you make a PR with these changes?