marrow / uri

A type to represent, query, and manipulate a Uniform Resource Identifier.

Home Page: https://pretty-rfc.herokuapp.com/RFC3986


Automatically handle punycode encoding and decoding.

amcgregor opened this issue

When deserializing a URL provided at instantiation time, or on any assignment of a hostname, punycode encoding should be detected and decoded. When serializing (str(), &c.), the punycode-encoded version of any non-ASCII-safe hostname should be used.
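As a rough illustration of the behaviour requested above, here is a minimal sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The helper names `decode_host` and `encode_host` are hypothetical and are not taken from the package; the actual implementation may differ.

```python
def decode_host(host: str) -> str:
    """Return the Unicode form of a hostname, decoding punycode labels if present.

    Hypothetical helper: intended for use at instantiation or hostname assignment.
    """
    try:
        # Only pure-ASCII input can carry punycode ("xn--") labels.
        return host.encode('ascii').decode('idna')
    except UnicodeError:
        # Already Unicode, or not decodable; leave it untouched.
        return host


def encode_host(host: str) -> str:
    """Return the ASCII-safe form of a hostname, for use when serializing."""
    return host.encode('idna').decode('ascii')


readable = decode_host('xn--bcher-kva.de')  # Unicode form for internal storage
wire = encode_host(readable)                # ASCII form for str() output
```

Plain ASCII hostnames pass through both helpers unchanged, so the same code path could serve every hostname.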

Reference in Marrow Mailer:

Initial implementation in 9c6fce3. Tested by hand locally, automated tests TBD prior to release.


Comprehensive tests added, along with a test using Cyrillic encoding borrowed from the url-normalize package. There is a point of debate over whether the summary compound attribute (typically used for short-form presentation as the text of an anchor) should use the encoded or the Unicode form.

Ref: https://github.com/marrow/uri/blob/develop/test/test_url_normalize.py#L53

I'd vote in favor of using the readable form for this, but the encoded form for other compound forms, due to the intended purpose/use of this particular attribute for presentation to end-users.
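To make the proposal concrete, here is a small sketch of how the split might look. `HostSketch` is a hypothetical stand-in, not the package's URI class, and it assumes the hostname is stored internally in Unicode form.

```python
from dataclasses import dataclass


@dataclass
class HostSketch:
    """Hypothetical stand-in for a URI object; not the package's real class."""

    host: str  # assumed to be stored in Unicode (decoded) form

    def __str__(self) -> str:
        # Serialized form: punycode-encoded, ASCII-safe.
        return self.host.encode('idna').decode('ascii')

    @property
    def summary(self) -> str:
        # Presentation form for end-users: readable Unicode.
        return self.host


example = HostSketch('bücher.de')
```

Here `str(example)` would give the encoded form suitable for transmission, while `example.summary` keeps the readable form for display as anchor text.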