zws-im / zws

Shorten URLs using invisible spaces

Home Page: https://zws.im

Use combining diacritical marks

farteryhr opened this issue · comments

commented

I suggest using combining diacritical marks to shorten URLs.
https://unicode-table.com/en/blocks/combining-diacritical-marks/
Since you still have an extra slash at the end, the shortened part is already one character wide. The first slash combined with an arbitrary number of combining diacritics is truly zero-width (they're just tall when pró́́́́́́́́́́́́́́perly rendered).
test: https://zws.im/̀̀̀̀̀̀̀̀̀̀̀̀.

Also, these (basic 112) combining diacritics are all below U+07FF, which is the upper limit of a 2-byte UTF-8 sequence. When converted to the equivalent ASCII URL, each one becomes %xx%xx instead of %xx%xx%xx, because the characters named "zero-width" are all above U+2000. (compression rate of 1/72, haha)
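To make the byte counts above concrete, here is a minimal Python sketch. It assumes the "basic 112" are the Combining Diacritical Marks block (U+0300 to U+036F) and uses U+200B (zero width space) as the sample zero-width character; the helper name `percent_encoded_length` is made up for illustration and is not part of zws.

```python
from urllib.parse import quote


def percent_encoded_length(ch: str) -> int:
    """Count the ASCII characters needed once `ch` is percent-encoded as UTF-8."""
    return len(quote(ch, safe=""))


combining_grave = "\u0300"   # COMBINING GRAVE ACCENT, in the assumed U+0300..U+036F block
zero_width_space = "\u200b"  # ZERO WIDTH SPACE, used here only as a sample

print(quote(combining_grave, safe=""))   # %CC%80
print(quote(zero_width_space, safe=""))  # %E2%80%8B

# Every code point in the assumed block fits in two UTF-8 bytes.
print(all(len(chr(cp).encode("utf-8")) == 2 for cp in range(0x0300, 0x0370)))  # True
```

So each combining mark costs 6 ASCII characters in the escaped URL, against 9 for a character above U+2000.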

This is very interesting; however, it makes the shortened part visible. I'd like to keep the current shortening scheme but use different space characters to fix the broken URLs.

commented

To be honest, I think Unicode-aware URL recognition patterns are very likely to treat anything classified as "whitespace" (according to Unicode properties) as a breaker.
I'm not sure about the other joiners/non-joiners or whatever, though. When I searched for "zero-width", just these 4 (plus the two spaces currently in use) popped up.
http://unicode.org/L2/L2002/02368-default-ignorable.html fyi

The other way: since it's like base-112 instead of your base-2, 5 of them will be sufficient for 1e10 URLs (112^5 is about 1.76e10). Not that tall.
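A rough Python sketch of that base-112 idea, assuming the alphabet is simply the 112 code points U+0300 to U+036F in order and that the short ID is a plain integer. The names `encode` and `decode` are hypothetical, not zws functions, and the sketch ignores whether browsers or servers normalize or strip any of these marks.

```python
FIRST = 0x0300  # start of the assumed alphabet (Combining Diacritical Marks block)
BASE = 112      # number of code points in U+0300..U+036F


def encode(n: int, width: int = 5) -> str:
    """Write n as exactly `width` base-112 digits, each digit a combining mark."""
    if not 0 <= n < BASE ** width:
        raise ValueError("n does not fit in the requested width")
    digits = []
    for _ in range(width):
        n, remainder = divmod(n, BASE)
        digits.append(chr(FIRST + remainder))
    return "".join(reversed(digits))  # most significant digit first


def decode(s: str) -> int:
    """Inverse of encode: read the combining marks back as a base-112 number."""
    n = 0
    for ch in s:
        digit = ord(ch) - FIRST
        if not 0 <= digit < BASE:
            raise ValueError("character outside the assumed alphabet")
        n = n * BASE + digit
    return n


print(BASE ** 5)  # 17623416832, so 5 marks cover more than 1e10 IDs
short_id = encode(9_999_999_999)
print(len(short_id), decode(short_id))  # 5 9999999999
```

The fixed width keeps every short URL the same "height"; a variable-width version would be shorter for small IDs.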