documentcloud / wordpress-documentcloud

Embed DocumentCloud documents that won't be eaten by the visual editor

Home Page: https://wordpress.org/plugins/documentcloud/

Slugs with uppercase letters throw off URL-cleaner

reefdog opened this issue

The pattern we use to recognize a URL as a DocumentCloud oEmbeddable URL is very permissive.

Since many of our resources (pages, notes) have multiple URL patterns, including with page anchors, we have a clean_dc_url() function that recomposes them into the single canonical (and oEmbed-safe) versions. E.g., https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282 is recomposed to https://www.documentcloud.org/documents/282753-lefler-thesis/annotations/42282.html.
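As a rough illustration of that recomposition, here is a minimal Python sketch. The plugin itself is PHP and this is not its actual code: the `NOTE_ANCHOR` pattern and the `clean_note_url` name are hypothetical, and only the note-anchor case from the example above is handled.

```python
import re

# Hypothetical pattern (not the plugin's real one): a document "id-slug",
# an optional ".html", then a "#document/p<page>/a<note>" anchor.
NOTE_ANCHOR = re.compile(
    r"^(?P<base>https://www\.documentcloud\.org/documents/[0-9]+-[A-Za-z0-9-]+)"
    r"(?:\.html)?#document/p(?P<page>[0-9]+)/a(?P<note>[0-9]+)$"
)


def clean_note_url(url: str) -> str:
    """Recompose an anchored note URL into its canonical annotations URL."""
    match = NOTE_ANCHOR.match(url)
    if match is None:
        # Not an anchored note URL: return it unchanged.
        return url
    return f"{match.group('base')}/annotations/{match.group('note')}.html"


print(clean_note_url(
    "https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282"
))
```

A URL that does not match the pattern is returned as-is, so an unrecognized URL is never rewritten into something wrong.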

Our base document slug pattern, however, has a bug: it only recognizes lowercase alphanumeric slugs, not slugs that contain uppercase letters. Because of the permissive pattern pointed to above, those URLs still get passed to the oEmbed endpoint, but they don't get cleaned and recomposed, so anchored page and note URLs get the full document viewer returned instead.

The impact of this was that pages and notes from documents with an uppercase letter in the slug weren't embeddable via the plugin; you'd always get the full document embed instead.
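To make the failure concrete, here is a small Python sketch (again, not the plugin's PHP). Both slug patterns are assumptions about the general shape of the bug and of a fix, and the mixed-case URL is a made-up variant of the example above, not a real document.

```python
import re

# Assumed shape of the buggy slug pattern: digits, a hyphen, then
# lowercase letters, digits, and hyphens only.
LOWERCASE_ONLY = r"[0-9]+-[a-z0-9-]+"
# Assumed shape of a fix: the same pattern with uppercase letters allowed.
MIXED_CASE = r"[0-9]+-[A-Za-z0-9-]+"


def is_cleanable(url: str, slug_pattern: str) -> bool:
    """Return True if the URL is recognized as an anchored note URL."""
    pattern = re.compile(
        r"^https://www\.documentcloud\.org/documents/(" + slug_pattern + r")"
        r"(?:\.html)?#document/p[0-9]+/a[0-9]+$"
    )
    return pattern.match(url) is not None


lower_url = "https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282"
# Hypothetical URL: the same document with an uppercase slug.
upper_url = "https://www.documentcloud.org/documents/282753-Lefler-Thesis.html#document/p57/a42282"

print(is_cleanable(lower_url, LOWERCASE_ONLY))  # recognized, so it would be cleaned
print(is_cleanable(upper_url, LOWERCASE_ONLY))  # not recognized, so it is passed through uncleaned
print(is_cleanable(upper_url, MIXED_CASE))      # recognized once uppercase is allowed
```

In this sketch the uncleaned URL would still reach the oEmbed endpoint with its anchor intact, which matches the symptom described above: the full document embed comes back instead of the note.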