gosimple / slug

URL-friendly slugify with multiple languages support.

No slug-representation of emojis/pictograms

remnestal opened this issue · comments

This seems like a silly use case at first, but when creating a slug from a string containing emojis or pictograms, there is no representation of those characters. For example:

slug.Make("🐛")
slug.Make("☺")
slug.Make("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥")

all yield empty strings.

I'm not sure how such a character would best be represented in a slug, but simply removing them could be problematic in some cases. Is this intentional?

@dalu Let's say I have a blog platform where I let my customers set the title of their posts. I want the title of their posts to be turned into a slug for the URL. For example, let's say there's a post titled "No slug-representation of emojis/pictograms", then I expect the URL to look something like example.com/posts/no-slug-representation-of-emojis-and-pictograms. No problem.

But let's then say that a user has created two posts whose titles contain more than just the "standard" ASCII characters:

  • "𝕋𝕙𝕖𝕤𝕖 𝕔𝕙𝕒𝕣𝕒𝕔𝕥𝕖𝕣𝕤 𝕒𝕣𝕖 𝕣𝕖𝕒𝕝𝕝𝕪 𝕡𝕠𝕡𝕦𝕝𝕒𝕣 𝕠𝕟 𝕚𝕟𝕤𝕥𝕒𝕘𝕣𝕒𝕞", and
  • "𝕳𝖔𝖜 𝖙𝖔 𝖈𝖗𝖊𝖆𝖙𝖊 𝖆 𝖍𝖆𝖗𝖉𝖈𝖔𝖗𝖊 𝖉𝖊𝖆𝖙𝖍 𝖒𝖊𝖙𝖆𝖑 𝖇𝖑𝖔𝖌 𝖕𝖔𝖘𝖙 𝖙𝖎𝖙𝖑𝖊"

Then both of those blog posts would have the slug "", which is problematic. Don't focus too much on the 🐛 emoji in my previous example; there's a lot of Unicode not covered by this package that can make URLs collide.

I realize that there is no obvious solution to this problem (in fact I said so in the last sentence of my original post), but forcing every platform to implement a huge custom substitution map for all of these characters is hardly a satisfying solution.
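For the styled letters in the two example titles specifically, a huge substitution map may not be necessary: those alphabets occupy contiguous runs of 26 code points in the Mathematical Alphanumeric Symbols block, so an offset calculation can fold them back to ASCII before slugifying. The sketch below is only an illustration under that assumption. `foldMathLetters`, `letterRange` and the tables are hypothetical names, not part of slug or unidecode, only the double-struck and bold Fraktur styles are covered, and emojis are left untouched. Unicode compatibility normalization (NFKD, for example via golang.org/x/text/unicode/norm) should cover the other styled alphabets as well.

```go
package main

import (
	"fmt"
	"strings"
)

// letterRange describes a run of 26 consecutive styled letters.
// Hypothetical helper type for this sketch only.
type letterRange struct {
	first rune // code point of the styled 'A' or 'a'
	base  rune // ASCII letter that run starts at
}

var mathRanges = []letterRange{
	{0x1D538, 'A'}, // double-struck capitals (block has gaps, see letterlike)
	{0x1D552, 'a'}, // double-struck small letters
	{0x1D56C, 'A'}, // bold Fraktur capitals
	{0x1D586, 'a'}, // bold Fraktur small letters
}

// Double-struck capitals that live in the Letterlike Symbols block
// instead of the contiguous run above.
var letterlike = map[rune]rune{
	0x2102: 'C', 0x210D: 'H', 0x2115: 'N', 0x2119: 'P',
	0x211A: 'Q', 0x211D: 'R', 0x2124: 'Z',
}

// foldMathLetters replaces the styled letters listed above with their
// plain ASCII counterparts and leaves every other rune unchanged.
func foldMathLetters(s string) string {
	return strings.Map(func(r rune) rune {
		if ascii, ok := letterlike[r]; ok {
			return ascii
		}
		for _, lr := range mathRanges {
			if r >= lr.first && r < lr.first+26 {
				return lr.base + (r - lr.first)
			}
		}
		return r
	}, s)
}

func main() {
	fmt.Println(foldMathLetters("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥")) // fancy text
}
```

The folded string can then be passed to slug.Make as usual, so the two example titles would no longer collapse to the same empty slug.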

I'm facing the same problem, and it looks like there is no way to make a slug from unsupported symbols :(
For this case you can do something like this:

import (
	"fmt"
	"time"

	"github.com/gosimple/slug"
)

func createSlug(title string) string {
	// Generate a non-empty slug.
	pSlug := slug.Make(title)
	if pSlug == "" {
		pSlug = "untitled"
	}

	// Add a "random" part (microseconds within the current second) to keep the slug unique.
	return fmt.Sprintf("%s-%d", pSlug, time.Now().Nanosecond()/1000)
}
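The suffix in the snippet above comes from the clock, so the same title gets a different slug on every call. If a stable URL matters, one alternative is to derive the suffix from the title itself. The following is a minimal sketch of that idea, not something the package provides: `stableSlug` is a hypothetical name, the slug function is passed in as a parameter (in practice that would be slug.Make) so the example runs with the standard library alone, and the 32-bit FNV-1a hash is an arbitrary choice.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// stableSlug builds a slug that is never empty and whose suffix depends
// only on the title. makeSlug stands in for slug.Make (assumption).
func stableSlug(title string, makeSlug func(string) string) string {
	base := makeSlug(title)
	if base == "" {
		base = "untitled"
	}
	h := fnv.New32a()
	h.Write([]byte(title)) // hash.Hash writes never return an error
	return fmt.Sprintf("%s-%08x", base, h.Sum32())
}

func main() {
	// stripAll mimics the behaviour reported in this issue: everything
	// in the title is removed and the slug comes back empty.
	stripAll := func(string) string { return "" }

	fmt.Println(stableSlug("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥", stripAll))
	fmt.Println(stableSlug("☺", stripAll))
}
```

Different titles can still hash to the same 32-bit value, so a uniqueness check against already stored slugs is still advisable.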

Thank you for this report and sorry it took so long, burnout is not nice...

So first: https://github.com/rainycape/unidecode, which the slug package uses underneath, has a test showing that it strips emojis:

https://github.com/rainycape/unidecode/blob/cb7f23ec59bec0d61b19c56cd88cee3d0cc1870c/unidecode_test.go#L30-L33

I forked it to https://github.com/gosimple/unidecode
It's true that it's missing a lot of characters that could be properly converted to ASCII, and everyone is welcome to provide more updates (at some point I'll also merge additions from forks, like https://github.com/cuilun/unidecode).

Second: from the beginning I designed slug to be on the safe side. For example, I also used it for generating file names, so chars like / should not be in the output.

Third: I will not change the default behavior (I don't want to break anyone), but it's possible to add a flag like AllPrintableASCII, set to false by default (to allow all chars from https://en.wikipedia.org/wiki/ASCII#Printable_characters - but spaces will still be replaced with -).

Or maybe just export:

slug/slug.go

Line 35 in a0807d1

regexpNonAuthorizedChars = regexp.MustCompile("[^a-zA-Z0-9-_]")

so everyone could configure it themselves? I'm open to your ideas.