codsen / codsen

a monorepo of npm packages

Stripped tags get replaced with spaces (only for tags with more than one char)

lublak opened this issue

Package's name

string-strip-html

Describe the bug

I have the same issue as #13, but with longer tags. This input:

<body>test</body>123

is stripped to:

test 123

To Reproduce
Steps to reproduce the behavior:

  1. Use string-strip-html with <body>test</body>123

Expected behavior
The output should contain no space:

test123

I see now that this behaviour is intentional.
On the one hand, it makes some sense.
Is there a way to disable it?

Hi lublak,

It's tricky to programmatically detect word boundaries without natural language recognition. So I went with detecting inline tags, i.e., <b>un</b>bold is probably one word because <b> is an inline tag, but <div>un</div>bold is two words (stripped to un bold), because the HTML may well be minified and we may actually have a case like <div>hi</div>John. Do you see?

In the example above, we may well have <body>Hi</body>John.
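
To make the inline-versus-block heuristic concrete, here is a minimal, self-contained sketch. It is not the string-strip-html implementation: the function name `stripTagsSketch`, the short `INLINE_TAGS` list and the regex-based tag matching are all assumptions made for illustration, and the sketch also collapses whitespace runs, which the real package may handle differently.

```typescript
// Hypothetical, simplified sketch of the heuristic; not the library's code.
// Assumption: a tag counts as "inline" if its name is in this short list.
const INLINE_TAGS = new Set([
  "a", "b", "i", "em", "strong", "span", "u", "small", "sub", "sup",
]);

function stripTagsSketch(input: string): string {
  // Matches an opening or closing tag and captures the tag name.
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  const replaced = input.replace(tagPattern, (_match: string, name: string) =>
    // Inline tags vanish; every other tag is treated as a word boundary.
    INLINE_TAGS.has(name.toLowerCase()) ? "" : " "
  );
  // Collapse whitespace runs and trim the ends.
  return replaced.replace(/\s+/g, " ").trim();
}

console.log(stripTagsSketch("<b>un</b>bold"));        // unbold
console.log(stripTagsSketch("<div>un</div>bold"));    // un bold
console.log(stripTagsSketch("<body>test</body>123")); // test 123
```

Under this rule <body> is not an inline tag, so it is treated as a boundary, which is why the reported input comes out as test 123.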

I'd say, if you have the option, use the callback from the opts to tailor string-strip-html to your inputs. Just reuse some of the examples; even #13 has one I posted.
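
As a rough illustration of the callback idea, the sketch below lets the caller override what replaces each stripped tag. It deliberately does not reproduce the actual callback signature of string-strip-html; `stripWithCallback`, `TagInfo`, `InsertCallback` and the `INLINE` list are hypothetical names invented for this example, so check the package README for the real option.

```typescript
// Hypothetical names throughout; this is not the string-strip-html API.
type TagInfo = { name: string; isClosing: boolean; proposedInsert: string };
type InsertCallback = (tag: TagInfo) => string;

// Assumption: a short list of tags treated as inline.
const INLINE = new Set(["a", "b", "i", "em", "strong", "span"]);

function stripWithCallback(input: string, cb?: InsertCallback): string {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  const out = input.replace(
    tagPattern,
    (_match: string, slash: string, rawName: string) => {
      const name = rawName.toLowerCase();
      // Default proposal: nothing for inline tags, a space for the rest.
      const proposedInsert = INLINE.has(name) ? "" : " ";
      const info: TagInfo = { name, isClosing: slash === "/", proposedInsert };
      // The callback, if given, has the final say on what gets inserted.
      return cb ? cb(info) : proposedInsert;
    }
  );
  return out.trim();
}

// Never insert a space, whatever the tag is.
const noSpaces: InsertCallback = () => "";
// Only <body> is stripped without a space; other tags keep the default.
const keepBodyTight: InsertCallback = (tag) =>
  tag.name === "body" ? "" : tag.proposedInsert;

console.log(stripWithCallback("<body>test</body>123"));           // test 123
console.log(stripWithCallback("<body>test</body>123", noSpaces)); // test123
```

The second callback shows the narrower option: it removes the space only for the tags known to cause false positives in your inputs and leaves the default boundary detection in place for everything else.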

If you believe there is a specific situation we would need to address, open a new ticket, but bring exact, bigger examples of both desired behaviour and false-positive cases — we'll try to investigate.

Thank you for contributing!