left-pad / left-pad

:arrow_left: String left pad -- deprecated, use String.prototype.padStart()

Wrong size when left padding a unicode string

dubzzz opened this issue · comments

There is an inconsistency when padding strings containing Unicode characters outside the BMP (i.e. code points encoded as two code units in UTF-16).

leftPad('a\u{1f431}b', 4, 'x') => 'a\u{1f431}b' // in: 3 code points, out: 3 code points
leftPad('abc', 4, '\u{1f431}') => '\u{1f431}abc' // in: 3 code points, out: 4 code points
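The mismatch comes from String.length counting UTF-16 code units rather than code points. A minimal sketch of the difference, where codePointLength is a hypothetical helper used only for illustration and is not part of left-pad:

```javascript
// Hypothetical helper (not part of left-pad): counts code points
// instead of UTF-16 code units.
function codePointLength(str) {
  // The string iterator yields one element per code point,
  // so a valid surrogate pair is counted once.
  return Array.from(str).length;
}

const cat = 'a\u{1f431}b';
console.log(cat.length);           // 4 (UTF-16 code units)
console.log(codePointLength(cat)); // 3 (code points)
```

Since the first input already has a length of 4 code units, leftPad sees nothing to pad.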

You should maybe specify that left-pad does not treat code points outside the BMP as single characters.

Failure found using property based testing:
https://runkit.com/dubzzz/5ab9f3d8cc861f0012852eff

You should maybe specify that left-pad does not treat code points outside the BMP as single characters.

Let's add it to the docs. PR welcome

I had a look at how recent versions of the ECMAScript specification define padStart. They treat a code point outside the BMP as two distinct characters (UTF-16 code units).

With padStart on Chrome and Firefox I get the following:

'a\u{1f431}b'.padStart(4, 'x') => "a🐱b"
'abc'.padStart(4, '\u{1f431}') => "\ud83dabc"
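The second result can be checked directly. This small sketch only uses standard String methods and shows that padStart truncates the filler by code units, which leaves a lone high surrogate at the start:

```javascript
const padded = 'abc'.padStart(4, '\u{1f431}');

// Only one code unit is missing, so padStart keeps just the first
// half of the surrogate pair that encodes U+1F431.
console.log(padded.length);                     // 4
console.log(padded.charCodeAt(0).toString(16)); // d83d
console.log(padded === '\ud83dabc');            // true
```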

If left-pad chooses to stay compliant with padStart, then the only way to handle this case is to fix the fact that the third argument does not accept multiple characters.

Otherwise, if left-pad chooses to pad on code points (and not code units), the fix consists of bypassing String.length and measuring the length of the input string manually (only code units in the range \ud800 to \udfff can come in pairs). The padding code itself stays the same.
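A rough sketch of that second option, not the actual left-pad implementation. The names leftPadCodePoints and countCodePoints are hypothetical, and the choices to default to a space and to use only the first code point of the pad argument are assumptions:

```javascript
// Hypothetical: measure a string in code points by walking its code units.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++; // skip the low surrogate
    }
    count++; // lone surrogates are counted as one unit each
  }
  return count;
}

// Hypothetical: left pad to `len` code points instead of `len` code units.
function leftPadCodePoints(str, len, ch) {
  str = String(str);
  // Assumption: pad with a space by default, and with the first
  // code point of `ch` otherwise.
  const fill = ch === undefined || ch === '' ? ' ' : Array.from(String(ch))[0];
  const missing = len - countCodePoints(str);
  return missing > 0 ? fill.repeat(missing) + str : str;
}

console.log(leftPadCodePoints('a\u{1f431}b', 4, 'x') === 'xa\u{1f431}b'); // true
console.log(leftPadCodePoints('abc', 4, '\u{1f431}') === '\u{1f431}abc'); // true
```

With this measure, both examples from the report go from 3 code points in to 4 code points out.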