Bugs in tokenize()
AntonBogun opened this issue
print(tokenize(";a;;", ";", ""))
outputs ["","a"]
instead of the expected ["","a","",""].
Furthermore, the whitespace argument does not work as intended:
let str="Hello,\r\nWorld"
print(tokenize(str,"\n",""))
print(tokenize(str,"\n","\r"))
prints
["Hello,\r", "World"]
["Hello,\r", "World"]
while the second call is expected to print ["Hello,", "World"].
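The behavior the report expects can be modeled with a short sketch. This is a hypothetical Python reference, not the library's implementation: it splits on any delimiter character with dividing semantics (so trailing delimiters yield empty tokens) and drops the given whitespace characters from the tokens.

```python
def tokenize_expected(s, delimiters, whitespace):
    """Hypothetical model of the expected behavior: every delimiter
    separates two tokens, and whitespace characters are stripped
    from the tokens (here simply dropped wherever they appear)."""
    tokens = [""]
    for ch in s:
        if ch in delimiters:
            tokens.append("")       # each delimiter starts a new token
        elif ch not in whitespace:
            tokens[-1] += ch        # whitespace chars are discarded
    return tokens

print(tokenize_expected(";a;;", ";", ""))           # → ['', 'a', '', '']
print(tokenize_expected("Hello,\r\nWorld", "\n", "\r"))  # → ['Hello,', 'World']
```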
That is intentional: the documentation says it works "by splitting into segments upon each dividing or terminating delimiter", i.e. the last two ; in ";a;;" are terminating.
The intention was to make the function useful both for situations where the delimiter is dividing (as typically intended with ,) and for situations where it is terminating (as typically intended with ;).
I suppose I could have added a bool selecting the strategy, or provided two separate functions.
Agree the whitespace case is a bug.
OK, improved things in fd1f0c1:
- The whitespace bug is fixed.
- Two consecutive dividers are no longer regarded as one; they now produce an empty string in between.
- Optional argument to get dividing rather than terminating behavior.