Panicked at 'byte index
makorne opened this issue · comments
let persp = "ПЕРСПЕКТИВА24".to_case(Case::Title);
thread 'main' panicked at 'byte index 11 is not a char boundary; it is inside 'Е' (bytes 10..12) of ПЕРСПЕКТИВА24
', /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/str/mod.rs:576:13
The same for:
let persp = "тЦ".to_case(Case::Title);
thread 'main' panicked at 'byte index 3 is not a char boundary; it is inside 'Ц' (bytes 2..4) of тЦ
', /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/str/mod.rs:576:13
This panics when splitting the string into segments. The method I wrote uses the str::chars method, which doesn't iterate over characters the way I had first thought. The documentation for the method describes the exact problem.
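To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not the code from convert_case; `split_at_char` is a hypothetical helper introduced for illustration. It shows that each Cyrillic letter occupies two bytes in UTF-8, so an index obtained by counting chars is not a valid byte index, whereas the offsets yielded by `char_indices` always fall on char boundaries.

```rust
/// Hypothetical helper (not part of convert_case): splits `s` after its
/// first `n` chars, using byte offsets taken from `char_indices` so the
/// split point is always a valid char boundary.
fn split_at_char(s: &str, n: usize) -> (&str, &str) {
    let byte_idx = s
        .char_indices()
        .nth(n)
        .map(|(i, _)| i)
        .unwrap_or(s.len()); // n past the end: everything goes in the head
    s.split_at(byte_idx)
}

fn main() {
    let s = "тЦ";

    // Two chars, but four bytes: each letter is two bytes in UTF-8.
    assert_eq!(s.chars().count(), 2);
    assert_eq!(s.len(), 4);

    // Byte 3 lies inside 'Ц' (bytes 2..4), as in the panic message above.
    assert!(!s.is_char_boundary(3));

    // `get` returns None where indexing with `&s[..3]` would panic.
    assert!(s.get(..3).is_none());

    let (head, tail) = split_at_char(s, 1);
    println!("{head} | {tail}");
}
```

Note that this only addresses char boundaries. Characters made up of several code points (grapheme clusters) would still be split apart, which is the case unicode-segmentation is meant to handle.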
I know that heck uses a library called unicode-segmentation from crates.io that would fix this problem.
My first thought is to introduce unicode-segmentation as a feature, so that the library can still be a simple, dependency-free one that works reliably on ASCII strings (and many other strings, but not all, as you have pointed out). One could argue, however, that it should just be the default behavior. Maybe you have an opinion?
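As a rough sketch of what the feature-gated option could look like: the feature name `unicode-segmentation` and the function `units` below are assumptions made for illustration, not the library's actual design. With the feature enabled, segmentation is delegated to the `graphemes` method of the unicode-segmentation crate; without it, a dependency-free fallback splits on char boundaries.

```rust
// Assumed feature name; it would be declared in Cargo.toml alongside an
// optional dependency on the unicode-segmentation crate.
#[cfg(feature = "unicode-segmentation")]
fn units(s: &str) -> Vec<&str> {
    use unicode_segmentation::UnicodeSegmentation;
    // Extended grapheme clusters: user-perceived characters.
    s.graphemes(true).collect()
}

// Dependency-free fallback: one slice per char, always cut on a char boundary.
#[cfg(not(feature = "unicode-segmentation"))]
fn units(s: &str) -> Vec<&str> {
    s.char_indices()
        .map(|(i, c)| &s[i..i + c.len_utf8()])
        .collect()
}

fn main() {
    // Built without the feature, this exercises the fallback path.
    println!("{:?}", units("тЦ"));
}
```

Either way, callers see the same signature, so the choice of default only affects the dependency footprint and how multi-code-point characters are grouped.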
I wrote this library particularly to create the command line utility ccase, and I imagined that it would also be applicable in transpilers and other code-mutating or code-generating programs, such as compilers for esoteric languages. I don't think either of these encounters non-ASCII strings very often. But if you look at the libraries that use convert_case as a dependency, you can see it's typically used exactly once, and only to convert some string into snake_case or CamelCase, for example. So maybe it would make more sense to just include Unicode character segmentation by default.
That said, given the fine-grained nature of this library for case conversion, I don't think its primary audience is one-time, simple usage (for which it is perfectly fine). So I am leaning more towards not including Unicode character segmentation by default.