rutrum / convert-case

Converts to and from various cases.

Panicked at 'byte index

makorne opened this issue:

let persp = "ПЕРСПЕКТИВА24".to_case(Case::Title);

thread 'main' panicked at 'byte index 11 is not a char boundary; it is inside 'Е' (bytes 10..12) of ПЕРСПЕКТИВА24', /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/str/mod.rs:576:13

The same for:
let persp = "тЦ".to_case(Case::Title);

thread 'main' panicked at 'byte index 3 is not a char boundary; it is inside 'Ц' (bytes 2..4) of тЦ', /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/str/mod.rs:576:13

This panics when splitting the string into segments. The method I wrote uses str::chars, which doesn't iterate over characters the way I had first thought. The documentation for that method describes the exact problem.
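To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not the code from convert_case; the helper names `split_after_first_char` and `try_prefix` are hypothetical and exist only for illustration. It shows that each Cyrillic letter in the reported strings occupies two bytes in UTF-8, so an index counted in characters is not a valid byte offset, and it shows two ways to slice without panicking.

```rust
/// Hypothetical helper: splits `s` after its first `char`.
/// `char_indices` yields byte offsets, so the offset passed to
/// `split_at` is always on a char boundary.
fn split_after_first_char(s: &str) -> (&str, &str) {
    match s.char_indices().nth(1) {
        Some((byte_idx, _)) => s.split_at(byte_idx),
        // Fewer than two chars: nothing to split off.
        None => (s, ""),
    }
}

/// Hypothetical helper: returns the first `end` bytes of `s`, or `None`
/// if `end` is out of range or falls inside a multi-byte character.
/// `&s[..end]` would panic in those cases; `str::get` does not.
fn try_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}
```

With the string from the report, `"тЦ".chars().count()` is 2 while `"тЦ".len()` is 4, and `"тЦ".is_char_boundary(3)` is false, which matches the "byte index 3 is not a char boundary" message. `try_prefix("тЦ", 3)` returns `None` instead of panicking, and `split_after_first_char("тЦ")` returns `("т", "Ц")`.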

I know that heck uses a library called unicode-segmentation (available on crates.io) that would fix this problem.

My first thought is to introduce unicode-segmentation as a feature, which would still allow a simple, dependency-free library that works reliably on ASCII strings (and on many other strings, but not all, as you have pointed out). One could argue, however, that it should just be the default behavior. Maybe you have an opinion?

I wrote this library primarily to create the command line utility ccase, and I imagined it would also be useful in transpilers and other programs that mutate or generate code, compilers for esoteric languages, and so on. I don't think either of these encounters non-ASCII strings very often. But if you look at the libraries that use convert_case as a dependency, you can see it's typically used exactly once, and only to convert some string into snake_case or CamelCase, for example. So maybe it would make more sense to just include Unicode character segmentation by default.

That said, given the fine-grained nature of this library for case conversion, I don't think its primary audience is one-time, simple usage (for which it is perfectly fine). So I am leaning more towards not including Unicode character segmentation by default.

I have added unicode-segmentation as a dependency, and added a test that specifically addresses the string you provided. The newest version, convert-case 0.6.0, fixes this issue.

convert-case/src/lib.rs

Lines 657 to 660 in a8702a0

fn russian() {
    let s = "ПЕРСПЕКТИВА24".to_string();
    let _n = s.to_case(Case::Title);
}
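For readers who want to see panic-free capitalization of a non-ASCII word without any dependency, the following is a rough sketch using only the standard library. It is not how convert_case implements `Case::Title`: the function `capitalize_word` is a hypothetical helper, it does no word splitting, and it works on `char`s rather than grapheme clusters, so it does not replace what unicode-segmentation provides.

```rust
/// Hypothetical helper (not part of convert_case): uppercases the first
/// `char` of `word` and lowercases the rest. It never indexes by byte,
/// so it cannot hit a "not a char boundary" panic.
fn capitalize_word(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        // `to_uppercase` and `to_lowercase` return iterators because a
        // single char may map to several chars in another case.
        Some(first) => first
            .to_uppercase()
            .chain(chars.flat_map(char::to_lowercase))
            .collect(),
        None => String::new(),
    }
}
```

Under these assumptions, `capitalize_word("ПЕРСПЕКТИВА24")` returns `"Перспектива24"` and `capitalize_word("тЦ")` returns `"Тц"`. The library's own output for the same input may differ, since it also applies its word boundary rules.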