Wrong boundary detected when converting from camel case

Question

ThoFrank opened this issue 2 years ago

Hi there,
I really like your crate. However, I came across a small issue. If the second char is uppercase in a camel case string, no boundary should be inserted after the first char.

Here are two test cases:

fails with "foo_bar" != "f_oo_bar"

use convert_case::{Case, Casing};

#[test]
fn test_camel_case() {
    assert_eq!("foo_bar", "fOOBar".from_case(Case::Camel).to_case(Case::Snake));
}

succeeds

#[test]
fn test_pascal_case() {
    assert_eq!("foo_bar", "FOOBar".from_case(Case::Camel).to_case(Case::Snake))
}

It's probably quite a niche bug, and it's easy to work around by always making the first letter uppercase and then converting from pascal case instead.
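A minimal sketch of that workaround, using only the standard library. The helper name capitalize_first is hypothetical and not part of convert_case; it only uppercases the first character, and the actual case conversion is left to the crate as shown in the comment.

```rust
/// Hypothetical helper (not part of convert_case): uppercases the first
/// character of `s` and leaves the remaining characters untouched.
fn capitalize_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        // `to_uppercase` yields an iterator because some characters
        // expand to more than one char when uppercased.
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

fn main() {
    let fixed = capitalize_first("fOOBar");
    println!("{}", fixed);
    // With convert_case in scope, the string would then be converted as in
    // the tests above, for example:
    //   fixed.from_case(Case::Pascal).to_case(Case::Snake)
}
```

Because the input now starts with an uppercase letter, the leading lowercase-to-uppercase boundary disappears before the crate ever sees the string.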

Rutrum · Answer 1 · Thu Nov 24 2022 20:36:26 GMT+0800 (China Standard Time)

Hello there,

This is not a bug; it is intended behavior. Recall that from_case really just pulls in a list of Boundary values that are commonly associated with that case. In camel case, for example, you would expect a lowercase letter followed by an uppercase letter (aA) to be a boundary. There are boundaries for digits as well. Luckily, within convert_case you can easily see the boundaries associated with a case. Here are those for camel case.

println!("{:?}", Case::Camel.boundaries());

[LowerUpper, Acronym, LowerDigit, UpperDigit, DigitLower, DigitUpper]

We can also look at all the possible boundaries that can be identified in a provided string. Let's look at what is in your example strings.

println!("{:?}", Boundary::list_from("FOOBar"));

[Acronym]

println!("{:?}", Boundary::list_from("fOOBar"));

[LowerUpper, Acronym]

FOOBar contains the Acronym boundary, and because that is in camel case's boundaries, it is used as the point at which to split the string into words. It gets split into FOO and Bar, which are then combined into foo_bar as snake case.

fOOBar contains the Acronym boundary AND the LowerUpper boundary. This LowerUpper boundary falls between the first two characters, fO. It is also in camel case's list of boundaries, so the string is split into f, OO, and Bar, which are combined into f_oo_bar as snake case.

This lowerupper boundary is expected for camel case, since that's how we join words. The end of one word is lowercase and the next begins with uppercase. In the case of fOOBar, the first word is f, followed by OOBar.
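The splitting described above can be sketched with the standard library alone. This is a simplified illustration, not the crate's actual implementation: split_words and to_snake are hypothetical names, and only the LowerUpper and Acronym boundaries are modeled (the digit boundaries are ignored).

```rust
/// Hypothetical, simplified splitter (not convert_case's own code).
/// Splits at two boundaries only:
///   - LowerUpper: a lowercase char followed by an uppercase char ("aA")
///   - Acronym: two uppercase chars followed by a lowercase char ("AAa"),
///     split between the two uppercase chars
fn split_words(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    let mut words: Vec<String> = Vec::new();
    let mut start = 0;
    for i in 1..chars.len() {
        let prev = chars[i - 1];
        let cur = chars[i];
        let lower_upper = prev.is_lowercase() && cur.is_uppercase();
        let acronym = prev.is_uppercase()
            && cur.is_uppercase()
            && chars.get(i + 1).map_or(false, |c| c.is_lowercase());
        if lower_upper || acronym {
            words.push(chars[start..i].iter().collect());
            start = i;
        }
    }
    if start < chars.len() {
        words.push(chars[start..].iter().collect());
    }
    words
}

/// Joins the words in lowercase with underscores.
fn to_snake(s: &str) -> String {
    split_words(s)
        .iter()
        .map(|w| w.to_lowercase())
        .collect::<Vec<_>>()
        .join("_")
}

fn main() {
    println!("{:?} -> {}", split_words("FOOBar"), to_snake("FOOBar"));
    println!("{:?} -> {}", split_words("fOOBar"), to_snake("fOOBar"));
}
```

Under these assumptions, FOOBar has a single split point (before Bar), while fOOBar has an additional one right after the leading f, which is where the extra underscore comes from.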

All that is to say this is expected behavior.

Thomas Frank · Answer 2 · Fri Nov 25 2022 15:23:02 GMT+0800 (China Standard Time)

Thanks for the answer,
I already dug around the code a bit, and I understand that what happened is expected based on the boundary logic. However, I think that in the case of fOOBar the more correct way would be to ignore the first boundary / treat the first letter as uppercase.
It's probably an ugly patch to "fix" it (that is, to make it behave the way I consider more correct). And the user-side fix is quite easy to do.
The main motivation of this bug report was to make you aware of this and then maybe tell others who are looking for this that they have to manually fix it on their end.

Cheers