str_extensions

Useful extension methods for strings in Rust, carefully benchmarked and extensively configurable through feature flags to minimise the crate's footprint.

Usage

These extensions are implemented for all types implementing AsRef<str>. This covers most of the usual string types, including String, Cow<str> and &str itself.
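As a rough illustration of how a single blanket implementation can reach all of these types, here is a minimal sketch. The trait `ShoutExt` and its method `shout` are hypothetical names invented for this example and are not part of this crate.

```rust
/// Hypothetical extension trait, used only to illustrate the blanket impl.
pub trait ShoutExt {
    /// Returns an upper-cased copy of the string (illustrative method).
    fn shout(&self) -> String;
}

// One blanket impl covers `str`, `&str`, `String`, `Cow<str>` and any
// other type that implements `AsRef<str>`.
impl<T: AsRef<str> + ?Sized> ShoutExt for T {
    fn shout(&self) -> String {
        self.as_ref().to_uppercase()
    }
}
```

With this in scope, `"abc".shout()`, `String::from("abc").shout()` and `Cow::Borrowed("abc").shout()` all resolve to the same implementation.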

Extensions

Formatting, cases

❎ WIP: not yet implemented, but the underlying word bounding logic is fully implemented and passes rudimentary tests

trait CaseConversions

Function Name	Example	Details
`to_snake_case`	`this_is_an_example`	has an uppercase variant
`to_camel_case`	`thisIsAnExample`
`to_pascal_case`	`ThisIsAnExample`
`to_kebab_case`	`this-is-an-example`	has an uppercase variant
`to_human_readable`	`This is an example.`	tries its best, work in progress
`to_title_case`	`This is an Example`	tries its best, work in progress
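Since these conversions are not implemented yet, the following is only a sketch of how they could be assembled once the words are known. It assumes the input has already been split into words (for example by a word bound resolver); the free functions and the `capitalize` helper are illustrative and do not reflect the crate's actual signatures.

```rust
/// Upper-cases the first character of `word` and lower-cases the rest.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + &chars.as_str().to_lowercase(),
        None => String::new(),
    }
}

/// Joins lower-cased words with underscores.
fn to_snake_case(words: &[&str]) -> String {
    words.iter().map(|w| w.to_lowercase()).collect::<Vec<_>>().join("_")
}

/// Joins lower-cased words with hyphens.
fn to_kebab_case(words: &[&str]) -> String {
    words.iter().map(|w| w.to_lowercase()).collect::<Vec<_>>().join("-")
}

/// Capitalises every word and concatenates them.
fn to_pascal_case(words: &[&str]) -> String {
    words.iter().map(|w| capitalize(w)).collect()
}

/// Like Pascal case, but the first word stays lower-case.
fn to_camel_case(words: &[&str]) -> String {
    words
        .iter()
        .enumerate()
        .map(|(i, w)| if i == 0 { w.to_lowercase() } else { capitalize(w) })
        .collect()
}
```

For the words `["this", "is", "an", "example"]`, these functions produce the snake, kebab, Pascal and camel case examples listed in the table.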

String building

⚠️ WIP: not completely implemented. Also unstable and unoptimized; not all implementations currently match their descriptions.

trait StringBuildExtensions

Function Name	Example	Details
`join`	`"foo".join("bar")` -> `"foobar"` (borrow -> owned)	only naively functional, work in progress
`concat`	`"foo".concat(["bar", "bat"])` -> `"foobarbat"` (borrow -> borrow)	only naively functional, work in progress
`append`	`"foo".append("bar")` -> `"foobar"` (borrow -> borrow)	only naively functional, work in progress
`prepend`	`"foo".prepend("bar")` -> `"barfoo"` (borrow -> borrow)	only naively functional, work in progress
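To make the intended behaviour of `append` and `prepend` concrete, here is a naive sketch. The trait `BuildSketch` and the method names `append_owned` and `prepend_owned` are hypothetical; in particular, this sketch always returns an owned `String`, which is a simplifying assumption and may differ from the ownership the crate ultimately settles on.

```rust
/// Hypothetical stand-in for a string-building extension trait.
pub trait BuildSketch {
    /// Returns `self` followed by `suffix` as a new `String`.
    fn append_owned(&self, suffix: &str) -> String;
    /// Returns `prefix` followed by `self` as a new `String`.
    fn prepend_owned(&self, prefix: &str) -> String;
}

impl<T: AsRef<str> + ?Sized> BuildSketch for T {
    fn append_owned(&self, suffix: &str) -> String {
        let base = self.as_ref();
        // Reserve the exact capacity up front to avoid a reallocation.
        let mut out = String::with_capacity(base.len() + suffix.len());
        out.push_str(base);
        out.push_str(suffix);
        out
    }

    fn prepend_owned(&self, prefix: &str) -> String {
        let base = self.as_ref();
        let mut out = String::with_capacity(prefix.len() + base.len());
        out.push_str(prefix);
        out.push_str(base);
        out
    }
}
```

Here `"foo".append_owned("bar")` yields `"foobar"` and `"foo".prepend_owned("bar")` yields `"barfoo"`, mirroring the examples in the table.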

Type coercion

⚠️ WIP: not yet implemented

trait StringTypeExtensions

Function Name	Example	Details
`as_cow`		Essentially free; a cost is incurred only when the string is mutated, which turns it into the `Cow::Owned` state
`into_arc`		Allocates a `String` and wraps it into an `Arc`
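These coercions are not implemented yet, so the following is only a sketch of the behaviour described in the table. The trait `TypeSketch` and the method names `as_cow_sketch` and `into_arc_sketch` are hypothetical, and the return type `Arc<String>` is an assumption based on the description of `into_arc`.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical stand-in for a type coercion extension trait.
pub trait TypeSketch: AsRef<str> {
    /// Borrows the string as a `Cow` without allocating.
    fn as_cow_sketch(&self) -> Cow<'_, str> {
        Cow::Borrowed(self.as_ref())
    }

    /// Consumes the value, allocates a `String` and wraps it in an `Arc`.
    fn into_arc_sketch(self) -> Arc<String>
    where
        Self: Sized,
    {
        Arc::new(self.as_ref().to_owned())
    }
}

impl<T: AsRef<str> + ?Sized> TypeSketch for T {}
```

Calling `to_mut()` on the returned `Cow` is the point at which the allocation happens: the value switches from `Cow::Borrowed` to `Cow::Owned` only when it is first mutated.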

Naming conventions

We try to follow the official Rust naming guidelines, i.e.:

Prefix	Cost	Ownership
as_	Free	borrowed -> borrowed
to_	Expensive	borrowed -> borrowed; borrowed -> owned (non-Copy types); owned -> owned (Copy types)
into_	Variable	owned -> owned (non-Copy types)

This means that you can expect the extension methods to follow the official semantics and behave similarly, especially regarding the cost.
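The three prefixes can be seen side by side in the standard library. The sketch below uses only `std` methods (`str::as_bytes`, `str::to_lowercase`, `String::into_bytes`); the function name `prefix_demo` is made up for this example.

```rust
/// Walks through the three prefixes using standard-library methods.
fn prefix_demo() -> (usize, String, Vec<u8>) {
    let owned = String::from("Hello");

    // as_: free, borrowed -> borrowed. No allocation, just a view.
    let view: &[u8] = owned.as_bytes();
    let view_len = view.len();

    // to_: expensive, borrowed -> owned. Allocates a new `String`.
    let lower: String = owned.to_lowercase();

    // into_: owned -> owned. Consumes `owned` and reuses its buffer.
    let bytes: Vec<u8> = owned.into_bytes();

    (view_len, lower, bytes)
}
```

After the `into_bytes` call, `owned` can no longer be used, which is the ownership transfer that the `into_` prefix signals.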

Performance

This repository contains three different methods for word bounds resolution: one using the standard regex crate, one using the fancy_regex crate, and a custom regexless char-walking version.
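To give an idea of what a regexless char-walking approach can look like, here is a minimal sketch. It is not the crate's implementation: the function name `word_bounds_charwalk` and the boundary rules (the separators `_`, `-` and whitespace, plus any lowercase-to-uppercase transition) are assumptions made for this example.

```rust
/// Splits `input` into words by walking its characters once.
/// Boundaries (an assumption of this sketch): `_`, `-`, whitespace,
/// and any lowercase-to-uppercase transition.
fn word_bounds_charwalk(input: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0; // byte offset where the current word begins
    let mut prev_lower = false; // was the previous char lowercase?

    for (idx, ch) in input.char_indices() {
        if ch == '_' || ch == '-' || ch.is_whitespace() {
            // A separator ends the current word and is itself skipped.
            if start < idx {
                words.push(&input[start..idx]);
            }
            start = idx + ch.len_utf8();
            prev_lower = false;
        } else {
            // A lowercase-to-uppercase transition starts a new word.
            if ch.is_uppercase() && prev_lower && start < idx {
                words.push(&input[start..idx]);
                start = idx;
            }
            prev_lower = ch.is_lowercase();
        }
    }
    // Flush the trailing word, if any.
    if start < input.len() {
        words.push(&input[start..]);
    }
    words
}
```

This sketch splits `thisIsAnExample` into `this`, `Is`, `An`, `Example`, but leaves an input such as `HTTPServer` as a single word. Inputs like that are the kind of edge case where a simple char walk can fall short.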

The performance of these methods is evaluated using the criterion benchmarking library. See benches/bench_word_bounds.rs for the benchmarking code and try it yourself. Here are the latest results on a MacBook Air M1 (they show the relative performance; the exact figures will of course vary by system):

Trait	Execution Time	Description
`WordBoundResolverRegex`	119.09 µs (average)	⚠️ Major WIP. More accurate, but currently ~50x slower than `no_regex`. Based on prior proofs of concept, it should ultimately land at around 3x slower than the charwalk variant. Suitable for paths that are not performance-critical.
`WordBoundResolverFancyRegex`	15.433 µs (average)	🚧 WIP, but almost there. All-inclusive regex logic including lookahead/lookbehind, which should be even more accurate, but roughly 6 to 7x slower than `no_regex`. Use only when the other variants fail.
`WordBoundResolverCharwalk`	2.4 µs (average)	❎ Just needs more optimization. Fastest and simplest, but could fail on certain edge cases. Officially suggested method for common cases.

The criterion benchmark results show that WordBoundResolverCharwalk is both the fastest and the simplest method, taking only about 2.4 µs on average per benchmark iteration. The regex variants can be more accurate, and their logic builds on a tried and tested framework, but they are significantly more expensive to run. WordBoundResolverRegex has no integrated lookahead/lookbehind support and compensates with a custom post-processing pass; it should end up about 3 times slower than the WordBoundResolverCharwalk variant (⚠️ it is still under construction and, while it passes the tests, it is currently about 50x slower ⚠️). WordBoundResolverFancyRegex, which uses the regex engine for all of its logic (including lookahead/lookbehind), is roughly 6 to 7 times slower than the WordBoundResolverCharwalk variant, though it should yield the most accurate results.

Note: The regex variants are somewhat optimized. In addition, the crate offers two optimization focuses through the feature flags optimize_for_cpu and optimize_for_memory. These are mostly relevant for extreme, picky optimization on a larger project; otherwise, stick to the defaults. The default optimization configuration brings the heaviest variant, fancy_regex, down from around the 40 microsecond range to its current ~15 microsecond range (on the same system as the benchmark results above).

The official suggestion is to use WordBoundResolverCharwalk (i.e. with neither the use_regex nor the use_fancy_regex feature enabled), unless you face an edge case that the manual parsing logic does not cover yet. In that case, first test whether WordBoundResolverRegex works and, if not, try WordBoundResolverFancyRegex.
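Feature-based selection of this kind is typically done at compile time with `cfg`. The sketch below only illustrates the idea: the function `selected_resolver` is hypothetical, and the precedence when both features are enabled (fancy_regex first) is an assumption, not documented behaviour of this crate.

```rust
/// Reports which resolver this sketch would select at compile time.
/// The lint is silenced because the features are not declared when the
/// snippet is compiled outside a Cargo package that defines them.
#[allow(unexpected_cfgs)]
fn selected_resolver() -> &'static str {
    if cfg!(feature = "use_fancy_regex") {
        // Assumed precedence: the most inclusive variant wins.
        "WordBoundResolverFancyRegex"
    } else if cfg!(feature = "use_regex") {
        "WordBoundResolverRegex"
    } else {
        // Neither feature enabled: the default char-walking variant.
        "WordBoundResolverCharwalk"
    }
}
```

With neither feature enabled, the function returns `"WordBoundResolverCharwalk"`, matching the default described above.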

Note: Ultimately, the costs are usually not that significant, since this should not be called in any hot loops, but your mileage may vary. Issues and pull requests are welcome if you face an edge case that the WordBoundResolverCharwalk variant does not cover.

Example

TODO

The Problem

TODO

Support

Whether you use this project, have learned something from it, or just like it, please consider supporting it by buying me a coffee, so I can dedicate more time to open-source projects like this :)

License

You can check out the full license here

This project is licensed under the terms of the MIT license.

orgrinrt / str_extensions