Useful extension methods for strings in Rust, carefully benchmarked and extensively configurable by feature flags to minimise its footprint.
These extensions are implemented for all types implementing AsRef<str>. This covers most of the usual string types, including String, Cow<str> and &str itself.
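As an illustration of how a single implementation can cover all of these types, here is a minimal sketch of a blanket impl over AsRef<str>. The trait name `ShoutExt` and its `shout` method are hypothetical and exist only for this example; they are not part of this crate.

```rust
use std::borrow::Cow;

// Hypothetical extension trait, named here only for illustration.
trait ShoutExt {
    fn shout(&self) -> String;
}

// One blanket impl covers every type that can be viewed as a &str.
impl<T: AsRef<str> + ?Sized> ShoutExt for T {
    fn shout(&self) -> String {
        self.as_ref().to_uppercase()
    }
}

fn main() {
    let borrowed: &str = "foo";
    let owned: String = String::from("bar");
    let cow: Cow<str> = Cow::Borrowed("baz");

    // The same method is available on all three string types.
    assert_eq!(borrowed.shout(), "FOO");
    assert_eq!(owned.shout(), "BAR");
    assert_eq!(cow.shout(), "BAZ");
}
```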
❎ WIP: not yet implemented, but the underlying word bounding logic is fully implemented and passes rudimentary tests
trait CaseConversions
Function Name | Example | Details |
---|---|---|
to_snake_case | this_is_an_example | has an uppercase variant |
to_camel_case | thisIsAnExample | |
to_pascal_case | ThisIsAnExample | |
to_kebab_case | this-is-an-example | has an uppercase variant |
to_human_readable | This is an example. | tries its best, work in progress |
to_title_case | This is an Example | tries its best, work in progress |
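To make the table concrete, here is a rough sketch of how the four mechanical conventions differ once the input has already been split into lowercase words. The helper names (`capitalize`, `snake`, `kebab`, `pascal`, `camel`) are hypothetical and are not the crate's API; the word splitting itself is assumed to have happened beforehand.

```rust
// Hypothetical helper: uppercase the first character of a word.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

// Each convention is just a different way of joining the same words.
fn snake(words: &[&str]) -> String {
    words.join("_")
}

fn kebab(words: &[&str]) -> String {
    words.join("-")
}

fn pascal(words: &[&str]) -> String {
    words.iter().map(|w| capitalize(w)).collect()
}

fn camel(words: &[&str]) -> String {
    words
        .iter()
        .enumerate()
        .map(|(i, w)| if i == 0 { w.to_string() } else { capitalize(w) })
        .collect()
}

fn main() {
    let words = ["this", "is", "an", "example"];
    assert_eq!(snake(&words), "this_is_an_example");
    assert_eq!(camel(&words), "thisIsAnExample");
    assert_eq!(pascal(&words), "ThisIsAnExample");
    assert_eq!(kebab(&words), "this-is-an-example");
}
```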
trait StringBuildExtensions
Function Name | Example | Details |
---|---|---|
join | "foo".join("bar") -> "foobar" (borrow -> owned) | only naively functional, work in progress |
concat | "foo".concat(["bar", "bat"]) -> "foobarbat" (borrow -> borrow) | only naively functional, work in progress |
append | "foo".append("bar") -> "foobar" (borrow -> borrow) | only naively functional, work in progress |
prepend | "foo".prepend("bar") -> "barfoo" (borrow -> borrow) | only naively functional, work in progress |
trait StringTypeExtensions
Function Name | Example | Details |
---|---|---|
as_cow | | Essentially free; the only cost comes from mutating the string, which turns it into the Cow::Owned state |
into_arc | | Allocates a String and wraps it into an Arc |
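The cost model described in the table can be sketched with the standard library alone. The functions `as_cow_sketch` and `into_arc_sketch` below are hypothetical stand-ins, and the `Arc<String>` return type is an assumption based on the description above; the crate's real signatures may differ.

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Hypothetical stand-in for as_cow: wrapping a borrow does not allocate.
fn as_cow_sketch(s: &str) -> Cow<'_, str> {
    Cow::Borrowed(s)
}

// Hypothetical stand-in for into_arc: allocates a String, then wraps it.
fn into_arc_sketch(s: &str) -> Arc<String> {
    Arc::new(s.to_owned())
}

fn main() {
    let mut cow = as_cow_sketch("foo");
    assert!(matches!(cow, Cow::Borrowed(_)));

    // The first mutation clones the data and switches to Cow::Owned.
    cow.to_mut().push_str("bar");
    assert!(matches!(cow, Cow::Owned(_)));
    assert_eq!(cow, "foobar");

    // Cloning the Arc only bumps a reference count.
    let shared = into_arc_sketch("foo");
    let clone = Arc::clone(&shared);
    assert_eq!(Arc::strong_count(&shared), 2);
    assert_eq!(*clone, "foo");
}
```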
Naming conventions
We try to follow the official Rust naming guidelines, i.e.:
Prefix | Cost | Ownership |
---|---|---|
as_ | Free | borrowed -> borrowed |
to_ | Expensive | borrowed -> borrowed; borrowed -> owned (non-Copy types); owned -> owned (Copy types) |
into_ | Variable | owned -> owned (non-Copy types) |
This means that you can expect the extension methods to follow the official semantics and behave similarly, especially regarding the cost.
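The same prefixes can be seen on the standard library's own string methods, which makes for a quick reference. The example below uses only std (`str::as_bytes`, `str::to_uppercase`, `String::into_bytes`) and none of this crate's methods.

```rust
fn main() {
    let owned = String::from("abc");

    // as_: free, borrowed -> borrowed (a view into the same bytes).
    let bytes: &[u8] = owned.as_bytes();
    assert_eq!(bytes, "abc".as_bytes());

    // to_: expensive, borrowed -> owned (allocates a new String).
    let upper: String = owned.to_uppercase();
    assert_eq!(upper, "ABC");

    // into_: consumes the value, owned -> owned (reuses the buffer).
    let raw: Vec<u8> = owned.into_bytes();
    assert_eq!(raw, vec![97u8, 98, 99]);
}
```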
This repository contains three different methods to perform word bounds resolution: with the standard regex crate, with the fancy_regex crate, and with a custom regexless char-walking version.
The performance of these methods is evaluated using the criterion benchmarking library. See benches/bench_word_bounds.rs for the benchmarking code and try it yourself. Here are the latest results on a MacBook Air M1 (they show the relative performance; the exact numbers will of course vary by system):
Trait | Execution Time | Description |
---|---|---|
WordBoundResolverRegex | 119.09 µs (average) | (More) Accurate, but currently ~50x slower than no_regex. Based on prior proofs of concept, we should ultimately land at around ~3x slower than the charwalk variant. Suitable for non-critical performance paths. |
WordBoundResolverFancyRegex | 15.433 µs (average) | 🚧 WIP, but almost there. All-inclusive regex logic including lookahead/lookbehind, which should be even more accurate, but ~7x slower than no_regex. Use only when other variants fail. |
WordBoundResolverCharwalk | 2.4 µs (average) | ❎ Just needs more optimization. Fastest and simplest, but could fail on certain edge cases. Officially suggested method for common cases. |
The criterion benchmark results show that WordBoundResolverCharwalk is the fastest and simplest method, taking only about 2.4 µs on average per benchmark execution. The regex variants can be more accurate, and their logic builds on a tried and tested framework, but they are significantly more expensive to run. WordBoundResolverRegex has no integrated lookahead/lookbehind support and compensates with a custom post-processing pass; it currently measures about 50 times slower than WordBoundResolverCharwalk, and should end up about 3 times slower once optimized. WordBoundResolverFancyRegex, which uses the regex engine for all of its logic (including lookahead/lookbehind), is roughly 6 to 7 times slower than WordBoundResolverCharwalk, though it should yield the most accurate results.
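To give a feel for what a regexless char-walking approach looks like, here is a deliberately simplified sketch. The function `split_words` is hypothetical and is not the crate's WordBoundResolverCharwalk implementation; it assumes that `_`, `-` and whitespace are separators and that a lowercase-or-digit to uppercase transition starts a new word.

```rust
// Hypothetical char-walking splitter; not the crate's actual implementation.
fn split_words(input: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_lower_or_digit = false;

    for ch in input.chars() {
        if ch == '_' || ch == '-' || ch.is_whitespace() {
            // Separators end the current word and are dropped.
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower_or_digit = false;
            continue;
        }
        // A lowercase/digit -> uppercase transition starts a new word.
        if ch.is_uppercase() && prev_lower_or_digit && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        prev_lower_or_digit = ch.is_lowercase() || ch.is_ascii_digit();
        // Store every word in lowercase.
        current.extend(ch.to_lowercase());
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

fn main() {
    let expected = vec!["this", "is", "an", "example"];
    assert_eq!(split_words("thisIsAnExample"), expected);
    assert_eq!(split_words("this-is_an example"), expected);
    // Acronyms are an edge case this naive walk does not handle.
    assert_eq!(split_words("HTTPServer"), vec!["httpserver"]);
}
```

A single pass with no backtracking is what keeps this style cheap, and inputs such as `HTTPServer` show the kind of edge case where a naive walk falls short.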
Note: The regex variants are somewhat optimized, and the crate additionally offers two optimization focuses through the feature flags optimize_for_cpu and optimize_for_memory. This is mostly relevant for extreme and picky optimizations on a larger project; otherwise, stick to the defaults. The default optimization configuration brings the heaviest variant, fancy_regex, down from around the 40 microsecond range to its current ~15 microsecond range (on the same system as the benchmark results above).
The official suggestion is to use WordBoundResolverCharwalk (i.e. neither the use_regex nor the use_fancy_regex feature is enabled), unless you face an edge case that isn't yet covered by the manual parsing logic. In that case, first test whether WordBoundResolverRegex works, and if not, try WordBoundResolverFancyRegex.
Note: Ultimately the costs are not usually all that significant, since this shouldn't be called in any hot loops, but your mileage may vary. If you face an edge case that isn't covered by the WordBoundResolverCharwalk variant, any and all issues and pull requests are welcome.
TODO
TODO
Whether you use this project, have learned something from it, or just like it, please consider supporting it by buying me a coffee, so I can dedicate more time to open-source projects like this :)
You can check out the full license here
This project is licensed under the terms of the MIT license.