Decoupling byte-level encoding
BurningWitness opened this issue
When writing a JSON parser (GaloisInc/json#17) I needed some way to decode UTF-8, and to my dismay I found that none of the existing solutions fit my expectations:

- `GHC.Encoding.UTF8` and `GHC.IO.Encoding` are `IO`-based, and I don't want that in a parser;
- `Data.Text.Internal.Encoding.Utf8`, while pure, both returns `Reject` as an error and has a rather complex interface;
- `Data.Text.Encoding.*` and `Data.Text.Lazy.Encoding.*` are already parsers themselves, too high-level for this task;
- `utf8-string`'s `Codec.Binary.UTF8.String` consumes and returns lists, so it isn't parser-compatible.
I decided to handroll the UTF-8 decoding, which allowed me to categorize the errors (see Encoding.Mixed.Error) and resulted in a lot of code on the parser side that has little to do with consuming bytes per se (see Codec.Web.JSON.Parse.String).
However, the code I wrote can instead be generalized to:
```haskell
-- Assume Error is Encoding.Mixed.Error.Error

data UTF8 a = UTF8_1 a
            | Part_2 (Word8 -> UTF8_2 a)
            | Part_3_1 (Word8 -> Part_3_1 a)
            | Part_4_1 (Word8 -> Part_4_1 a)
            | Error_1 Error

data UTF8_2 a = UTF8_2 a
              | Error_2 Error

data Part_3_1 a = Part_3_2 (Word8 -> UTF8_3 a)
                | Error_3_1 Error

data UTF8_3 a = UTF8_3 a
              | Error_3_2 Error

data Part_4_1 a = Part_4_2 (Word8 -> Part_4_2 a)
                | Error_4_1 Error

data Part_4_2 a = Part_4_3 (Word8 -> UTF8_4 a)
                | Error_4_2 Error

data UTF8_4 a = UTF8_4 a
              | Error_4_3 Error

newtype Conv1 a = Conv1 (Word8 -> a)
newtype Conv2 a = Conv2 (Word8 -> Word8 -> a)
newtype Conv3 a = Conv3 (Word8 -> Word8 -> Word8 -> a)
newtype Conv4 a = Conv4 (Word8 -> Word8 -> Word8 -> Word8 -> a)

utf8 :: Conv1 a -> Conv2 a -> Conv3 a -> Conv4 a -> Word8 -> UTF8 a
utf8 = -- I'm omitting the implementation, but it's only 50 lines long
```
Parsing then is simply unwrapping `UTF8`. This decouples character validation from conversion; the only part of decoding left is ensuring that only the maximal subpart of an ill-formed sequence is consumed, which is the responsibility of the parser.
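To make the "parsing is just unwrapping" point concrete, here is a self-contained sketch narrowed to the ASCII and two-byte branches only. The `Error` type stands in for `Encoding.Mixed.Error.Error`, and `utf8_2` is a hypothetical entry point in the spirit of the omitted `utf8`; all names here are illustrative, not part of any existing package:

```haskell
import Data.Word (Word8)
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Control.Monad (unless)

-- Hypothetical error type standing in for Encoding.Mixed.Error.Error.
data Error = InvalidLead Word8 | InvalidContinuation Word8
  deriving (Eq, Show)

-- Narrowed two-level version of the proposed datatype:
-- ASCII and two-byte sequences only.
data UTF8 a = UTF8_1 a
            | Part_2 (Word8 -> UTF8_2 a)
            | Error_1 Error

data UTF8_2 a = UTF8_2 a
              | Error_2 Error

-- Hypothetical entry point: the caller supplies a conversion
-- per sequence length, the decoder only validates bytes.
utf8_2 :: (Word8 -> a) -> (Word8 -> Word8 -> a) -> Word8 -> UTF8 a
utf8_2 conv1 conv2 w0
  | w0 < 0x80               = UTF8_1 (conv1 w0)
  | w0 >= 0xC2 && w0 < 0xE0 = Part_2 $ \w1 ->
      if w1 .&. 0xC0 == 0x80
        then UTF8_2 (conv2 w0 w1)
        else Error_2 (InvalidContinuation w1)
  | otherwise               = Error_1 (InvalidLead w0)

-- "Parsing is simply unwrapping UTF8": a toy list-based consumer.
decode2 :: [Word8] -> Either Error String
decode2 []       = Right []
decode2 (w : ws) =
  case utf8_2 toChar1 toChar2 w of
    UTF8_1 c  -> (c :) <$> decode2 ws
    Error_1 e -> Left e
    Part_2 k  -> case ws of
      []        -> Left (InvalidLead w)  -- truncated input; real code would report this separately
      w1 : rest -> case k w1 of
        UTF8_2 c  -> (c :) <$> decode2 rest
        Error_2 e -> Left e
  where
    toChar1 b     = chr (fromIntegral b)
    toChar2 b0 b1 = chr ( fromIntegral (b0 .&. 0x1F) `shiftL` 6
                      .|. fromIntegral (b1 .&. 0x3F) )

main :: IO ()
main = do
  unless (decode2 [0x68, 0xC2, 0xA9] == Right "h\xA9") (error "decode")
  unless (decode2 [0xC2, 0x41] == Left (InvalidContinuation 0x41)) (error "error path")
  putStrLn "ok"
```

The parser alone decides what to do with the partially-applied continuations, which is exactly the decoupling being proposed.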
My proposal is creating a separate package focused specifically on decoding/encoding UTF-8/UTF-16/UTF-32 at the byte level. Then `text` can drop some internal modules in favor of a simpler common interface.
This proposal is however naive: I do not know whether GHC can inline these datatypes reliably or, indeed, at all. Based on my cursory reading of the Secrets of the Glasgow Haskell Compiler Inliner paper, it should, as each of these expressions is trivial.
This doesn't clash with the issue of GHC's many UTF-8 implementations (outlined in `GHC.Encoding.UTF8`), as all the other algorithms are in `IO`.
Other concerns:
- `text` is a core library, so I assume an extra dependency can't just be added on a whim;
- A package named `utf` already exists and is deprecated. I don't know how hard reclaiming deprecated packages is.
Adding a dependency to `text` is too much of a hassle IMO. But we can probably incorporate the desired changes into `text` itself. Could you please elaborate on why a naive parser from `Data.Text.Internal.Encoding.Utf8` is not sufficient for your needs?
While `Data.Text.Internal.Encoding.Utf8` is indeed functional enough to serve its purpose, my concerns are the following:

- The interface is recursive, so the `Incomplete` state on the fourth byte is unreachable;
- The `Accept` and `Incomplete` constructors force their fields, so returned codepoints need to be evaluated even if they're never used;
- Ideally I'd want to share the error type with the `text` library, but alas, `DecodeError` represents it as a `String` and there's no way to derive that from the `Reject` result.
I do have to admit that all of these issues are minor, and I do not know why anyone would ever need succinct errors (other than cool error reporting), but the approach I'm proposing is the properly decoupled Haskell view of things.
One thing to note is that I haven't looked deeply into the structure of Hoehrmann's C-based decoder, but from what I see, by-the-book decoding is just a chain of up to thirteen comparisons, so I don't yet understand the need for a complex state machine here (other than code brevity of course, but Haskell isn't C).
For performance reasons two array lookups are much better than up to 13 comparisons.
Once `Reject`ed, one is supposed to apply whatever error reporting is desired. If you keep the previous state at hand, it should be fairly straightforward to do so.
For performance reasons two array lookups are much better than up to 13 comparisons.
Isn't this only true if the entire lookup table resides in L1 cache? Sure this will work fine for C parsers, but I don't know if any random Haskell parser interleaved with the algorithm can guarantee this.
Also, it's 1 comparison for 00..7F and 5 for 80..7FF, so for really simple strings even two array lookups in L1 cache seem like overkill.
Rolling a benchmark to compare the two approaches should be easy, so perhaps I should do that.
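For reference, the comparison counts being discussed can be illustrated with a plain-guard classifier of the lead byte. This is a sketch following the well-formed lead-byte ranges of Unicode Table 3-7; `sequenceLength` is a made-up name, not an API of any package mentioned here:

```haskell
import Data.Word (Word8)
import Control.Monad (unless)

-- By-the-book classification of a UTF-8 lead byte via plain comparisons:
-- 1 comparison settles ASCII, a handful more settle longer sequences.
sequenceLength :: Word8 -> Maybe Int
sequenceLength w
  | w < 0x80  = Just 1        -- 00..7F: ASCII
  | w < 0xC2  = Nothing       -- 80..C1: continuation byte or overlong lead
  | w < 0xE0  = Just 2        -- C2..DF: two-byte lead
  | w < 0xF0  = Just 3        -- E0..EF: three-byte lead
  | w < 0xF5  = Just 4        -- F0..F4: four-byte lead
  | otherwise = Nothing       -- F5..FF: never valid in UTF-8

main :: IO ()
main = do
  unless (map sequenceLength [0x41, 0xC3, 0xE2, 0xF0, 0x80]
            == [Just 1, Just 2, Just 3, Just 4, Nothing])
         (error "classification mismatch")
  putStrLn "ok"
```

Continuation bytes then still need their own range checks (which is where the remaining comparisons of the thirteen come from).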
The main blocker for this proposal is going to be performance. I'd be surprised if you can use your API to write a streaming JSON parser whose performance is comparable to using the `Data.Text.Internal.Encoding.Utf8` module or the recently added `validateUtf8Chunk` (etc.) primitives in `Data.Text.Internal.Encoding`.
There is an intentional trade-off of a tiny bit of imprecision for a lot of performance. The parser state fits in a single byte (`DecoderState`), which can easily be unpacked by GHC optimizations into a tight loop that does no allocations. In contrast, an API like the one you propose, with lots of first-class functions, aims to represent the UTF-8 parsing state machine more accurately, reducing unreachable branches, but (1) GHC won't be able to optimize the allocations away, and (2) it's unclear how that granularity results in practical benefits.
The interface is recursive, so the `Incomplete` state on the fourth byte is unreachable;
Making that state unreachable is really the main point of your API, and as you mention it's unclear what the use case would be.
The `Accept` and `Incomplete` constructors force their fields, so returned codepoints need to be evaluated even if they're never used;
The fields are one word each. The expectation is that they are going to be unpacked in a tight loop that does not allocate. This is much cheaper than allocating a thunk for the partial codepoint to be evaluated only if it is used. If you don't need the partial code point (only doing validation), then you can use `updateDecoderState` instead.
Isn't this only true if the entire lookup table resides in L1 cache? Sure this will work fine for C parsers, but I don't know if any random Haskell parser interleaved with the algorithm can guarantee this.
This is likely to be true irrespective of caches: bear in mind that in your case each comparison is a conditional and involves 1 or 2 jump instructions depending on the branch chosen.
So after a week of tinkering I made a benchmark repository (link). The algorithms I wrote include no low-level magic, just inlines and strict copying. The benchmark timings can be found in the README.md there, and here are a few points that follow from the results:

- GHC does indeed inline the data structure, even at `-O1`. I `NOINLINE`d both of the parsers I wrote, and the only places that retain references to `Codec.Encoding.UTF8` in the final STG are the `Text` variants on chunk end, solely because I force it in the `Resume` type;
- Pretty much all UTF-8 decoding is done using `simdutf`, so on every chunk border the arrays have to be pulled back from the ether just to do 1-4 lookups;
- `decodeUtf8` does not follow the maximal subpart rule.
Problems I could not resolve:

- For whatever reason I can't turn off the `simdutf` flag. If someone could try out `decodeUtf8` without the SIMD algorithm, that'd be quite nice, as it's probably the only place that clearly outperforms my solution;
- Based on the fact that the SIMD version of my `Text` algorithm runs faster on late errors than the basic one, the latter screws up inlining and should be at least 10% faster when done right. This isn't that important, so I haven't dug into it.
Also I wonder why `simdutf` returns a boolean when it could return the last known valid UTF-8 boundary.
For the record all of my benchmarks have been executed on a laptop CPU, so, as with all things cache-related, YMMV, and extra benchmark subjects are welcome.
I concede that you can get your data structure inlined. But that relies on unrolling the loop yourself so that you always start an iteration at a code point boundary. Performance-wise, the extra branching may have a detrimental effect on branch prediction. Your initial comment about IO made me assume you didn't want simdutf, but I misunderstood. If the main loop uses simdutf, then performance of the error branch is much less of a concern.
I'm still not convinced a more fine-grained API for UTF-8 is really better. I disagree that, in comparison, "`Data.Text.Internal.Encoding.Utf8` (...) has a rather complex interface." That API is an automaton, which is as simple and standard as it gets: a byte goes in, a new state and/or output comes out. You don't have to unroll four nested pattern-matches to use that API efficiently. I think the main bit of apparent complexity is that it exposes the internal state for error reporting, and that part of the interface could be cleaned up to make it easier to diagnose errors.
`decodeUtf8` does not follow the maximal subpart rule.
That sounds like a bug, right?
Also I wonder why simdutf returns a boolean when it could return the last known valid UTF-8 boundary.
That's a feature request for simdutf. I don't know what the current status is, but it would indeed let us simplify UTF-8 parsing further.
Also, another API you haven't mentioned is `Data.Text.Encoding.decodeUtf8Chunk`. What's your opinion of that for your problem?
`decodeUtf8` does not follow the maximal subpart rule ... sounds like a bug, right?
Yes, it should probably have its own issue. Can be replicated through the tests here.
You don't have to unroll four nested pattern-matches to use that API efficiently.
If you wish to respect the maximal subpart rule, an error encountered on the first byte results in byte consumption, while an error on any successive byte does not. As such you need to track byte boundaries; that's four repeats with the array lookup algorithm. The full unroll is just as deep on the 4-byte branch, and every other branch is shallower than that.
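The boundary tracking described above can be sketched as a standalone function computing how many bytes the maximal subpart of an ill-formed sequence occupies: an out-of-range first continuation consumes only the lead byte, while a later bad byte consumes everything before it. The ranges follow Unicode Table 3-7; `maximalSubpart` is a hypothetical helper, not part of `text`, and it is only meaningful when called at the start of an ill-formed sequence:

```haskell
import Data.Word (Word8)
import Control.Monad (unless)

-- Number of bytes in the maximal subpart of an ill-formed UTF-8
-- sequence starting at the given input.
maximalSubpart :: [Word8] -> Int
maximalSubpart []       = 0
maximalSubpart (w : ws)
  | w < 0x80 || (w >= 0xC2 && w < 0xF5) =
      1 + countCont (firstRange w) (tailLen w) ws
  | otherwise = 1   -- 80..C1 or F5..FF: the lead byte alone is the subpart
  where
    -- expected number of continuation bytes for a valid lead
    tailLen b | b < 0x80  = 0
              | b < 0xE0  = 1
              | b < 0xF0  = 2
              | otherwise = 3

    -- allowed range of the FIRST continuation byte (Unicode Table 3-7);
    -- all later continuations are plain 80..BF
    firstRange :: Word8 -> (Word8, Word8)
    firstRange 0xE0 = (0xA0, 0xBF)
    firstRange 0xED = (0x80, 0x9F)
    firstRange 0xF0 = (0x90, 0xBF)
    firstRange 0xF4 = (0x80, 0x8F)
    firstRange _    = (0x80, 0xBF)

    countCont :: (Word8, Word8) -> Int -> [Word8] -> Int
    countCont _ 0 _ = 0
    countCont (lo, hi) n (b : bs)
      | b >= lo && b <= hi = 1 + countCont (0x80, 0xBF) (n - 1) bs
    countCont _ _ _ = 0

main :: IO ()
main = do
  unless (maximalSubpart [0xE0, 0x80, 0x41]       == 1) (error "E0 80")
  unless (maximalSubpart [0xE1, 0x80, 0x2F]       == 2) (error "E1 80")
  unless (maximalSubpart [0xF1, 0x80, 0x80, 0x2F] == 3) (error "F1 80 80")
  unless (maximalSubpart [0x80]                   == 1) (error "lone cont")
  putStrLn "ok"
```

Note how the first continuation needs a lead-dependent range check while the rest do not; that asymmetry is exactly why a decoder respecting the rule ends up tracking which byte of the sequence it is on.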
I disagree that in comparison "Data.Text.Internal.Encoding.Utf8 (...) has a rather complex interface."
I admit my phrasing on this point was incorrect; if anything, the fact that it's in an `Internal` module is a much better reason not to use it. I'm fine with it existing in a separate module with proper documentation, as it may indeed be useful for some highly specific parsers, but so far the benchmarks I linked above show it's not even performance-critical in this library.
another API you haven't mentioned is `Data.Text.Encoding.decodeUtf8Chunk`

I have; it's the third bullet point of this issue. A JSON parser needs to treat `"` as end-of-parse and `\` as its own small subparser, so anything beyond a byte-level decoder doesn't fit the purpose.
Having conceded that performance is not an issue, the only remaining difference from `Data.Text.Internal.Encoding.Utf8` I see is that your API lets you not have any parser state (except the offset) in between code points. Am I missing anything else?

The trade-off is that you have to write a big tree of nested cases to use that API effectively, since every byte within a codepoint results in a different type of state. Those 40 lines of code correspond to these 7 lines of code in the text library. So even purely in terms of aesthetics ("the properly decoupled Haskell view of things") it's a hard sell.
The proposed API actually makes things more coupled than `Data.Text.Internal.Encoding.Utf8` because it exposes too many details of UTF-8 in the types.
In that case, would making `Data.Text.Internal.Encoding.Utf8` not internal resolve this?
lets you not have any parser state (except the offset) in between code points
I don't think "lets" is the correct term here; you can weave any state you want into it, it's just a datatype.
in terms of aesthetics ("the properly decoupled Haskell view of things.") it's a hard sell / exposes too much UTF-8 in the types
The entire point is that it exposes all the inner workings while abstracting away all the hard parts. "Decoupled" doesn't mean "short" or "convenient"; it just means you get the power to write whatever you want with no frills. It's obviously a low-level module, so people using it will be ready to spend five extra minutes and duplicate 30 lines.
making `Data.Text.Internal.Encoding.Utf8` not internal resolve this
It would definitely help other people use it, but at this point I would rather carry around a 170-line module that does it in a much more streamlined fashion with predictable performance.
This applies to the `StrictBuilder` as well (I call it `Copy` on my side). The exposed API can be used to do what's advertised, but it's not exposed properly or documented succinctly enough to be useful.
For the record, you don't need a strong reason to deny this proposal; a simple "we don't have people for this, sorry" is enough. The reason I'm pushing for it is that I already have two different places I want to use it in, and I don't want to toss it onto my personal pile of "should be on Hackage, but uploading is a nuisance, I haven't tested it enough, and the previous maintainer is nowhere to be seen" projects.
I'm just making sure that I'm not completely missing the point of your proposal. Beyond that, we'll indeed have to agree to disagree, unless another @haskell/text maintainer wants to chime in.
- Pretty much all UTF-8 decoding is done using `simdutf`, so on every chunk border the arrays have to be pulled back from the ether just to do 1-4 lookups;

There are three engines for UTF-8 validation in `text`:

- If you can afford linking against C++, `simdutf` is used for bulk processing, and the naive engine kicks in only at the boundaries of chunks. Somewhat frustratingly, if you get precompiled `text` directly from a GHC bindist, the `simdutf` flag is most likely disabled (because of linking issues).
- Otherwise, if `bytestring >= 0.11.5` (which is fairly new) is available, we use the UTF-8 engine from there (written in C) and again invoke the naive engine only at the boundaries.
- Otherwise we use the naive engine full time.
I'm not sure what the supposed story is for the JavaScript backend: it might happen that it's better to use the naive engine instead of compiling the C decoder from `bytestring` into JS.
If you want to benchmark Haskell native implementations, pass `cabal build --constraint 'text -simdutf' --constraint 'bytestring < 0.11.5'`.
There are ways to embellish `Data.Text.Internal.Encoding.Utf8`, e.g., expose `byteToClass`, provide descriptive pattern synonyms for `ByteClass`, and add something like `explain :: DecoderState -> ByteClass -> String`, which produces an explanation of what exactly went wrong. I am however reluctant to replace the mechanism entirely or add one more UTF-8 decoding engine.
I agree that more fine-grained error reporting has its use cases, but I feel that it's better to iterate on it outside of `text`, in a separate package. Bear in mind, it is very difficult to change something in a boot library, and not easy to allow users to upgrade, so it's better to evolve the API elsewhere.
pass `cabal build --constraint 'text -simdutf' --constraint 'bytestring < 0.11.5'`

While I was missing the fact that `bytestring` needs to be a specific version, neither constraints, nor `cabal.project` modifications, nor even specifying a `bytestring` boundary directly in the `.cabal` file change anything. Even `source-repository-package` over a `git clone` doesn't apply, so I'm out of relatively sane options here.
There are three engines for UTF-8 validation
The performance concern applies specifically to the side case of using the `simdutf`/`bytestring` C validator, since crossing chunk borders with continuations still uses the array lookup algorithm. This is something I have tested, and it's slower even than naive comparisons (mind you, my algorithm is actually very slow on this side case too, since I force it to allocate the data structure).
a more fine-grained error reporting
My original point was that I wanted to share error handling with `text` for consistency, but now I know that `OnDecodeError` effectively returns a constant `String` and an entirely ambiguous `Word8`. As such this point is moot.
Right now the proposal grinds down to the following points:

- Both the current array lookup algorithm and the zero-cost datatype version could be moved into a separate package or a set of non-internal modules within `text`, which would also allow the removal of the `Data.Text.Internal.Encoding.Utf*` modules;
- `OnDecodeError` and `UnicodeException` do not provide any reliable error information and as such may be reduced to `Maybe Char` and a unit respectively;
- There is a minor performance improvement to be gained from using regular branches instead of array lookups when using the `simdutf`/C validation code;
- It may be a good idea to move `StrictBuilder` out of internals as well.
As the main point of this issue is adding algorithms that are not immediately needed within the library and cannot be abstracted into a separate boot library for management reasons, this issue is indeed dead in the water. If no one else has any strong opinions on this topic, I will close the issue at the end of this week.
Even `source-repository-package` over a `git clone` doesn't apply, so I'm out of relatively sane options here.
That's extremely strange; could you share a reproducer? It might be worth raising a bug against `Cabal`.
The performance concern applies specifically to the side case of using the `simdutf`/`bytestring` C validator,
My point above was that there are situations where the naive decoder is the only one available, and its performance matters. If one wants to make statements about performance, this case should be measured by disabling `simdutf`/`bytestring`.
the current array lookup algorithm ... could be moved into ... a set of non-internal modules within `text`
Makes sense to me.
`OnDecodeError` and `UnicodeException` do not provide any reliable error information and as such may be reduced to `Maybe Char` and a unit respectively;
That's largely true. Unfortunately, it's very difficult to iterate on a better interface without repeatedly breaking clients. There is not much demand though: usually clients treat pretty much any UTF-8 decoding error as just "hey, this data is not UTF-8", and the precise offence reason matters less. I appreciate that JSON decoding is somewhat less forgiving.
Anyways thanks for your efforts and interest!
Okay, I was able to run the benchmarks without SIMD by `git clone`ing the package, renaming it, adding it to the `packages` section of the `cabal.project`, and then adding `PackageImports` clarifications everywhere.
The results are surprisingly bad for the array lookup algorithm.
| Variant | Correct, 32KiB | Correct, 2MiB | Early errors, 32KiB | Early errors, 2MiB | Late errors, 32KiB | Late errors, 2MiB | Garbage, 32KiB | Garbage, 2MiB |
|---|---|---|---|---|---|---|---|---|
| Hoehrmann (SIMD) | 13.69 μs | 1.183 ms | 22.45 μs | 1.790 ms | 163.8 μs | 12.04 ms | 7.104 ms | 459.3 ms |
| Lazy (SIMD) | 10.94 μs | 888.5 μs | 17.38 μs | 1.255 ms | 103.5 μs | 7.962 ms | 3.435 ms | 221.0 ms |
| Hoehrmann | 162.3 μs | 12.24 ms | 163.1 μs | 11.68 ms | 167.8 μs | 12.52 ms | 917.7 μs | 58.82 ms |
| Lazy | 93.24 μs | 7.756 ms | 119.0 μs | 8.576 ms | 121.6 μs | 8.614 ms | 611.7 μs | 41.46 ms |
I'm going to need someone to replicate this on their end and to check my findings for correctness, of course.
For the sake of benchmark reproducibility I incorporated the changes in a fork. I have inlined everything as best I could; the only thing I did not touch is `Data.Text.Internal.StrictBuilder` (`appendR` zero-length checks may be the cause of the SIMD performance losses seen previously).

The following list includes every single library benchmark that matches a pattern of `$0 ~ /ecode/`.
- 73620de -- HEAD
- ebb70b1 -- Naive algorithm (with 73620de as baseline)
- 17bb010 -- HEAD without SIMD validation
- 67dea22 -- Naive algorithm without SIMD validation (with 17bb010 as baseline)
| Test case | 73620de | ebb70b1 | Δ vs 73620de | 17bb010 | 67dea22 | Δ vs 17bb010 |
|---|---|---|---|---|---|---|
| DecodeUtf8.html.Strict | 69.9 μs | 69.2 μs | | 1.41 ms | 252 μs | −82% |
| DecodeUtf8.html.Stream | 70.1 μs | 68.4 μs | | 823 μs | 264 μs | −67% |
| DecodeUtf8.html.StrictLength | 111 μs | 111 μs | | 1.45 ms | 291 μs | −80% |
| DecodeUtf8.html.StrictInitLength | 112 μs | 109 μs | | 1.45 ms | 291 μs | −79% |
| DecodeUtf8.html.Lazy | 67.5 μs | 68.6 μs | | 820 μs | 262 μs | −67% |
| DecodeUtf8.html.LazyLength | 112 μs | 111 μs | | 857 μs | 334 μs | −60% |
| DecodeUtf8.html.LazyInitLength | 111 μs | 110 μs | | 857 μs | 301 μs | −64% |
| DecodeUtf8.xml.Strict | 11.3 ms | 11.3 ms | | 245 ms | 71.7 ms | −70% |
| DecodeUtf8.xml.Stream | 15.1 ms | 14.9 ms | | 174 ms | 78.2 ms | −55% |
| DecodeUtf8.xml.StrictLength | 19.2 ms | 18.7 ms | | 252 ms | 79.4 ms | −68% |
| DecodeUtf8.xml.StrictInitLength | 19.3 ms | 19.1 ms | | 251 ms | 79.4 ms | −68% |
| DecodeUtf8.xml.Lazy | 13.4 ms | 13.3 ms | | 170 ms | 76.7 ms | −54% |
| DecodeUtf8.xml.LazyLength | 19.8 ms | 19.6 ms | | 176 ms | 83.4 ms | −52% |
| DecodeUtf8.xml.LazyInitLength | 19.7 ms | 19.6 ms | | 175 ms | 83.1 ms | −52% |
| DecodeUtf8.ascii.Strict | 7.52 ms | 7.46 ms | | 254 ms | 35.5 ms | −86% |
| DecodeUtf8.ascii.Stream | 11.3 ms | 11.1 ms | | 162 ms | 39.2 ms | −75% |
| DecodeUtf8.ascii.StrictLength | 17.2 ms | 16.1 ms | | 264 ms | 44.5 ms | −83% |
| DecodeUtf8.ascii.StrictInitLength | 15.8 ms | 15.6 ms | | 263 ms | 44.4 ms | −83% |
| DecodeUtf8.ascii.Lazy | 12.4 ms | 12.4 ms | | 161 ms | 36.7 ms | −77% |
| DecodeUtf8.ascii.LazyLength | 19.1 ms | 18.7 ms | | 170 ms | 44.8 ms | −73% |
| DecodeUtf8.ascii.LazyInitLength | 18.9 ms | 18.6 ms | | 168 ms | 44.2 ms | −73% |
| DecodeUtf8.russian.Strict | 1.17 ms | 1.17 ms | | 25.5 ms | 8.36 ms | −67% |
| DecodeUtf8.russian.Stream | 1.37 ms | 1.37 ms | | 16.4 ms | 8.58 ms | −47% |
| DecodeUtf8.russian.StrictLength | 1.88 ms | 1.89 ms | | 26.0 ms | 9.75 ms | −62% |
| DecodeUtf8.russian.StrictInitLength | 1.88 ms | 1.89 ms | | 26.0 ms | 9.28 ms | −64% |
| DecodeUtf8.russian.Lazy | 1.37 ms | 1.37 ms | | 16.5 ms | 8.57 ms | −48% |
| DecodeUtf8.russian.LazyLength | 2.05 ms | 2.03 ms | | 17.1 ms | 9.24 ms | −46% |
| DecodeUtf8.russian.LazyInitLength | 2.03 ms | 2.04 ms | | 16.8 ms | 9.24 ms | −45% |
| DecodeUtf8.japanese.Strict | 3.61 μs | 3.67 μs | | 59.0 μs | 14.5 μs | −75% |
| DecodeUtf8.japanese.Stream | 3.63 μs | 3.72 μs | | 31.5 μs | 14.5 μs | −53% |
| DecodeUtf8.japanese.StrictLength | 5.34 μs | 5.40 μs | | 60.9 μs | 16.4 μs | −73% |
| DecodeUtf8.japanese.StrictInitLength | 5.32 μs | 5.40 μs | | 60.3 μs | 16.1 μs | −73% |
| DecodeUtf8.japanese.Lazy | 3.63 μs | 3.62 μs | | 31.5 μs | 14.5 μs | −53% |
| DecodeUtf8.japanese.LazyLength | 5.44 μs | 5.42 μs | | 33.2 μs | 16.3 μs | −50% |
| DecodeUtf8.japanese.LazyInitLength | 5.46 μs | 5.43 μs | | 33.4 μs | 16.1 μs | −51% |
| DecodeUtf8.ascii.strict decodeUtf8 | 7.66 ms | 7.41 ms | | 256 ms | 35.4 ms | −86% |
| DecodeUtf8.ascii.strict decodeLatin1 | 8.12 ms | 8.02 ms | | 8.03 ms | 8.06 ms | |
| DecodeUtf8.ascii.strict decodeASCII | 8.06 ms | 8.05 ms | | 9.17 ms | 8.06 ms | −12% |
| DecodeUtf8.ascii.lazy decodeUtf8 | 11.4 ms | 11.0 ms | −3% | 168 ms | 37.3 ms | −77% |
| DecodeUtf8.ascii.lazy decodeLatin1 | 13.1 ms | 13.1 ms | | 14.0 ms | 13.0 ms | −7% |
| DecodeUtf8.ascii.lazy decodeASCII | 11.6 ms | 11.6 ms | | 13.0 ms | 11.6 ms | −11% |
| Pure.tiny.decode.Text | 35.6 ns | 59.7 ns | +67% | 27.7 ns | 50.0 ns | +80% |
| Pure.tiny.decode.LazyText | 116 ns | 87.8 ns | −24% | 133 ns | 75.5 ns | −43% |
| Pure.tiny.decode'.Text | 47.9 ns | 74.4 ns | +55% | 45.4 ns | 62.7 ns | +38% |
| Pure.tiny.decode'.LazyText | 150 ns | 115 ns | −23% | 159 ns | 109 ns | −31% |
| Pure.tiny.length.decode.Text | 45.5 ns | 73.7 ns | +61% | 38.5 ns | 64.7 ns | +68% |
| Pure.tiny.length.decode.LazyText | 130 ns | 90.5 ns | −30% | 146 ns | 89.7 ns | −38% |
| Pure.ascii−small.decode.Text | 9.48 μs | 9.57 μs | | 311 μs | 46.4 μs | −85% |
| Pure.ascii−small.decode.LazyText | 11.6 μs | 11.4 μs | | 237 μs | 46.7 μs | −80% |
| Pure.ascii−small.decode'.Text | 9.58 μs | 9.54 μs | | 310 μs | 45.4 μs | −85% |
| Pure.ascii−small.decode'.LazyText | 11.6 μs | 11.2 μs | | 237 μs | 47.0 μs | −80% |
| Pure.ascii−small.length.decode.Text | 18.9 μs | 18.9 μs | | 318 μs | 55.3 μs | −82% |
| Pure.ascii−small.length.decode.LazyText | 20.4 μs | 20.2 μs | | 244 μs | 56.9 μs | −76% |
| Pure.ascii.decode.Text | 7.45 ms | 7.42 ms | | 258 ms | 35.5 ms | −86% |
| Pure.ascii.decode.LazyText | 20.8 ms | 20.5 ms | | 205 ms | 46.2 ms | −77% |
| Pure.ascii.decode'.Text | 7.40 ms | 7.41 ms | | 254 ms | 35.6 ms | −86% |
| Pure.ascii.decode'.LazyText | 20.6 ms | 20.5 ms | | 205 ms | 37.2 ms | −81% |
| Pure.ascii.length.decode.Text | 15.6 ms | 15.5 ms | | 264 ms | 44.6 ms | −83% |
| Pure.ascii.length.decode.LazyText | 19.9 ms | 19.5 ms | | 213 ms | 44.7 ms | −78% |
| Pure.english.decode.Text | 245 μs | 201 μs | −17% | 17.8 ms | 2.30 ms | −87% |
| Pure.english.decode.LazyText | 807 μs | 800 μs | | 14.4 ms | 2.55 ms | −82% |
| Pure.english.decode'.Text | 242 μs | 201 μs | −16% | 17.3 ms | 2.30 ms | −86% |
| Pure.english.decode'.LazyText | 817 μs | 801 μs | | 14.2 ms | 2.53 ms | −82% |
| Pure.english.length.decode.Text | 916 μs | 911 μs | | 18.0 ms | 2.91 ms | −83% |
| Pure.english.length.decode.LazyText | 1.35 ms | 1.35 ms | | 14.1 ms | 3.01 ms | −78% |
| Pure.russian.decode.Text | 3.30 μs | 3.35 μs | | 59.5 μs | 20.4 μs | −65% |
| Pure.russian.decode.LazyText | 3.39 μs | 3.37 μs | | 41.7 μs | 20.4 μs | −51% |
| Pure.russian.decode'.Text | 3.31 μs | 3.37 μs | | 60.1 μs | 20.4 μs | −66% |
| Pure.russian.decode'.LazyText | 3.45 μs | 3.42 μs | | 41.8 μs | 20.4 μs | −51% |
| Pure.russian.length.decode.Text | 4.90 μs | 4.97 μs | | 61.6 μs | 21.9 μs | −64% |
| Pure.russian.length.decode.LazyText | 5.03 μs | 5.00 μs | | 43.2 μs | 22.0 μs | −49% |
| Pure.japanese.decode.Text | 3.53 μs | 3.58 μs | | 59.0 μs | 14.5 μs | −75% |
| Pure.japanese.decode.LazyText | 3.73 μs | 3.71 μs | | 34.4 μs | 14.2 μs | −58% |
| Pure.japanese.decode'.Text | 3.64 μs | 3.69 μs | | 59.0 μs | 14.5 μs | −75% |
| Pure.japanese.decode'.LazyText | 3.77 μs | 3.74 μs | | 34.4 μs | 14.6 μs | −57% |
| Pure.japanese.length.decode.Text | 5.32 μs | 5.41 μs | | 60.8 μs | 16.2 μs | −73% |
| Pure.japanese.length.decode.LazyText | 5.49 μs | 5.44 μs | | 36.1 μs | 16.2 μs | −55% |
Thanks for benchmarking @BurningWitness. Sorry, I'm extra busy this week, will take a look later.
@BurningWitness sorry again, I didn't forget about your work here, but still no time to dive in properly.