memchr2_iter and memchr3_iter do not properly advance slice position

lopopolo opened this issue 5 years ago

Ryan Lopopolo commented 5 years ago

Hi @BurntSushi, I'm the author of Artichoke Ruby. We met on Twitter.

Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7f18fe26ae4413e95f1e350d77f86b28

The iter_next! macro hard-codes how far to advance the haystack position.

https://github.com/BurntSushi/rust-memchr/blob/1ec5ecce03c220c762dd9a8b08f7a3d95522b765/src/iter.rs#L17

This means that the memchr*_iter functions that search for more than one byte scan incorrectly. For example, this code outputs 2 when it should output 1:

extern crate memchr; // 2.2.1

fn main() {
    let haystack = b"abcdefghijklmnopqrstuvwxyz";
    println!("{}", memchr::memchr2_iter(b'a', b'b', haystack.as_ref()).count());
}

Ryan Lopopolo · Answer 1 · Mon Oct 21 2019 08:07:58 GMT+0800 (China Standard Time)

In this particular example, the problem is triggered by another bug: memchr2 reports a match if the haystack starts with the second byte:

extern crate memchr; // 2.2.1

fn main() {
    let haystack = b"abcdefghijklmnopqrstuvwxyz";
    println!("{}", memchr::memchr2_iter(b'a', b'b', haystack.as_ref()).count());
    println!("{:?}", memchr::memchr2(b'a', b'b', haystack.as_ref()));
    println!("{:?}", memchr::memchr2(b'a', b'b', &haystack[1..]));
}

output:

2
Some(0)
Some(0)

Andrew Gallant · Answer 2 · Mon Oct 21 2019 08:17:19 GMT+0800 (China Standard Time)

Hmmm, sorry, but all of your examples are working as intended and are correct. The memchr2 and memchr3 functions match any one of the given needle bytes, and their _iter variants report every such match. The needles are not concatenated and treated as a substring.

It looks like the docs could be clearer and include examples.
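To make those semantics concrete, here is a minimal std-only model of what memchr2_iter reports: every offset whose byte equals either needle. `naive_memchr2_positions` is a hypothetical helper written for this illustration; it is not part of the memchr crate and does none of memchr's vectorized work.

```rust
/// Hypothetical, naive model of `memchr::memchr2_iter` semantics:
/// returns every index whose byte equals `n1` OR `n2`.
/// The two needles are independent bytes, not a two-byte substring.
fn naive_memchr2_positions(n1: u8, n2: u8, haystack: &[u8]) -> Vec<usize> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == n1 || b == n2)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let haystack = b"abcdefghijklmnopqrstuvwxyz";
    // 'a' matches at offset 0 and 'b' matches at offset 1.
    let all = naive_memchr2_positions(b'a', b'b', haystack);
    println!("{:?}", all); // [0, 1]
    // The sub-slice starts with 'b', so it matches immediately at offset 0.
    let tail = naive_memchr2_positions(b'a', b'b', &haystack[1..]);
    println!("{:?}", tail.first()); // Some(0)
}
```

Under this model, the count of 2 and the two `Some(0)` results in the report are exactly what a "match either byte" search should produce.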

Ryan Lopopolo · Answer 3 · Mon Oct 21 2019 08:19:27 GMT+0800 (China Standard Time)

Oh, that was unclear to me. I think I should be able to implement substring matching with regular memchr! Thanks for helping me understand.

Ryan Lopopolo · Answer 4 · Mon Oct 21 2019 08:36:15 GMT+0800 (China Standard Time)

Yay it works! Thanks for pointing me in the right direction.

[17:34] [~/dev/artichoke/artichoke]
▶ time ./target/release/artichoke -e 's = "abcdefg" * 1024' -e '100_000.times { raise if s.scan("abcdef").length != 1024 }'

real	0m5.017s
user	0m4.961s
sys	0m0.035s
[17:34] [~/dev/artichoke/artichoke]
▶ time ruby -e 's = "abcdefg" * 1024' -e '100_000.times { raise if s.scan("abcdef").length != 1024 }'

real	0m12.892s
user	0m12.754s
sys	0m0.057s

Andrew Gallant · Answer 5 · Mon Oct 21 2019 13:02:28 GMT+0800 (China Standard Time)

I should be able to implement substring matching

Out of curiosity, why not just use bstr's substring search, since it sounds like you are already depending on bstr?

Ryan Lopopolo · Answer 6 · Mon Oct 21 2019 16:50:12 GMT+0800 (China Standard Time)

I think bstr depends on twoway. I implemented String#scan with twoway, and it was 4x slower (even with the SSE4.2 vectorized implementation) than using memchr to find the first byte and then doing an equality check on the trailing bytes of the pattern.

String#scan requires collecting the position of all matches in the haystack.
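The first-byte-then-compare approach described above can be sketched in std-only Rust. This is a hedged illustration, not Artichoke's actual code: `scan_positions` is a hypothetical name, `iter().position` stands in for `memchr::memchr`, and matches are collected non-overlapping, since Ruby's `String#scan` with a literal string pattern resumes searching after each match.

```rust
/// Hypothetical sketch: start offsets of all non-overlapping occurrences
/// of `needle` in `haystack`. `iter().position` stands in for `memchr::memchr`.
fn scan_positions(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    // Empty needles are not handled in this sketch.
    if needle.is_empty() {
        return out;
    }
    let first = needle[0];
    let rest = &needle[1..];
    let mut pos = 0;
    while pos < haystack.len() {
        // Find the next candidate: an occurrence of the needle's first byte.
        let offset = match haystack[pos..].iter().position(|&b| b == first) {
            Some(offset) => offset,
            None => break,
        };
        let start = pos + offset;
        let tail_end = start + 1 + rest.len();
        // Equality check on the trailing bytes of the pattern.
        if tail_end <= haystack.len() && &haystack[start + 1..tail_end] == rest {
            out.push(start);
            pos = tail_end; // non-overlapping: resume after the whole match
        } else {
            pos = start + 1; // false candidate: resume just past it
        }
    }
    out
}

fn main() {
    // Mirrors the Ruby benchmark input: "abcdefg" * 1024 scanned for "abcdef".
    let haystack = b"abcdefg".repeat(1024);
    let matches = scan_positions(&haystack, b"abcdef");
    println!("{}", matches.len()); // 1024
}
```

Note that the fast path depends heavily on how often the first byte occurs in the haystack, which is one reason a single benchmark can be misleading.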

With twoway:

▶ time ./target/release/artichoke -e 'LEN = 1024' -e 's = "abcdefghijklmnopqrstuvwxyz" * LEN' -e '100_000.times { raise if s.scan("abcdefghijklmnop").length != LEN }'

real	0m25.609s
user	0m21.374s
sys	0m0.148s

With memchr:

▶ time ./target/release/artichoke -e 'LEN = 1024' -e 's = "abcdefghijklmnopqrstuvwxyz" * LEN' -e '100_000.times { raise if s.scan("abcdefghijklmnop").length != LEN }'

real	0m6.545s
user	0m6.472s
sys	0m0.040s

Andrew Gallant · Answer 7 · Mon Oct 21 2019 17:35:17 GMT+0800 (China Standard Time)

bstr's twoway implementation uses memchr. Have you tried it?

Also, choosing a substring search algorithm based on a single benchmark is probably not wise, especially one with fairly contrived input.

Ryan Lopopolo · Answer 8 · Tue Oct 22 2019 14:40:25 GMT+0800 (China Standard Time)

@BurntSushi, I saw "twoway" and assumed you meant the twoway crate. It did not occur to me that bstr would have its own separate Two-Way implementation.

artichoke/artichoke#316 swaps out my hand-rolled search for bstr's. The code change was surprisingly small, and it avoids allocating a Vec of every match position.
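To illustrate the allocation point, here is a hedged, std-only sketch of a lazy match iterator: counting its items never builds a Vec of positions. `MatchPositions` and `match_positions` are hypothetical names for this illustration; this is not bstr's API, and it uses a naive byte-by-byte window comparison rather than bstr's Two-Way searcher.

```rust
/// Hypothetical lazy iterator over non-overlapping match offsets.
/// Positions are produced one at a time, so a caller that only
/// counts matches never allocates a Vec of positions.
struct MatchPositions<'h, 'n> {
    haystack: &'h [u8],
    needle: &'n [u8],
    pos: usize,
}

fn match_positions<'h, 'n>(haystack: &'h [u8], needle: &'n [u8]) -> MatchPositions<'h, 'n> {
    MatchPositions { haystack, needle, pos: 0 }
}

impl<'h, 'n> Iterator for MatchPositions<'h, 'n> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let n = self.needle.len();
        // Empty needles are not handled in this sketch.
        if n == 0 {
            return None;
        }
        // Naive search: compare the needle against each window in turn.
        while self.pos + n <= self.haystack.len() {
            let start = self.pos;
            if &self.haystack[start..start + n] == self.needle {
                self.pos = start + n; // skip past the match (non-overlapping)
                return Some(start);
            }
            self.pos += 1;
        }
        None
    }
}

fn main() {
    let haystack = b"abcdefghijklmnopqrstuvwxyz".repeat(1024);
    // Count matches without collecting their positions.
    let count = match_positions(&haystack, b"abcdefghijklmnop").count();
    println!("{}", count); // 1024
}
```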

Thanks for helping me through this. The code is better, faster, and has more consistent runtime guarantees. I use a great many of your crates and am grateful for your work. 🙏 🚀