NLnetLabs / domain

A DNS library for Rust.

Home Page:https://nlnetlabs.nl/projects/domain/about/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Panic: "attempt to calculate the remainder with a divisor of zero"

jeromegn opened this issue · comments

I got the following panic:

thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:709:15

Looks like this line is problematic:

pub fn info<'a>(&self, list: &'a ServerList) -> &'a ServerInfo {
&list[self.cur % list.servers.len()]
}

I haven't had time to dig into it, but I can take a look next week.

It seems you are iterating over an empty server list. Did you perchance forget to call ResolvConf::finalize before using the config?

I did! Well, I used ResolvConf::default() which finalizes. Then I added my servers and changed a few options. That's a common pattern in Rust.

Perhaps it would be better to use a "builder" pattern since finalization is required?

What's odd is I explicitly put servers in there. Could this condition happen under different circumstances?

            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Tcp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)

Edit: I now see the docs says this is like a builder pattern. I think it's less error-prone to make a ResolvConfBuilder that can only return a ResolvConf on finalize (or build).

I agree this should be changed into having a separate builder type.

But in any case, looking at the code properly now, it should be able to deal with empty server lists. Could you perchance get a stack trace to see what it’s been doing before it crashed? (Just the domain-related parts of the stack trace are enough.)

Yes. Initially I thought I wasn't running with the proper RUST_BACKTRACE env vars and didn't bother to look for it, but we actually are!

thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/re
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/panicking.rs:48:5
   3: <domain::resolv::stub::ServerListCounter>::info
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:709:15
   4: <domain::resolv::stub::Query>::current_server
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:349:9
   5: <domain::resolv::stub::Query>::run_query::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:338:22
   6: <core::future::from_generator::GenFuture<<domain::resolv::stub::Query>::run_query::{closure#0}> as core::future::future::Future>::poll
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/future/mod.rs:91:19
   7: <domain::resolv::stub::Query>::run::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:287:47
   8: <core::future::from_generator::GenFuture<<domain::resolv::stub::Query>::run::{closure#0}> as core::future::future::Future>::poll
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/future/mod.rs:91:19
   9: <domain::resolv::stub::StubResolver>::query::<domain::base::name::parsed::ParsedDname<&bytes::bytes::Bytes>, domain::base::question::Question<domain::base::name::parsed::ParsedDname<&bytes::bytes::Bytes>>>::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:111:13

This appears to happen repeatedly during the same execution. I'm assuming it can get in a bad state and stay that way.

Using a stub resolver in the way you build it above works fine here – indeed in this case you don’t need to explicitly finalize t. It also looks in the code like it should handle an empty UDP list correctly by using the TCP list.

Can you try and create a minimalist breaking example from your code? I don’t think I’ll get anywhere from just staring at the code longer ;)

Ah! Maybe this is related them: I have 2 stub resolvers, one for UDP and one for TCP. I'm forwarding queries and I want to keep the same protocol as the original query.

        let tcp_recursor = {
            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Tcp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)
        };

        let udp_recursor = {
            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Udp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)
        };

Later on, I use them like so:

                let res = match w.kind() {
                    TransportKind::Udp => self.udp_recursor.query(q).await,
                    TransportKind::Tcp => self.tcp_recursor.query(q).await,
                };

(where w is the "transport" used for the original query)

Maybe there's a better way to achieve this? My current reasoning is: if the query is UDP, then that's all I can respond with. Often that'll end up being a truncate response and the client will make a second query over TCP, in which case I'll use the TCP resolver.

Can you try and create a minimalist breaking example from your code? I don’t think I’ll get anywhere from just staring at the code longer ;)

I don't quite know how this happens :( The server we forward to is a local unbound on the host. I've only seen happen on 4 hosts out of 50+ running this software for over a week now. I'm not logging nearly enough information to know when it's first triggered.

We have a plan to factor our the actual networking from the stub resolver to make it easier to build proxies and such. For the time being, you approach looks good and should work.

I guess I’ll stare a bit more at the code to try and figure out why it tries using the empty list.

Thank you!

Anything I can do in the meantime? I deployed our server to more hosts and this seems to happen quite a bit. Restarting the server seems to be the only fix.

I do not mind refactoring our own code to circumvent this issue. I could manually reproduce what the stub resolver does without the server logic. We have a single server, so we don't need that whole thing.

The actual querying code is rather simple. All of it is in src/resolv/stub/mod.rs.

The question is turned into a query in Query::create_message and ServerInfo::prepare_message (the latter adds an OPT record if the server supports EDNS – Unbound does, so you want to do that), and the ServerInfo::query wraps the future for the actual query (ServerInto::query_tcp or ServerInfo::query_udp) into a timeout and fires it off. The query futures basically send the request and then wait for a response that matches the request. That’s really it.

Thanks! For now I've forked this repo and make Query::run_query public. Using that instead of run shouldn't use the next_server function.

Cheeky ;)

For what it’s worth: Factoring out the transports and making them available directly is probably the next thing we are going to do once I have finished #160.