LN-Zap / zap-desktop

Zap Wallet - Cross platform Lightning Network wallet focused on user experience and ease of use ⚡️

Improve error handling for (slow) DNS queries

501st-alpha1 opened this issue

Description

TL;DR: I was having some trouble getting Zap working, and while I have solved my specific issue, this could be fixed so that other users don't run into the same problem. (Apologies in advance if this isn't the right repo to open this issue; I'm not sure if the relevant config is handled by Zap or by some component in another repo.)

I downloaded the Zap AppImage to run on my Linux desktop, but I couldn't get it to start the initial sync; it was stuck on "Fetching latest data from the blockchain" and the progress bar never initialized. I even left it running overnight and that didn't help, so I started looking for logs.

In .config/Zap/lnd/bitcoin/mainnet/wallet-1/logs/bitcoin/mainnet/lnd.log, I saw some lines like this:

unable to lookup IP for mainnet3-btcd.zaphq.io: lookup mainnet3-btcd.zaphq.io on [gateway]:53: read udp [local]:42996->[gateway]:53: i/o timeout

I hadn't had any recent problems with DNS not resolving, so I tried a manual nslookup mainnet3-btcd.zaphq.io:

Server:         [gateway]
Address:        [gateway]#53

Non-authoritative answer:
Name:   mainnet3-btcd.zaphq.io
Address: 34.73.79.114

That looked fine to me, but I noticed it took a few seconds to return a result. Additionally, after I had done this I switched back to Zap and discovered the progress bar had initialized and it now showed the sync was in progress. Repeating the same nslookup command returned a result much faster, so I assume that was because the result was cached, and that probably prevented the timeout error above.

To give some more concrete numbers, the cached DNS lookup took about 5 seconds total:

$ time nslookup mainnet3-btcd.zaphq.io
Server:         [gateway]
Address:        [gateway]#53

Non-authoritative answer:
Name:   mainnet3-btcd.zaphq.io
Address: 34.73.79.114


real    0m5.041s
user    0m0.013s
sys     0m0.000s

while a "fresh" lookup took about 10 seconds:

$ time nslookup mainnet2-btcd.zaphq.io
Server:         [gateway]
Address:        [gateway]#53

Non-authoritative answer:
Name:   mainnet2-btcd.zaphq.io
Address: 35.237.118.241


real    0m10.060s
user    0m0.012s
sys     0m0.000s
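
For anyone who wants to reproduce these timings without nslookup, here is a minimal Python sketch. The helper name `time_lookup` and the injectable `resolve` parameter are my own and are not part of Zap or lnd. It goes through the system resolver via `socket.gethostbyname`, so local caches may make the numbers differ from what nslookup reports.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.gethostbyname):
    """Return (address, elapsed_seconds) for one lookup of hostname.

    `resolve` is a hypothetical hook so the helper can be tested
    without network access; by default it uses the system resolver.
    """
    start = time.monotonic()
    address = resolve(hostname)  # raises OSError (socket.gaierror) on failure
    elapsed = time.monotonic() - start
    return address, elapsed
```

Calling `time_lookup("mainnet3-btcd.zaphq.io")` twice in a row should show whether the second lookup is faster because the result was cached.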

Expected Behavior / Possible Fix

Ideally, I would expect it to wait a little longer for the DNS result, just in case the response is slow coming in. It would also be nice if it would eventually give up and report an error rather than looping retries indefinitely.
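
To make the request concrete, here is a rough Python sketch of the behavior I have in mind: wait longer for each lookup, retry a bounded number of times, then give up with a clear error. Everything in it is an assumption for illustration only (the function name `resolve_with_retries`, the `LookupFailed` exception, the 15 second timeout, and the three attempts). It is not how Zap or lnd actually performs lookups.

```python
import concurrent.futures
import socket
import time


class LookupFailed(Exception):
    """Raised when every attempt to resolve a hostname has failed."""


def resolve_with_retries(hostname, timeout=15.0, attempts=3, backoff=2.0,
                         resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve hostname, waiting up to `timeout` seconds per attempt.

    All defaults are illustrative guesses, chosen only because a fresh
    lookup in this report took about 10 seconds.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        # Run the blocking lookup in a worker thread so the wait can be bounded.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(resolve, hostname)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            last_error = TimeoutError(f"no answer within {timeout}s")
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
        finally:
            # Do not block on a lookup that may still be hanging.
            pool.shutdown(wait=False)
        if attempt < attempts:
            sleep(backoff * attempt)  # wait a little longer before each retry
    # Report the failure instead of retrying forever.
    raise LookupFailed(
        f"could not resolve {hostname} after {attempts} attempts: {last_error}"
    )
```

The point of the bounded loop is that the caller gets an error it can show to the user, rather than a progress bar that never starts.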

Your Environment

  • Zap version: 0.7.6-beta
  • Operating System and version: Debian GNU/Linux 11 (bullseye)

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
To help make it easier for us to investigate your issue, please follow the contributing guidelines.

I had a similar problem after the initial sync when Zap was trying to bootstrap the Lightning graph.

I saw these messages in the logs:

[INF] DISC: Attempting to bootstrap with: Authenticated Channel Graph
[INF] DISC: Attempting to bootstrap with: BOLT-0010 DNS Seed: [[nodes.lightning.directory soa.nodes.lightning.directory] [lseed.bitcoinstats.com ]]
...
[ERR] LNWL: unable to query web api for fee response: Get "https://nodes.lightning.computer/fees/v1/btc-fee-estimates.json": dial tcp: i/o timeout
...
[ERR] SRVR: Unable to retrieve initial bootstrap peers: no addresses found
[DBG] SRVR: Waiting 1m0s before trying to locate bootstrap peers (attempt #374)

I tried running nslookup manually again, but even after a couple of rounds it didn't help.

So instead I started a manual connection (to a Bitrefill Lightning node I found online) with:

lncli --lnddir ~/.config/Zap/lnd/bitcoin/mainnet/wallet-1 --rpcserver localhost:11009 connect 030c3f19d742ca294a55c00376b3b355c3c90d61c6b6b39554dbc7ac19b141c14f@52.50.244.44:9735

and it finally started populating Lightning network data.

So it appears this issue occurs more broadly, not just during the initial sync.