spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪

Home Page: https://spacemesh.io

Possible network latency issues: nipostBuilder Post service not connected - Post service not registered

etmjansen opened this issue

Description

I have set up a SpaceMesh node on my MacBook, provided enough disk space, and accepted the suggested proof generation settings. After quick syncing, the node was up and running and in sync. Four hours later it does seem to be smeshing, but nothing is posted to the network. The log shows this entry multiple times, once after each smeshing attempt:

2024-03-03T11:22:56.047+0100 WARN 87edf.nipostBuilder post service not connected - waiting for reconnection {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "nipostBuilder", "service id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "error": "post service not registered"}

Steps to reproduce

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'

Actual Behavior

No SMH has been tallied in the mac osx application.
Reward address is not reported by the SpaceMesh explorer. It tells me it is non-existent.
In the reward section of the SpaceMesh explorer no rewards show up when filtering on the wallet address, nor the wallet id, nor the smesher id.
Wallet is also reported by the SpaceMesh explorer as non-existent in the account section.

Expected Behavior

SMH gets tallied in the mac osx application as smeshing is recorded on the network.
Reward address is reported by SpaceMesh explorer as existent and active.
In the reward section of the SpaceMesh explorer rewards show up when filtering on the wallet address, or the wallet id, or the smesher id.
Wallet address is reported by SpaceMesh explorer as existent and active and receiving SMH.
SpaceMesh node application reports rewards being tallied.
Node logs show no errors about the post service not being connected. They show that the post service is used to post smeshing to the network and that it succeeds.

Environment

Please complete the following information:

  • OS:

Mac OSX 12.7.3

  • Node Version:

v1.3.11+4f5cce199eb1fdde459307d9ff85ad54bf206ba3

  • Smapp version (if applicable):

1.3.12

Additional Resources

node-config.7c8cef2b.json
node-config.json
postdata_metadata.json
smeshing_metadata.json
spacemesh-log-7c8cef2b.txt

It looks like I am treading water here, nothing gets onto the network.

I am restarting smeshing now to see what happens. I am running an Intego NetBarrier firewall, which tells me it is allowing connections to and from go-spacemesh. It has not prompted me to approve any connection. Yes, I have also run the node with the firewall turned off, with the same result.

At startup the log reports:

2024-03-03T08:09:01.525+0100 WARN 87edf.timesync failed to read response from peer {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "timesync", "pid": "12D3KooWR2tr5Ki3m5GrVrvSvmSNqpsUWYJfEzwBdVvMneVeYxaQ", "errmsg": "read compact header: i/o deadline reached", "name": "timesync"}

This appears right before the first instance of the nipostBuilder warning above after startup.

I am guessing somebody knows what it means.

Another (related) possible reason might be:

WARN 87edf.timesync failed to create new stream {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "timesync", "pid": "12D3KooWEhSSzfpjYr88o4XTPkCSKVeCUC9uKqTgD98LXVYY8fL4", "errmsg": "failed to negotiate protocol: protocols not supported: [/peersync/1.0/]", "name": "timesync"}

WARN 87edf.fetcher failed to write response {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "fetcher", "protocol": "ax/1", "remotePeer": "12D3KooWT36B6eDrUp2W6soDjXAxLL7Pbv77uby2x6EAxSoBnULc", "remoteMultiaddr": "/ip4/39.149.43.220/tcp/65520", "resp.Data len": 42805476, "resp.Error len": 0, "errmsg": "5m0.004007422s elapsed, 2 bytes read, 32092160 bytes written, timeout 25s, hard timeout 5m0s: i/o deadline reached", "name": "fetcher"}

Is this a network latency issue? I am running a modern Internet connection. Speed test says:

[Screenshot: speed test result, 2024-03-03 at 14 01 06]

Last login: Sun Mar 3 13:53:33 on ttys002
➜ ~ ping -c 5 39.149.43.220:65520
ping: cannot resolve 39.149.43.220:65520: Unknown host
➜ ~ ping -c 5 39.149.43.220
PING 39.149.43.220 (39.149.43.220): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3

A traceroute does indeed take 'forever':

➜ ~ traceroute 39.149.43.220
traceroute to 39.149.43.220 (39.149.43.220), 64 hops max, 52 byte packets
1 192.168.178.1 (192.168.178.1) 30.001 ms 2.025 ms 1.896 ms
2 * * *
3 zl-rc0001-cr102-et109-251.core.as33915.net (213.51.202.189) 12.958 ms 10.639 ms 11.345 ms
4 asd-tr0021-cr101-be151-2.core.as33915.net (213.51.5.8) 14.822 ms 14.390 ms 15.728 ms
5 * * *
6 d192002.upc-d.chello.nl (213.46.192.2) 14.503 ms 15.354 ms 11.698 ms
7 adm-bb1-link.ip.twelve99.net (62.115.120.226) 14.298 ms *
adm-bb2-link.ip.twelve99.net (62.115.120.228) 16.410 ms
8 ffm-bb1-link.ip.twelve99.net (62.115.120.241) 20.321 ms 22.036 ms
ffm-bb2-link.ip.twelve99.net (62.115.137.223) 22.036 ms
9 ffm-b5-link.ip.twelve99.net (62.115.136.219) 21.777 ms 20.621 ms 20.733 ms
10 chinamobile-ic-355891.ip.twelve99-cust.net (62.115.47.45) 23.347 ms 24.692 ms 20.008 ms
11 223.120.10.241 (223.120.10.241) 30.949 ms 32.351 ms 38.270 ms
12 * 223.120.14.154 (223.120.14.154) 246.478 ms
223.120.16.186 (223.120.16.186) 262.274 ms
13 * * 221.183.89.182 (221.183.89.182) 242.536 ms
14 * * *
15 221.183.89.46 (221.183.89.46) 254.270 ms
221.183.89.10 (221.183.89.10) 249.929 ms
221.183.89.14 (221.183.89.14) 253.684 ms
16 221.183.40.94 (221.183.40.94) 210.704 ms
221.183.171.129 (221.183.171.129) 266.543 ms
221.183.40.174 (221.183.40.174) 222.352 ms
17 * 221.183.48.226 (221.183.48.226) 205.626 ms *
18 111.5.76.102 (111.5.76.102) 280.363 ms
111.5.76.98 (111.5.76.98) 231.469 ms 220.167 ms
19 * * *
20 * * *
....

I truncated it; after 40 hops, about 3 to 4 minutes in, it was still printing asterisks.

That does not sound like latency, that sounds like something is down.

Reported the issue with SpaceMesh support under issue id 40e5ab51fca842e3b9cc95bfa05d099b

Not sure if this checks the post service, but after some searching on the website I found this on GitHub:

curl http://localhost:50051/status
curl: (7) Failed to connect to localhost port 50051 after 0 ms: Couldn't connect to server

curl http://39.149.43.220:65020/status

curl: (7) Failed to connect to 39.149.43.220 port 65020 after 244 ms: Couldn't connect to server

or for a current node that the server log says is being tried:

curl http://120.40.195.204:50247/status

No reply so far.
It will probably time out.
It ended the same:

curl: (28) Failed to connect to 120.40.195.204 port 50247 after 75002 ms: Couldn't connect to server

Well, I guess I might have to wait until I get a newer MacBook....

Screenshot 2024-03-03 at 16 43 50

I am throwing in the towel. Now each time my Time Machine starts a backup my laptop becomes totally unresponsive. The SpaceMesh node is not for people who cannot afford modern laptops. 😉 Is this another crypto project that falls victim to its own success? Well, too bad. I would have loved to contribute, but if I want that now I either need to fork out money for a newer laptop or get a monthly fee based vpn. We will see if I come back in the future. For now there is nothing left but to remove the node from my laptop.

Hey @etmjansen

There are a few "issues" here.

First of all, the "high latency" one. You picked a high-latency peer that you had been connected to and... well, yes, it has high latency. Nothing to worry about here; this is a normal situation with p2p. Judging by its high port, I think it was UPnP based, so... all OK.

Then the local post service API.
It's not in the release yet, nor is it enabled by default, so localhost:50051 is not used by anything. Remote peers obviously don't expose it either. Expected.

Why your post service is not connected is likely a separate issue.

So the post service not starting is actually expected, as you're still initializing. Besides that, you're fully synced, so I'd say everything is OK?

So basically it's all ok.

To get any SMH you need to wait at least one epoch (2 weeks): https://spacemesh.io/blog/requirements-for-spacemesh-rewards/

Now each time my Time Machine starts a backup my laptop becomes totally unresponsive.

You should not back up your PoST data directory. For performance reasons I would even recommend moving the directory to a different disk from the one your OS is on. Then backing up and using your MacBook while the node is running should still be possible.

a newer laptop or get a monthly fee based vpn

Initialization is computationally difficult by design but only has to be done once. Your MacBook isn't powerful enough to do it in a reasonable time. I would recommend renting a VPS for a few days, initializing there and then downloading the data to your MacBook. After that the node can operate on less powerful MacBooks as well: a 2019 MacBook Pro or newer works. I never tested older MacBooks, but they might still work depending on your PoST size. We have a benchmark tool with which you can test whether your MacBook can handle PoST proving: https://github.com/spacemeshos/post-rs/blob/main/docs/profiler.md

The warning from your original post:

2024-03-03T11:22:56.047+0100 WARN 87edf.nipostBuilder post service not connected - waiting for reconnection {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "nipostBuilder", "service id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "error": "post service not registered"}

This can safely be ignored in your case: your node hasn't finished initialization, which means it also hasn't started the internal post service yet. I can only extrapolate from the logs, but on your machine the node will take at least 7 days to initialize, so I would recommend either upgrading to newer hardware or initializing your PoST in the cloud.
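The log lines quoted in this thread end in a JSON object that carries the structured fields ("module", "error", and so on). A minimal Go sketch for pulling those fields out of a line, for example to filter for the "post service not registered" warning. It assumes the JSON object starts at the first '{' on the line, which holds for the lines quoted here but is not a documented guarantee of the log format. logFields is a hypothetical helper name, not part of go-spacemesh.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// logFields (hypothetical helper) decodes the trailing JSON object of a
// log line. Assumption: the object starts at the first '{' on the line.
func logFields(line string) (map[string]interface{}, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return nil, errors.New("no JSON object found in line")
	}
	var fields map[string]interface{}
	if err := json.Unmarshal([]byte(line[i:]), &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

func main() {
	// The warning quoted in the original post.
	line := `2024-03-03T11:22:56.047+0100 WARN 87edf.nipostBuilder post service not connected - waiting for reconnection {"node_id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "module": "nipostBuilder", "service id": "87edfcc91e8f1af90a64c727f39d63d891d14a8a1e4e198b6b47b4d8ee9ea665", "error": "post service not registered"}`

	fields, err := logFields(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("module:", fields["module"])
	fmt.Println("error: ", fields["error"])
}
```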

No SMH has been tallied in the mac osx application.

That seems to be a misunderstanding of how our protocol works. To be eligible for rewards you need to finish initializing PoST, then participate in PoET, and then create another PoST. If you start fresh, that means your node needs to be online for at least 2 weeks (and up to 4, depending on your timing) before you get your first rewards.

Since I don't see a real issue here, I'm closing the ticket. @etmjansen if you need help setting up and running a node, feel free to join our community on Discord: https://discord.com/invite/yVhQ7rC (you can also find the link on https://spacemesh.io), where other users and members of the Spacemesh team can help you do so 🙂