keybase / kbfs

Keybase Filesystem (KBFS)

Home Page:https://keybase.io/docs/kbfs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

keybase chat commands get slower and slower

mortenkjarulff opened this issue · comments

Attached script do:

while true ; do

  sleep 15

  keybase chat send ...

  ls /keybase/team/...

  echo '{"method": "read", ... | keybase chat api ...

  if 6 hours has gone
    run_keybase
  fi

done

After a while, chat send and chat read gets slower and slower, going from less that 1 second to more than 60 seconds.
sleep and ls times seems constant.
When 6 hours has gone and the run_keybase is executed, times get low again (but build up over the next 6 hours).

If I outcomment the ls command, all times are constant low.

These logs are from the same run, WITH the ls command:
a4f72ae29df5ba7e9a4c2a1c
0e3959537531bc60a74ac21c

These logs are from the same run, WITHOUT the ls command:
41c5342a84ccee599571831c
b7cc0500989c6a67920c0a1c

/Morten

@mortenkjarulff did you mean to file this in keybase/client? Chat is pretty much completely separate from kbfs at this point.

Hi,

Appears not to be so separate 😀

Shall I file it there instead?

/Morten

Oh, I guess I glossed over or didn't understand the line about "outcomment the ls".

Ok, here's what appears to be the issue. The team folder you're lsing has been written to by a device that has since been revoked. When you ls it, that triggers kbfs to attempt to "garbage-collect" the old data in it, by iterating through the old revisions of the folder. When it looks at the revisions that were written by the now-revoked device, it triggers a bunch of extra checking to verify that the device was valid at the time of the revision.

It looks like, in the case of this folder, the revisions by that revoked device are so expensive that they are slowing down chat.

Can you just ls that folder once, and maybe let the garbage collection run overnight and see if it finishes? You can tell if it made progress overnight by looking at the LastGCRevision field in the team folder's .kbfs_status file.

If it doesn't make any progress, even after many hours of work, I'll try to figure out a custom solution for you.

And obviously we need to figure out a way to reduce these extra checks without affecting security...

Not sure I understand. I can reproduce at any time, even long time after I revoked.

I will manually ls now, and start the script tomorrow.

We only garbage-collect data when it is at least two weeks old, but it could be longer depending on when your devices were online and when they accessed this particular folder. In your case the device was revoked on October 3rd, but it's trying to garbage-collect data written on September 29th.

Ok, will try

Log after manual ls
a0cb1d208188c2886665501c

LastGCRevision": 0,

Is that good?

That means it has never been successfully garbage collected. If the team/folder is very old, this is surprising and might indicate some other bug.

Timestamp on folder is October 8, that could well be the date I created the subteam

Looks like it was created on September 20, so that lines up with the garbage collection and revocation messages I mentioned above.

LastGCRevision": 0,
d46b49510feabaac7be4161c

If it can help

Please just let it run for a long time uninterrupted.

Still 0

Hrm that's a shame. I'm going to work today on making this whole thing better.

I put up #1893 to help eliminate much of the redundant work we're doing for revoked devices. It's under review now, but if you're feeling bold and want to try it out before it's merged officially, let me know.

When will it be official maybe?

If it can help, how can I test it?

Would it help me if I created a new subteam, copy over and rename?

/Morten

It probably won't be out in the official release for a while, but I've made a .deb file for you, if you feel like trying it out:

/keybase/public/strib/keybase-2.10.0-20181102162247.e4a1860091-amd64.deb

or if you trust keybase.pub:

https://keybase.pub/strib/keybase-2.10.0-20181102162247.e4a1860091-amd64.deb

Note this is not a well-tested release, so there could be some rough edges. Also, there might be multiple issues going on with your folder, and I only fixed the most obvious first one, so this debugging might take more steps. We'll see!

If you feel like trying it, please install the above deb, run run_keybase, ls your folder again, and wait a bit. If you do a log send after a few minutes I can check to see if it seems to be making progress. Thanks!

Installed your test version.
Still 0.
921efd3e424dcbad89bf081c

/Morten

Thanks. I was wrong to say that it starts the GC process after 1 minute -- we changed it a while back to one hour. It looks like you sent the log less than an hour after you did the ls. Can you please send another log now if it's been up this whole time?

"LastGCRevision": 9,
d96ca4acddd6f20834cd6d1c

It has been running for some hours.

/Morten

Ok cool. It at least stopped chewing up your CPU right? I'll look into why it hasn't gone past 9 yet tomorrow, but that seems like a separate issue.

Have been running my test script all day and send/read times are constant low.

I don't think the issue was CPU, but can be wrong.

Should I reinstall the official version again, or?

/ Morten

Great!

I wouldn't recommend re-installing the master version, as the problem could come back. Just wait for the next apt release, which will include these changes. Though if you seem to be having unexpected problems with the release you have. let me know.