decred / vspd

A Voting Service Provider (VSP) for the Decred network.

db/ticket: Consider optimising ticket import.

JoeGruffins opened this issue · comments

This section where the fee address is checked to already belong to another ticket does not scale well.

vspd/database/ticket.go

Lines 112 to 121 in e42b1ca

    // Error if a ticket already exists with the same fee address.
    err = ticketBkt.ForEach(func(k, v []byte) error {
        tbkt := ticketBkt.Bucket(k)
        if string(tbkt.Get(feeAddressK)) == ticket.FeeAddress {
            return fmt.Errorf("ticket with fee address %s already exists", ticket.FeeAddress)
        }
        return nil
    })

You can test it out easily with this diff and running go test

With 10,000 entries in the db, the check takes about 25ms for me, up from 1ms with a single record in the db. This is with a very fast CPU and SSD. I'm also not sure how many tickets a mature vspd is expected to have.

Or are tickets deleted somewhere? Probably fine if so, because there shouldn't be all that many in a voting window...

There's already been a bunch of work recently to improve DB performance, most relevant to your suggestion would be #223 and #243 - lots of analysis and benchmarking in those. It's correct to say that this does not scale well, but the situation right now is far better than it was a few months ago.

Right now tickets are not deleted at all, they are kept in the database forever. If we look at historic data, the biggest legacy VSP is https://dcr.stakeminer.com which voted almost 300k tickets in around 5 years of operation.

Do you have any suggestions for improvements? I have three:

  • Don't do the check for duplicated fee addresses (a little risky, but not awful)
  • Migrate to a new DB, e.g. postgres (ew)
  • Add some policy for removing old tickets

Personally I think removing old tickets is my favourite of those, but that decision probably needs some input from VSP operators. They may wish to keep a complete log of their vspd history, e.g. for customer service purposes.

You could store completed tickets in another db rather than delete, and only look through the active tickets bucket, but then it's not really doing the same thing if the purpose is to be dead certain you never ever reuse the address.

In my opinion, the check is not needed. If vspd is tracking the fee pubkey's index properly, and it looks like it is, there should be no reason for a duplicate address to be in there, correct? From what I have seen, I don't think the check is necessary.

Have you ever seen it error there?

Do you have any suggestions for improvements? I have three:
Don't do the check for duplicated fee addresses (a little risky, but not awful)

@jholdstock What is the risk and probability of having a duplicate fee address?

You could store completed tickets in another db rather than delete, and only look through the active tickets bucket, but then it's not really doing the same thing if the purpose is to be dead certain you never ever reuse the address.

@jholdstock This sounds like a solution unless not checking is better?


Personally I think removing old tickets is my favourite of those, but that decision probably needs some input from VSP operators. They may wish to keep a complete log of their vspd history, e.g. for customer service purposes.

I think keeping history forever makes it possible to reliably rebuild all-time stats (tickets voted, revoked, etc.), unless there is a smart way to derive all tickets managed by the VSP from some seed. I'm not a VSP operator but I love having these numbers, so at the very least I would archive the data somewhere instead of removing it.

Do you have any suggestions for improvements?

Shower thought, in case the (now removed) "fee addr exists" check is ever needed again: load all fee addrs into a cache with fast lookup, and update the cache when a new fee is submitted. If one addr is 39 bytes, the lookup index may be quite compact.
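A minimal sketch of that cache idea in Go, assuming the addresses are read once at startup and held in a mutex-guarded set. `FeeAddrCache`, `NewFeeAddrCache` and `Reserve` are hypothetical names for illustration, not part of vspd, and the addresses in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// FeeAddrCache is a hypothetical in-memory set of fee addresses that have
// already been assigned to tickets. Lookups are O(1) on average, instead
// of a scan over every ticket in the database.
type FeeAddrCache struct {
	mu    sync.Mutex
	addrs map[string]struct{}
}

// NewFeeAddrCache fills the cache from addresses loaded at startup,
// for example by a single pass over the ticket bucket.
func NewFeeAddrCache(existing []string) *FeeAddrCache {
	c := &FeeAddrCache{addrs: make(map[string]struct{}, len(existing))}
	for _, a := range existing {
		c.addrs[a] = struct{}{}
	}
	return c
}

// Reserve records addr and reports whether it was new. It returns false
// if the address is already present, which is the duplicate case.
func (c *FeeAddrCache) Reserve(addr string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.addrs[addr]; ok {
		return false
	}
	c.addrs[addr] = struct{}{}
	return true
}

// Len returns the number of cached addresses.
func (c *FeeAddrCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.addrs)
}

func main() {
	cache := NewFeeAddrCache([]string{"DsExampleAddr1", "DsExampleAddr2"})
	fmt.Println(cache.Reserve("DsExampleAddr3")) // new address: true
	fmt.Println(cache.Reserve("DsExampleAddr1")) // duplicate: false
}
```

For scale, 300k addresses at 39 bytes each is about 11.7 MB of raw address data before map overhead. The cache would need to be updated in the same code path as the database write so the two cannot drift apart.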