facebook / akd

An implementation of an auditable key directory

Use cached user states in MySQL layer

eozturk1 opened this issue · comments

get_user_state currently does not use the cache; instead, it queries the requested data (according to the filters) directly from MySQL.

We should consider caching user states and implementing a filtering mechanism. For instance, user states that are already in the cache and match the filter can be used directly; for the rest, we'd go to MySQL storage without re-requesting the ones already in the cache.
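A rough sketch of that split (the cache handle and the matches_filter helper here are hypothetical, not names from the codebase):

let mut results: Vec<ValueState> = Vec::new();
let mut misses: Vec<AkdLabel> = Vec::new();
for username in usernames {
    match cache.get(username) {
        // Cache hit that satisfies the filter: use it directly.
        Some(state) if matches_filter(state, &flag) => results.push(state.clone()),
        // Cache miss, or a hit that fails the filter: go to MySQL.
        _ => misses.push(username.clone()),
    }
}
// Only the remaining users hit MySQL storage.
for username in &misses {
    results.push(storage.get_user_state(username, flag).await?);
}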

So the big problem here (and the reason we don't have caching atm in this call) is that most of the queries we do are

.storage
  .get_user_state(&uname, ValueStateRetrievalFlag::LeqEpoch(epoch))
  .await

in other words, "retrieve the user's value state where the epoch is <= this target epoch". However, if we add a cache and a new state is later written to the DB, how would we detect a cache miss? We'd have to pass through to the DB anyway to know whether there's a more up-to-date entry. For bulk lookup generation, we'll really want to pass a vector of user IDs, which we can (generally) also serve with a single query. So something like

.storage
  .get_users_states(users, ValueStateRetrievalFlag::LeqEpoch(epoch))
  .await

where users is a vector of the AkdLabels to retrieve states for.
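As a sketch, the batched API could be a storage-layer method along these lines (hypothetical signature; the actual return type is up for discussion):

// Hypothetical batched variant of get_user_state; returns one value
// state per requested label (HashMap from std::collections).
async fn get_users_states(
    &self,
    users: &[AkdLabel],
    flag: ValueStateRetrievalFlag,
) -> Result<HashMap<AkdLabel, ValueState>, StorageError>;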

Caching helps when we're doing specific get operations, but this type of filtered scan is near impossible to serve from a cache alone.

Just ideating... Could we use the same MySQL query but exclude the value states already in the cache, using a temp-table creation similar to what batch_get does?

So yeah, I think I understand what you mean. We would indeed take the batch of user IDs and utilize a temp table over a SELECT * join (in MySQL, that is), and we'd just need the single epoch argument. Something like

SELECT * FROM `users` WHERE `epoch` <= :epoch AND `user` IN (SELECT * FROM `temp_table`)

Or something like that
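Fleshing that out a bit (only a sketch; column types and placeholder syntax are illustrative, and the inner query picks the most recent state per user at or below the target epoch, matching the LeqEpoch semantics):

-- Stage the requested user IDs in a temp table.
CREATE TEMPORARY TABLE `temp_table` (`user` VARBINARY(256) NOT NULL, PRIMARY KEY (`user`));
INSERT INTO `temp_table` (`user`) VALUES (:u0), (:u1);

-- For each requested user, return the row with the highest epoch <= :epoch.
SELECT u.*
FROM `users` u
JOIN (
    SELECT `user`, MAX(`epoch`) AS `max_epoch`
    FROM `users`
    WHERE `epoch` <= :epoch
      AND `user` IN (SELECT `user` FROM `temp_table`)
    GROUP BY `user`
) latest
  ON u.`user` = latest.`user` AND u.`epoch` = latest.`max_epoch`;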

I think this might be addressed with #269 now, since it handles additions and management of elements in the transaction log. We could definitely add more tests around this logic, but it should synchronize the queries and make sure the most up-to-date elements are picked. I still think we need to go to the DB, however, and can't trust the cache to have enough information; but we can use the transaction log to fill in newer, potentially not-fully-committed, information.
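A rough sketch of that read path (all names here are hypothetical, not the actual #269 implementation):

// The DB stays the source of truth for committed states; the
// transaction log only overlays newer, not-yet-committed states.
async fn get_user_state(
    &self,
    username: &AkdLabel,
    flag: ValueStateRetrievalFlag,
) -> Result<ValueState, StorageError> {
    let committed = self.mysql_get_user_state(username, flag).await?;
    // Hypothetical lookup of an in-flight state in the transaction log.
    match self.transaction_log.get_user_state(username, flag) {
        Some(pending) if pending.epoch >= committed.epoch => Ok(pending),
        _ => Ok(committed),
    }
}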