Data model for lightweight mode on MvStorage
pragmaxim opened this issue · comments
Schema for embedded database
Finding out at query time whether any box-related data has been spent puts huge pressure on the DB
=> let's do that at indexing time so that queries are real-time!
Shared
headerIdsByHeight: Map[Height, Set[HeaderId]] // more than one in case of a fork-in-progress
blockByHeaderId: Map[HeaderId, Block] // arbitrary block data (depends on performance)
Unspent/NonEmpty
utxosByAddress: Map[Address, Map[BoxId, Value]] // this would be a clone of Node's utxo state
addressByUtxo: Map[BoxId, Address] // non-empty address by utxo
Spent
allBoxesByCustomAddress: Map[Address, Set[BoxId]] // all boxes for a configured address by a dApp developer
There can be many of these indexes, please provide your suggestions and use-cases!
Facts to consider :
- a bare minimum of data, like UtxoState, is going to be indexed for all addresses
- arbitrary data, especially spent inputs (tens of millions of records), could be configurable for specific addresses so that dApp developers can customize the explorer for their needs
Eventually the HTTP API will allow for retrieving anything that can be put together from these persistent Maps.
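To make the "resolve spent state at indexing time" idea concrete, here is a minimal sketch of how the two Unspent/NonEmpty maps could be kept in sync while a block is applied. It uses plain immutable Scala Maps rather than MvStorage, and `Box`, `UtxoIndex`, `spend`, `create` and `applyBlock` are hypothetical names made up for the illustration, not part of the explorer.

```scala
object UtxoIndexSketch {
  type Address = String
  type BoxId   = String

  // Hypothetical, simplified box: a real one also carries tokens, registers, etc.
  final case class Box(id: BoxId, address: Address, value: Long)

  // In-memory stand-ins for utxosByAddress and addressByUtxo.
  final case class UtxoIndex(
      utxosByAddress: Map[Address, Map[BoxId, Long]],
      addressByUtxo: Map[BoxId, Address]
  ) {
    // Register a newly created box under its address.
    def create(box: Box): UtxoIndex = {
      val boxes = utxosByAddress.getOrElse(box.address, Map.empty[BoxId, Long]) + (box.id -> box.value)
      UtxoIndex(utxosByAddress.updated(box.address, boxes), addressByUtxo.updated(box.id, box.address))
    }

    // Drop a spent box; an address left without boxes is removed entirely,
    // so the index only ever holds non-empty addresses.
    def spend(boxId: BoxId): UtxoIndex =
      addressByUtxo.get(boxId) match {
        case None => this // unknown or already spent
        case Some(address) =>
          val remaining = utxosByAddress.getOrElse(address, Map.empty[BoxId, Long]) - boxId
          val updated =
            if (remaining.isEmpty) utxosByAddress - address
            else utxosByAddress.updated(address, remaining)
          UtxoIndex(updated, addressByUtxo - boxId)
      }

    // Outputs are added before inputs are removed, so a box that is created
    // and spent inside the same block does not linger in the index.
    def applyBlock(inputs: Seq[BoxId], outputs: Seq[Box]): UtxoIndex = {
      val withOutputs = outputs.foldLeft[UtxoIndex](this)((index, box) => index.create(box))
      inputs.foldLeft[UtxoIndex](withOutputs)((index, boxId) => index.spend(boxId))
    }
  }

  val empty: UtxoIndex = UtxoIndex(Map.empty, Map.empty)
}
```

With something like this, a query for the unspent boxes of an address is a single lookup in `utxosByAddress`, with no spent/unspent check at query time.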
Shared
headerById: Map[HeaderId, Value] // Probably best to keep only the last n headers?
// Most dApps will only need the last 10
// headers to be used as reduction context.
headerIdsByHeight: Map[Height, Set[HeaderId]]
Unspent
boxById: Map[BoxId, Value]
boxIdsByContract: Map[ContractHex, Set[BoxId]]
boxIdsByContractTemplate: Map[TemplateHex, Set[BoxId]] // Constant segregated contract template
boxIdsByCreationHeight: Map[Height, Set[BoxId]]
boxIdsByR4: Map[RegisterHex, Set[BoxId]] // Non-empty R4 register
boxIdsByR5: Map[RegisterHex, Set[BoxId]] // Non-empty R5 register
boxIdsByR6: Map[RegisterHex, Set[BoxId]] // Non-empty R6 register
boxIdsByR7: Map[RegisterHex, Set[BoxId]] // Non-empty R7 register
boxIdsByR8: Map[RegisterHex, Set[BoxId]] // Non-empty R8 register
boxIdsByR9: Map[RegisterHex, Set[BoxId]] // Non-empty R9 register
boxIdsByTokenId: Map[TokenId, Set[BoxId]]
boxIdsByTransactionId: Map[TransactionId, Set[BoxId]]
Spent
Spent boxes need all the Unspent maps plus the following:
mintingBoxIdsByTokenId: Map[TokenId, Set[BoxId]] // EIP-4 only considers one minting box
// per token, but protocol allows multiple
// boxes in the same minting transaction,
// so best to follow the protocol.
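As a rough illustration of how these per-attribute maps could be combined at query time, the sketch below intersects the box id sets of a contract template with optional R4 and token filters. It assumes plain in-memory Maps; `UnspentIndexes` and `findBoxIds` are hypothetical names, and only three of the maps are modelled.

```scala
object BoxQuerySketch {
  type BoxId = String

  // Hypothetical in-memory stand-ins for three of the Unspent maps.
  final case class UnspentIndexes(
      boxIdsByContractTemplate: Map[String, Set[BoxId]],
      boxIdsByR4: Map[String, Set[BoxId]],
      boxIdsByTokenId: Map[String, Set[BoxId]]
  )

  // Start from the template's boxes and narrow the set with every filter
  // that was supplied; an absent filter leaves the set untouched.
  def findBoxIds(
      idx: UnspentIndexes,
      templateHex: String,
      r4Hex: Option[String],
      tokenId: Option[String]
  ): Set[BoxId] = {
    val byTemplate = idx.boxIdsByContractTemplate.getOrElse(templateHex, Set.empty[BoxId])
    val filters: List[Set[BoxId]] = List(
      r4Hex.map(hex => idx.boxIdsByR4.getOrElse(hex, Set.empty[BoxId])),
      tokenId.map(id => idx.boxIdsByTokenId.getOrElse(id, Set.empty[BoxId]))
    ).flatten
    filters.foldLeft(byTemplate)((acc, ids) => acc.intersect(ids))
  }
}
```

The same intersection would apply to the Spent side, since it is meant to carry the same maps.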
Using hashes as contract and registers indexing keys
From a storage-efficiency point of view, it's better to use hashes rather than the content itself as indexing keys: a BLAKE2b256 hash is always 32 bytes, whereas contracts and registers can be as big as the maximum box size (4 KB) minus the required registers' size.
Average contract size: 121 bytes; average register size: 33 bytes, but it tends to grow with complex dApps like Paideia, which stores entire boxes in its registers.
The BLAKE hashing algorithm is known for its speed and security and is used extensively on Ergo; however, the added indexing time must be taken into consideration.
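A minimal sketch of the fixed-size key idea follows. Note the swap: the JDK ships no BLAKE2b-256 implementation, so SHA-256 is used here purely as a stand-in because it also produces 32 bytes; a real indexer would plug in BLAKE2b-256 from a crypto library. `digest32`, `toHex` and `indexKey` are hypothetical helper names.

```scala
import java.security.MessageDigest

object HashedKeySketch {
  // Stand-in digest (assumption): SHA-256 instead of BLAKE2b-256.
  // Both return 32 bytes regardless of the input size.
  def digest32(content: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(content)

  // Lowercase base16 encoding of a byte array.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // Index key for a contract or register of any size:
  // always 32 bytes, i.e. 64 hex characters.
  def indexKey(content: Array[Byte]): String =
    toHex(digest32(content))
}
```

The key stays at 32 bytes whether the content is a 33-byte register or a register close to the 4 KB box limit; the cost is one extra hash per indexed item, which is the indexing-time trade-off mentioned above.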
Updated - v1
- Replaced Hash by the base16 content of Contracts, Templates and Registers;
- boxIdsByContractTemplate: Only index constant segregated contracts; and
- Added creationHeight map.
Copy/pasting some REST endpoints from @arobsn
get /blocks/{blockId} // block metadata and statistics
get /boxes/{state}/tokens/{tokenId}/
get /boxes/{state}/{boxId}/
get /boxes/{state}/addresses/{address}/
get /boxes/{state}/addresses/{address}/tokens/{tokenId}/
get /boxes/{state}/contracts/{contractHex}/
get /boxes/{state}/contracts/{contractHex}/tokens/{tokenId}/
get /boxes/{state}/contracts/hashes/{contractHashHex}/
get /boxes/{state}/contracts/hashes/{contractHashHex}/tokens/{tokenId}/
get /boxes/{state}/contracts/templates/{contractTemplateHex}/
get /boxes/{state}/contracts/templates/{contractTemplateHex}/tokens/{tokenId}/
get /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/
get /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/?R4=deadbeef&R5=cafe
get /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/tokens/{tokenId}/
post /boxes/query/
// state = spent | unspent
get /tokens/{tokenId}/
get /tokens/{tokenId}/minting-box/
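To show how the `{state}` segment could select between the Spent and Unspent maps, here is a small sketch that parses a subset of the `/boxes/...` paths into a typed query. It is framework-agnostic plain Scala, not zio-http routing; `BoxState`, `BoxQuery` and `parse` are hypothetical names, query strings are not handled, and the contract, template and hash routes are left out.

```scala
object BoxRouteSketch {
  // state = spent | unspent
  sealed trait BoxState
  case object Spent extends BoxState
  case object Unspent extends BoxState

  sealed trait BoxQuery
  final case class ById(state: BoxState, boxId: String) extends BoxQuery
  final case class ByToken(state: BoxState, tokenId: String) extends BoxQuery
  final case class ByAddress(state: BoxState, address: String, tokenId: Option[String]) extends BoxQuery

  def parseState(segment: String): Option[BoxState] = segment match {
    case "spent"   => Some(Spent)
    case "unspent" => Some(Unspent)
    case _         => None
  }

  // Segments that follow /boxes/{state}/
  private def parseRest(state: BoxState, rest: List[String]): Option[BoxQuery] = rest match {
    case "tokens" :: tokenId :: Nil                          => Some(ByToken(state, tokenId))
    case "addresses" :: address :: Nil                       => Some(ByAddress(state, address, None))
    case "addresses" :: address :: "tokens" :: tokenId :: Nil => Some(ByAddress(state, address, Some(tokenId)))
    case boxId :: Nil                                        => Some(ById(state, boxId))
    case _                                                   => None
  }

  // Returns None for unknown states or unsupported paths.
  def parse(path: String): Option[BoxQuery] =
    path.split('/').filter(_.nonEmpty).toList match {
      case "boxes" :: state :: rest => parseState(state).flatMap(st => parseRest(st, rest))
      case _                        => None
    }
}
```

A handler could then match on the resulting `BoxQuery` and pick the Spent or Unspent map set based on `state`.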
I keep uexplorer on Scala 3. I'm currently spiking this on multiple tech stacks: I started with Slick, as I had experience with it, then Doobie, and ended up with zio-protoquill, zio-http and zio-json, which are all production ready or very close to production ready. Eventually zio-protoquill could be replaced with zio-sql, which is currently in development.
There are 2 choices in the Scala ecosystem when it comes to SQL: the Typelevel stack and the Zio stack. My bet is on Zio, as the Typelevel stack is not really well united. One needs at least 10 different dependencies to put a simple CRUD app together, whereas in Zio land you are good to go with just zio-protoquill, zio-http and zio-json. This pays off especially when using Scala 3, as it is basically a first-class citizen in Zio 2.0.