Data model for lightweight mode on MvStorage

Question

pragmaxim opened this issue a year ago · comments

Schema for embedded database

Finding out at query time whether box-related data has been spent puts huge pressure on the DB
=> let's do that at indexing time so that queries are real-time!

Shared

headerIdsByHeight:  Map[Height, Set[HeaderId]] // more than one in case of a fork-in-progress
blockByHeaderId:    Map[HeaderId, Block] // arbitrary block data (depends on performance)

Unspent/NonEmpty

utxosByAddress:     Map[Address, Map[BoxId, Value]] // this would be a clone of Node's utxo state
addressByUtxo:      Map[BoxId, Address] // non-empty address by utxo

Spent

allBoxesByCustomAddress:     Map[Address, Set[BoxId]] // all boxes for a configured address by a dApp developer
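
To make the "do it at indexing time" idea concrete, here is a minimal sketch of how the three maps above could be kept up to date as transactions are indexed. It uses plain in-memory immutable Maps rather than MvStorage, and `Box`, `Index`, `addOutput`, `spendInput` and the `String`/`Long` aliases are hypothetical names introduced only for illustration.

```scala
object UtxoIndexSketch {
  // Hypothetical aliases; the real types would come from the Ergo libraries.
  type Address = String
  type BoxId   = String
  type Value   = Long

  // Simplified box: only the fields this sketch needs.
  final case class Box(id: BoxId, address: Address, value: Value)

  final case class Index(
      utxosByAddress: Map[Address, Map[BoxId, Value]] = Map.empty,
      addressByUtxo: Map[BoxId, Address] = Map.empty,
      allBoxesByCustomAddress: Map[Address, Set[BoxId]] = Map.empty
  ) {
    // Called for every output of an indexed transaction.
    def addOutput(box: Box, customAddresses: Set[Address]): Index = {
      val utxos =
        utxosByAddress.getOrElse(box.address, Map.empty[BoxId, Value]) + (box.id -> box.value)
      // Full history is only recorded for addresses configured by a dApp developer.
      val custom =
        if (customAddresses.contains(box.address))
          allBoxesByCustomAddress.updated(
            box.address,
            allBoxesByCustomAddress.getOrElse(box.address, Set.empty[BoxId]) + box.id
          )
        else allBoxesByCustomAddress
      copy(
        utxosByAddress = utxosByAddress.updated(box.address, utxos),
        addressByUtxo = addressByUtxo.updated(box.id, box.address),
        allBoxesByCustomAddress = custom
      )
    }

    // Called for every input of an indexed transaction.
    def spendInput(boxId: BoxId): Index =
      addressByUtxo.get(boxId) match {
        case None => this // unknown or already spent box: nothing to do
        case Some(address) =>
          val remaining = utxosByAddress.getOrElse(address, Map.empty[BoxId, Value]) - boxId
          val updated =
            if (remaining.isEmpty) utxosByAddress - address // keep only non-empty addresses
            else utxosByAddress.updated(address, remaining)
          copy(utxosByAddress = updated, addressByUtxo = addressByUtxo - boxId)
      }
  }
}
```

With this shape, a query for an address's unspent boxes is a single lookup in `utxosByAddress`, and no spent/unspent check is needed at query time.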

There can be many of these indexes, please provide your suggestions and use-cases!

Facts to consider:

some bare minimum of data, like UtxoState, is going to be indexed for all data
arbitrary data, especially spent inputs (tens of millions of records), could be configurable for specific addresses
so that dApp developers can customize the explorer for their needs

Eventually the Http API will allow for retrieving anything that can be put together from these persistent Maps.

Alison Robson · Answer 1 · Fri Jun 16 2023 23:21:07 GMT+0800 (China Standard Time)

Shared

headerById:                       Map[HeaderId, Value] // Probably best to only keep n headers? 
                                                       // Most of dApps will only need the last 10 
                                                       // headers to be used as reduction context.
headerIdsByHeight:                Map[Height, Set[HeaderId]]
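
A minimal sketch of the "only keep n headers" idea, assuming a sliding window over the last `keep` heights. Pruning `headerIdsByHeight` together with `headerById` is my assumption, and `Headers`, `add` and the `String` stand-in for the header value are hypothetical names, not part of any existing API.

```scala
object HeaderWindowSketch {
  type Height   = Int
  type HeaderId = String
  type Value    = String // stand-in for the serialized header

  final case class Headers(
      headerById: Map[HeaderId, Value] = Map.empty,
      headerIdsByHeight: Map[Height, Set[HeaderId]] = Map.empty
  ) {
    // Adds a header and drops everything below the window of the last `keep` heights.
    def add(height: Height, id: HeaderId, header: Value, keep: Int): Headers = {
      // More than one id per height is possible while a fork is in progress.
      val ids      = headerIdsByHeight.getOrElse(height, Set.empty[HeaderId]) + id
      val byHeight = headerIdsByHeight.updated(height, ids)
      val byId     = headerById.updated(id, header)
      val minHeight = byHeight.keys.max - keep + 1
      val (kept, dropped) = byHeight.partition { case (h, _) => h >= minHeight }
      Headers(byId -- dropped.values.flatten, kept)
    }
  }
}
```

With `keep = 10`, the store never holds more than the last 10 heights, plus any competing headers at those heights.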

Unspent

boxById:                          Map[BoxId, Value]
boxIdsByContract:                 Map[ContractHex, Set[BoxId]]
boxIdsByContractTemplate:         Map[TemplateHex, Set[BoxId]] // Constant segregated contract template
boxIdsByCreationHeight:           Map[Height, Set[BoxId]]
boxIdsByR4:                       Map[RegisterHex, Set[BoxId]] // Non-empty R4 register
boxIdsByR5:                       Map[RegisterHex, Set[BoxId]] // Non-empty R5 register
boxIdsByR6:                       Map[RegisterHex, Set[BoxId]] // Non-empty R6 register
boxIdsByR7:                       Map[RegisterHex, Set[BoxId]] // Non-empty R7 register
boxIdsByR8:                       Map[RegisterHex, Set[BoxId]] // Non-empty R8 register
boxIdsByR9:                       Map[RegisterHex, Set[BoxId]] // Non-empty R9 register
boxIdsByTokenId:                  Map[TokenId, Set[BoxId]]
boxIdsByTransactionId:            Map[TransactionId, Set[BoxId]]
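
All of these secondary indexes share the same `Map[K, Set[BoxId]]` shape, so they could be maintained with one pair of helpers. Below is a rough sketch covering three of them; `Box`, `MultiMap`, `Indexes`, `index` and `unindex` are hypothetical names, and the box is reduced to the fields the example needs.

```scala
object BoxIndexSketch {
  type BoxId       = String
  type TokenId     = String
  type RegisterHex = String

  // Hypothetical, simplified box.
  final case class Box(
      id: BoxId,
      templateHex: String,
      tokens: Set[TokenId],
      registers: Map[String, RegisterHex] // e.g. "R4" -> "deadbeef"; absent means empty
  )

  type MultiMap[K] = Map[K, Set[BoxId]]

  def add[K](m: MultiMap[K], key: K, id: BoxId): MultiMap[K] =
    m.updated(key, m.getOrElse(key, Set.empty[BoxId]) + id)

  // Removes the id and drops the key entirely once its set becomes empty.
  def remove[K](m: MultiMap[K], key: K, id: BoxId): MultiMap[K] = {
    val rest = m.getOrElse(key, Set.empty[BoxId]) - id
    if (rest.isEmpty) m - key else m.updated(key, rest)
  }

  final case class Indexes(
      boxIdsByContractTemplate: MultiMap[String] = Map.empty,
      boxIdsByTokenId: MultiMap[TokenId] = Map.empty,
      boxIdsByR4: MultiMap[RegisterHex] = Map.empty
  ) {
    def index(box: Box): Indexes = Indexes(
      add(boxIdsByContractTemplate, box.templateHex, box.id),
      box.tokens.foldLeft(boxIdsByTokenId)((m, t) => add(m, t, box.id)),
      // Only a non-empty R4 register is indexed.
      box.registers.get("R4").fold(boxIdsByR4)(hex => add(boxIdsByR4, hex, box.id))
    )

    def unindex(box: Box): Indexes = Indexes(
      remove(boxIdsByContractTemplate, box.templateHex, box.id),
      box.tokens.foldLeft(boxIdsByTokenId)((m, t) => remove(m, t, box.id)),
      box.registers.get("R4").fold(boxIdsByR4)(hex => remove(boxIdsByR4, hex, box.id))
    )
  }
}
```

Spending a box would then amount to `unindex` on the unspent set of maps followed by `index` on the spent set.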

Spent

Spent boxes need all the Unspent maps plus the following:

mintingBoxIdsByTokenId:           Map[TokenId, Set[BoxId]] // EIP-4 only considers one minting box 
                                                           // per token, but protocol allows multiple 
                                                           // boxes in the same minting transaction, 
                                                           // so best to follow the protocol.

Using hashes as contract and registers indexing keys

From a storage-efficiency point of view, it's better to use hashes rather than the content itself as indexing keys: a BLAKE2b256 hash is 32 bytes, whereas contracts and registers can be as big as the maximum box size (4 KB) minus the size of the required registers.
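
To illustrate the fixed key size, here is a small sketch. Note the swap: the JDK standard library does not ship BLAKE2b as far as I know, so SHA-256 is used purely as a stand-in because it also produces a 32-byte digest. `keyOf` and `toHex` are hypothetical helper names.

```scala
import java.security.MessageDigest

object HashKeySketch {
  // SHA-256 stands in for BLAKE2b256 here: both produce 32-byte digests.
  def keyOf(content: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(content)

  // Base16 encoding of the key, e.g. for use in an endpoint path.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString
}
```

Whether the content is a 3-byte register or a 4 KB contract, the key stays at 32 bytes (64 hex characters), at the cost of one hash computation per indexed item.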

~~Average contract size: 121 bytes;~~
~~Average register size: 33 bytes, but it tends to grow with complex dApps like Paideia, which stores entire boxes in its registers.~~

~~The BLAKE hashing algorithm is known for its speed and security and is extensively used on Ergo; however, indexing times must be taken into consideration.~~

Updated - `v1`

Replaced Hash by the base16 content of Contracts, Templates and Registers;
boxIdsByContractTemplate: Only index constant segregated contracts; and
Added creationHeight map.

pragmaxim · Answer 2 · Mon Jul 03 2023 14:31:08 GMT+0800 (China Standard Time)

Copy/pasting some REST endpoints from @arobsn

get  /blocks/{blockId}  // block metadata and statistics
get  /boxes/{state}/tokens/{tokenId}/
get  /boxes/{state}/{boxId}/
get  /boxes/{state}/addresses/{address}/
get  /boxes/{state}/addresses/{address}/tokens/{tokenId}/
get  /boxes/{state}/contracts/{contractHex}/
get  /boxes/{state}/contracts/{contractHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/hashes/{contractHashHex}/
get  /boxes/{state}/contracts/hashes/{contractHashHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/templates/{contractTemplateHex}/
get  /boxes/{state}/contracts/templates/{contractTemplateHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/?R4=deadbeef&R5=cafe
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/tokens/{tokenId}/
post /boxes/query/

// state = spent | unspent

get /tokens/{tokenId}/
get /tokens/{tokenId}/minting-box/
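
As a rough idea of how a filtered endpoint such as the one with `?R4=deadbeef&R5=cafe` could be answered from the persistent maps, the sketch below intersects the candidate `Set[BoxId]`s. The names `intersectAll`, `boxesByTemplate`, `registerIndexes` and `registerFilters` are hypothetical, and the query string is assumed to be already parsed into a Map.

```scala
object BoxQuerySketch {
  type BoxId = String
  type Index = Map[String, Set[BoxId]]

  // Intersects the candidate sets, starting with the smallest to keep the work low.
  def intersectAll(sets: Seq[Set[BoxId]]): Set[BoxId] =
    if (sets.isEmpty) Set.empty[BoxId]
    else {
      val sorted = sets.sortBy(_.size)
      sorted.tail.foldLeft(sorted.head)(_ intersect _)
    }

  def boxesByTemplate(
      boxIdsByContractTemplate: Index,
      registerIndexes: Map[String, Index], // "R4" -> boxIdsByR4, "R5" -> boxIdsByR5, ...
      templateKey: String,
      registerFilters: Map[String, String] // parsed query string, e.g. "R4" -> "deadbeef"
  ): Set[BoxId] = {
    val byTemplate = boxIdsByContractTemplate.getOrElse(templateKey, Set.empty[BoxId])
    // A filter on an unknown register or value yields an empty set, hence an empty result.
    val byRegisters = registerFilters.toSeq.map { case (reg, hex) =>
      registerIndexes
        .getOrElse(reg, Map.empty[String, Set[BoxId]])
        .getOrElse(hex, Set.empty[BoxId])
    }
    intersectAll(byTemplate +: byRegisters)
  }
}
```

The `{state}` path segment would simply select which set of maps (spent or unspent) is passed in.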

pragmaxim · Answer 3 · Mon Jul 03 2023 14:55:31 GMT+0800 (China Standard Time)

I keep uexplorer on Scala3. I'm currently spiking this on multiple tech stacks: I started with Slick as I had experience with it, then Doobie, and ended up with zio-protoquill, zio-http and zio-json, which are all production-ready or very close to it. Eventually zio-protoquill could be replaced with zio-sql, which is currently in development.

There are two choices in the Scala ecosystem when it comes to SQL: the Typelevel stack and the ZIO stack. My bet is on ZIO, as the Typelevel stack is not really well unified. One needs at least 10 different dependencies to put a simple CRUD app together, whereas in ZIO land you are good to go with just zio-protoquill, zio-http and zio-json. This pays off especially when using Scala3, as it is basically a first-class citizen in ZIO 2.0.
