XPIR-team / XPIR

XPIR: Private Information Retrieval for Everyone

Initialize Database from memory

Mandragorian opened this issue · comments

Since having a file for each database element can be intractable for many applications, I believe it would be great to have a way to initialize the database from memory, for example from an std::vector.

From a quick look at the code I think this should not be too difficult, except that the DBHandler interface uses ifstreams, which are file specific.

Maybe a new interface like the following could be more generic:

virtual bool openStream(uint64_t streamNb, uint64_t requested_offset)=0;
virtual uint64_t readStream(uint64_t streamNb, char * buf, uint64_t size)=0;
virtual void closeStream(uint64_t streamNb)=0;

Database handlers that deal with files would open an ifstream when openStream is called and store it in a private mapping keyed by streamNb. Calls to readStream would retrieve the opened ifstream from the mapping and otherwise work as they already do. closeStream would be updated similarly.
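As a rough sketch of the file-backed variant described above (class and member names are hypothetical, assuming the proposed openStream/readStream/closeStream interface):

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a file-backed handler for the proposed interface:
// one ifstream per open stream, kept in a private map keyed by streamNb.
class FileDBHandler {
public:
  explicit FileDBHandler(std::vector<std::string> paths)
      : paths_(std::move(paths)) {}

  bool openStream(uint64_t streamNb, uint64_t requested_offset) {
    if (streamNb >= paths_.size()) return false;
    std::ifstream stream(paths_[streamNb], std::ios::binary);
    if (!stream.is_open()) return false;
    stream.seekg(static_cast<std::streamoff>(requested_offset));
    streams_[streamNb] = std::move(stream);  // keep it open for later reads
    return true;
  }

  uint64_t readStream(uint64_t streamNb, char *buf, uint64_t size) {
    auto it = streams_.find(streamNb);
    if (it == streams_.end()) return 0;  // stream was never opened
    it->second.read(buf, static_cast<std::streamsize>(size));
    return static_cast<uint64_t>(it->second.gcount());
  }

  void closeStream(uint64_t streamNb) {
    streams_.erase(streamNb);  // ifstream destructor closes the file
  }

private:
  std::vector<std::string> paths_;
  std::map<uint64_t, std::ifstream> streams_;
};
```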

The new database handler that deals with vectors would keep an offset for each vector, representing how many bytes have already been read from that "stream". Using that information it would simulate reading from and writing to a file.
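A minimal sketch of such an in-memory handler (names are hypothetical, assuming the same proposed interface) could look like this:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-memory handler: each database element is a byte vector,
// and a per-stream offset simulates a sequential file read position.
class MemoryDBHandler {
public:
  explicit MemoryDBHandler(std::vector<std::vector<char>> elements)
      : elements_(std::move(elements)) {}

  bool openStream(uint64_t streamNb, uint64_t requested_offset) {
    if (streamNb >= elements_.size()) return false;
    offsets_[streamNb] = requested_offset;  // start reading from here
    return true;
  }

  uint64_t readStream(uint64_t streamNb, char *buf, uint64_t size) {
    auto it = offsets_.find(streamNb);
    if (it == offsets_.end()) return 0;  // stream not open
    const std::vector<char> &elem = elements_[streamNb];
    uint64_t offset = it->second;
    if (offset >= elem.size()) return 0;  // everything already read
    uint64_t toRead = std::min<uint64_t>(size, elem.size() - offset);
    std::memcpy(buf, elem.data() + offset, toRead);
    it->second += toRead;  // advance the simulated file position
    return toRead;
  }

  void closeStream(uint64_t streamNb) { offsets_.erase(streamNb); }

private:
  std::vector<std::vector<char>> elements_;
  std::map<uint64_t, uint64_t> offsets_;  // streamNb -> bytes consumed
};
```

No file I/O happens at any point, so the transformed database can live in a vector and be discarded as soon as the reply is generated.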

Hi @Mandragorian,
Thanks for your proposal. Indeed, this would be a neat improvement. Would you like to submit a PR for it?

Have you had a look at DBDirectoryProcessor with file splitting? It takes one single file as the database and splits it into equally sized chunks that will be the elements of the db.

Carlos

I think I can do a PR for the changes.

I have seen the Directory Processor. My personal reason behind this issue is that I want to keep a database in my application and then use XPIR to send a transformed database, not the one I have in memory itself.

It would be troublesome if, every time I had to produce a reply, I also had to write the transformed database to a file, store it, and then delete it.

Being able to simply use a vector to temporarily store the transformed database, load it into the response generator, and then destroy the vector is, in my opinion, much simpler.

Wonderful! It's always great to find people ready to contribute :)

Note however that importing new elements runs at 1-4 Gbit/s, whereas reply generation runs at 15-25 Gbit/s.

Thus importing a new db for each query is not the best-case scenario.

Hi again,

I am trying to understand what the readAggregatedStream method is supposed to do. Could you provide a brief explanation? Does it read all database elements into one buffer?

It is about the aggregation parameter alpha of XPIR. To optimize PIR, alpha database elements can be aggregated into a single larger element. readAggregatedStream reads from the alpha initial streams and produces a single output stream!
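The effect of aggregation can be illustrated with a short sketch (not XPIR's actual code; the function name is made up): groups of alpha consecutive elements are concatenated so the PIR query runs over fewer, larger elements.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: concatenate every alpha consecutive database elements
// into one larger element, as the aggregation parameter alpha does in XPIR.
std::vector<std::vector<char>> aggregateElements(
    const std::vector<std::vector<char>> &elements, uint64_t alpha) {
  std::vector<std::vector<char>> aggregated;
  for (uint64_t i = 0; i < elements.size(); i += alpha) {
    std::vector<char> merged;
    // The last group may hold fewer than alpha elements.
    for (uint64_t j = i; j < i + alpha && j < elements.size(); ++j)
      merged.insert(merged.end(), elements[j].begin(), elements[j].end());
    aggregated.push_back(std::move(merged));
  }
  return aggregated;
}
```

With alpha = 2, a database of five one-byte elements becomes three elements: two of two bytes and one holding the leftover byte.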

Closed by PR 40