google-deepmind / reverb

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research

TFClient error message about data loss

DawoonJang opened this issue

Hello

Recently, I tried reverb.TFClient as shown below:

```
import reverb

table = reverb.Table(name='experience', sampler=reverb.selectors.Prioritized(0.8), remover=reverb.selectors.Fifo(), max_size=2048, rate_limiter=reverb.rate_limiters.MinSize(1))
reverb_server = reverb.Server(tables=[table], port=8003)
reverb_client = reverb.TFClient(f"localhost:{reverb_server.port}", shared_name="tfc", name="tclient")
```

However, I got the messages below when I called the reverb_client.insert() method:

```
[reverb/cc/writer.cc:391] Error when stopping the confirmation worker: DATA_LOSS: Item confirmation worker were stopped when 1 unconfirmed items (sent to server but validation response not yet received).
[reverb/cc/writer.cc:387] Unable to confirm that items were written.
```

I am using TensorFlow 2.11, Reverb 0.10, Ubuntu 20.04, and Python 3.10.

Hi,

Thanks for reporting this. I would like to start off by encouraging you to avoid the TF ops if possible. Prefer the TrajectoryWriter or the StructuredWriter for writing, and TrajectoryDataset for sampling.
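As a rough illustration of the TrajectoryWriter route, here is a minimal sketch. The helper name `write_steps` and its parameters are made up for this example and are not part of Reverb. It assumes a server like the one in the report is already running and that each step is a flat dict of NumPy-convertible values. It writes every step as a one-step item and then calls `flush()`, which waits for the server to confirm the pending items before the writer is closed.

```python
from typing import Any, Iterable, Mapping


def write_steps(server_address: str,
                table: str,
                steps: Iterable[Mapping[str, Any]],
                priority: float = 1.0) -> int:
    """Hypothetical helper: inserts each step as a one-step item, then waits.

    Returns the number of items that were created.
    """
    # Imported lazily so this module can be loaded without Reverb installed.
    import reverb

    client = reverb.Client(server_address)
    num_items = 0
    with client.trajectory_writer(num_keep_alive_refs=1) as writer:
        for step in steps:
            writer.append(dict(step))
            # Reference only the most recent step of every column.
            trajectory = {key: writer.history[key][-1:] for key in step}
            writer.create_item(table=table, priority=priority,
                               trajectory=trajectory)
            num_items += 1
        # Block until the server has confirmed all pending items.
        writer.flush()
    return num_items
```

With the table from the report this could be called as `write_steps("localhost:8003", "experience", [{"observation": obs}])`, where `obs` is a NumPy array. Flushing before the writer closes is what keeps the writer from shutting down with unconfirmed items.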

That being said, this error message isn't as scary as it seems. What it really means is that the client didn't receive confirmation from the server that the item had been successfully inserted into the table before the connection was closed. Note that this doesn't mean the item wasn't inserted; it only means the client didn't get confirmation of the insertion.

I have sent a fix, though, which ensures that the insert op blocks until the confirmations are received. This will make these messages go away. It will be available in the nightly build in the next few days, and then as part of the next stable release once TF 2.12 is available.

Cheers,
Albin