ENCODE-DCC / encoded

Metadata database for ENCODE project

Home Page: https://www.encodeproject.org

Monitoring indexing progress

Parul-Kudtarkar opened this issue · comments

Hi,

After starting a new server and restoring the dataset from WAL backups, is there a way to monitor indexing status, or to confirm that Elasticsearch has indexed everything?

We have been using:
curl -s "localhost:9200/_recovery?pretty&human"
curl 'localhost:9200/_cat/indices?v'

But these aren't particularly helpful.

Thanks!
Parul

Also, is there a way to verify that all trackhubs are cached?

@Parul-Kudtarkar we have made some changes to the indexer so that it reports more progress information. I don't think you can specifically check that all trackhubs are cached, but you can at least tell the total number.

See: https://github.com/ENCODE-DCC/snovault/blob/master/docs/indexer.rst for more info, or ask @tdreszer
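
As a rough way to watch progress from outside the application, the per-index document counts that `_cat/indices` reports can be polled until they stop changing. The sketch below is only that, a sketch: it assumes Elasticsearch is reachable at `localhost:9200` as in the curl commands above, and the helper names (`parse_doc_counts`, `fetch_doc_counts`, `wait_until_stable`) are made up for illustration and are not part of snovault or encoded. Stable counts only suggest that indexing has gone quiet; they do not prove that every object has been indexed.

```python
import json
import time
import urllib.request

# Assumption: same host and port as the curl commands in this thread.
ES_URL = "http://localhost:9200"


def parse_doc_counts(rows):
    """Map index name -> document count from `_cat/indices?format=json` rows."""
    counts = {}
    for row in rows:
        # docs.count may be null (e.g. for a closed index); treat it as 0.
        counts[row["index"]] = int(row.get("docs.count") or 0)
    return counts


def fetch_doc_counts(es_url=ES_URL):
    """Fetch the current per-index document counts from Elasticsearch."""
    with urllib.request.urlopen(f"{es_url}/_cat/indices?format=json") as resp:
        return parse_doc_counts(json.load(resp))


def wait_until_stable(fetch=fetch_doc_counts, interval=30, stable_rounds=3,
                      sleep=time.sleep):
    """Poll until the counts are unchanged for `stable_rounds` polls in a row."""
    previous, unchanged = None, 0
    while unchanged < stable_rounds:
        current = fetch()
        # Reset the streak whenever any index count differs from the last poll.
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
        print(f"total docs: {sum(current.values())} "
              f"(unchanged for {unchanged} poll(s))")
        if unchanged < stable_rounds:
            sleep(interval)
    return previous
```

Calling `wait_until_stable()` on the server prints a running total every 30 seconds and returns the final counts once three consecutive polls match. The interval and the number of rounds are arbitrary starting points and would need tuning to how bursty the indexer is.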

@hitz this is great! We haven't updated our fork of the ENCODE code base since last December, so we are several commits behind. The recent work on the indexer is something we would want to implement and would certainly benefit from. We will keep you posted!

@tdreszer
I see changes made to the following files in the Snovault repository:
src/snovault/elasticsearch/__init__.py
src/snovault/elasticsearch/indexer.py
src/snovault/elasticsearch/indexer_state.py
src/snovault/tests/test_indexing.py
Were any additional changes made to Python scripts or initialization scripts in the ENCODE repository to implement the recent indexer?
Thanks!