Running concurrent clients

Question

mitchelldehaven opened this issue 4 years ago

Is there any documentation on the correct way to run concurrent clients? The README.md reports runtime performance measured with 6 concurrent clients, but I couldn't find anything on this in the documentation.

Patrice Lopez · Answer 1 · Tue Jul 07 2020 02:46:33 GMT+0800 (China Standard Time)

Hello @mitchelldehaven !

I made the runtime benchmarks using shell scripts and I am using the service with various Java tools, but there are at least two clients managing concurrent calls that could help you more easily:

a Go client: https://github.com/pjox/gofishing
a Python client: https://github.com/hirmeos/entity-fishing-client-python

(disclaimer: I've not tested them)
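
If you would rather not depend on either client, a small thread pool is usually enough to keep several requests in flight against the service. The following is only a rough sketch and not how the README benchmarks were produced: `SERVICE_URL`, the `query` and `file` form fields, and the `send_pdf` and `run_concurrently` helpers are all assumptions for illustration, so please check them against the service documentation for your installed version. It also assumes the third-party `requests` package is installed.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumption: host, port and path of your running service; check its docs.
SERVICE_URL = "http://localhost:8090/service/disambiguate"


def send_pdf(pdf_path, url=SERVICE_URL, timeout=300):
    """Hypothetical worker: POST one PDF and return the parsed JSON reply."""
    import requests  # third-party; imported here so the pool helper runs without it

    query = {"language": {"lang": "en"}}  # assumed query shape
    with open(pdf_path, "rb") as pdf:
        response = requests.post(
            url,
            files={"query": (None, json.dumps(query)), "file": pdf},
            timeout=timeout,
        )
    response.raise_for_status()
    return response.json()


def run_concurrently(items, worker, n_clients=6):
    """Apply worker to every item using at most n_clients parallel calls.

    Returns (results, errors): two dicts keyed by item.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going if one document fails
                errors[item] = exc
    return results, errors
```

Something like `results, errors = run_concurrently(pdf_paths, send_pdf, n_clients=6)` would then process a list of PDF paths with six parallel calls. Threads are a reasonable fit here because each call spends most of its time waiting on the service rather than computing locally.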

Mitchell DeHaven · Answer 2 · Tue Jul 07 2020 09:08:36 GMT+0800 (China Standard Time)

I want to run this in an HPC environment to process thousands of PDFs, but when I try to run it on different worker nodes from the same project directory, Maven seems to dislike this. The naive approach would be to copy the project directory several times, but the project directory is ~100 GB, so I'm unsure whether the approach you were using would avoid this.

Mitchell DeHaven · Answer 3 · Tue Jul 07 2020 20:02:45 GMT+0800 (China Standard Time)

Sorry, I think I found the mistake I was making. It was unrelated to concurrent threads. Thanks!

Patrice Lopez · Answer 4 · Tue Jul 07 2020 20:12:22 GMT+0800 (China Standard Time)

@mitchelldehaven I am actually also trying to run the tool in an HPC environment. It's challenging because the tool is designed more as a service deployed in an environment like an AWS cloud. The issue with the 100 GB resource space is that a shared disk will harm performance a lot. It works fine on an attached SSD because it uses memory-mapped files, but with shared disk access, it could be a disaster :)
So I am interested in your feedback on this!

Also note that there is a new release with updated resource dbs (now Wikidata and Wikipedia as of the end of May 2020) and some fixes, and Gradle is now used instead of Maven.