Azure / azure-data-lake-store-python

Microsoft Azure Data Lake Store Filesystem Library for Python

Number of threads shouldn't be tied to core count.

akharit opened this issue

Description

Outline the issue here:
Python threads run within a single process and, under CPython's global interpreter lock (GIL), do not execute Python code on separate cores in parallel. The core count is therefore not a good basis for deciding the number of threads for ADLDownloader and ADLUploader.
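A minimal sketch of why the core count is the wrong yardstick for transfer threads, assuming the work is network-bound. The hypothetical `fake_transfer` below stands in for one chunk upload or download, with `time.sleep` simulating network latency (like blocking socket I/O, it releases the GIL). With one chunk per thread, the wall time stays close to a single chunk's latency, regardless of how many cores the machine has. None of this is code from azure-datalake-store.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(chunk_id: int, latency: float = 0.2) -> int:
    """Hypothetical stand-in for one network-bound chunk transfer.

    time.sleep releases the GIL while waiting, as blocking socket I/O does,
    so other threads keep running during the wait.
    """
    time.sleep(latency)
    return chunk_id


def run_transfers(n_chunks: int, n_threads: int) -> float:
    """Run n_chunks simulated transfers on n_threads threads; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        done = list(pool.map(fake_transfer, range(n_chunks)))
    assert done == list(range(n_chunks))  # every chunk completed, results in order
    return time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 8):
        # Roughly 1.6 s with 1 thread vs. roughly 0.2 s with 8 threads.
        print(f"{threads} thread(s): {run_transfers(8, threads):.2f}s")
```

For real transfers, a useful thread count would depend on things like network latency and available bandwidth, none of which the core count captures.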


Reproduction Steps

Enumerate the steps to reproduce the issue here:

Environment summary

SDK Version: What version of the SDK are you using? (pip show azure-datalake-store)
Answer here:

Python Version: What Python version are you using? Is it 64-bit or 32-bit?
Answer here:

OS Version: What OS and version are you using?
Answer here:

Shell Type: What shell are you using? (e.g. bash, cmd.exe, Bash on Windows)
Answer here:

After discussion, it seems that ThreadPoolExecutor also derives its default number of workers from the CPU count:

max_workers = (os.cpu_count() or 1) * 5
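A small sketch contrasting the two approaches. `default_thread_pool_workers` mirrors the `ThreadPoolExecutor` default for the running interpreter: the formula quoted above applies to Python 3.5 through 3.7, 3.8 changed it to `min(32, cpu_count + 4)`, and 3.13 switched to `os.process_cpu_count()`. `make_transfer_pool` and `DEFAULT_TRANSFER_THREADS` are hypothetical and illustrate a thread count set explicitly rather than derived from cores; the value 16 is an arbitrary placeholder, not a recommendation from azure-datalake-store.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def default_thread_pool_workers() -> int:
    """Mirror ThreadPoolExecutor's default max_workers for the running Python."""
    if sys.version_info >= (3, 13):
        cpus = os.process_cpu_count() or 1
        return min(32, cpus + 4)
    cpus = os.cpu_count() or 1
    if sys.version_info >= (3, 8):
        return min(32, cpus + 4)
    return cpus * 5  # Python 3.5-3.7: the formula quoted in this issue


# Hypothetical default for network-bound transfers; an arbitrary placeholder,
# not a value taken from azure-datalake-store.
DEFAULT_TRANSFER_THREADS = 16


def make_transfer_pool(nthreads: Optional[int] = None) -> ThreadPoolExecutor:
    """Hypothetical helper: size the pool from an explicit setting, not from cores."""
    return ThreadPoolExecutor(max_workers=nthreads or DEFAULT_TRANSFER_THREADS)
```

With this approach, `make_transfer_pool(8)` yields an 8-thread pool whether the machine has 2 cores or 64.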