apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page:https://datafusion.apache.org/ballista

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[feature] support shuffle read with retry when facing IO error.

Ted-Jiang opened this issue · comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
(This section helps Arrow developers understand the context and why for this feature, in addition to the what)
Same in spark

 /**
   * Max number of times we will try IO exceptions (such as connection timeouts) per request.
   * If set to 0, we will not do any retries.
   */
  public int maxIORetries() { return conf.getInt(SPARK_NETWORK_IO_MAXRETRIES_KEY, 3); }

https://github.com/apache/spark/blob/2878cd84d5d5edcaa9bbc642c12d8c7f6f009328/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L157
Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.