[fea] Avoid multithreaded write lock conflicts in event queue.
Ted-Jiang opened this issue
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After printing long-running events (#749), I ran a load test:

- **Cluster**: 60 executors (200 slots each) and 1 scheduler.
- **Workload**: 200 clients submitting 0.5 s queries in sequence.

and found:
```
[Metrics] ReservationOffering : reservations:[[ExecutorReservation { executor_id: "ee419ec5-acdd-4466-b4b9-5467a3e8b0b1", job_id: None }, ExecutorReservation { executor_id: "5d2d9f60-aa4b-48d1-ae7d-6e01dc2bc12f", job_id: None },
events cost 159 ms!
[Metrics] JobFinished : job_id=r3N6vIe.
events cost 279 ms!
[Metrics] JobUpdated : job_id=r3N6vIe.
events cost 289 ms!
```
Modifying the memory status should only cost hundreds of microseconds, so there seems to be a lock-contention issue.
Reading the code, I found that `QueryStageSchedulerEvent::JobQueued` uses `tokio::spawn` to update the memory status in parallel:
https://github.com/apache/arrow-ballista/blob/a9ecd3a065077bec6c5d271e890d091c594746fa/ballista/scheduler/src/state/mod.rs#L377-L379
Describe the solution you'd like
Move `self.task_manager.submit_job` to the `JobSubmitted` stage so it runs on a single thread, and wrap the plan in an `Arc` so it lives on the heap and is shared by reference instead of deep-cloned.
Describe alternatives you've considered
Additional context