Run any code in parallel Processes (to use all CPUs) or Threads (to speed up blocking operations).
Best suited for map-reduce style work or e.g. parallel downloads/uploads.
```sh
gem install parallel
```
```ruby
# 2 CPUs -> work in 2 processes (a, b + c)
results = Parallel.map(['a', 'b', 'c']) do |one_letter|
  expensive_calculation(one_letter)
end

# 3 processes -> finished after 1 run
results = Parallel.map(['a', 'b', 'c'], in_processes: 3) { |one_letter| ... }

# 3 threads -> finished after 1 run
results = Parallel.map(['a', 'b', 'c'], in_threads: 3) { |one_letter| ... }
```
The same can be done with `each`:

```ruby
Parallel.each(['a', 'b', 'c']) { |one_letter| ... }
```

or with `each_with_index` and `map_with_index`.
Produce one item at a time with a lambda (anything that responds to `.call`) or a `Queue`:

```ruby
items = [1, 2, 3]
Parallel.each(-> { items.pop || Parallel::Stop }) { |number| ... }
```
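The protocol behind the producer lambda can be sketched in plain Ruby: the lambda is called once per needed item until it returns `Parallel::Stop`. The sketch below uses a stand-in sentinel (`STOP`) so it runs without the gem:

```ruby
# Stand-in for Parallel::Stop -- any unique sentinel object works for the sketch
STOP = Object.new

items = [1, 2, 3]
producer = -> { items.pop || STOP } # one item per call, then signal stop

consumed = []
# Workers effectively do this: keep calling until the stop sentinel appears
while (item = producer.call) != STOP
  consumed << item
end

consumed # => [3, 2, 1] (pop takes from the end of the array)
```

Because `nil` triggers the stop sentinel here, this pattern only works for producers whose items are never `nil` or `false`.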
You can also call `any?` or `all?`, which work the same way as `Array#any?` and `Array#all?`:

```ruby
Parallel.any?([1, 2, 3, 4, 5, 6, 7]) { |number| number == 4 }
# => true

Parallel.all?([1, 2, nil, 4, 5]) { |number| number != nil }
# => false
```
To avoid the overhead of reducing big result sets at the end of processing, use `reduce`. It returns the partial results collected by each worker; you need to merge them at the end to get the fully reduced result.
```ruby
require 'set'

result = Parallel.reduce(['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']) do |result, x|
  result ||= Set.new
  result << x
  result
end
result.reduce(&:+)
```
`reduce` can also be given an initial value, similar to `Enumerable#reduce`:

```ruby
require 'set'

result = Parallel.reduce(['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'], start_with: Set.new) do |result, x|
  result << x
  result
end
result.reduce(&:+)
```
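The final merge step can be illustrated without the gem. Suppose each worker returned a `Set` of the values it saw (hypothetical partial results); a plain `reduce(&:+)` unions them into the full result:

```ruby
require 'set'

# Hypothetical per-worker partial results, shaped like what a parallel
# reduce would hand back: one accumulator per worker
partials = [Set['a', 'b'], Set['b', 'c'], Set['c', 'd']]

merged = partials.reduce(&:+) # Set#+ is union, so duplicates collapse
merged.to_a.sort # => ["a", "b", "c", "d"]
```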
Processes and threads are workers: each one grabs the next piece of work as soon as it finishes the previous one. In the case of `reduce`, however, the end result is returned only when all work is done.
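That worker model can be sketched with plain Ruby threads pulling from a shared `Queue` (a simplification of what the gem does internally, not its actual implementation):

```ruby
work = Queue.new            # Thread::Queue is thread-safe
[5, 1, 3, 2, 4].each { |n| work << n }

done = Queue.new            # workers can push results here without locking

workers = 2.times.map do
  Thread.new do
    loop do
      begin
        item = work.pop(true) # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break                 # queue drained -> this worker is finished
      end
      done << item * 10       # stand-in for the real work
    end
  end
end
workers.each(&:join)

done.size # => 5
```

Each worker loops independently, so a worker that draws short tasks simply grabs more items; nothing waits for a "round" to complete.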
### Processes
- Speedup through multiple CPUs
- Speedup for blocking operations
- Variables are protected from change
- Extra memory used
- Child processes are killed when your main process is killed through Ctrl+C or `kill -2`

### Threads
- Speedup for blocking operations
- Variables can be shared/modified
- No extra memory used
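The variable-sharing difference can be demonstrated in plain Ruby (`fork` assumes a POSIX platform; it is unavailable on Windows and JRuby):

```ruby
# Threads share memory with the main process: mutations are visible
results = []
3.times.map { |i| Thread.new { results << i * 2 } }.each(&:join)
results.sort # => [0, 2, 4]

# A forked process works on a copy: the parent's variable is untouched
copy = []
pid = fork { copy << 1 }   # the << happens only in the child's copy
Process.wait(pid)
copy # => []
```

This is exactly the trade-off in the lists above: threads get sharing (and the races that come with it), processes get isolation at the cost of extra memory.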
Try any of the following to get ActiveRecord working in parallel:
```ruby
# reproducibly fixes things (spec/cases/map_with_ar.rb)
Parallel.each(User.all, in_processes: 8) do |user|
  user.update_attribute(:some_attribute, some_value)
end
User.connection.reconnect!

# maybe helps: explicitly use the connection pool
Parallel.each(User.all, in_threads: 8) do |user|
  ActiveRecord::Base.connection_pool.with_connection do
    user.update_attribute(:some_attribute, some_value)
  end
end

# maybe helps: reconnect once inside every fork
Parallel.each(User.all, in_processes: 8) do |user|
  @reconnected ||= User.connection.reconnect! || true
  user.update_attribute(:some_attribute, some_value)
end
```
Stop processing early by raising `Parallel::Break`:

```ruby
Parallel.map(User.all) do |user|
  raise Parallel::Break # -> stops after all current items are finished
end
```
Only use `Parallel::Kill` if whatever is executing in the sub-command is safe to kill at any point:

```ruby
Parallel.map([1, 2, 3]) do |x|
  raise Parallel::Kill if x == 1 # -> stop all sub-processes, killing them instantly
  sleep 100 # Do stuff
end
```
Show a progress bar / ETA (requires the `ruby-progressbar` gem):

```ruby
# gem install ruby-progressbar
Parallel.map(1..50, progress: "Doing stuff") { sleep 1 }
# Doing stuff | ETA: 00:00:02 | ====================               | Time: 00:00:10
```
Use the `:finish` or `:start` hook to get progress information:

- `:start` is called with the item and its index
- `:finish` is called with the item, its index, and the result

Both are called on the main process and protected with a mutex.

```ruby
Parallel.map(1..100, finish: -> (item, i, result) { ... do something ... }) { sleep 1 }
```
NOTE: If all you are trying to do is get the index, it is much more performant to use `each_with_index` instead.
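Why the hooks are mutex-protected can be sketched with a hand-rolled finish hook: many workers may finish at the same moment, so any shared progress state must be synchronized. This is a plain-Ruby stand-in, not the gem's internals:

```ruby
done_count = 0
lock = Mutex.new

# Same shape as a :finish hook: receives the item, its index, and the result
finish = ->(_item, _index, _result) { lock.synchronize { done_count += 1 } }

# Simulate 20 workers finishing concurrently
(1..20).map { |i| Thread.new { finish.call(i, i - 1, i * i) } }.each(&:join)

done_count # => 20
```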
Use `Parallel.worker_number` to determine the worker slot in which your task is running:

```ruby
Parallel.each(1..5, in_processes: 2) { |i| puts "Item: #{i}, Worker: #{Parallel.worker_number}" }
# Item: 1, Worker: 1
# Item: 2, Worker: 0
# Item: 3, Worker: 1
# Item: 4, Worker: 0
# Item: 5, Worker: 1
```
Here are a few notable options:

- [Benchmark/Test] Disable threading/forking with `in_threads: 0` or `in_processes: 0`, great for testing performance or debugging parallel issues
- [Isolation] Do not reuse previous worker processes: `isolation: true`
- [Stop all processes with an alternate interrupt signal] `'INT'` (from Ctrl+C) is caught by default. Catch `'TERM'` (from `kill`) with `interrupt_signal: 'TERM'`
- Replace signal trapping with a simple `rescue Interrupt` handler
- Przemyslaw Wroblewski
- TJ Holowaychuk
- Masatomo Nakano
- Fred Wu
- mikezter
- Jeremy Durham
- Nick Gauthier
- Andrew Bowerman
- Byron Bowerman
- Mikko Kokkonen
- brian p o'rourke
- Norio Sato
- Neal Stewart
- Jurriaan Pruis
- Rob Worley
- Tasveer Singh
- Joachim
- yaoguai
- Bartosz Dziewoński
- Guillaume Hain
- Adam Wróbel
- Matthew Brennan
- Brendan Dougherty
- Daniel Finnie
- Philip M. White
- Arlan Jaska
- Sean Walbran
- Nathan Broadbent
Michael Grosser
michael@grosser.it
License: MIT