vinodborole / batch

Split an array/slice into n evenly sized chunks. Spread load evenly across workers.

Batch

Split an array/slice into n evenly sized chunks.

Inspired by the blog post by Paul Di Gian: "Split a slice or array in a defined number of chunks in golang".

Installation

Requires Go 1.18 or later.

Add github.com/veggiemonk/batch to your go.mod file, then run the following command:

go mod tidy

Usage

Note: you might be better off just copying the function into your codebase. It is less than 10 lines of code.

See Go Proverbs for more details.

A little copying is better than a little dependency.

package main

import (
	"fmt"

	"github.com/veggiemonk/batch"
)

func main() {
	s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

	// Split the slice into 3 even parts
	chunks := batch.BatchSlice(s, 3)

	// Print the chunks
	fmt.Println(chunks)
	// length:     3       3        4
	// output: [[1 2 3] [4 5 6] [7 8 9 10]]
	// The batch sizes differ by at most 1 item,
	// which spreads the load evenly among workers.
}
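
Since the note above suggests copying the function rather than importing it, here is a minimal sketch of what such a function could look like. It follows the integer-division approach described in Paul Di Gian's post; the name splitInto is hypothetical, and the sketch is not necessarily the library's exact implementation (in particular, it assumes n > 0, and the library may treat edge cases differently).

```go
package main

import "fmt"

// splitInto is a hypothetical stand-in for batch.BatchSlice.
// It splits s into n contiguous chunks whose lengths differ by at most one.
// Assumption: n > 0. If n > len(s), some chunks are empty.
func splitInto[T any](s []T, n int) [][]T {
	chunks := make([][]T, 0, n)
	for i := 0; i < n; i++ {
		// Integer division distributes the remainder across the chunks.
		start := i * len(s) / n
		end := (i + 1) * len(s) / n
		chunks = append(chunks, s[start:end])
	}
	return chunks
}

func main() {
	s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(splitInto(s, 3))
	// output: [[1 2 3] [4 5 6] [7 8 9 10]]
}
```

The chunks share the backing array of s, so no elements are copied; modifying a chunk modifies the original slice.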

Usage with Cloud Run Jobs

batchID := uuid.New().String()

// Cloud Run Jobs sets these environment variables for every task.
// Conversion errors are ignored here; an unset variable yields 0.
taskCount, _ := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_COUNT"))
taskIndex, _ := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_INDEX"))

tt, err := requestToTasks(request)
if err != nil {
	return fmt.Errorf("failed to get list of tasks (id:%s): %w", batchID, err)
}

if len(tt) == 0 {
	return fmt.Errorf("no tasks found (id:%s): %w", batchID, ErrNoTaskFound)
}

// One batch per Cloud Run task; each task only processes its own batch.
batches := batch.BatchSlice(tt, taskCount)
if taskIndex >= len(batches) || taskIndex < 0 {
	return fmt.Errorf("index (%d) out of bounds (max: %d), (id:%s): %w", taskIndex, len(batches), batchID, ErrTaskIndexOutOfBounds)
}

b := batches[taskIndex]

err = process(b)
if err != nil {
	return fmt.Errorf("failed to process batch (id:%s): %w", batchID, err)
}

Rationale

Having evenly sized batches is useful when you want to distribute the workload evenly across multiple workers.

As opposed to defining the size of each batch, we define the number of batches we want to have.

Here is a counterexample:

package main

import "fmt"

func main() {
	array := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	// ceil(10 / 3) = 4, so that 3 workers cover all 10 items
	chunkSize := 4
	var result [][]int

	for i := 0; i < len(array); i += chunkSize {
		end := i + chunkSize

		// The last chunk may be shorter than chunkSize.
		if end > len(array) {
			end = len(array)
		}

		result = append(result, array[i:end])
	}

	fmt.Println(result)
	// length:      4    |    4    |  2
	// output: [[1 2 3 4] [5 6 7 8] [9 10]]
	// 2 workers will do double the work of the last worker.
}

This is not ideal when you want to distribute the workload evenly across multiple workers.
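
To make the comparison concrete, the sketch below computes only the chunk lengths for both strategies and the spread between the largest and smallest chunk. The function names are hypothetical, and the fixed-count variant assumes the integer-division scheme from the blog post rather than the library's confirmed internals. Both functions assume positive size and n.

```go
package main

import "fmt"

// fixedSizeLens returns the chunk lengths when total items are cut into
// chunks of at most size items (the counterexample's strategy).
func fixedSizeLens(total, size int) []int {
	var lens []int
	for i := 0; i < total; i += size {
		end := i + size
		if end > total {
			end = total
		}
		lens = append(lens, end-i)
	}
	return lens
}

// fixedCountLens returns the chunk lengths when total items are split
// into exactly n chunks using integer division (assumed scheme).
func fixedCountLens(total, n int) []int {
	lens := make([]int, 0, n)
	for i := 0; i < n; i++ {
		lens = append(lens, (i+1)*total/n-i*total/n)
	}
	return lens
}

// spread returns the difference between the largest and smallest length.
func spread(lens []int) int {
	if len(lens) == 0 {
		return 0
	}
	lo, hi := lens[0], lens[0]
	for _, l := range lens[1:] {
		if l < lo {
			lo = l
		}
		if l > hi {
			hi = l
		}
	}
	return hi - lo
}

func main() {
	bySize := fixedSizeLens(10, 4)
	byCount := fixedCountLens(10, 3)
	fmt.Println(bySize, spread(bySize))   // [4 4 2] 2
	fmt.Println(byCount, spread(byCount)) // [3 3 4] 1
}
```

With a fixed chunk size the last chunk absorbs the whole shortfall, whereas a fixed chunk count keeps the spread at 1 or less.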

License: Apache License 2.0