anpandu / i1m

Exploring Go language and its features to write high-performance program.

Home Page: https://anpandu.github.io/blog/inserting-1-million-data-into-bigquery-table/

Inserting 1 Million data into BigQuery Table

Introduction

This is the source code for my blog post "Inserting 1 Million data into BigQuery Table". In the post, we demonstrate how to insert a JSON text file into a Google BigQuery table using the Go language. Go is known as one of the best languages for writing high-performance programs, thanks to built-in features that make concurrent and parallel programming easier. We will also demonstrate Go's native concurrency primitives (channels, goroutines, and WaitGroups).
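As a minimal preview of those three primitives (the function and names below are illustrative only, not taken from this repository), here is a sketch that fans work out to a pool of goroutines over a channel and joins them with a sync.WaitGroup:

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares distributes numbers to nWorkers goroutines over a jobs
// channel, collects the squared results over a results channel, and
// uses a sync.WaitGroup to know when every worker has finished.
func sumSquares(nums []int, nWorkers int) int {
	jobs := make(chan int)
	results := make(chan int)
	var wg sync.WaitGroup

	// Worker pool: each goroutine squares numbers until jobs is closed.
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Close results once all workers are done, so the range below ends.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Feed the jobs channel, then close it to signal the workers.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2)) // 1+4+9+16 = 30
}
```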

Usage

Installation

# Install Go (1.12.1)
go version
# go version go1.12.1 linux/amd64

# Clone this project
go get github.com/anpandu/go-json-to-bq

# Install dependencies
cd $GOPATH/src/github.com/anpandu/go-json-to-bq
go get

# Generate dataset
python3 gen-txt.py
# you will see files like this
# -rw-rw-r-- 1 pandu pandu  220 Mar  5 18:31 students-10.json.txt
# -rw-rw-r-- 1 pandu pandu 2.3K Mar  5 18:31 students-100.json.txt
# -rw-rw-r-- 1 pandu pandu  24K Mar  5 18:31 students-1000.json.txt
# -rw-rw-r-- 1 pandu pandu 244K Mar  5 18:31 students-10000.json.txt
# -rw-rw-r-- 1 pandu pandu 2.5M Mar  5 18:31 students-100000.json.txt
# -rw-rw-r-- 1 pandu pandu  26M Mar  5 18:31 students-1000000.json.txt

# Enable credential
export GOOGLE_APPLICATION_CREDENTIALS=~/mykey.json

How to use

# Run Part One: Simple Approach
go run cmd/main1/main1.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --filepath=./students-10.json.txt
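
Conceptually, the simple approach reads the file line by line and issues one streaming-insert call per row. The sketch below is an assumed structure (the actual main1.go may differ); the client calls are from the official cloud.google.com/go/bigquery package, and the project, dataset, table, and file names are placeholders:

```go
package main

import (
	"bufio"
	"context"
	"encoding/json"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
)

// jsonRow lets a plain JSON object be streamed to BigQuery via the
// package's ValueSaver interface.
type jsonRow map[string]bigquery.Value

func (r jsonRow) Save() (map[string]bigquery.Value, string, error) {
	return r, "", nil // empty insert ID: no client-side deduplication key
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "myproject")
	if err != nil {
		log.Fatal(err)
	}
	inserter := client.Dataset("mydataset").Table("mytable").Inserter()

	f, err := os.Open("students-10.json.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// One API round-trip per line: simple, but slow for large files.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var row jsonRow
		if err := json.Unmarshal(scanner.Bytes(), &row); err != nil {
			log.Fatal(err)
		}
		if err := inserter.Put(ctx, row); err != nil {
			log.Fatal(err)
		}
	}
}
```

The per-row round-trip is exactly the cost the later parts amortize away.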

# Run Part Two: Multiple Rows Insertion
go run cmd/main2/main2.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --buffer-length=100 \
  --filepath=./students-10.json.txt
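
The multiple-rows idea is to buffer parsed lines and flush each full batch in a single insert call. This pure-Go sketch isolates just the batching logic; batchLines and its flush callback are illustrative names, and flush stands in for the BigQuery call:

```go
package main

import "fmt"

// batchLines groups lines into slices of at most bufferLength and
// flushes each full batch (plus the final partial one). It returns
// the number of flushes — i.e., how many insert calls would be made.
func batchLines(lines []string, bufferLength int, flush func(batch []string)) int {
	calls := 0
	buf := make([]string, 0, bufferLength)
	for _, line := range lines {
		buf = append(buf, line)
		if len(buf) == bufferLength {
			flush(buf)
			calls++
			buf = buf[:0] // reuse the backing array for the next batch
		}
	}
	if len(buf) > 0 {
		flush(buf) // don't lose the trailing partial batch
		calls++
	}
	return calls
}

func main() {
	lines := []string{"a", "b", "c", "d", "e"}
	n := batchLines(lines, 2, func(batch []string) {
		fmt.Println(batch)
	})
	fmt.Println("flushes:", n) // [a b], [c d], [e] -> 3 flushes
}
```

With --buffer-length=100, one million rows need only 10,000 insert calls instead of one million.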

# Part Three: Multiple Workers and Multiple Rows Insertion
go run cmd/main3/main3.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --buffer-length=100 \
  --worker=4 \
  --filepath=./students-10.json.txt
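
Part Three combines both ideas: batches flow through a channel to several workers that insert in parallel. This sketch (illustrative names; the atomic counter stands in for the BigQuery call each worker would make) shows the fan-out pattern:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// insertParallel fans batches out to nWorkers goroutines. Each worker
// would call the BigQuery inserter with its batch; here we just count
// the rows "inserted" to keep the sketch self-contained.
func insertParallel(batches [][]string, nWorkers int) int64 {
	var inserted int64
	work := make(chan []string)
	var wg sync.WaitGroup

	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range work {
				// Real code: inserter.Put(ctx, batch)
				atomic.AddInt64(&inserted, int64(len(batch)))
			}
		}()
	}

	for _, b := range batches {
		work <- b
	}
	close(work) // no more batches; workers drain and exit
	wg.Wait()
	return inserted
}

func main() {
	batches := [][]string{{"a", "b"}, {"c", "d"}, {"e"}}
	fmt.Println(insertParallel(batches, 4)) // 5 rows across 3 batches
}
```

Because each insert call spends most of its time waiting on the network, many workers can overlap their round-trips, which is why the benchmark below keeps improving as w grows.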

Benchmark

The benchmark was run on the same machine type throughout: n1-standard-1 (1 vCPU, 3.75 GB memory). Using JSON text files generated at different sizes, we benchmark the three approaches and measure the time taken (w = number of workers, n = buffer length).

File      Parameters   1,000 rows  10,000 rows  100,000 rows  1,000,000 rows
main1.go  w1 - n1      312.164s    3242.766s    n/a           n/a
main2.go  w1 - n100    4.599s      35.735s      381.251s      3738.669s
main3.go  w4 - n100    1.453s      9.137s       95.666s       939.175s
main3.go  w16 - n100   0.808s      2.938s       24.754s       224.630s
main3.go  w64 - n100   0.848s      1.643s       11.667s       62.624s
main3.go  w300 - n500  0.934s      1.296s       4.081s        14.787s

As the table shows, higher values of w (workers) and n (buffer length) make insertion faster. Our highest configuration (w300 - n500) inserts one million rows in under 15 seconds!

License

MIT © Ananta Pandu
