zachgoldstein / datapi

A service to quickly provide an API over flat files in cloud storage

      ██████╗  █████╗ ████████╗ █████╗ ██████╗ ██╗
      ██╔══██╗██╔══██╗╚══██╔══╝██╔══██╗██╔══██╗██║
      ██║  ██║███████║   ██║   ███████║██████╔╝██║
      ██║  ██║██╔══██║   ██║   ██╔══██║██╔═══╝ ██║
      ██████╔╝██║  ██║   ██║   ██║  ██║██║     ██║
      ╚═════╝ ╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝     ╚═╝

Datatoapi

When would you use this?

  • Your data is sitting in simple files (jsonfiles, csv, etc) on cloud storage
  • You want to access a single data point or small subsets of your data
  • There is too much data to read it all for access to a single object
  • Access latency isn't critically important but must be reasonable
  • You don't want to spend time ingesting the data into a traditional database
  • Your use case has you reading a lot more than writing
  • You don't want to spend time writing routing or serialisation code

An example of how this would be done previously:

If you have flat files sitting in S3, you can retrieve a whole file and search it without a lot of effort, but it takes quite a bit of time.

aws s3 cp s3://datatoapi/data.jsonfiles - | jq 'select(.username == "wyman.maye")' -c  

Took about 3.53 secs (requires downloading the entire file).

How this would work if you're running datapi:

curl "http://127.0.0.1:8123/username/wyman.maye"

Took 0.757 secs.

Why is this faster? Datapi has built an index to find the record more quickly, and only downloads a small chunk of the file.

In this example, the file we're interested in is very small (890K), but when you're looking at larger files, the difference in performance will be much more significant.

Installation

Clone this repo and run it with go run main.go. A binary isn't available quite yet.

Usage

Running the service:

go run main.go -storage "https://s3.amazonaws.com/datatoapi"

Retrieving a specific result:

curl "http://127.0.0.1:8123/id/1000001"

Retrieving all results that match a field value:

curl "http://127.0.0.1:8123/all/has_existential_identity_crisis/true"

Searching for results:

curl "http://127.0.0.1:8123/search/Brakus"

If you want pretty, formatted results, pipe this data through jq!

curl "http://127.0.0.1:8123/id/1000001" | jq '.'
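If you would rather call the service from Go than from curl, the requests above can be sketched as follows. The `endpoint` and `get` helpers are hypothetical names introduced for this example. The URL shapes are copied from the curl commands, and treating any non-200 status as an error is an assumption about how the service responds.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// endpoint joins a base URL with escaped path segments, e.g.
// endpoint("http://127.0.0.1:8123", "id", "1000001").
func endpoint(base string, segments ...string) string {
	u := base
	for _, s := range segments {
		u += "/" + url.PathEscape(s)
	}
	return u
}

// get fetches a URL and returns the raw response body.
// Assumption: the service answers successful lookups with 200 OK.
func get(client *http.Client, rawURL string) ([]byte, error) {
	resp, err := client.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	base := "http://127.0.0.1:8123" // address used in the curl examples

	for _, u := range []string{
		endpoint(base, "id", "1000001"),
		endpoint(base, "all", "has_existential_identity_crisis", "true"),
		endpoint(base, "search", "Brakus"),
	} {
		body, err := get(client, u)
		if err != nil {
			fmt.Println(u, "failed:", err)
			continue
		}
		fmt.Printf("%s\n%s\n", u, body)
	}
}
```

Escaping each path segment keeps values that contain spaces or slashes from being misread as extra path components.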

Supported data formats

  • jsonfiles
  • csv (TODO)
  • tsv (TODO)
  • json (TODO)
  • xml (TODO)

Supported storage backends

  • Amazon S3
  • Azure (TODO)
  • Digital Ocean Spaces (TODO)
  • Openstack Swift storage (TODO)
  • Ceph (TODO)

What datatoapi is not good for

  • Write-heavy situations where your data set is rapidly changing
  • Situations where you need some sort of authorisation scheme associated with the data. You'll have to build this functionality separately.

:shipit:

Languages

Go 97.1%, Python 2.9%