supabase / stripe-sync-engine

Sync your Stripe account to you Postgres database.

Home Page:https://supabase.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

RFC: loading existing data

kiwicopple opened this issue · comments

It would be nice to be able to load existing Stripe data into a database. From a high level:

  1. Bulk fetch all Stripe data for a single entity (e.g. customer)
  2. Bulk insert/upsert data into the postgres table (e.g. stripe.customers)

Possible options

1. Don't do it

Just let the user download a CSV then COPY to their database

2. Upload CSV

Create an endpoint to POST a CSV, then the server can upload the data. eg:

// POST multipart /customers/load
// => { csv: file }
fastify.post('/', async function (req, reply) {

  const data = await req.csv()
  
  // upload data to Postgres database
  
  reply.send()
})

3. Fetch data from URL

Create an endpoint to fetch a CSV, then the server can upload the data. eg:

// PATCH /customers/load
// => { csv_url: file }
fastify.path('/', async function (req, reply) {

  // fetch file from URL
  // upload data to Postgres database
  
  reply.send()
})

4. Loop through Stripe API

Loop through all the stripe data via the Stripe API (but this could have rate-limiting limitations)

Regarding #4 and rate limits... the best way to design this is to pull data in batches sorted on an entry timestamp (if that's available in the Stripe API -- I'd have to look it up.). But basically just pull all the data in batches and set a high watermark, then use the watermark to keep things in sync.

This could all be done in a Postgres function using the http extension, or using copy from program with any command-line tool that can talk to the Stripe API.

EDIT: Stripe has pagination available for doing large bulk loading: https://stripe.com/docs/api/pagination

Could use singer but it looks like it has a few limitations:

(from https://github.com/datamill-co/target-postgres)

  • de-nests everything into separate tables (rather than using using a jsonb datatype)
  • the JSON Schema seems to map the timestamps to strings (I've been using numbers). Ideally I'd like to use numbers (to match stripe "exactly") and then generated columns to create timestamps

I've thought about the best way to do this and I think the best "v1" would be:

  • Add a sync task to each object
  • this will fetch all the elements on stripe and loop through them -> calling the upsert function for each individually

This will be a bit slower than doing a full insert into the database, but it ensures data integrity by backfilling. We might improve this later by bulk uploading

commented

Option 4 is implemented here #45