markdrago / pgsanity

Check syntax of PostgreSQL SQL files

Add progress bar for long-running syntax checks

I'm currently syntax-checking a 600 MB SQL file, and it'd be nice to know how far through it pgsanity is.

Bonus: it's currently maxing out a single core of my VM and using 1.3 GB of RAM :D

Interesting. A few things to note:

  • A bunch of the runtime is probably going to ecpg, which is the thing that actually checks for syntax problems.
  • Producing a progress bar is a neat idea, though it may end up being a bit complicated. ecpg has no way of taking input via stdin, so pgsanity writes all of its input to a temporary file and only then starts ecpg. There may be a neat approach involving batching calls to ecpg, feeding it data over a named pipe, or something similar that would give insight into how much of the syntax has been checked (see the sketch after this list).
  • Additionally, pgsanity isn't very smart about how it reads files. It reads the entire input file into memory and then performs a series of operations on the contents to produce its output. Since these are Python strings, there's a lot of object instantiation, copying of data, etc. That's not a problem for SQL files that are a few KB in size, but it's probably noticeable on 600 MB files.
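
To make the batching idea concrete, here's a rough, hypothetical sketch (not how pgsanity works today): read the input incrementally, hand chunks of statements to ecpg via small temporary files, and report progress by bytes consumed. The EXEC SQL wrapping mirrors pgsanity's approach, but the chunk size, the one-statement-per-line assumption, and the progress output are all made up for illustration.

```python
# Hypothetical sketch: batch input to ecpg and print a progress readout.
# Assumes one SQL statement per line (real input would need proper
# statement splitting) and approximates bytes by character count.
import os
import subprocess
import tempfile

def check_in_batches(path, statements_per_batch=500):
    total = os.path.getsize(path)
    seen = 0

    def run_batch(stmts):
        # ecpg can't read from stdin, so each batch still goes through a
        # temporary file, just a much smaller one than the whole input.
        with tempfile.NamedTemporaryFile(mode="w", suffix=".pgc",
                                         delete=False) as f:
            f.writelines("EXEC SQL " + s + "\n" for s in stmts)
            name = f.name
        try:
            result = subprocess.run(["ecpg", "-o", "-", name],
                                    capture_output=True, text=True)
            if result.returncode != 0:
                print(result.stderr)
        finally:
            os.unlink(name)

    batch = []
    with open(path) as sql:
        for line in sql:
            seen += len(line)
            batch.append(line.rstrip("\n"))
            if len(batch) >= statements_per_batch:
                run_batch(batch)
                batch = []
                print("checked roughly {:.0%}".format(seen / total))
    if batch:
        run_batch(batch)
    print("done")
```

A nice side effect is that reading the file a line at a time also sidesteps the "load the whole thing into memory" problem from the last bullet. The obvious caveat is that a statement spanning a batch boundary would need to be handled when the input is split properly.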

Adam, are you interested in working on any of this? Let me know and I can help out. If not, I'll probably take a crack at some of it at some point.

I was incorrect in blaming ecpg for the slowness. It was in pgsanity itself, and a number of performance improvements were made in 0.2.5. That said, a progress bar is a neat idea and may still come in handy for very large files, so I'll leave this issue open to document it.

Midnight inspiration: you may be able to do this with a program called pv, or something similar.
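
For example, something like `pv big.sql | pgsanity` would give a running readout, assuming pgsanity reads SQL from stdin when no filename is given; note that this only shows how fast the input is being consumed, not how far ecpg itself has gotten.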