planetlabs / gpq

Utility for working with GeoParquet

Home Page: https://planetlabs.github.io/gpq/

Support for convert to stdout

bdon opened this issue

I'd like to do something like this:

gpq convert Cairo_Governorate.parquet --stdout --to=geojson | tippecanoe -o Cairo_Governorate.pmtiles --drop-densest-as-needed

Would this functionality be useful? It would require some changes in convert.go to allow for a blank positional output argument.

Hey @bdon - nice idea. I put together #79 to make all the commands optionally work with stdin/stdout.

If you omit the output argument in the convert command, it writes to stdout. That's not as explicit as a --stdout flag, but hopefully it isn't too tricky.
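For anyone curious about the shape of the change, the core of it is just treating the positional output as optional and falling back to stdout. A minimal sketch using kong-style optional positional arguments; the field and method names here are illustrative, not necessarily what convert.go actually does:

```go
package main

import (
	"io"
	"os"
)

// ConvertCmd sketches an optional positional output argument.
// Field names and tags are illustrative, not gpq's actual convert.go.
type ConvertCmd struct {
	Input  string `arg:"" optional:"" help:"Input file (stdin if omitted)."`
	Output string `arg:"" optional:"" help:"Output file (stdout if omitted)."`
}

// output falls back to stdout when the positional output arg is omitted.
func (c *ConvertCmd) output() (io.WriteCloser, error) {
	if c.Output == "" {
		return os.Stdout, nil // caller should avoid closing stdout
	}
	return os.Create(c.Output)
}
```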

All together!

curl https://data.source.coop/cholmes/google-open-buildings/geoparquet-admin1/country=EGY/Cairo_Governorate.parquet | ./gpq convert --from=geoparquet --to=geojson | tippecanoe -o buildings.pmtiles --force --drop-densest-as-needed

Included in the v0.15.0 release (brew update && brew install planetlabs/tap/gpq or download from the release page).

@bdon - you'll probably notice that this needs to buffer the whole file since the Parquet metadata is in the footer. But that suggests another enhancement - to accept a URL for the input. Then if ranged reads are supported, the metadata could be read first (and then maybe only buffer one data page at a time).
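For context on why ranged reads would work well here: a Parquet file ends with the file metadata, then a 4-byte little-endian metadata length, then the 4-byte magic "PAR1", so two small range requests are enough to get the metadata without touching the body. A hedged sketch over plain HTTP (the file size would come from a HEAD request's Content-Length; error handling trimmed, and the server must honor Range headers):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net/http"
)

// fetchRange issues an HTTP Range request for bytes [start, end] inclusive.
func fetchRange(url string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("ranged reads unsupported: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// footerMetadata reads a Parquet file's metadata without downloading the
// body. The file layout ends: <metadata> <4-byte LE length> "PAR1".
func footerMetadata(url string, size int64) ([]byte, error) {
	tail, err := fetchRange(url, size-8, size-1)
	if err != nil {
		return nil, err
	}
	if string(tail[4:]) != "PAR1" {
		return nil, fmt.Errorf("not a parquet file")
	}
	metaLen := int64(binary.LittleEndian.Uint32(tail[:4]))
	return fetchRange(url, size-8-metaLen, size-9)
}
```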

@tschaub have you looked into using https://gocloud.dev for reading Parquet?

For https://github.com/protomaps/go-pmtiles/blob/main/pmtiles/extract.go#L276 I use only the blob functionality, which means it supports GCP, Azure, and S3-compatible blob storage with credentials out of the box. I had to add a layer of abstraction to handle public unauthenticated HTTP URLs, but it was otherwise simple.
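For reference, the blob package boils down to very little code. A minimal sketch; the s3:// bucket URL and object key are placeholders, and credentials are picked up from the environment by the registered driver:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/gcsblob" // register gs:// URLs
	_ "gocloud.dev/blob/s3blob"  // register s3:// URLs
)

func main() {
	ctx := context.Background()
	// The driver is chosen by URL scheme at runtime.
	bucket, err := blob.OpenBucket(ctx, "s3://my-bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// Read an arbitrary byte range of an object (offset 0, length 1024 here).
	r, err := bucket.NewRangeReader(ctx, "Cairo_Governorate.parquet", 0, 1024, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()
	if _, err := io.Copy(os.Stdout, r); err != nil {
		log.Fatal(err)
	}
}
```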

I've used similar libraries, but not gocloud.dev yet; I'll check it out.

My ideal would be a multi-cloud blob reader that implements io.ReadSeeker and io.ReaderAt (I know this isn't efficient for all providers, but it is possible, with lots of guessing about how much to buffer for the seek-driven reads).
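The io.ReaderAt half is the mechanical part, since each ReadAt maps cleanly onto one ranged read. A sketch of that adapter over a gocloud.dev bucket (my own illustration, not tested against every provider; io.ReadSeeker is the harder half, because a bare Seek gives no hint of how much to buffer for the next Read):

```go
package main

import (
	"context"
	"io"

	"gocloud.dev/blob"
)

// blobReaderAt adapts a blob object to io.ReaderAt: every ReadAt call
// becomes one range request against the bucket.
type blobReaderAt struct {
	ctx    context.Context
	bucket *blob.Bucket
	key    string
}

func (b *blobReaderAt) ReadAt(p []byte, off int64) (int, error) {
	r, err := b.bucket.NewRangeReader(b.ctx, b.key, off, int64(len(p)), nil)
	if err != nil {
		return 0, err
	}
	defer r.Close()
	// ReadFull returns a non-nil error on short reads, which is what
	// the io.ReaderAt contract requires when n < len(p).
	return io.ReadFull(r, p)
}
```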

For PMTiles, it uses bucket.NewRangeReader without any guessing: it downloads the relevant (compressed) part of the index up front, then pre-merges request ranges to avoid thousands of small requests before fetching any actual "features" (tiles).
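A sketch of that pre-merging, for reference (my own paraphrase of the idea, not the actual go-pmtiles code): sort the wanted ranges, then coalesce neighbors whose gap is under a threshold, trading a little over-fetching for far fewer requests:

```go
package main

import "sort"

// byteRange is a half-open range [Start, Start+Length).
type byteRange struct {
	Start, Length int64
}

// coalesce merges ranges whose gap is at most maxGap bytes, so thousands
// of small reads collapse into a handful of larger ones.
func coalesce(ranges []byteRange, maxGap int64) []byteRange {
	if len(ranges) == 0 {
		return nil
	}
	sort.Slice(ranges, func(i, j int) bool { return ranges[i].Start < ranges[j].Start })
	merged := []byteRange{ranges[0]}
	for _, r := range ranges[1:] {
		last := &merged[len(merged)-1]
		if r.Start <= last.Start+last.Length+maxGap {
			// Extend the previous range to cover this one.
			if end := r.Start + r.Length; end > last.Start+last.Length {
				last.Length = end - last.Start
			}
		} else {
			merged = append(merged, r)
		}
	}
	return merged
}
```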

Is similar batching behavior needed to be effective for GeoParquet? I haven't delved deeply into actual reader implementations yet.