isamplesorg / export_client

Client for the iSamples export service

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add examples for how to use DuckDB to query geoparquet files

rdhyee opened this issue · comments

I managed to export the Smithonsian records with

isample export -j $ISAMPLES_JWT -f geoparquet -d /tmp -q 'source:SMITHSONIAN'

Next challenge is how to do simple spatial queries. I installed duckdb on my Mac:

brew install duckdb

I installed duckdb extensions as suggested by cholmes/duckdb-geoparquet-tutorials:

INSTALL spatial;
LOAD spatial;
INSTALL httpfs;
LOAD httpfs;

I then

CREATE TABLE geoparquet_data AS SELECT * FROM read_parquet('isamples_export_2024_06_07_07_40_00_geo.parquet');
DESCRIBE geoparquet_data
  ;

to get

┌──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ sample_identifier │ VARCHAR │ YES │ │ │ │
│ label │ VARCHAR │ YES │ │ │ │
│ description │ VARCHAR │ YES │ │ │ │
│ source_collection │ VARCHAR │ YES │ │ │ │
│ has_specimen_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ has_material_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ has_context_category │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ keywords │ STRUCT(keyword VARCHAR)[] │ YES │ │ │ │
│ produced_by │ STRUCT(description VARCHAR, has_feature_of_interest VARCHAR, identifier VARCHAR, "label" VARCHAR, responsibility STRUCT("name" VARCHAR, "role" VARCHAR)[], result_time VARCHAR, sampling_site STRUCT(desc… │ YES │ │ │ │
│ registrant │ STRUCT("name" VARCHAR) │ YES │ │ │ │
│ sample_location_lo… │ DOUBLE │ YES │ │ │ │
│ sample_location_la… │ DOUBLE │ YES │ │ │ │
│ geometry │ BLOB │ YES │ │ │ │
├──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤
│ 13 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

but I'm trying to a spatial extent query:

D SELECT ST_Extent(geometry) AS extent FROM geoparquet_data;
Binder Error: No function matches the given name and argument types 'ST_Extent(BLOB)'. You might need to add explicit type casts.
	Candidate functions:
	ST_Extent(GEOMETRY) -> BOX_2D

LINE 1: SELECT ST_Extent(geometry) AS extent FROM geop...
               ^
D SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> BOX_2D)
LINE 1: SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FRO...
                         ^
D SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> GEOMETRY)
LINE 1: SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent F...

How to fix?