dathere / datapusher-plus

A standalone web service that pushes data into the CKAN Datastore fast & reliably. It pushes real good!

Roadmap Tracking Issue - EPIC

jqnatividad opened this issue

OVERALL VISION: To increase the utility and performance of the CKAN Datastore:

  • by enriching resources: right after DP+ pushes a file, it performs many data-wrangling tasks that are typically done manually:
    • much of the metadata is inferred, so the Data Publisher does not have to enter it by hand
    • descriptive statistics are computed, allowing the Data Publisher and the end-user to better understand the resource
    • location information is automatically normalized and geocoded
    • related datasets/resources are automatically inferred
    • auto-tagging
  • by taking advantage of PostgreSQL native features
    • possibly also using it as a document database by leveraging JSONB?
    • partitioning/sharding?
  • by tapping into the rich PostgreSQL extensions ecosystem (in particular PostGIS, Timescale, Citus, CartoDB, Apache AGE and ZomboDB)
  • by giving it "Data Lake"-like capabilities
  • by enabling Datastore API users to issue performant, reliable SQL queries
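
To make the "metadata is inferred" and "descriptive statistics are computed" goals concrete, here is a minimal sketch in plain Python. It is not DP+'s implementation; the function names `infer_type` and `summarize` and the three type labels are assumptions made for illustration only.

```python
import csv
import io
import statistics


def infer_type(values):
    """Guess 'integer', 'numeric', or 'text' for a list of strings (blanks ignored)."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    # Try the narrowest type first; fall back to the next one on failure.
    for name, cast in (("integer", int), ("numeric", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def summarize(csv_text):
    """Return {column: {type, nulls, [min, max, mean]}} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    summary = {}
    for col in rows[0].keys():
        # DictReader yields None for missing trailing fields; treat those as blank.
        values = [(r[col] or "") for r in rows]
        col_type = infer_type(values)
        stats = {"type": col_type, "nulls": sum(1 for v in values if not v.strip())}
        if col_type != "text":
            nums = [float(v) for v in values if v.strip()]
            stats.update(min=min(nums), max=max(nums), mean=statistics.mean(nums))
        summary[col] = stats
    return summary


sample = "id,price,city\n1,2.5,Austin\n2,,Boston\n3,4.5,\n"
result = summarize(sample)
```

A summary like this could be stored alongside the resource so that publishers and end-users see column types, null counts, and value ranges without opening the file.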

  • #98
  • #18
  • #11
  • Auto-tagging
  • Automatic spatial extent calculation
  • Automatic processing/recognition of whitelisted common column names (e.g. latitude, longitude, status, open date, closed date, etc.)
  • #53
  • #47
  • #27
  • #9
  • Auto partitioning
  • #60
  • Deferred datapush on initial package creation to allow per-package Datapusher+ configuration
  • #87
  • #17
  • Enabling record-level search
  • #8
  • #13
  • #54
  • #10
  • #19
  • #30
  • Native PostGIS support
  • Native time-series support with Timescale
  • #34
  • #35
  • #46
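
Two of the roadmap items above, recognizing whitelisted common column names and automatic spatial extent calculation, fit together naturally. The following is a rough sketch under stated assumptions: the name sets `LAT_NAMES` and `LON_NAMES` and the helpers `find_column` and `spatial_extent` are hypothetical, and the issue does not specify which names would be whitelisted beyond examples such as latitude and longitude.

```python
# Hypothetical whitelists; the real list of recognized names is not given in the issue.
LAT_NAMES = {"latitude", "lat"}
LON_NAMES = {"longitude", "lon", "lng", "long"}


def find_column(headers, candidates):
    """Return the first header whose lowercased, trimmed name is in candidates."""
    for header in headers:
        if header.strip().lower() in candidates:
            return header
    return None


def spatial_extent(rows):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of dict rows, or None."""
    if not rows:
        return None
    headers = list(rows[0].keys())
    lat_col = find_column(headers, LAT_NAMES)
    lon_col = find_column(headers, LON_NAMES)
    if lat_col is None or lon_col is None:
        return None
    points = []
    for row in rows:
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (TypeError, ValueError):
            continue  # skip blank or non-numeric coordinates
        # Keep only coordinates inside the valid WGS 84 ranges.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lon, lat))
    if not points:
        return None
    lons, lats = zip(*points)
    return (min(lons), min(lats), max(lons), max(lats))


rows = [
    {"Name": "a", "Latitude": "30.27", "Longitude": "-97.74"},
    {"Name": "b", "Latitude": "42.36", "Longitude": "-71.06"},
    {"Name": "c", "Latitude": "", "Longitude": ""},
]
extent = spatial_extent(rows)
```

The resulting bounding box could then be saved as dataset-level spatial metadata; with native PostGIS support, the same extent could presumably be computed inside the database instead.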