datopian / datahub-qa

:package: Bugs, issues and suggestions for datahub.io

Home Page: https://datahub.io/

[init] data init improvements April 2018

rufuspollock opened this issue

Work in progress!

  • make init non-interactive by default and add option --interactive or -i for interactive mode
    • guess name from directory name
    • use name to generate title
    • set license to ODC attribution or PDDL (?)
    • (?) Use readme to set description
  • decide which fields to auto create for the user e.g.
    • licenses
    • sources (?)
  • prompt for title first and then auto-suggest name from title (or use directory name for title?)
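
The name and title defaults in the list above could be derived roughly as follows. This is a minimal sketch, not the CLI's actual code: `nameFromDirectory`, `titleFromName`, and the `'dataset'` fallback are hypothetical, and it assumes names should be lowercase with dashes in place of unsupported characters.

```javascript
// Hypothetical helpers; names and behaviour are assumptions, not datahub-client code.

// Take the last path segment and normalize it to lowercase letters, digits, '.', '_' and '-'.
function nameFromDirectory(dirPath) {
  const segments = dirPath.split(/[\\/]+/).filter(Boolean);
  const base = segments.length ? segments[segments.length - 1] : '';
  const name = base
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, '-') // collapse runs of unsupported characters
    .replace(/^-+|-+$/g, '');       // trim leading and trailing dashes
  return name || 'dataset';         // assumed fallback when nothing usable is left
}

// Turn a name such as "my-finance_data" into a human-readable title.
function titleFromName(name) {
  return name
    .split(/[-_.]+/)
    .filter(Boolean)
    .map(word => word[0].toUpperCase() + word.slice(1))
    .join(' ');
}

const guessedName = nameFromDirectory('/home/user/My Finance_data');
console.log(guessedName);                // my-finance_data
console.log(titleFromName(guessedName)); // My Finance Data
```

Deriving the title from the name (rather than the other way round) keeps non-interactive mode free of prompts; interactive mode could still offer both values as editable defaults.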

Bug: when I run init and add README.md, it tries to add it again (in interactive mode).

  1. create a project with README.md
  2. run data init -i
  3. add README.md as resource
  4. see it asks to add README again

FIXED in v0.9.1

TESTED:
Everything looks good...

...except in the very rare case where an INVALID datapackage.json file already exists:

$ echo error > datapackage.json 
$ data init
> This process updates existing datapackage.json file.
> 
Press ^C at any time to quit.
? There is datapackage.json already. Do you want to update it - y/n? y
> Error! Unexpected token e in JSON at position

@anuveyatsu I would like to print an error: "existing descriptor file is invalid, please fix or delete it manually". Created a PR for it, please review: datopian/datahub-client#42
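
For illustration, the friendlier error could be produced along these lines. This is a sketch under assumptions, not the content of the linked PR: `parseDescriptor` is a hypothetical helper, and only the message wording comes from the comment above.

```javascript
// Hypothetical helper; the message text follows the wording proposed above.
function parseDescriptor(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    if (err instanceof SyntaxError) {
      // Replace the raw parser message with something the user can act on.
      throw new Error('Existing descriptor file is invalid, please fix or delete it manually.');
    }
    throw err; // anything else is unexpected, so pass it through unchanged
  }
}

try {
  parseDescriptor('error\n'); // same content as `echo error > datapackage.json`
} catch (err) {
  console.log(`> Error! ${err.message}`);
}
```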

FIXED:

  • data init now works in automatic mode by default
  • interactive mode is enabled with the -i or --interactive flag
  • a blog post has been published.

Oh, yes, I found another error, this time with unacceptable names:

$ ls
data.csv  README.md  Кровь эльфов.mobi.rar
$ data init
> This process initializes a new datapackage.json file.
> Once there is a datapackage.json file, you can still run `data init` to update/extend it.
> Press ^C at any time to quit.

> Detected special file: README.md
> data.csv is just added to resources
> Кровь эльфов.mobi.rar is just added to resources
> 
💾 Descriptor is saved in "datapackage.json"
$ data push
> Error! Descriptor validation error:
          String does not match pattern: ^([-a-z0-9._/])+$
          at "/name" in descriptor and
          at "/properties/name/pattern" in profile
> Error! Descriptor validation error:
          String does not match pattern: ^([-a-z0-9._/])+$
          at "/resources/1/name" in descriptor and
          at "/properties/resources/items/properties/name/pattern" in profile

@anuveyatsu could you fix this, e.g. by checking names and transliterating them into Latin characters if needed?
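
One possible way to catch this at `data init` time instead of at `data push` is to test each name against the pattern from the validation error above. The sketch below is an assumption about how that might look: `isValidName`, `sanitizeName`, and the `'resource'` fallback are hypothetical, and the simple replacement drops non-Latin text rather than transliterating it.

```javascript
// Pattern copied from the validation error shown above.
const NAME_PATTERN = /^([-a-z0-9._/])+$/;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

// Hypothetical fallback: lowercase the name and replace every run of
// unsupported characters with a single '-'. Non-Latin letters are lost here.
function sanitizeName(name) {
  const cleaned = name
    .toLowerCase()
    .replace(/[^-a-z0-9._/]+/g, '-')
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  return cleaned || 'resource'; // assumed default when nothing usable is left
}

console.log(isValidName('Кровь эльфов.mobi')); // false
console.log(sanitizeName('My Data 2018'));     // my-data-2018
```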

@anuveyatsu we should also skip hidden folders such as .git, .idea, etc.; otherwise we get an invalid data package:

$ data init
> This process initializes a new datapackage.json file.
....
> data.csv is just added to resources
> .datahub/datapackage.json is just added to resources
> .datahub/flow.yaml is just added to resources
> .git/HEAD is just added to resources
.......
> 
💾 Descriptor is saved in "datapackage.json"

In the end you cannot push such a dataset, because:

$ data push
> Error! Descriptor validation error:
          Data does not match any schemas from "oneOf"
          at "/resources/1/path" in descriptor and
          at "/properties/resources/items/properties/path/oneOf" in profile
 Because of security reasons 'path' property cannot:
          - have backwards path '../'
          - start from filesystem root '/'
          - start from user root '~/'
          - start with '.'            <<<<<<<<<<<<<<< AND IT IS THE CASE

Update: FIXED NOW
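
For reference, skipping hidden files and folders can be as simple as the filter below. It is a sketch, not the fix that was shipped: `isHiddenPath` is a hypothetical helper, and it assumes resource paths are relative and written without a leading `./`.

```javascript
// Hypothetical filter: true when any path segment starts with '.', e.g. ".git/HEAD".
function isHiddenPath(relativePath) {
  return relativePath
    .split(/[\\/]+/)
    .some(segment => segment.startsWith('.'));
}

// Paths taken from the `data init` output above.
const found = ['data.csv', '.datahub/datapackage.json', '.datahub/flow.yaml', '.git/HEAD'];
const resources = found.filter(p => !isHiddenPath(p));
console.log(resources); // [ 'data.csv' ]
```

Checking every segment, not just the first, also excludes hidden folders nested inside visible ones.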

@AcckiyGerman

All that will be available with the next release of the CLI.

TESTED & FIXED.

@anuveyatsu OK, it's working, but the result is that whole non-Latin phrases are replaced by the - symbol.
What do you think about this module? https://www.npmjs.com/package/transliteration

import { transliterate as tr, slugify } from 'transliteration';
 
tr('你好, world!'); // Ni Hao , world! 
slugify('你好, world!'); // ni-hao-world 

It also replaces spaces with -, transliterates many languages, has a fallback symbol for unknown letters, etc.

The only downside is that the module weighs ~2 MB.