[init] data init improvements April 2018
rufuspollock opened this issue
Work in progress!
- make init non-interactive by default and add option --interactive or -i for interactive mode
- guess name from directory name
- use name to generate title
- set license to ODC attribution or PDDL (?)
- (?) Use readme to set description
- decide which fields to auto create for the user e.g.
  - licenses
  - sources (?)
- prompt for title first and then auto-suggest name from title (or use directory name for title?)
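To make the "guess name from directory name" and "use name to generate title" items concrete, here is a minimal sketch. The helper names `guessName` and `titleFromName` are hypothetical and not part of the CLI; the character set is an assumption borrowed from the name pattern the validator reports.

```javascript
const path = require("node:path");

// Hypothetical helper: derive a package name from a directory path.
// Lowercases the directory name and replaces runs of disallowed
// characters with a single "-", then trims leading/trailing dashes.
function guessName(dir) {
  return path
    .basename(path.resolve(dir))
    .toLowerCase()
    .replace(/[^-a-z0-9._\/]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper: turn a name such as "my-finance-data"
// into a human-readable title such as "My Finance Data".
function titleFromName(name) {
  return name
    .split(/[-_.]+/)
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join(" ");
}
```

For example, a directory called `My Finance Data` would yield the name `my-finance-data` and the title `My Finance Data`.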
Bug: when I init and add README.md, it tries to add it again (in interactive mode).
- create a project with README.md
- run `data init -i`
- add README.md as resource
- see it asks to add README again
FIXED in v0.9.1
TESTED:
Everything looks good...
...except the very rare case if we already have an INVALID datapackage.json file:
$ echo error > datapackage.json
$ data init
> This process updates existing datapackage.json file.
> Press ^C at any time to quit.
? There is datapackage.json already. Do you want to update it - y/n? y
> Error! Unexpected token e in JSON at position
@anuveyatsu I would like to print an error: "existing descriptor file is invalid, please fix or delete it manually". Created a PR for it, please review: datopian/datahub-client#42
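A rough sketch of the idea, not the code from the PR: wrap the JSON parsing and replace the raw parser error with the friendlier message. The function name `parseDescriptor` is hypothetical.

```javascript
// Hypothetical helper: parse the text of an existing datapackage.json.
// If the text is not valid JSON, throw an error with a message the user
// can act on instead of the raw "Unexpected token ..." parser error.
function parseDescriptor(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    throw new Error(
      "existing descriptor file is invalid, please fix or delete it manually"
    );
  }
}
```

The caller would read `datapackage.json`, pass its contents to this function, and print the error message if it throws.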
FIXED:
- data init now works in automatic mode by default
- interactive mode by `-i, --interactive` arguments
- blog post has been published.
Oh, yes, I found another error, this time with unacceptable names:
$ ls
data.csv README.md Кровь эльфов.mobi.rar
$ data init
> This process initializes a new datapackage.json file.
> Once there is a datapackage.json file, you can still run `data init` to update/extend it.
> Press ^C at any time to quit.
> Detected special file: README.md
> data.csv is just added to resources
> Кровь эльфов.mobi.rar is just added to resources
>
💾 Descriptor is saved in "datapackage.json"
$ data push
> Error! Descriptor validation error:
String does not match pattern: ^([-a-z0-9._/])+$
at "/name" in descriptor and
at "/properties/name/pattern" in profile
> Error! Descriptor validation error:
String does not match pattern: ^([-a-z0-9._/])+$
at "/resources/1/name" in descriptor and
at "/properties/resources/items/properties/name/pattern" in profile
@anuveyatsu could you fix this, e.g. check names and transform into latin form if needed?
@anuveyatsu also we should skip hidden folders, like `.git`, `.idea` etc., otherwise we get an invalid data package:
$ data init
> This process initializes a new datapackage.json file.
....
> data.csv is just added to resources
> .datahub/datapackage.json is just added to resources
> .datahub/flow.yaml is just added to resources
> .git/HEAD is just added to resources
.......
>
💾 Descriptor is saved in "datapackage.json"
In the end you cannot push such a dataset, because:
$ data push
> Error! Descriptor validation error:
Data does not match any schemas from "oneOf"
at "/resources/1/path" in descriptor and
at "/properties/resources/items/properties/path/oneOf" in profile
Because of security reasons 'path' property cannot:
- have backwards path '../'
- start from filesystem root '/'
- start from user root '~/'
- start with '.' <<<<<<<<<<<<<<< AND IT IS THE CASE
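One possible way to avoid this, sketched under the assumption that `init` collects relative file paths before adding them as resources: drop any path that has a segment starting with `.`. The helper names `isHiddenPath` and `filterResources` are hypothetical.

```javascript
// Hypothetical helper: true if any segment of a relative path starts
// with "." (e.g. ".git/HEAD", "sub/.hidden.csv", "../data.csv").
// A bare "." segment (as in "./data.csv") is not treated as hidden.
function isHiddenPath(relPath) {
  return relPath
    .split(/[\\/]/)
    .some((segment) => segment.length > 1 && segment.startsWith("."));
}

// Hypothetical helper: keep only the paths that are safe to add as resources.
function filterResources(paths) {
  return paths.filter((p) => !isHiddenPath(p));
}
```

With the listing above, `data.csv` would be kept while `.datahub/datapackage.json`, `.datahub/flow.yaml` and `.git/HEAD` would be skipped.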
Update: FIXED NOW
- I have added a fix that normalizes the dataset name + file name according to specs. However, all non-ASCII characters are just replaced by the `-` character. We will do more work on handling non-ASCII in datopian/data-cli#235. The fix is here:
- skipping directories and files whose names start with `.` - datopian/datahub-client@22a8f82
All that will be available with the next release of the CLI.
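For illustration only, and not the code from the commit: a sketch of what such a normalization could look like, assuming it lowercases the name and collapses every run of characters outside the validator's pattern into a single `-`. The names `NAME_PATTERN` and `normalizeName` are hypothetical.

```javascript
// The pattern reported by the validator in the error output above.
const NAME_PATTERN = /^([-a-z0-9._\/])+$/;

// Hypothetical helper: lowercase the name and replace each run of
// characters outside the allowed set with a single "-".
function normalizeName(raw) {
  return raw.toLowerCase().replace(/[^-a-z0-9._\/]+/g, "-");
}
```

Under these assumptions a non-ASCII phrase collapses to one dash, so `Кровь эльфов.mobi.rar` becomes `-.mobi.rar`, which passes the pattern but loses the original words.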
TESTED & FIXED.
@anuveyatsu ok, it's working, but the result is that whole non-Latin phrases are replaced by the `-` symbol.
What do you think about this module: https://www.npmjs.com/package/transliteration
import { transliterate as tr, slugify } from 'transliteration';
tr('你好, world!'); // Ni Hao , world!
slugify('你好, world!'); // ni-hao-world
It also replaces spaces with `-`, transliterates a lot of languages, has a fallback symbol for unknown letters, etc.
The only downside is that this module weighs ~2 MB.