- if you need external namespaces, define them in `metadata/imported-namespaces.yaml` in the `prefix: {uri: ..., tag (optional): ...}` yaml format, and import them to code with `inv import-namespaces`
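For example, an imported-namespace entry might look like this (the `other-project` prefix, URI, and tag are hypothetical placeholders):

```yaml
# metadata/imported-namespaces.yaml
# format: prefix: {uri: ..., tag (optional): ...}
other-project:
  uri: https://github.com/example-org/other-project  # hypothetical URI
  tag: v0.2.0  # optional: pin a specific release
```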
- define the in-script metadata in `src/namespace_metadata.py`
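A minimal sketch of what declaring metadata in a script can look like. The real `ScruTable` API (constructor arguments, feature classes) is not shown in this document, so everything below is a hypothetical stand-in illustrating the pattern of defining a table's schema in code:

```python
from dataclasses import dataclass


# Hypothetical feature declaration: the column names and dtypes
# of a table are derived from a class like this.
@dataclass
class PersonFeatures:
    name: str
    birth_year: int


class ScruTableStandIn:
    """Stand-in for a ScruTable: pairs a feature class with a table name."""

    def __init__(self, features, name):
        self.features = features
        self.name = name

    @property
    def columns(self):
        # Column names come from the feature class annotations
        return list(self.features.__annotations__)


person_table = ScruTableStandIn(PersonFeatures, "person")
```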
- load all the data to the `complete` subset with the `update_data` function in `src/update_data.py`
  - use the ScruTable objects created in `namespace_metadata` to dump the data in the `data` directory
- execute it with `inv update-data`
  - this calls the function defined in the first step
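The shape of `src/update_data.py` can be sketched like this; `load_source_rows` and the plain CSV dump are hypothetical stand-ins for the real data source and for what the ScruTable objects do when dumping into the `data` directory:

```python
import csv
from pathlib import Path


def load_source_rows():
    # Hypothetical stand-in for pulling the raw source data
    return [{"name": "Ada", "birth_year": 1815}]


def update_data(data_dir="data"):
    """Load everything into the complete subset and dump it."""
    rows = load_source_rows()
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The real project dumps via ScruTable objects; plain CSV here
    target = out / "person.csv"
    with open(target, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "birth_year"])
        writer.writeheader()
        writer.writerows(rows)
    return target
```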
- run `inv serialize-inscript-metadata` to move the structure defined in the script to `.yaml` files
  - optionally extend the schema with descriptions where necessary
  - the metadata schema is created with the intention for export to
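After serialization, a table's schema might end up in a yaml file roughly like this (the path, the `person` table, and the hand-added description are all hypothetical):

```yaml
# metadata/tables/person.yaml (hypothetical path and layout)
person:
  features:
    name:
      dtype: str
      description: full name of the person  # optionally added by hand
    birth_year:
      dtype: int
```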
- add all remotes to the main branch
- set default remotes for the different branches:
  - fill `conf/default-remotes.yaml`
  - run `inv set-dvc-remotes`
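A `conf/default-remotes.yaml` might map branches to dvc remotes like this (the branch and remote names are hypothetical):

```yaml
# conf/default-remotes.yaml
main: public-remote
dev: private-remote
```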
- create and document different environments with different access restrictions (in different dvc remotes)
  - either anonymized datasets with the same schema, or simply smaller samples for personal workstation use
  - save them to `conf/created-envs.yaml` using the `{name: {branch: ..., kwargs: {...}}}` schema
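Following the `{name: {branch: ..., kwargs: {...}}}` schema, `conf/created-envs.yaml` might look like this (the environment names, branches, and `sample_size` kwarg are hypothetical):

```yaml
# conf/created-envs.yaml
complete:
  branch: main
  kwargs: {}
pilot:
  branch: pilot-env
  kwargs:
    sample_size: 1000  # hypothetical kwarg for a smaller sample
```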
- run the script that updates the `complete` subset
- run the commands creating and uploading the other subsets:
  - `inv write-envs`
  - `inv push-envs --git-push`
- some script should present the dataset automatically
  - schema diagram
  - maybe some table profiling
  - simple figures
  - as general as possible
  - with specifications regarding privacy / security
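A presentation script that stays as general as possible could derive simple table profiles from the dumped files alone. This is a hypothetical sketch, not the project's actual tooling: it assumes the tables were dumped as CSV files and reports only column names and row counts:

```python
import csv
from pathlib import Path


def profile_table(csv_path):
    """Return a minimal profile: table name, columns, row count."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    return {
        "table": Path(csv_path).stem,
        "columns": reader.fieldnames,
        "row_count": len(rows),
    }


def profile_directory(data_dir="data"):
    # Profile every dumped csv regardless of schema: as general as possible
    return [profile_table(p) for p in sorted(Path(data_dir).glob("*.csv"))]
```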