kids-first / c2m2-submission-process

Repo Description

This repository will initially serve as a staging point for the source and data files associated with the C2M2 submission process. It might eventually grow into a pipeline for the C2M2 process, but we are starting small.

"A journey of a thousand miles begins with a single step" -Laozi

Important Links

  1. Base Wiki Page for C2M2 Submissions
  2. Submission Prep Script Wiki
  3. C2M2 Submission Prep Script
  4. C2M2 Table Summary
  5. CV Reference Files
  6. JSON Schema document describing the current C2M2 metadata model
  7. Frictionless: Data management framework for Python
  8. OSF Client: CLI tool for grabbing OSF artifacts
  9. CFDE Submission Doc

Submission Process Steps

  1. Environment Setup
  • Creates a virtual environment in the current directory
  • Activates the venv
  • Installs package dependencies from requirements.txt
source setup_evn.sh
  2. Acquire submission tools from OSF
  • Uses the OSF CLI tool to download the submission script and the CV reference files
  • Moves the submission script and reference files to the root directory
  • Uses the OSF CLI tool to download the C2M2 data package used to validate the submission
# To get OSF Tools
./acquire_osf_c2m2_submission_tools.sh

OR

# To refresh OSF Tools
./acquire_osf_c2m2_submission_tools.sh refresh
  3. Execute the KF to C2M2 ETL process
  • Execute the transform script
    • The script executes in 3 phases
      1. Extract
      • The Ingest class uses kf-utils to write KF model data as TSV files to the /kf_to_c2m2_etl/ingested/tables directory.
      2. Transform
      • The ingested KF tables are mapped to C2M2 tables and written back out as TSV files to /kf_to_c2m2_etl/transformed.
      3. Load
      • Moves the transformed TSV files into the directory from which the script that contributes controlled vocabularies is executed.
      • Also adds the empty tables required for submission.
python /kf_to_c2m2_etl/etl.py
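
The three phases above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the contents of etl.py: the helper names, the `PARTICIPANT_TO_SUBJECT` mapping, and the `extract` stand-in (which writes records passed in by the caller instead of calling kf-utils) are all assumptions made for the example.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical column mapping, for illustration only; the real mappings live in etl.py.
PARTICIPANT_TO_SUBJECT = {"kf_id": "local_id", "study_id": "project_local_id"}


def read_tsv(path):
    """Read a TSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def write_tsv(path, fieldnames, rows):
    """Write rows to a TSV file, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def extract(records, ingested_dir, table_name):
    """Stand-in for the ingest phase: write source records to <ingested_dir>/<table>.tsv."""
    path = Path(ingested_dir) / f"{table_name}.tsv"
    write_tsv(path, list(records[0].keys()), records)
    return path


def transform(rows, mapping):
    """Rename source columns to target columns, dropping unmapped columns."""
    return [
        {target: row.get(source, "") for source, target in mapping.items()}
        for row in rows
    ]


def load(transformed_dir, submission_dir, empty_tables):
    """Move transformed TSVs to the submission directory and add header-only tables."""
    submission_dir = Path(submission_dir)
    submission_dir.mkdir(parents=True, exist_ok=True)
    for tsv in sorted(Path(transformed_dir).glob("*.tsv")):
        shutil.move(str(tsv), str(submission_dir / tsv.name))
    for table_name, columns in empty_tables.items():
        target = submission_dir / f"{table_name}.tsv"
        if not target.exists():
            # An "empty" table here is a TSV with a header row and no data rows.
            write_tsv(target, columns, [])
```

A caller would chain these as `extract`, then `read_tsv` plus `transform` plus `write_tsv`, then `load`, passing a dict of table name to column list for the tables that must exist but carry no rows.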
  4. Execute the OSF script for preparing the C2M2 submission
  • Executes prepare submission script
  • Creates frictionless validation directory
  • Moves data files and generated files to validation directory
  • Moves the C2M2 file used to validate the submitted files
./prepare_c2m2_submission.sh
  5. Validate C2M2 submission data
  • Moves to the validation directory
  • Generates the validation report
./validate_submission.sh YEAR QUARTER VERSION
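
Before generating the full validation report, a quick header check can catch missing or misnamed columns early. The sketch below is not part of validate_submission.sh and does not replace Frictionless validation. It assumes the C2M2 JSON Schema document follows the usual Frictionless data package layout, where each entry in `resources` has a string `path` and a `schema.fields` list of objects with a `name`; the function name `check_headers` is hypothetical.

```python
import csv
import json
from pathlib import Path


def check_headers(datapackage_path, data_dir):
    """Compare each TSV header in data_dir with the field names in the data package.

    Returns a list of human-readable problems; an empty list means every
    resource file exists and its header matches the schema exactly.
    """
    package = json.loads(Path(datapackage_path).read_text())
    problems = []
    for resource in package.get("resources", []):
        # Assumed layout: resource["path"] is a file name, schema.fields holds the columns.
        expected = [field["name"] for field in resource["schema"]["fields"]]
        tsv_path = Path(data_dir) / resource["path"]
        if not tsv_path.exists():
            problems.append(f"{resource['path']}: file is missing")
            continue
        with open(tsv_path, newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
        if header != expected:
            problems.append(
                f"{resource['path']}: header {header} does not match schema {expected}"
            )
    return problems
```

Running `check_headers` against the validation directory and printing each returned problem gives a fast sanity check; only the Frictionless report decides whether the submission actually conforms.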
  6. Submit data to CFDE portal

*** Refer to Important Links #7 for additional info***

  • Login with submit tool
  • Execute submission
    • Check tables for conformance to C2M2's latest release notes
    • Set DCC for submission (cfde_registry_dcc:kidsfirst)
  • Verify submission in progress
  • Review submission results
# You will be redirected to a web browser for credentials
cfde-submit login

# Starts the submission and sets the data coordinating center (DCC)
cfde-submit run path-to-frictionless-validation --ignore-git --dcc-id cfde_registry_dcc:kidsfirst

# Can be executed intermittently to verify submission status
cfde-submit status

# Logout when submission is completed
cfde-submit logout
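
For repeated submissions, the commands above could be wrapped in a small script. The sketch below is an assumption about how one might do that, not something shipped in this repository: `build_run_command` and `submit` are hypothetical names, and only the cfde-submit subcommands and flags shown above are used. Logout is left out because the submission continues after `cfde-submit run` returns.

```python
import subprocess

# DCC identifier taken from the submission step above.
DCC_ID = "cfde_registry_dcc:kidsfirst"


def build_run_command(validation_dir, dcc_id=DCC_ID):
    """Build the argument list for `cfde-submit run` without executing it."""
    return ["cfde-submit", "run", str(validation_dir), "--ignore-git", "--dcc-id", dcc_id]


def submit(validation_dir, runner=subprocess.run):
    """Run login, run, and status in order, stopping at the first failure.

    `runner` defaults to subprocess.run; passing a different callable makes
    it possible to dry-run the sequence without cfde-submit installed.
    """
    commands = [
        ["cfde-submit", "login"],
        build_run_command(validation_dir),
        ["cfde-submit", "status"],
    ]
    for command in commands:
        # check=True raises CalledProcessError if a command exits non-zero.
        runner(command, check=True)
    return commands
```

Passing `runner=lambda command, check: print(" ".join(command))` prints the command sequence instead of executing it, which is a cheap way to confirm the path and DCC before a real run.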

About

License: Apache License 2.0

