DataHerb / dataherb-flora

DataHerb Flora: The core of DataHerb

Home Page: https://dataherb.github.io/flora

DataHerb Flora

A DataHerb Core Service to Bundle the Datasets into Flora.

What is DataHerb

DataHerb is an open data initiative to make access to open datasets easier.

  • A DataHerb or Herb is a dataset. A dataset comes with the data files, and the metadata of the data files.
  • A DataHerb Leaf or Leaf is a data file in the DataHerb.
  • A Flora is the combination of all the DataHerbs.

In many data projects, finding the right datasets to enhance your data is one of the most time-consuming parts. DataHerb adds flavor to your data project.

What is DataHerb Flora

We designed the following workflow to share and index datasets.

DataHerb Workflow

This repository is used to list datasets (the "Listings in DataHerb flora repository" step in the workflow).

How to Add Your Dataset

A Complete Tutorial

Simply create a yml file in the flora folder to link to your dataset repository. Your dataset repository should have a .dataherb folder and a metadata.yml file in it.
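Before adding a listing, it can help to confirm that your dataset repository has the expected layout. The following is a minimal sketch, assuming the metadata.yml file lives inside the .dataherb folder; the function name has_dataherb_metadata is hypothetical and is not part of any DataHerb package.

```python
from pathlib import Path


def has_dataherb_metadata(repo_root):
    """Return True if repo_root contains a .dataherb/metadata.yml file.

    Assumption: metadata.yml sits directly inside the .dataherb folder.
    """
    return (Path(repo_root) / ".dataherb" / "metadata.yml").is_file()
```

Calling has_dataherb_metadata(".") from the root of a local clone of your dataset repository reports whether the file is in place.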

The indexing part will be done by GitHub Actions.

How is Everything Connected

There are three components to build the dataset index.

  1. dataherb-flora: Index datasets using yml files.
  2. dataherb-metadata-aggregator: Aggregates all information about the datasets and creates a database.
  3. dataherb.github.io: Builds the website using the database.
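The flow between the three components can be pictured with a toy sketch. This is only an illustration of the idea, not the actual aggregator code: the field names (id, repository, name, leaves), the aggregate function, and the stand-in fetch function are all hypothetical.

```python
import json


def aggregate(listings, fetch_metadata):
    """Combine per-dataset metadata into one database keyed by listing id.

    listings: list of dicts, one per listing file (hypothetical fields).
    fetch_metadata: callable that returns a metadata dict for a repository.
    """
    database = {}
    for listing in listings:
        metadata = fetch_metadata(listing["repository"])
        database[listing["id"]] = {"repository": listing["repository"], **metadata}
    return database


# Stage 1 (dataherb-flora): each listing points to a dataset repository.
listings = [{"id": "example-herb", "repository": "example-user/example-herb"}]


# Stand-in for reading the metadata file of a dataset repository.
def fake_fetch(repository):
    return {"name": "Example Herb", "leaves": ["data.csv"]}


# Stage 2 (dataherb-metadata-aggregator): build the database.
database = aggregate(listings, fake_fetch)

# Stage 3 (dataherb.github.io): the website would consume the serialized database.
database_json = json.dumps(database, indent=2, sort_keys=True)
```

Keeping the listing step separate from the aggregation step means a dataset owner only edits a small listing file, while the metadata itself stays in the dataset repository.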

Some packages have also been created to make accessing and creating datasets easier. Refer to the website for details.