kytos-ng / kytos-ng.github.io


better file/folder organization for DB scripts across all NApps

viniarck opened this issue · comments

Currently these NApps have the following scripts:

  • flow_manager:
├── scripts
│   ├── drop_compound_index.py
│   ├── pipeline_related.py
│   ├── README.md
│   └── storehouse_to_mongo.py
  • topology:
├── scripts
│   ├── pipeline_related.py
│   ├── README.md
│   ├── storehouse_to_mongo.py
│   ├── unset_active.py
│   └── vlan_pool.py
  • mef_eline:
├── scripts
│   ├── 001_rename_priority.py
│   ├── 002_unset_spf_attribute.py
│   ├── 003_vlan_type_string.py
│   ├── README.md
│   └── storehouse_to_mongo.py

On each NApp's scripts README.md we have further information about each script, which is great for understanding when to use it. But it's becoming difficult to link this in release notes and to understand which version upgrade a given script is needed for. Also, the execution order of the scripts isn't immediately clear; on mef_eline we started to use a 3-digit prefix, which helped, but this pattern is only used on mef_eline so far.

This issue is to discuss a proposal and then stick with a new pattern. It needs to solve the following problems:

  • It must be clear which version the script is needed for.
  • It must be clear and easy to derive the order of the execution of the scripts.

Here's a naming convention for folders and scripts that makes it easy to identify the version, the sequence, and how to use them:

scripts/db/<version>/README.md
scripts/db/<version>/<\d{3}>_script_name.py
  • Each NApp will have a scripts/db/<version> dir for DB migration scripts of a specific version. The README.md will provide brief information about each script and how to execute it.
  • Each DB script name will follow the pattern <\d{3}>_script_name.py, with a 3-digit prefix starting at 000, so the execution sequence can be followed accordingly if needed (since the versions also increase monotonically, even if you have to traverse all the dirs you can derive a chronological sequence regardless of file metadata attrs).
  • scripts/db/README.md can also provide general information about pre-requisites that apply to all scripts, to avoid repetition in per-version README files.
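Since version directories increase monotonically and each script carries a 3-digit prefix, the chronological order can be derived purely from path names. Here's a minimal sketch of that derivation (the helper name is illustrative, not part of the proposal):

```python
from pathlib import PurePosixPath


def chronological_order(script_paths):
    """Sort scripts/db/<version>/<NNN>_name.py paths chronologically:
    first by version tuple, then by the 3-digit prefix."""
    def key(path):
        parts = PurePosixPath(path).parts
        version = tuple(int(p) for p in parts[-2].split("."))
        prefix = int(parts[-1].split("_", 1)[0])
        return (version, prefix)
    return sorted(script_paths, key=key)


paths = [
    "scripts/db/2023.1.0/001_pipeline_related.py",
    "scripts/db/2022.2.0/000_storehouse_to_mongo.py",
    "scripts/db/2023.1.0/000_drop_compound_index.py",
    "scripts/db/2022.3.3/000_drop_compound_index.py",
]
for path in chronological_order(paths):
    print(path)
```

Note that the version must be compared as a tuple of ints, not as a string, so that e.g. 2022.3.3 sorts after 2022.2.0 and before 2023.1.0.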

Here's an example of how flow_manager's current scripts would be organized with this convention:

scripts/db/2023.1.0/README.md
scripts/db/2023.1.0/000_drop_compound_index.py
scripts/db/2023.1.0/001_pipeline_related.py
scripts/db/2022.3.3/README.md
scripts/db/2022.3.3/000_drop_compound_index.py
scripts/db/2022.2.0/README.md
scripts/db/2022.2.0/000_storehouse_to_mongo.py

Let me know if you have any other suggestions to consider; otherwise we'll go with this one.

We should also consider including DB version information somewhere in the DB. It could be as simple as a collection, with entries containing the name of a collection, and the version id of that collection. We can then use that information during migrations to validate the migration.

That's a good idea, @Ktmi. We could reserve a migrations collection for it where each object would have this structure:

from datetime import datetime

from pydantic import BaseModel


class MigrationDoc(BaseModel):
    napp_id: str
    id: str  # unique, mapped to the underlying Mongo doc _id
    collection: str
    inserted_at: datetime
    updated_at: datetime

Where id will be a uuid.uuid4() str value to facilitate BSON serialization, pre-defined when creating a DB script. To find out whether a migration has been applied, it's a matter of finding the _id; and to get which migrations have been applied, it's a matter of fetching the entire collection sorted by inserted_at.

Another benefit of keeping track of the id is that it also makes it easy for scripts to be idempotent, by simply querying first whether the _id has already been inserted. If any script needs to be augmented for some unexpected reason, then a new script will be generated, so this collection will behave like an immutable collection when it comes to its objects. In the core we can export a controller managing this collection for NApps to use. Let me know if you have any other suggestions.
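The idempotency check could look like the sketch below. The function name and the dict-backed stand-in collection are hypothetical, used only so the example is self-contained; a real script would use a pymongo collection with the same find_one/insert_one calls:

```python
from datetime import datetime, timezone


class FakeCollection:
    """In-memory stand-in for a Mongo collection, for illustration only."""

    def __init__(self):
        self.docs = {}

    def find_one(self, query):
        return self.docs.get(query["_id"])

    def insert_one(self, doc):
        self.docs[doc["_id"]] = doc


# The script's migration _id, a uuid4 str pre-defined when the script is written.
MIGRATION_ID = "cdfe8cd0-58f7-4942-a6e1-08c2d72c45be"


def run_migration(migrations, migrate):
    """Apply `migrate` only if this script's _id hasn't been recorded yet."""
    if migrations.find_one({"_id": MIGRATION_ID}):
        return False  # already applied, so the script is a no-op
    migrate()
    now = datetime.now(timezone.utc)
    migrations.insert_one({
        "_id": MIGRATION_ID,
        "napp_id": "kytos/flow_manager",
        "collection": "flows",
        "inserted_at": now,
        "updated_at": now,
    })
    return True
```

Running the script twice applies the migration only once, since the second run finds the _id already recorded.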

We'll move forward with the proposed approach.

The idea with the MigrationDoc document in a migrations collection is to make it easy to keep track of which migrations have already been applied. For instance, let's say these migrations have been applied on kytos/flow_manager (maybe we can also add an optional description string field):

rs0 [direct: primary] napps> db.migrations.find()
[
  {
    _id: 'cdfe8cd0-58f7-4942-a6e1-08c2d72c45be',
    napp_id: 'kytos/flow_manager',
    collection: 'flows',
    updated_at: ISODate("2024-05-02T18:09:29.603Z"),
    inserted_at: ISODate("2024-05-02T18:09:29.603Z")
  },
  {
    _id: 'f26dc0f5-207a-4fa8-ac51-4b198e84c1ca',
    napp_id: 'kytos/flow_manager',
    collection: 'flows',
    updated_at: ISODate("2024-05-02T18:10:06.992Z"),
    inserted_at: ISODate("2024-05-02T18:10:06.992Z")
  }
]

So, if a user is trying to execute a new flow_manager DB script from its scripts folder, all the script has to do before performing any updates or writes to the collection is check whether a given _id has already been inserted. This also means that whenever we create or introduce a new DB script, it should insert into this collection once the migration has succeeded. On kytos core we could also provide some CRUD ops for this collection. In the DB scripts we can additionally provide a sort of force option (implemented in the script) for those who want to overwrite anyway, though in most cases they won't.
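The force option mentioned above could be a simple CLI flag on each script; this is a hypothetical sketch of the flag and the guard condition, not an agreed interface:

```python
import argparse


def should_run(already_applied: bool, force: bool) -> bool:
    """A migration runs if it hasn't been applied yet, or if --force overrides."""
    return force or not already_applied


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="hypothetical DB migration script")
    parser.add_argument(
        "--force",
        action="store_true",
        help="re-run even if this migration's _id is already recorded",
    )
    return parser.parse_args(argv)
```

A script's main would then call should_run() with the result of the _id lookup before touching the target collection.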