WIPACrepo / lta

Long Term Archive

FC bundle date

dsschult opened this issue · comments

tl;dr: We need to record the date a bundle was archived to NERSC in the
File Catalog under a consistent field. This has to be done carefully to
preserve "meta_modify_date" which is (often) the field we currently need
to capture.


The problem is that the date a bundle is archived to NERSC is a bit
slippery when it comes to how it is represented in the File Catalog.

Last time, it was a combination of the MySQL data plus File Catalog
data, which you would think would be trickier, but the MySQL dates were
well defined, and we could rely on meta_modify_date for the File Catalog
dates.

This time, the bundles uploaded by the old JADE LTA are also in the File
Catalog, with the NERSC archival date under "create_date", but only
for those records. Anything uploaded by LTA doesn't have that field, but
we can rely(?) on "meta_modify_date".

After figuring all this out (and worrying that some other tool has
mucked with "meta_modify_date" in the meantime), I put together the data
for Benedikt. I think this is a dicey situation, and we may run tools
between now and six months from now that basically wreck the data that
we need to pull for Benedikt.

So, three things:

  1. We need to pick a key to use as a "date bundle was archived to NERSC"
    field for File Catalog records.

  2. We need to run a query that can capture create_date vs
    meta_modify_date (without disturbing meta_modify_date) as the NERSC
    archival date.

  3. LTA needs to be updated to start using this field for future bundle
    file records.

I propose making a new date_archived key under the LTA section in FC bundle records.
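
To make the proposal concrete, here is a minimal sketch of the selection rule described above: prefer create_date when a record has it (old JADE LTA bundles), otherwise fall back to meta_modify_date. The helper names and the example records are hypothetical and only illustrate the idea; real File Catalog records carry more fields, and the date formats shown are made up.

```javascript
// Hypothetical helper: pick the NERSC archival date for one File Catalog record.
// Old JADE LTA records carry create_date; LTA records do not,
// so meta_modify_date is the fallback.
function archivalDate(doc) {
    if (doc.create_date != null) {
        return doc.create_date;
    }
    return doc.meta_modify_date;
}

// Hypothetical helper: return a copy of the record's lta section with
// date_archived set. The record itself (including meta_modify_date)
// is left untouched.
function withDateArchived(doc) {
    var lta = Object.assign({}, doc.lta || {});
    lta.date_archived = archivalDate(doc);
    return lta;
}

// Made-up example records; all values are illustrative only.
var jadeRecord = {
    create_date: "2019-03-01",
    meta_modify_date: "2021-06-15 10:00:00"
};
var ltaRecord = {
    meta_modify_date: "2021-02-03 04:05:06",
    lta: { bundle: "hypothetical" }
};

console.log(withDateArchived(jadeRecord));
console.log(withDateArchived(ltaRecord));
```

Keeping the value under the lta section means nothing has to write to meta_modify_date at all.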

Won't this require updates to the File Catalog interface itself? I thought "meta_modify_date" was programmatically re-calculated on PUSH requests. And I'm not sure we want to override that field, for that same reason.

We're creating a new field specifically to not interfere with meta_modify_date. This shouldn't require any FC API changes.

All old bundles will need to be updated. Probably easiest to do that in mongo directly, since we don't have any bulk patch routes in the API.

I used this query to handle #2 on the list.

var record_count = 0;
db.files.find(
    {
        "logical_name": {
            "$regex": "^/home/projects/icecube"
        },
        "locations.site": {
            "$eq": "NERSC"
        }
    }
).forEach(function(doc) {
    record_count++;
    // get or create the object for the lta key
    var new_lta = {};
    if(doc.lta != null) {
        new_lta = doc.lta;
    }
    // set the date_archived field in our lta object
    new_lta["date_archived"] = doc.meta_modify_date;
    if(doc.create_date != null) {
        new_lta["date_archived"] = doc.create_date;
    }
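    // build the query and update objects (declared with var so they
    // don't leak into the shell's global scope)
    var query = {
        "_id": doc._id
    };
    var set_lta = {
        "$set": {
            "lta": new_lta
        }
    };
    // only the lta key is written; meta_modify_date is left alone
    db.files.updateOne(query, set_lta);
    // print a progress dot every 350 records
    if((record_count % 350) == 0) {
        print(".");
    }
});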
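
A follow-up check might be worth running afterwards. This is only a sketch: the helper name is hypothetical, and it assumes the same logical_name prefix and site match used in the query above. It builds a filter for records that should have been updated but still have no lta.date_archived.

```javascript
// Hypothetical helper: filter for NERSC bundle records that still
// lack lta.date_archived after the update script has run.
function missingDateArchivedFilter() {
    return {
        "logical_name": {
            "$regex": "^/home/projects/icecube"
        },
        "locations.site": "NERSC",
        "lta.date_archived": {
            "$exists": false
        }
    };
}

// In the mongo shell, a count like this should come back as 0 if every
// matching record was updated:
//   db.files.find(missingDateArchivedFilter()).count()
```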