XENON1T / cax

Simple data management tool

Modify cax for reprocessing in Stockholm

XeBoris opened this issue

I started to modify cax for data reprocessing in Stockholm, to have an alternative to Midway (branch: pax_process_stockholm). The idea is to run, e.g., cax --once --ProcessAt tegner-login-1.
The modification is almost done, but I still get an error when I try to run cax now.

In process.py, line 223, it checks the database entries:

# Check if processed data already exists in DB (!!!! --> FAILS)
if datum['type'] == 'processed':
    for ivers in range(len(versions)):
        if versions[ivers] == datum['pax_version'] and get_pax_hash(versions[ivers], processat) == datum['pax_hash']:
            have_processed[ivers] = True

If the outer condition is true, the code looks up datum['pax_version'], which, as far as I understand, does not exist. Does anyone have an idea how this can be solved?

I think the copying step forgot to set the pax version for the new site. The pax version is added during processing, so there may be a copy bug.
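Until the copy bug is found, the check itself could be made tolerant of entries that lack the version fields. The following is only a minimal sketch, not the actual code in process.py: the function name mark_processed and the stand-in get_pax_hash are hypothetical, and the shape of the database entries is assumed from the snippet above.

```python
def get_pax_hash(version, host):
    """Hypothetical stand-in for cax's helper; returns a fake hash per (version, host)."""
    return "hash-{}-{}".format(version, host)


def mark_processed(data_docs, versions, processat):
    """Return one flag per requested version: True if a matching processed entry exists."""
    have_processed = [False] * len(versions)
    for datum in data_docs:
        if datum.get('type') != 'processed':
            continue
        # Entries copied from another site may lack these keys (assumption),
        # so use .get() instead of datum[...] to avoid a KeyError.
        pax_version = datum.get('pax_version')
        pax_hash = datum.get('pax_hash')
        if pax_version is None or pax_hash is None:
            continue
        for i, version in enumerate(versions):
            if version == pax_version and get_pax_hash(version, processat) == pax_hash:
                have_processed[i] = True
    return have_processed
```

An entry without pax_version is simply treated as "not yet processed" here; whether that is the right policy, or whether the entry should instead be repaired in the database, is a separate decision.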

That said, I'd suggest just making a cax --process command. This could take a run number to process as an argument. (Did we think about how we'll handle having two processings with the same pax version? Maybe we should have a random number tag generated at each new processing.)

In general, for automatic running, I assume you want cax at Tegner to have the following behavior:

Grab raw data that you can
Grab processed data that you can
If you have raw but not processed data (i.e. no processed data is available to you), then process.
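The three rules above can be sketched as a small decision function. This is an illustration only: next_action and the action names are hypothetical and do not correspond to existing cax tasks, and the two sets are an assumed simplification of what the run database records.

```python
def next_action(local, remote):
    """Pick the next step for one run.

    local  -- set of data types ('raw', 'processed') already at this site
    remote -- set of data types available for copying from other sites
    """
    # Rule 1: grab raw data that you can.
    if 'raw' not in local and 'raw' in remote:
        return 'copy_raw'
    # Rule 2: grab processed data that you can.
    if 'processed' not in local and 'processed' in remote:
        return 'copy_processed'
    # Rule 3: raw is here, but no processed data exists here or elsewhere.
    if 'raw' in local and 'processed' not in local:
        return 'process'
    return 'nothing'
```

Ordering the checks this way means processing is only triggered once there is nothing left to copy, which matches the "no processed data available to you" condition.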

From: Boris Bauermeister notifications@github.com
Reply: XENON1T/cax reply@reply.github.com
Date: 6 May 2016 at 16:18:53
To: XENON1T/cax cax@noreply.github.com
Subject:  [XENON1T/cax] Modify cax for reprocessing in Stockholm (#12)


Why switch to a command-line option now? Shouldn't we stick with host name identification, to be consistent with what we've been doing for every other task? That is, just switch the submission script/command inside process.py depending on which host you're running on, and keep the existing processed-file synchronization Chris described (for all sites).
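A minimal sketch of what host-based switching might look like follows. The names SUBMIT_TEMPLATES and submission_command are hypothetical, and the command templates are placeholders rather than the real Midway or Tegner settings; only socket.gethostname() is a real standard-library call.

```python
import socket

# Hypothetical mapping from host-name prefix to a submission command template.
# The templates are placeholders, not the actual site configuration.
SUBMIT_TEMPLATES = {
    'midway': 'sbatch {script}',
    'tegner': 'sbatch --account=PLACEHOLDER {script}',
}


def submission_command(script, hostname=None):
    """Return the submission command for the current (or given) host."""
    if hostname is None:
        hostname = socket.gethostname()
    for prefix, template in SUBMIT_TEMPLATES.items():
        if hostname.startswith(prefix):
            return template.format(script=script)
    raise ValueError('No submission command known for host {!r}'.format(hostname))
```

For example, submission_command('process_run.sh', 'tegner-login-1') would pick the Tegner template without any extra command-line flag.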

Also, I forgot to mention: I was thinking we work on one branch: https://github.com/XENON1T/cax/tree/pax_version_process
which now takes care of using multiple pax versions or same pax head with different hash. I don't think we need multiple copies of the same version, except if you want to validate between sites, in which case you could just append the site name instead of a random number.
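Appending the site name could be as simple as the sketch below. The function processing_tag and the underscore separator are assumptions for illustration, not something the branch currently defines.

```python
def processing_tag(pax_version, site=None):
    """Build a label for a processing; append the site only for cross-site validation."""
    if site is None:
        return pax_version
    # The separator is an arbitrary choice for this sketch.
    return '{}_{}'.format(pax_version, site)
```

Unlike a random number, the resulting tag is reproducible and says where the processing was done.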

@pdeperio: There is no need to switch, true! I had in mind to select the host via the command line but this is not mandatory as you already pointed out.
I will update my changes (regarding the process.py) to the branch pax_version_process.
@tunnell:

That said, I'd suggest just making a cax --process command. This could take as an argument a run number to process. (Did we think about how we'll handle having two processings with same pax version? maybe we should have a random number tag generated at each new processing)
In general for automatic running, I assume you want cax at Tegner to have the following behavior:
Grab raw data that you can
Grab processed data that you can
If have raw but not processed (i.e. no processed data available to you), then process

This is what I want in general. I also like the idea of taking a run number as an argument for the reprocessing. This makes it very easy to reprocess a specific run again in case something went wrong. Without a run number as an argument, process.py reprocesses all raw data that have not yet been reprocessed.
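The optional run number could be handled roughly as follows. This is a sketch under assumptions: the --run flag, the program name, and the helper select_runs are hypothetical and not part of the current cax command line.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical --run option; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(prog='cax-process-sketch')
    parser.add_argument('--run', type=int, default=None,
                        help='run number to (re)process; default: all unprocessed runs')
    return parser.parse_args(argv)


def select_runs(all_runs, processed_runs, run=None):
    """Return the runs to process."""
    if run is not None:
        # An explicit run number forces reprocessing, even if it was done before.
        return [run]
    # Otherwise take every run that has raw data but no processed data yet.
    return [r for r in all_runs if r not in processed_runs]
```

With this split, select_runs([1, 2, 3], {2}) picks the unprocessed runs, while passing run=2 reprocesses only run 2.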

I updated pax_version_process with some extensions for reprocessing in Stockholm OR Midway. Reprocessing in Stockholm does not yet work.

The master now includes the latest work on this. @XeBoris to test and modify if necessary.

Let's focus on getting the existing data to Stockholm before worrying about processing?

From: Patrick de Perio notifications@github.com
Reply: XENON1T/cax reply@reply.github.com
Date: 21 May 2016 at 20:37:08
To: XENON1T/cax cax@noreply.github.com
CC: Christopher Tunnell ctunnell@nikhef.nl, Mention mention@noreply.github.com
Subject:  Re: [XENON1T/cax] Modify cax for reprocessing in Stockholm (#12)


done?