ATLAS Production System Twiki



Thursday, November 21, 2013

Notes on Template Based Job Parametrization

1. Datasets

Problem: the job templates created by the converter are not really templates, since they contain information that varies from job to job, such as dataset names.

Solution: use a more appropriate information source, namely the DEFT dataset DB table, which already has provisions for the name and other attributes of a dataset. Use the same "placeholder"/variable approach, with the same syntax, as for the other parameters.

  "jobParameters": [
        {
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "format": "AOD"
            "type": "template",
            "value": "inputAODFile=${IN}"
        },
        {
            "type": "constant",
            "value": "maxEvents=1000 RunNumber=213816 autoConfiguration=everything preExec=\"from BTagging.BTaggingFlags import BTaggingFlags;BTaggingFlags.CalibrationTag=\"BTagCalibALL-07-02\"\""
        },
        {
            "attribute": "repeat,nosplit",
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "flavor": "dbrelease",
            "type": "template",
            "value": "DBRelease=${DBR}"
        },
        {
            "type": "constant",
            "value": "AMITag=p1462"
        },
        {
            "dataset": "${DEFT_OUTPUT}",
            "param_type": "output",
            "flavor": "pool",
            "format": "root",
            "token": "ATLASDATADISK",
            "type": "template",
            "value": "${SN}"
        }
    ]
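
As an illustration of the placeholder approach, here is a minimal sketch (not the actual JEDI/DEFT code; the function, dictionary keys and dataset names are assumptions) of how the ${DEFT_DATASET_IN} and ${DEFT_OUTPUT} variables could be resolved from a DEFT dataset record, while runtime placeholders such as ${IN}, ${DBR} and ${SN} are left for JEDI to fill in later:

    from string import Template

    def resolve_placeholders(job_parameters, dataset_row):
        """Substitute ${DEFT_*} placeholders in every string field of jobParameters."""
        mapping = {
            "DEFT_DATASET_IN": dataset_row["input_name"],   # assumed keys, for illustration only
            "DEFT_OUTPUT":     dataset_row["output_name"],
        }
        resolved = []
        for param in job_parameters:
            resolved.append({
                # safe_substitute leaves unknown placeholders (${IN}, ${DBR}, ${SN}) untouched
                key: Template(val).safe_substitute(mapping) if isinstance(val, str) else val
                for key, val in param.items()
            })
        return resolved

    # Example with the first entry of the jobParameters list above
    params = [{"dataset": "${DEFT_DATASET_IN}", "param_type": "input",
               "format": "AOD", "type": "template", "value": "inputAODFile=${IN}"}]
    row = {"input_name": "some.input.dataset.AOD", "output_name": "some.output.dataset"}
    print(resolve_placeholders(params, row))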

2. TRF

The following nomenclature is followed:
  • "TRANSUSES" - defines the base release of ATLAS software to be used by the transform
  • "TRANSHOME" - the cache release, which effectively overlays the base release
  • "TRANSPATH" - simply the path (pretty much the filename) of the transformation script
Action item (Nov. 2013): the JEDI-alpha "template" has these values hardcoded (as in the dataset case above), so this needs to change. They are in fact proper attributes of the DEFT_TASK table, and JEDI can easily obtain them from there instead of consuming a prefab string.
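
A minimal sketch of the proposed lookup, assuming hypothetical column names (TRANS_USES, TRANS_HOME, TRANS_PATH) and purely illustrative release/transform values; it only shows JEDI assembling the transform specification from DEFT_TASK columns rather than from a prefab string:

    # Sketch only: column names and example values are assumptions for illustration.
    def transform_spec(deft_task_row):
        return {
            "transUses": deft_task_row["TRANS_USES"],  # base release of ATLAS software
            "transHome": deft_task_row["TRANS_HOME"],  # cache release overlaying the base
            "transPath": deft_task_row["TRANS_PATH"],  # path/filename of the transform script
        }

    row = {"TRANS_USES": "Atlas-17.2.11",
           "TRANS_HOME": "AtlasProduction-17.2.11.2",
           "TRANS_PATH": "Reco_trf.py"}
    print(transform_spec(row))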


3. Architecture, corecount and other attributes

I observed that a few other parameters are parsed from JSON (in the mid-November version of JEDI) and inserted as proper columns into the JEDI_TASKS table. It makes sense to augment the DEFT schema accordingly: for consistency, to trim the JSON blob, and to enable searches (e.g. on architecture).

Other examples: VO, Working Group, cloud.
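
As a rough sketch (the JSON keys and column names below are assumptions, not the confirmed schema), the idea is simply to lift these scalar attributes out of the JSON blob and store them as ordinary, searchable columns:

    import json

    COLUMN_MAP = {              # JSON key     -> table column (assumed names)
        "architecture": "ARCHITECTURE",
        "coreCount":    "CORECOUNT",
        "vo":           "VO",
        "workingGroup": "WORKGROUP",
        "cloud":        "CLOUD",
    }

    def extract_columns(task_json):
        """Pull scalar attributes out of the JSON blob for storage as proper columns."""
        task = json.loads(task_json)
        return {col: task[key] for key, col in COLUMN_MAP.items() if key in task}

    print(extract_columns('{"architecture": "x86_64-slc5-gcc43-opt", "coreCount": 1, "vo": "atlas"}'))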

4. Summary of attributes to be read by JEDI from the DEFT tables

For backward compatibility, I propose the following:
  • For each task, JEDI first attempts to locate the usual attributes (corecount, architecture, etc.) in the DEFT table
  • If such an attribute is not found there, JEDI takes the value from the parsed JSON data
This way the "alpha/converter" functionality will still work, while a proper DEFT schema becomes possible; a sketch of the lookup order follows.
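
A minimal sketch of that lookup order, assuming dict-like rows and illustrative attribute names (not the real JEDI/DEFT interfaces):

    # Prefer the proper DEFT column; fall back to the value parsed from the JSON blob.
    ATTRIBUTES = ("corecount", "architecture", "vo", "workinggroup", "cloud")

    def task_attribute(name, deft_row, parsed_json):
        value = deft_row.get(name)           # proper DEFT column, may be None or absent
        if value is None:
            value = parsed_json.get(name)    # "alpha/converter" path still works
        return value

    deft_row = {"corecount": 8, "architecture": None}
    parsed_json = {"corecount": 1, "architecture": "x86_64-slc5-gcc43-opt", "cloud": "CA"}
    for attr in ATTRIBUTES:
        print(attr, "->", task_attribute(attr, deft_row, parsed_json))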

In summary, the following parameters have been refactored from the JSON blob into the RDBMS:
  • dataset, along with its format and "flavor"
  • TRANS*
  • Architecture
  • Corecount
  • VO
  • Working Group
  • Cloud
Run number also needs to be added for consistency.

5. More on Datasets

  • name
  • offset

Task ID will be read by JEDI.

