ATLAS Production System Twiki



Thursday, November 21, 2013

Notes on Template Based Job Parametrization

1. Datasets

Problem: the job templates created by the converter are not really templates, since they contain information that varies from job to job, such as dataset names.

Solution: use a more appropriate information source, namely the DEFT dataset DB table, which already has provisions for the name and other attributes of a dataset. Use the same "placeholder"/variable approach, with the same syntax, as for the other parameters.

  "jobParameters": [
        {
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "format": "AOD"
            "type": "template",
            "value": "inputAODFile=${IN}"
        },
        {
            "type": "constant",
            "value": "maxEvents=1000 RunNumber=213816 autoConfiguration=everything preExec=\"from BTagging.BTaggingFlags import BTaggingFlags;BTaggingFlags.CalibrationTag=\"BTagCalibALL-07-02\"\""
        },
        {
            "attribute": "repeat,nosplit",
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "flavor": "dbrelease",
            "type": "template",
            "value": "DBRelease=${DBR}"
        },
        {
            "type": "constant",
            "value": "AMITag=p1462"
        },
        {
            "dataset": "${DEFT_OUTPUT}",
            "param_type": "output",
            "flavor": "pool",
            "format": "root",
            "token": "ATLASDATADISK",
            "type": "template",
            "value": "${SN}"
        }
    ]
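
As an illustration of the placeholder approach, here is a minimal sketch (not the actual JEDI/DEFT code; the function, dictionary keys and dataset names are assumptions) of how the ${DEFT_DATASET_IN} and ${DEFT_OUTPUT} variables could be resolved from a DEFT dataset record, while runtime placeholders such as ${IN}, ${DBR} and ${SN} are left for JEDI to fill in later:

    from string import Template

    def resolve_placeholders(job_parameters, dataset_row):
        """Substitute ${DEFT_*} placeholders in every string field of jobParameters."""
        mapping = {
            "DEFT_DATASET_IN": dataset_row["input_name"],   # assumed keys, for illustration only
            "DEFT_OUTPUT":     dataset_row["output_name"],
        }
        resolved = []
        for param in job_parameters:
            resolved.append({
                # safe_substitute leaves unknown placeholders (${IN}, ${DBR}, ${SN}) untouched
                key: Template(val).safe_substitute(mapping) if isinstance(val, str) else val
                for key, val in param.items()
            })
        return resolved

    # Example with the first entry of the jobParameters list above
    params = [{"dataset": "${DEFT_DATASET_IN}", "param_type": "input",
               "format": "AOD", "type": "template", "value": "inputAODFile=${IN}"}]
    row = {"input_name": "some.input.dataset.AOD", "output_name": "some.output.dataset"}
    print(resolve_placeholders(params, row))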

2. TRF

The following nomenclature is followed:
  • "TRANSUSES" - defines the base release of ATLAS software to be used by the transform
  • "TRANSHOME" - the cache release, which effectively overlays the base release
  • "TRANSPATH" - simply the path (pretty much the filename) of the transformation script
Action item (Nov. 2013): the JEDI-alpha "template" has these values hardcoded (as in the dataset case above), so this needs to change. They are in fact proper attributes of the DEFT_TASK table, and JEDI can easily obtain them from there instead of consuming a prefab string.
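
A minimal sketch of the proposed lookup, assuming hypothetical column names (TRANS_USES, TRANS_HOME, TRANS_PATH) and purely illustrative release/transform values; it only shows JEDI assembling the transform specification from DEFT_TASK columns rather than from a prefab string:

    # Sketch only: column names and example values are assumptions for illustration.
    def transform_spec(deft_task_row):
        return {
            "transUses": deft_task_row["TRANS_USES"],  # base release of ATLAS software
            "transHome": deft_task_row["TRANS_HOME"],  # cache release overlaying the base
            "transPath": deft_task_row["TRANS_PATH"],  # path/filename of the transform script
        }

    row = {"TRANS_USES": "Atlas-17.2.11",
           "TRANS_HOME": "AtlasProduction-17.2.11.2",
           "TRANS_PATH": "Reco_trf.py"}
    print(transform_spec(row))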


3. Architecture, corecount and other attributes

I observed that a few other parameters are parsed from JSON (in the mid-November version of JEDI) and inserted as proper columns into the JEDI_TASKS table. It makes sense to augment the DEFT schema accordingly: for consistency, to trim the JSON blob, and to enable searches (e.g. on architecture).

Other examples: VO, Working Group, cloud.
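
As a rough sketch (the JSON keys and column names below are assumptions, not the confirmed schema), the idea is simply to lift these scalar attributes out of the JSON blob and store them as ordinary, searchable columns:

    import json

    COLUMN_MAP = {              # JSON key     -> table column (assumed names)
        "architecture": "ARCHITECTURE",
        "coreCount":    "CORECOUNT",
        "vo":           "VO",
        "workingGroup": "WORKGROUP",
        "cloud":        "CLOUD",
    }

    def extract_columns(task_json):
        """Pull scalar attributes out of the JSON blob for storage as proper columns."""
        task = json.loads(task_json)
        return {col: task[key] for key, col in COLUMN_MAP.items() if key in task}

    print(extract_columns('{"architecture": "x86_64-slc5-gcc43-opt", "coreCount": 1, "vo": "atlas"}'))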

4. Summary of attributes to be read by JEDI from the DEFT tables

For backward compatibility, I propose the following:
  • For each task, JEDI first attempts to locate the usual attributes (corecount, architecture, etc.) in the DEFT table
  • If such an attribute is not found there, JEDI takes the value from the parsed JSON data
This way the "alpha/converter" functionality will still work, while a proper DEFT schema becomes possible; a sketch of the lookup order follows.
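
A minimal sketch of that lookup order, assuming dict-like rows and illustrative attribute names (not the real JEDI/DEFT interfaces):

    # Prefer the proper DEFT column; fall back to the value parsed from the JSON blob.
    ATTRIBUTES = ("corecount", "architecture", "vo", "workinggroup", "cloud")

    def task_attribute(name, deft_row, parsed_json):
        value = deft_row.get(name)           # proper DEFT column, may be None or absent
        if value is None:
            value = parsed_json.get(name)    # "alpha/converter" path still works
        return value

    deft_row = {"corecount": 8, "architecture": None}
    parsed_json = {"corecount": 1, "architecture": "x86_64-slc5-gcc43-opt", "cloud": "CA"}
    for attr in ATTRIBUTES:
        print(attr, "->", task_attribute(attr, deft_row, parsed_json))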

In summary, the following parameters have been refactored from the JSON blob into the RDBMS:
  • dataset, along with its format and "flavor"
  • TRANS*
  • Architecture
  • Corecount
  • VO
  • Working Group
  • Cloud
Run number also needs to be added for consistency.

5. More on Datasets

  • name
  • offset

Task ID will be read by JEDI.

