ATLAS Production System Twiki

Permanent Documentation Links
Blog Tags: prodsys1, prodsys2

Tuesday, December 3, 2013

DEFT Development in December 2013

The "Manager Request" system:
  •  The "REQ" application: implemented DB schemas based on version V104 of the document circulated earlier
  • Created a view for that in the DEFT UI
  • Created a CLI utility to upload the requests

Thursday, November 21, 2013

Notes on Template Based Job Parametrization

1. Datasets

Problem: job templates created by the converter are not really templates, since they contain information that varies from job to job, such as dataset names.

Solution: use a more appropriate information source, i.e. the DEFT dataset DB table, which does have provisions for names and other attributes of the dataset. Use the same "placeholder"/variable approach as with other parameters, and same syntax.

  "jobParameters": [
        {
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "format": "AOD"
            "type": "template",
            "value": "inputAODFile=${IN}"
        },
        {
            "type": "constant",
            "value": "maxEvents=1000 RunNumber=213816 autoConfiguration=everything preExec=\"from BTagging.BTaggingFlags import BTaggingFlags;BTaggingFlags.CalibrationTag=\"BTagCalibALL-07-02\"\""
        },
        {
            "attribute": "repeat,nosplit",
            "dataset": "${DEFT_DATASET_IN}",
            "param_type": "input",
            "flavor": "dbrelease",
            "type": "template",
            "value": "DBRelease=${DBR}"
        },
        {
            "type": "constant",
            "value": "AMITag=p1462"
        },
        {
            "dataset": "${DEFT_OUTPUT}",
            "param_type": "output",
            "flavor": "pool",
            "format": "root",
            "token": "ATLASDATADISK",
            "type": "template",
            "value": "${SN}"
        }
    ]
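
For illustration, here is a minimal sketch (not the actual DEFT code; the function name is hypothetical) of how such a template could be resolved against the DEFT dataset table, with the per-job placeholders left for JEDI:

    import json

    def resolve_job_template(template_text, dataset_in, dataset_out):
        # Hypothetical helper: fill the dataset placeholders from values held in the
        # DEFT dataset table instead of hardcoding the names in the template itself.
        substitutions = {
            "${DEFT_DATASET_IN}": dataset_in,
            "${DEFT_OUTPUT}": dataset_out,
        }
        resolved = template_text
        for placeholder, value in substitutions.items():
            resolved = resolved.replace(placeholder, value)
        # ${IN}, ${DBR} and ${SN} are left untouched -- they are expanded per job by JEDI.
        return json.loads(resolved)  # assumes the stored template is a complete JSON document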

2. TRF

The following nomenclature is used:
  • "TRANSUSES" - defines the base release of ATLAS software to be used by the transform
  • "TRANSHOME" - the cache release, which effectively overlays the base release
  • "TRANSPATH" - simply the path (pretty much the filename) of the transformation script
Action item in Nov. 2013: the JEDI-alpha "template" has these values hardcoded (similar to the dataset case), so this needs to be changed. They are in fact proper attributes in the DEFT_TASK table, and JEDI can easily obtain this information from there instead of consuming a prefab string.
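
A hedged sketch of what such a lookup could look like on the JEDI side (DB-API usage and error handling are illustrative assumptions, not the actual JEDI code); the column names follow the attributes listed above:

    def get_transform_info(cursor, task_id):
        # Illustrative only: read the transform attributes directly from DEFT_TASK
        # instead of parsing them out of a prefabricated parameter string.
        cursor.execute(
            "SELECT TRANSUSES, TRANSHOME, TRANSPATH FROM DEFT_TASK WHERE TASK_ID = :tid",
            {"tid": task_id},
        )
        row = cursor.fetchone()
        if row is None:
            return None
        transuses, transhome, transpath = row
        return {
            "transUses": transuses,   # base release of the ATLAS software
            "transHome": transhome,   # cache release overlaying the base release
            "transPath": transpath,   # path (essentially the filename) of the transform script
        }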


3. Architecture, corecount and other attributes

I observed that a few other parameters are parsed from JSON (in the mid-November version of JEDI) and inserted as proper columns into the JEDI_TASKS table. It obviously makes sense to augment the DEFT schemas accordingly, for consistency, to save a little JSON, and to enable searches (e.g. on architecture).

Other examples: VO, Working Group, cloud.
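
As an illustration of the kind of schema augmentation meant here, a sketch of possible additions to the Django ORM model follows (field names, sizes and the table name are assumptions, not the final schema):

    from django.db import models

    class DeftTask(models.Model):
        # Identification and other existing fields omitted; the columns below mirror
        # the attributes discussed above so that searches (e.g. on architecture) become possible.
        architecture = models.CharField(max_length=256, null=True)
        corecount = models.IntegerField(null=True)
        vo = models.CharField(max_length=16, null=True)
        working_group = models.CharField(max_length=32, null=True)
        cloud = models.CharField(max_length=16, null=True)

        class Meta:
            db_table = 'DEFT_TASK'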

4. Summary of attributes to be read by JEDI from the DEFT tables

For backward compatibility, I propose the following:
  • JEDI attempts to locate the usual attributes (corecount, architecture etc) in the DEFT table, for each task
  • If such an attribute is not found, JEDI takes the value from the parsed JSON data
This way the "alpha/converter" functionality will still work, while a proper DEFT schema becomes possible.
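
A minimal sketch of the proposed fallback (names are illustrative, not actual JEDI code):

    def get_task_attribute(name, deft_row, json_params, default=None):
        # Prefer the value from the DEFT table; fall back to the parsed JSON blob,
        # so the "alpha/converter" path keeps working.
        value = deft_row.get(name)
        if value is not None:
            return value
        return json_params.get(name, default)

    # Example: corecount is taken from DEFT if set there, otherwise from the converter JSON.
    # corecount = get_task_attribute("corecount", deft_row, json_params, default=1)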

In summary, the following parameters have been refactored from JSON into RDBMS:
  • dataset, along with its format and "flavor"
  • TRANS*
  • Architecture
  • Corecount
  • VO
  • Working Group
  • Cloud
Run number also needs to be added for consistency.

5. More on Datasets

  • name
  • offset

Task ID will be read by JEDI.


Sunday, November 3, 2013

DEFT Development in November 2013

Development after 11/8:
  • Corrected the dataset object schema (will need further tweaking)
  • Added the dataset view
  • Improved the "developer's editor", added controls to delete datasets from the DB 
  • Corrected deft-cli to catch up with the schema changes
  • Worked out an improved version of the task template (parametrization of the dataset and similar info)

Summary 11/8/2013:
  • First integration test worked, i.e. JEDI picked up the test task from DEFT, and put it into its own queue
DEFT UI:
  • Added a stub for XML input to the template library page in the UI. The idea is to reuse the DEFT-CLI functionality to inject templates from XML source at will.
  • Added PRODSYS_COMM interface to the UI
  • Cleaned up a few pages (removed redundant columns etc).
  • Testing form-based editor (works for all important fields)
  • Ensured non-editable fields
DEFT CORE:
  • Corrected the PRODSYS_COMM schema which was simple but obsolete
    • new input from Tadashi
    • need to add the "recipient" column for cleaner logic
  • Started using "real" job templates in task templates
  • Added PRODSYS_COMM interface to CLI
  • TWiki updated
  • Small bug fixes 
  • Added the new DEFT_JOB_TEMPLATE table to handle job templates. TBD with Sasha, Dmitry and others.

Tuesday, October 29, 2013

DEFT Development in October 2013

A functioning prototype of the DEFT UI has been created and is run continuously under Apache at CERN. The most recent version is available on a development port, which is distinct from the one that is world-visible. Screenshots were presented at the ATLAS S&C week at CERN (Oct. 21-25, 2013).

As of October, the functionality includes views of Meta-Tasks, Templates and Tasks. It is possible to edit any attribute of most objects in the system, and more importantly to clone a Meta-Task from a Template.

Report for the last week of October:

a) Refactored pagination into a separate code unit for brevity
b) More compact display of timestamps
c) Added JIRA links, at this point as "browse" links; will look into the REST API later - this is based on discussions with Nurcan and others
d) Based on user comments, removed cloud and site from the Meta-Task request as less important (and maybe superfluous) bits of information
e) In addition to the "developer's editor", created a form for editing tasks; will finalize soon



Friday, July 5, 2013

Converter for JEDI-alpha: AKTR->JEDI

https://indico.cern.ch/getFile.py/access?resId=0&materialId=slides&contribId=28&sessionId=6&subContId=2&confId=210657
  • Implemented a non-cron version of ProdSys
    • Called as a program, on demand
    • No writing into ProdSys tables
    • Multiple instances support
  • Caching support in Google Drive API
  • Support for non-direct URLs
  • Filling TASK_REQUEST
  • TASK_REQUEST.COMMENT_ includes a link to the file for editing (in Google Docs)
  • Special flag TASK_REQUEST.GRID = "jedi@cern" ("ProdSys 2 processing")
...
 

Friday, June 21, 2013

June-July 2013: ProdSys II progress report (Maxim)

June 2013: S&C week at CERN:
  • presented status and plans for DEFT
  • staged approach to commissioning in late 2013
  • Web UI can be developed in parallel, facilitates team effort
Meetings at CERN:
  • Nurcan and Johannes: requirements for the Analysis application of DEFT
  • No fundamental difference between production and analysis
  • More details in previous entries in this blog
  • Baranov, Sargsyan, Potekhin, Klimentov, Stradling: machine provisioning, software installation and general setup of the Web service for the DEFT UI
Development:
  • Bug fixes and additional functionality in deft-core (both in CLI and Oracle interface), meta-task storage
  • DB Insertion performance testing
  • Initial test drive of Django 1.4 and some of the new semantics introduced since 0.96, which was used in previous apps
  • Import of DEFT schemas into Django ORM and validation of database access from the data model (with Oracle RDBMS); created HTML templates for initial dev effort
  • Integration of task numbering in JEDI: tested with ATLAS_PANDA.DEFT_TASK_SEQ under ATLAS_PANDA, and also PRODSYS2_TASK_ID_SEQ (the DEFT/JEDI seq.)
  • Changed schema to better handle time stamps
  • Added attributes to improve the schema
  • DEFT Web UI: added JSON serialization to the app
  • Tested additional Python modules installed on voatlas270 in order to enable running deft-core on that machine
Documentation:
  • Created and maintained a TWiki page for the DEFT Web UI.
  • Documented the DEFT meetings
  • Coordination with A.Petrosyan: since JEDI monitoring is not in the coding stage yet, it makes sense to integrate the effort and benefit from easier cross-referencing of the data.

Monday, June 17, 2013

Event server twiki page

There is now an ATLAS twiki page for the event server scheme described in the earlier post by Tadashi
https://twiki.cern.ch/twiki/bin/viewauth/Atlas/EventServer
  - Torre

Friday, June 14, 2013

Kick-off meeting for managing the Analysis applications in DEFT, and the Web interface for that process

On June 13th, 2013 there was a meeting at CERN during which we discussed the initial requirements and parameters of the project aimed at supporting the analysis workflows in Prodsys2.

Present: J.Elmsheuser, N.Ozturk, M.Potekhin, A.Stradling

The scope of the items presented and discussed was as follows:
  • description of the updated analysis model which involves "slimming" and "skimming"
  • requirement for the user interface that would be optimal to support this particular mode of processing
  • discussion of whether the database schemas being developed in Prodsys2 in the context of managed production can be extended and reused to cover the use cases presented
  • itemization of purely technical issues that already are on our plate and which will need to be resolved very soon
  • exploration of security, auth/auth and access policies
  • characterization of the data elements present in the analysis stream as being similar to what's used in production, i.e. essentially relying on same dataset infrastructure and nomenclature
  • evaluation of Django as a candidate platform for the Web service, based on the experience of the project participants
  • usefulness of the XML format adopted for Meta-Task description in DEFT
  • the urgency of setting up a dedicated machine, to cover the needs of the project at CERN
  • general timeline of the project
Consensus was reached and plans were made accordingly, in particular:
  • the timeline of this project closely matches what was planned for "vanilla" DEFT commissioning
  • the starting point for development, in terms of the platform, will be Django
  • authentication and access policies will be implemented by mapping identities obtained from CERN SSO and the encrypted DN from X509 certificate, to the user table
  • the UI will provide ready capabilities for using templates and reusing typical tasks, offering easy-to-use templates and settings based on working group attribution (context-sensitive automation)
Separately, it was decided that the software development team will consist of the following personnel:
  • M.Potekhin (project lead)
  • A.Vaniachine (conceptual design, commissioning and QA)
  • D.Golubkov (Web service design and coding)
  • A.Stradling (technical design + module development)
  • L.Sargsyan (Web service design and coding)
  • S.Baranov (System Administrator, tech support and issue tracking, commissioning and QA)
The immediate deliverables were agreed upon:
  • a Python module encapsulating the dataset naming logic according to the official nomenclature. To be done by A.Stradling, ETA end of June
  • setting up a Web server with all components to support Django and a proper Apache configuration. To be done by S.Baranov and M.Potekhin, ETA June 22nd
  • Having a service running to enable port scans etc, M.Potekhin, ETA end of June
  • Prototype of a dataset registration service (functionality still not factored out of AKTR), M.Potekhin, ETA mid-July
  • Updating DEFT schemas to support the Analysis workflow, D.Golubkov, ETA mid-July
According to the personnel breakdown presented above, the bulk of the Web development will be done by D.Golubkov and L.Sargsyan.

June 18th update:
Laura and Dmitry started work on the initial task display module for the UI
July 3rd update:
Django prototype ready (with simplified schemas) for Tasks and Meta-Tasks
Dev server running, Apache TBD


Tuesday, April 9, 2013

Plans for JEDI-α

Note posted in June 2013: since this post was first written and updated, the activity it describes has led to a successful commissioning of JEDI-α. Work is still under way to tune the database schemas and to complete functional testing.


This is a schematic description of the flow of logic in JEDI-α and how it compares to vanilla ProdSys I:

Sequence:
 
Vanilla  [1] "Task Definition" -> AKTR tables -> [2] ProdSys tables -> [3] ProdSys1 sequence
α        [1] "Task Definition" -> AKTR tables -> [2'] DEfT tables   -> [3'] JEDI sequence

Description:
  • [1] In the Task Definition object a flag will be added, to differentiate ProdSys1/ProdSys2. If the flag is set to "ProdSys2" then DEfT tables will be filled by the system
  • [2'] Tables (as defined in https://twiki.cern.ch/twiki/bin/viewauth/Atlas/WorkFlow) will be filled by Dmitry
  • [3'] JEDI-alpha will take info from the above DEfT tables
Upon implementation, we will be able to debug ProdSys II w/o disturbing the current system and even run production with both systems if needed.

Please comment.

Question: who will register the output dataset(s) in [2']?
In principle it will be transparent to the current dataset/container registration.

ProdSys2 tasks can probably have a special state, in which case they will be easily seen in monitoring.

Comment from MP: this could be a "type" or "origin" column in the task. Different "state" can make things more complex.

Tuesday, April 2, 2013

April-May 2013: ProdSys II progress report (Maxim)

  • Documentation work:
    • Cleanup and updates of documentation (ADC page, blog, TWiki)
    • Created most of the ProdSys TDR as well as subsequent updates. Presented at the Technical Interchange Meeting.
    • Finalized the TDR based on additional materials sent in (Alexei, Kaushik, Torre)
    • Major edits to PanDA LDRD
  • Design and Development:
    • "Communication" table created to serve as the command passing medium between DEFT and JEDI
    • "Task parameter" object added as a CLOB
    • Core DEFT coding:
      • instrumented deft-cli to traverse the meta-task graph and process it repeatedly, in order to provide a ready and reproducible way to benchmark and measure performance.
      • add the "task parameters" capability, a necessary feature. Translates into a CLOB in Oracle. JSON format chosen. Documentation updated.
  • JEDI-alpha:
    • Communicated with Tadashi and Dmitry to ensure consistency in schemas and APIs while working on DEFT and JEDI-alpha
    • Improved documentation for DEFT/JEDI Interaction
  • "DEFT-alpha" (codename):
    • as the next iteration of the integration test, and based on the successful test of JEDI-alpha, implementing data entry in DEFT database to be injected into JEDI for "standard" processing
  • Testing:
    • In a first performance test, a simple 3-stage meta-task traversal (without DDM or any other external system interaction) yielded these numbers: 9 seconds of elapsed time on an interactive node to process 1000 meta-tasks. Conclusion: the performance of the core DEFT logic won't be the limiting factor in determining the scalability of the system, as the likely latencies due to graph traversal appear to be acceptable. Keep in mind that the graph traversal processes can easily be farmed out to provide sufficient scaling.
  • Common Analysis Platform:
    • Participated in the CAP Meeting on 5/2/2013

Thursday, March 14, 2013

March 2013. Updated List of requirements (ProdSys SW development).

This is an updated list; the previous list can be found at the link below:

http://prodsys.blogspot.ch/2012/10/prodsys-splinter-meeting-october-2012.html


  1. AP transient datasets deletion
       Alexei before May 1st
         Mar 30: a version for testing is ready. Reported to MC Coordination.
  2. 'clone' tasks use-cases
       Valeri, Dmitry (Wolfgang for validation and testing)
  3. fair share and priority policy
        Kaushik, Tadashi (Rod, Kaushik for testing)
  4. New G4 TRF integration (March-August)
       Sasha, Dmitry (Wolfgang, Jose for validation and testing)
   4.1 StoppedParticleG4_tf.py ☑ done March 27, 2013
   4.2 ISF Simulation: Sim_tf.py ☑ done April 11, 2013
   4.3 Simulation:
    4.3.1 AtlasG4_tf.py ☑ done April 11, 2013
    4.3.2 HITSMerge_tf.py ☑ done
    4.3.3 FilterHit_tf.py ☑ done
   4.4 Reconstruction:
    4.4.1 Reco_tf.py ☑ done
    4.4.2 Digi_tf.py ☑ done
    4.4.3 AODMerge_tf.py ☑ done
    4.4.4 ESDMerge_tf.py ☑ done
   4.5 Overlay:
    4.5.1. RAWOverlayFilter_trf.py ☑ done
    4.5.2. BSOverlayFilter_trf.py
  5. Log files archiving (TBD)
      Simone for technical specs
  6. For ProdSys II: if a task has more than one output dataset, the destination should be configurable per dataset. For example, AOD datasets should go to DATADISK (default), RDO/ESD datasets should be replicated to group space, and log files should go to DATADISK (default).
    MORE INFO IS NEEDED. PLEASE DO NOT POST RANDOM REQUIREMENTS W/O DISCUSSING IT FIRST. 
  7. Implementation of the FTK emulation in the production system (March)
      High Priority for Trigger TDRs in April and September 2013.
      MC samples with FTK simulation are needed.
   7.1 Skim silicon data (1 RDO event -> 1 FTK input event) ☑ done
   7.2 FTK emulation for each FTK input event split into 64 tower regions ☑ done
     a) Emulate FTK response with TrigFTKSim_tf.py for each region split into four subregions
     b) Merge every four subregions into one tower region with TrigFTKMerge_tf.py
   7.3 Merge 64 regions into 1 FTK event with TrigFTKMerge_tf.py  ☑ done
   7.4 Combine same RDO and FTK events for reconstruction  ☑ done
  8. Integrate new FTK transformations (April)
      Sasha, Dmitry (Wolfgang for validation and testing)
   8.1 TrigFTKSM4_tf.py ☑ done
   8.2 TrigFTKMergeReco_tf.py ☑ done
  9. Provide RW needed for dynamic task brokerage (TBD)
      Sasha
 10. Support for event counting in MC and GP (TBD)
      Sasha

Friday, March 8, 2013

DEFT/JEDI Communication Redux

General Notes on Communication

There has been progress in the design of the JEDI database schemas, documented in a separate section of the JEDI Twiki. Among a few other details, there is a "COMMAND" column in the task table. This is a reminder that the DB acts as the point of interaction and effectively an asynchronous messaging medium for DEFT and JEDI. Both are allowed to post requests to each other. Human operators are also able to post requests to either of these systems, under certain conditions.

We reiterate what was previously stated with regards to DEFT/JEDI interaction:
  • both components periodically do a database sweep, i.e. the operation is 100% asynchronous and there are no in-memory processes that are bound to a specific Panda Task or job
  • the database is the medium for DEFT/JEDI communication
  • JEDI never creates or deletes tasks or otherwise modifies the meta-task topology. It can post a request to DEFT to get this done. Thus, the important functionality remains within one unit of code and is manageable. This sounds simple but it's fundamental for the system's viability.
  • examples of the previous item in action involve live task augmentation, and the related subject of handling the missing-file issue. In the latter case, an additional task is added to the Meta-Task to make up for the lost statistics. Such a request is formulated by JEDI, picked up by DEFT and translated into a task, which is in turn picked up by JEDI.
After some consideration, we arrived at a solution which uses a separate table to store the semaphores/commands.
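
As a hedged illustration of the sweep-based exchange described above (the table and column names below are assumptions loosely based on the PRODSYS_COMM naming used elsewhere in this blog, not the final schema), a component such as DEFT might pick up commands addressed to it like this:

    def handle_command(owner, command, task_id):
        # Placeholder for the component-specific processing of a request,
        # e.g. adding a task to a Meta-Task to recover lost statistics.
        print("request from %s: %s (task %s)" % (owner, command, task_id))

    def sweep_commands(cursor, recipient="DEFT"):
        # Periodic database sweep: everything is asynchronous, nothing is bound
        # to a specific task or job in memory.
        cursor.execute(
            "SELECT COMM_ID, OWNER, COMMAND, TASK_ID FROM PRODSYS_COMM WHERE RECIPIENT = :r",
            {"r": recipient},
        )
        for comm_id, owner, command, task_id in cursor.fetchall():
            handle_command(owner, command, task_id)
            cursor.execute("DELETE FROM PRODSYS_COMM WHERE COMM_ID = :c", {"c": comm_id})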

Task Parameters

Task parameters are a necessary attribute of the task object. They are essentially schema-free tuples of strings (they could have a different implementation, but that's the reality). We will use a CLOB to store these in the Oracle DB, probably in a separate table so as not to impact performance.

In the context of JEDI-alpha, a good choice (agreed by most) is using JSON as the format for storage and inter-process communication.
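
A minimal sketch of this storage approach (the table and column names are hypothetical, for illustration only):

    import json

    def store_task_params(cursor, task_id, params):
        # Serialize the schema-free task parameters to JSON and store them as a CLOB.
        cursor.execute(
            "INSERT INTO DEFT_TASK_PARAM (TASK_ID, TASK_PARAMETERS) VALUES (:tid, :p)",
            {"tid": task_id, "p": json.dumps(params)},
        )

    def load_task_params(cursor, task_id):
        cursor.execute(
            "SELECT TASK_PARAMETERS FROM DEFT_TASK_PARAM WHERE TASK_ID = :tid",
            {"tid": task_id},
        )
        row = cursor.fetchone()
        if row is None:
            return None
        clob = row[0]
        # cx_Oracle returns a LOB object for CLOB columns; read it before decoding.
        return json.loads(clob.read() if hasattr(clob, "read") else clob)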


Tuesday, February 26, 2013

Curious case of related datasets

Going through the use cases for DEFT/JEDI, I have noticed that there is a feature implicit in how the current workflow works, and not well understood by many people (including myself). It's the mechanism that "bundles together" the datasets that are related in the processing logic, i.e. a PanDA task may produce datasets B,C based on an input dataset A, and datasets D,E based on another dataset F. Since there is no concept of Meta-Task in ProdSys I/PanDA, the inter-task chains of dependencies among the datasets are maintained semi-manually. The situation is represented by the simplified diagram below:

In this diagram, different colors represent different logical connections between datasets; for example, DS4 is produced by task T3 to result in the creation of dataset DS7 (also colored red). The same applies to the "blue" datasets in the same diagram. For completeness' sake, we also have dataset DS5 there, which is thought to be the result of processing both "blue" and "red" input datasets.

At first glance, this presents a complication for the graph model employed in ProdSys II, since it introduces dependencies involving datasets, which are already modeled as edges in the Meta-Task graph. One approach would be to create a conjugate graph where the edges (datasets) become nodes and their connections (tasks) become edges (the conjugate graph is also known as a line graph in graph theory). Then there is an option to store such a graph in a separate GraphML file, or in the same file, which will result in two disconnected graphs being created upon parsing.
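
For reference, a toy sketch of the conjugate-graph option with NetworkX (the task and dataset labels are illustrative, not taken from a real Meta-Task):

    import networkx as nx

    # Toy Meta-Task graph: tasks as nodes, datasets as edge attributes.
    meta_task = nx.DiGraph()
    meta_task.add_edge("T1", "T3", dataset="DS4")
    meta_task.add_edge("T2", "T3", dataset="DS2")
    meta_task.add_edge("T3", "T4", dataset="DS7")

    # The conjugate (line) graph: every edge of the original graph becomes a node,
    # i.e. datasets become nodes and the tasks connecting them become edges.
    conjugate = nx.line_graph(meta_task)

    # Either graph description can be kept in GraphML for storage or visualization.
    nx.write_graphml(meta_task, "meta_task.graphml")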

Either solution creates its own set of problems, and both require crafty logic to preserve referential integrity when manipulating Meta-Task graphs, in operations like template processing etc.

Alternatively, one may try to implement an equivalent of a "port" in the task node. Unfortunately, this feature is not supported in the parser which comes with NetworkX. While it's always possible to roll our own, this would likely negate one of the advantages of using this package.

A possible solution to this dilemma is to go back to the basic definition of the term "task", which originally was thought of as a unit of processing with one dataset coming in, and one dataset being output. If we lift the restriction on the number of datasets in that definition but introduce "conservation of color", i.e. define the "task" as a processing unit which only operates on related "sets of datasets", we end up with essentially a partitioned but connected graph.

This actually amounts to a relatively straightforward reformulation of the graph presented above, with the creation of additional nodes. For example, the picture above is transformed into the following:

The new feature in this graph is that different tasks can have dependencies on the same group of datasets, such as DS2 and DS3. What allows this to work in practice is that while the edges representing these different parts of the graph are distinct in the graph description, they refer to the same datasets, whose state is managed by JEDI.

Note that the "prime" tasks in this diagram share all or most of their attributes with the original ones, and only differ by the input and output, however this is taken care of in the graph itself.



Monday, February 11, 2013

February-March 2013: ProdSys II progress report (Maxim)


Documentation work:
  • This blog: created tags "prodsys1" and "prodsys2" for better search capability.
  • Created a common navigation header (bar) that can be included in all ProdSys TWiki pages.
  • References to DEFT and further details added to documentation on the ProdSys pages.
  • Corrections in DEFT/JEDI interface description as per Tadashi's comments.
  • Prepared Abstract for the ProdSys paper (CHEP). Abstract approved by ATLAS and submitted.
  • Presentation for the CMS/ATLAS Common Analysis Platform on 2/28/2013:
    • Based on the announcement on 2/14 of a CMS development largely parallel to what we do in ATLAS
    • Potential redundancy, under-utilization of PanDA capability, suboptimal database load
    • Clear potential for common development and platform
  • Presentation for the ATLAS Software and Computing Workshop, March 11-15 2013
  • Meeting with Wolfgang to discuss progress and requirements
Development:
  • DEFT prototype: functionality complete
  • SVN project created, code checked in
    • Continuous updates and checkpoints
    • Naming of the SVN tree as per Tadashi's comments
  • Tested database schemas for the Task, Dataset and Meta-Task objects.
  • Extensive refactoring and rewrite of the main code unit due to lots of new functionality and increased complexity; the application has become a simple CLI driver for the underlying classes.
  • Dedicated test of the code state-switching functionality
  • Improvements in logging functionality; a Logger class was created based on the standard Python logging package
  • Started work on the Dependency Model for datasets

Monday, January 28, 2013

Deft v1.0 - the prototype - is in SVN now

The prototype of the Deft system is now in a stage where it has a minimal but complete set of functionality as a standalone system. Integration with Jedi/PanDA will be achieved in a future version. The v1.0 designation should not be taken too seriously, since its only significance is that we have a working baseline prototype, not a pre-production version.

The cornerstone of the Deft design is its reliance on the graph model to handle Meta-Tasks and their components. This approach is prevalent in computer science and industry, and there is no good reason not to follow it. In addition, an effort was made to use existing and proven software components in order to minimize the amount of application-specific code and significantly improve maintainability. Specifically, we use NetworkX as the graph engine, and the Workflow component of PyUtilib as the state machine.

Right now, Deft  has the following capabilities:
  • Parsing Meta-Task definitions supplied by the user in the industry-standard GraphML format. The latter is a portable XML schema developed for graph description and modeling, and supported in a number of current applications which can be used to visualize, explore and edit the meta-task structure.
  • Building Meta-Tasks based on templates, and likewise cloning Meta-Tasks from previously existing GraphML descriptions. This was one of the crucial requirements put forth by Wolfgang.
  • Analysis of the Meta-Task and state transitions of its components, i.e. individual tasks.
  • Tasks can be set to an "armed" or "disarmed" state, which either enables or disables automated transition to the next state during processing by Deft.
  • Integration with Oracle: implementation of the adjacency table technique to store graph information in the RDBMS.
  • Interoperability between GraphML (XML) and RDBMS storage, whereby the data can be sourced from, and written to, either (a sketch follows at the end of this post).
The DEFT suite of Python modules and supporting materials is now in CERN SVN.
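
As a hedged illustration of the last two capabilities above (not the actual DEFT code; the adjacency table layout is an assumption), a Meta-Task read from GraphML can be flattened into adjacency rows for the RDBMS and rebuilt from them later:

    import networkx as nx

    def meta_task_to_adjacency_rows(graphml_path, meta_task_id):
        # Read a Meta-Task graph from GraphML and emit (meta_task_id, parent, child)
        # rows suitable for a hypothetical adjacency table in Oracle.
        graph = nx.read_graphml(graphml_path)
        return [(meta_task_id, parent, child) for parent, child in graph.edges()]

    def adjacency_rows_to_meta_task(rows):
        # Inverse direction: rebuild the graph from previously stored adjacency rows.
        graph = nx.DiGraph()
        graph.add_edges_from((parent, child) for _meta_task_id, parent, child in rows)
        return graph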