- Combined list requests (Dmitry, Sasha), October 2012 ☑ done
- SSO for list request (Dmitry), October 2012 ☑ done
- Automatic task splitting (Dmitry), End October - November 2012
- Task cloning (Alexei, Valeri, Dmitry), October 2012
  - I/F part is ready (Nov 5, 2012): http://pandamon.cern.ch/tasks/clonetask
  - Task Request I/F parameter checking is in progress
- Running from nightlies (Andrej, Rod), October 2012
- Tag definition I/F. New implementation (Sasha, Dmitry), December 2012
- TR features for Group Production. Hiding unnecessary fields (Nurcan). Not assigned. More info is needed to implement it
- Pile up tasks start up before simulation is done. (Sasha), October 2012 ☑ done
- 1% issue. Implementation is postponed
- Scouts info usage for simulation tasks (Andrej, Rod, Wolfgang), Oct-Nov 2012 ☑ done
- FTK, file naming convention for merging step (Sasha, Graeme), October 2012 ☑ done
- CPU consumption information taken from TRF (Sasha, Graeme, 'Wuppertal group'), November 2012
- Meta-Language for Task Requests ☑ done (GraphML schema chosen, Maxim)
- CAPTCHA in TR I/F (Dmitry), October-November 2012 ☑ done
  - It will be implemented as SSO, so the CAPTCHA option won't be needed anymore
- Requestor I/F (RIF) (Wolfgang, Maxim, Valeri)
  - RIF specs (Wolfgang, Maxim)
  - Twiki from Maxim: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/ProdSys
  - Mid-Nov: Wolfgang will prepare an initial list of requirements
- Task Request CLI
  - Postponed until ProdSys II
- Documentation, Savannah, Twiki (Maxim, Dmitry)
- AGIS/PanDA integration (Ale, Alden, AlexeyA)
  - "Alden part", December 2012
  - End-to-end test, Jan 2013
  - Production version, Feb 2013
- Task search options (Alexei), October 2012 ☑ done
- Monitoring
  - Long-running jobs/tasks
  - Task progress based on task's submission info
  - Failed jobs monitoring (by error type)
  - 'Stuck' tasks
- Integration of existing group production monitoring tools with PanDA Classical and Dashboard monitoring (Nurcan, Jarka, Laura, Valeri)
- PanDA classical pages response time (Valeri)
Monday, October 29, 2012
ProdSys splinter meeting (October 2012) action items
Wednesday, October 24, 2012
Notes on Workflow Management
Workflow management (WM) is a rich topic in both the theoretical and the practical sense. A large number of concepts and software products support WM, and they tend to specialize by domain, such as business workflow versus scientific workflow. Some features are shared across these boundaries, e.g. the widely adopted use of XML to describe or capture the state of the system. However, business-oriented workflow management systems do not appear well suited for computational workflow management.
We note YAWL and BPEL as commercially oriented systems, while Kepler was designed to drive scientific workflow applications.
Paper on workflow scheduling on the Grid.
There is a useful Paper on Workflow Patterns. The illustrations below represent two of the workflow patterns of interest in our application, among others.
Tuesday, October 23, 2012
Introduction to ATLAS PanDA Production System
October 23, 2012
The Role of ProdSys
The natural unit of workload that is handled by PanDA is a single payload job. Defining the exact nature of the payload, source and destination of data and various other parameters that characterize a job is outside of the scope of core PanDA itself.
The ATLAS Production System plays an extremely important role: it defines jobs for a large part of the workload handled by PanDA. Jobs are defined in large sets that constitute "tasks", which are formulated to fulfill "task requests". Each task has a number of attributes, set in accordance with a particular request, and is typically translated into a large number of jobs. The existing Production System consists of a task request interface, a set of scripts that translate tasks into their respective jobs, and a few tools for modifying certain parameters of active tasks.
Individual job definitions in the existing system are created based on the task parameters and remain static for the duration of the task execution. Data pertaining to requests, tasks and jobs reside in the database, and operation of the Production System can be described as transforming one object into another, starting with requirements, formulating tasks and then creating a list of jobs for each task, for execution in PanDA.
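The chain of transformations described above (request, then task, then a static list of jobs) can be pictured with a minimal Python sketch. All class names, field names and the fixed events-per-job splitting rule are hypothetical illustrations; they are not the actual ProdSys schema or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRequest:
    # Hypothetical request attributes, for illustration only.
    request_id: int
    transformation: str
    total_events: int


@dataclass
class Task:
    task_id: int
    request_id: int
    transformation: str
    total_events: int
    events_per_job: int


@dataclass
class Job:
    task_id: int
    first_event: int
    n_events: int


def request_to_task(req: TaskRequest, task_id: int, events_per_job: int) -> Task:
    """Formulate a task from a request by copying its attributes."""
    return Task(task_id, req.request_id, req.transformation,
                req.total_events, events_per_job)


def task_to_jobs(task: Task) -> List[Job]:
    """Create the full, static job list for a task at definition time."""
    jobs = []
    for first in range(0, task.total_events, task.events_per_job):
        n = min(task.events_per_job, task.total_events - first)
        jobs.append(Job(task.task_id, first, n))
    return jobs


req = TaskRequest(request_id=1, transformation="simul", total_events=2500)
task = request_to_task(req, task_id=10, events_per_job=1000)
jobs = task_to_jobs(task)
print([(j.first_event, j.n_events) for j in jobs])
# prints [(0, 1000), (1000, 1000), (2000, 500)]
```

In this sketch the job list is fixed as soon as `task_to_jobs` runs, which mirrors the static behaviour described above.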
Motivations for system evolution
The motivation for evolving the ATLAS Production System comes from the realization that we need to address the following:
- The concept of Meta-Task. Absent in the original product (ProdSys I), it emerged based on operational experience with PanDA and its workflow. It became the central object in the workflow management and must be properly introduced into the system.
- Operator intervention and Meta-Task recovery: operators and managers must have adequate opportunities to direct Meta-Task processing, to start certain steps before others are defined, to augment a task, and to recover from failures in an optimal way.
- Flexibility of job definition (e.g. making it dynamic as opposed to static once the task is created): there are a number of advantages that we hope can be realized once there is a capability to define jobs dynamically, based on the resources and other conditions present once the task moves into the execution stage.
- Maintainability: the code of the existing Production System was written "organically", to actively support emerging requests from users, and is starting to show its age.
- Scalability: there are issues with the way the ProdSys software interacts with the database back-end, which lead to a lockup condition in the database when a transaction is handled. There is also a general issue of insufficient throughput when inserting tasks and other data into the system.
- Ease of use: there is currently a great amount of detail that the end user (Physics Coordination) must define in order to achieve a valid task request. It's desirable to automate the task creation process, whereby cumbersome logic is handled within the application, and the user interface is more concise and transparent.
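To illustrate the dynamic job definition mentioned in the list above, here is a minimal Python sketch in which each job is sized only when a resource becomes available, rather than at task creation. The names `TaskState` and `define_next_job`, and the idea of expressing a resource as an event capacity, are assumptions made for illustration and do not describe an existing ProdSys interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaskState:
    # Hypothetical bookkeeping for a task that is already executing.
    total_events: int
    next_event: int = 0


def define_next_job(state: TaskState,
                    events_capacity: int) -> Optional[Tuple[int, int]]:
    """Define one job on demand, sized to the capacity of the free resource.

    Returns (first_event, n_events), or None when nothing is left to define.
    """
    remaining = state.total_events - state.next_event
    if remaining <= 0 or events_capacity <= 0:
        return None
    n = min(remaining, events_capacity)
    job = (state.next_event, n)
    state.next_event += n
    return job


state = TaskState(total_events=2500)
print(define_next_job(state, 1000))  # prints (0, 1000)
print(define_next_job(state, 2000))  # prints (1000, 1500)
print(define_next_job(state, 1000))  # prints None
```

The design choice shown here is that the task keeps only a cursor into its event range, so job sizes can differ from one resource to the next.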