Published on October 17, 2007
Workflows and Schedulingin Grids : Workflows and Scheduling in Grids Ramin Yahyapour University Dortmund Leader CoreGRID Institute on Resource Management and Scheduling CoreGRID – Summer School Budapest, 05 September 2007 CoreGRID RMS Institute Objective: CoreGRID RMS Institute Objective Objectives: Development of a common and generic solution for Grid resource management/scheduling in Next Generation Grids. Development of new algorithms for coordinated scheduling for all resource types, including data, network etc. Support of Grid business models in the scheduling process Goal: linking theoretical foundation and practical implementation on the different level of Resource Management Current Institute Roadmap: Inst. RMS Current Institute Roadmap Participants: Participants CETIC, Belgium IPP-BAS, Bulgaria CNR-ISTI, Italy CNRS, France Delft University, Netherlands EPFL, Switzerland Fraunhofer Gesellschaft, Germany Research Center Jülich, Germany PSNC, Poland MTA SZTAKI, Hungary University of Münster, Germany University of Calabria, Italy University of Cyprus University of Dortmund, Germany University of Manchester, UK EAI-FR, Switzerland University of Westminster, UK Technical University of Catalonia, Spain Zuse Institute Berlin, Germany University of Innsbruck, Austria 20 participating institutes; 89 researchers Grid Scheduling: Grid Scheduling Key Question: Key Question “Which services/resources to use for an activity, when, where, how?” Typically: A particular user, or business application, or component application needs for an activity one or several services/resources under given constraints Trust & Security Timing & Economics Functionality & Service level Application-specifics & Inter-dependencies Scheduling and Access Policies This question has to be answered in an automatic, efficient, and reliable way. Part of the invisible and smart infrastructure! Motivation: Motivation Resource Management for Future/Next Generation Grids! But what are Future Generation Grids? depends on who you ask! Resource Definition: Resource Definition Concluding from the different interpretations of “Grid”: for broad acceptance Grid RMS should probably cover the whole scope; Resources: Compute Network Storage Data Software components, licenses Services functionality, ability Resource Management Layer: Resource Management Layer Grid Resource Management System consists of : Local resource management system (Resource Layer) Basic resource management unit Provide a standard interface for using remote resources e.g. GRAM, etc. Global resource management system (Collective Layer) Coordinate all Local resource management system within multiple or distributed Virtual Organizations (VOs) Provide high-level functionalities to efficiently use all of resources Job Submission Resource Discovery and Selection Scheduling Co-allocation Job Monitoring, etc. e.g. Meta-scheduler, Resource Broker, etc. Grid RMS: Resource Broker Grid Middleware Higher-Level Services User/ Application Grid RMS Grid Scheduling: Grid Scheduling Scheduler Schedule time Job-Queue Machine 1 Scheduler Schedule time Job-Queue Machine 2 Scheduler Schedule time Job-Queue Machine 3 Grid-Scheduler Grid User Select a Resource for Execution: Select a Resource for Execution Most systems do not provide advance information about future job execution user information not accurate as mentioned before new jobs arrive that may surpass current queue entries due to higher priority Grid scheduler might consider current queue situation, however this does not give reliable information for future executions: A job may wait long in a short queue while it would have been executed earlier on another system. Available information: Grid information service gives the state of the resources and possibly authorization information Prediction heuristics: estimate job’s wait time for a given resource, based on the current state and the job’s requirements. Co-allocation: Co-allocation It is often requested that several resources are used for a single job. that is, a scheduler has to assure that all resources are available when needed. in parallel (e.g. visualization and processing) with time dependencies (e.g. a workflow) The task is especially difficult if the resources belong to different administrative domains. The actual allocation time must be known for co-allocation or the different local resource management systems must synchronize each other (wait for availability of all resources) Example Multi-Site Job Execution: Example Multi-Site Job Execution A job uses several resources at different sites in parallel. Network communication is an issue. Grid-Scheduler Multi-Side Job Advanced Reservation: Advanced Reservation Co-allocation and other applications require a priori information about the precise resource availability With the concept of advanced reservation, the resource provider guarantees a specified resource allocation. includes a two- or three-phase commit for agreeing on the reservation Implementations: GARA/DUROC/SNAP provide interfaces for Globus to create advanced reservation implementations for network QoS available. setup of a dedicated bandwidth between endpoints “WS-Agreement” defines a protocol for agreement management Using Service Level Agreements: Using Service Level Agreements The mapping of jobs to resources can be abstracted using the concept of Service Level Agreement (SLAs) SLA: Contract negotiated between resource provider, e.g. local scheduler resource consumer, e.g., grid scheduler, application SLAs provide a uniform approach for the client to specify resource and QoS requirements, while hiding from the client details about the resources, such as queue names and current workload Execution Alternatives: Execution Alternatives Time sharing: The local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization. multi-tasking The scheduler may also suspend jobs to keep the system load under control preemption Space sharing: The job uses the requested resources exclusively; no other job is allocated to the same set of CPUs. The job has to be queued until sufficient resources are free. Job Classifications: Job Classifications Batch Jobs vs interactive jobs batch jobs are queued until execution interactive jobs need immediate resource allocation Parallel vs. sequential jobs a job requires several processing nodes in parallel the majority of HPC installations are used to run batch jobs in space-sharing mode! a job is not influenced by other co-allocated jobs the assigned processors, node memory, caches etc. are exclusively available for a single job. overhead for context switches is minimized important aspects for parallel applications Parallel Application Types: Parallel Application Types Rigid Requires a fixed number of processors Moldable The number of processors can be adapted only at the start of the execution Malleable Number of assigned processors can be changed during runtime (i.e., grow/shrink) D. G. Feitelson and L. Rudolph, “Toward convergence in job schedulers for parallel supercomputers,” in JSPP’96 Rigid Moldable Malleable # of Processors # of Processors # of Processors time time time Preemption: Preemption A job is preempted by interrupting its current execution the job might be on hold on a CPU set and later resumed; job still resident on that nodes (consumption of memory) alternatively a checkpoint is written and the job is migrated to another resource where it is restarted later Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority) or if a job is running longer then expected. Job Scheduling: Job Scheduling A job is assigned to resources through a scheduling process responsible for identifying available resources matching job requirements to resources making decision about job ordering and priorities HPC resources are typically subject to high utilization therefore, resources are not immediately available and jobs are queued for future execution time until execution is often quite long (many production systems have an average delay until execution of >1h) jobs may run for a long time (several hours, days or weeks) Typical Scheduling Objectives: Typical Scheduling Objectives Minimizing the Average Weighted Response Time Maximize machine utilization/minimize idle time conflicting objective criteria is usually static for an installation and implicit given by the scheduling algorithm r : submission time of a job t : completion time of a job w : weight/priority of a job Job Steps: Job Steps Scheduler Schedule time lokale Job-Queue HPC Machine Grid- User Job Execution Management Node Job Mgmt Node Job Mgmt Node Job Mgmt Job Description A user job enters a job queue, the scheduler (its strategy) decides on start time and resource allocation of the job. Example of Grid Scheduling Decision Making: Example of Grid Scheduling Decision Making Scheduler Schedule time Job-Queue Machine 1 Scheduler Schedule time Job-Queue Machine 2 Scheduler Schedule time Job-Queue Machine 3 Grid-Scheduler Grid User 15 jobs running 20 jobs queued 5 jobs running 2 jobs queued 40 jobs running 80 jobs queued Where to put the Grid job? Available Information from the Local Schedulers: Available Information from the Local Schedulers Decision making is difficult for the Grid scheduler limited information about local schedulers is available available information may not be reliable Possible information: queue length, running jobs detailed information about the queued jobs execution length, process requirements,… tentative schedule about future job executions These information are often technically not provided by the local scheduler In addition, these information may be subject to privacy concerns! Grid-Level Scheduler: Grid-Level Scheduler Discovers & selects the appropriate resource(s) for a job If selected resources are under the control of several local schedulers, a meta-scheduling action is performed Architecture: Centralized: all lower level schedulers are under the control of a single Grid scheduler not realistic in global Grids Distributed: lower level schedulers are under the control of several grid scheduler components; a local scheduler may receive jobs from several components of the grid scheduler Towards Grid Scheduling: Towards Grid Scheduling Grid Scheduling Methods: Support for individual scheduling objectives and policies Multi-criteria scheduling models Economic scheduling methods to Grids Architectural requirements: Generic job description Negotiation interface between higher- and lower-level scheduler Economic management services Workflow management Integration of data and network management Scheduling Objectives in the Grid: Scheduling Objectives in the Grid In contrast to local computing, there is no general scheduling objective anymore minimizing response time, minimizing cost tradeoff between quality, cost, response-time etc. Cost and different service quality come into play the user will introduce individual objectives the Grid can be seen as a market where resource are concurring alternatives Similarly, the resource provider has individual scheduling policies Workflow Scheduling: Workflow Scheduling Workflows: Workflows What is a workflow? Task1 Task 2 Task 3 Task 4 Example: A simple Job Chain Dependencies between tasks/job steps: Control and/or data dependencies Example of a Workflow : Example of a Workflow A simple workflow from climate research with data dependencies Task1 Task 2 Task 3 Climate Archive Select Interesting Data Visualize Simulate Data Subset New Results Communication/Data Dependencies: Communication/Data Dependencies Workflows can cover different communication models synchronous (e.g. streaming of multiple active jobs) or asynchronous (e.g. via files) Synchronous communication requires co-allocation of jobs and data streaming management Asynchronous communication requires file/data management in distributed Grid environments Impact of Coordinated Scheduling (1): Impact of Coordinated Scheduling (1) Consider an application example with a simple workflow consisting of 4 consecutive tasks/steps each running 4 minutes Task1 Task 2 Task 3 Task 4 Consider also a Grid resource with a batch queuing system (e.g. Torque) that has on average a queue waiting time of 60 minutes. We apply a just-in-time scheduling. How long will it take to execute the whole workflow? Task 1 waits for 1h and runs for 5 minutes Task 2 waits for Task 1 to complete, all other tasks analogous = 4*1h + 4*5min = 4h 20 min Impact of Coordinated Scheduling (2): Impact of Coordinated Scheduling (2) How to improve? or using advance reservations (= Planning) How long will it ideally take to execute the whole workflow? Task 1 waits for 1h and runs for 5 minutes Task 2 starts immediately after Task 1 all other tasks analogous = 1h + 4*5min = 1h 20 min put several step in the queue and keep them on hold if preceeding step is not finished (might produce idle times on resources) More complex workflow (1): More complex workflow (1) Concurrent activities Task1 T 2.3 T 2.2 T 2.1 Task3 T 4.3 T 4.2 T 4.1 Task5 More complex workflow (1): More complex workflow (1) Using loops Task1 Task 2 Task 3 Task 4 Example: DAGMan: Example: DAGMan Directed Acyclic Graph Manager DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you. (e.g., “Don’t run job “B” until job “A” has completed successfully.”) A DAG is defined by a .dag file, listing each of its nodes and their dependencies: # diamond.dag Job A a.sub Job B b.sub Job C c.sub Job D d.sub Parent A Child B C Parent B C Child D Source: Miron Livny Dynamic Workflows vs Static Workflows: Dynamic Workflows vs Static Workflows Some workflows are not known in advance and its structure might be determined during run time = Dynamic Workflows Static workflows are known in advance Major impact for planning and scheduling workflows Promoter Identification Workflow: Promoter Identification Workflow Ecology: GARP Analysis Pipeline for Invasive Species Prediction: Ecology: GARP Analysis Pipeline for Invasive Species Prediction Slide42: http://www.gridlab.org/ Triana Prototype: Triana Prototype GEO 600 Coalescing Binary Search Workflow Taxonomy: Workflow Taxonomy Workflow design And specification Component/Service Discovery Scheduling and Enactment Data Management Operational Attributes Workflow System structure Model/spec composition Source: Omer Rana Workflow Composition: Workflow Composition User Directed Automated Language-based Graph-based Markup Functional Logic DAG UML Petri Net Process Calculi Process Calculi Composition User defined scripting Planner Templates Design Patterns Sub-workflows Factory Source: Omer Rana Taxonomy of Workflow Scheduling: Taxonomy of Workflow Scheduling Scheduling Criteria Single vs. multiple Number of workflows considered during scheduling step Single (optimizing a single workflow) vs. multiple (optimizing several or all workflows at the same time) Dynamicity Full-ahead vs. Just-time vs. Hybrid Source: CoreGRID Report by U. Innsbruck, FhG FIRST Berlin Taxonomy of Workflow Scheduling (2): Taxonomy of Workflow Scheduling (2) Optimization Model Workflow-oriented (considering the benefit of a single workflow/user) vs. Grid-wide (overall optimization goal) Advance Reservation With AR (using reservations/SLAs) or without Source: CoreGRID Report by U. Innsbruck, FhG FIRST Berlin Taxonomy of Workflow Scheduling Systems: Taxonomy of Workflow Scheduling Systems Source: Jia Yu, Rajkumar Buyya Workflow Languages: Workflow Languages Plenty of them, see Grid Workflow Forum: Workflow languages (scientific and industrial) * AGWL * BPEL4WS * BPML * DGL * DPML * GJobDL * GSFL * GFDL * GWorkflowDL * MoML * SWFL * WSCL * WSCI * WSFL * XLANG * YAWL * SCUFL/XScufl * WPDL * PIF * PSL * OWL-S * xWFL Source: Grid Workflow Forum (www.gridworkflow.org) Excerpt of Workflow Scheduling Systems: Excerpt of Workflow Scheduling Systems DAGMan Pegasus Triana ICENI Taverna GridAnt GrADS GridFlow Unicore Gridbus workflow Askalon Karajan Kepler Source: Grid Workflow Forum (www.gridworkflow.org) Scheduling of a Workflow (1): Scheduling of a Workflow (1) Schedules without advance reservation All times are depending on the local queues The probability of an accidental schedule that reflects the logical flow of the workflow tasks is rather low In many cases the workflow will be broken Scheduling of a Workflow (2): Scheduling of a Workflow (2) Schedules without advance reservation - more “intelligent” All times are depending on the state of the local queues A subsequent task is submitted when the previous one terminates The logical flow of the workflow’s tasks is maintained Overall time depends on the local queues and the probability for longer makespan is quite high Scheduling of a Workflow (3): Scheduling of a Workflow (3) Optimal schedules with advance reservation t3 = t2 and t5 = t4 In case of data transfer of lengths td12 and td23 between tasks: t3 = t2 + td12, t5 = t4 + td23 Slide54: Thanks! Background Information: Background Information Surveys on Grid Workflow Scheduling: A Taxonomy of Workflow Management Systems for Grid Computing, Yu, J. Buyya, R., JOURNAL OF GRID COMPUTING, 2005, VOL 3; NUMBER 3-4, pages 171-200 Taxonomy of the Multi-criteria Grid Workflow Scheduling Problem, Marek Wieczorek, Andreas Hoheisel, Radu Prodan CoreGrid Technical Report
Project start was on 01.06.2010 and the duration is 24 month. D-Grid Integrationsprojekt Fachgebiet 6: Nachhaltigkeit und Koordinierung ...
Workflows and Scheduling in Grids Ramin Yahyapour University Dortmund Leader CoreGRID Institute on Resource Management and Scheduling CoreGRID – Summer ...
... 26.05. - 27.05.2010 in ... R. Yahyapour, Virtualization Management for Grids and SOA, CoreGrid Whitepaper WHP-005, August 2008. 2007: Freitag, S ...
Schloss Dagstuhl - Leibniz-Zentrum ... 09.05.10-11.05.10: ... August 2007 auf Schloss Dagstuhl : Veranstaltungsbesprechung : S. 718: Reimer, Helmut.
... Reference Model for Production e-science Infrastructures University of Hagen Hagen, Germany M.SC, Computer Science 2007 Thesis: ...