Gold Accounting Manager:
Scott Jackson
Scalable Systems Software Center
SC2002, 20 NOV 2002

Introduction:
In a nutshell, Gold is:
A resource bank (allocation management system)
  Tracks and manages resource usage. Much like a bank, it associates a cost with computing resources and allows resource credits to be allocated to users and projects. As jobs complete or as resources are utilized, projects are dynamically charged and resource usage is recorded.
An accounting system
  Can be dynamically customized to record any type of accounting data (pacct, sar, node availability, etc.).
An information service
  Also functions as a powerful generalized information service, useful in a variety of ways, such as providing the mappings needed for meta-scheduling: machines to resources, applications, accounts, users, etc.

But First, A Little Background…:
Scalable Systems Software Center
  Research, develop and support an integrated suite of systems software and tools for the effective management and utilization of the highest-scale computational resources.
SciDAC
  Scientific Discovery through Advanced Computing: a DOE initiative to improve the impact of scientific computing.
QBank
  A dynamic allocation management system developed at PNNL and in use at about a dozen sites.

Motivation:
Show return on investment
  Funding sources that have invested heavily in a supercomputer require a means to show that it is being utilized efficiently.
Fairness
  Management needs a means to fairly distribute the underlying computing resources (processors, memory, disk) among the various users and projects.
Capacity Planning
  Accurate allocation and usage information is needed to make effective decisions on resource commitments and new procurements.
Centralized Access Control
  Many organizations want centralized control over which users and projects have access to which machines and for how long.

Motivation (from a meta-scheduling slant):
Meta-computing
Trust issues
Guarantees
Distributed Accounting
Local Control
Equitable trade agreement
Security
Different userids, accounts, execution environments on different resources

Nonfunctional Requirements:
Scalable
  Targets systems with tens of thousands of processors and thousands of simultaneous jobs.
Secure
  Will use strong authentication (no clear-text passwords) to prevent unauthorized access, and data encryption to prevent sensitive information from being intercepted (XML-DSIG and XML-ENC).
Fault Tolerant
  The database performs automatic rollbacks on failed transactions. A distributed design that includes data replication will be researched.

Nonfunctional Requirements (cont.):
Open Source
  Allows free distribution, local modifications and derived works, and promotes sharing of patches, ports and enhancements from the user community.
Portable
  Written in Java; initially tested on a reference Linux platform, then expanded to include the architectures used at the largest DOE computing facilities.
Easy to Use
  A web-accessible GUI (based on PHP and JavaScript) will help managers, users and admins gain the access they need from their own PCs.

Operational Characteristics:
Supports familiar bank operations
  Deposits, withdrawals, transfers, refunds, balance checks and bank statements.
Reservations
  Before a job runs, a reservation (or hold) is placed on the account based on the job's wallclock limit. This prevents overdrafts.
Quotations
  In a meta-scheduling environment it is useful to know how much a job is going to “cost” so that you can decide on the best place to run it.
Hierarchical Accounts
  Projects can be nested (deposits trickle down, withdrawals trickle up).

Dynamic Resource Management Interaction:
Make Deposits, etc.
Submit Job
Balance Check
Make Reservation
Start Job
Job Completes
Remove Reservation & Make Withdrawal

Dynamic Resource Management Interaction (with meta-scheduling):
Components: Meta-Scheduler (Silver), Scheduler (Maui), Resource Manager (PBS, LL), Allocation Manager (Gold)
Make Deposits, etc.
Submit Job
Locate Feasible Systems & Obtain Quote
Stage Job
Balance Check
Make Reservation
Start Job
Job Completes
Remove Reservation & Make Withdrawal

Allocations:
An allocation is a collection of resource credits valid for an arbitrary group of users, machines and projects, with a timeframe for expenditure.
Commonly associated with a single project (account) and a set of users and machines.
Fine-grained control of who can use how much within a project can be achieved with multiple allocations in the same project.

The dimensions of Allocation Management:
Projects: Grand Challenge Development Weather Modeling Chem101 Navy SETI Viz …
Users: Tom Sheri Scientist Developer Admin Workshop Manager …
Resources: DOE PNNL LLNL SDSC ANL MPP1 Colony Jupiter
Time

Allocation-User Distribution Possibilities: (figure slide)

Allocation-Machine Distribution Possibilities: (figure slide)

Allocation Timeframes: (figure slides)

Journaling:
State Preservation
  Preserves the historical state of all objects and records indefinitely.
Bank Statements
  Journaling allows bank statements to show balances for any arbitrary time in the past.
Undo/Redo
  With powerful querying/updating capabilities comes the potential for rampant administrative mistakes; journaling lets them be undone and redone.
Time Travel
  You can run any command as if it were an arbitrary date in the past.

Flexible Charging Mechanism:
Besides CPU, a resource supplier can charge based on the amount of memory, disk, or any other consumable resource, as well as on quality of service, primetime, nodetype, class, etc.
An external pricing engine interface will allow any sort of charging algorithm to be used, such as dynamic price adjustment according to load or queue backlog, a query to an external information service, or a cached second-price auction result.

Traceback Mechanism:
Allows all parties to a transaction (resource requestor and provider) to have a record of the resource utilization and to have a say as to whether or not the job should be permitted to run, based on their independent policies and priorities.
A job will only run if all parties agree that the target resources can be used in the manner and amount requested.
(Diagram: Meta Account, PNNL, LBNL, traceback debit)

Flexible and Extensible:
Powerful Querying/Updating Capabilities
  Create, query, modify, delete, undelete
  Support for operators (equals, less than, not equal, matching, etc.)
  Boolean expression combinations (and, or)
  Object-joined queries
Dynamically Extensible
  New object/record types and their fields can be dynamically created or modified through the regular query language (command line or GUI). This capability turns the system into a generalized information service that can be used for meta-scheduling resource mapping, as a persistence interface for other components, and for all varieties of accounting.
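
The reservation, quotation and charging ideas from the slides above can be illustrated with a small sketch. This is not Gold's or QBank's code or API: the Account class, the RATES table, the quote function and all rate values are hypothetical names and numbers chosen only to show how a hold based on the wallclock limit prevents overdrafts, and how a charge can be computed from several consumable resources.

```python
from dataclasses import dataclass, field

# Hypothetical per-unit prices in resource credits (assumed values, not Gold's).
RATES = {"cpu_seconds": 1.0, "memory_gb_seconds": 0.5}


def quote(usage: dict, multiplier: float = 1.0) -> float:
    """Price a job from its resource usage.

    `multiplier` stands in for factors such as quality of service or primetime.
    """
    return multiplier * sum(RATES[name] * amount for name, amount in usage.items())


@dataclass
class Account:
    """Hypothetical project account holding resource credits."""
    balance: float = 0.0
    reservations: dict = field(default_factory=dict)  # job id -> held credits

    def available(self) -> float:
        # Credits not already held by reservations for pending or running jobs.
        return self.balance - sum(self.reservations.values())

    def deposit(self, credits: float) -> None:
        self.balance += credits

    def reserve(self, job_id: str, credits: float) -> bool:
        # Balance check plus hold: refuse the job if it could overdraw the account.
        if credits > self.available():
            return False
        self.reservations[job_id] = credits
        return True

    def charge(self, job_id: str, credits: float) -> None:
        # Job completed: remove the reservation and withdraw the actual cost.
        self.reservations.pop(job_id, None)
        self.balance -= credits


account = Account()
account.deposit(10000.0)

# A job asks for 4 processors with an 1800 s wallclock limit (7200 CPU-seconds).
hold = quote({"cpu_seconds": 4 * 1800})
started = account.reserve("job1", hold)

# The job finishes after 1000 s, so only the CPU-seconds actually used are charged.
account.charge("job1", quote({"cpu_seconds": 4 * 1000}))
```

The hold is sized from the wallclock limit because that is the most the job can consume; once the job completes, the hold is released and only the actual usage is withdrawn. An external pricing engine, as described on the charging slide, would take the place of the fixed RATES table.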

Schedule:
4Q02  Requirements gathering completed; initial Resource Management Interface specs released
2Q03  QBank bundled with SSS initial release (possible alpha-testing on Gold)
2Q04  Beta release of Gold
4Q05  Production release of Gold (includes support)

Contact Information:
Scott Jackson
Pacific Northwest National Laboratory
(509) 376-2205
Scalable Systems Software Center
QBank documentation and download

