Session 33 - Production Grids

Published on July 13, 2009

Author: ISSGC

Source: slideshare.net

Overview of Production Grids - Steven Newhouse

Contents: Open Science Grid, DEISA, NAREGI, Nordic DataGrid Facility, EGEE, TeraGrid, EGI

Open Science Grid - Ruth Pordes

Open Science Grid Consortium: more than 100 member organizations contributing resources, software, applications, and services. Project: funded by DOE and NSF to deliver to the OSG Consortium for 5 years (2006-2011), 33 FTEs; VO science deliverables are OSG's milestones. Collaboratively focused: partnerships, international connections, multidisciplinary. Satellites: independently funded projects contributing to the OSG Consortium program and vision, including CI-Team (user and campus engagement), VOSS (a study of virtual organizations), CILogon (integration of end-point Shibboleth identity management into the OSG infrastructure), and funding for students to attend the International Summer School for Grid Computing 2009.

OSG & Internet2 work closely with universities (figures from Paul Avery, TeraGrid'09, Jun. 23, 2009): ~100 compute resources, ~20 storage resources, ~70 modules in the software stack, ~35 user communities (VOs), 600,000-900,000 CPU-hours/day, 200K-300K jobs/day, >2000 users, ~5 other infrastructures, ~25 resource sites.
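As a rough sanity check on the throughput figures above, the quoted CPU-hour and job rates imply an average job length of a few CPU-hours, which is characteristic of high-throughput workloads. The short Python sketch below just redoes that arithmetic; the only inputs are the numbers quoted on the slide.

    # Back-of-envelope check of the OSG throughput figures quoted above.
    cpu_hours_per_day = (600_000, 900_000)   # quoted daily CPU-hour range
    jobs_per_day = (200_000, 300_000)        # quoted daily job-count range

    # Pair the extremes to bracket the implied average job length (CPU-hours).
    low = cpu_hours_per_day[0] / jobs_per_day[1]    # 600k hours over 300k jobs
    high = cpu_hours_per_day[1] / jobs_per_day[0]   # 900k hours over 200k jobs
    print("Implied average job length: %.1f to %.1f CPU-hours" % (low, high))
    # prints roughly 2.0 to 4.5 CPU-hours per job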

Users: Nearly all applications are high throughput; a small number of users are starting MPI production use. Major accounts: US LHC and LIGO. ATLAS and CMS each have >3000 physicists; US ATLAS and US CMS run Tier-1s, 17 Tier-2s, and a new focus on Tier-3s (~35 today, ~70 expected in a year); an ALICE taskforce is demonstrating the usability of the OSG infrastructure for their applications; LIGO runs Einstein@Home. US physics community: Tevatron (CDF and D0) at FNAL and remote sites; other Fermilab users (neutrino, astro, simulation, theory); STAR; IceCube. Non-physics: ~6% of usage, from ~25 single PIs or small groups in biology, molecular dynamics, chemistry, weather forecasting, mathematics, and protein prediction. Campus infrastructures: ~7, including universities and labs.

Non-physics use is highly cyclic.

Operations: All hardware is contributed by members of the Consortium. Distributed operations infrastructure including security, monitoring, registration, accounting services, etc. Central ticketing system with 24x7 problem reporting and triaging at the Grid Operations Center. A distributed set of Support Centers acts as the first line of support for VOs, services (e.g. software), and sites. Security incident response teams include Site Security Administrators and VO Security Contacts. Software distribution, patches (security), and updates. Targeted production, site, and VO support teams.

OSG job counts (2008-09): 100M jobs, ~300K jobs/day (chart from Paul Avery, TeraGrid'09, Jun. 23, 2009).

Software: The OSG Virtual Data Toolkit (VDT) is a packaged, tested, distributed, and supported software stack used by multiple projects: OSG, EGEE, NYSGrid, TG, APAC, NGS. It has ~70 components covering Condor, Globus, security infrastructure, data movement, storage implementations, job management and scheduling, network monitoring tools, validation and testing, monitoring/accounting/information, and needed utilities such as Apache and Tomcat, with Server, User Client, and Worker-Node/Application Client releases. It is built and regression tested using the University of Wisconsin-Madison Metronome system, with pre-release testing on 3 "VTB" sites (UofC, LBNL, Caltech) and post-release testing of major releases on the Integration Testbed, by a distributed team at the University of Wisconsin, Fermilab, and LBNL. Improved support for incremental upgrades arrives in the OSG 1.2 release, summer '09. OSG configuration and validation scripts are distributed to use the VDT. OSG does not develop software except for tools and contributions (extensions) to external software projects delivering to OSG stakeholder requirements; identified liaisons provide bi-directional support and communication between OSG and external software provider projects. The OSG Software Tools Group oversees all software developed within the project, and software vulnerability and auditing processes are in place.
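To make the stack concrete: the Condor and Globus components listed above are what a user typically drives when sending work to an OSG site, commonly via a Condor-G grid-universe submission to a site's GRAM gatekeeper. The sketch below is only an illustration of that pattern; the gatekeeper host, jobmanager, and script names are invented, and a valid grid proxy and a VDT client installation are assumed.

    # Hypothetical Condor-G submission to an OSG site via Globus GRAM (gt2).
    # Host name, jobmanager and executable are placeholders, not real endpoints.
    import subprocess, textwrap

    submit_description = textwrap.dedent("""\
        universe        = grid
        grid_resource   = gt2 gatekeeper.example.edu/jobmanager-condor
        executable      = analyse.sh
        output          = job.out
        error           = job.err
        log             = job.log
        should_transfer_files   = YES
        when_to_transfer_output = ON_EXIT
        queue
    """)

    with open("osg_job.sub", "w") as f:
        f.write(submit_description)

    # condor_submit is the standard Condor client command shipped in the VDT.
    subprocess.run(["condor_submit", "osg_job.sub"], check=True)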

VDT progress (1.10.1 just released): ~70 components (chart from Paul Avery, TeraGrid'09, Jun. 23, 2009).

Partnerships and Collaborations: Partnerships with network fabric and identity service providers (ESNET, Internet2). Continuing bridging work with EGEE, SuraGrid, and TeraGrid; ~17 points of contact/collaboration with EGEE and WLCG; a partnership statement for EGI/NGIs. Emerging collaborations with TeraGrid on workforce training, software, and security. Creator (co-sponsor) of the successful e-weekly International Science Grid This Week. Co-sponsor of this ISSGC'09 school. Member of the Production Infrastructure Policy Group (OGF affiliated).

Community Collaboratories

DEISA - Advancing Science in Europe - H. Lederer, A. Streit, J. Reetz (DEISA, RI-222919, www.deisa.eu)

DEISA consortium and partners: Eleven supercomputing centres in Europe (BSC, CSC, CINECA, ECMWF, EPCC, FZJ, HLRS, IDRIS, LRZ, RZG, SARA) and four associated partners (CEA, CSCS, JSCC, KTH). Co-funded by the European Commission under DEISA2 contract RI-222919.

Infrastructure and Services: An HPC infrastructure with heterogeneous resources and state-of-the-art supercomputers: Cray XT4/5 (Linux), IBM Power5 and Power6 (AIX/Linux), IBM BlueGene/P (Linux), IBM PowerPC (Linux), SGI Altix 4700 (Itanium2 Montecito, Linux), and NEC SX8/9 vector systems (Super-UX), with more than 1 PetaFlop/s of aggregated peak performance. A dedicated network with 10 Gb/s links is provided by GEANT2 and the NRENs, and a continental shared high-performance file system (GPFS-MC, IBM) spans the sites. The HPC systems are owned and operated by the national HPC centres; DEISA services are layered and operated on top, with fixed fractions of the HPC resources dedicated to DEISA. Europe-wide coordinated expert teams handle operation, technology developments, and application enabling and support.

HPC resource usage: HPC applications from various scientific fields (astrophysics, earth sciences, engineering, life sciences, materials sciences, particle physics, plasma physics) require capability computing facilities (low-latency, high-throughput interconnects) and often application enabling and support. Resources are granted through the DEISA Extreme Computing Initiative (DECI, annual calls): the 2008 call accepted 42 proposals and granted 50 million CPU-hours*, while the 2009 call (proposals currently under review) received 75 proposals requesting more than 200 million CPU-hours* (*normalized to IBM P4+). Over 160 universities and research institutes from 15 European countries, with co-investigators from four other continents, have already benefitted. Virtual Science Community Support: 2008: EFDA, EUFORIA, VIROLAB; 2009: EFDA, EUFORIA, ENES, LFI-PLANCK, VPH/VIROLAB, VIRGO.
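Because the granted and requested figures above are expressed in hours normalized to an IBM P4+ reference system, allocations on very different machines can be compared in a single unit. The sketch below shows the idea with invented scaling factors; the actual DEISA normalization factors are defined by the consortium and are not reproduced here.

    # Normalizing CPU-hours from heterogeneous systems to a common reference
    # unit ("IBM P4+ hours", as in the DECI figures). The scaling factors and
    # granted amounts below are made up purely for illustration.
    scaling_to_p4 = {
        "IBM Power6":     2.0,   # hypothetical: one Power6 hour ~ 2 P4+ hours
        "Cray XT5":       1.8,
        "IBM BlueGene/P": 0.5,
    }
    granted_hours = {
        "IBM Power6":     10_000_000,
        "Cray XT5":        8_000_000,
        "IBM BlueGene/P": 30_000_000,
    }
    normalized = sum(h * scaling_to_p4[s] for s, h in granted_hours.items())
    print("Total grant: %.1f million normalized (P4+) CPU-hours" % (normalized / 1e6))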

Middleware: Various services are provided on the middleware layer: the DEISA Common Production Environment (DCPE), a homogeneous software environment layer for the heterogeneous HPC platforms; high-performance data stage-in/-out to GPFS via GridFTP; workflow management with UNICORE; job submission with UNICORE or, optionally, WS-GRAM; interactive usage of the local batch systems; remote job submission between IBM P6/AIX systems (LL-MC); the INCA monitoring system; and unified AAA based on distributed LDAP and resource usage databases. Only a few software components are developed within DEISA; the focus is on technology evaluation, deployment, and operation, and bugs are reported to the software maintainers.
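As an illustration of the GridFTP stage-in mentioned above, the snippet below drives globus-url-copy, the standard GridFTP client from the Globus Toolkit, from Python. The host name and paths are invented and a valid grid proxy is assumed; this is a sketch of the general pattern, not DEISA's actual staging procedure.

    # Hypothetical stage-in of an input file to a shared GPFS area via GridFTP.
    import subprocess

    source      = "file:///home/user/inputs/run42.nc"
    destination = "gsiftp://gridftp.site.example.org/deisa/home/user/run42.nc"

    # -vb prints transfer progress, -p 4 uses four parallel TCP streams.
    subprocess.run(["globus-url-copy", "-vb", "-p", "4", source, destination],
                   check=True)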

Standards: DEISA has a vital interest in the standardization of interfaces to HPC services: job submission, job and workflow management, data management, data access and archiving, networking, and security (including AAA). DEISA supports the OGF standardization groups JSDL-WG and OGSA-BES for job submission, UR-WG and RUS-WG for accounting, and DAIS for data services, and is engaged in the Production Grid Infrastructure WG. DEISA also collaborates on standardization with other projects through the GIN community and the Infrastructure Policy Group (DEISA, EGEE, TeraGrid, OSG, NAREGI), whose goal is seamless interoperation of the leading grid infrastructures worldwide: authentication, authorization, and accounting (AAA); resource allocation policies; and portal/access policies.
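To give a feel for what the JSDL-WG/OGSA-BES work standardizes, the sketch below assembles a minimal JSDL job description with Python's ElementTree, using the published JSDL 1.0 namespaces; the executable and file names are placeholders, and real submissions would also carry resource requirements and data-staging elements.

    # Minimal JSDL 1.0 job description (OGF JSDL-WG), assembled with ElementTree.
    import xml.etree.ElementTree as ET

    JSDL  = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
    POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

    job   = ET.Element("{%s}JobDefinition" % JSDL)
    desc  = ET.SubElement(job, "{%s}JobDescription" % JSDL)
    ident = ET.SubElement(desc, "{%s}JobIdentification" % JSDL)
    ET.SubElement(ident, "{%s}JobName" % JSDL).text = "demo-job"

    app   = ET.SubElement(desc, "{%s}Application" % JSDL)
    posix = ET.SubElement(app, "{%s}POSIXApplication" % POSIX)
    ET.SubElement(posix, "{%s}Executable" % POSIX).text = "/bin/hostname"
    ET.SubElement(posix, "{%s}Output" % POSIX).text = "stdout.txt"

    print(ET.tostring(job, encoding="unicode"))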

Status of CSI Grid (NAREGI) - Kento Aida, National Institute of Informatics

Overview: Current status: pilot operation started in May 2009. Organization: computer centers at 9 universities act as resource providers, while the National Institute of Informatics is the network provider (SINET3) and runs the GOC. Funding: the organizations' own funding.

Operational Infrastructure

Middleware: NAREGI middleware Ver. 1.1.3. Developer: National Institute of Informatics (http://middleware.naregi.org/Download/). Platforms: CentOS 5.2 + PBS Pro 9.1/9.2, and OpenSUSE 10.3 + Sun Grid Engine v6.0.

Nordic DataGrid Facility - Michael Gronager

NDGF Organization

A Co-operative Nordic Data and Computing Grid facility

Nordic production grid, leveraging national grid resources

Common policy framework for Nordic production grid

Joint Nordic planning and coordination

Operate Nordic storage facility for major projects

Co-ordinate & host major eScience projects (i.e., the Nordic WLCG Tier-1)

Contribute to grid middleware and develop services

NDGF 2006-2010

Funded (2 M€/year) by National Research Councils of the Nordic Countries

NDGF Facility - 2009Q1

NDGF People - 2009Q2

Application Communities

WLCG – the Worldwide LHC Computing Grid

Bio-informatics sciences

Screening of reservoirs suitable for CO2 sequestration

Computational Chemistry

Material Science

And the more horizontal: common Nordic user administration, authentication, authorization & accounting

Operations

Operation team of 5-7 people

Collaboration between NDGF, SNIC, and NUNOC

Expert 365 days a year

24x7 coverage by the regional REN

Distributed over the Nordics

Runs:

rCOD + ROC – for Nordic + Baltic

Distributed Sites (T1, T2s)

Sysadmins well known by the operation team

Continuous chatroom meetings

Middleware

Philosophy:

We need tools to run an e-Infrastructure.

Tools cost: money / in kind.

In kind means Open Source tools, hence we contribute to the things we use:

dCache (storage) – a DESY, FNAL, NDGF ++ collaboration

ARC (computing) – a collaboration between Nordic, Slovenian, and Swiss institutes (see the submission sketch after this list)

SGAS (accounting) and Confusa (client-cert from IdPs)

BDII, WMS, SAM, AliEn, Panda – gLite/CERN tools

MonAmi, Nagios (Monitoring)
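Since ARC is the compute middleware NDGF both uses and contributes to, a small submission example may help. The sketch below writes a standard xRSL job description and hands it to the classic NorduGrid client; the job is trivial on purpose, and the exact client command and flags can differ between ARC releases, so treat it as an illustration rather than an NDGF-endorsed procedure.

    # Hypothetical test-job submission with the classic NorduGrid ARC client.
    import subprocess

    # xRSL: the job description language used by the classic ARC middleware.
    xrsl = (
        '&(executable="/bin/echo")'
        '(arguments="hello from NDGF")'
        '(stdout="hello.out")'
        '(jobName="ndgf-hello")'
    )

    with open("hello.xrsl", "w") as f:
        f.write(xrsl)

    # ngsub is the job submission command of the classic ARC client suite;
    # newer releases use arcsub with a similar workflow.
    subprocess.run(["ngsub", "-f", "hello.xrsl"], check=True)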

NDGF now and in the future

e-Infrastructure as a whole is important

Resources include capacity and capability computing as well as different network and storage systems

The infrastructure must support different access methods (grid, ssh, application portals, etc.) – note that the average grid use of shared resources is only around 10-25%

Uniform user management, identity, access, accounting, policy enforcement, and resource allocation and sharing

Independent of access method

For all users
