gLite Data Management System


Published on August 6, 2009

Author: leandrociuffo

Source: slideshare.net

Description

Data Management architecture and commands supported by the gLite Grid middleware

Architecture of the gLite Data Management System
Leandro Neumann Ciuffo, INFN-Catania (Italy)
EELA-2 Tutorial, Montevideo, 22.07.2009

gLite DMS – EELA-2 Tutorial, 22.07.2009

Outline

Challenges of data management in a Grid infrastructure

Initial definitions

Types of Storage Elements

File naming conventions

File catalogue

Practical exercises (hands on)

Be prepared for a bunch of acronyms!

Challenges

gLite services that address them: Storage Resource Manager interface, File Catalogue, File Transfer Service, Metadata Service

Heterogeneity

Data are stored on different storage systems using different access technologies

Distribution

Data are stored in different locations (in most cases there is no shared file system or common namespace)

Data need to be moved between different locations

Data description

Data are stored as files (need to describe and locate them according to their content)

Getting started

The Storage Element (SE) is the service which allows users and applications (programs) to store/retrieve data (files).

The DMS provides services for location, access and transfer of files.

Users do not need to know the file location, just its logical name.

Files can be replicated or transferred to several locations (SEs) as needed.

Files are shared within a VO.

Files are write-once, read-many.

Files cannot be changed unless removed or replaced.

There is no intention of providing a global file management system.

Getting started

Files located in the Storage Elements (SEs)…

Are mostly write-once, read-many.

Are accessible by users and applications from “anywhere” in the Grid.

Can have several replicas stored at different sites.

Cannot be changed unless removed or replaced.

Storage Elements (SEs)…

Provide storage space for files.

Provide a transfer protocol (GSIFTP), i.e. a GSI-based FTP server.

Provide an interface for the management of disk and tape storage resources: Storage Resource Manager (SRM).

Types of Storage Elements

dCache

Consists of a server and one or more pool nodes.

Centralized administration: single point of access to the SE.

Files are presented in the disk pools under a single virtual filesystem tree.

Uses the GSI dCache Access Protocol (gsidcap).

CERN Advanced STORage manager (CASTOR)

Files are migrated from a disk buffer frontend to tape mass storage.

Uses the insecure Remote File I/O protocol (RFIO).

Disk Pool Manager (DPM)

Used for fairly small SEs (max 10 TB of total space) with disk-based storage only.

Uses the secure RFIO protocol.

Storage Resource Manager (SRM)

[Figure: labels User Interface, myJOB, Worker Nodes A, B and C, SE - CASTOR, SE - DPM, dCache; arrows labelled submit, read input, store output.]

Storage Resource Manager (SRM)

Without the SRM, you as a user need to know all the storage systems (SE CASTOR, SE DPM, SE dCache).

The SRM: "I talk to them on your behalf. I will even allocate space for your files. And I will use transfer protocols to send your files there."

The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world.

File naming conventions (1)

Grid Unique IDentifier (GUID)

Every file has a GUID.

A non-human-readable unique identifier, e.g.: guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d

Note: all replicas of a file share the same GUID.

Logical File Name (LFN)

An alias that can be used to refer to a file, e.g.: lfn://grid/gilda/users/mario/myfile.dat

[Figure: Logical File Name 1 ... Logical File Name N, all linked to a single GUID.]

File naming conventions (2)

Storage URL (SURL) or Physical File Name (PFN)

The location of an actual file on a storage system, e.g.: srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat

Note: used by the system to find where the replica is physically stored.

Transport URL (TURL)

Complete URI with the necessary information to access a file in an SE (including the access protocol), e.g.: rfio://lxshare0209.cern.ch//data/alice/ntuples.dat

[Figure: Logical File Name 1 ... Logical File Name N linked to the GUID, the GUID linked to Physical File SURL 1 ... Physical File SURL N, and the SURLs linked to TURLs.]
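The four kinds of names can be told apart by their prefix. The following is a minimal Python sketch of that idea, not part of gLite: the function name `classify_grid_name` is hypothetical, and treating every non-`srm` protocol as a TURL is a simplifying assumption.

```python
def classify_grid_name(name: str) -> str:
    """Return 'GUID', 'LFN', 'SURL' or 'TURL' based on the prefix before ':'."""
    scheme, sep, _rest = name.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme prefix in {name!r}")
    scheme = scheme.lower()
    if scheme == "guid":
        return "GUID"   # unique identifier shared by all replicas
    if scheme == "lfn":
        return "LFN"    # human-readable alias
    if scheme == "srm":
        return "SURL"   # location of one replica on a storage system
    # Assumption: any other protocol (rfio, gsiftp, gsidcap, ...) is an
    # access protocol, so the name is a TURL.
    return "TURL"


if __name__ == "__main__":
    for example in (
        "guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d",
        "lfn:/grid/gilda/users/mario/myfile.dat",
        "srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat",
        "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
    ):
        print(classify_grid_name(example), example)
```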

SRM interactions

[Figure: Client, SRM and SE connected by arrows numbered 1 to 5.]

The client asks the SRM for the file, providing an SURL.

The SRM asks the Storage Element to provide the file.

The Storage Element notifies the availability of the file and its location.

The SRM returns a TURL (Transport URL), i.e. the location from where the file can be accessed.

The client interacts with the storage using the protocol specified in the TURL.
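The exchange can be sketched with two toy classes. This is an illustrative in-memory model only: `ToyStorageElement`, `ToySRM` and `protocol_of` are hypothetical names, and a real SRM is a network service with a far richer interface.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorageElement:
    host: str
    protocol: str                               # access protocol, e.g. "rfio"
    files: dict = field(default_factory=dict)   # SURL path -> path on the SE

    def locate(self, path: str) -> str:
        # The SE confirms the file is available and reports its location.
        return self.files[path]


@dataclass
class ToySRM:
    se: ToyStorageElement

    def get_turl(self, surl: str) -> str:
        # The client has handed over an SURL; check it belongs to this SE.
        prefix = f"srm://{self.se.host}"
        if not surl.startswith(prefix + "/"):
            raise ValueError(f"{surl!r} is not served by {self.se.host}")
        # The SRM asks the SE for the file and gets its location back.
        location = self.se.locate(surl[len(prefix):])
        # The SRM answers with a TURL that includes the access protocol.
        return f"{self.se.protocol}://{self.se.host}/{location}"


def protocol_of(turl: str) -> str:
    # The client reads the protocol from the TURL before accessing the file.
    return turl.split("://", 1)[0]


if __name__ == "__main__":
    se = ToyStorageElement("se.example.org", "rfio",
                           {"/dpm/home/gilda/test.dat": "/data/gilda/test.dat"})
    turl = ToySRM(se).get_turl("srm://se.example.org/dpm/home/gilda/test.dat")
    print(turl, "via", protocol_of(turl))
```

The host `se.example.org` and both paths are made up for the example.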

Needles in a haystack

How do I keep track of all the files I have on the Grid?

Even if I remember all the LFNs of my files, what about someone else's files?

How does the Grid keep track of the mapping between LFN(s), GUID and SURL(s)?

This is the job of the File Catalogue.

LFC = LCG File Catalogue

LCG = LHC Computing Grid

LHC = Large Hadron Collider

File Catalogue

The service which maintains mappings between LFN(s), GUID and SURL(s).

It keeps track of the location of copies (replicas) of files.

It consists of a unique catalogue, where the LFN is the main key.

It looks like a “top-level” directory in the Grid.

For each supported VO, a separate subdirectory exists under the "/grid" directory.

All members of a given VO have read-write permissions in such a directory.

The LFC Service

[Figure: User Interface, File Catalogue and storage elements SE A, SE B and SE C; the file is referred to as lfn:/grid/gilda/tcaland/mpi.txt.]

The LFC Service

[Figure: structure of a catalogue entry. LFN (the LFC key): /grid/dteam/dir1/dir2/file1.root. GUID: 38ed3f60-c402-11d7-a6b0… Replicas (SURLs): srm://host.example.com/foo/bar on host.example.com. Symlink: /grid/dteam/mydir/mylink. Also shown: User Metadata, System Metadata.]

Further LFNs can be added as symlinks to the main LFN.
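The mapping kept by the catalogue (LFN as key, one GUID per file, several SURLs per GUID, extra LFNs as symlinks) can be sketched as a small in-memory structure. This is a toy model under those assumptions; `ToyCatalogue` and its methods are hypothetical and do not correspond to the LFC API.

```python
import uuid


class ToyCatalogue:
    """In-memory stand-in for the LFN -> GUID -> SURL(s) mapping."""

    def __init__(self):
        self._guid_by_lfn = {}   # main LFN -> GUID
        self._links = {}         # symlink LFN -> main LFN
        self._replicas = {}      # GUID -> list of SURLs

    def register(self, lfn, surl):
        """Register a new file under `lfn` with its first replica."""
        if lfn in self._guid_by_lfn or lfn in self._links:
            raise ValueError(f"{lfn} already exists")
        guid = "guid:" + str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._replicas[guid] = [surl]
        return guid

    def guid_of(self, lfn):
        """Resolve a main LFN or a symlink to the file's GUID."""
        main_lfn = self._links.get(lfn, lfn)
        return self._guid_by_lfn[main_lfn]

    def add_replica(self, lfn, surl):
        """Record another copy of the file; the GUID does not change."""
        self._replicas[self.guid_of(lfn)].append(surl)

    def symlink(self, target_lfn, link_lfn):
        """Add a further LFN that points at an existing main LFN."""
        if target_lfn not in self._guid_by_lfn:
            raise KeyError(target_lfn)
        self._links[link_lfn] = target_lfn

    def list_replicas(self, lfn):
        return list(self._replicas[self.guid_of(lfn)])


if __name__ == "__main__":
    cat = ToyCatalogue()
    main = "/grid/dteam/dir1/dir2/file1.root"
    cat.register(main, "srm://host.example.com/foo/bar")
    cat.symlink(main, "/grid/dteam/mydir/mylink")
    print(cat.list_replicas("/grid/dteam/mydir/mylink"))
```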

Job submission – example 1

[Figure: User Interface, WMS, CE, Worker Nodes.]

Small files: InputSandbox / OutputSandbox

Data Management – example 2

[Figure: User Interface, WMS, CE, Worker Nodes, LFC and two SEs.]

LFC commands

Interact with the catalogue only.

lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

lcg-utils commands

Copy files to/from/between SEs.

Keep the SEs and the Catalogue up to date.

The RPM containing these tools (lcg_util) is installed in the WNs and UIs.

lcg-cp    Copies a grid file to a local destination
lcg-cr    Copies a file to an SE and registers the file in the catalogue
lcg-del   Deletes one file
lcg-rep   Replicates a file between SEs and registers the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to “Done” for a given SURL in an SRM request

Environment Variables

Make sure to use the correct BDII and LFC

BDII - LCG_GFAL_INFOSYS

export LCG_GFAL_INFOSYS=gilda-bdii.ct.infn.it:2170

LFC - LFC_HOST

export LFC_HOST=lfc-gilda.ct.infn.it

Let’s practice!

Reference: https://grid.ct.infn.it/twiki/bin/view/GILDA/DataManagement

Environment Variables

Pointing to the right BDII:

    echo $LCG_GFAL_INFOSYS
    export LCG_GFAL_INFOSYS=gilda-bdii.ct.infn.it:2170

Pointing to the right LFC:

    echo $LFC_HOST
    export LFC_HOST=lfc-gilda.ct.infn.it
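A script that wraps the data management commands could check both variables before doing anything else. The following is a minimal sketch; the helper name `missing_grid_variables` is hypothetical and nothing like it ships with gLite.

```python
import os

# The two variables named in the slides above.
REQUIRED_VARIABLES = ("LCG_GFAL_INFOSYS", "LFC_HOST")


def missing_grid_variables(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_grid_variables()
    if missing:
        print("Please export:", ", ".join(missing))
    else:
        print("BDII:", os.environ["LCG_GFAL_INFOSYS"])
        print("LFC: ", os.environ["LFC_HOST"])
```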

Before starting…

Make sure to have a proxy created:

    voms-proxy-init --voms gilda

LFC: listing files and directories

    lfc-ls -l /grid/gilda

Remember that the LFC has a directory tree structure (the LFC namespace):

    /grid/<VO_name>/<user directory>

The <user directory> part is defined by the user.

You can set the LFC_HOME variable to use relative paths:

    export LFC_HOME=/grid/gilda/tutorials
    lfc-ls
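The effect of LFC_HOME on relative paths can be illustrated in a few lines. This sketch only mirrors the behaviour described on the slide; `resolve_lfc_path` is a hypothetical helper, and the exact resolution rules of the real LFC clients are not taken from the source.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Turn a possibly relative LFC path into an absolute one."""
    if path.startswith("/"):
        # Absolute paths ignore LFC_HOME.
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    # Relative paths are interpreted below LFC_HOME.
    return posixpath.normpath(posixpath.join(lfc_home, path))


if __name__ == "__main__":
    print(resolve_lfc_path("yourname/yourfile.txt", "/grid/gilda/tutorials"))
```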

LFC: creating a directory

Create your own personal directory inside /grid/gilda/tutorials/<your dir>:

    lfc-mkdir /grid/gilda/tutorials/yourname

You can check the creation by typing:

    lfc-ls /grid/gilda/tutorials

Downloading a file

First of all, let’s download a file from an SE to start “playing” with it.

Basic usage:

    lcg-cp --vo <vo name> <LFN origin> <local destination>

Try it:

    lcg-cp --vo gilda lfn:/grid/gilda/users/example/alien.txt file://$HOME/alien.txt

Copying and registering a file

lcg-cr copies a file to an SE and registers the file in the catalogue:

    lcg-cr --vo <vo name> -l <LFN destination> -d <SE> <local file>

This command will return the GUID for your file.

Make sure to have a directory in the LFC (/grid/gilda/users/sagrid/yourname/).

Use the lcg-info or lcg-infosites commands to figure out the available SEs:

    lcg-infosites --vo gilda se

    Avail Space(Kb)  Used Space(Kb)  Type  SEs
    ----------------------------------------------------------
    1100000000       1145007         n.a   gilda-se.rediris.es
    1030000000       32              n.a   fn2.hpcc.sztaki.hu
    295250000        75945624        n.a   aliserv6.ct.infn.it
    n.a              999999          n.a   se-edu.grid.acad.bg
    60440000         3280565         n.a   iceage-se-01.ct.infn.it
    1008437          8844236         n.a   se.hpc.iit.bme.hu
    53160000         440416          n.a   vega-se.ct.infn.it
    2430000000       440450          n.a   se1-egee.srce.hr
    97890000         440423          n.a   dgt02.ui.savba.sk

Then copy and register your file:

    lcg-cr --vo gilda -l lfn:/grid/gilda/tutorials/yourname/yourfile.txt -d aliserv6.ct.infn.it file://$HOME/alien.txt
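When choosing a destination SE, the lcg-infosites listing can be parsed by a script. The sketch below assumes the four-column layout shown above (available space, used space, type, SE host); the function names are hypothetical, and the sample contains three rows copied from the listing.

```python
SAMPLE = """\
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
1100000000       1145007         n.a   gilda-se.rediris.es
n.a              999999          n.a   se-edu.grid.acad.bg
2430000000       440450          n.a   se1-egee.srce.hr
"""


def _to_int(value):
    # "n.a" (not available) becomes None.
    return int(value) if value.isdigit() else None


def parse_infosites(text):
    """Parse the data rows; the header and dashed lines are skipped
    because they do not split into exactly four fields."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        avail, used, se_type, host = parts
        rows.append({
            "avail_kb": _to_int(avail),
            "used_kb": _to_int(used),
            "type": se_type,
            "host": host,
        })
    return rows


def se_with_most_space(rows):
    """Return the host reporting the largest available space."""
    candidates = [row for row in rows if row["avail_kb"] is not None]
    if not candidates:
        raise ValueError("no SE reports its available space")
    return max(candidates, key=lambda row: row["avail_kb"])["host"]


if __name__ == "__main__":
    print(se_with_most_space(parse_infosites(SAMPLE)))
```

Free space is only one possible criterion; the tutorial itself simply picks an SE by name.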

Replicate a file between SEs

Basic usage:

    lcg-rep --vo <vo name> -d <destination SE> <LFN of your file>

Try it:

    lcg-rep --vo gilda -d gilda-se.rediris.es lfn:/grid/gilda/tutorials/yourname/yourfile.txt

Listing the replicas

Use the lcg-lr command:

    lcg-lr --vo gilda lfn:/grid/gilda/tutorials/yourname/yourfile.txt

The command will return the SURLs of all replicas.

A file can be stored on multiple SEs so that a job can download it from the closest SE while it is running.
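Given the SURLs returned for a file, a job could pick the replica on a preferred SE. This is a minimal sketch of that selection: `se_host` and `pick_replica` are hypothetical helpers, and the ordered list of preferred hosts stands in for whatever notion of "closest" a site actually uses.

```python
from urllib.parse import urlsplit


def se_host(surl):
    """Extract the SE host name from an srm:// SURL."""
    parts = urlsplit(surl)
    if parts.scheme != "srm" or not parts.hostname:
        raise ValueError(f"not an SURL: {surl!r}")
    return parts.hostname


def pick_replica(surls, preferred_hosts):
    """Return the replica on the first preferred host that holds one,
    falling back to the first SURL in the list."""
    if not surls:
        raise ValueError("no replicas given")
    by_host = {se_host(surl): surl for surl in surls}
    for host in preferred_hosts:
        if host in by_host:
            return by_host[host]
    return surls[0]


if __name__ == "__main__":
    replicas = [
        "srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat",
        "srm://gilda-se.rediris.es/dpm/home/gilda/project1/test.dat",
    ]
    print(pick_replica(replicas, ["gilda-se.rediris.es"]))
```

The second SURL path in the usage example is invented for illustration.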

Adding metadata information

A comment is the only user-defined metadata that can be associated with catalogue entries.

Basic usage:

    lfc-setcomment <LFC file path> "Your comments"

Try it:

    lfc-setcomment /grid/gilda/tutorials/yourname/yourfile.txt "This is my comment"

Listing with comments

Try it:

    lfc-ls --comment /grid/gilda/tutorials/yourname/

Creating a symbolic link

Two different LFNs will point to the same file.

Basic usage:

    lfc-ln -s <your symbolic link> <original file>

Try it:

    lfc-ln -s /grid/gilda/tutorials/yourname/yourlink.txt /grid/gilda/tutorials/yourname/yourfile.txt

Check your link by typing:

    lfc-ls -l /grid/gilda/tutorials/yourname/

Downloading a file

Basic usage:

    lcg-cp --vo <vo name> <LFN origin> <local destination>

Try it:

    lcg-cp --vo gilda lfn:/grid/gilda/tutorials/yourname/yourfile.txt file://$HOME/yourfile.txt

Deleting a file

Basic usage:

    lcg-del -a --vo <vo name> <LFN>

Try it:

    lcg-del -a --vo gilda lfn:/grid/gilda/tutorials/yourname/yourfile.txt

Removing an LFC directory

Basic usage:

    lfc-rm -r <LFC file path>

Try it:

    lfc-rm -r /grid/gilda/tutorials/yourname
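The upload, replicate, list and delete steps of the exercises can be assembled programmatically before running them. The sketch below only builds and prints the command lines, using the options that appear in the slides; the function names are hypothetical and the local path is a placeholder.

```python
import shlex


def upload_cmd(vo, lfn, se, local_path):
    # lcg-cr: copy a local file to an SE and register it in the catalogue.
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", se, f"file://{local_path}"]


def replicate_cmd(vo, lfn, destination_se):
    # lcg-rep: create and register another replica on a second SE.
    return ["lcg-rep", "--vo", vo, "-d", destination_se, lfn]


def list_replicas_cmd(vo, lfn):
    # lcg-lr: list the SURLs of all replicas.
    return ["lcg-lr", "--vo", vo, lfn]


def delete_cmd(vo, lfn):
    # lcg-del with -a, as used in the "Deleting a file" exercise.
    return ["lcg-del", "-a", "--vo", vo, lfn]


def render(argv):
    """Render an argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    vo = "gilda"
    lfn = "lfn:/grid/gilda/tutorials/yourname/yourfile.txt"
    steps = [
        upload_cmd(vo, lfn, "aliserv6.ct.infn.it", "/home/yourname/alien.txt"),
        replicate_cmd(vo, lfn, "gilda-se.rediris.es"),
        list_replicas_cmd(vo, lfn),
        delete_cmd(vo, lfn),
    ]
    for argv in steps:
        print(render(argv))  # dry run: nothing is executed
```

On a User Interface with a valid proxy, each argument list could be passed to `subprocess.run(argv, check=True)` instead of being printed.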

Get the file SURL

Some advanced Data Management commands (File Transfer Service, for instance) require the SURL of a file.

Basic usage:

    lcg-lr --vo <vo name> <LFN>

Try it:

    lcg-lr --vo gilda lfn:/grid/gilda/tutorials/yourname/yourfile.txt

Get the file TURL

Basic usage:

    lcg-gt <file SURL> <protocol supported by the SE>

Try it:

    lcg-gt <paste the file SURL: srm://…> gsiftp
