Scalable Services for Digital Preservation
Ross King


Published on November 7, 2008

Author: DigitalPreservationEurope

Source: slideshare.net

Description

3rd Annual WePreserve Conference Nice 2008

The Planets Interoperability Framework
Scalable Services for Digital Preservation
DPE, Planets and CASPAR Third Annual Conference: Costs, Benefits and Motivations for Digital Preservation
30 October 2008
Ross King, Christian Sadilek, Rainer Schmidt
Austrian Research Centers GmbH – ARC

Outline
• Planets Interoperability Framework
• Grids and Clouds
• Initial Experimental Results

The Planets Interoperability Framework: Motivation
• There are a number of functions that all (or nearly all) software applications commonly need. These include:
  - Data persistence
  - User management
  - Authentication and Authorization
  - Monitoring, Logging, and Notification
• The Interoperability Framework (IF) software components provide these commonly required functions.

The Planets Interoperability Framework
• Defines a Service-Oriented Architecture for Digital Preservation
  - Set of Services, Interfaces, a common Data Model
• Implements Common Services
  - Authentication and Authorization, Monitoring, Logging, Notification, …
  - Service Registration and Lookup
• Provides APIs for Applications that use Planets
  - Testbed Experiments, Executing Preservation Plans
• Provides Workflow Enactment Service and Engine
  - Component-based, XML serialization
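The "Service Registration and Lookup" function listed above can be pictured as a registry that maps an interface type to the services that implement it. The Python sketch below is purely illustrative. The classes `ServiceRegistry` and `ServiceDescription`, the interface name `Migrate`, the format identifiers and the endpoint URL are all hypothetical. None of them is the Planets IF API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical description of a registered preservation service."""
    name: str
    interface: str               # illustrative interface type, e.g. "Migrate"
    endpoint: str                # illustrative service endpoint URL
    input_formats: frozenset
    output_formats: frozenset


class ServiceRegistry:
    """Minimal in-memory registry: register services, look them up by interface and formats."""

    def __init__(self):
        self._by_interface = defaultdict(list)

    def register(self, service: ServiceDescription) -> None:
        self._by_interface[service.interface].append(service)

    def lookup(self, interface, input_format=None, output_format=None):
        """Return the services of `interface` that accept/produce the requested formats."""
        matches = []
        for svc in self._by_interface.get(interface, []):
            if input_format is not None and input_format not in svc.input_formats:
                continue
            if output_format is not None and output_format not in svc.output_formats:
                continue
            matches.append(svc)
        return matches


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register(ServiceDescription(
        name="ps2pdf-migrate",
        interface="Migrate",
        endpoint="http://localhost:8080/services/ps2pdf",  # hypothetical endpoint
        input_formats=frozenset({"application/postscript"}),
        output_formats=frozenset({"application/pdf"}),
    ))
    for svc in registry.lookup("Migrate", "application/postscript", "application/pdf"):
        print(svc.name, svc.endpoint)
```

A real registry would be a shared web service rather than an in-process object. The lookup-by-capability idea is the part this sketch is meant to show.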

The Problem of Scalability
• Planets is a preservation architecture based on Web Services
  - Supports interoperability and a distributed environment
  - Sufficient for controlled experiments (Testbed)
  - Not sufficient for handling a production environment: massive, uncontrolled user requests; mass migration of hundreds of terabytes of data
• Content holders are faced with losing vast amounts of data
  - Sufficient computational resources in-house?
• There is a clear demand for incorporating Grid or Cloud technology

Integrating Virtual Clusters and Clouds
• Basic idea: extending the Planets SOA with Grid Services
• The Planets IF Job Submission Services (JSS)
  - Allow job submission to a PC cluster (e.g. Hadoop, Condor)
  - Follow the Grid approach and standards (SOAP, HPC-BP, JSDL)
• Cluster nodes are instantiated from specific system images
  - Most preservation tools are 3rd-party applications
  - The software needs to be preinstalled on the cluster nodes
• The cluster and the JSS can be instantiated in-house (e.g. in a PC lab) or on top of (leased) cloud resources (AWS EC2)
• Computation can be moved to the data, or vice versa
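A job submission service of this kind accepts a job description, returns an identifier and lets clients poll the job's state. The sketch below is hypothetical: `JobSubmissionService` and its methods are invented for illustration. The state names follow the OGSA Basic Execution Service (BES) activity states, which the HPC Basic Profile builds on. Execution is simulated with a local thread pool rather than a Hadoop or Condor cluster.

```python
import itertools
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class JobState(Enum):
    """Job states, named after the OGSA-BES activity states."""
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


class JobSubmissionService:
    """Hypothetical JSS that runs command-line jobs on a local thread pool
    standing in for the nodes of a virtual cluster."""

    def __init__(self, nodes: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=nodes)
        self._ids = itertools.count(1)
        self._states = {}
        self._lock = threading.Lock()

    def submit(self, argv):
        """Queue a job (executable path plus arguments) and return its id."""
        with self._lock:
            job_id = f"job-{next(self._ids)}"
            self._states[job_id] = JobState.PENDING
        self._pool.submit(self._run, job_id, list(argv))
        return job_id

    def _run(self, job_id, argv):
        with self._lock:
            if self._states[job_id] is JobState.CANCELLED:
                return  # cancelled before a worker picked it up
            self._states[job_id] = JobState.RUNNING
        try:
            # The tool must already be installed on the node (cf. preinstalled images).
            done = subprocess.run(argv, capture_output=True)
            final = JobState.FINISHED if done.returncode == 0 else JobState.FAILED
        except OSError:  # e.g. the executable is not installed
            final = JobState.FAILED
        with self._lock:
            self._states[job_id] = final

    def cancel(self, job_id) -> bool:
        """Cancel a job that is still pending; return True if it was cancelled."""
        with self._lock:
            if self._states.get(job_id) is JobState.PENDING:
                self._states[job_id] = JobState.CANCELLED
                return True
            return False

    def status(self, job_id) -> JobState:
        with self._lock:
            return self._states[job_id]

    def shutdown(self):
        """Wait for all queued jobs to finish and release the workers."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    jss = JobSubmissionService(nodes=2)
    job = jss.submit([sys.executable, "-c", "print('converted')"])
    jss.shutdown()
    print(job, jss.status(job).value)
```

In the setup described here, the client would send a JSDL document over SOAP instead of calling `submit` directly.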

Integration – Planets Tiered Architecture
[Figure: tiered architecture diagram. Labels include Preservation Planning, Format Registry, Service Registry, Workflow Generation, Workflow Execution Engine, Testbed Experiments workbench, 3rd-party tool services, Grid/Cloud Execution Services, Web Portal, Metadata Store, Data Model Browser and Web/Grid Clients, grouped into an Experimental Environment (qualitative) and a Production Environment.]

Experimental Setup
• Amazon Elastic Compute Cloud (EC2)
  - 1 to 150 cluster nodes
  - Custom image based on RedHat Fedora 8 (i386)
• Amazon Simple Storage Service (S3)
  - Max. 1 TB I/O
  - ~32.5 Mbit/s download / ~13.8 Mbit/s upload (measured inside the cloud)
• Apache Hadoop (v0.18)
  - MapReduce implementation
• Pre-installed command-line tools (e.g. ps2pdf)
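The quoted S3 rates limit how quickly data can be staged, which is one reason moving computation to the data can pay off. The helper below gives a back-of-the-envelope estimate. It assumes a constant single-stream rate equal to the quoted figures and decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits). It ignores protocol overhead and parallel transfers.

```python
def transfer_hours(size_bytes: float, mbit_per_s: float) -> float:
    """Hours needed to move size_bytes at a constant rate of mbit_per_s (1 Mbit = 10**6 bits)."""
    seconds = size_bytes * 8 / (mbit_per_s * 1e6)
    return seconds / 3600


TB = 10**12  # decimal terabyte

if __name__ == "__main__":
    # Cloud-internal S3 rates quoted in the experimental setup.
    for label, rate in [("download", 32.5), ("upload", 13.8)]:
        print(f"1 TB {label} at {rate} Mbit/s: {transfer_hours(TB, rate):.1f} h")
```

At these rates, moving 1 TB takes roughly 68 hours to download and 161 hours to upload over a single stream.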


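Experimental Setup
[Figure: experimental setup diagram. Components: a JSDL job description file submitted to the JSS; a virtual cluster (Apache Hadoop) of Xen virtual nodes on the cloud infrastructure (EC2); a storage infrastructure (S3) holding the raw data; and a data transfer service.]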

Experimental Results 1 – Scaling Job Size
[Figure: time [min] vs. number of tasks (1 to 1000, log scale) on 5 nodes, for EC2 and SLE runs with files of 0.07 MB, 7.5 MB and 250 MB. Annotated speedups at 1000 tasks: x(1k) = 3.5, 4.4 and 3.6, where x(1k) = t_seq / t_parallel.]
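The speedup annotated on this plot and the parallel efficiency used on the next slide can be written as follows. Here n is the number of nodes, and x(1k) is evaluated at 1000 tasks:

```latex
x(1\mathrm{k}) = \frac{t_{\mathrm{seq}}}{t_{\mathrm{parallel}}}, \qquad
e = \frac{x}{n}
```

With n = 5, the annotated values x(1k) = 3.5, 4.4 and 3.6 correspond to efficiencies of 70%, 88% and 72%.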

Experimental Results 2 – Scaling #Nodes
Series: EC2 1000 x 0.07 MB and SLE 1000 x 0.07 MB; time [min] vs. number of nodes (0 to 150); s = speedup, e = efficiency.

nodes        t [min]   s      e
1 (local)    26        1      -
1            36        0.72   72%
5            8         3.25   65%
10           4.5       5.8    58%
50           1.68      15.5   31%
100          1.03      25.2   25%
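The speedup and efficiency figures on this slide follow from the timings alone. The short script below recomputes them as a sanity check. It assumes, as the numbers imply, that speedup is measured against the 26-minute local single-node run and that efficiency is speedup divided by the number of nodes.

```python
# Timings in minutes for 1000 tasks x 0.07 MB, as given on the "Scaling #Nodes" slide.
T_LOCAL = 26.0  # local single-node reference run
EC2_TIMES = {1: 36.0, 5: 8.0, 10: 4.5, 50: 1.68, 100: 1.03}


def speedup(t_ref: float, t: float) -> float:
    """Speedup s = t_ref / t relative to the reference run."""
    return t_ref / t


def efficiency(s: float, nodes: int) -> float:
    """Parallel efficiency e = s / n."""
    return s / nodes


if __name__ == "__main__":
    for n, t in EC2_TIMES.items():
        s = speedup(T_LOCAL, t)
        print(f"n={n:3d}  t={t:5.2f} min  s={s:5.2f}  e={efficiency(s, n):.0%}")
```

The recomputed values match the slide to rounding. They also make the drop in efficiency explicit: from 65% on 5 nodes to about 25% on 100 nodes.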

Conclusions
• Preservation systems will need to employ Grid/Cloud resources
  - There is therefore a need to bridge the digital library and e-science communities.
• Cloud and virtual infrastructures provide a powerful solution for obtaining on-demand access to computational resources.
• The Planets IF Job Submission Service provides a first step
  - Submission to a virtual cluster of preservation nodes using Grid protocols.
• Performance scales with the number of nodes, though with the expected overheads (a speedup of about 25 on 100 nodes)
• Many open issues remain: security, reliability, standardization, legal aspects...

Thank you for your attention!
• Planets Project: http://www.planets-project.eu
• Contacts:
  - Ross King: ross.king@arcs.ac.at
  - Rainer Schmidt: rainer.schmidt@arcs.ac.at

Sample JSDL code
    <?xml version="1.0" encoding="UTF-8"?>
    <jsdl:JobDefinition xmlns="http://www.example.org/"
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
        xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://schemas.ggf.org/jsdl/2005/11/jsdl jsdl.xsd">
      <jsdl:JobDescription>
        <jsdl:JobIdentification>
          <jsdl:JobName>list file.txt</jsdl:JobName>
        </jsdl:JobIdentification>
        <jsdl:Application>
          <jsdl:ApplicationName>ls</jsdl:ApplicationName>
          <jsdl-posix:POSIXApplication>
            <jsdl-posix:Executable>/bin/ls</jsdl-posix:Executable>
            <!-- One Argument element per command-line argument -->
            <jsdl-posix:Argument>-la</jsdl-posix:Argument>
            <jsdl-posix:Argument>file.txt</jsdl-posix:Argument>
            <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
            <!-- ${JOB_ID} is assumed to be substituted by the submission service -->
            <jsdl-posix:Output>stdout.${JOB_ID}</jsdl-posix:Output>
            <jsdl-posix:Error>stderr.${JOB_ID}</jsdl-posix:Error>
            <!-- The schema places Environment after Input/Output/Error -->
            <jsdl-posix:Environment name="LD_LIBRARY_PATH">/usr/local/lib</jsdl-posix:Environment>
          </jsdl-posix:POSIXApplication>
        </jsdl:Application>
      </jsdl:JobDescription>
    </jsdl:JobDefinition>
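A client does not have to write JSDL by hand. The sketch below uses Python's standard `xml.etree.ElementTree` to build the same kind of document as the sample above. The function `build_job_definition` is a hypothetical helper, not part of Planets, and it covers only the few JSDL elements used in the sample. It emits one `Argument` element per argument and writes Input, Output and Error before Environment.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def jsdl(tag: str) -> str:
    """Qualified name in the JSDL namespace (ElementTree's {uri}tag notation)."""
    return f"{{{JSDL_NS}}}{tag}"


def posix(tag: str) -> str:
    """Qualified name in the JSDL POSIX application namespace."""
    return f"{{{POSIX_NS}}}{tag}"


def build_job_definition(job_name, executable, args=(), stdin=None, stdout=None,
                         stderr=None, env=None):
    """Build a minimal JSDL JobDefinition that runs one POSIX command-line tool."""
    root = ET.Element(jsdl("JobDefinition"))
    desc = ET.SubElement(root, jsdl("JobDescription"))
    ident = ET.SubElement(desc, jsdl("JobIdentification"))
    ET.SubElement(ident, jsdl("JobName")).text = job_name
    app = ET.SubElement(desc, jsdl("Application"))
    prog = ET.SubElement(app, posix("POSIXApplication"))
    ET.SubElement(prog, posix("Executable")).text = executable
    for arg in args:  # one Argument element per argument
        ET.SubElement(prog, posix("Argument")).text = arg
    # Optional stream redirections, in schema order: Input, Output, Error.
    for tag, value in (("Input", stdin), ("Output", stdout), ("Error", stderr)):
        if value is not None:
            ET.SubElement(prog, posix(tag)).text = value
    for name, value in (env or {}).items():  # Environment elements are written last
        ET.SubElement(prog, posix("Environment"), name=name).text = value
    return root


if __name__ == "__main__":
    job = build_job_definition(
        "list file.txt", "/bin/ls", ["-la", "file.txt"],
        stdin="/dev/null", stdout="stdout.${JOB_ID}", stderr="stderr.${JOB_ID}",
        env={"LD_LIBRARY_PATH": "/usr/local/lib"},
    )
    ET.indent(job)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(job, encoding="unicode"))
```

Building the document programmatically makes it harder to merge several arguments into one `Argument` element by mistake.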

Map-Reduce for Migrating Digital Objects
• Map-Reduce implements a framework and programming model for processing large documents (sorting, searching, indexing) on multiple nodes:
  - Automated decomposition (split)
  - Mapping to intermediate key/value pairs (map), optionally aggregated locally (combine)
  - Merging of the output (reduce)
• Suited to data-parallel, I/O-intensive problems
• Example: conversion of a digital object (e.g. a website, folder or archive)
  - Decompose it into atomic pieces (e.g. a file, image or movie)
  - On each node, convert each piece to the target format
  - Merge the pieces and create a new data object
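The split/map/reduce steps above can be mirrored in a few lines of Python. This is an in-process sketch, not Hadoop code. A thread pool stands in for cluster nodes, and a digital object is modeled as a dictionary of file paths to bytes. The `convert` argument is a placeholder for a real migration tool such as ps2pdf, which would run as an external process on each node.

```python
import posixpath
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model: a digital object (website, folder, archive) is a mapping
# from relative file path to the file's bytes.
DigitalObject = Dict[str, bytes]
Piece = Tuple[str, bytes]


def split(obj: DigitalObject) -> List[Piece]:
    """Decompose the object into atomic pieces, one per file."""
    return sorted(obj.items())


def make_mapper(convert: Callable[[bytes], bytes], target_ext: str) -> Callable[[Piece], Piece]:
    """Return a map function that migrates one piece and gives it the target extension."""
    def mapper(piece: Piece) -> Piece:
        path, data = piece
        stem, _old_ext = posixpath.splitext(path)
        return f"{stem}.{target_ext}", convert(data)
    return mapper


def reduce_pieces(pieces: Iterable[Piece]) -> DigitalObject:
    """Merge the converted pieces into a new digital object."""
    return dict(pieces)


def migrate(obj: DigitalObject, convert, target_ext: str, nodes: int = 4) -> DigitalObject:
    """Split, map in parallel (threads stand in for cluster nodes), then reduce."""
    mapper = make_mapper(convert, target_ext)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        converted = pool.map(mapper, split(obj))
        return reduce_pieces(converted)


if __name__ == "__main__":
    site = {"site/index.txt": b"hello", "site/img/logo.txt": b"abc"}
    # bytes.upper is a placeholder "conversion"; a real job would call a tool such as ps2pdf.
    print(migrate(site, bytes.upper, "up", nodes=2))
```

The reduce step here only collects pieces. If two files share a stem (say `a.ps` and `a.eps`), the later one wins. A real merge would also need to handle such clashes, rewrite links between pieces and carry over metadata.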
