Persistent Management of Distributed Data

Published on September 20, 2007

Author: lokesht

Source: authorstream.com

Persistent Management of Distributed Data
Reagan W. Moore
University of California, San Diego
San Diego Supercomputer Center
moore@sdsc.edu
http://www.npaci.edu/DICE/

Data and Knowledge Systems Group
  Staff: Reagan Moore, Ilkai Altintas, Chaitan Baru, Sheau Yen Chen, Charles Cowart, Amarnath Gupta, George Kremenek, Bertram Ludäscher, Richard Marciano, XuFei Qian, Roman Olshanowsky, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu
  Graduate Students: A. Bagchi, S. Bansal, A. Behere, R. Bharath, S. Bharath, M. Kulrul, L. Sui
  Undergraduate Interns: N. Cotofana, M. Shumaker, J. Trang, L. Yin

Information Management Projects
  Digital Libraries
    California Digital Library - Art Museum Image Consortium
    DARPA/USPTO Patent digital library - SAIC, NCSA, U Virginia
    NLM Visible Embryo digital library - GMU, OHSI, UC, LSU, JHU
    NSF Digital Library Initiative, Phase II - UCSB, Stanford
    NSF NPACI Digital Sky - Caltech
    NSF National Science Education Digital Library - UCAR, Cornell, UCSB, U Mass, Columbia
  Data Grid Environments
    DOE Data Visualization Corridor - LLNL, OSU
    DOE Particle Physics Data Grid - Stanford, Caltech
    NASA Information Power Grid - NASA Ames, USC/ISI, U Texas
    NIH Biomedical Informatics Research Network - Duke, Harvard
    NSF Grid Physics Network - U Florida, Caltech, U Wisc, USC/ISI, U Chicago
    NSF National Virtual Observatory - JHU, Caltech
    NSF Southern California Earthquake Center - USC/ISI
  Persistent Archives
    NARA Persistent Archive - UCB, U Maryland
    NHPRC Archivist workbench - U Minnesota
    NSF NSDL Persistent archive for curricula modules - Cornell

Topics
  Data management systems
    Data collections, digital libraries
  Distributed data management
    Data grids
  Persistent data management
    Persistent archives
  Common infrastructure for data management

Data Collections
  Define the context for describing a collection of digital entities
  Context specified by metadata attributes:
    Provenance: origin of the digital entities
    Administrative: location of the digital entities
    Technical: purpose of the digital entities
  Support organization of attributes as a hierarchy of sub-collections

Digital Libraries
  Provide services on the data collection:
    Ingestion: loading of attribute values
    Extensibility: definition of new attributes
    Discovery: queries on attributes
    Browsing: hierarchical listing
    Presentation: formatting according to specified data models

Data Grids
  Manage data in a distributed environment:
    Logical name space: provide a global identifier
    Data access: storage system abstraction
    Replication: disaster backup
    Uniform access: common API across file systems, archives, and databases
    Single sign-on: authenticate across administration domains

Persistent Archives
  Manage technology evolution:
    Storage system abstraction: support data migration across storage systems
    Information repository abstraction: support catalog migration to new databases
    Logical name space: support a global persistent identifier

Storage Resource Broker
  Integration of collection-based management of digital entities with:
    Remote data access through storage system abstraction
    Catalog access through information repository abstraction
    Automation through collection-owned data

Capabilities
  Support legacy systems
  Integrate archives with file systems
  Share distributed data
  Maintain persistent collections
  Control data access

Uniform API
  Provide common access semantics
  Map from the interface preferred by your application to the interfaces required by legacy storage systems

[Figure: SDSC Storage Resource Broker & Meta-data Catalog architecture, highlighting Common APIs. Labels: Application; Access APIs (Java, NT Browsers, Web, WSDL, GridFTP, HRM, Linux I/O, DLL / Python, Prime Server); Servers; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Consistency Management / Authorization-Authentication; Storage Abstraction; Catalog Abstraction; Databases (DB2, Oracle, Sybase).]

Discovery Transparencies
  Naming transparency: find a data set without knowing its name
    Map from attributes to a global file name
  Location transparency: access a data set without knowing where it is
    Map from global file name to local file name
  Access transparency: access a data set without knowing the type of storage system
    Federated client-server architecture

[Figure: the same SRB architecture diagram, highlighting Transparencies.]

Persistent Collection
  Maintain authenticity
    Authenticate all accesses
    Assign roles for access control lists (curation, write, annotate, read)
    Manage audit trails of all operations
  Collection-owned data
    All accesses go through the data management system

[Figure: the same SRB architecture diagram, highlighting Persistency.]

Preservation (similar requirements to a data grid)
  Name transparency
    Find a file by attributes (map from attributes to global name)
  Location transparency
    Access a file by a global identifier (map from global to local file name)
  Access transparency
    Use the same API to access data in an archive or a file cache
  Authenticity
    Disaster recovery: replicate data across storage systems
    Audit and process management

[Figure: the same SRB architecture diagram, highlighting Preservation.]

Convergence of Technologies
  Data grids as the basis for distributed data management
    Federation of distributed resources
    Creation of a logical name space to automate discovery
  Distributed data collections
    Discovery based on attributes
  Distributed data storage systems
  Digital libraries
    Development of services for manipulating and viewing data
  Persistent archives
    Management of technology evolution

Digital Entities
  Digital entities are “images of reality”, made of:
    Data: the bits (zeros and ones) put on a storage system
    Information: the attributes used to assign semantic meaning to the data
    Knowledge: the structural relationships described by a data model
  Every digital entity requires information and knowledge to be correctly interpreted and displayed

Differentiating between Data, Information, and Knowledge
  Data
    Digital object
    Objects are streams of bits
  Information
    Any tagged data, which is treated as an attribute
    Attributes may be tagged data within the digital object, or tagged data that is associated with the digital object
  Knowledge
    Relationships between attributes
    Relationships can be procedural/temporal, structural/spatial, logical/semantic, or functional

Knowledge Creation Roadmap
  Knowledge syntax (consensus)
    RDF, XMI, Topic Map
  Knowledge management (recursive operations)
    Oracle parallel database
  Knowledge manipulation (spatial/procedural rules)
    Generation of inference rules and mapping to data models
  Knowledge generation (scalable inference engine)
    Application of inference rules in an inference engine

[Figure: Knowledge Based Data Grid Roadmap, relating Knowledge, Information, and Data to Ingest Services, Management, and Access Services (model-based access). Labels include Relationships Between Concepts, XTM DTD, Rules - KQL, Knowledge Repository for Rules, Knowledge or Topic-Based Query / Browse, Attributes, Semantics, XML DTD, SDLIP, Information Repository, Attribute-based Query, Feature-based Query, Fields, Containers, Folders, Data Handling System (SRB), MCAT/HDF, Grids, and Storage (Replicas, Persistent IDs).]

Information Management Projects
  Digital Libraries
    California Digital Library - Art Museum Image Consortium
    DARPA/USPTO Patent digital library - SAIC, NCSA, U Virginia
    NLM Visible Embryo digital library - George Mason University
    NSF Digital Library Initiative, Phase II - UCSB, Stanford
    NSF NPACI Digital Sky - Caltech 2MASS sky survey
  Data Grid Environments
    DOE Data Visualization Corridor - LLNL
    DOE Particle Physics Data Grid - Stanford, Caltech
    NASA Information Power Grid - NASA Ames
    NIH Biomedical Informatics Research Network - Duke, Harvard
    NSF Grid Physics Network - U Florida
    NSF National Virtual Observatory - Johns Hopkins University, Caltech
    NSF Southern California Earthquake Center - USC/ISI
  Persistent Archives
    NARA Persistent Archive - UCB, U Maryland
    NHPRC Archivist workbench - U Minnesota
    NSF NSDL Persistent archive for curricula modules - Cornell University

Additional Projects
  ACS Alliance for Cell Signaling
  NSF Digital Government - Fed Web
  DOE - SciDAC Grid Portal
  DOE - SciDAC Scientific Data Management Project
  Hayden Planetarium

NASA Information Power Grid
  Develop a digital library interface to the archives at NASA Ames (SRB)
  Demonstrate high performance data access across both NASA and NSF resources
  Demonstrate telescience through NASA resources (link an electron microscope at UCSD with an image collection at NASA Ames, and process the data at NASA Ames)

Hayden Planetarium
  Provide a data collaboration environment
  Share data between NCSA (simulations of solar system evolution), SDSC (3D visualizations), and Hayden (review)
  Manage 3-6 TB of data
  Provide seamless access across administration domains and storage resources

The SRB is great. It has been utterly essential in this project - we could not have done this work without the SRB. We have used the SRB as a central shared repository for raw data, derived data, and visualization results. Data has been submitted by, and retrieved by each of the partners in this project on a daily basis. Email back and forth frequently includes SRB directory names into which data or results have been stored for broad review at different sites. The visualization animations we've produced are immediately placed into the SRB where they are downloaded, simultaneously, by people at NCSA and the museum in New York. The animations are reviewed, new data generated, email exchanged, and new images rendered and put back into the SRB to start the next review cycle. The whole thing has worked flawlessly. I am delighted and will gladly promote its virtues at any opportunity. As you've seen, I do have some suggestions for future functionality here and there. The "migration awkwardness" is one. But these suggestions are for added features or minor interface smoothing. They should in no way diminish the fact that it all works wonderfully! Thanks!
-- Dave

Visible Embryo Project
  Build a digital library of images and reports for use by educators and physicians
  Manage transfer of images from the Armed Forces Institute of Pathology to an archive at SDSC
  Provide access to the material
  Use the SRB/MCAT system to assemble a digital library

[Figure: Visible Embryo Project network, connecting SDSC, Los Angeles, Oakland, OHSU, UIC, StarTAP, Eolas, GMU, AFIP, and JHU over Abilene, vBNS, and other links (OC-3, OC-12, DS3), with disk caches, image generation, and an archive.]

National Virtual Observatory
  Federate existing sky survey image collections and catalogs
  Support statistical analyses across multiple surveys
  Implement a services support environment
  Use the SRB/MCAT to support bulk data access, replicate the major sky surveys, and support large-scale database record analyses

[Figure: National Virtual Observatory Data Grid. Labels include Portals and Workbenches; Knowledge & Resource Management; Concept space; Metadata View and Data View; Bulk Data Analysis and Catalog Analysis; Information Discovery, Data Discovery, Metadata delivery, and Data Delivery; Catalog Mediator and Data mediator; Standard Metadata format, Data model, Wire format; Standard APIs and Protocols; Catalog/Image Specific Access; Grid Security, Caching, Replication, Backup, Scheduling; Compute Resources, Catalogs, Data Archives, and Derived Collections.]

Particle Physics Data Grid
  Support replication of data sets for the high energy physics grid
  Federate data collections for the BaBar experiment at SLAC using the SRB
  Federate BaBar collections between SLAC and Lyons, France
  Support a web service interface and a derived data product catalog

Particle Physics Data Grid - Replication System
  [Figure: Particle Physics Data Grid replication system.]

National STEM Education Digital Library - NSDL
  Provide a persistent archive for educational material indexed in the NSDL repository
  Develop knowledge spaces to characterize collection holdings
  Map knowledge spaces to the AAAS 2061 concept space and to state-mandated grade level concepts

[Figure: NSDL architecture. Labels include User Interfaces, Usage Enhancement, Collection Building, Metadata & data access-based services, Core NSDL Bus, Meta-data delivery, Data delivery, Query, Global Ids, Security, Network, Virtual Collections & Mediators, Information about collections, Delivery, Presentation, and Aggregation - Channels.]

National Archives and Records Administration
  Develop a prototype persistent archive for NARA digital holdings
  Identify pertinent research areas for long-term preservation of data, information, and knowledge
  Apply data grid technology to the implementation of persistent archives
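
The Persistent Archives and Data Grids slides pair a logical name space (a global persistent identifier) with a storage system abstraction, so that data can be replicated or migrated to new storage without changing the name users refer to. The Python sketch below illustrates that idea under stated assumptions: the classes StorageDriver, MemoryStorage, Replica, and Catalog, their methods, and the path-naming rule are hypothetical and are not the SRB or MCAT interfaces; in-memory dictionaries stand in for real file systems and archives.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class StorageDriver:
    """Hypothetical storage-system abstraction: every back end (file system,
    archive, database) is reached through the same put/get calls."""

    def put(self, path: str, data: bytes) -> None:
        raise NotImplementedError

    def get(self, path: str) -> bytes:
        raise NotImplementedError


class MemoryStorage(StorageDriver):
    """Toy back end holding bytes in a dict; stands in for a real
    file system or tape archive."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]


@dataclass
class Replica:
    resource: str  # storage resource holding this copy
    path: str      # local (physical) file name on that resource


@dataclass
class Catalog:
    """Maps global logical names to physical replicas (location transparency)."""

    resources: Dict[str, StorageDriver] = field(default_factory=dict)
    replicas: Dict[str, List[Replica]] = field(default_factory=dict)

    @staticmethod
    def _physical_path(logical_name: str) -> str:
        # Hypothetical naming rule for the local file name on a resource.
        return logical_name.strip("/").replace("/", "_")

    def ingest(self, logical_name: str, data: bytes, resource: str) -> None:
        path = self._physical_path(logical_name)
        self.resources[resource].put(path, data)
        self.replicas.setdefault(logical_name, []).append(Replica(resource, path))

    def read(self, logical_name: str) -> bytes:
        # Callers use only the logical name; the first listed replica is served.
        replica = self.replicas[logical_name][0]
        return self.resources[replica.resource].get(replica.path)

    def replicate(self, logical_name: str, resource: str) -> None:
        """Add a copy on another resource (e.g. for disaster recovery)."""
        if any(r.resource == resource for r in self.replicas[logical_name]):
            return  # a replica already exists on that resource
        path = self._physical_path(logical_name)
        self.resources[resource].put(path, self.read(logical_name))
        self.replicas[logical_name].append(Replica(resource, path))

    def migrate(self, logical_name: str, new_resource: str) -> None:
        """Move to a new storage system: replicate, then drop the catalog
        entries for the old copies. The logical name does not change.
        (The toy drivers have no delete call, so old bytes stay in place.)"""
        self.replicate(logical_name, new_resource)
        self.replicas[logical_name] = [
            r for r in self.replicas[logical_name] if r.resource == new_resource
        ]


# Usage with hypothetical resource and collection names.
catalog = Catalog(resources={"old-archive": MemoryStorage("old-archive"),
                             "new-archive": MemoryStorage("new-archive")})
catalog.ingest("/nara/records/memo-001", b"scanned memo", "old-archive")
catalog.migrate("/nara/records/memo-001", "new-archive")
print(catalog.read("/nara/records/memo-001"))  # b'scanned memo'
print([r.resource for r in catalog.replicas["/nara/records/memo-001"]])  # ['new-archive']
```

Because callers only ever hold the logical name, migrating the bytes to a new storage system leaves every existing reference valid; only the catalog's mapping from global name to local file name changes.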

Grid Physics Network
  Develop infrastructure to support virtual data products
  Create a repository for derived data products
  Automate the extraction of metadata from Virtual Data Language files
  Automate the extraction of administrative data through grid portals

GridPort + SRB Architecture
  With SRB capabilities, file access is direct and uniform
  Uses the same authentication as the portal and other Grid services
  Single SRB account access allows more flexible data management

DOE Scientific Data Management
  Develop knowledge management tools to mediate between biological information resources
  Integrate the tools into the DOE scientific data management environment

Further Information
  http://www.npaci.edu/DICE
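
The Discovery Transparencies and Persistent Collection slides describe finding data by its attributes rather than its name, and routing every access through roles (curation, write, annotate, read) while keeping an audit trail of all operations. The Python sketch below illustrates those ideas under stated assumptions: the PersistentCollection class, its methods, and the rule that the four roles form a strict hierarchy are hypothetical choices for this example, not the SRB's actual access-control model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Roles named on the Persistent Collection slide. ASSUMPTION for this sketch:
# they form a strict hierarchy, each role holding every permission after it.
ROLE_ORDER = ["curation", "write", "annotate", "read"]


def allows(role: str, operation: str) -> bool:
    """True if `role` is at least as privileged as `operation` requires."""
    return ROLE_ORDER.index(role) <= ROLE_ORDER.index(operation)


@dataclass
class PersistentCollection:
    # global name -> descriptive attributes (the "information" about the data)
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # (user, global name) -> role
    acl: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # audit trail: (UTC timestamp, user, operation, global name, allowed?)
    audit: List[Tuple[str, str, str, str, bool]] = field(default_factory=list)

    def _log(self, user: str, operation: str, name: str, ok: bool) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((stamp, user, operation, name, ok))

    def _authorize(self, user: str, name: str, operation: str) -> bool:
        role = self.acl.get((user, name))
        ok = role is not None and allows(role, operation)
        self._log(user, operation, name, ok)  # refused attempts are logged too
        return ok

    def register(self, curator: str, name: str, **attributes: str) -> None:
        """Add an entity; the registering user becomes its curator."""
        self.metadata[name] = dict(attributes)
        self.acl[(curator, name)] = "curation"
        self._log(curator, "register", name, True)

    def grant(self, curator: str, user: str, name: str, role: str) -> None:
        if not self._authorize(curator, name, "curation"):
            raise PermissionError(f"{curator} cannot change access to {name}")
        self.acl[(user, name)] = role

    def annotate(self, user: str, name: str, key: str, value: str) -> None:
        if not self._authorize(user, name, "annotate"):
            raise PermissionError(f"{user} cannot annotate {name}")
        self.metadata[name][key] = value

    def find(self, user: str, **attributes: str) -> List[str]:
        """Naming transparency: map attribute values to global names,
        returning only entities the user may read."""
        matches = [n for n, attrs in self.metadata.items()
                   if all(attrs.get(k) == v for k, v in attributes.items())]
        return [n for n in matches if self._authorize(user, n, "read")]


# Usage with hypothetical users and names.
coll = PersistentCollection()
coll.register("curator", "/sky/2mass/tile-17", survey="2MASS", band="J")
coll.grant("curator", "alice", "/sky/2mass/tile-17", "annotate")
coll.annotate("alice", "/sky/2mass/tile-17", "quality", "good")
print(coll.find("alice", survey="2MASS"))  # ['/sky/2mass/tile-17']
print(coll.find("bob", survey="2MASS"))    # []
print(len(coll.audit))                     # 5, including bob's refused read
```

The strict role hierarchy is only the simplest reading of the slide's role list; a real system could just as well attach an independent permission set to each role.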
