Open Archives Initiative Protocol for Metadata Harvesting


Published on October 8, 2009

Author: chessmu

Source: slideshare.net

Description

Dublin Core conference 2009 Seoul, Oct 2009

The Open Archives Initiative Protocol for Metadata Harvesting

Muriel Foulonneau, Tudor Research Centre

Dublin Core conference 2009, Seoul, 10/2009

The protocol was born:

To create a minimal layer of interoperability between distributed repositories of scientific publications

An alternative to federated search

Networking of digital repositories

“OAI divides the world between data providers and service providers”

Sharing metadata: data aggregation

The portal gathers metadata and implements its own retrieval system

E.g. search engines, union catalogs, OAI

[Diagram: sample metadata record, <title>My resource</title> <date>04]

The OAI framework

[Diagram: repositories act as data providers; a harvester run by a service provider, or an intermediate aggregator, collects their metadata and feeds a portal interface]

Mechanisms to transfer large datasets

Resumption tokens

Incremental harvesting
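A minimal sketch of how a harvester might page through a large result list with resumption tokens. The function names, the injectable fetch argument, and the default metadata prefix are assumptions made for illustration; only the ListRecords verb, the metadataPrefix argument, and the resumptionToken element come from the protocol.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(url):
    """Download one OAI-PMH response and return its parsed root element."""
    with urllib.request.urlopen(url) as response:
        return ET.fromstring(response.read())


def harvest_records(base_url, metadata_prefix="oai_dc", fetch=fetch_xml):
    """Yield every <record> element, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = fetch(base_url + "?" + urllib.parse.urlencode(params))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the last page of the list.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing fetch as a parameter keeps the paging logic testable without a live repository; a real harvester would also want error handling and polite delays between requests.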

Incremental harvest

The harvester asks the data providers: “What’s new since the last time I came?” The answer covers:

New or modified records

Deleted records
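The incremental harvest can be sketched as a ListRecords request with the protocol's from argument, followed by a pass over the returned headers. The helper names and the sample identifiers below are hypothetical, and whether a repository reports deletions at all depends on the deleted-record policy it declares in its Identify response.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical response fragment: one modified record, one deleted record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header status="deleted"><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def incremental_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords request for records changed since last_harvest (a UTC date string)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": last_harvest}
    return base_url + "?" + urllib.parse.urlencode(params)


def split_changes(root):
    """Separate identifiers of new or modified records from those flagged as deleted."""
    changed, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier", default="").strip()
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            changed.append(identifier)
    return changed, deleted


changed, deleted = split_changes(ET.fromstring(SAMPLE))
```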

OAI is based on standards

HTTP protocol

XML and XML Schemas

Dublin Core

Multiple representations of an object

Example: “School of arts for girls [Kiz Sanayi Mektebi]”, oai:lcoa1.loc.gov:loc.pnp/cph.3b23005, available as Dublin Core, MARC21, and MODS

OAI repositories can be organized in sets

Enable selective harvesting

Sets can overlap: 1 item in multiple sets

Sets can be described (e.g. with DC or DC Collection)
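To illustrate overlapping sets, here is a small sketch that indexes record identifiers by the setSpec elements in their headers. The sample response, its identifiers, and the "images" set are made up for the example; a selective harvest would normally be requested by adding the set argument to ListRecords or ListIdentifiers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical ListIdentifiers fragment: the second item belongs to two sets.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <setSpec>emblems</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <setSpec>emblems</setSpec>
      <setSpec>images</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def index_by_set(root):
    """Map each setSpec to the identifiers of the items that declare it."""
    sets = defaultdict(list)
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier", default="").strip()
        for spec in header.findall(OAI_NS + "setSpec"):
            sets[(spec.text or "").strip()].append(identifier)
    return dict(sets)


by_set = index_by_set(ET.fromstring(SAMPLE))
```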

OAI supports 6 verbs

Identify

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify

ListSets

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets

ListRecords

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc

ListMetadataFormats

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

ListIdentifiers

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc

GetRecord

http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
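The requests above are plain HTTP GET URLs, so they can be assembled with a few lines of code. This is a sketch only: the oai_request helper is a hypothetical name, and the base URL is simply the endpoint quoted on the slide, which may no longer be online.

```python
from urllib.parse import urlencode

BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"  # endpoint quoted on the slide

VERBS = {
    "Identify",
    "ListSets",
    "ListMetadataFormats",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request(verb, base_url=BASE_URL, **arguments):
    """Return the request URL for one of the six verbs plus its arguments."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **arguments}
    # Leave ':' unescaped so OAI identifiers stay readable in the URL.
    return base_url + "?" + urlencode(query, safe=":")


url = oai_request(
    "GetRecord",
    identifier="oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
    metadataPrefix="oai_dc",
)
```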

An OAI response

<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>

The optional about section is often not used

It can serve, e.g., to state rights on the metadata record
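As a rough illustration of how a harvester might read such a record, the sketch below parses the slide's example with Python's standard library. The summarize function is a hypothetical helper, and the default namespace on the record element is added here as an assumption: in a full response it would be inherited from the enclosing OAI-PMH root element.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The record from the slide; the xmlns on <record> is an assumption (see above).
RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def summarize(record):
    """Pull the header fields and Dublin Core creators out of one record."""
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "creators": [c.text for c in record.findall(".//dc:creator", NS)],
    }


summary = summarize(ET.fromstring(RECORD))
```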

Examples of repositories

Library of Congress

http://memory.loc.gov/cgi-bin/oai2_0

ContentDM at UIUC

http://images.library.uiuc.edu:8081/cgi-bin/oai.exe

Ohio State Knowledge Bank

https://kb.osu.edu/dspace-oai/request

PictureAustralia

Aggregates from large institutions

Web crawling for small ones

Flickr for individuals

“Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”

http://www.pictureaustralia.org/schemas/pa/index.html

DRIVER – aggregation as an infrastructure

Europeana

IVOA – synchronization of service repositories

Turnkey systems

ContentDM : http://contentdm.com/

Digitool : http://www.exlibrisgroup.com/digitool.htm

DSpace : http://www.dspace.org/

EPrints : http://software.eprints.org/

Interoperability in practice: quality issues with OAI aggregations

Metadata formats

DC, QDC, ETDMS, MODS, MARC, EAD, …

Require an XML schema

Most implementations only use simple DC

Examples of values found in dc:date

September 29–October 28, 51 AD; 1970

second half of IXth century AD; 1978

Rebuilt 1984

Possibly Vth/VIth century AD; 1935

Planted 1985

n/a

n.d.

Mid IInd century AD; 1973

Jul-51

circa 900 AD

ca. 701 BC

Begun 14th century

184-?

1839

18–?

August 23, 2000

between 1827 and 183

VIIIth/IXth century AD ? (TC);1965

Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982

XVIII Dynasty

Winter 2003

era of redevelopment

various

2002-00

1980, refurbished 1997

China: Neolithic Period (5000 BCE-ca 1600 BCE)?

19691968

21. Nouemb. Anno. 1564 .

And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe [1479]]

19193

xxxx Oct xx

Various

1938-05-38

1963 to 1953

[not after 1579]

163[5?]
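Values like these are hard for an aggregator to sort or filter. As a minimal sketch, assuming the service provider wants to separate machine-readable dates from free text, the check below accepts only the date-only W3CDTF patterns (YYYY, YYYY-MM, YYYY-MM-DD). The function name is hypothetical, and time-of-day forms are deliberately left out.

```python
import re
from datetime import date

# Date-only subset of W3CDTF: YYYY, YYYY-MM, or YYYY-MM-DD.
W3CDTF_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_w3cdtf_date(value):
    """Return True when value matches the pattern and names a real calendar date."""
    match = W3CDTF_DATE.match(value.strip())
    if not match:
        return False
    # Missing month or day defaults to 1 so the calendar check still runs.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


samples = ["1839", "August 23, 2000", "1938-05-38", "2002-00", "19193", "n.d."]
usable = [value for value in samples if is_w3cdtf_date(value)]
```

Of the six sample values taken from the slide, only "1839" passes; the rest would need normalization or manual review before they could drive a date search.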

Who is metadata made for?

A machine

dc:type “Text.Correspondence.Letter”

dc:language “wln”

A human

dc:type “Correspondence”

dc:language “wallon”

Who knows?

dc:date “197-”

dc:description “First ed. Cf. BM.”

Improving quality

Quality certificates for open access repositories

DINI - Deutsche Initiative für Netzwerkinformation

Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library

http://www.diglib.org/pubs/dlf108.pdf

Meeting with software providers

Test environment (e.g. Europeana)

Community guidelines

Conclusion

Has the protocol “crossed the chasm”?

The objective is to create a network of repositories rather than networking individual resources

Lack of specific mechanism to relate resources to each other

Approach to linked data and OAI-ORE

References

OAI-PMH

http://www.openarchives.org/pmh/

Best practices for OAI and shareable metadata

http://www.diglib.org/pubs/dlf108.pdf

Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007

Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008
