ARK tools


Published on May 14, 2007

Author: TSBaG

Source: authorstream.com

Overview of Archival Resource Key (ARK) Tools
  1 July 2005
  John Kunze, California Digital Library

ARK Summary
  Instead of one Name Authority: Assigning Authority + Mapping Authorities

    http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
    \___________________/ \__/ \___/ \______/ \____________/
        (replaceable)      |     |      |      4 Qualifier
             |        ARK Label  |      |      (NMA-supported)
             |                   |      |
    1 Name Mapping Authority     |    3 Name (NAA-assigned)
        Hostport (NMAH)          |
                  2 Name Assigning Authority Number (NAAN)

  1 = current service provider; identity inert; replaceable
  2 = organization that originally assigned the id
  3 = name originally assigned to the abstract object, often opaque
  4 = extension disclosing object hierarchy & variants, often non-opaque

ARK usage
  Two ARKs accessing the same thing
    http://loc.gov/ark:/12025/654xz321
    http://rutgers.edu/ark:/12025/654xz321
  Access to metadata -- add a ‘?’
    http://loc.gov/ark:/12025/654xz321?
  Access to support statement -- add ‘??’
    http://loc.gov/ark:/12025/654xz321??
  3 minimal requirements to be an ARK
  An archive that can’t do all 3 -- trustworthy?
  Is an ARK persistent? Maybe. Have to ask.

Persistence and opaqueness
  Do ARKs have to be this ugly (opaque)?

    http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
    \___________________/ \__/ \___/ \______/ \____________/
            NMAH          Label NAAN   Name      Qualifier

  No, but they encourage it.
  Persistence is all about managing associations between strings and things
  And the landscape is littered with links that were required to die for political, legal, or social reasons
  The appearance, deliberate or even accidental, of once-true assertions that are now misleading, infringing, or offensive makes it hard for our descendants to continue managing them
  Pain of managing opaque ids is mitigated by the certainty of having strongly bound metadata

A hostname may also break
  Did it break because it appears to assert a branding that is no longer relevant?
  Have to pay attention to this.
  Semantic rot is inevitable in all ids
  The more opaque, the more protected
  Non-opaque ids are very useful ad hoc metadata containers; in the tradeoff, consider the more regular and complete metadata promised by ARKs
  Non-opaque service label extensions to opaque base ARKs are suitable; eg, “thumb”, “hi-res”

When the hostname breaks
  Use low-tech file lookup (like old /etc/hosts)
  Or use MAPTR algorithm in client or plug-in
  Resolver discovery using vanilla DNS and script:

    use Net::DNS;                        # include simple DNS package
    my $qtype = "NAPTR";                 # initialize query type
    my $naa = shift;                     # get NAAN script argument
    my $mad = new Net::DNS::Resolver;    # mapping authority discovery
    &maptr("$naa.ark.arpa");             # call maptr - that's it

    sub maptr {                          # recursive maptr algorithm
        my $dname = shift;               # domain name as argument
        my ($rr, $order, $pref, $flags, $service, $regexp, $replacement);
        my $query = $mad->query($dname, $qtype);
        return if (! $query || ! $query->answer);
        foreach $rr ($query->answer) {
            next if ($rr->type ne $qtype);
            ($order, $pref, $flags, $service, $regexp, $replacement) =
                split(/\s/, $rr->rdatastr);
            if ($flags eq "") {
                &maptr($replacement);    # recurse
            } elsif ($flags eq "h") {
                print "$replacement\n";  # candidate NMAH
            }
        }
    }

ARK lexical goodies
  Hyphens ignored
  Neutralizes harm done by typesetters
  Too many search results?
  Providers may disclose (or not)…
    Sub-object hierarchy using reserved ‘/’
    Variant objects using reserved ‘.’
  Usual %hh (hex encoding) as an escape

ARK namespaces reserved
  12025  National Library of Medicine
  12026  Library of Congress
  12027  National Agriculture Library
  13030  California Digital Library
  13038  World Intellectual Property Organization
  20775  University of California San Diego
  29114  University of California San Francisco
  28722  University of California Berkeley
  15230  Rutgers University Libraries
  13960  Internet Archive
  64269  Digital Curation Centre
  62624  New York University Libraries
  67531  University of North Texas Libraries
  27927  Ithaka Electronic-Archiving Initiative
  12148  National Library of France
  Reserve a namespace by email to ark@cdlib.org

The Their Stuff problem is easier
  We can’t do much about Their Stuff except defensively test and fix Our links to it
  Not worth Our ARKs -- we can’t vouch for the objects
  Indirect naming may help (eg, PURL, SFX, etc)
  So get a link validator, staff to replace dead URLs, and figure out how much effort you’ll expend
  Email Them (external providers), if appropriate, but if They don’t maintain their ids, no scheme will help

Our Stuff Solutions for persistent identifier problems
  Identifier maintenance is different from but deeply implicated in collection mgmt
  Recall: an identifier is [a string and] an association between a string and a thing
  If you maintain object metadata, you already maintain ids (assuming your object has an id)
  Natural to maintain redirection info as one more column of metadata, and ask your DB admin to nightly recreate web server redirect config files

Opaque identifier tools
  Non-opaque identifier strings are chosen deliberately to assert some things that are true at the time of assignment
  Opaque identifier strings are best chosen by automated means, such as NOID (nice opaque identifier)
  Or UUID/GUID (universally unique identifier)
    Sequence of hex encodings of your computer’s MAC address, current time, and sometimes a random number
    No need to ask permission or register yourself
    Looks like something found in nature, but actually it’s based on IEEE and hardware vendor registries

Nice opaque identifiers (NOID)
  A noid minter is a lightweight database for generating, tracking, and binding unique ids
  The noid tool creates minters and accepts commands that operate them
  Open source, available at www.cpan.org
  Can mint in random or sequential order, with or without a check character guaranteeing against the most common transcription errors
  Anyone can run a noid minter, maintain associations via bindings to arbitrary elements (assertions), and set up a resolver (including rule-based)

Using NOID
  Identifiers minted according to a template:
    noid dbcreate f5.reedeedk long 13030
  which produces as first minted id
    13030/f54x54g11
  Noid is scheme-independent
  Can be used to mint DOIs, URNs, URLs, lotto numbers, etc.
  We (at CDL) use it to mint random ARKs with check chars

ARK Documentation
  ARK specification
    http://www.ietf.org/internet-drafts/draft-kunze-ark-09.txt
  ARK information sites
    http://www.cdlib.org/inside/diglib/ark/
    http://ark.nlm.nih.gov/
  Overview article
    http://www.infotoday.com/cilmag/feb04/primers.shtml
  Background paper
    http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze
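
Sketch: taking an ARK apart and testing its check character

The ARK anatomy (NMAH, NAAN, Name, Qualifier), the ‘?’ and ‘??’ inflections, the "hyphens ignored" rule, and the noid check character can be tried out together in a few lines of Python. This is a minimal sketch under stated assumptions, not the ARK specification's grammar or the noid tool's own code. The names parse_ark, check_char, has_valid_check_char, and the regular expression are made up for illustration. The 29-symbol alphabet and the position-weighted sum are taken from my reading of the noid documentation, not from the slides; they do reproduce the final "1" of the slide's example id 13030/f54x54g11.

```python
import re
from typing import NamedTuple, Optional

# "Extended digit" alphabet (digits plus selected consonants, 29 symbols).
# Assumption: taken from the noid documentation, not stated on the slides.
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"

# Assumed shape: optional http(s)://NMAH/ prefix, then ark:/NAAN/Name,
# an optional qualifier starting at the first '/' or '.', and an
# optional trailing '?' (metadata) or '??' (support statement).
ARK_RE = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"
    r"ark:/(?P<naan>\d+)/(?P<name>[^/.?]+)"
    r"(?P<qualifier>[^?]*)(?P<inflection>\?{0,2})$"
)


class Ark(NamedTuple):
    nmah: Optional[str]  # replaceable hostport; None if absent
    naan: str            # Name Assigning Authority Number
    name: str            # NAA-assigned name, hyphens removed
    qualifier: str       # hierarchy/variant extension, may be empty
    inflection: str      # "", "?" or "??"


def parse_ark(text: str) -> Ark:
    """Split an ARK (with or without a hostname) into its parts."""
    m = ARK_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not an ARK: {text!r}")
    # Hyphens are ignored in the ARK itself, so drop them for comparison.
    # The hostname is left alone: hyphens are significant there.
    return Ark(
        nmah=m["nmah"],
        naan=m["naan"],
        name=m["name"].replace("-", ""),
        qualifier=m["qualifier"].replace("-", ""),
        inflection=m["inflection"],
    )


def check_char(base: str) -> str:
    """Check character for `base`: sum of position * symbol value, mod 29.

    Symbols outside the alphabet (such as '/') count as zero.
    """
    total = sum(
        pos * max(XDIGITS.find(ch), 0)
        for pos, ch in enumerate(base, start=1)
    )
    return XDIGITS[total % len(XDIGITS)]


def has_valid_check_char(ark: Ark) -> bool:
    """True if the last character of NAAN/Name is the expected check char."""
    ident = f"{ark.naan}/{ark.name}"
    return check_char(ident[:-1]) == ident[-1]


if __name__ == "__main__":
    print(parse_ark("http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff"))
    minted = parse_ark("ark:/13030/f54x-54g11")  # hyphen added by a typesetter
    print(minted.name, has_valid_check_char(minted))
```

Because the weights depend on position, swapping two adjacent different characters changes the sum, which is how the check character catches common transcription errors. Only ids minted with a check character (template ending in k) can be tested this way.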
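
Sketch: redirect info as one more column of metadata

The "Our Stuff" slide suggests keeping the redirection target next to the rest of the object metadata and regenerating the web server's redirect configuration nightly. A minimal sketch of that idea follows. Everything concrete in it is an assumption for illustration: the table name objects, the columns ark and target_url, the example.org URLs, and the choice of Apache mod_alias "Redirect" lines as the output format (the slides do not name a web server).

```python
import sqlite3
from typing import List


def redirect_lines(conn: sqlite3.Connection) -> List[str]:
    """Build one redirect line per object that has a current location.

    Hypothetical schema: objects(ark, title, target_url).
    """
    rows = conn.execute(
        "SELECT ark, target_url FROM objects "
        "WHERE target_url IS NOT NULL ORDER BY ark"
    )
    # Apache mod_alias syntax: Redirect URL-path URL
    return [f"Redirect /{ark} {url}" for ark, url in rows]


def demo_database() -> sqlite3.Connection:
    """In-memory stand-in for the collection's metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (ark TEXT PRIMARY KEY, title TEXT, target_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [
            ("ark:/13030/f54x54g11", "Example object",
             "http://content.example.org/objects/1"),
            ("ark:/13030/xf93gt2q", "Object with no current location", None),
        ],
    )
    return conn


if __name__ == "__main__":
    # A nightly job would write this text to the server's config file.
    print("\n".join(redirect_lines(demo_database())))
```

The identifier stays fixed while target_url is just another field to update when an object moves, which is the point of treating redirection as metadata.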
