day slides


Published on May 2, 2008

Author: Rachele


A survey of Web preservation initiatives:
Michael Day
UKOLN, University of Bath
7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2003), Trondheim, Norway, 17-22 August 2003

Presentation overview:
The importance of the Web
Challenges: technical, legal, and organisational challenges
Approaches to collection: harvesting-based, selective, and deposit; combined approaches
Discussion: collection and access policies, software, costs, long-term preservation

Importance of the Web:
An all-pervasive communication medium:
  In research: scientists are 'increasingly reliant' on the Web for supporting research (Hendler, 2003)
  Wider societal role: personal communication, e-commerce, etc.
"… the information source of first resort for millions of readers" (Lyman, 2002)

The UKOLN study:
Feasibility study produced for:
  Joint Information Systems Committee (JISC)
  Wellcome Library
A survey of initiatives
Recommendations for the JISC and Wellcome Library
Supplementary legal study (Charlesworth)
Published February 2003

Technical challenges (1):
Size of Web:
  Surface Web > 50 Tb (2000) … and still growing
  The 'deep Web'
Scale of task means that Web-archiving needs to be a collaborative activity

Technical challenges (2):
Dynamic nature of Web:
  Web pages disappear on average after 75 days
  Many leave no trace
Evolution of Web-based technologies:
  Increasing reliance on databases, scripts, plug-ins, etc.
  A 'moving target'

Legal challenges:
Copyright
Content liability, e.g.:
  Defamation
  Data protection
In the UK:
  Selective approach would be the safest solution (unless law changes)
See: Charlesworth (2003)

Organisational challenges:
Decentralised organisation:
  Web-archiving initiatives focus on defined sub-sets of the Web, e.g.: national domain, subject, organisation type
  Need for co-operation between initiatives
Quality:
  Much on Web is low-quality (or worse)
  Is there a need to preserve all of this?

Initiatives (1):
The Internet Archive:
  Largest initiative, running since 1996
  Co-operates on special collections and with other repositories
National libraries:
  Pioneer archives in Sweden (Kulturarw3) and Australia (PANDORA)
  Now many, many more
  Changes to legal deposit legislation in some countries

Initiatives (2):
National archives:
  Focus on government Web sites (however defined)
  Guidance for Web site managers: e.g., UK and Australia
  Snapshots: e.g., USA and UK
Other:
  Universities and scholarly societies: e.g., Archipol, Occasio archive, Political Communications Web Archiving (Cornell)

Approaches (1):
Automatic harvesting:
  Use of Web crawler technologies
  Crawler follows links and downloads content
  Pioneered by Internet Archive and Kulturarw3 project
  Also used for the gathering of the Finnish and Austrian Web

Approaches (2):
Selective approaches:
  Selection of individual Web sites
  Negotiate rights with site owners
  Collection using gathering or mirroring software, ftp, or e-mail
  Pioneered in PANDORA project
  Experimented with by Library of Congress and British Library
Deposit approaches:
  Site owners/administrators deposit site in repositories

Approaches (3):
Combined approaches:
  Combines the advantages of the harvesting and selective approaches
  Pioneered by the Bibliothèque nationale de France
  Experimented with enhancements to the harvesting approach (e.g., noting the change frequency of sites, and their 'importance')
  Uses the selective approach for the 'deep Web'

Collection policies:
Dependent on technical approach chosen:
  National domain ++ (for harvesting-based approaches)
  Collection guidelines (for selective approaches), based on relevance, provenance, quality, etc.
Frequency of capture
Possible overlap with subject gateway initiatives, e.g. the Resource Discovery Network (RDN) in the UK

Approximate size (2002):
Source: Day (2003)

Access policies:
Access policies differ:
  Internet Archive and the PANDORA archive make data available, e.g., the Wayback Machine
  Other collections effectively closed (for legal reasons or because experimental)
Need for specialised Web indexes that can search and navigate large collections of Web material, e.g., Nordic Web Archive (NWA) Toolset

Software:
Various software in use:
  Harvesting: adapted Combine harvester, NEDLIB harvester, Xyleme, Alexa
  Selective: HTTrack (popular), etc.
  PANDAS (PANDORA Digital Archiving System) - helps with managing the process, adding metadata, etc.

Costs:
Costs vary widely:
  Selective approach much more expensive (per Tb) than bulk harvesting
  But resulting archives are more widely accessible
  Significant costs in undertaking rights clearance

Long-term preservation:
Many initiatives until now mainly focused on the collection of resources:
  Need to consider the longer term
  Descriptive and technical metadata
  Migration needs (e.g. for complex sites)
Need for Web archiving initiatives to become trusted repositories
Need to be embedded into the 'core activities' of their host organisation

Summing up:
Much experimentation to date, but now moving into implementation phase
Co-operation and collaboration is important
Combined technical approaches offer best way forward
Legal challenges still problematic
Long-term preservation issues still to be explored in detail

Acknowledgements:
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.
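
Illustrative sketch (not part of the original slides):
The automatic harvesting approach described in the slides ("crawler follows links and downloads content", scoped to a national domain) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the in-memory FAKE_WEB dictionary stands in for the live Web, and harvest, LinkExtractor, and in_norwegian_domain are hypothetical names. It is not the method used by the Internet Archive, Kulturarw3, or any harvester named above, and a real crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(seed, fetch, in_scope):
    """Breadth-first harvest: fetch a page, store it, queue its in-scope links."""
    archive = {}
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page has disappeared or cannot be fetched
            continue
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen and in_scope(target):
                seen.add(target)
                queue.append(target)
    return archive


# Hypothetical pages held in memory, standing in for the live Web.
FAKE_WEB = {
    "http://example.no/": '<a href="/a.html">A</a> <a href="http://other.com/">X</a>',
    "http://example.no/a.html": '<a href="b.html">B</a> <a href="/">home</a>',
    "http://example.no/b.html": '<a href="/gone.html">dead link</a>',
    "http://other.com/": "out of scope",
}


def in_norwegian_domain(url):
    """Scope rule (assumed): keep only hosts under the .no national domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(".no")


archive = harvest("http://example.no/", FAKE_WEB.get, in_norwegian_domain)
print(sorted(archive))
```

In this sketch the out-of-scope host is never fetched, and the dead link is queued but leaves nothing in the archive, which mirrors the point in the slides that many vanished pages leave no trace.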
