Archiving Of Electronic Publications


Published on January 7, 2009

Author: annegrete

Source: slideshare.net

Annegrete Wulff
Statistics Denmark
awu@dst.dk
7 January 2009

Archiving Internet published statistics

International Marketing and Output Database Conference
Cork, Ireland, 24-28 September 2007

Introduction

Slogans like “Time for Numbers - Numbers on Time” (1) or “Wissen.Nutzen” (2) refer to the fact that statistics should be timely and are the basis for good planning. Access to the latest figures as soon as they are published is increasingly important, and the Internet makes this possible. Yesterday’s figures and historic data are, however, also part of the description of our societies.

(1) Statistics Denmark
(2) DSTATIS, Germany

Most statistical offices, among them Statistics Denmark, used to have simple, objective and easy-to-follow rules concerning the archiving of their data:

− All printed publications were kept in stock in a few copies, and the National Archive received a copy.
− The statistical files behind the books were delivered to the National Archive.
− The documentation belonging to the statistical files was archived there as well.

During the last 10 years the Internet has become an ever more important publishing medium, and hard copy publications have diminished in importance.

What is archived – and what is not?

Our legacy of statistics is still accessible and readable in the archives and in the libraries as printed books. It does not include all data that passed through our production, but it represents everything that was published to be read by a range of users.

In this paper I exclude the “readiness” archiving we do in order to secure continued production. That means backup procedures for servers with data, programs and systems will not be taken into account here. I shall focus on the archiving of data and information disseminated to the public in electronic form.

Today’s practice

Pdf documents

Currently all pdf publications are archived. If major errors are noted, a new release of the pdf is published and both versions are archived. In the case of minor errors the existing pdf is overwritten; thus the original is not archived. The electronic archive is accessible on the Internet and from our internal server. The archiving of Statistical Abstracts (Daily News) dates back to 1999.

www.dst.dk

An in-house developed crawler was put into operation in 2005. It is used to discover invalid links on the site as well as to archive the full site. Crawling and archiving are carried out according to the following schedule:

Yearly:
− Snapshot of all pages and sub-pages of the website, including all pdf files.

Monthly:
− All pages on the site of the Danish version www.dst.dk as well as the English version www.dst.dk/uk are archived. (Pdf documents are excluded as they are archived separately.) This is a time-consuming process taking approximately 20 hours.
− Snapshot of the StatBank interface in the Danish as well as the English version. As the StatBank is an interactive databank, figures are retrieved through the user’s selection. Dynamic pages that contain forms, JavaScript or other elements that require “human interaction” cannot be archived. So neither the functionality of the StatBank nor the data resulting from a selection is archived this way.
− www.Alexa.com (the web archive) has recorded samples of our web site since 1997. In 1997 only three downloads of the site were made; in 2006 there were around 100. They are accessible on the Internet.

Weekly:
− Economic key indicators on the web

Daily:
− www.dst.dk front page of the web site, Danish version
− www.dst.dk/uk front page of the web site, English version

Three times a day:
− Figures in the IMF DSBB agreement, www.dst.dk/imf

www.statbank.dk

The user interface and layout of the StatBank are archived according to the procedure described in the previous section. The data in the StatBank, however, is not saved in that process.
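As an illustration only, the crawl schedule listed for www.dst.dk could be written down as a small lookup that tells which archiving jobs fall due on a given day. This is a minimal sketch and not the in-house crawler: the name `jobs_due` and the choice of trigger days (1 January, the first of the month, Mondays) are assumptions made for the example; only the frequencies and targets come from the schedule.

```python
from datetime import date

# Frequencies and targets follow the schedule in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
SCHEDULE = {
    "yearly": ["full site snapshot, including pdf files"],
    "monthly": [
        "www.dst.dk and www.dst.dk/uk, all pages (pdf excluded)",
        "StatBank interface snapshot (Danish and English)",
    ],
    "weekly": ["economic key indicators"],
    "daily": ["www.dst.dk front page", "www.dst.dk/uk front page"],
    "three_times_a_day": ["www.dst.dk/imf figures"],
}


def jobs_due(day: date) -> list[str]:
    """Return the archiving jobs that fall due on `day`.

    Assumed trigger days (not stated in the source): weekly jobs run on
    Mondays, monthly jobs on the first of the month, yearly jobs on 1 January.
    Jobs under "three_times_a_day" are listed once; a real scheduler would
    start them three times during the day.
    """
    due = SCHEDULE["daily"] + SCHEDULE["three_times_a_day"]
    if day.weekday() == 0:  # Monday
        due = due + SCHEDULE["weekly"]
    if day.day == 1:
        due = due + SCHEDULE["monthly"]
        if day.month == 1:
            due = due + SCHEDULE["yearly"]
    return due
```

Calling `jobs_due(date(2009, 1, 7))`, a Wednesday, would return only the two front pages and the IMF figures under these assumptions.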
The StatBank is the primary source for all our published statistics, so it would seem logical to secure the archiving of this primary source first. However, this is not the case. Statistics Denmark is preparing a set of rules regarding archiving. In doing so we will balance the costs against the usefulness.

An error in a table may turn up after data has been published, and the data then needs to be corrected. There are two ways of handling this:

1. A new file with corrected data is loaded and the original file is “unpublished” but still kept.
2. A new file with corrected data is loaded and overwrites the original file.

Both methods are used, although the one where all loads are kept is the more common. Data is stored (even loaded data which was never published), but only for the period or part of the file that is actually updated. The file also contains some metadata, but only as codes. As the archived files are not stored in the macro database environment, reading these files may be misleading if the metadatabase has changed over time.
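The difference between the two correction methods can be made concrete with a small sketch. The classes `TableLoad` and `TableHistory` and their methods are hypothetical names invented for this illustration; they do not describe how the StatBank is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class TableLoad:
    """One load of a table; `published` is False once it is withdrawn."""
    data: dict
    published: bool = True


@dataclass
class TableHistory:
    """All loads kept for one table, oldest first (hypothetical model)."""
    loads: list = field(default_factory=list)

    def correct_keeping_original(self, corrected: dict) -> None:
        # Method 1: earlier loads are "unpublished" but still kept.
        for load in self.loads:
            load.published = False
        self.loads.append(TableLoad(corrected))

    def correct_overwriting(self, corrected: dict) -> None:
        # Method 2: the corrected load replaces the latest one,
        # so the erroneous figures are no longer available.
        if self.loads:
            self.loads[-1] = TableLoad(corrected)
        else:
            self.loads.append(TableLoad(corrected))

    def current(self):
        # The data users see: the most recent load that is still published.
        for load in reversed(self.loads):
            if load.published:
                return load.data
        return None
```

With method 1 the history grows by one load per correction and the withdrawn figures can still be read from the archive; with method 2 the history stays the same length and the original figures are lost.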

The fact that not all erroneous published figures are archived and available has not been regarded as a huge problem so far. Statistics Denmark considers the corrected figures to be the ones of interest for the majority of users. Moreover, it might confuse the majority of users if the erroneous data were also available, just to satisfy a very small minority. Nevertheless, when resources in our unit permit, we should look for a way of archiving these files that makes them easier to access without interfering with the corrected data. It should be mentioned that series and time periods holding correct data are never deleted from the StatBank.

If everything that we publish should be available to the public in the future, we need to take the following “products” into account:

− All databank tables, with every single update and revision
− All versions of pdf documents
− Every single page of the web site
− Electronic, interactive publications, with all updates

Should we choose an ideal or a pragmatic solution? Will the archiving activity be enormous? Can we archive in a way that still allows us to retrieve and access the information? These are some of the challenges we need to solve.

Why do we archive?

There may be a range of reasons for an organisation to archive its products and activities. Some are “need to have” while others may be classified as “nice to have”.

Legal obligations

Pdf publications follow the same rules as printed publications. We are obliged to deliver a copy to the Royal Library. We also keep a copy in our own archive. From our archive the pdf is accessible to the public, while this is not the case in the Royal Library. There are no legal obligations on any of the other electronically published products.

Historical interest for data

Timeliness is an important quality indicator. However, not only the latest updated statistics are of interest.
Historians and others with an interest in the past often need to complement their research with statistical data. As a result, our output database StatBank grows larger and larger, as we do not delete even statistical series and subjects which are no longer updated. To keep the databank manageable for the public, we have to review the presented structure of subjects and tables now and then.

One may also mention the interest in being able to check scientific research or analysis done by another researcher in the past. For that, you need to be able to find the data as it was when the researcher did the analysis.

The website of Statistics Denmark is automatically loaded with selected data from the StatBank. These are Economic Key figures, Data in Focus etc., presented in HTML tables. These summary tables are used frequently by laymen as well as professionals. They are visited far more often than our pdf documents. They are archived only as part of the homepage. We should consider archiving these tables selectively.

Historical interest for set up and layout

The website of Statistics Denmark changes according to its content, products, values in focus etc., but also according to “state of the art” and “best practice” (sometimes called fashion). There is no obligation for us to archive these changes of style; the obligation lies with the Royal Library, which archives the domain .dk in total, although only once or twice a year. Its archive is not available to the public because of the protection of personal integrity.

Nevertheless, it is of interest to document the preferences Statistics Denmark had in the past. What was seen as important news? What did we focus on? Did we prioritise some users over others? How has the presentation of the organisation changed over time? This kind of archiving is of the “nice to have” kind.

Conclusion

Statistics Denmark is today archiving in accordance with all legal obligations, and we are doing even more. Still, we need to consider what the needs will be in the future. If we are to fulfil such needs, we have to start archiving today.

Instead of archiving everything that is made available on the web, another possibility would be to archive exactly what the users have been looking at. Today all retrievals from the StatBank are kept in temporary files, which are deleted the following night. If we moved these files to a permanent archive, we would have an exact picture of what was actually displayed to and looked at by the public. This is an opportunity that paper media do not give: we know how many books leave our shop, but we do not know whether every page is read at all or whether it is read several times by several users. As a bonus, such an archive could draw a picture of the interest different themes have had over the years.
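The idea of moving the temporary retrieval files to a permanent archive, instead of deleting them at night, could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the function names, the one-folder-per-day layout, and above all the file naming convention “theme_id.ext”, which is invented here so that interest per theme can be counted.

```python
import shutil
from collections import Counter
from datetime import date
from pathlib import Path


def archive_retrievals(temp_dir: Path, archive_root: Path, day: date) -> list:
    """Move the day's retrieval files into archive_root/<YYYY-MM-DD>/.

    Replaces the nightly deletion described in the text; returns the
    new locations of the moved files.
    """
    target = archive_root / day.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(temp_dir.iterdir()):
        if item.is_file():
            destination = target / item.name
            shutil.move(str(item), str(destination))
            moved.append(destination)
    return moved


def interest_by_theme(archive_root: Path) -> Counter:
    """Count archived retrievals per theme.

    Assumes the hypothetical naming convention "<theme>_<id>.<ext>".
    """
    return Counter(
        path.name.split("_", 1)[0]
        for path in archive_root.rglob("*")
        if path.is_file()
    )
```

Run once per night before the cleanup job, `archive_retrievals` would preserve exactly what users retrieved that day, and `interest_by_theme` gives a first impression of which themes attract the most retrievals over time.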
