califa newspapers


Published on January 21, 2008

Author: Bina


Handling Digital Newspapers
Geri Ingram, OCLC Digital Collection Services Manager, Customer Services
October 18, 2007

Why digitize newspapers?

Why digitize?
Because your public wants access.
Planning is an access issue.
Selection is an access issue.
Processing is an access issue.
Metadata is an access issue.
Preservation is an access issue.
Funding is an access issue.

The newspaper paradox: the good, the bad and the ugly
Widespread use, profoundly embraced, a long-lived record in America
Cheaply produced for an ephemeral existence
Difficult to make searchable, yet everyone from historians to senior genealogists is searching them online
Difficult to preserve, yet fundamental to the historical record

Planning is an access concern
The project mission answers:
What are we providing? Intellectual access to newspapers, now and in the future (i.e., preservation)
For whose benefit? Present in context, for local, regional and global audiences
How are we providing access? Browsing, searching, full text, clippings…?

Plan collaborative projects
Enjoy synergies: complementary material
End users enjoy a consistent and richer experience
Staff skill sets are shared
Projects grouped by format are done more efficiently, by a team whose skills can be leveraged

Selection is an access concern
Let the users drive: what do they want, and how do they want it?
End users often prefer to search by topic, subject, keyword and date, and are not always concerned with format at first.
Papers are just another material type, now with a digital format.

Reformatting what you have: a modest proposal
Consider whether you have already selected these works:
Titles cataloged? Filmed? Card indexes?
Should digitization be a mainstreamed part of processing operations?
Do you have complete runs? Even incomplete runs may complement a topical repository.

Topical repository comprising many formats

Processing is an access concern
How your users will access your materials informs critical processing decisions.
Processing involves:
Digitization (scanning paper or microfilm)
Optical Character Recognition (OCR)
Metadata generation

From do-it-yourself to turnkey
Desktop scanning of small-format paper (newsletters), with OCR on the fly and no article segmentation
Large-format scanner for standard newspapers
Outsourced scanning from paper or film

Digitization: scanning
Options: favor the right source for access.
Scan printed materials* or scan from microfilm.
*Some funders will not bear the cost of scanning from paper.

Digitization: the quality of the source
Historical collections raise unique issues:
If microfilm, quality may vary over time and across vendors.
If paper, will you film it? Is the paper itself important? If so, film to preserve.
Which gives better resolution?
Best practice: sample across time periods and materials from contributing partners to test the feasibility of long, complete runs.

Digitization basics: resolution
Resolution is the ability to distinguish fine spatial detail.
It is usually expressed as dots per inch (dpi) or pixels per inch (ppi).
These terms are synonymous, but dpi usually refers to printed images and ppi to screen images.

Digitization basics: resolution in absolute terms
Resolution is also sometimes given in absolute terms, as actual pixel dimensions.
For example, 3000 x 2400 pixels are the pixel dimensions of a 10 x 8 inch image scanned at 300 dpi.

Digitization basics: bit depth
Bit depth is the number of bits (binary digits) used to define each pixel.
The greater the bit depth, the greater the number of tones (grayscale or color) that can be represented:
Black and white (bitonal) = 1 bit per pixel
Grayscale = 8 bits per pixel (256 shades of gray)
Color = 24 bits per pixel (16.7 million color tones)

(Image from the Cornell Digital Imaging Tutorial)

Some common problems
1. Curved characters and images
2. "Noise"
3. Broken characters
4. Scratches

Optical Character Recognition (OCR)
Necessary for full-text searching
Depending on the display software, the generated searchable text may be edited or hidden.

Technical concerns for display and OCR
Output files may include TIFF archival masters, JPEG 2000, JPEG and bound PDFs.
Scanning resolution and bit depth: higher resolution means larger files (more bytes), and color creates very large files.
OCR performs best at an appropriate resolution (no noise, please!).
Image processing: de-skew, crop, sharpen, page segmentation, article segmentation.
To zone or not to zone for article-level handling? Note that some funding may not cover the extra cost of article segmentation.

Metadata application is an access concern
Searchable information rules!
We live in a time of unfathomable recall; we need precision searching.
Users search metadata, but newspapers also need full-text searching for topics, names, etc.
Information is needed in context, and metadata and structure provide that context.

Users want precise searching, context-rich results

First things first: some processes are incremental, some iterative
Metadata collection:
An accurate file naming scheme will generate some metadata.
Hand keying can be done later to supplement and/or correct important elements.
Authority control tools are available for verification.
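The resolution and bit-depth figures on the digitization basics slides reduce to simple arithmetic. The Python sketch below reproduces the 10 x 8 inch at 300 dpi example and estimates raw, uncompressed image size, which shows why color files grow so large. The function names are illustrative, not from any imaging library, and the estimate ignores file headers, TIFF tags and compression.

```python
"""Resolution and bit-depth arithmetic for scanned images (illustrative sketch)."""


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions = physical size in inches x resolution in dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def tone_count(bits_per_pixel: int) -> int:
    """Number of distinct tones a pixel can represent at a given bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes (no headers, tags, padding or compression)."""
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    width, height = pixel_dimensions(10, 8, 300)
    print(f"10 x 8 in at 300 dpi -> {width} x {height} pixels")
    for label, bits in (("bitonal", 1), ("grayscale", 8), ("color", 24)):
        size_mb = uncompressed_bytes(width, height, bits) / 1_000_000
        print(f"{label:9} {bits:2} bits: {tone_count(bits):>10,} tones, {size_mb:.1f} MB")
```

For the 10 x 8 inch example this gives 3000 x 2400 pixels and about 0.9 MB (bitonal), 7.2 MB (grayscale) and 21.6 MB (24-bit color) of raw pixel data.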
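Full-text searching over OCR output usually rests on an inverted index that maps each word to the pages containing it. The following is a minimal Python sketch of that idea, not how CONTENTdm or any particular display system implements search; the page identifiers and sample text are hypothetical. Because it matches words exactly, OCR errors such as broken characters cause missed hits, which is one reason OCR quality matters.

```python
"""Minimal inverted index over OCR text for full-text search (illustrative sketch)."""
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, ocr_text in pages.items():
        for word in tokenize(ocr_text):
            index[word].add(page_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return page IDs containing every word of the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


if __name__ == "__main__":
    # Hypothetical page IDs and OCR text, for illustration only.
    pages = {
        "1898-06-14_p003": "Married, at the home of the bride, John Smith and Mary Jones.",
        "1898-06-14_p004": "Died: John Smith, aged 80 years.",
    }
    index = build_index(pages)
    print(sorted(search(index, "John Smith")))  # both pages
    print(sorted(search(index, "married")))     # only page 3
```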
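As a sketch of how an accurate file naming scheme can generate some metadata automatically, the Python below parses a hypothetical naming convention, `<titlecode>_<YYYY>-<MM>-<DD>_p<page>.tif`. The convention, the `PageMetadata` fields and the sample file name are assumptions for illustration, not an OCLC or NDNP rule. Elements a file name cannot carry, such as publisher or place of publication, would still need hand keying or a lookup against a title-level record.

```python
"""Derive basic page metadata from a file name (illustrative sketch)."""
import re
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL naming scheme, assumed for illustration (not an OCLC or NDNP rule):
#   <titlecode>_<YYYY>-<MM>-<DD>_p<3-digit page>.tif   e.g. "dailyexample_1898-06-14_p003.tif"
FILENAME_PATTERN = re.compile(
    r"^(?P<title>[a-z0-9]+)_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"_p(?P<page>\d{3})\.tiff?$"
)


@dataclass(frozen=True)
class PageMetadata:
    title_code: str   # key for looking up title-level data (title, publisher, place)
    issue_date: date  # issue-level date
    page: int         # page number within the issue


def metadata_from_filename(filename: str) -> PageMetadata:
    """Parse a file name; raise ValueError if it breaks the scheme or the date is impossible."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the naming scheme: {filename!r}")
    # date() itself raises ValueError for impossible dates such as February 30.
    issue_date = date(int(match["year"]), int(match["month"]), int(match["day"]))
    return PageMetadata(match["title"], issue_date, int(match["page"]))


if __name__ == "__main__":
    print(metadata_from_filename("dailyexample_1898-06-14_p003.tif"))
```

Validating names this way at scan time catches mistakes before they propagate into the metadata.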
Start somewhere! Present full text, even with minimal metadata
Users often search newspapers by personal names and topics/keywords.
Use the presentation tools to create "canned" queries, e.g., records by type (birth, marriage).
Where copyright is questionable, explain in the metadata and restrict viewing AFTER digitizing.

Recommended metadata elements for digitized newspapers
At the run or title level: Title, Publisher, Date published, Place of publication, Issue
At the issue level: Page number

Article-level segmentation
Data can be generated during the digitization process.
Some presentation systems can use it.

Full article segment highlighting and extraction

Preservation is an access concern
Protect high-use, single-copy sources first.
Intellectual versus artifactual value
Preserve what you've got, and invite partners with complementary runs.
Preserve what you've already processed (cataloged, filmed, indexed).

When paper is to be preserved: maintaining the original paper
Storage space: 60-70 degrees F, 40-50% relative humidity
Storage and handling: stored flat
Brittle clippings

Example: preserving for access
An active, crumbling paper collection
Used by genealogists (local, regional, state) and by historians
Fire department unhappy…

Funding is an access concern
Demonstrate cradle-to-grave processing on a small testbed.
Local and national funders are motivated by access to historical newspapers: ONLINE is FUNDABLE!
Commercial newspapers are selling online ads first and giving away current content.

What is the U.S. National Digital Newspaper Program (NDNP)?
"The National Digital Newspaper Program (NDNP) is a partnership between the NEH and the Library of Congress … to provide enhanced access to United States newspapers. … over a period of approximately 20 years, … a national, digital resource of historically significant newspapers from all the states and U.S. territories published between 1836 and 1922. …searchable database will be permanently maintained at the Library of Congress (LC) and be freely accessible via the Internet."
A prototype of this digital resource: "Chronicling America: Historic American Newspapers"

Who can play?
A partnership between the National Endowment for the Humanities (NEH) and the Library of Congress (LC)
Offers grant funding to any non-profit U.S. organization
Provides digitization standards and guidelines that increase efficiency and cost-effectiveness
Pays for processing from film; no segmentation
Yearly cycles; collaboration helpful. Next deadline: Nov. 7, for projects starting after July 2008

OCLC Preservation Services and CONTENTdm Newspapers: print to digital (based on NDNP guidelines)
Start with microfilm.
Convert to digital: TIFF, 400 dpi, grayscale; OCR.
Generate data and metadata: available metadata (e.g., title, year, month…); ASCII text from OCR; structured data (METS/ALTO).
Generate the database: standards-based XML, JPEG 2000, PDF.
Import into the CONTENTdm server.
Search, access and view the newspaper.

NDNP technical overview specifications
OCR text: ALTO
Conforms with the ALTO (Analyzed Layout and Text Object) schema
ALTO is a product of the EU-funded METAe project
Maps OCR'ed text to image coordinates

Derivative: PDF
Compatible with Acrobat 5.0 (PDF 1.4)
Image with text behind
Image will be a grayscale, 150 dpi JPEG, using a medium (or 40) quality setting
XMP/RDF/Dublin Core metadata

Production master: JPEG 2000
Conforms with JPEG 2000, Part 1 (.jp2)
Uses the 9-7 irreversible (lossy) filter
Compressed to 1/8 of the TIFF, or 1 bit/pixel
Tiling, but no precincts
RDF/Dublin Core metadata in an XML box

Archival master: TIFF
Conforms with TIFF 6.0
8-bit grayscale
400 dpi preferred
Uncompressed
Only deskewing should be applied
Cropped to page edge
Additional TIFF tags required

Resources for evaluation and study

Reference contacts
Geri Ingram, Manager, Digital Collection Services, OCLC, 760.931.9313
Gayle Palmer, Manager, Digitization and Preservation Programs, OCLC Western Services, 800.854.5753
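The NDNP archival and production master specifications imply a predictable storage budget per page. The Python sketch below assumes the 400 dpi and 150 dpi figures apply to the page's physical dimensions and uses a hypothetical 17 x 22 inch page; it shows that an 8-bit grayscale TIFF compressed to 1 bit/pixel is exactly 1/8 of the uncompressed size, matching the JPEG 2000 target. The names are illustrative, and only raw pixel data is counted.

```python
"""Per-page storage budget implied by the NDNP specifications (illustrative sketch)."""
from dataclasses import dataclass

TIFF_DPI = 400           # archival master: "400 dpi preferred"
TIFF_BITS_PER_PIXEL = 8  # archival master: 8-bit grayscale, uncompressed
JP2_BITS_PER_PIXEL = 1   # production master: "1/8 of the TIFF or 1 bit/pixel"
PDF_IMAGE_DPI = 150      # derivative: grayscale 150 dpi JPEG inside the PDF


@dataclass(frozen=True)
class PageBudget:
    tiff_pixels: tuple[int, int]
    tiff_bytes: int
    jp2_target_bytes: int
    pdf_image_pixels: tuple[int, int]


def page_budget(width_in: float, height_in: float) -> PageBudget:
    """Estimate sizes for one page; raw pixel data only (headers and tags ignored)."""
    tiff_w, tiff_h = round(width_in * TIFF_DPI), round(height_in * TIFF_DPI)
    pixel_count = tiff_w * tiff_h
    return PageBudget(
        tiff_pixels=(tiff_w, tiff_h),
        tiff_bytes=pixel_count * TIFF_BITS_PER_PIXEL // 8,
        jp2_target_bytes=pixel_count * JP2_BITS_PER_PIXEL // 8,
        pdf_image_pixels=(round(width_in * PDF_IMAGE_DPI), round(height_in * PDF_IMAGE_DPI)),
    )


if __name__ == "__main__":
    budget = page_budget(17, 22)  # hypothetical 17 x 22 inch page
    print(budget)
    print(f"TIFF {budget.tiff_bytes / 1e6:.1f} MB -> JPEG 2000 target {budget.jp2_target_bytes / 1e6:.2f} MB")
```

For the hypothetical page this gives a 6800 x 8800 pixel TIFF of about 59.8 MB of raw data, a JPEG 2000 target of about 7.5 MB, and a 2550 x 3300 pixel image for the PDF derivative.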
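ALTO files record, for each recognized word, its text and its position on the page image, which is what makes hit highlighting possible. The sketch below uses Python's standard `xml.etree.ElementTree` to pull word boxes out of a minimal, hypothetical ALTO fragment. It matches element names regardless of namespace because different ALTO versions use different namespace URIs. Coordinates are in whatever units the file's `MeasurementUnit` declares; the sample simply treats them as pixels. Production files carry far more structure than this fragment.

```python
"""Extract word text and coordinates from ALTO OCR output (illustrative sketch)."""
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple

# Minimal HYPOTHETICAL ALTO fragment; real files also carry Description, Styles, etc.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="6800" HEIGHT="8800">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Railroad" HPOS="412" VPOS="1020" WIDTH="310" HEIGHT="48"/>
            <SP/>
            <String CONTENT="opened" HPOS="740" VPOS="1020" WIDTH="220" HEIGHT="48"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


class WordBox(NamedTuple):
    text: str
    hpos: float  # horizontal position of the upper-left corner
    vpos: float  # vertical position of the upper-left corner
    width: float
    height: float


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_words(alto_xml: str) -> Iterator[WordBox]:
    """Yield one WordBox per ALTO <String> element, in document order."""
    root = ET.fromstring(alto_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "String":
            yield WordBox(
                elem.get("CONTENT", ""),
                float(elem.get("HPOS", "0")),
                float(elem.get("VPOS", "0")),
                float(elem.get("WIDTH", "0")),
                float(elem.get("HEIGHT", "0")),
            )


def find_word(alto_xml: str, term: str) -> list[WordBox]:
    """Case-insensitive exact match; the boxes are what a viewer would highlight."""
    term = term.lower()
    return [w for w in alto_words(alto_xml) if w.text.lower() == term]


if __name__ == "__main__":
    for box in find_word(SAMPLE_ALTO, "railroad"):
        print(box)
```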
