Digitisation and data capture

Information about digitisation and data capture

Published on February 27, 2008

Author: Bernadette

Source: authorstream.com

Slide1: Digitisation
  Mick Eadie, Visual Arts Data Service

Slide2: The ‘input channels’ of digitisation (keyboard, scanner etc.) are narrow and can only capture a partial representation of the original source
  Source - Digitisation - Resource

Slide3: Digitisation Pathways
  [Diagram. Labels: Photocopy, Photograph, Recording; Original Source; Copy of Source; Item to Digitise; Sound, Moving image; Digital Object; 2D Image; 3D Model; Digital Resource; Digital audio/movie recording; Scan; Digital Camera; 3D Scan; OCR; Line tracing]

Slide4: Elements of a Digital Resource
  Users: Knowledge, Experience, Culture
  Environment: Hardware, Software (OS), (Network)
  Digital Objects: Binary Data, Data Models, Relationships
  The environment of a digital resource often receives the most attention, but it is the users and digital objects that are most important
  Hardware and software selection should be based on the needs of the users and the types of digital objects to be used
  Fit for Purpose: digital objects must be created with their intended use/purpose of paramount importance

Slide5: Digital Objects
  Text: data stored as a stream of characters (numbers, letters, etc.)
  Image: data primarily understood as a spatial pattern or shape
    Bitmap and vector images; raster (bitmap) and vector spatial data
  Time: data primarily understood as a sequence through time
    Audio and/or video (multimedia)

Slide6: Text
  Essentially, numeric codes used by the computer to represent specific characters
  Fonts must be designed to provide a visual image for each code
  Software must be designed to interpret the codes
  ASCII is the best-known text encoding scheme
    Standard ASCII uses 7 bits per character = 128 unique characters, primarily the Latin alphabet
    8-bit extensions use 1 byte per character = 256 unique characters
    Other characters are handled by having multiple code pages
    Each code page uses the same codes to represent different characters
  Unicode is the replacement for ASCII
    Originally 2 bytes to store each character = 65,536 codes; later versions extend beyond this range
    Can represent characters from different alphabets simultaneously, as each character has a unique code

Slide7: Text Transcription
  Advantages:
    Low overhead to start transcription: person, keyboard, document
    Hand-written documents can be transcribed
    A transcriber can follow complex, disorganised documents
  Issues:
    Slow and expensive
    Human error
  Good practice:
    Double entry (two transcribers both enter the same document and the transcriptions are checked for differences)
    Keep copies of originals with transcriptions (preferably as digital images, as this makes post-transcription checking simple and quick)

Slide8: Optical Character Recognition
  Advantages:
    Automatic, suitable for digitising large numbers of documents
    Highly accurate for clean, clear typewritten documents
  Issues:
    Current technology is very poor on handwriting
    Complex document layouts can become scrambled
  Good practice:
    Proof-read and spell-check OCR output for errors
    Provide an image of the page with the text so users can check the text themselves

Slide9: Bitmap (Raster) Images
  The image is made up of many pixels
  Each pixel stores information about its colour
  The standard archival file format is uncompressed TIFF

Slide10: Resolution
  Resolution is often expressed as dots per inch (dpi)
  More accurately, pixels per inch (ppi)
  The ‘frequency’ at which samples are taken by the capture device from the original source
  Common misconception about ppi:
    It is not an indicator of image size or quality unless we know the size (inches, cm) of the original
  A better guide to digital image size is pixel dimensions, e.g. 2000 x 3000 pixels, which allow us to work out the size of the image we will output to monitor or printer
  Number of pixels / output resolution = output size

Slide11: Scanners and Digital Cameras
  Advantages:
    Accurate(?) visual representation of the source
  Issues:
    The text and logical structure of a document are not captured (they can be through OCR or line tracing)
  Good practice:
    Capture master images at appropriate resolution and bit depth
    Check the optical resolution of the scanner (avoid interpolated resolution)
    Check the colour resolution (bit depth)
    Check scanning time
    Record details of scanner settings and any image editing done afterwards

Slide12: Vectors
  A point represents an exact location in two- or three-dimensional space (x,y or x,y,z)
  Two points define a line
  A series of connected lines defines an area

Slide13: Vector Data
  Advantages:
    Can be zoomed (cf. bitmap images)
    Allows spatial analysis (spatial statistics, network analysis)
  Issues:
    Precision versus accuracy (detail versus truthfulness)
    Scale versus resolution
  Good practice:
    Ensure polygon topology (the polygons each line belongs to) is stored

Slide14: Digital Audio
  Human hearing:
    Frequency (pitch): 20 Hz to 20,000 Hz (20 kHz)
    Intensity (loudness): 0 to 120 dB
  Full sound reproduction requires digitisation at more than 40,000 samples a second (44,100 is a common standard)
  Nyquist rate: for lossless digitisation, the sampling rate should be at least twice the maximum audio frequency
  One second of good quality uncompressed digital sound is equivalent to ¼ of the complete plays of Shakespeare
  MP3 offers good quality compressed (lossy) files
  MIDI: not a digital recording of actual sounds, but a set of instructions that is played back using a ‘library’ of how musical instruments sound

Slide15: Digital Moving Images
  1 second of uncompressed good quality digital video (without sound) is equivalent to about ¾ of the complete plays of Shakespeare
  MPEG: the Moving Picture Experts Group standards are the most popular compression standards
  The three standards: MPEG-1, MPEG-2, MPEG-4
  Compression basically works by selecting key frames and only recording changes between the frames (but it gets a lot more complicated!)
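The arithmetic on the Resolution and Digital Audio slides can be worked through in a few lines. The Python sketch below is only an illustration: the function names `output_size_inches` and `minimum_sample_rate` are hypothetical and do not come from the slides, and the 300 ppi and 72 ppi output resolutions are assumed example values for a printer and a monitor.

```python
def output_size_inches(pixel_width, pixel_height, output_ppi):
    """Apply the slide's rule: number of pixels / output resolution = output size."""
    return pixel_width / output_ppi, pixel_height / output_ppi


def minimum_sample_rate(max_frequency_hz):
    """Nyquist rule from the slides: sample at least twice the highest frequency."""
    return 2 * max_frequency_hz


if __name__ == "__main__":
    # The 2000 x 3000 pixel image from the Resolution slide,
    # sent to two assumed output resolutions.
    for ppi in (300, 72):
        width_in, height_in = output_size_inches(2000, 3000, ppi)
        print(f"{ppi} ppi: {width_in:.2f} x {height_in:.2f} inches")

    # Upper limit of human hearing from the Digital Audio slide (20,000 Hz).
    required = minimum_sample_rate(20_000)
    print(f"Minimum sample rate: {required} samples per second")
    print(f"44,100 samples per second is sufficient: {44_100 >= required}")
```

The same pixel dimensions give very different physical sizes depending on the output resolution, which is why pixel dimensions rather than ppi alone are the better guide to image size.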
Slide16: Data Models
  A data model is a set of rules that defines a particular way of organising a collection of digital objects
    List: one item follows another
    Tree: each item can have several children
    Sets: items belong to one or more groups
    Geography/geometry: items are located using a co-ordinate system

Slide17: Selecting a Data Model
  To be useful, digital objects must be:
    Arranged according to the rules of an appropriate data model
    Stored in a file format that can represent the data model
    Accessed with software that understands the file format and the data model, and can present the data in an appropriate way
  When selecting a data model:
    Consider the ‘natural’ organisation of your source
    Consider what method of organisation will be familiar to your users
    Consider the method of organisation that best fits your purposes
    Then seek specialist advice if you need it!

Slide18: Selecting Software
  Selecting the right data model is more important than selecting a particular piece of software
  Pick software that works with your preferred data model (can perform the right tasks)
    Don’t use a webpage editor as a database
    Don’t use a word processor as a spreadsheet
  Avoid little-used software with proprietary features
  Look for software with lots of export and import options
  Look for software that supports important standards
    Trees → markup → XML (SGML)
    Sets → relational databases → SQL
    Coordinates → CAD or GIS → less clear, use file formats like DXF, ESRI shape files

Slide19: Digitisation: a Balancing Act
  Successful digitisation involves several trade-offs:
    Amount and detail versus time and cost of digitisation
    Complexity of the digital resource versus ease of use
    Flexibility of the digital resource versus suitability for a specific use
    Digitisation with current technology versus future possibilities
  Your project should be guided by a firm understanding of the source and the intended purpose of the digital resource
  Do not exceed available support (financial, technical, labour)
  Minimise the loss of information from the original during the digitisation process
  Keep information that tracks the origin and history of the digital resource with the digital resource

Slide20: Where to get more advice
  AHDS Guides to Good Practice series: http://vads.ahds.ac.uk/guides/index.html
  Technical Advisory Service for Images (TASI): http://www.tasi.ac.uk
  Text Encoding Workshops: http://www.ota.ahds.ac.uk
  BUFVC Workshops: http://www.bufvc.ac.uk
