lrec metadata


Published on November 14, 2007

Author: WoodRock


Customizing the IMDI metadata schema for endangered languages
Heidi Johnson (AILLA)
Arienne Dwyer (DOBES)

Introduction
IMDI: the ISLE (International Standards for Language Engineering) Metadata Initiative
DOBES: the Volkswagen Foundation’s Documentation of Endangered Languages initiative
AILLA: the Archive of the Indigenous Languages of Latin America

Types of resources
Audio and video recordings in various digital formats
Annotation text files, e.g. transcriptions and translations
Standalone texts, e.g. dictionaries and poetry
A wide range of genres, from verbal art to scholarly analyses

Bundles of resources
Session (IMDI, 2001): the resources resulting from a linguistic elicitation session, i.e. recordings and their annotations.
The Session models only one kind of resource production: a recording session.
Collections will include a greater variety of resources, organized in sets of related materials.

Types of bundles
Canonical bundle: the original session, i.e. a digitized recording in different formats plus textual annotation files, also in different formats.
Minimal bundle: a single file. Examples: a dictionary, a poem, a recording of uninterpretable chants.
Meta-bundle: a bundle containing other bundles. Example: a book about a set of annotated recordings.
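The three bundle types form a recursive structure: a meta-bundle holds other bundles, which in turn hold files. The following is a minimal Python sketch of that idea, assuming a simplified in-memory model; the class names, fields and the classification rule in kind() are hypothetical illustrations, not IMDI element names or the archives' actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """One file in a bundle (hypothetical model), e.g. a WAV recording."""
    name: str
    fmt: str  # file format, e.g. "wav", "pdf"


@dataclass
class Bundle:
    """A set of related resources; may itself contain other bundles."""
    name: str
    resources: list[Resource] = field(default_factory=list)
    bundles: list[Bundle] = field(default_factory=list)

    def kind(self) -> str:
        # Simplified rule based on the three bundle types on the slide.
        if self.bundles:
            return "meta-bundle"
        if len(self.resources) == 1:
            return "minimal"
        return "canonical"

    def all_resources(self) -> list[Resource]:
        # Walk nested bundles depth-first, collecting every file.
        found = list(self.resources)
        for child in self.bundles:
            found.extend(child.all_resources())
        return found


session = Bundle("session1", [Resource("s1.wav", "wav"), Resource("s1.txt", "txt")])
poem = Bundle("poem", [Resource("poem.pdf", "pdf")])
book = Bundle("book", [Resource("book.pdf", "pdf")], [session, poem])
print(book.kind())  # meta-bundle
```

Modelling the meta-bundle as a bundle with child bundles (rather than a separate type) keeps a single structure for all three cases.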

Bundle elements
Current: name of bundle; date and place of production
Proposed: resource relations; date archived; last modified

Major subschemas
Project, Collector, Content, Participants, Resources, References

The Content subschema
Genre is the top-level category:
Interaction: conversation, interview, …
Explanation: description, recipe, …
Performance: narrative, poem, oratory, …
Teaching: primer, textbook, …
Analysis: grammar, dictionary, …

Other Content categories
Modality: speech, writing, gesture
Communication context: interactivity, planning, involvement
Languages
Task
Description
Keys

AILLA’s Content Keys
Register: a characterization of how the discourse reflects its social context. Example: honorific speech.
Style: poetic and stylistic effects. Examples: parallelism, metered verse.

The Project subschema
Current elements:
Name: a nickname or acronym
Title: the official title
ID: a unique identifier
Contact information
Proposed element:
Funder: name of the funding organization

The Collector subschema
AILLA renames this Depositor, since the depositor is the individual we have to keep track of (e.g. for Level 3 access permission). When the Depositor is not also the Collector, the Collector can be listed under Participants.

The Participants subschema
Type: functional role, e.g. creator
Role: family relationship
Name / Full name
Language(s)
Ethnic group, age, sex
Education
Anonymous: True if the participant’s full name is withheld; False otherwise

AILLA additions to Participants
Origin: place (country, region, etc.) of origin of the creator of the primary resource in the bundle (e.g. the speaker whose voice is recorded).
Occupation: can be relevant in assessing the accuracy of some kinds of data.
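The Anonymous element controls whether a participant's full name may be released. A minimal Python sketch of how a catalogue might honour it, assuming a hypothetical Participant record and a hypothetical public_record() function (neither is part of IMDI or AILLA's software), with AILLA's Origin and Occupation as optional fields:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical field names loosely following the slide, not IMDI element names.
    name: str                         # short name or code, always shown
    full_name: str
    functional_role: str              # the slide's "Type", e.g. "creator"
    anonymous: bool = False           # True: full name is withheld
    origin: Optional[str] = None      # AILLA addition
    occupation: Optional[str] = None  # AILLA addition


def public_record(p: Participant) -> dict:
    """Return the participant fields that may appear in a public catalogue."""
    record = {"name": p.name, "functional_role": p.functional_role}
    # Optional AILLA fields are included only when filled in.
    if p.origin:
        record["origin"] = p.origin
    if p.occupation:
        record["occupation"] = p.occupation
    # The Anonymous flag decides whether the full name is released.
    if not p.anonymous:
        record["full_name"] = p.full_name
    return record
```

Keeping the short Name always visible lets anonymous speakers still be cross-referenced between bundles.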

The Resources subschema
Resources contains information about the formats and provenance of the files in a bundle.
Media Files: audio, video, etc.
Annotation Files: text files.
Proposal: call them all Media Files, to reduce redundancy in the database (all of them have URL, size, and similar elements).

Text resources
Current elements:
Type: type of annotation, e.g. phonetic transcription.
Content encoding: annotation encoding scheme, e.g. EUROTYP.
Character encoding: character set(s) used in a text file.

Text resources (2)
Proposed elements:
Transcription type
Translation (aka glossing) type
Software used to produce transcriptions, translations, and other annotations (e.g. Shoebox)
Describe the Annotator in Participants (along with the Translator, etc.)

Proposed subschema
Place, composed of several elements: Continent, Country, Region, Subregion (address).
Place is repeated at least twice, in Bundle and in Participants (Origin), and might also be useful in the Language subschema.

Conclusion
The IMDI schema is a flexible tool.
Customization through Key/Value pairs allows local modifications.
Most of the proposed changes are terminological, moving from DOBES in-house terminology to more general usage.
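The proposed Place subschema is worth defining once because it recurs in Bundle and in Participants (Origin). A minimal Python sketch under that assumption; the class names BundleInfo and ParticipantInfo and the example location are hypothetical, chosen only to show one Place structure being reused:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """Proposed Place subschema: Continent > Country > Region > Subregion."""
    continent: str
    country: str
    region: Optional[str] = None
    subregion: Optional[str] = None  # e.g. a village or an address

    def label(self) -> str:
        # Most specific level first; empty levels are skipped.
        parts = [self.subregion, self.region, self.country, self.continent]
        return ", ".join(p for p in parts if p)


@dataclass
class BundleInfo:
    name: str
    place: Place  # where the bundle was produced


@dataclass
class ParticipantInfo:
    name: str
    origin: Place  # AILLA's Origin element reuses the same structure


cusco = Place("South America", "Peru", "Cusco")
session = BundleInfo("session1", cusco)
speaker = ParticipantInfo("Speaker A", cusco)
print(session.place.label())  # Cusco, Peru, South America
```

Because the same type appears in both places, a search for all material from one region can compare Bundle places and participant origins directly.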
