Making Multi-Structured Documents

Published on December 17, 2008

Author: peportier

Source: slideshare.net

Description

Slides shown to Elisa Bertino (25 November 2008) on the construction of multi-structured documents.

Multi-structured documents: modelling and creation

The multi-structured document (MSD) problem

Several specific uses, several structure types

e.g. physical, logical, semantic, poetic, linguistic

A recurrent problem in the Digital Humanities

TEI recommendations and overlapping hierarchies

Example queries:

Find all damaged words that contain damaged characters only.

Indicate, for each word containing restored characters, the location of the corresponding line.

Medieval Manuscript (1)

Transcription of old manuscripts

Medieval Manuscript (2)

Physical structure

Medieval Manuscript (3)

Lexical structure

Medieval Manuscript (4)

Damaged characters structure

Medieval Manuscript (5)

Image regions structure

Medieval Manuscript (6)

Relations between structures

Structures: physical structure, lexical structure, damaged characters structure, text regions structure

Relations (diagram labels): transcription, lines localization, broken words localization, damaged characters localization

A multi-structured document is a document having multiple structures linked together through shared content or other inter-structural relations.

Modern Manuscript (1)

Modern manuscript of J.T. Desanti

Modern Manuscript (2)

Physical structure: lines

Modern Manuscript (3)

Idiomatic structure

Modern Manuscript (4)

Alterations structure

Existing works (1)

(too) specific “models”

Existing works (2)

Generic models

Multi-Structured Document Model (MSDM)

MSDM (2)

Relations between structures

MultiX; XInclude; etc.

Stand-off markup

MultiX (1)

Base structure

MultiX (2)

Composition for a line of the physical structure

<msd:comp id="C1" idrefs="F1 F2 F3=F4 F5 F6 F7" />
<line n="1"><msd:clink target="BS" label="text content" to="C1"/></line>
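The composition mechanism on this slide can be sketched in Python as follows. This is only an illustration, not the MultiX implementation: the namespace URI, the `msd:bs` container element, the fragment texts and the helper `line_text` are all assumptions introduced for the example. Only the names `msd:frag`, `msd:comp`, `msd:clink`, `idrefs` and `to` come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the slides do not give one.
MSD = "http://example.org/msd"

# Hypothetical document: msd:bs (base structure container) and the
# fragment texts are invented for this sketch.
DOC = f"""
<doc xmlns:msd="{MSD}">
  <msd:bs id="BS">
    <msd:frag id="F1">in the </msd:frag>
    <msd:frag id="F2">begin</msd:frag>
    <msd:frag id="F3">ning</msd:frag>
  </msd:bs>
  <msd:comp id="C1" idrefs="F1 F2 F3"/>
  <line n="1"><msd:clink target="BS" label="text content" to="C1"/></line>
</doc>
"""

NS = {"msd": MSD}


def line_text(root, line):
    """Follow a line's clink to its composition, then concatenate the fragments."""
    clink = line.find("msd:clink", NS)
    comp = root.find(f".//msd:comp[@id='{clink.get('to')}']", NS)
    # Index every fragment of the base structure by its id.
    frags = {f.get("id"): f.text or "" for f in root.iterfind(".//msd:frag", NS)}
    # idrefs is a whitespace-separated, ordered list of fragment ids.
    return "".join(frags[ref] for ref in comp.get("idrefs").split())


root = ET.fromstring(DOC)
text = line_text(root, root.find("line"))
print(text)
```

The line element itself carries no text: its content is obtained indirectly, which is what lets several structures share the same fragments.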

MultiX (3)

Querying MultiX documents: XQuery functions

rebuild ($elem-seq as element()*) as element()*

share-content ($e as element()) as xs:boolean

share-content-with ($e as element(), $str_name as xs:string) as element()*

share-fragments ($e1 as element(), $e2 as element()) as xs:boolean

get-shared-fragments ($e1 as element(), $e2 as element()) as element(msd:frag)*

includes-fragments-of ($e1 as element(), $e2 as element()) as xs:boolean

Etc.
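The slides give only the signatures of these functions. As a rough Python analogue of three of them, assuming their behaviour is what the names suggest (this reading is not confirmed by the slides), elements can be modelled as ordered lists of fragment ids. The `Element` class and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an XML element of one structure."""
    structure: str      # name of the structure the element belongs to
    name: str           # element name, e.g. "line" or "w"
    fragments: tuple    # ordered ids of the base-structure fragments it covers


def get_shared_fragments(e1, e2):
    """Fragments of e1 that e2 also covers, in e1's order."""
    other = set(e2.fragments)
    return [f for f in e1.fragments if f in other]


def share_fragments(e1, e2):
    """True when the two elements cover at least one common fragment."""
    return bool(get_shared_fragments(e1, e2))


def includes_fragments_of(e1, e2):
    """True when every fragment of e2 is also a fragment of e1."""
    return set(e2.fragments) <= set(e1.fragments)


line = Element("physical", "line", ("F1", "F2", "F3"))
word = Element("lexical", "w", ("F3", "F4"))   # a word broken across two lines

shared = get_shared_fragments(line, word)
print(shared, share_fragments(line, word), includes_fragments_of(line, word))
```

In this sample the word overlaps the line without being contained in it, which is the overlapping-hierarchies situation the deck starts from.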

MultiX (4)

Find all damaged words that contain damaged characters only.
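The query itself is not reproduced in the transcript. A minimal Python sketch of its logic, assuming each word and each damaged-character element is known by the fragment ids it covers, might look like this. The function name and the sample data are hypothetical, and this is not the XQuery formulation used with MultiX.

```python
def fully_damaged_words(words, damaged):
    """Return the ids of words whose fragments are all covered by damage.

    words:   mapping word id -> list of fragment ids (lexical structure)
    damaged: mapping damage id -> list of fragment ids (damaged characters structure)
    """
    # Union of every fragment marked as damaged.
    damaged_frags = set().union(*damaged.values())
    # Keep non-empty words made of damaged fragments only.
    return [w for w, frags in words.items()
            if frags and set(frags) <= damaged_frags]


words = {"w1": ["F1", "F2"], "w2": ["F3"], "w3": ["F4", "F5"]}
damaged = {"d1": ["F2"], "d2": ["F3"], "d3": ["F4", "F5"]}

result = fully_damaged_words(words, damaged)
print(result)
```

Here `w1` is damaged but not selected, because its fragment `F1` is intact.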

MultiX (5)

Creation and evolution of MultiX documents

A parser (MXP) creates an internal representation from separate structures

Useful when the structures are known a priori

Creation and Evolution of MSD

Little or no a priori knowledge about the structures

Common situation for scholars in the humanities

E.g. transcription of a poem found in a manuscript using the vocabulary defined by the TEI schema:

The loss of his clothes hardly mattered, because
He had seven coats on when he came,
With three pairs of boots—but the worst of it was,
He had wholly forgotten his name.

He would answer to “Hi!” or to any loud cry,
Such as “Fry me!” or “Fritter my wig!”
To “What-you-may-call-um!” or “What-was-his-name!”
But especially “Thing-um-a-jig!”

Before restructuring

[Figure: the poem above, annotated with three structures (stanzas, sentences, verses) over a base structure made of composition nodes and fragments]

Restructuring is necessary

[Figure: the same diagram, with the span "but the worst of it was" set apart within the third verse]

Automatic restructuring

[Figure: the poem with the stanzas, sentences and verses structures, the span "but the worst of it was" still set apart]

User intervention in restructuring

[Figure: the same poem, with the structures listed as stanzas, verses, sentences]
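The restructuring slides above suggest that a new element whose boundaries fall inside existing fragments forces those fragments to be split. A minimal sketch of that step, assuming fragments are contiguous half-open character offsets and each existing element is an ordered list of fragments, could be written as below. The function `restructure` and the offsets are hypothetical and do not reproduce the authors' algorithm.

```python
def restructure(fragments, compositions, start, end):
    """Split fragments so that a new element [start, end) aligns with them.

    fragments:    sorted, contiguous (start, end) offsets of the base structure
    compositions: mapping element id -> ordered list of fragments it covers
    Returns (new fragments, updated compositions, composition of the new element).
    """
    cuts = {start, end}
    new_fragments = []
    replaced = {}  # old fragment -> the pieces that replace it
    for s, e in fragments:
        # Boundaries of the new element that fall strictly inside this fragment.
        inner = sorted(c for c in cuts if s < c < e)
        points = [s, *inner, e]
        pieces = list(zip(points, points[1:]))
        new_fragments.extend(pieces)
        replaced[(s, e)] = pieces
    # Existing elements now point at the pieces of their former fragments.
    new_comps = {name: [p for f in frags for p in replaced[f]]
                 for name, frags in compositions.items()}
    # The new element is composed of every piece lying inside its span.
    new_element = [f for f in new_fragments if start <= f[0] and f[1] <= end]
    return new_fragments, new_comps, new_element


# Two verses, each made of one fragment; a sentence spanning offsets 2..6
# overlaps both of them.
fragments = [(0, 4), (4, 10)]
verses = {"verse1": [(0, 4)], "verse2": [(4, 10)]}

frags, comps, sentence = restructure(fragments, verses, 2, 6)
print(frags, comps, sentence)
```

The existing verses keep the same text after the split. Only their compositions change, which is why this step can be automated without user input.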

Perspectives

Shared responsibilities

Who is responsible for each document structure?

What is the life cycle of newly created document structures?

Use of formal knowledge

Formal knowledge (the tree structure of well-formed XML documents) made automatic restructuring possible

It seems necessary to find simple formal conditions for restructuring times …
