gerth van wijk

Published on December 6, 2007

Author: Goldie

Source: authorstream.com

Five Ways for Managing Hierarchical Keyword Strings:
Diederik Gerth van Wijk, dgerth@kluwer.nl
Content Architect, Kluwer, Deventer
XML Europe 2002, Barcelona, 2002.05.22

Outline:
- Introduction: Kluwer, Legal Publishing, Classical Indexing
- Central Thesauri
- Keyword Strings
- The result: a KWOC Index
- Way 1: RDBMS
- Way 2: XML
- Way 3: Hybrid: RDBMS and XML
- Way 4: ISO Topic Map
- Way 5: XTM
- Lessons learned: tools, hierarchy and relations

Kluwer Legal Publishers, Deventer:
- Founding part of Wolters Kluwer, now 19,000 employees, 25 countries, EUR 3.7 billion turnover
- Need-to-know information for Dutch legal and fiscal professionals; market leader in the Netherlands (80%)
- Books since 1739, now 800/year, 300 series
- Journals for a long time, now 300 titles
- Loose-leaf since 1920, now 280 titles
- Digital typesetting (in-house) since 1972
- Online since 1979
- CD-ROMs since 1987, now 100 titles
- SGML since 1989
- In short: old economy

Legal information in 1 minute:
- Statute law (dominant in continental law)
- Jurisprudence (case law)
- Journal articles
- Books and loose-leaf
  - section-by-section commentary
  - thematic commentary
- The section of law is the central unit

Metadating information:
- Information
- Metadating information
- Content-oriented metadating of information
- Standardised content-oriented metadating of information

Classical back-of-book indexing:
- Indexer indexes one book
- No restriction on choice of terms
- Limited reuse of terms
- No restriction on the system
- No restriction on the internal relations
- See-also references
- Result: no way to join indexes, as is needed when publishing electronically (CD-ROM, Internet)

Example of classic index:
medical treatment BW 1:261-1:264
  court's decision BW 1:264
  definition BW 1:261 (aant. 5)
  performed by nurse without permission BW 1:262
  permission BW 1:264
permission
  medical treatment BW 1:264
treatment
  medical treatment BW 1:264
    permission BW 1:264

Thesaurus (1): General:
- Thesaurus: ordered list of (thesaurus) terms, with internal relations between terms
- Relations used: USE (synonym), UF (used for), NT (narrower term), BT (broader term) and RT (related term)
- Optional per term: qualifier [minister (government) and minister (clerical)], search entry (yes/no), abbreviated form (for keyword string), status (approved?) and scope note

Thesaurus (2): Example:
content oriented metadating
  BT metadating
  NT content oriented standardised metadating
content oriented standardised metadating
  BT content oriented metadating
  BT standardised metadating
information
  RT metadating
metadating
  NT content oriented metadating
  NT standardised metadating
  RT information
standardised metadating
  BT metadating
  NT content oriented standardised metadating

Keyword and Keyword strings:
- Building blocks of a keyword string:
  - Main term (single term from the thesaurus)
  - Zero or more connected strings, each consisting of:
    - Connector (by, of)
    - Keyword string (recursive)
  - Form description
- Every connected string is logically connected with the main term through the connector
- Examples:
  - permission [for] medical treatment [by] judge
  - permission; definition
- Combination order

Nested keyword strings:
- Consequence of combination order: substrings
- Permission for medical treatment by judge
  - (Permission [for] (medical treatment) [by] (judge)): the judge permits.
  - (Permission [for] (medical treatment [by] (judge))): the judge treats.
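The recursive structure of a keyword string, and the ambiguity that nesting resolves, can be illustrated with a minimal Python sketch. It is not taken from the presentation: the class name `KeywordString` and the methods `flat` and `nested` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class KeywordString:
    """Hypothetical model: a main term plus (connector, keyword string) pairs."""
    term: str
    connected: List[Tuple[str, "KeywordString"]] = field(default_factory=list)
    form: Optional[str] = None  # form description, e.g. "definition"

    def flat(self) -> str:
        """Render without grouping, as the string would read in an index."""
        parts = [self.term]
        for connector, sub in self.connected:
            parts.append(f"[{connector}] {sub.flat()}")
        text = " ".join(parts)
        return f"{text}; {self.form}" if self.form else text

    def nested(self) -> str:
        """Render with parentheses that expose the combination order."""
        parts = [self.term]
        for connector, sub in self.connected:
            parts.append(f"[{connector}] {sub.nested()}")
        return "(" + " ".join(parts) + ")"


judge = KeywordString("judge")

# Reading 1: "by judge" is attached to "permission" (the judge permits).
judge_permits = KeywordString("permission", [
    ("for", KeywordString("medical treatment")),
    ("by", judge),
])

# Reading 2: "by judge" is attached to "medical treatment" (the judge treats).
judge_treats = KeywordString("permission", [
    ("for", KeywordString("medical treatment", [("by", judge)])),
])

print(judge_permits.flat())
print(judge_permits.nested())
print(judge_treats.nested())
```

Both readings produce the same flat text, so only the stored nesting tells them apart. This is why the combination order has to be part of the data model rather than something derived from the printed string.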
The new concept:
- Build a thesaurus of related terms
- Build keyword strings with these terms
- Link these keyword strings to occurrences
- To make an index to a set of occurrences, extract all linked keyword strings
  - Now you know all used terms
  - Now you know all used search entries
  - Now you know all related search entries
  - Now you know which USE/BT/NT/RT relations to show
- Duplicate under each search entry all keyword strings and all linked occurrences (KWOC)

Example KWOC-index:
judge
  permission [for] medical treatment [by] judge BW 1:264
medical treatment BW 1:261-1:264
  medical treatment; definition BW 1:261 (aant. 5)
  permission [for] medical treatment [by] judge BW 1:264
  permission [for] medical treatment by nurse BW 1:262
nurse
  permission [for] medical treatment by nurse BW 1:262
permission
  permission [for] medical treatment [by] judge BW 1:264
  permission [for] medical treatment by nurse BW 1:262
treatment
  see also: medical treatment

Pros and cons of the new approach:
- Pros:
  - Index built automatically: consistent, quick
  - Indexes can be merged
  - Reuse of terms and keyword strings
  - Reuse (regeneration) of indexes
  - Cost-effective when used frequently
- Cons:
  - High costs for creating thesauri and sets of keyword strings
  - Nested strings complicate the system
  - Indexer feels restricted in his intellectual task
  - Only applicable with a tool: a string management system

Requirements for management system:
- Indexer must be able to:
  - Link an occurrence to a keyword string
  - Create new keyword strings
  - Add terms
  - Add new relations between terms
- New terms and relations, however, must be confirmed by an editorial board (states, workflow)
- System must allow simultaneous editing: no file locking, but record, element or topic locking
- Use standards; independence from vendors, tools and OS
- Minimize programming
- RDBMS, OODBMS, DMS or TMMS

The thesaurus model as XML DTD:
  <!ELEMENT thesaurus (term+, deprecated+, btnt+, rt+)>
  <!ATTLIST term
      id        ID                    #REQUIRED
      name      CDATA                 #REQUIRED
      qualifier CDATA                 #IMPLIED
      status    (proposed | approved) "proposed">
  <!ATTLIST deprecated
      name      CDATA #REQUIRED
      qualifier CDATA #IMPLIED
      use       IDREF #REQUIRED>
  <!ATTLIST btnt
      nt IDREF #REQUIRED
      bt IDREF #REQUIRED>
  <!ATTLIST rt
      rt1 IDREF #REQUIRED
      rt2 IDREF #REQUIRED>

The keyword string model as XML DTD:
  <!ELEMENT keywrdstr (connectedkeywrdstr*, formdescref?)>
  <!ATTLIST keywrdstr
      id      ID    #REQUIRED
      termref IDREF #REQUIRED>
  <!ATTLIST connectedkeywrdstr
      connectorref IDREF #REQUIRED
      keywrdstrref IDREF #REQUIRED>
  <!ELEMENT connector (#PCDATA)*>
  <!ATTLIST connector
      id ID #REQUIRED>
  <!ATTLIST formdescref
      formdescref IDREF #REQUIRED>
  <!ATTLIST keywrdstr-obj-rel
      keywrdstrref IDREF #REQUIRED
      objref       IDREF #REQUIRED>

DTD for generated KWOC index:
  <!ELEMENT kwocindex (kwoc | unusedterm | deprterm)+>
  <!ELEMENT kwoc (name, qualifier?, xref*, seealso*, keywrdstr*)>
  <!ELEMENT unusedterm (name, qualifier?, see+)>
  <!ELEMENT deprterm (name, qualifier?, see)>
  <!ELEMENT keywrdstr (name, (connectedkeywrd | connectedkeywrdstr)+)>
  <!ELEMENT connectedkeywrd (connector, name)>
  <!ELEMENT connectedkeywrdstr (connector, keywrdstr)>

Five ways to implement the model:
- Relational Database Management System
- XML system
- Hybrid solution
- ISO Topic Maps
- XTM

Way 1: Relational Database Model:

Results of way 1:
- Hard to explain hierarchy (recursion!) and order to SQL people
- Recursion needs complex programming
- Hard to get a user-friendly interface
- It still doesn't work, but we'll follow this track

Way 2: XML:
- Lots of ID/IDREFs: how to make sure a KEYWRDIDREF refers to a KEYWRD ID?
- Our document management system (SGML) is designed for large documents: no locking of such small elements
- If you check out one small element, that will be a document, and the IDREFs to the rest will be broken

Way 3: hybrid SQL/XML:
- Keyword strings modeled as hierarchical XML
- References to other strings and terms are looked up in the RDBMS
- Needs lots of tuning in XMetaL or Epic

Way 4 and 5: The Topic Map Model:
- Layer to describe the model
- Layer with thesaurus
- Layer with keyword strings and occurrences
- Layer that links strings and occurrences
- The model is the same
- ISO Topic Maps allow meaningful element names
- XTM has only topic elements
- XTM is an exchange format
- To get a standards-based application, you need TMQL and TMCL: these don't exist yet

Conclusions:
- Hierarchy and order are not natural to an RDBMS and its tools
- The relational model is superior for avoiding redundancy and normalizing a data model
- The relational model is not natural to XML and its tools
- Tools only provide an interface that is natural to them
- Mixing hierarchy and relations means no simple application in SQL or XML
- Every model IS a Topic Map, even if it doesn't use TM syntax to implement it...
- Would TMQL and TMCL provide enough precision to describe the model in such a way that a user-friendly and intuitive interface follows automatically from the right specs?
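The KWOC generation step described under "The new concept" (duplicate every linked keyword string under each search entry it uses, and show only relations that lead to used terms) can be sketched in Python as follows. This is an illustration under stated assumptions, not the system built at Kluwer: the sample data, the names `LINKS`, `RELATED` and `build_kwoc`, and the flat representation of keyword strings as plain text are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical sample data, loosely modeled on the example index in the slides.
# Each link is (keyword string as text, thesaurus terms it contains, occurrence).
LINKS: List[Tuple[str, Set[str], str]] = [
    ("permission [for] medical treatment [by] judge",
     {"permission", "medical treatment", "judge"}, "BW 1:264"),
    ("permission [for] medical treatment [by] nurse",
     {"permission", "medical treatment", "nurse"}, "BW 1:262"),
    ("medical treatment; definition", {"medical treatment"}, "BW 1:261"),
]

# Hypothetical thesaurus relations between terms, treated here as symmetric.
RELATED: List[Tuple[str, str]] = [("treatment", "medical treatment")]


def build_kwoc(links, related) -> Dict[str, dict]:
    """Return {search entry: {"strings": [...], "see_also": [...]}}."""
    strings = defaultdict(list)
    for text, terms, occurrence in links:
        for term in terms:
            # Duplicate the string and its occurrence under every term it uses.
            strings[term].append((text, occurrence))
    used = set(strings)

    see_also = defaultdict(set)
    for a, b in related:
        # Show a relation only when it points at a term used in this index.
        if b in used:
            see_also[a].add(b)
        if a in used:
            see_also[b].add(a)

    index = {}
    for entry in sorted(used | set(see_also)):
        index[entry] = {
            "strings": sorted(strings.get(entry, [])),
            "see_also": sorted(see_also.get(entry, [])),
        }
    return index


index = build_kwoc(LINKS, RELATED)
for entry, content in index.items():
    print(entry)
    for text, occurrence in content["strings"]:
        print(f"  {text} {occurrence}")
    for target in content["see_also"]:
        print(f"  see also: {target}")
```

Because the index is derived entirely from the links and the thesaurus, it can be regenerated or merged with another index by rerunning the function on a larger set of links, which is the reuse benefit listed among the pros.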
