Centralized Taxonomy Management for Enterprise Information Systems


Published on September 30, 2008

Author: danielabarbosa

Source: slideshare.net

Description

Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
Paula R McCoy, Manager, Taxonomy Development, ProQuest

Now that you have built your taxonomies, you need to manage and maintain them in a centralized environment that can be leveraged by all of your enterprise applications including search tools, portals, and CMS/DMS systems. This session will review some best practices in centralized taxonomy management and go through the implementation of a thesaurus management tool at ProQuest, which enabled them to create a common language to connect disparate information assets using large and varied vocabularies and authority files linked to new and existing editorial systems.

Centralized Taxonomy Management for Enterprise Information Systems

Enterprise Search Summit, Wednesday, September 24th, 2:00 pm – 2:30 pm

Dow Jones Client Solutions, Synaptica

ProQuest, Manager, Taxonomy Development

Dow Jones Taxonomy Solutions

Words: Dow Jones taxonomy licensing; other taxonomy licensing (Taxonomy Warehouse); taxonomy customization; taxonomy development

Expertise: taxonomy assessment; taxonomy consulting; analysis; recommendations; implementation; workshops

Tools: Synaptica, a taxonomy / metadata management tool

Some Definitions

A taxonomy is a hierarchical topic structure to which information can be assigned through the dual processes of classification (filing to a location) and categorisation (tagging with relevant metadata). A taxonomy provides browsable navigation and supports filtered searching.

A thesaurus is a controlled vocabulary linking an organisation’s common language to its taxonomy structure. It accommodates synonyms, acronyms, language variants and other near equivalences. It also signposts non-hierarchical linkages within and across the taxonomy facets. A thesaurus is usually employed to interpret and guide user search queries.

An ontology is the working model of entities and interactions in a particular domain of knowledge or content set. It is a set of concepts (such as things, events, and relations) that are specified in some way in order to create an agreed-upon vocabulary for exchanging information. An ontology is increasingly used to visualise (or map) a set of search results and discover new or hidden connections.

Classic taxonomy (up / down): groups things or concepts into families

Traditional thesaurus (sideways): captures the different names of the family members and explores some more distant associations (cousins & close friends)

Emerging ontology (multi-directional): shows a network of multi-dimensional relationships and properties both within and outside the family groups

Up / down: Telephones is a broader term than Mobile Phones

Sideways: Mobile Phones are also known as Cell Phones & Hand Phones, and are similar to Hand Held Devices & PDAs

Multi-directional: Mobile Phones are made by Phone Manufacturers and use the networks of Telecoms Service Providers

Metadata’s Evolutionary Path

Dictionaries & Flat Lists, Hierarchical Taxonomies, Controlled Vocabulary Thesauri, Ontologies, Structured Authority Files

Metadata is evolving organically: the less complex metadata elements form the building blocks for creating the more complex structures

Practical Applications

Portal navigation and browsable website menus

Conceptual access to large databases 

Records management and cataloging

e-Commerce online product catalogues

Inventory control and de-duplication

Auto-classification of internal documents and email

Multilingual search and browse

Metasearch of enterprise-wide resources

Centralized Taxonomy and Metadata Management

A centralized repository for multilingual semantic management that is:

Independent from systems like web-portal search and categorization systems

Scalable; capable of evolving with emerging corporate semantic standards

[Diagram: a Centralized Taxonomy Management System (Synaptica®), with multiple users working in collaborative and compartmentalized space under permissions, connected to portals, categorizers, search engines and content via HTML, CSV, XML, ZThes, SKOS, OWL and Web Services]

Why Centralized?

Metadata can transcend information islands and data silos but only if the enterprise is committed to common standards

A centralized system that supports both collaboration and compartmentalization allows common metadata to be shared while also allowing user communities the independence to manage specialized metadata files

Why Independent?

Enterprises are increasingly making use of multiple proprietary and open source software tools for categorization, search and portal tasks

While many of these tools support some level of metadata management, the diversity of standards, data formats and business rules they support can actually exacerbate the data silo problem by creating metadata silos

Where taxonomy fits with Search

[Diagram labels: DMS; CMS; Shared Docs; News & Research; Data; Search Engine; Taxonomy & Metadata Platform; Information Processing, Management and Storage]

4 Good Reasons for Taxonomy

1. Search Relevancy

2. Search Completeness

3. Search Federation

4. Search Visualisation

Effective Research/Risk Mitigation

Knowledge Worker Productivity

Discovery & Innovation

Better & Faster Decisions

1. Improved Search Relevancy

Ambiguity of Language

Is a Blackberry a fruit or a handheld device?

By including this brand name in a taxonomy we can give context to the user search query

In a telecoms domain we can assume that the user means the latter and only return content tagged as such

Alternatively we can weight the results, promoting those documents about handheld devices above those that refer to the fruit

Either way the result is increased search precision which translates into time savings
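
The two options described for the Blackberry example, filtering to the domain sense or weighting it above the other sense, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SENSES` and `DOMAIN_PREFERRED` tables, the `Doc` class, the `rank` function and the boost value are all hypothetical and are not taken from Synaptica or any search engine.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    tags: set  # taxonomy categories the document was tagged with


# Hypothetical taxonomy data: an ambiguous label and the categories it can mean.
SENSES = {"blackberry": {"Hand Held Devices", "Fruit"}}
# Hypothetical domain context: the category assumed for each domain.
DOMAIN_PREFERRED = {"telecoms": "Hand Held Devices"}


def rank(query, docs, domain, boost=2.0, filter_only=False):
    """Return document titles, with the domain's sense promoted or filtered."""
    senses = SENSES.get(query.strip().lower(), set())
    preferred = DOMAIN_PREFERRED.get(domain)
    scored = []
    for doc in docs:
        matched = doc.tags & senses
        if not matched:
            continue  # not tagged with any sense of the query term
        in_domain = preferred in matched
        if filter_only and not in_domain:
            continue  # option 1: only return content tagged with the domain sense
        # option 2: keep everything, but weight the domain sense higher
        scored.append((boost if in_domain else 1.0, doc.title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


docs = [
    Doc("Jam recipes", {"Fruit"}),
    Doc("Handset review", {"Hand Held Devices"}),
    Doc("Tax law update", {"Finance"}),
]
weighted = rank("Blackberry", docs, "telecoms")
filtered = rank("Blackberry", docs, "telecoms", filter_only=True)
```

With weighting, the handset review is listed ahead of the jam recipes; with filtering, only the handset review is returned.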

2. Improved Search Completeness

Synonymous and Related Term Relationships

Mobile Phone (PT) = Cell Phone (NPT) = Hand Phone (NPT)

Mobile Phone is related to Hand Held Device (RT)

User Search Query = “Cell Phones”

The taxonomy simultaneously broadens the search and prioritises the returned results giving increased recall without compromising relevancy

Content tagged with the Mobile Phone category is promoted over content not so tagged, using a weighting in the search algorithm

Content tagged with the Hand Held Device category may also receive a weighting
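
As a rough sketch of this expansion, the query "Cell Phones" can be resolved to its preferred term and then widened to related terms, each with its own weight. The lookup tables, the weights and the naive plural handling below are assumptions made for the example; a real search engine would apply its own scoring.

```python
# Non-preferred terms (NPT) and the preferred term (PT) itself map to the PT.
TO_PREFERRED = {
    "mobile phone": "Mobile Phone",
    "cell phone": "Mobile Phone",
    "hand phone": "Mobile Phone",
}
# Related terms (RT) for each preferred term.
RELATED = {"Mobile Phone": ["Hand Held Device"]}


def normalise(query):
    """Lower-case the query and strip a trailing plural 's' (deliberately naive)."""
    key = query.strip().lower()
    return key[:-1] if key.endswith("s") else key


def expand(query, pt_weight=2.0, rt_weight=1.5):
    """Map a query to weighted taxonomy categories: PT first, then RTs."""
    preferred = TO_PREFERRED.get(normalise(query))
    if preferred is None:
        return {}
    weights = {preferred: pt_weight}
    for related in RELATED.get(preferred, []):
        weights[related] = rt_weight
    return weights


def score(doc_tags, weights):
    """Sum the weights of every category the document is tagged with."""
    return sum(weights.get(tag, 0.0) for tag in doc_tags)


weights = expand("Cell Phones")
tagged_pt = score({"Mobile Phone"}, weights)
tagged_rt = score({"Hand Held Device"}, weights)
untagged = score({"Fruit"}, weights)
```

Documents tagged with the preferred term score highest, documents tagged only with the related term score lower, and untagged documents receive no boost.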

3. Search federation and data integration

A snapshot or dashboard is often more desirable than a list of document titles or snippets, especially when looking for information on a customer, supplier or competitor

Also, information will most likely reside in a number of internal repositories, each with their own levels of metadata structure

Taxonomy allows the combination of news, internal CI reports, price plans, coverage data, market share data, share price etc. in one consolidated view by providing mappings or cross-walks

This is essentially applying business intelligence discipline to the world of unstructured information
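
A cross-walk of this kind can be pictured as a mapping from each repository's local label to one shared taxonomy concept, which then lets records from different sources be grouped into a single view. The repository names, labels and records below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical cross-walk: (repository, local label) -> shared taxonomy concept.
CROSSWALK = {
    ("news", "Acme Telecom plc"): "Acme Telecom",
    ("ci_reports", "ACME"): "Acme Telecom",
    ("price_plans", "acme-uk"): "Acme Telecom",
}


def consolidate(records):
    """Group (repository, local label, item) records by shared concept."""
    view = defaultdict(lambda: defaultdict(list))
    for repository, local_label, item in records:
        concept = CROSSWALK.get((repository, local_label))
        if concept is None:
            continue  # no mapping yet, so the item stays out of the dashboard
        view[concept][repository].append(item)
    return {concept: dict(by_repo) for concept, by_repo in view.items()}


records = [
    ("news", "Acme Telecom plc", "Acme wins new licence"),
    ("ci_reports", "ACME", "Q3 competitor report"),
    ("price_plans", "acme-uk", "Pay-monthly tariff sheet"),
    ("news", "Unmapped Co", "Unrelated story"),
]
dashboard = consolidate(records)
```

The resulting `dashboard` holds one entry per concept, with the items from each repository listed under it.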

4. Search Visualisation

The previous three scenarios assume the user knows what they are looking for

But what about serendipitous discovery?

By being able to see across an aggregation of content and extract facts and relationships from deep within the information stores, true (and sometimes fortunate) discovery can take place

[Diagram labels. Back End Information Structure: Document, Content & Records Management; Filing & Storage; Metadata Tagging (Categorisation) Process. Synaptica® Vocabulary & Metadata Management: Taxonomies; Thesauri; Ontologies. Front End Information Intelligence: Search Engine; Visualisation; Navigation; Intranet / Portal User Interface. Roles: Librarians, Taxonomists, Indexers, Knowledge & Information Managers; Information Creators, Records Managers, Content Managers, Librarians, Indexers; Information Users (the business; the public); CIOs, CTOs, IT Architects]

Centralized Taxonomy Management for Enterprise Information Systems

Paula R. McCoy, Manager, Taxonomy Development, ProQuest

Topics of Discussion

Description of ProQuest Controlled Vocabulary & Authority Files

Taxonomy Management: Overview

Managing Terms Manually

Synaptica Thesaurus Management System

Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & civil war pamphlets, and daily wire feeds

Subscription-based ProQuest® online information service available in academic and public libraries

ProQuest Controlled Vocabulary (CV) used to index subjects; Authority Files used to index company, geographic, personal, product names

CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary

ProQuest Controlled Vocabulary

Created in 1970s for ABI/INFORM business database

Based on Library of Congress Subject Headings

Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies)

ProQuest Controlled Vocabulary

Thesaurus subjects:

Business, economics & trade: 4300 terms

Science, math & technology: 1600 terms

Medicine: 1150 terms

Humanities: 960 terms

Government & policy: 850 terms

Education: 400 terms

Merged with general reference vocabulary in 1980s

Major development effort in past 4 years to boost science, education & medical terms

ProQuest CV: Statistics

Preferred terms: 11,046

Non-preferred terms: 5631

Scope Notes: 3194 (29%)

Cross-references (Broader, Narrower, Related terms): 67,700

Terms added in 2007: 77

Terms added in 2008: 58+
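
The 29% figure for Scope Notes is reproduced if it is read as the share of preferred terms that carry a scope note; that reading is an assumption, as the slide does not state the denominator. A small check, which also derives two ratios that do not appear on the slide:

```python
# Counts as given on the slide.
cv = {
    "preferred": 11046,
    "non_preferred": 5631,
    "scope_notes": 3194,
    "cross_references": 67700,
}

# Assumed denominator: preferred terms.
scope_note_coverage = round(100 * cv["scope_notes"] / cv["preferred"])

# Derived for illustration only; these ratios are not stated in the source.
cross_refs_per_term = round(cv["cross_references"] / cv["preferred"], 1)
npt_per_preferred = round(cv["non_preferred"] / cv["preferred"], 2)

print(f"Scope note coverage: {scope_note_coverage}%")
print(f"Cross-references per preferred term: {cross_refs_per_term}")
print(f"Non-preferred terms per preferred term: {npt_per_preferred}")
```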

Authority Files: Statistics

Corporate/Organization Names: 438,098

Names added in 2008: 5489

Personal Names: 416,239

Names added in 2008: 1526

Geographic (Location) Names: 34,331

Names added in 2008: 144

Product Names: 38,210

Names added in 2008: 54

The Taxonomy Manager’s Job

Add subject terms as dictated by new concepts and new content to index

Maintain hierarchies & Scope Notes

Load updated Thesaurus to ProQuest interface

Manage authority files to maintain standards & control file size

The Taxonomy Manager’s Job

OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest

Sample Subject Term

Chronic obstructive pulmonary disease
    SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow
    UF  COPD
    BT  Disease
    BT  Respiratory diseases
    NT  Asthma
    NT  Bronchitis
    NT  Emphysema
    RT  Airway management
    RT  Lungs

Heading: preferred, or main term

SN: scope note defining term and how it is used

UF: non-preferred term; points to term used to index

BT: terms broader in nature to main term (COPD is a disease, and specifically, a respiratory disease)

NT: terms narrower in nature to main term (these are chronic lung diseases)

RT: terms related to main term that might be used to narrow the search
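
One way to picture the sample entry as data is a record with one field per relationship type, printed in the same SN / UF / BT / NT / RT layout. The `Term` class and `render` function are a hypothetical sketch and do not describe how Synaptica or ProQuest stores terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    preferred: str                                 # preferred, or main term
    scope_note: str = ""                           # SN
    used_for: list = field(default_factory=list)   # UF: non-preferred terms
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


def render(term):
    """Format a term in the thesaurus display layout used on the slide."""
    lines = [term.preferred]
    if term.scope_note:
        lines.append(f"    SN: {term.scope_note}")
    groups = (
        ("UF", term.used_for),
        ("BT", term.broader),
        ("NT", term.narrower),
        ("RT", term.related),
    )
    for code, values in groups:
        lines.extend(f"    {code}  {value}" for value in values)
    return "\n".join(lines)


copd = Term(
    preferred="Chronic obstructive pulmonary disease",
    scope_note=(
        "Any lung disease, such as chronic bronchitis or emphysema, "
        "causing obstruction of bronchial airflow"
    ),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)
print(render(copd))
```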

Managing Terms Manually

New scientific content requiring a huge enhancement to vocabulary

Seven MS Word vocabulary documents, English and foreign language (French, German, Spanish), printed for internal use only

Six authority files & 3 vocabulary files in Oracle databases, requiring duplicate entry of subject terms in Word and Oracle

Legacy editorial system in process of being replaced

Thesaurus Management Systems Buying Criteria

Thesaurus Management System: Requirements

Eliminate double entry

Improve editorial interface with vocabulary

Automate entry of reciprocal relationships
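
Automating reciprocal relationships means that entering one side of a link (for example, BT) records the inverse (NT) on the other term without a second manual entry. The sketch below shows the idea only; the `Thesaurus` class and its methods are hypothetical and are not Synaptica's API.

```python
from collections import defaultdict

# Each relationship type and its inverse. RT is symmetric; UF pairs with USE.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    def __init__(self):
        # term -> set of (relationship code, other term)
        self.links = defaultdict(set)

    def add(self, source, relationship, target):
        """Record a relationship and its reciprocal in a single step."""
        self.links[source].add((relationship, target))
        self.links[target].add((RECIPROCAL[relationship], source))

    def get(self, term, relationship):
        """Return the terms linked to `term` by the given relationship."""
        return sorted(
            other for code, other in self.links.get(term, set())
            if code == relationship
        )


thesaurus = Thesaurus()
main = "Chronic obstructive pulmonary disease"
thesaurus.add(main, "BT", "Respiratory diseases")
thesaurus.add(main, "RT", "Lungs")
thesaurus.add(main, "UF", "COPD")
```

After these three entries, "Respiratory diseases" lists the main term as a narrower term, "Lungs" lists it as a related term, and "COPD" points to it with USE.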

Life With Synaptica

Word: Old, Bad

Synaptica: New, Good

Adding Terms Today: 3 Easy Steps

1. Enter term and relationships into Synaptica “Item Details” window

2. Export report of new terms into Word

3. Send Word document to editors

Improving Thesaurus Management Categories Feature

Subject Term Categories

CORP Names – Categories & Website

Foreign-Language Vocabularies Language Equivalents

Foreign-Language Vocabularies: Life With Synaptica

[Screenshot labels: Spanish; German; French; alphabetical by language]

Synaptica Updates

Synaptica version 6.0 released in early 2006

Synaptica version 7.0 is being implemented now:

Enhanced user interface

Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration

Expanded Reporting functionality

Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing

Improved global term editing

Online help and user guides

Benefits of Synaptica

Greater awareness of thesaurus standards and terminology, e.g.: “preferred” and “non-preferred” instead of Use and Used For

Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics

Increase in Company name NPTs: from 1935 to 8952 today

Immediate responsiveness to indexer needs: real-time term additions, esp. NPTs and SNs

Easier loading of updated Thesaurus on PQ interface

thank you!
