Metadata first, ontologies second

Published on June 6, 2008

Author: JosebaAbaitua

Source: slideshare.net

Towards a solution to extract knowledge from the social web (“metadata first, ontologies second”)

Project: Collaborative Ontology Building System (CollOnBus), INTEK Nets 2005-2007

Aitor Almeida, Borja Sotomayor, Joseba Abaitua, Diego López de Ipiña

Social web: source of knowledge

Crowds share and tag resources of different types:

pictures, music, posts, video clips, slides, books, bookmarks, etc.

Social tagging (or crowd-tagging) is a very effective and economical way of generating knowledge

Crowdsourcing: “the trend of leveraging the mass collaboration enabled by Web 2.0 technologies to achieve business goals.”

<http://en.wikipedia.org/wiki/Crowdsourcing>

Related work (since 2006)

mapping tags to ontologies

Schmitz 2006. Inducing Ontology from Flickr Tags. WWW’2006: Collaborative Web Tagging workshop

Abbasi et al. 2007. Organizing Resources on Tagging Systems using T-ORG. ESWC2007 SemNet workshop

identifying semantic relations

Specia, Motta. 2007. Integrating Folksonomies with the Semantic Web. ESWC2007

transforming folksonomies into formal representations

Marlow et al. 2006. Tagging, Taxonomy, Flickr, Article, ToRead. WWW’2006: Collaborative Web Tagging workshop

Hotho et al. 2006. Trend Detection in Folksonomies. Semantics and Digital Media Technology, SAMT2006

Maala et al. A Conversion Process from Flickr Tags to RDF Descriptions. BIS2007 workshop

Which knowledge representation model?

Extracting knowledge from data-sharing Web 2.0 sites, but into which formal representation?

Semantic networks

Lexical networks (WordNet)

Taxonomies

e.g. categories from Wikipedia, thesauri

Metadata

“mapping to Dublin Core is a weak choice”

Ontologies

“metadata first, ontologies second”

Crowds tagging pictures

Crowds tagging pictures Aitor Almeida Borja Sotomayor Diego López de Ipiña

Crowds tagging posts

Crowds tagging slides

Crowds tagging books

Crowds tagging URL

Crowd-sharing of tags

Flickr, del.icio.us... group tags by social sharing (or “co-usage”)

but the semantic information that socially shared tags acquire is poorly exploited

Mapping folksonomies into tag clusters

RawSugar <http://rawsugar.com/>

allows users to assign hierarchies to their tags, improving the navigation and searching of folksonomies

non-expert users will find it easier to tag resources without any restrictions

Tag clustering

Tag clustering is the main technique used to enrich social tagging

but semantic relations are not detected
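As an illustration of clustering by co-usage, here is a minimal sketch. It assumes that two tags are linked when they appear together on at least `min_shared` resources and that clusters are the connected components of the resulting graph. The function name, the threshold and the sample bookmarks are hypothetical, not taken from Flickr or del.icio.us.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_co_usage(resource_tags, min_shared=2):
    """Group tags that are used together on at least `min_shared` resources."""
    co_usage = defaultdict(int)
    for tags in resource_tags.values():
        for a, b in combinations(sorted(set(tags)), 2):
            co_usage[(a, b)] += 1

    # Undirected graph with one edge per sufficiently frequent tag pair.
    neighbours = defaultdict(set)
    for (a, b), count in co_usage.items():
        if count >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # The connected components of that graph are the tag clusters.
    clusters, seen = [], set()
    for tag in sorted(neighbours):
        if tag in seen:
            continue
        stack, component = [tag], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical bookmarks and their tags.
bookmarks = {
    "url1": ["python", "programming", "tutorial"],
    "url2": ["python", "programming"],
    "url3": ["recipes", "cookies"],
    "url4": ["recipes", "cookies", "christmas"],
}
clusters = cluster_by_co_usage(bookmarks)
```

The clusters say which tags belong together, but not how they relate to each other, which is the limitation noted above.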

Beyond tag clusters?

Should we map them into ontologies?

Better to map them into metadata first

Metadata vs ontologies

Why are metadata structures better than ontologies (for resource classification and categorisation)?

Let’s reflect on different knowledge representations and on who uses them:

Folksonomies (crowds)

Taxonomies, ontologies (knowledge engineers, AI/SW practitioners)

Metadata structures (librarians, archivists, documentalists)

What are metadata?

Tags vs metadata?

Metadata vs ontologies

Why are metadata structures better?

Because metadata provide a wide and complete range of facets for representing knowledge about an entity or resource

Each facet (or data type) could be part of one or several ontological structures

Facet: “any of the definable aspects that make up a subject (as of contemplation) or an object (as of consideration)”

“A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order” (Wikipedia).

Better to map folksonomies into metadata structures first

Dublin Core Metadata Initiative http://jodi.tamu.edu/Articles/v02/i02/Greenberg/metadataform.gif

Our mapping tool: folk2onto (? folk2meta) designed by Borja Sotomayor

folk2onto: Tag Distiller

Tag Distiller:

Downloads tags from Web 2.0 sites

Matches each tag against WordNet (taking into account the tag’s context/cloud)

Filters out synonyms

Keeps the list of remaining tags

Generates an XML file

Implemented by Aitor Almeida

TAG clouds from del.icio.us

Requests http://del.icio.us/url/check?url=site

Looks for <title> and gets its content: the hash

Gets the RSS feed at http://del.icio.us/rss/url/ + hash

Then the tag clouds are downloaded from the <rdf:li resource="http://del.icio.us/tag/"> elements
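The last step, reading tags out of the `rdf:li` elements, can be sketched as follows. This is a minimal illustration that parses a feed held in a string rather than fetching anything. The sample feed is a hypothetical, heavily simplified stand-in for the real one, and `tags_from_feed` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TAG_PREFIX = "http://del.icio.us/tag/"


def tags_from_feed(feed_xml):
    """Count the tags referenced by rdf:li elements in an RDF/RSS feed."""
    root = ET.fromstring(feed_xml)
    counts = Counter()
    for li in root.iter(f"{{{RDF_NS}}}li"):
        # The attribute may be written as `resource` or as `rdf:resource`.
        resource = li.get("resource") or li.get(f"{{{RDF_NS}}}resource") or ""
        if resource.startswith(TAG_PREFIX) and len(resource) > len(TAG_PREFIX):
            counts[resource[len(TAG_PREFIX):]] += 1
    return counts


# Hypothetical feed: two users bookmarked the same URL with overlapping tags.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/postgresql"/>
    <rdf:li resource="http://del.icio.us/tag/database"/>
  </rdf:Bag>
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/database"/>
  </rdf:Bag>
</rdf:RDF>"""

tag_cloud = tags_from_feed(SAMPLE_FEED)
```

Counting occurrences gives the weights of a tag cloud; whether the Tag Distiller weighted tags this way is an assumption.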

TAG clouds from Technorati

Technorati: blog aggregator

We can get tag clouds from Technorati through: http://api.technorati.com/blogposttags?key=[apikey]&url=[blog URL]

TAG clouds from Technorati

<?xml version="1.0" encoding="utf-8"?>

<!-- generator="Technorati API version 1.0 /blogposttags" -->

<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN" "http://api.technorati.com/dtd/tapi-002.xml">

<tapi version="1.0">

<document>

<result>

<querycount>13</querycount>

</result>

<item>

<tag>christmas cookie recipes</tag>

<posts>274</posts>

</item>

… .
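A response of this shape can be reduced to (tag, post count) pairs with the standard library. The sketch below is only an illustration: the sample string repeats the first item of the response above and adds closing tags so that it is well-formed, and `parse_blogposttags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_blogposttags(response_xml):
    """Return (tag, post count) pairs from a blogposttags-style response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for item in root.iter("item"):
        tag = item.findtext("tag")
        posts = item.findtext("posts")
        if tag is not None and posts is not None:
            pairs.append((tag.strip(), int(posts)))
    return pairs


# Only the first item is known from the slide; the closing tags are added
# here so that the sample parses.
SAMPLE_RESPONSE = """<tapi version="1.0">
  <document>
    <result>
      <querycount>13</querycount>
    </result>
    <item>
      <tag>christmas cookie recipes</tag>
      <posts>274</posts>
    </item>
  </document>
</tapi>"""

tag_counts = parse_blogposttags(SAMPLE_RESPONSE)
```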

Tagged URL at Technorati

All <tag> elements are downloaded

To get the “title”: http://api.technorati.com/bloginfo?key=[apikey]&url=[blog url]

and the <name> element is retrieved

Semantic relations in WordNet

WordNet relations for tag ‘Spanish’:

TAG filtering algorithm

Tags are filtered by means of WordNet

If a tag has only one meaning (synset), that meaning is assigned

If it has more than one, the meaning is selected using:

T: the resource’s tag set

Related(a,b): gives 1 if a and b have some type of relation (hypernym, hyponym, holonym, meronym)

w: weights

Several iterations are made until a meaning is found (10 iterations max.)
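The scoring formula itself is not reproduced in this text, so the following is only a sketch under stated assumptions: each candidate sense is scored by summing `weight * Related(sense, other)` over the other tags in T, and the best-scoring sense wins. A toy sense inventory stands in for WordNet, the iteration step is not modelled, and all names and data are hypothetical.

```python
# Toy stand-in for WordNet: candidate senses per tag, plus directly related pairs.
SENSES = {
    "jaguar": ["jaguar.animal", "jaguar.car"],
    "feline": ["feline.animal"],
    "zoo": ["zoo.place"],
}
RELATIONS = {
    frozenset({"jaguar.animal", "feline.animal"}),  # hypernym/hyponym link
}


def related(a, b):
    """Return 1 if senses a and b are directly related, else 0."""
    return 1 if frozenset({a, b}) in RELATIONS else 0


def choose_sense(tag, tag_set, weight=1.0):
    """Pick the sense of `tag` best supported by the other tags in `tag_set`."""
    candidates = SENSES.get(tag, [])
    if not candidates:
        return None           # unknown to the lexicon: the tag is discarded
    if len(candidates) == 1:
        return candidates[0]  # a single meaning is assigned directly
    scores = {}
    for sense in candidates:
        scores[sense] = sum(
            weight * max((related(sense, other) for other in SENSES.get(t, [])),
                         default=0)
            for t in tag_set
            if t != tag
        )
    best = max(candidates, key=lambda s: scores[s])
    return best if scores[best] > 0 else None  # no supporting context: undecided


chosen = choose_sense("jaguar", ["jaguar", "feline", "zoo"])
```

With the toy data, the tag "feline" supports the animal sense of "jaguar". Without any related tag in the set, the sketch leaves the tag undecided instead of guessing.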

TAG filtering algorithm

Once senses have been discarded, synonyms are also filtered out

Words are then grouped into senses using WordNet’s relation network

The output is exported to:

an XML file with senses

an XML file with the tags that were discarded

an RDF file containing WordNet’s relation network

TAG XML file

<?xml version="1.0" encoding="UTF-8"?>

<resource>

<tittle>PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL</tittle>

<type>Text</type>

<format>text/html</format>

<identifier>www.postgresql.org/docs/faqs.FAQ_brazilian.html</identifier>

<tags>

<tag>

<lemma>tune</lemma>

<idlex>236726</idlex>

</tag>

<tag>

<lemma>bd</lemma>

<idlex>5604473</idlex>

</tag>

TAG file without senses

<resource>

<tittle>Wired News: The Virus That Ate DHS</tittle>

<type>Text</type>

<format>text/html</format>

<identifier>www.wired.com/news/technology/0,72051-0.html?tw=rss.index</identifier>

<tags>

<tag>bit200f06</tag>

<tag>group141</tag>

<tag>dhs</tag>

<tag>group35</tag>

<tag>malware</tag><tag>group91</tag><tag>group17</tag>

<tag>group53</tag>

<tag>computer_security</tag>

</tags>

</resource>

WordNet’s sense sets

Words are grouped in sense sets

If Related(a,b) = 1, the words are grouped in the same set

The relation depth must be less than or equal to 3
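A minimal sketch of the depth-limited grouping, assuming that "relation depth" means the number of relation steps between two senses and that a sense joins the first set holding a sense within that distance. The graph is a hypothetical stand-in for WordNet's relation network. The greedy grouping is a simplification: it never merges two existing sets.

```python
from collections import deque


def within_depth(graph, start, goal, max_depth=3):
    """Return True if `goal` is at most `max_depth` relation steps from `start`."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            return True
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def group_senses(graph, senses, max_depth=3):
    """Greedily place each sense in the first set containing a related sense."""
    groups = []
    for sense in senses:
        for group in groups:
            if any(within_depth(graph, sense, member, max_depth) for member in group):
                group.append(sense)
                break
        else:
            groups.append([sense])
    return groups


# Hypothetical undirected relation graph (each edge listed in both directions).
GRAPH = {
    "database": ["information"],
    "information": ["database", "data"],
    "data": ["information", "table"],
    "table": ["data"],
    "cookie": ["cake"],
    "cake": ["cookie"],
}
groups = group_senses(GRAPH, ["database", "table", "cookie", "cake"])
```

Here "database" and "table" are exactly three steps apart, so they share a set at depth 3 and fall into separate sets at depth 2.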

folk2onto: Tag Trainer

folk2onto: Map Trainer

folk2onto: Tag Mapper

The Mapper makes tag-element associations

These associations are made according to the senses assigned by the Distiller

The mapping targets Dublin Core metadata records

folk2onto: Dublin Core

The Distiller gets 4 elements from the tag source (del.icio.us, Technorati, etc.):

Title: the URL’s title -> from the <title> XML tag

Type: content type -> depending on the source (here both are “Text”)

Format: MIME type -> depending on the source (here both are text/html)

Identifier: the resource’s URL

folk2onto: Dublin Core

The Tag-Mapper deals with:

Subject: the “topic”

Language: en, es, fr, de, ru...

Coverage: when, where (about the topic)

Rights: type of licence
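To make the division of labour concrete, the sketch below assembles one record from the four Distiller elements and the Tag-Mapper elements. It is not the folk2onto implementation: `build_dc_record` is a hypothetical helper, it uses Python's standard library, and it assumes the standard Dublin Core element namespace (with a trailing slash) and the `dc` prefix.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core element namespace


def build_dc_record(distilled, mapped):
    """Serialise Distiller and Tag-Mapper fields as a single rdf:Description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    # Elements taken directly from the tag source by the Distiller.
    for element in ("title", "type", "format", "identifier"):
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = distilled[element]
    # Elements filled in by the Tag-Mapper; each may hold several values.
    for element in ("subject", "language", "coverage", "rights"):
        for value in mapped.get(element, []):
            ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


record = build_dc_record(
    {
        "title": "PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL",
        "type": "Text",
        "format": "text/html",
        "identifier": "www.postgresql.org/docs/faqs.FAQ_brazilian.html",
    },
    {"subject": ["database", "performance", "bd"]},
)
```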

folk2onto: mapping formulae

When a TAG has one mapping, that TAG is used

If it has more than one:

If it has no mapping, then:

folk2onto: file mapping

<rdf:RDF

xmlns:j.0="http://purl.org/dc/elements/1.1"

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >

<rdf:Description rdf:nodeID="A0">

<rdf:type rdf:resource="http://purl.org/dc/elements/1.1identifier"/>

<j.0:identifier>www.postgresql.org/docs/faqs.FAQ_brazilian.html</j.0:identifier>

<j.0:type>Text</j.0:type>

<j.0:format>text/html</j.0:format>

<j.0:tittle>PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL</j.0:tittle>

<j.0:subject>database</j.0:subject>

<j.0:subject>performance</j.0:subject>

<j.0:subject>bd</j.0:subject>

</rdf:Description>

</rdf:RDF>

Mapping trainer

folk2onto: 6 tests (A-F)

Experiment A: Selecting random synsets for the tags.

Experiment B: No limit on the semantic relation depth. Only the trained synsets are taken into account (frec=0, wordnet=0, trained=1).

Experiment C: No limit on the semantic relation depth. Only the context is taken into account (frec=0, wordnet=1, trained=0).

Experiment D: No limit on the semantic relation depth. The context and the trained synsets are taken into account (frec=0, wordnet=0.4, trained=0.6).

Experiment E: No limit on the semantic relation depth. All three components of the equation (familiarity, context and trained synsets) are taken into account (frec=0.1, wordnet=0.3, trained=0.6).

Experiment F: The semantic relation depth is limited to 3, and the context and the trained synsets are taken into account (frec=0, wordnet=0.4, trained=0.6).
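The weight settings can be read as coefficients of a combined score. The sketch below assumes a plain weighted sum of the three components (the equation itself is not reproduced in this text) and uses hypothetical component scores to show how the chosen synset changes with the weights. Experiment A is omitted because it picks synsets at random.

```python
# Weights as listed for experiments B to F (F additionally limits depth to 3).
EXPERIMENT_WEIGHTS = {
    "B": {"frec": 0.0, "wordnet": 0.0, "trained": 1.0},
    "C": {"frec": 0.0, "wordnet": 1.0, "trained": 0.0},
    "D": {"frec": 0.0, "wordnet": 0.4, "trained": 0.6},
    "E": {"frec": 0.1, "wordnet": 0.3, "trained": 0.6},
    "F": {"frec": 0.0, "wordnet": 0.4, "trained": 0.6},
}


def combined_score(components, weights):
    """Weighted sum of the familiarity, context and trained components (assumed form)."""
    return sum(weights[name] * components[name] for name in weights)


def pick_synset(candidates, weights):
    """Return the candidate synset with the highest combined score."""
    return max(candidates, key=lambda synset: combined_score(candidates[synset], weights))


# Hypothetical component scores in [0, 1] for two candidate synsets of one tag.
CANDIDATES = {
    "jaguar.animal": {"frec": 0.9, "wordnet": 0.8, "trained": 0.1},
    "jaguar.car": {"frec": 0.4, "wordnet": 0.2, "trained": 0.7},
}

choices = {name: pick_synset(CANDIDATES, w) for name, w in EXPERIMENT_WEIGHTS.items()}
```

With these made-up scores, only the context-only setting (C) selects the animal sense; every setting that weights the trained synsets selects the car sense.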

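folk2onto: tests output

Experiment A: 706 correct synsets (32.5%), 1466 erroneous synsets (67.5%)

Experiment B: 1594 correct synsets (73.4%), 578 erroneous synsets (26.6%)

Experiment C: 1199 correct synsets (55.2%), 973 erroneous synsets (44.8%)

Experiment D: 1492 correct synsets (68.7%), 680 erroneous synsets (31.3%)

Experiment E: 1349 correct synsets (62.1%), 823 erroneous synsets (37.9%)

Experiment F: 1894 correct synsets (87.2%), 278 erroneous synsets (12.8%)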
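As a quick check of the figures, the percentages can be recomputed from the raw counts. The snippet uses only the numbers reported in the results; the variable and function names are illustrative.

```python
# experiment: (correct synsets, erroneous synsets)
RESULTS = {
    "A": (706, 1466),
    "B": (1594, 578),
    "C": (1199, 973),
    "D": (1492, 680),
    "E": (1349, 823),
    "F": (1894, 278),
}


def accuracy(correct, erroneous):
    """Percentage of correct synsets, rounded to one decimal place."""
    return round(100 * correct / (correct + erroneous), 1)


accuracies = {name: accuracy(*counts) for name, counts in RESULTS.items()}
totals = {sum(counts) for counts in RESULTS.values()}
best = max(accuracies, key=accuracies.get)
```

Every experiment covers the same 2172 synset assignments. Experiment F, which limits the relation depth to 3, has the highest share of correct synsets.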

Open issues

Tag filtering through WordNet

blog, wiki

xml, rdf, rss

wordpress, tuenti, flickr

social, open

“tags can be about so many things

mapping to Dublin Core is a weak choice”

Mappings

Coverage: Japan

Language: Spanish

Learning the right synset of e.g. "jaguar"

"vehicle", "video game console", or "cat of prey"

"<dc:subject>Jaguar</dc:subject>"

Word-sense disambiguation

tag-category disambiguation

That was all about CollOnBus/folk2onto

Thank you very much!

Any questions?
