Arcomem training – Enrichment Advanced (update)


Published on February 18, 2014

Author: arcomem



This presentation on data enrichment is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Entity Enrichment and Clustering in ARCOMEM
Elena Demidova (1), including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)
(1) L3S Research Center, Hannover, Germany
(2) University of Sheffield, UK
(3) IMIS, RC ATHENA, Athens, Greece

The ARCOMEM approach
• Make use of the Social Web
– Huge source of user-generated content
– Wide range of articulation methods, from simple "I like it" buttons to complete articles
– Represents the diversity of opinions of the public
• User activities are often triggered by
– Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
– Topics (e.g. global warming, financial crisis, swine flu)
A semantic-aware and socially-driven preservation model is a natural way to go.
Slide 2

ARCOMEM architecture
The ARCOMEM system architecture foresees four processing levels: the crawler level, the online processing level, the offline processing level and cross-crawl analysis.
Slide 3

ETOE offline processing chain
The processing chain depicted on this slide describes all components involved in the offline processing of Web objects.
Slide 4

The extraction components for text
Aim
• Extraction of Entities, Topics, Events and Opinions (ETOEs) from
– Web pages
– Social Web (Twitter, YouTube, Facebook, …)
Challenges
• Entity recognition from degraded input sources (tweets etc.)
• Advancing state-of-the-art NLP and text mining
• Dynamics detection: evolution of terms/entities
• Semantic representation of Web objects and entities
• Appropriate RDF schemas for ETOE and Web objects
• Exploiting (Linked Open) Web data to enrich extracted ETOE
• Entity classification (into events, locations, topics etc.) & consolidation
Slide 5

ETOE extraction with GATE: an example
[Figure: GATE annotation example; label: candidate multi-word term]
Slide 6

Data consolidation & integration problem
Data extracted by different components or during different processing cycles is not aligned => consolidation, disambiguation & correlation required.
<Location>Greece</Location>
<Person>Venizelos</Person>
<Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>
?
Slide 7

Data enrichment & clustering
Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …) => use enrichments for clustering/correlation/consolidation
Slide 8
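The consolidation idea can be sketched as follows: two surface forms that resolve to the same reference URI are treated as one entity. This is only a minimal sketch. The lookup table is a hand-written, hypothetical stand-in for a real enrichment step, and the function name `consolidate` is illustrative, not part of ARCOMEM.

```python
from collections import defaultdict

# Hypothetical enrichment results: (entity type, surface form) -> reference URI.
# In ARCOMEM such links would come from DBpedia/Freebase lookups; here they
# are hard-coded purely for illustration.
ENRICHMENTS = {
    ("Location", "Greece"): "http://dbpedia.org/resource/Greece",
    ("Location", "Griechenland"): "http://dbpedia.org/resource/Greece",
}


def consolidate(entities):
    """Group extracted entities by the reference URI they were enriched with.

    Entities without an enrichment stay in a group of their own, keyed by
    the entity tuple itself.
    """
    groups = defaultdict(list)
    for entity in entities:
        key = ENRICHMENTS.get(entity, entity)
        groups[key].append(entity)
    return dict(groups)


extracted = [
    ("Location", "Greece"),
    ("Person", "Venizelos"),
    ("Location", "Griechenland"),
    ("Organisation", "Greek Parliament"),
]
groups = consolidate(extracted)
```

In this toy run the English and German labels for Greece end up in the same group, while the two entities without an enrichment remain separate.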

Enrichment with DBpedia & Freebase
• DBpedia and Freebase are particularly well-suited for three reasons: their vast size; the availability of disambiguation techniques that can utilise the variety of multilingual labels available in both datasets for individual data items; and the level of inter-connectedness of both datasets, which allows the retrieval of a wealth of related information for particular items.
• In the case of DBpedia, we make use of the DBpedia Spotlight service, which enables approximate string matching with an adjustable confidence level in the interval [0,1]. Experimentally, we set confidence to 0.6.
• For Freebase, we use structured queries, taking into account entity types extracted by GATE.
Slide 9
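As a rough illustration of the Spotlight setting above, the sketch below builds an annotation request with the confidence fixed at 0.6. The endpoint URL is an assumption (the public DBpedia Spotlight REST service); the slides do not say which deployment ARCOMEM used, and `build_spotlight_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia Spotlight REST endpoint. The slides do not
# state which Spotlight deployment was used in ARCOMEM.
SPOTLIGHT_ANNOTATE_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_spotlight_request(text, confidence=0.6):
    """Return a GET URL that asks Spotlight to annotate `text`.

    `confidence` is the adjustable threshold in [0, 1]; 0.6 is the value
    reported on the slide.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the interval [0, 1]")
    query = urlencode({"text": text, "confidence": confidence})
    return f"{SPOTLIGHT_ANNOTATE_URL}?{query}"


url = build_spotlight_request("Trichet warns of systemic debt crisis")
```

The resulting URL can be fetched with any HTTP client; sending an `Accept: application/json` header should request the annotations as JSON.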

Enrichment for clustering & correlation: example
<Person>Jean Claude Trichet</Person>
<Organisation>ECB</Organisation>
<Event>Trichet warns of systemic debt crisis</Event>
Slide 10

Enrichment for clustering & correlation: example
<Person>Jean Claude Trichet</Person>
<Organisation>ECB</Organisation>
<Event>Trichet warns of systemic debt crisis</Event>
<Enrichment></Enrichment>
<Enrichment></Enrichment>
Slide 11

Enrichment for clustering & correlation: example
<Person>Jean Claude Trichet</Person>
<Organisation>ECB</Organisation>
<Event>Trichet warns of systemic debt crisis</Event>
<Enrichment></Enrichment>
<Enrichment></Enrichment>
=> dbpprop:office
– dbpedia:President_of_the_European_Central_Bank
– dbpedia:Governor_of_the_Banque_de_France
=> dcterms:subject
– category:Living_people
– category:Karlspreis_recipients
– category:Alumni_of_the_École_Nationale_d'Administration
– category:People_from_Lyon
Slide 12

ARCOMEM entities, enrichments & clusters
Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
1013 clusters of correlated entities/events
[Figure: cluster built around enrichment db:Market]
Slide 13

Cluster expansion with related enrichments
Clusters can be further expanded by considering related enrichments in the reference knowledge base. This is an experimental feature that is currently not included in the SARA application.
[Figure: cluster expansion for the cluster built around enrichment db:Market]
Slide 14

Clustering of entities via enrichment relatedness
Discovery of "related" entities by discovering related enrichments:
(a) Retrieving possible paths between two enrichments (e.g. via RelFinder)
(b) Computing a relatedness measure (considering variables such as shortest path, number of paths, relationship types, number of directly connected edges of both enrichments, …)
(c) Clustering enrichments (and thus entities) whose relatedness is above a certain threshold
Slide 15
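Steps (a) to (c) can be sketched on a toy graph as shown below. Everything here is an assumption made for illustration: the graph is hand-made, the scoring formula (a damped sum over path lengths, which rewards short and numerous paths) is not the measure used in ARCOMEM, and relationship types and edge counts are ignored. Paths are enumerated directly rather than retrieved via RelFinder.

```python
from itertools import combinations

# Toy, hand-made link graph between enrichments (undirected adjacency sets).
# Node names are illustrative and not taken from a real knowledge base.
GRAPH = {
    "ECB": {"Euro", "Trichet"},
    "Trichet": {"ECB", "Banque_de_France"},
    "Banque_de_France": {"Trichet", "Euro"},
    "Euro": {"ECB", "Banque_de_France"},
    "Athens": {"Greece"},
    "Greece": {"Athens"},
}


def simple_paths(graph, start, goal, max_edges):
    """Step (a): yield every simple path from start to goal with at most max_edges edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield path
            continue
        if len(path) - 1 == max_edges:
            continue  # path is already as long as allowed
        for neighbour in graph.get(node, ()):
            if neighbour not in path:
                stack.append(path + [neighbour])


def relatedness(graph, a, b, max_edges=3, damping=0.5):
    """Step (b): hypothetical score, the sum of damping**length over all paths."""
    return sum(damping ** (len(p) - 1) for p in simple_paths(graph, a, b, max_edges))


def cluster(graph, nodes, threshold):
    """Step (c): merge nodes whose pairwise relatedness reaches the threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(nodes, 2):
        if relatedness(graph, a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


clusters = cluster(GRAPH, sorted(GRAPH), threshold=0.5)
```

With a threshold of 0.5 the four finance-related nodes fall into one cluster and the two geographic nodes into another. Raising the threshold splits clusters, lowering it merges them.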

RDF schema for the Knowledge Base
Relationships between ARCOMEM entities (ETOE etc.) and enrichments
[Figure: RDF schema]
Slide 16

Enrichment evaluation results
Manual evaluation of 240 enrichment-entity pairs
Available scores: 1 (correct), 0 (incorrect), 0.5 (vague or ambiguous relationship)
Table columns: Entity type | Average score DBpedia | Average score Freebase | Average score total
Table values as extracted: 0.71 arco:Event 0.71 arco:Location 0.81 arco:Money 0.67 arco:Organization 0.93 1 0.97 arco:Person 0.9 0.89 0.89 arco:Time 0.74 Total 0.79 0.94 0.88 0.67 0.74 0.94 0.87
Slide 17
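The averaging behind such a table can be sketched as below. The judgments are invented sample data, not the 240 evaluated pairs, and `average_scores` is a hypothetical helper; only the 1 / 0 / 0.5 scale comes from the slide.

```python
from collections import defaultdict

# Hypothetical manual judgments: (entity type, source, score).
# Scale as on the slide: 1 correct, 0 incorrect, 0.5 vague or ambiguous.
JUDGMENTS = [
    ("arco:Person", "DBpedia", 1.0),
    ("arco:Person", "DBpedia", 0.5),
    ("arco:Person", "Freebase", 1.0),
    ("arco:Location", "DBpedia", 0.0),
    ("arco:Location", "DBpedia", 1.0),
]


def average_scores(judgments):
    """Return the mean score per (entity type, source) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [score sum, count]
    for entity_type, source, score in judgments:
        if score not in (0.0, 0.5, 1.0):
            raise ValueError(f"unexpected score: {score}")
        cell = totals[(entity_type, source)]
        cell[0] += score
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_scores(JUDGMENTS)
```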

Further reading
• Entity Extraction and Consolidation for Social Web Content Preservation. S. Dietze, D. Maynard, E. Demidova, T. Risse, W. Peters, K. Doka and Y. Stavrakas. SDA, volume 912 of CEUR Workshop Proceedings, pages 18-29, 2012.
• Can entities be friends? B. P. Nunes, R. Kawase, S. Dietze, D. Taibi, M. A. Casanova, W. Nejdl. Web of Linked Entities (WOLE2012), Workshop at the 11th International Semantic Web Conference (ISWC2012), Boston, US, 2012.
• Combining a co-occurrence-based and a semantic measure for entity linking. B. P. Nunes, S. Dietze, M. A. Casanova, R. Kawase, B. Fetahu, W. Nejdl. ESWC 2013 - 10th Extended Semantic Web Conference, 2013.
• Linked Data - The Story So Far. C. Bizer, T. Heath and T. Berners-Lee. Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS), 2009.
Slide 18

THANK YOU
CONTACT DETAILS
Dr. Elena Demidova
L3S Research Center
+49 511 762 17732
