Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach

Technology

Published on February 27, 2014

Author: andrenfreitas

Source: slideshare.net

Description

The demand to access large amounts of heterogeneous structured
data is emerging as a trend for many users and applications.
However, the effort involved in querying heterogeneous
and distributed third-party databases can create major
barriers for data consumers. At the core of this problem is
the semantic gap between the way users express their information
needs and the representation of the data. This work
aims to provide a natural language interface and an associated
semantic index to support an increased level of vocabulary
independence for queries over Linked Data/Semantic
Web datasets, using a distributional-compositional semantics
approach. Distributional semantics focuses on the automatic
construction of a semantic model based on the statistical distribution
of co-occurring words in large-scale texts. The proposed
query model targets the following features: (i) a principled
semantic approximation approach with low adaptation
effort (independent from manually created resources such as
ontologies, thesauri or dictionaries), (ii) comprehensive semantic
matching supported by the inclusion of large volumes
of distributional (unstructured) commonsense knowledge into
the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieved avg. recall=0.81, mean avg. precision=0.62 and mean reciprocal rank=0.49.

Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach
André Freitas and Edward Curry
Insight Centre for Data Analytics
International Conference on Intelligent User Interfaces, Haifa, 2014

Talking to your (Big) Data

Motivation

Shift in the Database Landscape
- Heterogeneous, complex and large-scale databases.
- Very large and dynamic “schemas”.
- circa 2000: 10s-100s attributes
- circa 2014: 1,000s-1,000,000s attributes

Databases for a Complex World: How do you query data in this scenario?

Vocabulary Problem for Databases
Query: Who is the daughter of Bill Clinton married to?
[Figure labels: Semantic Gap; Possible representations; Semantic approximation = Commonsense Knowledge]

Semantics for a Complex World
[Figure labels: Formal World; Real World; Distributional Semantics; Query Approach]

Does it work?

Addressing the Vocabulary Problem for Databases (with Distributional Semantics) Gaelic: direction

Solution (Video)

More Complex Queries (Video)

Treo Answers Jeopardy Queries (Video) http://bit.ly/1hWcch9

Evaluation
- 102 natural language queries (Test Collection: QALD 2011).
- Avg. query execution time: 1.52 s (simple queries) – 8.53 s (all queries).
- Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances.

Comparative Evaluation

Query Approach

Distributional Semantics
“Words occurring in similar (linguistic) contexts are semantically related.”
- If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).
- This record can then be used as a surrogate for the word's semantic representation.

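Distributional Semantic Model
[Figure: the words husband, spouse and child drawn as vectors over context dimensions c1, c2, ..., cn. Each coordinate (e.g. 0.7, 0.5) is a function of the number of times the word occurs in that context. Annotation: “Commonsense is here”.]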

Semantic Relatedness
[Figure: the angle θ between the vectors for husband, spouse and child in the context space c1, c2, ..., cn.]
Works as a semantic ranking function.
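As a rough illustration of the two slides above, the sketch below builds toy context vectors and uses the cosine of the angle θ between them as a relatedness score. The counts are made up for this example, and plain cosine over raw counts is only a stand-in: the slides name Explicit Semantic Analysis (ESA) over Wikipedia as the model actually used. The names `VECTORS`, `sem_rel` and `rank_by_relatedness` are hypothetical.

```python
import math

# Toy context counts (invented for illustration, not taken from the slides):
# how often each word co-occurs with three hypothetical contexts c1, c2, c3.
VECTORS = {
    "husband": [8.0, 1.0, 6.0],
    "spouse": [7.0, 2.0, 5.0],
    "child": [2.0, 9.0, 1.0],
}


def cosine(u, v):
    """Cosine of the angle between two context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sem_rel(word_a, word_b):
    """Relatedness of two words: a smaller angle gives a higher score."""
    return cosine(VECTORS[word_a], VECTORS[word_b])


def rank_by_relatedness(query_word, candidates):
    """Order candidate words from most to least related to the query word."""
    return sorted(candidates, key=lambda w: sem_rel(query_word, w), reverse=True)


if __name__ == "__main__":
    print(rank_by_relatedness("husband", ["child", "spouse"]))
```

With these toy counts, "spouse" ranks above "child" for the query word "husband", which is the ranking behaviour the slide attributes to θ.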

Approach Overview
[Figure: Query, Query Analysis, Query Features, Query Planner, Query Plan, then core semantic approximation & composition operations over the Ƭ-Space. Inputs to the Ƭ-Space: the Database, and Distributional semantics built from large-scale unstructured data (commonsense knowledge).]

Approach Overview
[Figure: Query, Query Analysis, Query Features, Query Planner, Query Plan, then core semantic approximation & composition operations over the Ƭ-Space. Inputs to the Ƭ-Space: RDF Data, and Explicit Semantic Analysis (ESA) built from Wikipedia (commonsense knowledge).]

Ƭ-Space
[Figure labels: r, p, e]

Core Operations Query

Core Operations Query Search & Composition Operations

Search and Composition Operations
- Instance search
  - Proper nouns
  - String similarity + node cardinality
- Class (unary predicate) search
  - Nouns, adjectives and adverbs
  - String similarity + distributional semantic relatedness
- Property (binary predicate) search
  - Nouns, adjectives, verbs and adverbs
  - Distributional semantic relatedness
- Navigation
- Extensional expansion
  - Expands the instances associated with a class.
- Operator application
  - Aggregations, conditionals, ordering, position
- Disjunction & Conjunction
- Disambiguation dialog (instance, predicate)
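The instance search entry above names two signals, string similarity and node cardinality, without saying how they are combined. The sketch below is one possible combination, assuming a weighted sum of a `difflib` similarity ratio and a log-scaled cardinality. The weight, the cardinality values and the extra instance URIs are all assumptions made for illustration, not details from the slides.

```python
import math
from difflib import SequenceMatcher

# Hypothetical instances with node cardinality (number of triples attached
# to the node). Both the extra URIs and the counts are illustrative only.
INSTANCES = {
    ":Bill_Clinton": 950,
    ":Bill_Clinton_(album)": 12,
    ":Hillary_Clinton": 900,
}


def label_of(uri):
    """Turn ':Bill_Clinton' into the label 'Bill Clinton'."""
    return uri.lstrip(":").replace("_", " ")


def string_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_search(term, instances, weight=0.8):
    """Rank instances by a weighted sum of similarity and cardinality.

    The weighted sum and the default weight are assumptions of this sketch.
    """
    max_log = math.log1p(max(instances.values()))
    scored = []
    for uri, cardinality in instances.items():
        similarity = string_similarity(term, label_of(uri))
        popularity = math.log1p(cardinality) / max_log
        scored.append((weight * similarity + (1 - weight) * popularity, uri))
    scored.sort(reverse=True)
    return [uri for _, uri in scored]


if __name__ == "__main__":
    print(instance_search("Bill Clinton", INSTANCES))
```

Cardinality acts here as a tie-breaker in favour of well-connected nodes when several labels match the query term about equally well.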

Core Principles
- Minimize the impact of ambiguity, vagueness and synonymy.
- Address the simplest matchings first (heuristics).
- Semantic relatedness as a primitive operation.
- Distributional semantics as commonsense knowledge.

Question Analysis
Transform natural language queries into triple patterns.
Query: “Who is the daughter of Bill Clinton married to?”
PODS: Bill Clinton, daughter, married to
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)

Query Plan
Map query features into a query plan. A query plan contains a sequence of core operations.
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
Query Plan:
(1) INSTANCE SEARCH (Bill Clinton)
(2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)
(3) e1 <- NAVIGATE (Bill Clinton, p1)
(4) p2 <- SEARCH PREDICATE (e1, married to)
(5) e2 <- NAVIGATE (e1, p2)
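The five-step plan can be sketched as a small interpreter over a toy triple list. The triples are the ones used in the deck's running example. The relatedness table is a lookup stand-in for the distributional model; its "daughter" score follows the slides, while the "married to" score is invented for this sketch. All function names are hypothetical.

```python
# Toy triples from the running example in the slides.
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]

# Stand-in relatedness scores. The "daughter" entry follows the slides;
# the "married to" entry is invented for this sketch.
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("married to", ":spouse"): 0.05,
}


def instance_search(label):
    """Step (1): resolve a label to a node, here by exact URI match only."""
    uri = ":" + label.replace(" ", "_")
    return uri if any(s == uri for s, _, _ in TRIPLES) else None


def search_predicate(entity, term):
    """Steps (2) and (4): pick the pivot's property most related to the term."""
    candidates = {p for s, p, _ in TRIPLES if s == entity}
    if not candidates:
        return None
    return max(candidates, key=lambda p: SEM_REL.get((term, p), 0.0))


def navigate(entity, predicate):
    """Steps (3) and (5): follow the predicate from the pivot entity."""
    for s, p, o in TRIPLES:
        if s == entity and p == predicate:
            return o
    return None


def run_plan(instance_label, predicate_terms):
    """Execute INSTANCE SEARCH, then SEARCH PREDICATE + NAVIGATE per term."""
    entity = instance_search(instance_label)
    path = [entity]
    for term in predicate_terms:
        if entity is None:
            break
        predicate = search_predicate(entity, term)
        if predicate is None:
            break
        entity = navigate(entity, predicate)
        path.extend([predicate, entity])
    return path


if __name__ == "__main__":
    print(run_plan("Bill Clinton", ["daughter", "married to"]))
```

Each NAVIGATE result becomes the pivot entity for the next SEARCH PREDICATE, so only the properties attached to the current pivot are ever compared against the query term.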

Instance Search
Query: Bill Clinton | daughter | married to
Linked Data: :Bill_Clinton

Predicate Search
Query: Bill Clinton | daughter | married to
Linked Data (triples associated with the pivot entity :Bill_Clinton):
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...

Predicate Search
Query: Bill Clinton | daughter | married to
Which properties are semantically related to “daughter”?
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton, sem_rel(daughter, child) = 0.054
:Bill_Clinton :religion :Baptists, sem_rel(daughter, religion) = 0.004
:Bill_Clinton :almaMater :Yale_Law_School, sem_rel(daughter, alma mater) = 0.001
...
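The ranking step on this slide can be sketched as follows. The scores are the three values shown on the slide, assuming the middle value (0.004) belongs to :religion. Splitting a camelCase property name into words (":almaMater" to "alma mater") mirrors how the slide writes the third comparison. The threshold parameter and the function names are assumptions of this sketch.

```python
import re

# Scores as shown on the slide, keyed by (query term, property label).
# Assumption: the middle score (0.004) is the one for "religion".
SLIDE_SCORES = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "alma mater"): 0.001,
}


def property_label(uri):
    """Turn ':almaMater' into 'alma mater' by splitting on camelCase."""
    name = uri.lstrip(":")
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    return spaced.lower()


def rank_properties(term, property_uris, scores, threshold=0.0):
    """Return (uri, score) pairs above the threshold, best first."""
    ranked = []
    for uri in property_uris:
        score = scores.get((term, property_label(uri)), 0.0)
        if score > threshold:
            ranked.append((uri, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    properties = [":religion", ":almaMater", ":child"]
    for uri, score in rank_properties("daughter", properties, SLIDE_SCORES):
        print(uri, score)
```

A threshold lets the search discard weakly related properties such as :religion and :almaMater before navigating, rather than always following the single best match.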

Navigate
Query: Bill Clinton | daughter | married to
Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)

Predicate Search
Query: Bill Clinton | daughter | married to
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
:Chelsea_Clinton :spouse :Mark_Mezvinsky

Results

Conclusions
- The compositional-distributional model supports a schema-agnostic natural language query mechanism over a large-schema (open domain) database.
- Comprehensive and accurate semantic matching
  - Avg. recall=0.81, MAP=0.62, MRR=0.49
- Medium-high expressivity
  - 80% of queries answered
- Interactive query execution time
  - Avg. 1.52 s (simple queries) – 8.53 s (all queries) per query
- Better recall and query coverage compared to baselines, with equivalent precision
- Low adaptation effort for new datasets
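For readers unfamiliar with the reported measures, the sketch below gives the standard textbook definitions of recall, average precision and reciprocal rank for one ranked answer list. Averaging them over all queries yields average recall, MAP and MRR. This is not the authors' evaluation script, and the example data in the usage section is invented.

```python
def recall(ranked, relevant):
    """Fraction of the relevant answers that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant answer."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant answer, or 0 if none is returned."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean(values):
    """Arithmetic mean over per-query scores (gives MAP or MRR)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Invented example: one query, four returned answers, two correct ones.
    ranked = ["a", "b", "c", "d"]
    relevant = {"b", "d"}
    print(recall(ranked, relevant))
    print(average_precision(ranked, relevant))
    print(reciprocal_rank(ranked, relevant))
```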
