Published on February 27, 2014
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach André Freitas and Edward Curry Insight Centre for Data Analytics International Conference on Intelligent User Interfaces Haifa, 2014
Talking to your (Big) Data
Shift in the Database Landscape Heterogeneous, complex and large-scale databases. Very large and dynamic “schemas”. Circa 2000: 10s-100s of attributes. Circa 2014: 1,000s-1,000,000s of attributes.
Databases for a Complex World How do you query data on this scenario?
Vocabulary Problem for Databases Query: Who is the daughter of Bill Clinton married to? Semantic Gap Possible representations Semantic approximation = Commonsense Knowledge
Semantics for a Complex World Formal World Real World Distributional Semantics Query Approach
Does it work?
Addressing the Vocabulary Problem for Databases (with Distributional Semantics) Gaelic: direction
More Complex Queries (Video)
Treo Answers Jeopardy Queries (Video) http://bit.ly/1hWcch9
Evaluation 102 natural language queries (Test Collection: QALD 2011). Avg. query execution time: 1.52 s (simple queries) – 8.53 s (all queries). Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances
Distributional Semantics “Words occurring in similar (linguistic) contexts are semantically related.” If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus). This can then be used as a surrogate of its semantic representation.
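The idea of recording the contexts in which a word occurs can be sketched in a few lines. This is a minimal illustration only: the two-sentence corpus, the `window` parameter and the function name `context_vectors` are assumptions made for the example, not part of the presented system (which builds on Wikipedia).

```python
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus (hypothetical): two words used in the same contexts.
corpus = [
    "her husband is a lawyer",
    "her spouse is a lawyer",
]
vectors = context_vectors(corpus)
print(vectors["husband"])
print(vectors["spouse"])
```

In this toy corpus "husband" and "spouse" end up with identical context counts, which is the sense in which context acts as a surrogate for meaning.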
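Distributional Semantic Model [Figure: the words husband, spouse and child drawn as vectors over context dimensions c1, c2, ..., cn; each coordinate (e.g. 0.7, 0.5) is a function of the number of times the word occurs in that context.] Commonsense is here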
Semantic Relatedness [Figure: the angle θ between the vectors for husband, spouse and child in the context space c1, c2, ..., cn.] Works as a semantic ranking function
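One common way to turn the angle θ between two context vectors into a relatedness score is the cosine of that angle. The sketch below assumes cosine similarity and uses made-up vector weights; the slides do not state which measure or weighting the system uses.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_relatedness(query_vec, candidates):
    """Sort candidate terms by decreasing relatedness to the query vector."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights over context dimensions c1, c2, cn.
husband = {"c1": 0.7, "c2": 0.5}
candidates = {
    "spouse": {"c1": 0.6, "c2": 0.6},
    "child": {"c2": 0.2, "cn": 0.9},
}
ranked = rank_by_relatedness(husband, candidates)
print(ranked)
```

With these illustrative weights "spouse" ranks above "child" for the query term "husband", which is how the score serves as a ranking function.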
Approach Overview Query Query Analysis Query Features Query Planner Query Plan Core semantic approximation & composition operations Ƭ-Space Database Distributional semantics Large-scale unstructured data Commonsense knowledge
Approach Overview Query Query Analysis Query Features Query Planner Query Plan Core semantic approximation & composition operations Ƭ-Space RDF Data Explicit Semantic Analysis (ESA) Wikipedia Commonsense knowledge
Ƭ-Space r p e
Core Operations Query
Core Operations Query Search & Composition Operations
Search and Composition Operations
- Instance search: proper nouns; string similarity + node cardinality
- Class (unary predicate) search: nouns, adjectives and adverbs; string similarity + distributional semantic relatedness
- Property (binary predicate) search: nouns, adjectives, verbs and adverbs; distributional semantic relatedness
- Navigation
- Extensional expansion: expands the instances associated with a class
- Operator application: aggregations, conditionals, ordering, position
- Disjunction & conjunction
- Disambiguation dialog (instance, predicate)
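Instance search is described as combining string similarity with node cardinality. A rough sketch of one such combination follows. The linear weighting `alpha`, the log-scaled cardinality, the use of `difflib.SequenceMatcher`, and every entry and count in the toy index are assumptions for illustration; the slides do not specify the actual scoring function.

```python
import math
from difflib import SequenceMatcher

def instance_score(query, label, cardinality, max_cardinality, alpha=0.8):
    """Blend string similarity with a log-scaled node cardinality (assumed form)."""
    similarity = SequenceMatcher(None, query.lower(), label.lower()).ratio()
    popularity = math.log1p(cardinality) / math.log1p(max_cardinality)
    return alpha * similarity + (1 - alpha) * popularity

def instance_search(query, index):
    """Rank candidate nodes; `index` maps URI -> (label, number of associated triples)."""
    max_cardinality = max(cardinality for _, cardinality in index.values())
    scored = [
        (uri, instance_score(query, label, cardinality, max_cardinality))
        for uri, (label, cardinality) in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical index entries and cardinalities.
index = {
    ":Bill_Clinton": ("Bill Clinton", 900),
    ":Clinton_Iowa": ("Clinton", 120),
    ":Billy_Clanton": ("Billy Clanton", 40),
}
results = instance_search("Bill Clinton", index)
print(results[0])
```

Cardinality acts here as a tie-breaker that favours well-connected nodes when several labels are similar to the query string.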
Core Principles Minimize the impact of Ambiguity, Vagueness, Synonymy. Address the simplest matchings first (heuristics). Semantic Relatedness as a primitive operation. Distributional semantics as commonsense knowledge.
Question Analysis Transform natural language queries into triple patterns. Query: “Who is the daughter of Bill Clinton married to?” PODS: Bill Clinton (INSTANCE), daughter (PREDICATE), married to (PREDICATE). These are the Query Features.
Query Plan Map query features into a query plan. A query plan contains a sequence of core operations. Query Features: (INSTANCE) (PREDICATE) (PREDICATE). Query Plan: (1) INSTANCE SEARCH (Bill Clinton) (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter) (3) e1 <- NAVIGATE (Bill Clinton, p1) (4) p2 <- SEARCH PREDICATE (e1, married to) (5) e2 <- NAVIGATE (e1, p2)
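The five plan steps can be traced on a toy graph. The sketch below is an assumption-laden illustration, not the system's implementation: the triples are the ones shown in the example slides, the `SEM_REL` table stands in for a real distributional relatedness measure, the 0.004 score is read as belonging to the religion predicate, the "married to" score is invented, and all function names are hypothetical.

```python
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]

# Stand-in for a distributional relatedness measure (lookup table, not a model).
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,   # assumed pairing of score and predicate
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,    # invented value for illustration
}

def instance_search(label):
    """Step 1: resolve a proper noun to a node (naive exact label match)."""
    uri = ":" + label.replace(" ", "_")
    if any(subject == uri for subject, _, _ in TRIPLES):
        return uri
    raise LookupError(label)

def search_predicate(pivot, term):
    """Steps 2 and 4: pick the pivot's predicate most related to the query term."""
    candidates = {p for s, p, _ in TRIPLES if s == pivot}
    return max(candidates, key=lambda p: SEM_REL.get((term, p), 0.0))

def navigate(pivot, predicate):
    """Steps 3 and 5: follow the predicate from the pivot to its objects."""
    return [o for s, p, o in TRIPLES if s == pivot and p == predicate]

def run_plan(instance_label, predicate_terms):
    pivot = instance_search(instance_label)
    for term in predicate_terms:
        predicate = search_predicate(pivot, term)
        pivot = navigate(pivot, predicate)[0]  # the object becomes the new pivot
    return pivot

answer = run_plan("Bill Clinton", ["daughter", "married to"])
print(answer)
```

Each navigation step turns its result into the next pivot entity, so only the triples attached to the current pivot need to be compared against the next query term.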
Instance Search Query: Bill Clinton daughter Instance Search Linked Data: :Bill_Clinton married to
Predicate Search Query: Bill Clinton daughter married to. Linked Data: :Bill_Clinton (PIVOT ENTITY) with its ASSOCIATED TRIPLES: :Bill_Clinton :child :Chelsea_Clinton; :Bill_Clinton :religion :Baptists; :Bill_Clinton :almaMater :Yale_Law_School; ...
Predicate Search Query: Bill Clinton daughter married to. Which properties are semantically related to “daughter”? Linked Data: :Bill_Clinton :child :Chelsea_Clinton, sem_rel(daughter, child)=0.054; :Bill_Clinton :religion :Baptists, sem_rel(daughter, religion)=0.004; :Bill_Clinton :almaMater :Yale_Law_School, sem_rel(daughter, alma mater)=0.001; ...
Navigate Query: Bill Clinton daughter married to. Linked Data: :Bill_Clinton :child :Chelsea_Clinton. :Chelsea_Clinton becomes the new PIVOT ENTITY.
Predicate Search Query: Bill Clinton daughter married to. Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY); :Chelsea_Clinton :spouse :Mark_Mezvinsky.
Conclusions The compositional-distributional model supports a schema-agnostic natural language query mechanism over a large-schema (open domain) database. Comprehensive and accurate semantic matching: avg. recall = 0.81, MAP = 0.62, MRR = 0.49. Medium-high expressivity: 80% of queries answered. Interactive query execution time: avg. 1.52 s (simple queries) to 8.53 s (all queries) per query. Better recall and query coverage compared to baselines with equivalent precision. Low adaptation effort for new datasets.
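For readers unfamiliar with the ranking metrics quoted above, the sketch below shows the standard textbook definitions of mean reciprocal rank (MRR) and mean average precision (MAP). The two toy rankings are invented and unrelated to the reported results, and the evaluation scripts actually used may differ in detail.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k that hold a relevant result."""
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0

# Invented rankings for two queries: (ranked results, set of relevant results).
runs = [
    (["a", "b", "c"], {"b"}),
    (["x", "y", "z"], {"x", "z"}),
]
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
mean_ap = sum(average_precision(r, rel) for r, rel in runs) / len(runs)
print(mrr, mean_ap)
```

In this toy case the first query contributes a reciprocal rank of 0.5 and the second 1.0, giving an MRR of 0.75.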