Efficient Source Selection for SPARQL Endpoint Federation

Published on June 15, 2016

Author: muhammad_saleem

Source: slideshare.net

1. EFFICIENT SOURCE SELECTION FOR SPARQL ENDPOINT QUERY FEDERATION. PhD Defense, May 13th, 2016. Muhammad Saleem, Faculty of Mathematics and Computer Science, University of Leipzig. Supervisors: Prof. Dr.-Ing. habil. Klaus-Peter Fähnrich, University of Leipzig; Dr. Axel-Cyrille Ngonga Ngomo, University of Leipzig.

2. OUTLINE: (1) Introduction; (2) Problem Statement; (3) State-of-the-art Analysis; (4) HiBISCuS: Hypergraph-based source selection; (5) DAW: Duplicate-aware source selection; (6) SAFE: Policy-aware source selection; (7) TopFed: Data distribution-aware source selection; (8) FEASIBLE and LSQ; (9) LargeRDFBench; (10) Conclusion; (11) Publications and Awards

3. INTRODUCTION
  - Linked, decentralized and distributed architecture
  - 9,960 datasets
  - ~150B triples
  - Complex information needs
  - Need for federated queries

4. INTRODUCTION: EXAMPLE
  Return the party membership and news pages about all US presidents.
  - One source: party memberships, US presidents
  - Another source: US presidents, news pages
  - Computing the results requires data from both sources

5. INTRODUCTION: EXECUTION OF FEDERATION
  Federation engine over the RDF sources S1, S2, S3, S4:
  - Parsing/Rewriting: rewrite the query and get the individual triple patterns
  - Source Selection: identify capable/relevant sources
  - Federator Optimizer: generate an optimized query execution plan
  - Execute the sub-queries
  - Integrator: integrate the sub-query results

6. MOTIVATION: SOURCE SELECTION
  FedBench (LD3): Return for all US presidents their party membership and news pages about them.
  SELECT ?president ?party ?page WHERE {
    ?president rdf:type dbpedia:President .                  # TP1
    ?president dbpedia:nationality dbpedia:United_States .   # TP2
    ?president dbpedia:party ?party .                        # TP3
    ?x nyt:topicPage ?page .                                 # TP4
    ?x owl:sameAs ?president .                               # TP5
  }
  Sources: S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = Geo, S9 = DrugBank
  Source selection algorithm, triple pattern-wise source selection: TP1 = {S1}

7. MOTIVATION: SOURCE SELECTION (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}

8. MOTIVATION: SOURCE SELECTION (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}

9. MOTIVATION: SOURCE SELECTION (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}

10. MOTIVATION: SOURCE SELECTION (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}, TP5 = {S1, S2, S4, S5-S9}. Total triple pattern-wise sources selected = 1+1+1+1+8 = 12

11. MOTIVATION: ANYTHING WRONG? (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection picked TP5 = {S1, S2, S4, S5-S9}, although S4 alone suffices for TP5: total triple pattern-wise sources selected = 1+1+1+1+1 = 5. The overestimated sources for TP5 produce 317068 irrelevant intermediate results.

12. PROBLEM STATEMENT
  Overestimation of sources is expensive:
  - Extra intermediate results
  - Extra network traffic
  - Increased overall runtime
  Questions:
  (1) How to perform join-aware source selection with ensured result set completeness?
  (2) How to test the efficiency of the source selection? Comprehensive benchmarks:
    - Which system is better and why?
    - What are the limitations of a given system?
    - How can one improve a given system?
  (3) How to design a comprehensive benchmark for federated SPARQL query processing as well as for triple stores?

13. STATE-OF-THE-ART Saleem et al. A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems (Semantic

14. PROBLEM STATEMENT AND CONTRIBUTIONS
  Research Questions:
  (1) How to perform join-aware source selection with ensured result set completeness?
  (2) How to perform duplicate-aware source selection?
  (3) How to perform policy-aware source selection?
  (4) How to perform data distribution-aware source selection?
  (5) How to design a comprehensive benchmark for federated SPARQL query processing as well as for triple stores?
  Federation engine over the RDF sources S1-S4: Parsing/Rewriting, Source Selection, Federator Optimizer, Integrator
  Contributions: HiBISCuS, DAW, SAFE, TopFed; QUETSAL, LargeRDFBench, State-of-the-art Evaluation

16. MOTIVATION: JOIN-AWARE SOURCE SELECTION (FedBench LD3, query and sources as on slide 6). Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}, and for TP5 only S4 out of {S1, S2, S4, S5-S9}. Total triple pattern-wise sources selected = 1+1+1+1+1 = 5

17. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION
  - Models SPARQL queries as hypergraphs
  - Makes use of URI authorities in its index
  - Performs join-aware triple pattern-wise source selection
  - Can be combined with any existing SPARQL endpoint federation system
  Muhammad Saleem, Axel-Cyrille Ngonga Ngomo. HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation (ESWC, 2014)

18. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION
  Makes use of the URI authorities. Example: http://dbpedia.org/ontology/party
  - Scheme: http
  - Authority: dbpedia.org
  - Path: /ontology/party
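
The URI decomposition on this slide can be reproduced with Python's standard library. This is only a minimal sketch: the helper name `uri_parts` is an assumption made for illustration and is not taken from HiBISCuS itself.

```python
from urllib.parse import urlsplit


def uri_parts(uri):
    """Split a URI into (scheme, authority, path)."""
    parts = urlsplit(uri)
    # urlsplit calls the authority component "netloc".
    return parts.scheme, parts.netloc, parts.path


scheme, authority, path = uri_parts("http://dbpedia.org/ontology/party")
print(scheme, authority, path)
```

Only the authority component is needed when comparing the URIs of different sources, which keeps such an index far smaller than one that stores full URIs.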

19. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: hypergraph construction, step 1: hyperedge for ?president rdf:type dbpedia:President]

20. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: hypergraph construction, step 2: adds the hyperedge ?president dbpedia:nationality dbpedia:United_States]

21. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: hypergraph construction, step 3: adds the hyperedge ?president dbpedia:party ?party]

22. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: hypergraph construction, step 4: adds the hyperedge ?x nyt:topicPage ?page]

23. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: complete hypergraph, step 5: adds the hyperedge ?x owl:sameAs ?president. Legend: star, simple and hybrid nodes; tail of hyperedge]

24. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: the LD3 hypergraph annotated with the subject authorities (Sbj. auth.) and object authorities (Obj. auth.) of the candidate sources DBpedia, KEGG, NYT, SWDF, LMDB, Geo, Jamendo and DrugBank]

25. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). [Figure: same annotated hypergraph as on slide 24]

26. HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION (FedBench LD3 query as on slide 6). Total triple pattern-wise sources selected = 5 instead of 12
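
The drop from 12 to 5 selected sources can be illustrated with a small join-aware pruning sketch. It is a simplification under stated assumptions, not the HiBISCuS algorithm as published: the index `INDEX`, every authority string in it, and the single-pass `prune` function are made up for illustration. The idea shown is only that a source is dropped from a triple pattern when its authorities at a join variable cannot match the authorities offered by the other triple patterns sharing that variable.

```python
# Toy capability index (all authorities are hypothetical):
# source -> predicate -> (subject authorities, object authorities)
DBP = {"dbpedia.org"}
INDEX = {
    "DBpedia": {
        "rdf:type": (DBP, DBP),
        "dbpedia:nationality": (DBP, DBP),
        "dbpedia:party": (DBP, DBP),
        "owl:sameAs": (DBP, {"sws.geonames.org"}),
    },
    "KEGG": {"owl:sameAs": ({"bio2rdf.org"}, {"bio2rdf.org"})},
    "ChEBI": {},
    "NYT": {
        "nyt:topicPage": ({"data.nytimes.com"}, {"topics.nytimes.com"}),
        "owl:sameAs": ({"data.nytimes.com"}, DBP),
    },
    "SWDF": {"owl:sameAs": ({"data.semanticweb.org"}, {"dblp.l3s.de"})},
    "LMDB": {"owl:sameAs": ({"data.linkedmdb.org"}, DBP)},
    "Jamendo": {"owl:sameAs": ({"dbtune.org"}, {"sws.geonames.org"})},
    "Geo": {"owl:sameAs": ({"sws.geonames.org"}, DBP)},
    "DrugBank": {"owl:sameAs": ({"www4.wiwiss.fu-berlin.de"}, {"bio2rdf.org"})},
}

# FedBench LD3 as (name, subject, predicate, object)
QUERY = [
    ("TP1", "?president", "rdf:type", "dbpedia:President"),
    ("TP2", "?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("TP3", "?president", "dbpedia:party", "?party"),
    ("TP4", "?x", "nyt:topicPage", "?page"),
    ("TP5", "?x", "owl:sameAs", "?president"),
]


def initial_selection(query, index):
    """Predicate lookup only: every source that knows the predicate is selected."""
    return {name: {src for src, preds in index.items() if pred in preds}
            for name, _, pred, _ in query}


def prune(query, index):
    """Single pass over the join variables, dropping sources that cannot join."""
    selected = initial_selection(query, index)
    variables = {t for _, s, _, o in query for t in (s, o) if t.startswith("?")}
    for var in sorted(variables):
        # Position 0 = subject, 1 = object.
        occ = [(name, pred, pos) for name, s, pred, o in query
               for pos, term in enumerate((s, o)) if term == var]
        if len(occ) < 2:
            continue  # not a join variable
        per_tp = [set().union(*(index[src][pred][pos] for src in selected[name]))
                  for name, pred, pos in occ]
        common = set.intersection(*per_tp)
        for name, pred, pos in occ:
            selected[name] = {src for src in selected[name]
                              if index[src][pred][pos] & common}
    return selected


before = initial_selection(QUERY, INDEX)
after = prune(QUERY, INDEX)
print(sum(len(v) for v in before.values()), "->", sum(len(v) for v in after.values()))
```

With this toy index the initial selection yields 1+1+1+1+8 = 12 triple pattern-wise sources, and pruning on the join variables ?president and ?x leaves only NYT for TP5.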

27. EFFICIENT SOURCE SELECTION
  (#TP: total triple pattern-wise sources selected; #AR: number of ASK requests; SST: source selection time)
  Query   FedX (warm)        SPLENDID           DARQ               ANAPSID            HiBISCuS (warm)
          #TP  #AR  SST      #TP  #AR  SST      #TP  #AR  SST      #TP  #AR  SST      #TP  #AR  SST
  CD      78   0    7.33     78   99   320.9    84   0    7.286    36   43   186      35   0    30.43
  LS      56   0    7.99     56   90   307.3    77   0    7.571    44   63   477.4    41   0    23.14
  LD      97   0    8.09     97   126  279      113  0    7.727    54   37   803.5    47   0    16
  Net     231  0    8        231  315  299      274  0    7.56     134  143  554      123  0    22

28. FEDX EXTENSION WITH HIBISCUS [Bar chart: average query execution time (msec) of FedX (warm) vs. FedX+HiBISCuS for the FedBench queries CD1-CD7, LS1-LS7 and LD1-LD11] Improvement in 20/25 queries with net performance improvement 24.61%

29. SPLENDID EXTENSION WITH HIBISCUS [Bar chart: average query execution time (msec) of SPLENDID vs. SPLENDID+HiBISCuS for the FedBench queries CD1-CD7, LS1-LS7 and LD1-LD11] Improvement in 24/25 queries with net performance improvement 82.72%

30. DARQ EXTENSION WITH HIBISCUS [Bar chart, log scale: average query execution time (msec, axis in hundreds) for the FedBench queries CD1-CD7, LS1-LS7 and LD1-LD11; several queries are marked "Not supported", "Runtime error" or "Timeout"] Improvement in 20/20 queries with net performance improvement 92.22%

31. SPLENDID+HIBISCUS VS. ANAPSID [Bar chart, log scale: average query execution time (msec, axis in hundreds) of ANAPSID vs. SPLENDID+HiBISCuS for the FedBench queries CD1-CD7, LS1-LS7 and LD1-LD11; one bar is marked "Zero results"] Improvement in 25/25 queries with net performance improvement 98%

33. DAW: DUPLICATE-AWARE SOURCE SELECTION
  Triple pattern-wise source selection and skipping:
  - TP1 (?uri <p1> ?v1): candidate sources S1, S2, S3
  - TP2 (?uri <p2> ?v2): candidate sources S1, S2, S4
  Min. number of new triples (threshold) = 20
  Total triple pattern-wise selected sources = 4
  Total triple pattern-wise skipped sources = 2
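
The skipping rule on this slide can be sketched as a greedy loop: keep picking the source that contributes the most results not yet seen, and skip every source whose contribution falls below the threshold. This sketch rests on two loud assumptions: the result sets below are invented, and exact Python sets stand in for the compact summaries that DAW would use to estimate the number of new triples.

```python
def select_sources(results_by_source, threshold):
    """Greedy duplicate-aware selection for one triple pattern.

    results_by_source maps a source name to the set of results it would return.
    Returns (selected, skipped) lists of source names.
    """
    seen = set()
    selected, skipped = [], []
    remaining = dict(results_by_source)
    while remaining:
        # Source with the largest number of results not yet covered.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - seen))
        new = remaining.pop(best) - seen
        if len(new) >= threshold:
            selected.append(best)
            seen |= new
        else:
            # No other source can contribute more than `best`, so skip them all.
            skipped.append(best)
            skipped.extend(sorted(remaining))
            break
    return selected, skipped


# Hypothetical result sets for TP1; S2 mostly duplicates S1 and S3.
tp1 = {
    "S1": set(range(0, 100)),
    "S2": set(range(95, 110)),
    "S3": set(range(100, 150)),
}
print(select_sources(tp1, threshold=20))
```

Here S1 and S3 are selected and S2 is skipped, because after S1 and S3 it contributes no new results. Raising the threshold trades result completeness for fewer contacted sources, which is what makes this kind of selection usable for partial result retrieval.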

34. DAW: DUPLICATE-AWARE SOURCE SELECTION
  - Combines min-wise independent permutations (MIPs) with compact data summaries
  - Uses average selectivity values for bound subjects and objects
  - Can be combined with any existing SPARQL endpoint federation system
  - Can be used for partial result retrieval
  Saleem et al. DAW: Duplicate-AWare Federated Query Processing over the Web of Data (ISWC, 2013)

35. DAW: MIN-WISE INDEPENDENT PERMUTATIONS
  Triples T1(s,p,o), ..., T6(s,p,o) are mapped to an ID set via h(concat(s,o)).
  Permutations of the form h_i(x) = (a_i * x + b_i) mod U are applied to all IDs:
    h1 = (7x + 3) mod 51
    h2 = (5x + 6) mod 51
    ...
    hN = (3x + 9) mod 51
  The MIP vector is created from the minima of the permutations. [Figure: example ID set and its permuted values]
  MIPs estimated operations, for VA = [8, 9, 30, 24, 36, 9] and VB = [8, 24, 20, 48, 36, 13]:
    Union(VA, VB) = [8, 9, 20, 24, 36, 9]
    Resemblance(VA, VB) = 2/6 ≈ 0.33
    Overlap(VA, VB) = 0.33 * (6 + 6) / (1 + 0.33) ≈ 3
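
The estimates on this slide can be recomputed with a few lines of Python. This is a minimal sketch, with some assumptions: the function names are made up, the vectors `VA` and `VB` and the hash coefficients come from the slide, and the ID set passed to `mip_vector` is purely illustrative.

```python
def mip_vector(ids, permutations, modulus):
    """One entry per permutation (a, b): the minimum of (a*x + b) mod modulus."""
    return [min((a * x + b) % modulus for x in ids) for a, b in permutations]


def union(va, vb):
    """MIP vector of the union: position-wise minimum."""
    return [min(x, y) for x, y in zip(va, vb)]


def resemblance(va, vb):
    """Fraction of positions in which both MIP vectors agree."""
    return sum(1 for x, y in zip(va, vb) if x == y) / len(va)


def overlap(va, vb, size_a, size_b):
    """Estimated number of shared elements, given the two set sizes."""
    r = resemblance(va, vb)
    return r * (size_a + size_b) / (1 + r)


VA = [8, 9, 30, 24, 36, 9]
VB = [8, 24, 20, 48, 36, 13]

print(union(VA, VB))
print(round(resemblance(VA, VB), 2))
print(round(overlap(VA, VB, 6, 6), 2))

# Illustrative ID set, hashed with the three permutations named on the slide.
print(mip_vector([20, 48, 24, 36, 18, 8], [(7, 3), (5, 6), (3, 9)], 51))
```

Two positions of VA and VB agree (the first and the fifth), which gives the resemblance of 2/6 and an estimated overlap of 3 for two sets of size 6.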
