advertisement

MAHA Talk

50 %
50 %
advertisement
Information about MAHA Talk
Entertainment

Published on October 29, 2007

Author: Peppar

Source: authorstream.com

advertisement

Data Indexing and Sharing in Large-scale Distributed Systems:  Data Indexing and Sharing in Large-scale Distributed Systems Maha Abdallah LIP6, Paris VI DBMS : A Brief History:  DBMS : A Brief History Application Files DB Server DBMS Connections (ODBC, JDBC) Application Application A Client-Server Architecture Why a DBMS ?:  Why a DBMS ? Simplified Data Management & Access Data Definition Language (DDL) Data Manipulation Language (DML) Specific DBMS Functionalities Query Engine Query Optimizer Transaction Management CREATE TABLE Accounts ( AccountID varchar(30) PRIMARY KEY, CustomerID varchar(30) NOT NULL, Branch varchar(10) NOT NULL, Balance Decimal(10, 2), …) INSERT INTO Accounts VALUES (‘200’, ‘JONES’, ‘New York’, 3500.84) Relational Data Representation:  Relational Data Representation 333.00 New York KHAN 456 500.00 San Francisco ONO 400 340.14 San Francisco GREEN 350 23.17 New York SMITH 345 200.00 San Francisco GRAY 324 1000.00 New York JONES 200 Balance Branch CustomerID AccountID . . . New York San Francisco Primary Index Index Distributed DBMS Design:  Distributed DBMS Design Top-down Approach Global schema definition Data fragmentation Local sites for fragments storage Transparent Distributed DBMS Design:  Distributed DBMS Design Horizontal Fragmentation (Account) branch = ‘New York’ Vertical Fragmentation (Account) accountID, balance Distributed DBMS Design:  Distributed DBMS Design Bottom-up Approaches Integration of preexisting legacy systems Mediated schema approach Others: Ontology-based approach, … Global As View (GAV) Global schema as view over Source ones query(G) = query(f(S1,…, Sn)) Local As View (LAV) Sources as views over global schema query(G) = query(f1-1 (S1),…, fn-1 (Sn)) Integrated Schema User view Integration Mediator OODBMS ORDBMS Heterogeneous Systems RDBMS Adapter Adapter Adapter Distributed DBMSs…:  Distributed DBMSs… Assessment in large-scale environments Central Mediation Schema Centralized / Replicated Global Index Issue : Current DB technologies DO NOT SCALE Can we Bring P2P systems’ scalability to the DB world ? Why P2P ?:  Why P2P ? A distributed system architecture Typically many nodes, unreliable and heterogeneous No centralized control Nodes are symmetric in function Harness distributed, shared resources (bandwidth, CPU, storage) on peer nodes Fault-tolerant, self-organizing Dynamic environment with frequent join and leave P2P Challenge : Locating Content:  P2P Challenge : Locating Content Unstructured Systems Napster : Centralized global index Gnutella : No global index Flooding who has file F ? 1 Node 1 Napster central server Napster Gnutella P2P Challenge : Locating Content:  P2P Challenge : Locating Content Structured Systems Distributed Hash Tables Efficient : hash-based indexing Scalable : distributed global index ... x y z hash table hash bucket insert (key, data) lookup (key) → data 0 1 2 ... node insert (key, data) lookup (key) → data N-1 Hash Tables Associate data with keys Key is hashed to find bucket in hash table DHTs Associate data with keys Key is hashed to find peer node in the system Internet Scale DHTs : A Case Study :  Internet Scale DHTs : A Case Study Content Addressable Networks (CAN) Interface - insert(key,value) - value = retrieve(key) DHTs : Dealing with System Dynamicity:  DHTs : Dealing with System Dynamicity Consistent Hashing Fixed hash space shared by all peer nodes 2-dimensional space 1 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space 1 2 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space 1 2 3 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  1 2 3 4 CAN : Sharing Hash Space By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) CAN: Example x = a node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) b = hy(K) CAN: Example x = a y = b node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) b = hy(K) CAN: Example (2) route(K,V) -> (a,b) node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example (2) route(K,V) -> (a,b) (3) (a,b) stores (K,V) (K,V) node I::insert(K,V) I (1) a = hx(K) b = hy(K) By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example (2) route “retrieve(K)” to (a,b) (K,V) (1) a = hx(K) b = hy(K) node J::retrieve(K) J By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: routing table:  CAN: routing table A node only maintains state for its immediate neighboring nodes By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: routing:  CAN: routing (a,b) (x,y) By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion Bootstrap node Discover some node “I” already in CAN new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion I new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion 2) pick random point in space I (p,q) new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion (p,q) I routes to (p,q), discovers node J I J new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion new J Split J’s zone in half… new owns one half By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks DHTs…:  DHTs… Assessment in large-scale environments Coarse-grained indexing by document ID File level Content and semantic unawareness Random graph construction No “practical” relation between neighbors, as… - Semantic proximity - Geographical proximity, … Read-only data sharing Problem: do not adapt to complex data sharing fine-grained data semantically rich data Read/Write data sharing Bridging the Gap…:  Bridging the Gap… Can we bring DBMSs’ technologies to the P2P world ? DBMS Technologies P2P Technologies scalable Fine-grained Sharing of Semantically Rich Data Targeted Applications : Scientific data sharing among research communities Fine-Grained Indexing : Basic Idea:  Fine-Grained Indexing : Basic Idea DHT-based approach Indexing granularity Relations, attributes Key-words Content-based look-up & access Strong query language expressiveness Relational algebra operators Relational Data Indexing:  Relational Data Indexing Peers support relational data No central mediated schema Global index distributed over peers Meta-Data for semantic/structural content description Relations set of keywords Attributes set of keywords SELECT treatment FROM Sickness WHERE Name = “Cancer” Typical query Q Slide36:  Relational Data Indexing Slide37:  Relational Data Indexing Relation & Attribute indexing (key, value) hash index, ∀ key ∈ {relation name, relation keywords} value = node IP@ Sample hash index : Relations entries h(sickness) Peer3 h(Illness) Peer2 Relational Data Indexing:  Relational Data Indexing Sample hash index : Attributes entries h(sickness||name) Peer3 h(sickness||treatment) Peer1 Peer1 issues query Q SELECT treatment SELECT Attribute FROM Sickness FROM Relation WHERE name = “Cancer” WHERE Predicate h(sickness||name) Peer3 & h(Illness||name) Peer2 h(sickness||treatment) Peer1 & h(Illness||treatment) Peer4 Send request Q to Peers 1, 2 & 3 only Slide39:  Relational Data Indexing Assessment Scalability No global mediation schema Attribute-level distributed indexing, look-up and access Highly expressive query languages Current Extensions Store value intervals in index Support for range queries SELECT Attribute1 FROM Relation WHERE Attribute2 BETWEEN 350 AND 500 Still a lot to do… Semantic Clustering & Indexing:  Semantic Clustering & Indexing Meta-Data for Content Description General core meta-data schema Theme-specialized schema Hierarchical semantic representation Semantic/content characterization Documents : keyword vectors Document indexing Index placement Peers : semantic vectors/matrices Overlay construction Semantic node clustering Semantic Groups/Subgroups Slide41:  Semantic Clustering & Indexing Vector construction from meta-data Vector values represent relevance degree for content description Semantic Proximity Node vectors <theme1, theme2, theme3,…> <sub-theme1, sub-theme2, sub-theme3,…>, etc. Semantic Distance d, where d = cosine of angle between vectors Neighbors chosen wrt d Slide42:  K V K V K V K V K V K V K V K V K V K V K V Semantic Clustering & Indexing Document vectors : <keyword1, keyword2, …, keywordN> Document indexing : keywords hashing Slide43:  Semantic Clustering & Indexing Direct Issues Group specification - Completeness - Validity - Evolution Group management Semantic distance between groups Index placement wrt group semantics Other Issues Beyond semantic grouping … Confidence work spheres Multi criteria grouping And still… Data consistency and updates Schema mappings

Add a comment

Related presentations

Related pages

maha´s blog - MOTOR-TALK

Ein verregneter Dienstag Vormittag im Februar. Ich befinde mich auf einem örtlichen Supermarktparkplatz und bringe noch schnell die Post weg.
Read more

Talk:Maha Nawrahta - Wikipedia, the free encyclopedia

This is the talk page for discussing improvements to the Maha Nawrahta article. This is not a forum for general discussion of the article's subject.
Read more

Caddy hat ´nen Hängerchen : maha´s blog - motor-talk.de

Nachdem wir zumindest theoretisch Sommer- und somit Hochsaison haben, heute ein kleines Update nebst neuem Projekt -> Dem Himmel <-. Original-Radio mit EU ...
Read more

Talk:Maha Sona - Wikipedia, the free encyclopedia

A fact from Maha Sona appeared on Wikipedia's Main Page in the Did you know? column on 31 October 2009 (check views). The text of the entry was as follows ...
Read more

Maha Cattarisaka Sutta Talk - Na Uyane ... - YouTube

Sadaham Deshana - Tribute to Gauthama Samma Sam Budhu Rajanan Wahanse.
Read more

Talk:Maha Sarakham/Listings - Wikitravel

From Bangkok, motorist can head north via highway 1 and get into highway 2 at Saraburi, via Nakhon Racthasima. Then use highway 226 to Buri Ram, highway ...
Read more

Talk:Maha Sarakham – Travel guide at Wikivoyage

This page was last edited at 12:39, on 6 February 2010 by Wikivoyage user Globe-trotter. Text is available under the Creative Commons Attribution ...
Read more

Audio Dhamma Talks in English - Ajahn Maha Bua

Ajahn (Luangta) Maha Bua and Baan Taad Forest Monastery, Thai Forest tradition. All of talks of Ajahn Maha Bua translated into English in MP3 format
Read more

A Talk by Gurumayi Chidvilasananda

A Talk by Gurumayi Chidvilasananda given in Shree Muktananda Ashram during the Maha Abhishek Puja Satsang.
Read more