Introduction P2p

33 %
67 %
Information about Introduction P2p

Published on September 7, 2007

Author: dadaista

Source: slideshare.net

Description

A short introduction to p2p computing

Introduction to P2P systems Davide Carboni © 2005-2006

License Attribution-ShareAlike 2.5 You are free: to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions: Attribution . You must give the original author credit. Share Alike . If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one. For any reuse or distribution, you must make clear to others the licence terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. This is a human-readable summary of the Legal Code (the full licence ) . Disclaimer

P2P is about sharing resources Your CPU time Your bandwidth Your disk space

Your CPU time

Your bandwidth

Your disk space

What is P2P From Wikipedia A peer-to-peer computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively low number of servers

From Wikipedia

A peer-to-peer computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively low number of servers

P2P and GRID From Wikipedia Grid computing […] performs higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture.

From Wikipedia

Grid computing […] performs higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture.

Topology Comparison Client/server GRID P2P server client client=server

Overlay Crs4.it Australian ISP Mobile phones in cell xyz

Overlay Crs4.it Australian ISP Mobile phones in cell xyz

Three main issues in P2P systems Bootstrapping Index/Lookup (query) Delivery of large objects (in case of file sharing)

Bootstrapping

Index/Lookup (query)

Delivery of large objects (in case of file sharing)

A la Napster Query / Query Hits GET <file>

Copyright issues with Napster Napster claimed that the law allows people to share music with friends. The court considered this position illegal and Napster was closed.

Napster claimed that the law allows people to share music with friends.

The court considered this position illegal and Napster was closed.

Gnutella Overlay Requestor Responder

Gnutella Messages ping, pong, push, query, queryhit 16 23 – 23+payload length Payload length 19-22 hops 18 TTL 17 GUID 0 - 15 Description Byte

Gnutella messages ping: discover hosts on network pong: reply to ping query: search for a file query hit: reply to query push: download request for firewalled servents Ref. http://rfc-gnutella.sourceforge.net/developer/stable/index.html

ping: discover hosts on network

pong: reply to ping

query: search for a file

query hit: reply to query

push: download request for firewalled servents

Ref. http://rfc-gnutella.sourceforge.net/developer/stable/index.html

Gnutella: PING Requestor PING

Gnutella: PONG Requestor PONG

Gnutella: QUERY Requestor QUERY

Gnutella: QUERY-HITS A C B D Requestor QUERY-HITS Responder 1 Responder 2

Gnutella: GET the file Requestor Responder 1 GET file HTTP/1.1 file

Gnutella, behind firewalls Requestor Responder GET file

Gnutella, behind firewalls (2) C B D Requestor Responder PUSH A

Gnutella, behind firewalls (3) Requestor Responder FILE

Bootstrapping in Gnutella X-Try Ping/Pong Storing from QueryHit messages GWebCache

X-Try

Ping/Pong

Storing from QueryHit messages

GWebCache

Open issues in Gnutella Latency Scalability Vulnerability Privacy Security

Latency

Scalability

Vulnerability

Privacy

Security

Is Gnutella obsolete? Alive and Kicking The version 0.6 of the protocol prevents pure flooding and uses smart routing based on Ultrapeers More than 2 millions users with 500,000 nodes always up

Alive and Kicking

The version 0.6 of the protocol prevents pure flooding and uses smart routing based on Ultrapeers

More than 2 millions users with 500,000 nodes always up

Popularity of P2P Networks (measured by Slick.com) Latest Statistics taken 2006-02-26 22:14:12: eDonkey2K Users: 3,474,261 FastTrack Users: 2,609,688 Gnutella Users: 2,219,539 Overnet Users: 578,521 MP2P Users: 252,893 Filetopia Users: 4,806

Latest Statistics taken 2006-02-26 22:14:12: eDonkey2K Users: 3,474,261 FastTrack Users: 2,609,688 Gnutella Users: 2,219,539 Overnet Users: 578,521 MP2P Users: 252,893 Filetopia Users: 4,806

Hub (Gnutella2 et al.) Hub Web

Hub Requirements > 100 sockets CPU and RAM for servicing the network Uptime (>2 hours) Broadband (also for upload) Able to receive inbound TCP and/or UDP (IP in the global address space, no NAT)

> 100 sockets

CPU and RAM for servicing the network

Uptime (>2 hours)

Broadband (also for upload)

Able to receive inbound TCP and/or UDP (IP in the global address space, no NAT)

Hub Tasks Keep up-to-date information about other hubs Manage routing tables to route messages efficiently Manage filters for query messages Monitor they own resources.

Keep up-to-date information about other hubs

Manage routing tables to route messages efficiently

Manage filters for query messages

Monitor they own resources.

Query Hash Table QHTs provide information to know that a particular node (and possibly its descendants) will not be able to provide any matching objects for a given query. queries can be discarded confidently. Neighbours know what their neighbours do not have, but cannot say for sure what they do have. QHT

QHTs provide information to know that a particular node (and possibly its descendants) will not be able to provide any matching objects for a given query.

queries can be discarded confidently.

Neighbours know what their neighbours do not have, but cannot say for sure what they do have.

What is Hashing From Wikipedia, the free encyclopedia A hash function or hash algorithm is a function for examining the input data and producing an output hash value. The process of computing such a value is known as hashing . The process of hashing has the property that two different inputs are unlikely to hash to the same hash value.

From Wikipedia, the free encyclopedia

A hash function or hash algorithm is a function for examining the input data and producing an output hash value. The process of computing such a value is known as hashing . The process of hashing has the property that two different inputs are unlikely to hash to the same hash value.

What is Hashing (2) Collisions occur with 2^(-N)

Query Hash Table 1 1 1 1 1 1 1 1 1 1 0 1 2 2^N 0<= Hash(word) <= 2^N

Query Filtering If any of the lookups based on URNs found a hit, send the query packet If at least two thirds of lookups based on words found a hit, send Otherwise, drop the packet Consider all text content in the query, including generic search text and metadata search text if it is present. Tokenize quoted phrases into words, ignoring the phrase at this level

If any of the lookups based on URNs found a hit, send the query packet

If at least two thirds of lookups based on words found a hit, send

Otherwise, drop the packet

Consider all text content in the query, including generic search text and metadata search text if it is present.

Tokenize quoted phrases into words, ignoring the phrase at this level

Distributed hashtables

Distributed Hashtables Main features: a key is mapped onto a node of the network. Several proposals: Chord, Pastry and Kademlia. Lookup(key) reaches the right node with O(log(N) ) hops.

Main features: a key is mapped onto a node of the network.

Several proposals: Chord, Pastry and Kademlia.

Lookup(key) reaches the right node with O(log(N) ) hops.

Possible applications of DHT DHT DNS Content lookup Web search engine

DHT DNS

Content lookup

Web search engine

DNS over DHT (1) Problem: how to register a name onto a IP address Assign a name to your machine, example ‘mymachine’ Check if this name is available or not using the DHT operation get(‘mymachine’). If the result is null then register the name and the IP with the DHT operation put(‘mymachine’, 212.22..)

Problem: how to register a name onto a IP address

Assign a name to your machine, example ‘mymachine’

Check if this name is available or not using the DHT operation get(‘mymachine’).

If the result is null then register the name and the IP with the DHT operation put(‘mymachine’, 212.22..)

DNS over DHT (2) Problem: how to resolve a name onto a IP address Use the DHT operation get(hostname). The result if not null is the IP address you’re searching

Problem: how to resolve a name onto a IP address

Use the DHT operation get(hostname).

The result if not null is the IP address you’re searching

Content indexing/lookup on DHT A content has a set of metadata (i.e. author, editor, genre, …) Build a different index based on DHT for each metadata i.e. the index for author put(‘john’, http://host/dir/content.avi)

A content has a set of metadata (i.e. author, editor, genre, …)

Build a different index based on DHT for each metadata

i.e. the index for author

put(‘john’, http://host/dir/content.avi)

How DHT works In DHT each node has a node ID which belogs to a set S (for instance the set of bitstrings with length 160) Also keys must hashed in the same set S (hash(key) belongs to S)

In DHT each node has a node ID which belogs to a set S (for instance the set of bitstrings with length 160)

Also keys must hashed in the same set S (hash(key) belongs to S)

Web crawlers and DHT Assume a network of nodes in a DHT Assume each node runs also a crawler. For each word in a Web page it performs Put(word,URL) So a distributed index of the Web is built[1]

Assume a network of nodes in a DHT

Assume each node runs also a crawler.

For each word in a Web page it performs

Put(word,URL)

So a distributed index of the Web is built[1]

Web search and DHT When the user type a keyword ‘foo’ lookup the DHT Get(‘foo’) The DHT will give the list of URL indexed with ‘foo’

When the user type a keyword ‘foo’ lookup the DHT

Get(‘foo’)

The DHT will give the list of URL indexed with ‘foo’

Kademlia S = [00 ....0 - 11 ...1] the set of 160bit strings Each node has a node ID in S For each 'key' hash(key) is in S

S = [00 ....0 - 11 ...1] the set of 160bit strings

Each node has a node ID in S

For each 'key' hash(key) is in S

Kademlia distance Given x,y in S Define the distance d(x,y) = xor(x,y) d has the following properties: d(x,y) = d(y,x) d(x,x) = 0 d(x,y) + d(y,z) >= d(x,z)

Given x,y in S

Define the distance d(x,y) = xor(x,y)

d has the following properties:

d(x,y) = d(y,x)

d(x,x) = 0

d(x,y) + d(y,z) >= d(x,z)

k-Buckets in kademlia Each node stores an array of lists: list[i] i = 0,1, ... , 159 list[i] stores up to k tuples: (IP,port,ID) list[i] stores tuples whose ID is: 2^i <= D(this,ID)< 2^(i+1) list[i] is ordered as LRS (last recent seen)

Each node stores an array of lists: list[i]

i = 0,1, ... , 159

list[i] stores up to k tuples: (IP,port,ID)

list[i] stores tuples whose ID is:

2^i <= D(this,ID)< 2^(i+1)

list[i] is ordered as LRS (last recent seen)

Tree for nodes in kademlia 1 1 1 1 0 0 0 0 0101

k-Buckets in kademlia For small values of i, list[i] has few elements For larger values of i, list[i] is likely to contain more elements.

For small values of i, list[i] has few elements

For larger values of i, list[i] is likely to contain more elements.

Operations in kademlia PING (IP, port) STORE (key, value) FIND_VALUE (key) FIND_NODE (ID)

PING (IP, port)

STORE (key, value)

FIND_VALUE (key)

FIND_NODE (ID)

Lookup in Kademlia FIND_NODE(hash(k)) Compute D=xor(this,hash(key)) Find a tuples in list[i] (i.e. a=3) Send FIND_NODE(hash(key)) to the 3 nodes I receive other node addresses. Reiterate FIND_NODE(hash(key)) on them. Stop when no new addresses are received

FIND_NODE(hash(k))

Compute D=xor(this,hash(key))

Find a tuples in list[i] (i.e. a=3)

Send FIND_NODE(hash(key)) to the 3 nodes

I receive other node addresses. Reiterate FIND_NODE(hash(key)) on them.

Stop when no new addresses are received

Nodes Joining and Leaving Whenever one node asks another for its contacts, the called node stores the contact information of the caller. When a node joins the network it takes some of the contacts of an arbitrary node and uses them as its own. It then does a search for itself. This results in other nodes being called, which makes them aware of the new node's existence

Whenever one node asks another for its contacts, the called node stores the contact information of the caller.

When a node joins the network it takes some of the contacts of an arbitrary node and uses them as its own.

It then does a search for itself. This results in other nodes being called, which makes them aware of the new node's existence

Node Joining and Leaving (2) A new node may have become the closest node to certain keys The previous closest nodes will replicate the appropriate key/value pairs to the new node Ignoring replication the cost of a node joining is only O(log n) messages.

A new node may have become the closest node to certain keys

The previous closest nodes will replicate the appropriate key/value pairs to the new node

Ignoring replication the cost of a node joining is only O(log n) messages.

Range Query in DHT (1) DHT maps a key onto a node It is easy to lookup a value given a key It is uneasy lookup values in a range of keys Example 1: Lookup all tuples in ‘aaaa’ < key < ‘bbbb’ Example 2: Lookup all tuples in ’39,88’ < lat < ’39,94’

DHT maps a key onto a node

It is easy to lookup a value given a key

It is uneasy lookup values in a range of keys

Example 1:

Lookup all tuples in ‘aaaa’ < key < ‘bbbb’

Example 2:

Lookup all tuples in ’39,88’ < lat < ’39,94’

References (1) Napster Timeline http://www.cnn.tv/SPECIALS/2001/napster/timeline.html The Gnutella Developer Forum http://www.the-gdf.org/wiki/index.php?title=Main_Page History of Gnutella in ‘Gnutella’ http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p5.html Slyck.com DHT Links http://www.etse.urv.es/~cpairot/dhts.html

Napster Timeline

http://www.cnn.tv/SPECIALS/2001/napster/timeline.html

The Gnutella Developer Forum

http://www.the-gdf.org/wiki/index.php?title=Main_Page

History of Gnutella in ‘Gnutella’

http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p5.html

Slyck.com

DHT Links

http://www.etse.urv.es/~cpairot/dhts.html

References (2) YACY (DHT Web search/index) http://www.yacy.net/yacy/ Kademlia : A Peer-to-peer Information System Based on the XOR Metric . (paper) Khashmir – Kademlia in Python http://khashmir.sourceforge.net/ A Case Study in Building Layered DHT Applications (paper on range query/DHT) http://www.placelab.org/publications/pubs/IRS-TR-05-001.pdf

YACY (DHT Web search/index)

http://www.yacy.net/yacy/

Kademlia : A Peer-to-peer Information System Based on the XOR Metric . (paper)

Khashmir – Kademlia in Python

http://khashmir.sourceforge.net/

A Case Study in Building Layered DHT Applications (paper on range query/DHT)

http://www.placelab.org/publications/pubs/IRS-TR-05-001.pdf

License Attribution-ShareAlike 2.5 You are free: to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions: Attribution . You must give the original author credit. Share Alike . If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one. For any reuse or distribution, you must make clear to others the licence terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. This is a human-readable summary of the Legal Code (the full licence ) . Disclaimer

Add a comment

Related presentations

Related pages

P2P: Introduction and Real World Applications - ReadWrite

Written by Can Erten and edited by Richard MacManus. This is the first in a 2-part series on Read/WriteWeb, exploring the world of P2P on the Web. Part 1 ...
Read more

Introduction - P2P Foundation

Gwendal Simon: why develop P2P-based virtual worlds? Strolling into magnificent virtual world can now be casually experienced in massively multi-player games.
Read more

An Introduction to Peer-to-Peer Networks

An Introduction to Peer-to-Peer Networks Presentation for MIE456 - Information Systems Infrastructure II Vinod Muthusamy October 30, 2003
Read more

1. Introduction - P2P Foundation

1. Introduction 1.A. What this essay is about. 1.B. Some acknowledgments. 1. Introduction. 1.A. What this essay is about. The following essay describes the ...
Read more

Beyond Music File Sharing: A Technical Introduction to P2P ...

Christian Cikryt, Beyond Music Filesharing: A Technical Introduction to P2P Networks, 29. Januar 2010 3 Motivation Probleme klassischer Client-Server-
Read more

Introduction to Windows Peer-To-Peer Networking

Introduction to Windows Peer-to ... The SSP used for groups is known as the P2P Group Connect ... Windows Peer-to-Peer Networking uses ...
Read more

Peer-to-peer - Wikipedia, the free encyclopedia

Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or work loads between peers. Peers are equally ...
Read more

Introduction to the P2P Modeling System - NDSU

Introduction to the P2P Modeling System This Windows-based P2P software is a comprehensive hydrologic modeling package. It consists of four major ...
Read more

An Introduction to Peer-to-Peer Streaming - National ...

1. Introduction to Peer-to-Peer Streaming NI peer-to-peer (P2P) streaming technology uses PCI Express to enable direct, point-to-point transfers between ...
Read more

Introduction to Peer-To-Peer Software and Networks

P2P is peer-to-peer computer networking. Here's how P2P networking and P2P software applications have evolved in recent years
Read more