My favorites | Sign in
Project Home Wiki Issues Source
Search
for
MessageRouting  
Explanation of how data is found in Dijjer
Updated Feb 4, 2010 by ian.clarke

Message Routing

Dijjer peers are designed to self organize into a Small World Network. Such networks have been shown to allow data to be found in an entirely distributed, efficient, and scalable manner.

Each Peer has an associated "hash", this is a 128 bit number that is derived from the IP address and port number of the Peer, so each is unique. Each piece of data also has an associated 128 bit hash, and the goal is that peers will tend to cache data with hashes close to their own hash. This means that a peer which seeks a particular piece of data should forward the request for that data to whichever peer has a hash closest to the hash of that data.

The Small World principle requires that peers with similar hashes will have a higher probability of being connected to each other than peers with distant hashes. Specifically, the probability that a connection exists between any two peers should be proportional to the inverse of the distance between them. This is achieved by the way peers create and drop connections to each-other both during the "joining" process, and as data is requested, an approach based on that of Freenet.

Joining the network

This algorithm is implemented in dijjer.Dispatcher.handleJoinRequest().

The joining process consists of two stages. The peer that wishes to join creates a joinRequest, and sends it to the seed node (by default seed1.dijjer.org), setting the "joiner" field to itself. When a peer, including the seed node, receives a joinRequest, it first checks the "time to live" on that request. If it has expired, or if there is no node in its routing table that is closer to the joiner than itself, it returns a joinResponse immediately containing its own address in the "peers" filed. Otherwise it decrements the time-to-live and forwards the request to the peer with the closest hash in its routing table to that of the joiner.

On receiving a joinResponse from a peer to which the joinRequest was forwarded, a peer adds itself to the "peers" field of the joinResponse message and forwards it back to the peer from which it received the joinRequest. The peer then instructs its routing table to attempt to contact the joiner.

When the initiator of the joinRequest receives the joinResponse, it adds the list of peers in the "peers" field to its routing table and instructs the routing table to try to contact them. The effect of this is that the joiner gets some connections to a few distant peers, and hopefully a few closer ones. This helps to achieve the desired network topology as required for a small world network.

Requesting a file

When instructed to request a file at a given URL, a Dijjer peer will first contact that URL and do a HTTP HEAD request to determine:

  • The size of the file
  • The last-modified time
  • Its content-type
  • Whether it is permitted to cache it

If anything goes wrong (can't find the server, can't find the file, not allowed to cache etc) an appropriate error page is returned to the user via the HttpServer.

If not, the client starts to request the file. Each file is stored in the P2P network as 256k blocks. For each block, a hash is calculated, which consists of the URL, size, and last modified time, together with the number of the block. This is first requested from the network (see next section), and if this fails, it is downloaded directly from the server (internally this is handled by requesting it with a time-to-live of 0, this will make sense in the next section).

Requesting a block

When a peer receives a request for a block, whether generated as a result of the local user requesting a file, or whether it comes from another peer as a "requestData" message, we have three basic options, Either we are caching the data that was requested, we forward the request to another peer that is more likely to be caching the data, or we go to the HTTP server hosting the URL and retrieve the data itself using a HTTP Range request. In every case, we return the data to the requester when we receive it (in fact, we return it as we receive it if we forwarded the request to another peer).

The criteria under which we will just download it directly from the HTTP server, which is essentially "giving up", are as follows:

  • The time-to-live on the request is 0
  • We cannot find a peer in our routing table to forward the request to
  • We cannot find a peer in our RT that is closer to the hash of the block than we are

If none of these are the case, then we forward the request to the closest peer in our routing table to the hash of the block, adding our own address to the "forwarders" field of the requestData message.

If we end up being the "source", meaning that we had the data in our dataStore, or if we had to get it from the web server, then we must connect to each of the peers listed in the "forwarders" field of the requestData message after returning the data. They will also try to connect to us because our address will be in the dataSource field of the requestSuccessful message. We should also store the data in our DataStore to cache it for future requesters, as should each of the forwarders.

It is this last part of the process that is inspired by Freenet, and for reasons that are less than entirely obvious, will ensure that we maintain a "small world" connection distribution.


Sign in to add a comment
Powered by Google Project Hosting