evs4j


A Java implementation of the Totem single ring protocol.

```

CREDITS

Paul Kendall (testing, code review, bug fixes)
Samant Maharaj (testing, code review, bug fixes)

EVS4J is a pure-Java(tm) implementation of the Totem single ring protocol:

"The Totem Single-Ring Ordering and Membership Protocol", Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, and P. Ciarfella, ACM Transactions on Computer Systems 13, 4 (November 1995), 311-342.

NOTE: The flow control algorithm in this article uses a fixed window size. I found that this would make it impossible to use the same network for anything else, so I implemented congestion control. Now the window backs off nicely when needed. A maximum window size is still required.

Features:

  • Group membership (configuration) service
  • Reliable multicast
  • Total ordering
  • Flow control
  • Congestion control
  • Recovery of messages when a processor fails or joins

WARNING: Using multicasting on your LAN can take away precious bandwidth from others on the network and create huge delays. Do not try this code on your LAN unless you know that the packets won't end up on some production network (this sometimes happens by mistake). Talk to your sys admin.

Usage

To use this protocol you need to do the following:

  1. Pick integer ids for the nodes in your cluster, e.g. 1,2,3.
  2. Pick a port number to send multicast packets on.
  3. Pick a multicast address.
  4. If you have more than one adapter on each node, pick the subnet you want to use.
  5. Get an instance of a Connection object.
  6. (Probably) create a Listener object.
  7. Call open() on the connection.

Some time after open() returns, the method Listener.onConfiguration() is called. This is to notify the application that a configuration (a ring, or group) has been created. After this you can expect to receive messages through the onMessage() method.

See the API documentation and src/Example.java for an example.

Configuration parameters

The class which implements evs4j.Connection is:

evs4j.impl.SRPConnection

The constructor has the signature:

SRPConnection(long, Processor, String)

The first parameter can be 0 when running a test, but in a real application it should be the id of the last configuration used before the system was shut down. This id should be forced to disk in Listener.onConfiguration() and read back from disk when creating a new connection.

The second parameter is a Processor object with an arbitrary integer id which must be unique across the cluster. This id could be generated from the ip address of each node, but that will depend on each cluster.

The third parameter is a string containing name-value pairs, using '=' between name and value and '&' between pairs, for example:

"port=9100&ip=224.0.0.1&nic=192.168.254.0/255.255.255.0"

This is the complete list of properties:

port The port number used on multicast packets. Required.

ip The multicast ip address used, e.g. 224.0.0.1. Required.

nic The network interface to be used. This can be either of the form ###.###.###.###/###.###.###.###, e.g. 192.168.254.0/255.255.255.0, or the 'name' of the interface, e.g. eth0. This property is required only when the machine has more than one network adapter installed. This is the proper behavior because the EVS4J benchmark creates a multicast storm and we don't want to do that to the wrong network.

windowSize The maximum window size (number of messages) to be used for flow control. The window size at any given moment varies between 0 and this value. The window size will decrease automatically (following the Van Jacobson et al. protocol) if you start ftp file transfers etc. on the same network. In a production environment you might consider using a dedicated network. You can use this parameter and the size of the ring to tune latency and throughput. Optional. The default is 30.

tokenDroppedTimeout A timeout in milliseconds used to determine if the token was dropped by the network or by the receiver's buffer and needs to be re-sent. See the Totem article for details. Optional. The default is 3.

tokenLossTimeout A timeout in milliseconds used to determine if the token was lost because one of the processors failed. When this timeout expires the remaining processors attempt to form a new configuration. See the Totem article for details. Optional. The default is 1000.

joinTimeout Analogous to tokenDroppedTimeout but applies to the membership protocol. See the Totem article for details. Optional. The default is 3.

consensusTimeout Analogous to tokenLossTimeout but applies to the membership protocol. See the Totem article for details. Optional. The default is 1000.

Known issues

Under certain conditions on Linux, the network card (or cards) must be configured specifically to support certain multicast addresses. If you are not receiving any packets, this may be the reason.

Support

There is no commercial support for this code. However, you can try me at akiva dot lichtner at gmail dot com if you like.

Benchmark

The following command runs 2 processors both sending and receiving 1450-byte packets in one JVM:

java -classpath evs4j.jar \
     -Xincgc -client \
     evs4j.tool.benchmark.Main \
     -props "port=9100&ip=224.0.0.1&nic=192.168.254.0/255.255.255.0" \
     -procs 1 2

In a recent test on 4 dual-core desktop machines evs4j achieved 25,000 messages per second (270 Mbps), with a token rotation time of 2ms. We had to change the maximum message size in the code from 1450 to 1423, as it turns out that this fits exactly into a udp packet (thanks to Paul Kendall for discovering that the maximum message size was too large).

When the switch supports jumbo frames you can change the maximum packet size and the maximum message size to 9000 and 8923, respectively (presently these have to be changed in the code), and achieve about 6000 messages per second (400 Mbps).

EVS4J 1.0b3

Incorrectly handled the case of 2 network adapters. Should have forced the user to specify one, but didn't. Fixed.

Paul Kendall and Samant Maharaj sent in a patch for two bugs, one being a bug in the code that collected free nodes in the message buffer and put them back in the free list (a 6-line change) and the other a bug in updating the last safe message id. Legal note: on 3/2/2006 Paul assigned the copyright to me (Guglielmo Lichtner) on behalf of himself and Samant Maharaj.

EVS4J 1.0b2

On Linux 2.6.x with ipv6 support the nic-finding code threw an exception. Fixed.

On Windows XP, when you bring an adapter down it seems to disappear entirely. Added code to report this case more informatively.

In SRPRecovery.java there was a FIXME about delivering messages from the transitional processors only. Fixed.

EVS4J 1.0b1

First release since around January 2004 (0.9.1).

Deprecated EVS4J project in SourceForge because it uses CVS which is hard to administer. I am using my own Subversion repository now. I have removed a lot of code and done some refactorings to simplify the system as much as possible.

Changed the license. This code is now licensed under the Apache License, version 2.0.
```
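
The property string described under "Configuration parameters" is easy to get wrong by hand, so it can help to check it before passing it to the SRPConnection constructor. The following is a minimal sketch of such a check, not the parser evs4j itself uses. The class name EvsPropsSketch and its methods are hypothetical and exist only for illustration. The property names, the required/optional split and the default values are taken from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of evs4j: checks a property string of the
// form "name=value&name=value" before it is handed to SRPConnection.
final class EvsPropsSketch {

    // Defaults for the optional properties, as listed in the README.
    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("windowSize", "30");
        d.put("tokenDroppedTimeout", "3");
        d.put("tokenLossTimeout", "1000");
        d.put("joinTimeout", "3");
        d.put("consensusTimeout", "1000");
        return d;
    }

    // Splits on '&', then on the first '=' of each pair, and overlays the
    // result on the defaults. Fails if "port" or "ip" is missing.
    static Map<String, String> parse(String props) {
        Map<String, String> result = defaults();
        for (String pair : props.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            result.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        for (String required : new String[] {"port", "ip"}) {
            if (!result.containsKey(required)) {
                throw new IllegalArgumentException(
                        "Missing required property: " + required);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same example string as in the README.
        Map<String, String> p =
                parse("port=9100&ip=224.0.0.1&nic=192.168.254.0/255.255.255.0");
        System.out.println(p);
    }
}
```

The sketch does not check whether nic is needed, because that depends on how many adapters the machine has. It also leaves all values as strings, since how evs4j converts them internally is not described here.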

Project Information