Scavenger tutorial
Updated Jul 13, 2009 by mad...@gmail.com

Introduction

What is Scavenger

Scavenger is a cyber foraging system. Cyber foraging is the opportunistic use of computing resources available in the vicinity, usually by small mobile devices. That is, cyber foraging is a technique that enables small, resource-constrained mobile devices to 1) scan their current network environment for available surrogates (in cyber foraging a surrogate is a stronger machine that makes its resources, such as CPU, bandwidth, and storage, available to others), and 2) offload some of their resource-intensive work to these surrogates. The keyword to notice here is "opportunistic": mobile devices use surrogates only when surrogates are currently available, and when none are available they must be able to perform their tasks themselves.

License

Scavenger is released under the GNU General Public License (GPL) version 3. See the COPYING document in the source package for more information about this license. This software has been developed as a research prototype at Aarhus University, Denmark.

Installing Scavenger

The Scavenger daemon

Dependencies

The Scavenger daemon depends on another piece of software that we developed while working on Scavenger and that has also been released as open source under the GPL: the Presence service discovery daemon. Presence can be found at http://presence-discovery.googlecode.com. The Presence daemon must already be running on the system before the Scavenger daemon can start. Apart from this dependency, the Scavenger daemon also depends on Stackless Python, which can be found at http://stackless.com.

Using a pre-built bundle

On Scavenger's homepage you can download pre-built Scavenger daemon bundles for a couple of platforms. These bundles are self-contained and include all needed dependencies. If you choose to download a Scavenger daemon bundle, all you need to do to run a Scavenger daemon is:

  1. Unpack the archive.
  2. Configure the daemon by editing scavenger.ini.
  3. Run the start_daemon script.

Manual installation

The install procedure is as follows:

  1. Install the Presence daemon.
  2. Install the Presence client library.
  3. Install Stackless Python (you do not have to install Stackless as your main Python on the system).
  4. Install the Scavenger daemon.
  5. Configure the daemon by editing scavenger.ini.
  6. Start Scavenger by running main.py in the Stackless Python interpreter.

Configuring the daemon

The Scavenger daemon needs a little configuration to run at its best. This fine-tuning is optional, since a standard configuration file is created for you the first time you run the daemon. By configuring it yourself, however, you make sure that the scheduler gets the best possible data when it has to decide between different surrogates and between local and remote execution.

The configuration file for the daemon is called scavenger.ini and is found in the directory that you launch the Scavenger daemon from (in the pre-built bundles that is the scavenger subdirectory). The configuration file may look like this:

[network]
speed = 2500000

[cpu]
strength = 62
cores = 1

The network speed is the expected number of bytes per second that can be transferred over the network link. It can also be any of the following strings: BT-1, BT-2, WLAN-b, WLAN-g, LAN10, LAN100, or LAN1K. The meanings of these strings are described under Finding surrogates. The CPU strength is the nbench rating of the CPU; more precisely, it is the average of the nbench integer and floating point ratings, i.e., (int_perf + float_perf) / 2. If you are installing Scavenger on a new piece of hardware you really should download and run nbench to obtain the correct rating. We have of course run nbench on our test systems, and if you are interested you may find their ratings here; if your system is listed, you are saved the trouble of running nbench. If you do not supply a rating, Scavenger will try to estimate your nbench rating at startup. The CPU cores setting tells Scavenger how many CPUs/cores it may use for performing tasks.
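As a small worked example of the CPU strength formula, the sketch below averages two nbench ratings and renders a configuration file in the layout shown above. The helper names (cpu_strength, render_config) and the sample ratings are made up for illustration and are not part of Scavenger. The code is written for Python 3 and uses only the standard configparser module.

```python
import configparser
import io


def cpu_strength(int_perf, float_perf):
    """Average of the nbench integer and floating point ratings."""
    return (int_perf + float_perf) / 2.0


def render_config(speed, strength, cores):
    """Render settings in the scavenger.ini layout.

    `speed` may be a number of bytes per second or one of the
    symbolic strings such as 'WLAN-g'.
    """
    config = configparser.ConfigParser()
    config['network'] = {'speed': str(speed)}
    config['cpu'] = {'strength': str(strength), 'cores': str(cores)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical nbench results: integer rating 60.0, floating point rating 64.0.
strength = cpu_strength(60.0, 64.0)
ini_text = render_config('WLAN-g', strength, 1)
print(ini_text)
```

The printed text can be saved as scavenger.ini in the directory the daemon is launched from.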

The Scavenger client API

Installing the API

To install the Scavenger API (in order to create applications that utilise Scavenger surrogates) all you need to do is:

  1. Unpack the archive.
  2. Run setup.py using the Python interpreter you will be using for your application.

Configuring the client

Configuring the Scavenger client is similar to configuring the daemon, except that the configuration file is stored in ~/.scavenger/config.ini. See Configuring the daemon for more information.

The manual approach

If you are interested in knowing about all the little details of Scavenger you should keep on reading this section as it provides a good overview of how one may work with the entire API. For most use-cases though, the automated approach is a much better choice.

The following covers how to discover available surrogates, install functionality onto surrogates, ask surrogates to perform named tasks, and resolve remote data handles.

Finding surrogates

Information about currently available surrogates is obtained with the class method Scavenger.get_peers. This method returns a list of the surrogates that have been heard from within the last five seconds. Each element of the list is a ScavengerPeer object, which looks like this:

+----------------------+
| ScavengerPeer        |
+----------------------+
| name (str)           |
| address (str, int)   |
| cpu_strength (float) |
| cpu_cores (int)      |
| active_tasks (int)   |
| timestamp (float)    |
| net (str)            |
+----------------------+

The name is the user-defined name of the surrogate. Right now two surrogates are not allowed to have the same name - things will behave weirdly if that happens. The address is the (ip-address, port)-tuple of the remote Scavenger daemon. cpu_strength is the nbench rating of the surrogate, cpu_cores the number of dedicated cores, and active_tasks the number of tasks currently being performed. The timestamp is updated every time the surrogate is heard from, and net says what kind of network medium the surrogate is connected to the network with. These network media, ordered by expected throughput (listed in parentheses, in bytes per second), are:

BT-1   : Bluetooth v. 1 (34000)
BT-2   : Bluetooth v. 2 (100000)
WLAN-b : IEEE 802.11b (Wi-Fi) (500000) 
LAN10  : IEEE 802.3 (Ethernet) 10 Mbit (937500)
WLAN-g : IEEE 802.11g (Wi-Fi) (2500000)
LAN100 : IEEE 802.3 (Ethernet) 100 Mbit (9375000)
LAN1K  : IEEE 802.3 (Ethernet) 1 Gbit (93750000)
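To give a feel for what these throughput figures mean in practice, the sketch below estimates how long a payload would take to transfer over each medium. The figures are copied from the list above; the names MEDIA_THROUGHPUT, bytes_per_second, and transfer_seconds are illustrative only, and this is not a description of how Scavenger's scheduler uses the numbers.

```python
# Expected throughput in bytes per second, copied from the list above.
MEDIA_THROUGHPUT = {
    'BT-1': 34000,
    'BT-2': 100000,
    'WLAN-b': 500000,
    'LAN10': 937500,
    'WLAN-g': 2500000,
    'LAN100': 9375000,
    'LAN1K': 93750000,
}


def bytes_per_second(speed):
    """Accept either a symbolic medium name or a plain number."""
    if speed in MEDIA_THROUGHPUT:
        return MEDIA_THROUGHPUT[speed]
    return int(speed)


def transfer_seconds(payload_bytes, speed):
    """Estimated time to move payload_bytes over a link of the given speed."""
    return payload_bytes / float(bytes_per_second(speed))


# A 1,000,000 byte payload over 802.11g versus Bluetooth v. 2.
print(transfer_seconds(1000000, 'WLAN-g'))   # 0.4
print(transfer_seconds(1000000, 'BT-2'))     # 10.0
```

The gap between the two results illustrates why the network medium matters when deciding whether offloading a task is worthwhile.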

A small example of using the Scavenger.get_peers method is shown below:

from time import sleep
from scavenger import Scavenger

# Wait briefly so that announcements from nearby surrogates can arrive.
sleep(1.1)
surrogates = Scavenger.get_peers()
for surrogate in surrogates:
    print 'Found', surrogate.name, 'at', surrogate.address

Checking for task availability

Once you have some handles to surrogates (ScavengerPeer objects) you can start checking for the functionality you need, or possibly installing it if you want to. It is entirely possible to work with pre-installed functionality in Scavenger if that is what you want. But of course, seeing as Scavenger is a mobile code based system, it is also possible to do ad-hoc installation of tasks.

Before diving into how to check for installed tasks, let's take a brief moment to discuss the task naming scheme in Scavenger. Any Scavenger task must have a name of the form org.app.task, where org is the organisation providing the application, app is the name of the application, and task is the name of the given task. An example could be daimi.augim.sharpen, which would be an image sharpening task used in the AugIM demonstrator application developed by me at DAIMI (the Department of Computer Science at Aarhus University).
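The naming scheme can be checked locally before a name is sent to a surrogate. The following is a minimal sketch, assuming that each of the three parts is a non-empty run of letters, digits, or underscores; the exact rules enforced by Scavenger are not stated here and may differ. The function split_task_name is hypothetical and not part of the Scavenger API.

```python
import re

# Assumption: three dot-separated parts, each made of letters, digits,
# or underscores. Scavenger's own name validation may be stricter or looser.
TASK_NAME = re.compile(r'[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+')


def split_task_name(name):
    """Return (org, app, task), or raise ValueError for a malformed name."""
    if not TASK_NAME.fullmatch(name):
        raise ValueError('task names must look like org.app.task: %r' % name)
    org, app, task = name.split('.')
    return org, app, task


print(split_task_name('daimi.augim.sharpen'))
```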

To check for the availability of a given task the Scavenger.has_service method is used (there is currently a bit of a terminology mix-up in Scavenger; we used to call tasks services - this will probably be fixed in the near future). Given a surrogate and a task name this method checks to see if that service is available at the given surrogate. A small code example is shown below:

from time import sleep
from scavenger import Scavenger
sleep(1.1)
surrogates = Scavenger.get_peers()
if len(surrogates) == 0:
    print 'No surrogates are available a.t.m.'
else:
    for surrogate in surrogates:
        if Scavenger.has_service(surrogate, 'foo.bar.baz'):
            print surrogate.name, 'has the foo.bar.baz task.'
        else:
            print surrogate.name, 'does not have the foo.bar.baz task.'

Note that the Scavenger.has_service method may raise either a ScavengerException, if the given surrogate is no longer available, or an Exception, if an error occurs on the remote host.

Installing tasks

If you want to install new functionality, new tasks, onto a surrogate you can use the Scavenger.install_service method. This method takes three arguments: 1) the surrogate that you want to install the task onto, 2) the name of the task, and 3) the actual task code as a string (perhaps read from a local .py file).

Installing mobile code onto a surrogate is subject to a validation process to prevent abuse. First, the task name is validated: it must be of the form a.b.c, it must not overwrite an existing task, etc. Second, the actual task code is validated in two ways: 1) it is checked that none of the Python built-ins that are considered harmful are used, and 2) it is checked that only allowed modules are imported. That is, a combined black-listing and white-listing approach is used: harmful keywords are black-listed and allowed modules are white-listed. The built-in keywords that are considered harmful are:

__subclasses__
__class__
__import__
__builtins__
__getattr__
__getattribute__
exec

And, as of writing this, the allowed modules so far are:

math
PIL
StringIO
gdata.photos.service
smtplib
MimeWriter
base64

I readily admit that these modules have not been white-listed because I have found them to be safe, but rather because I needed the functionality they offer. If one were really interested in security an audit of all modules allowed here should be undertaken... By the way, if you want to allow more modules to be imported simply add them to the LEGAL_IMPORTS list in validator.py.
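The two checks described above can be sketched with Python's standard tokenize and ast modules. This is an illustration of the black-list/white-list idea only, written for Python 3; it is not the code in validator.py, which may work quite differently. The names FORBIDDEN_NAMES, ALLOWED_MODULES, and find_problems are made up, and the rule that a submodule of a white-listed package is accepted is an assumption.

```python
import ast
import io
import tokenize

FORBIDDEN_NAMES = {'__subclasses__', '__class__', '__import__',
                   '__builtins__', '__getattr__', '__getattribute__', 'exec'}
ALLOWED_MODULES = {'math', 'PIL', 'StringIO', 'gdata.photos.service',
                   'smtplib', 'MimeWriter', 'base64'}


def find_problems(source):
    """Return a list of validation problems; an empty list means 'accepted'."""
    problems = []
    # Black-list: inspect identifier tokens, so 'executor' is not flagged.
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.NAME and token.string in FORBIDDEN_NAMES:
            problems.append('forbidden name: %s' % token.string)
    # White-list: every imported module must be on the allowed list.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or '']
        else:
            continue
        for module in modules:
            top_level = module.split('.')[0]
            if module not in ALLOWED_MODULES and top_level not in ALLOWED_MODULES:
                problems.append('module not allowed: %s' % module)
    return problems


SUBTRACT = """
def perform(x, y):
    return x - y
"""
print(find_problems(SUBTRACT))
print(find_problems("import os\ndef perform():\n    return os.getcwd()\n"))
```

Note that this sketch looks at identifiers only, so a forbidden name hidden inside a string literal would slip through.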

The task (mobile code) that you install onto a surrogate must adhere to a very simple interface:

  1. It must contain a function named perform on the top-level scope,
  2. the arguments to this function must be picklable Python types, and
  3. the perform function must return one or more picklable Python types.
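Both requirements of this interface can be tested on the client before anything is sent over the network. The sketch below is one way to do that in Python 3, using the standard ast and pickle modules; the helper names defines_perform and is_picklable are hypothetical and not part of Scavenger.

```python
import ast
import pickle


def defines_perform(source):
    """True if the source has a function named perform at the top level."""
    tree = ast.parse(source)
    return any(isinstance(node, ast.FunctionDef) and node.name == 'perform'
               for node in tree.body)


def is_picklable(value):
    """True if the value survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(value))
        return True
    except Exception:
        return False


TASK_SOURCE = """
def perform(x, y):
    return x - y
"""
print(defines_perform(TASK_SOURCE))             # True
print(is_picklable([3, 2]))                     # True
print(is_picklable((n for n in range(3))))      # False, generators cannot be pickled
```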

An example of installing a task onto a surrogate is shown in the code below:

from time import sleep
from scavenger import Scavenger
sleep(1.1)
surrogates = Scavenger.get_peers()
if len(surrogates) == 0:
    print 'No surrogates are available a.t.m.'
else:
    surrogate = surrogates[0]
    Scavenger.install_service(surrogate, 'daimi.test.subtract', """
def perform(x, y):
    return x - y
""")

The Scavenger.install_service method may also raise either a ScavengerException, if the given surrogate is no longer available, or an Exception, if an error occurs on the remote host (such as validation errors, naming errors, etc.).

Performing remote tasks

To invoke tasks that are installed on surrogates you use the Scavenger.perform_service method. This method takes three arguments: the surrogate to invoke the task at, the name of the task, and the input to the task. The return value of Scavenger.perform_service is the return value of the named task. Of course this may go wrong in a number of ways so, like the other manual methods, this method may raise both ScavengerException and Exception.

Task input can be given either as positional arguments (by passing a list as argument) or as keyword arguments (by passing in a dict instead). Assuming that the daimi.test.subtract task from the previous code example is installed on the surrogate, two ways of invoking this task are shown in the code example below:

# Using positional arguments.
Scavenger.perform_service(surrogates[0], 'daimi.test.subtract', [3, 2])

# Using keyword arguments.
Scavenger.perform_service(surrogates[0], 'daimi.test.subtract', {'x':3, 'y':2})

Both of these call the daimi.test.subtract task with the same arguments. And the result returned in both cases is 1.
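The manual steps covered so far (find surrogates, check for the task, install it if needed, perform it) can be tied together in one helper that falls back to local execution when no surrogate can do the work, in keeping with the opportunistic nature of cyber foraging. The sketch below is one possible arrangement, not code from Scavenger. The function perform_remotely_or_locally is hypothetical, and FakeClient is a stand-in for the Scavenger class so that the example runs without a network. It also assumes that ScavengerException derives from Exception, so that a single except clause covers both error cases.

```python
def perform_remotely_or_locally(client, name, code, local_function, args):
    """Try each surrogate in turn; fall back to running the task locally.

    `client` must offer get_peers, has_service, install_service, and
    perform_service, as described for the Scavenger class.
    """
    for surrogate in client.get_peers():
        try:
            if not client.has_service(surrogate, name):
                client.install_service(surrogate, name, code)
            return client.perform_service(surrogate, name, args)
        except Exception:
            # The surrogate vanished or the remote call failed; try the next.
            continue
    # No surrogate could do the work, so do it ourselves.
    return local_function(*args)


class FakeClient(object):
    """Stand-in for the Scavenger class; keeps 'installed' tasks in a dict."""

    def __init__(self, peers):
        self.peers = peers
        self.installed = {}

    def get_peers(self):
        return list(self.peers)

    def has_service(self, surrogate, name):
        return (surrogate, name) in self.installed

    def install_service(self, surrogate, name, code):
        namespace = {}
        exec(code, namespace)  # the fake simply loads the task code locally
        self.installed[(surrogate, name)] = namespace['perform']

    def perform_service(self, surrogate, name, args):
        return self.installed[(surrogate, name)](*args)


SUBTRACT = """
def perform(x, y):
    return x - y
"""


def subtract(x, y):
    return x - y


with_peer = FakeClient(['surrogate-a'])
print(perform_remotely_or_locally(with_peer, 'daimi.test.subtract',
                                  SUBTRACT, subtract, [3, 2]))    # 1
print(perform_remotely_or_locally(FakeClient([]), 'daimi.test.subtract',
                                  SUBTRACT, subtract, [3, 2]))    # 1
```

With the real library, the Scavenger class itself would be passed as client. This sketch only handles positional (list) arguments.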

Working with remote data handles

The preceding presentation of how to perform tasks assumed that task input and output are always passed back and forth between client and surrogate. This does not have to be the case; both input and output may be remote data handles instead of the actual data. Using remote data handles is quite simple: when performing a task, pass the keyword argument store=True to the Scavenger.perform_service method. A remote data handle is then returned rather than the result of performing the task. If you want to use that remote data handle as input to another task you do not need to do anything special; you can simply pass in the data handle as one of the arguments. If, on the other hand, you want to resolve the data handle, i.e., to get the actual data in hand, you can use the Scavenger.fetch_data method, passing in the remote data handle as an argument. A small example of using remote data handles is shown in the code below:

rdh = Scavenger.perform_service(surrogate, 'daimi.test.subtract', [5, 2], store=True)
rdh = Scavenger.perform_service(surrogate, 'daimi.test.subtract', [rdh, 2], store=True)
print Scavenger.fetch_data(rdh)

Notice that, in line two, a remote data handle is passed as the first argument to the daimi.test.subtract method. Finally, in the third line, the actual data behind the remote data handle is fetched and printed.

Important note: Remote data handles do not exist forever on surrogates. In the current implementation a remote data handle is kept "alive" for five minutes, after which the data is deleted from the surrogate. Also, a remote data handle may only be fetched once, after which it is removed from the surrogate. If you want to retain the remote data handle, i.e., you want to make sure that it is not deleted even though you fetch it, you must set its retain property to True. An example of this is shown below:

rdh = Scavenger.perform_service(surrogate, 'daimi.test.subtract', [5, 2], store=True)
rdh.retain = True
print Scavenger.fetch_data(rdh)
print Scavenger.fetch_data(rdh)

In this example it is possible to fetch the data twice because the remote data handle is retained. It is also possible to ask the surrogate to hold on to the stored data for a little longer (five more minutes) by using the Scavenger.retain_data method which accepts a remote data handle as its only argument. If, on the other hand, you want to remove a remote data handle from a surrogate you may use the Scavenger.expire_data method which immediately expires the given remote data handle.

Shutting down

When an application quits it should call the scavenger.shutdown method. This is not strictly required, but failure to do so will raise an exception when the interpreter shuts down.

Automated cyber foraging

The automatic approach to cyber foraging is by far the easiest way to use Scavenger. Decorating a function with one of the decorators defined in the scavenger module automatically enables cyber foraging for that function. The whole routine described in the previous section about the manual approach (check for availability of the task, install the task, perform the task) is then completely automated.

Let's jump right in with a code example:

from time import sleep
import scavenger
@scavenger.scavenge
def subtract(x, y):
    return x - y
sleep(1.1)
print subtract(3, 1)

What really happens in this piece of code is the following:

  1. The function called subtract is made into a task in the auto.*.* namespace (the naming is auto.modulename.md5sum-of-source).
  2. The Scavenger client lib takes a look at the surrogates currently available and decides where to perform the task.
  3. The task is performed; either locally or at a remote surrogate, which one is chosen is invisible to the developer.
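The naming in step 1 can be illustrated with a few lines of Python 3. This is a sketch of the auto.modulename.md5sum-of-source scheme as described above, not the library's own code: the function auto_task_name is hypothetical, and exactly which source text Scavenger hashes (for instance, whether the decorator line is included) is an assumption.

```python
import hashlib


def auto_task_name(module_name, source):
    """Build a name of the form auto.modulename.md5sum-of-source."""
    digest = hashlib.md5(source.encode('utf-8')).hexdigest()
    return 'auto.%s.%s' % (module_name, digest)


SOURCE = "def subtract(x, y):\n    return x - y\n"
print(auto_task_name('example', SOURCE))
```

One consequence of hashing the source is that editing the function body yields a different task name, so a changed function is not mistaken for a previously installed version.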

Step two in that small description is definitely the most important, since this is where the scheduling takes place. Currently there are five different schedulers in the Scavenger API, and more may be added later on. The scavenger.scavenge decorator used here is the most naive scheduler and it is not the one you should use. Most developers should probably use one of the profiling schedulers, for example the one behind the scavenger.cprofilescavenge decorator. This decorator takes a single argument, a string, which is the expected output size (in bytes) of the task. This output size may be either a constant or an expression relating the output size to the input size. When an expression is given it may refer to the input arguments by position as #0 to #n-1, where n is the number of arguments given to the task. An example task is shown in the code below:

@scavenger.cprofilescavenge('len(#0)')
def sharpen_image(image, factor):
    from PIL import Image, ImageEnhance
    from StringIO import StringIO
    sio = StringIO(image)
    pil_image = Image.open(sio)
    factor = 1.0 + float(factor)
    sharpened_image = ImageEnhance.Sharpness(pil_image).enhance(factor)
    sio = StringIO()
    sharpened_image.save(sio, pil_image.format, quality=95)
    return sio.getvalue()

This function/task sharpens the given input image by the given factor. Notice the expression 'len(#0)' given to the scavenger.cprofilescavenge decorator. This expression tells the scheduler that the output size is expected to be the same as the size of the first input, the image. Any Python expression is legal here, so examples such as 'len(#0)/#1*3.14159' are just as valid.
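To make the #0 to #n-1 notation concrete, the sketch below shows one way such an output size expression could be evaluated against the arguments of a call. It is an illustration only and makes no claim about how Scavenger evaluates these expressions internally. The function estimate_output_size is hypothetical, and exposing only len() to the expression is a choice made for this sketch.

```python
import re


def estimate_output_size(expression, args):
    """Evaluate an output size expression against the given call arguments."""
    # Rewrite '#0', '#1', ... into lookups in the argument list.
    python_expr = re.sub(r'#(\d+)', r'_args[\1]', expression)
    # Only len() is exposed here; a real implementation may allow more.
    return eval(python_expr, {'__builtins__': {}, 'len': len}, {'_args': args})


image = b'x' * 2048   # stand-in for 2048 bytes of image data
print(estimate_output_size('len(#0)', [image, 0.5]))   # 2048
print(estimate_output_size('1024', [image, 0.5]))      # 1024
```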

The decorators (well, all but the naive one) also accept the store keyword argument. If this is set to True, a remote data handle is returned in place of the result. See Working with remote data handles for more information about that.

One very important thing to note when working with the automated cyber foraging decorators in Scavenger is that the code within the decorated functions must be self-contained. That is, a decorated function must import all the modules it needs within the function body, must not call functions defined outside its own scope, etc.

