Updated Aug 16, 2009 by lukas.lueg
Labels: Featured
ExtendPyrit  
Extending Pyrit to new hardware-platforms

Introduction

Adding support for more hardware-platforms to Pyrit is very easy. There are only a few steps besides the obvious requirement to write a library that implements the PBKDF2-HMAC-SHA-1-algorithm used for computing Pairwise Master Keys. This boils down to having the fastest (or most parallel) possible implementation of SHA-1.
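For reference, the computation that every hardware back-end has to accelerate can be sketched with Python's standard library. This is a minimal sketch, assuming Python 3 and the usual WPA/WPA2-PSK parameters (the ESSID as salt, 4096 iterations, 32 bytes of output). The function name compute_pmk is illustrative and not part of Pyrit.

```python
import hashlib


def compute_pmk(essid, password):
    """Return the 32-byte Pairwise Master Key for a single password.

    Assumed parameters (standard WPA/WPA2-PSK): the ESSID is the salt,
    4096 rounds of HMAC-SHA-1, 256 bits of derived key.
    """
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),
        essid.encode("utf-8"),
        4096,
        32,
    )
```

Each key costs thousands of SHA-1 compression rounds, and every password is independent of the others, which is why the work maps so well onto parallel hardware.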

Pyrit abstracts access to the hardware in roughly three steps:

  • Direct access to the hardware is provided through Python-extensions usually written in C. These extension-modules encapsulate the hardware-platform in very minimal Python-classes which in turn provide a single function named solve. This function takes strings (passwords) and computes the corresponding Pairwise Master Keys.
  • The extension-modules should hide their implementation-details behind sub-classes of the class Core provided in the cpyrit-module. The purpose of Core is to attach to Pyrit's scheduling routine, run self-tests, provide statistics and such. We should never need to know what exact kind of hardware we are actually talking to when using an instance of Core.
  • At this point we only have a bunch of classes that can compute Pairwise Master Keys for us. The glue that holds everything together is implemented as the almighty CPyrit-class, which is the veil between hardware and client. All you need to worry about as a hardware-provider is how to tell CPyrit about your new module. All you have to do as a client is put work on the queue and get the results back. The magic in between is done by CPyrit.

Talking to hardware

Extension-modules that provide hardware-access usually reside in the package cpyrit. They should be very careful about errors and take great care not to disrupt Pyrit in unexpected ways or in ways that are not common to all other modules. It must be possible to have a module installed on a platform that does not support the hardware the module was written for. For example, it must be possible to have static bindings to other libraries which may not be present on the platform Pyrit is executed on. As a general rule of thumb, a module should raise an ImportError in its init-function if it fails (or does not want) to load for reasonable causes. In that case the CPyrit-class described further below swallows the exception and continues to initialize the other modules. If the module fails in an unexpected way, it may throw a SystemError-exception, which walks all the way up to Python's exception-handler (and usually causes Pyrit to crash and burn, as it should in such cases).
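Seen from the Python side, the loading convention could look roughly like the following. This is only a sketch: the helper try_load and the module name _hypothetical_hw_module are made up for illustration, and the standard-library module json merely stands in for a module that loads successfully.

```python
import importlib


def try_load(module_name):
    """Import an optional hardware module; return None if it declines to load."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Expected case: the hardware or a vendor library is missing here.
        return None
    # Any other exception (for example SystemError) is deliberately not
    # caught and propagates up to the caller.


# 'json' stands in for a module that loads; the second name does not exist.
candidates = ("json", "_hypothetical_hw_module")
available = [m for m in map(try_load, candidates) if m is not None]
```

Only ImportError is treated as a polite refusal. Everything else is left to surface as a real failure.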

Some points to consider when writing a module for new hardware:

This document will not go any deeper into how to write extension-modules for Python. There is some really great documentation about CPython's API on python.org. Pyrit's subversion-repository also includes a minimal 'hardware'-module named cpyrit_null that can be used as a guideline by those who are unfamiliar with writing extension-modules for Python.

The Core-class

Every hardware-module may introduce its own kind of limits and constraints due to details of the implementation or restrictions in the underlying hardware-platform. The Core-class hides all this in order to make the hardware-modules available to Pyrit's scheduling-routine more easily.

First of all, the Core-class is a sub-class of Python's threading.Thread, so every instance of every sub-class of Core lives in its own thread. The instances usually spend most of their time in Thread's run()-function, trying to gather work (passwords) from the global work-queue, computing the corresponding results (Pairwise Master Keys) and pushing those back to the queue. The Core-class already provides this functionality. It also tries to calibrate itself so that every call to solve takes three seconds of wall-clock-time. This usually leads to good efficiency on the hardware-side (small overhead per call to hardware) and reasonable interactivity.

All that sub-classes of Core must do is set the .name-attribute to a human-readable description of the underlying hardware-platform. They may also need to set the .minBufferSize- and .maxBufferSize-attributes to values that suit the underlying hardware-platform. For example, the StreamCore-class sets .maxBufferSize to 8192 because the current implementation for ATI-Stream can only take exactly that number of passwords per call to hardware.
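How the three-second target and the buffer limits might interact can be sketched as follows. This is not Core's actual calibration code. The function next_buffersize and its simple proportional scaling are assumptions made for illustration.

```python
def next_buffersize(current, elapsed, min_size, max_size, target=3.0):
    """Pick the batch size for the next call to the hardware.

    Scales the last batch size by target/elapsed and clamps the result
    to the limits of the hardware (hypothetical policy).
    """
    if elapsed <= 0:
        # The call was too fast to measure; try the largest allowed batch.
        return max_size
    scaled = int(current * target / elapsed)
    return max(min_size, min(max_size, scaled))


# A batch of 4096 passwords that took 1.5 seconds suggests doubling the batch.
suggested = next_buffersize(4096, 1.5, min_size=1024, max_size=40960)
```

With limits such as those of the ATI-Stream example, where only one batch size is usable, the clamp always wins and the measured time has no effect.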

The following example shows how the Core-class for Nvidia-CUDA is defined:

class CUDACore(Core, _cpyrit_cuda.CUDADevice):
    """Computes results on Nvidia-CUDA capable devices."""
    def __init__(self, queue, dev_idx):
        Core.__init__(self, queue)
        _cpyrit_cuda.CUDADevice.__init__(self, dev_idx)
        self.name = "CUDA-Device #%i '%s'" % (dev_idx+1, self.deviceName)
        self.minBufferSize = 1024
        self.buffersize = 4096
        self.maxBufferSize = 40960
        self.start()

Things to note here:

Everything put together: The almighty CPyrit

Instances of CPyrit enumerate the available hardware-modules, instantiate them if possible and provide scheduling between the hardware and the caller. Although neither side should ever need to care about the inner workings of CPyrit, you should take note of some design goals of its current implementation:

  1. We assume that there is an endless amount of work waiting to be put on the queue.
  2. We assume that there is no further (bandwidth-) latency inside CPyrit.
  3. We assume that instances of Core have different speeds, must be able to return results in random order and must be able to get more work any time.
  4. Callers of CPyrit can enqueue passwords by calling the .enqueue()-function. The function usually does not block and can be called many times before ever calling .dequeue().
  5. Results are returned to the caller through the .dequeue()-function once they are available. The CPyrit-class guarantees that calls to .enqueue() and .dequeue() correspond in FIFO-order, no matter in which order the hardware actually returned the results. The call to .dequeue() can block until the current results are available.
  6. Instances of Core call _gather() with a desired number of passwords to get work from the queue. The function blocks until unsolved passwords are available on the queue and may return fewer but not more than the desired number. The calling instance of Core is then responsible for calling either _scatter() to return results or _revoke() in case of failure.
  7. Calls to _gather() can combine passwords from consecutive calls to .enqueue() with matching ESSIDs. The order in which ESSIDs are put on the queue however is preserved towards the hardware to prevent a pipeline-stall towards the caller.
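The FIFO guarantee of goals 4 to 6 can be illustrated with a much simplified scheduler. This is a sketch and not CPyrit's implementation: the class MiniScheduler, its attributes and the argument lists of _gather(), _scatter() and _revoke() are assumptions, and ESSIDs (goal 7) are left out entirely.

```python
import threading
from collections import deque


class MiniScheduler(object):
    """Hypothetical, simplified stand-in for CPyrit's queueing."""

    def __init__(self):
        self.cv = threading.Condition()
        self.pending = deque()  # (slot, offset, passwords) awaiting a core
        self.results = {}       # slot -> results, None marks 'unsolved'
        self.missing = {}       # slot -> number of unsolved entries
        self.next_in = 0        # slot given to the next enqueue()
        self.next_out = 0       # slot returned by the next dequeue()

    def enqueue(self, passwords):
        passwords = list(passwords)
        with self.cv:
            slot = self.next_in
            self.next_in += 1
            self.results[slot] = [None] * len(passwords)
            self.missing[slot] = len(passwords)
            self.pending.append((slot, 0, passwords))
            self.cv.notify_all()

    def _gather(self, desired):
        """Block until work exists; hand out at most `desired` passwords."""
        with self.cv:
            while not self.pending:
                self.cv.wait()
            slot, offset, pws = self.pending.popleft()
            if len(pws) > desired:
                # Keep the remainder at the front of the queue.
                rest = (slot, offset + desired, pws[desired:])
                self.pending.appendleft(rest)
                pws = pws[:desired]
            return slot, offset, pws

    def _scatter(self, slot, offset, solved):
        """Store results at their original position, whatever the order."""
        with self.cv:
            self.results[slot][offset:offset + len(solved)] = solved
            self.missing[slot] -= len(solved)
            self.cv.notify_all()

    def _revoke(self, slot, offset, pws):
        """Give unsolved work back so another core can pick it up."""
        with self.cv:
            self.pending.appendleft((slot, offset, pws))
            self.cv.notify_all()

    def dequeue(self):
        """Block until the oldest enqueued batch is completely solved."""
        with self.cv:
            slot = self.next_out
            while self.missing.get(slot, 1) > 0:
                self.cv.wait()
            self.next_out += 1
            del self.missing[slot]
            return self.results.pop(slot)


# Single-threaded demonstration: results come back in reverse order.
sched = MiniScheduler()
sched.enqueue(["a", "b", "c"])
sched.enqueue(["d"])
work = [sched._gather(2) for _ in range(3)]
for slot, offset, pws in reversed(work):
    sched._scatter(slot, offset, [p.upper() for p in pws])
out = [sched.dequeue(), sched.dequeue()]
```

Because every piece of work remembers its slot and offset, the order in which cores finish does not matter. dequeue() only ever looks at the oldest outstanding slot.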

As a hardware-provider you usually don't have anything to do with all this. All you have to do is add some functionality to CPyrit's __init__-function that adds new instances of your Core-class to CPyrit's self.cores. The following example shows how cores for Nvidia-CUDA are loaded in CPyrit's __init__-function:

if 'cpyrit._cpyrit_cuda' in sys.modules:
    for dev_idx, device in enumerate(_cpyrit_cuda.listDevices()):
        self.cores.append(CUDACore(queue=self, dev_idx=dev_idx))
        ncpus -= 1

Things to note here:

This is all.


Comment by mikael.lavoie, Oct 01, 2009

How much do you want to extend pyrit to my ATI EAX1950PRO to be used in ubuntu 8.04?? I will do the transfert via paypal if you're interested. I want to use it in Backtrack 4 to complement the aircrack-ng suite. If you're interested e-mail me to filthyest@hotmail.com. Thx Mike

Comment by lukas.lueg, Oct 02, 2009

ATI's 1xxx-line of GPUs does not support Stream. It's pointless.

