DistributedPerceptronAlgorithm  
An overview of the entire map/reduce job chain
Updated Feb 4, 2010 by paulrei...@gmail.com

Introduction

In the online (testing) mode of a perceptron network, an input vector is multiplied by the weights of the first hidden layer. The output of each neuron is the dot product of the input vector and the weights of all connections into that neuron. Typically, this output is normalized by some smooth function (e.g. a sigmoid) to aid in training the network with gradient descent methods.
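As a minimal sketch of this forward step, assuming a NumPy-style implementation (the names sigmoid, layer_output, x, and W are illustrative, not taken from the project code):

{{{
import numpy as np

def sigmoid(z):
    # smooth normalizing function applied to each neuron's dot product
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(x, W):
    # x: input vector of shape (n_inputs,)
    # W: layer weights of shape (n_inputs, n_neurons); column j holds the
    #    weights of every connection into neuron j
    return sigmoid(np.dot(x, W))
}}}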

The matrix multiplication and subsequent normalization are repeated for each layer in the neural net. The final layer's output need not be normalized with the same function, since the network can be trained to produce any desired output from the final layer. This constitutes the basic perceptron algorithm.
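A correspondingly minimal sketch of the layer-by-layer evaluation, again assuming NumPy; the feed_forward helper is hypothetical, and the final layer is simply left un-normalized by way of example:

{{{
import numpy as np

def feed_forward(x, layers):
    # layers: one weight matrix per layer, first hidden layer first
    a = x
    for W in layers[:-1]:
        a = 1.0 / (1.0 + np.exp(-np.dot(a, W)))   # sigmoid on hidden layers
    # the final layer need not use the same normalization; here it is
    # left un-normalized as an example
    return np.dot(a, layers[-1])
}}}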

Optimizations

A few optimizations to the basic perceptron algorithm are used in the distributed perceptron algorithm. First, instead of considering a single input vector, a buffer of input vectors is provided for each invocation of the neural net. Each input vector instance becomes a row in the initial input matrix. Since matrix multiplication takes the dot product of each row of the first matrix with each column of the second, the output matrix has a distinct row of output values for each input instance. The number of rows (input instances) is chosen so that the input buffer occupies roughly half of the maximum memory available to the neural net training process.
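A rough sketch of this batching, under the same NumPy assumption; the sizing rule and the names buffer_rows and batched_layer_output are illustrative:

{{{
import numpy as np

def buffer_rows(max_memory_bytes, n_inputs, dtype=np.float64):
    # illustrative sizing rule: let the buffered input matrix occupy
    # roughly half of the memory available to the training process
    bytes_per_row = n_inputs * np.dtype(dtype).itemsize
    return max(1, (max_memory_bytes // 2) // bytes_per_row)

def batched_layer_output(X, W):
    # X: (n_instances, n_inputs) -- one input vector per row
    # W: (n_inputs, n_neurons)
    # row i of the result holds every neuron's response to input i
    return 1.0 / (1.0 + np.exp(-np.dot(X, W)))
}}}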

A second optimization is similar, but instead of increasing the row count of the input matrix it increases the column count of the layer weights. Here, a variation of the Voted Perceptron Algorithm is used to consume the remaining available memory before invoking the algorithm. A number of previously trained sets of layer weights are taken from earlier invocations of an offline training algorithm, or from previous iterations of the current DisCo map/reduce job, and concatenated column-wise. The resulting matrix has distinct columns for each set of layer weights, so partitioning the result matrix by column gives the response of each neuron in the net represented by that partition. This effectively evaluates several neural networks in a single matrix operation. Different nets could be used for each layer, but this would not be a stable means of further training the nets, so by default the same nets are used in the same order in each iteration of the algorithm.
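A sketch of the column-stacking idea, with hypothetical names (stacked_layer_output, weight_sets); the hstack/split calls stand in for whatever partitioning the actual job performs:

{{{
import numpy as np

def stacked_layer_output(X, weight_sets):
    # weight_sets: list of (n_inputs, n_neurons) matrices, one per
    # previously trained net (e.g. from earlier DisCo iterations)
    W_all = np.hstack(weight_sets)          # (n_inputs, n_nets * n_neurons)
    Y_all = 1.0 / (1.0 + np.exp(-np.dot(X, W_all)))   # all nets at once
    # split the columns back into one response block per net
    return np.split(Y_all, len(weight_sets), axis=1)
}}}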

A combining algorithm in the reduce task can choose how to interpret the results from each neural net, and from each input vector. The Voted Perceptron Algorithm suggests averaging or selecting a consensus among the results from each neural net, and updating each provided net according to its prediction error if further training is to be performed.
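One possible shape for such a combiner, shown here as a single-layer delta-rule update rather than full back-propagation; all names and the learning rate are illustrative assumptions:

{{{
import numpy as np

def combine_and_update(per_net_outputs, targets, weight_sets, X, lr=0.1):
    # per_net_outputs: one (n_instances, n_outputs) block per net,
    #                  as produced in the map phase
    # targets:         (n_instances, n_outputs) desired outputs
    # combining rule: average the predictions of all nets
    consensus = np.mean(per_net_outputs, axis=0)
    updated = []
    for W, Y in zip(weight_sets, per_net_outputs):
        error = targets - Y            # per-net prediction error
        # single-layer delta-rule update; a multi-layer net would
        # back-propagate this error instead
        updated.append(W + lr * np.dot(X.T, error))
    return consensus, updated
}}}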

The improved nets can be re-uploaded after presenting the results; an ideal way to do this would be to overwrite a memcached cache entry for each distinct neural net.
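A sketch of that re-upload step, assuming the python-memcached client and pickled weight matrices; the key naming scheme is hypothetical:

{{{
import pickle
import memcache   # python-memcached client, assumed available

def upload_nets(updated_nets, servers=('127.0.0.1:11211',)):
    # overwrite one cache entry per distinct neural net so the next
    # iteration of the job can fetch the freshest weights
    mc = memcache.Client(list(servers))
    for i, layers in enumerate(updated_nets):
        mc.set('perceptron-net-%d' % i, pickle.dumps(layers))
}}}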
