
cuda-convnet
Note, July 18, 2014: I've released an update to cuda-convnet, called cuda-convnet2. The two main new features are faster training on Kepler-generation GPUs and support for multi-GPU training.
This is a fast C++/CUDA implementation of convolutional (or, more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth: any directed acyclic graph of layers will do. Training is done using the back-propagation algorithm.
A Fermi-generation GPU (GTX 4xx, GTX 5xx, or Tesla equivalent) is required.
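An architecture is declared in a plain-text layer-definition file; each section names a layer, its type, and its inputs, which is what makes arbitrary DAG connectivity possible. Below is a minimal sketch modeled on the CIFAR-10 examples, in the format described on LayerParams (layer names and parameter values are illustrative, not a tuned configuration):

    [data]
    type=data
    dataIdx=0

    [labels]
    type=data
    dataIdx=1

    [conv1]
    type=conv
    inputs=data
    channels=3
    filters=32
    padding=2
    stride=1
    filterSize=5
    neuron=relu
    initW=0.0001
    partialSum=4
    sharedBiases=1

    [pool1]
    type=pool
    pool=max
    inputs=conv1
    start=0
    sizeX=3
    stride=2
    outputsX=0
    channels=32

    [fc10]
    type=fc
    inputs=pool1
    outputs=10
    initW=0.01
    neuron=ident

    [probs]
    type=softmax
    inputs=fc10

    [logprob]
    type=cost.logreg
    inputs=labels,probs

See LayerParams for the full set of layer types and which parameters each one requires.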
Documentation
- Compiling -- how to check out and compile this code.
- Data -- what kind of data this net can train on.
- LayerParams -- how to specify an architecture for the net.
- NeuronTypes -- types of hidden unit nonlinearities.
- TrainingNet -- how to train the net.
- Options -- the command-line arguments that the net takes.
- ViewingNet -- how to look inside the checkpoints saved by the net.
- CheckingGradients -- how to numerically test the gradients for correctness.
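The gradient checker works by comparing each analytically computed gradient against a centered finite difference. A minimal sketch of the idea in Python (this illustrates the technique only; it is not this package's code):

    import numpy as np

    # The idea behind gradient checking: dC/dw_i is approximated by the
    # centered difference (C(w + eps*e_i) - C(w - eps*e_i)) / (2*eps)
    # and compared against the analytically computed gradient.
    def numeric_gradient(cost, w, eps=1e-4):
        grad = np.zeros_like(w)
        for i in range(w.size):
            orig = w.flat[i]
            w.flat[i] = orig + eps
            c_plus = cost(w)
            w.flat[i] = orig - eps
            c_minus = cost(w)
            w.flat[i] = orig
            grad.flat[i] = (c_plus - c_minus) / (2 * eps)
        return grad

    # Toy check: for C(w) = 0.5*||w||^2 the analytic gradient is w itself.
    w = np.random.randn(5)
    assert np.allclose(numeric_gradient(lambda v: 0.5 * np.sum(v * v), w),
                       w, atol=1e-5)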
Fast results
- 11% error on CIFAR-10 in 75 minutes, with image translations and horizontal reflections (def, params).
- 13% error on CIFAR-10 in 25 minutes, with image translations and horizontal reflections (def, params).
- 18% error on CIFAR-10 in 20 minutes, without any image translations/transformations/preprocessing (def, params).
- 26% error on CIFAR-10 in 80 seconds, without any image translations/transformations/preprocessing (def, params).
- See Methodology for details of training.
Filters learned by this net:
[image of learned filters]
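Each result above links to a layer-definition file (def) and a layer-parameter file (params). The params file sets per-layer training hyperparameters; a minimal sketch in the format described on LayerParams, where epsW/epsB are the learning rates for weights and biases, momW/momB the momentum terms, wc the L2 weight-decay coefficient, and coeff the cost coefficient (section names must match the def file; the values here are illustrative, not the ones used for the results above):

    [conv1]
    epsW=0.001
    epsB=0.002
    momW=0.9
    momB=0.9
    wc=0.004

    [fc10]
    epsW=0.001
    epsB=0.002
    momW=0.9
    momB=0.9
    wc=0.01

    [logprob]
    coeff=1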
Recent changes
- Jul 17, 2012
- Fixed a bug in the contrast-normalization backpropagation code that caused wrong gradients to be computed near image borders (thanks to Hannes Schulz).
- Mar 13, 2012
- Added a layer that performs response normalization across maps.
- Started modifying the code to support rectangular (i.e. non-square) images. The convolution code now supports rectangular images, but the rest of the code does not yet, so the package as a whole still requires square images.
- Feb 8, 2012
- Most layer types should now work well with minibatch sizes of 64 or 32.
- Fixed the --conserve-mem option so it can be combined with -f (i.e. its value can be changed after a net has been saved).
- See ChangeLog for older changes.
Features
Supported layer types:
- Layers with weights:
- Fully-connected
- Convolutional, including sparsely-connected convolutional
- Locally-connected, unshared
- Others:
- Local pooling (avg, max), including overlapping pooling regions
- Local response normalization
- Local contrast normalization
- Softmax
- Elementwise sum, elementwise max
- Gaussian blur + "bed of nails" subsampling
- Resize with bilinear filtering
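Since connectivity is an arbitrary DAG, layers can merge branches. A sketch of two fully-connected layers combined by an elementwise sum, in the same layer-definition format (layer names and values illustrative; see LayerParams for the exact parameter requirements):

    [branch1]
    type=fc
    inputs=data
    outputs=64
    initW=0.01

    [branch2]
    type=fc
    inputs=data
    outputs=64
    initW=0.01

    [combine]
    type=eltsum
    inputs=branch1,branch2
    coeffs=1,1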
Supported neuron activation functions:
- Logistic
- Hyperbolic tangent
- Rectified linear
- Others
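For reference, the first three have their standard definitions (NeuronTypes gives the full list and the exact names used in layer-definition files; the scaled tanh form is the parameterization this package documents):

    logistic:           f(x) = 1 / (1 + e^(-x))
    hyperbolic tangent: f(x) = a * tanh(b * x)
    rectified linear:   f(x) = max(0, x)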
Supported objectives:
- Logistic regression
- Sum-of-squares
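In their standard forms (consult LayerParams for the exact conventions these cost layers use):

    logistic regression: C = -log(p_c), the negative log-probability the net assigns to the correct class c
    sum-of-squares:      C = sum_i (y_i - t_i)^2, the squared error between output y and target t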
Other features:
- Efficient implementation of convolution in CUDA.
- Supports arbitrary stride sizes with no loss of efficiency (beyond that which comes from reducing the problem size).
- Implicitly pads your images with an arbitrary-sized border of zeros without using any extra memory.
- Supports block-random sparse connectivity at no performance cost (see LayerParams).
- Modular design makes it easy to add new layer types, neuron activation functions, or objectives should you need them.
- Mostly avoids use of temporary memory where it isn't strictly needed.
- Optimizes multiple objectives simultaneously.
- Saves checkpoints to disk as pickled Python objects, so you can write Python scripts to probe the mind of your neural net.
- Capable of training on one batch of data while simultaneously loading the next from disk (or pre-processing it in some way, if necessary).
- Numerically tests gradient computations for correctness.
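Because checkpoints are ordinary pickled dictionaries, probing one takes only a few lines of Python. A minimal sketch (the exact dictionary layout is documented on ViewingNet; this just reports what a checkpoint contains):

    import pickle
    import sys

    def probe_checkpoint(path):
        # Checkpoints are pickled Python dictionaries. They were written by
        # Python 2's cPickle; the encoding argument lets Python 3 read the
        # byte strings they contain.
        with open(path, 'rb') as f:
            ckpt = pickle.load(f, encoding='latin1')
        for key, value in ckpt.items():
            print(key, type(value))
        return ckpt

    if __name__ == '__main__':
        probe_checkpoint(sys.argv[1])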
Contact
- My university web page
- My email
Acknowledgements
- I am grateful to Ilya Sutskever for suggestions that led to many of the features in this package.
Project Information
The project was created on Jun 29, 2011.
- License: New BSD License
- 339 stars
- svn-based source control
Labels:
Machinelearning
Academic
Cuda
CPlusPlus
Python
neuralnet
convnet
convolution
GPU