cuda-waste


Why Another Simple Trivial Emulator for CUDA?

CUDA Waste is a wrapper for emulation of CUDA programs. Why this emulator? In 2010, NVIDIA dropped the emulator mode from its CUDA SDK (version 3.1). Ocelot was the only other working emulator, but it is available only on Linux (http://code.google.com/p/gpuocelot/). CUDA Waste is built and run only on Windows, using Microsoft Visual Studio/C++ 2010 and 2012. It is the only alternative for emulating NVIDIA CUDA programs on Windows.

At this time, CUDA instructions for the GPU (PTX) are emulated by an interpreter. In comparison, Ocelot executes an LLVM just-in-time translation of PTX. In the future, CUDA Waste will support JIT translation and execution.

You can download the installation program, or you can "svn co" the sources, and build the latest version if you like.

How to Use the Emulator

System Requirements

1) Your program must be a 32-bit application. 64-bit applications will not run. (A fix is being developed.)

2) You must be running Windows 7 or 8.

3) The CUDA Toolkit must be installed. If you do not have an NVIDIA card, choose a custom install and deselect the installation of the GPU driver.

Install CUDA Waste

4) Download the MSI installation file for Waste and install it. The program consists of "waste.exe" and "wrapper.dll", both of which are available in Release and Debug builds. If you have problems with the Release version, you can try to use the Debug version to get more information.

5) Modify the PATH environment variable to include c:\program files\waste\waste, the directory containing waste.exe and wrapper.dll.

6) Start a cmd or bash shell. No special privileges are needed.

Run CUDA Waste

7) Execute your program in a cmd or bash shell using:

waste [options] your-program.exe [program-options]

Your program will execute with the CUDA emulator by default. It will output error messages to std::cerr (stderr) detailing memory errors.

Options

There are several options that you can set to change the behavior of the debugging wrapper. These are available through the following command line options which precede the file name of your program.

-s=NUMBER, --padding-size=NUMBER

This option sets the size, in bytes, of the padding added to each memory allocation. The padding is used to check for buffer over- and under-runs. The option default is 32.

-b=NUMBER, --padding-byte=NUMBER

This option sets the value used to fill each byte of the padding. The option default is 0xde.
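The idea behind the two padding options can be sketched as follows. This is a minimal host-side illustration of guard bytes, not Waste's actual implementation: the PaddedBuffer type and its members are hypothetical names, and placing guard bytes on both sides of the buffer is an assumption made here to cover both over- and under-runs. The defaults mirror the documented option defaults (32 and 0xde).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of Waste): a buffer with guard bytes
// before and after the region handed to the user.
struct PaddedBuffer {
    std::size_t padding;            // guard bytes on each side (cf. --padding-size)
    std::uint8_t fill;              // guard value (cf. --padding-byte)
    std::vector<std::uint8_t> raw;  // layout: [guard | user data | guard]

    explicit PaddedBuffer(std::size_t user_size,
                          std::size_t padding_size = 32,
                          std::uint8_t padding_byte = 0xde)
        : padding(padding_size), fill(padding_byte),
          raw(user_size + 2 * padding_size, padding_byte) {}

    // Pointer to the start of the user-visible region.
    std::uint8_t* user() { return raw.data() + padding; }

    // Number of bytes in the user-visible region.
    std::size_t user_size() const { return raw.size() - 2 * padding; }

    // True if any guard byte before the user region was modified.
    bool underrun() const {
        for (std::size_t i = 0; i < padding; ++i)
            if (raw[i] != fill) return true;
        return false;
    }

    // True if any guard byte after the user region was modified.
    bool overrun() const {
        for (std::size_t i = raw.size() - padding; i < raw.size(); ++i)
            if (raw[i] != fill) return true;
        return false;
    }
};

// Writes one byte past the end of the user region and checks the guards.
inline void demo_overrun_detection() {
    PaddedBuffer buf(16);
    assert(!buf.overrun() && !buf.underrun());
    buf.user()[buf.user_size()] = 0;  // lands in the trailing guard
    assert(buf.overrun() && !buf.underrun());
}
```

One limitation of this scheme is that a stray write that happens to store the fill value itself goes unnoticed, which is one reason to make the fill byte configurable.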

-n, --non-standard-ptr

This option controls whether the wrapper assumes that device pointers passed to the CUDA routines cudaMemset and cudaMemcpy must be exactly the pointers returned from cudaMalloc or cudaHostAlloc. If you perform pointer arithmetic (e.g., "cudaMemcpy(dev_pointer+20, &hostvar, sizeof(int), cudaMemcpyHostToDevice);"), then you should set this to false. The option default is true.
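The difference between the two pointer-checking modes can be illustrated with a small sketch. How Waste tracks allocations internally is not described here, so the allocation table and both function names below are hypothetical; the code only shows why a pointer such as dev_pointer+20 fails an exact-match check but passes a range check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical allocation table: base address -> size in bytes.
using AllocationTable = std::map<std::uintptr_t, std::size_t>;

// Strict check: the pointer must be exactly a recorded base address.
inline bool is_known_exact(const AllocationTable& table, std::uintptr_t p) {
    return table.count(p) != 0;
}

// Relaxed check: the range [p, p + len) must lie inside one allocation.
inline bool is_known_in_range(const AllocationTable& table,
                              std::uintptr_t p, std::size_t len) {
    auto it = table.upper_bound(p);  // first allocation starting after p
    if (it == table.begin()) return false;
    --it;                            // last allocation starting at or before p
    const std::size_t size = it->second;
    const std::uintptr_t offset = p - it->first;
    // Written this way to avoid overflow in offset + len.
    return offset <= size && len <= size - offset;
}
```

With a 100-byte allocation recorded at address 1000, address 1020 is rejected by is_known_exact but accepted by is_known_in_range for a 4-byte access, while a 4-byte access at 1098 is rejected by both.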

-t, --trace

This option outputs a message for every call to the wrapper. Use this if you want to see every call to CUDA as it is being executed. The option default is false.

-q, --quit-on-error

This option makes your program quit immediately if an error is detected. The option default is false.

-k, --skip-on-error

This option tells the wrapper to skip any immediately following CUDA API memory call when an error is detected. The option default is false.

-d device_name

This option allows you to set the device to use in emulation.

Problems Running?

If you have problems running the program, please send me email (@gmail.com). Please note: this program is for debugging Windows programs, and was built with Windows 7. Also, this program hooks only a minimal subset of the CUDA API. If your program uses a lot of the CUDA API, then it will fail.

How it Works

If you are interested in learning more about WASTE, please read the document http://code.google.com/p/cuda-waste/source/browse/trunk/doc/WASTE.pdf. It contains information on how to build WASTE, the requirements and design, and some general information on how it works.

Building CUDA Waste

Prerequisites for building:

Windows 7 or 8, 32-bit or 64-bit

Visual Studio 2010 or 2012, with .NET installed for MSBuild.exe

zlib (http://zlib.net/)

Cygwin (http://www.cygwin.com/), with subversion and make installed

Java JDK SE (http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html)

To build:

1) You will need to build zlib. Open the file .../contrib/vstudio/v10 (or v11, etc.), depending on which Visual Studio version you use (v10 = 2010; v11 = 2012).

2) Start up a Cygwin command line shell, and download sources for CUDA Waste.

$ cd to an empty directory for CUDA Waste

$ svn checkout http://cuda-waste.googlecode.com/svn/trunk/ cuda-waste-read-only

3) Set up the environment and build:

$ cd cuda-waste-read-only

$ ./vcvars.sh

$ make build

Alternatively, you can run "devenv.exe", open waste.sln, then Rebuild.

Help?

If you are interested in helping, please let me know (ken.domino at gmail.com).

Latest changes

14 May 2013: Waste updated to work with versions 4 and 5 of the CUDA Toolkit.

11 Dec 2010: I have corrected several bugs, including an embarrassing bug in cudaMemset in emulation mode. Also, I am implementing a Visual Studio debug engine for the emulator. At some point, one should be able to run CUDA programs within the debugger, set breakpoints at all points in the code, including the kernel, step through the program, display variables, etc. I will probably be done with this sometime in January 2011.

8 Nov 2010: WASTE now has a design document. See http://code.google.com/p/cuda-waste/source/browse/trunk/doc/WASTE.pdf.
