The idea is to use modern graphics cards to accelerate ADDA.
Marcus Huntemann and Georg Heygster have succeeded in porting ADDA to OpenCL, so incorporating their code should address this issue.
Comment #1
Posted on Feb 20, 2011 by Helpful Rabbit
The current revision r1023 can perform the matrix-vector multiplication on the GPU, provided that no dimension exceeds 512 on Nvidia GPUs or 256 on AMD GPUs. This restriction comes from the FFT routine currently in use.
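As a rough illustration of the size restriction described above, here is a minimal C sketch of a pre-flight check. It assumes the limit applies to each dimension of the FFT grid independently; the names `gpu_vendor`, `max_fft_dim` and `grid_fits_gpu_fft` are hypothetical and are not taken from the ADDA sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor tag, for illustration only (not from the ADDA code). */
typedef enum { GPU_VENDOR_NVIDIA, GPU_VENDOR_AMD } gpu_vendor;

/* Per-dimension limit as quoted for r1023: 512 on Nvidia, 256 on AMD. */
static size_t max_fft_dim(gpu_vendor vendor)
{
    return (vendor == GPU_VENDOR_NVIDIA) ? 512 : 256;
}

/* Returns true if every grid dimension is within the limit for this vendor.
   Assumption: the restriction is checked per dimension, not on the product. */
static bool grid_fits_gpu_fft(gpu_vendor vendor, size_t nx, size_t ny, size_t nz)
{
    size_t limit = max_fft_dim(vendor);
    return nx <= limit && ny <= limit && nz <= limit;
}
```

With this sketch, a 512x128x64 grid would pass for `GPU_VENDOR_NVIDIA` but be rejected for `GPU_VENDOR_AMD`, in which case the caller could fall back to the CPU code path.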
This might be a new issue, but it also belongs to GPU acceleration: AMD GPUs seem to suffer much more from unaligned reads and writes to global GPU memory inside a kernel. So the next task is to align global memory access, using local memory as a cache. Both AMD and Nvidia GPUs will benefit from that.
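The staging idea can be sketched on the CPU in plain C. This is only an emulation of the access pattern, not adda_ocl code: the array `tile` stands in for OpenCL `__local` memory, the loops over `lid` stand in for the work-items of one work-group, and `WG_SIZE` and `copy_unaligned_via_tile` are hypothetical names. It assumes the alignment boundary equals the work-group size in elements.

```c
#include <stddef.h>

#define WG_SIZE 64  /* hypothetical work-group size and alignment, in elements */

/* Copies count elements of src, starting at a possibly unaligned offset,
   into dst. src holds n elements. Global reads always start on a
   WG_SIZE boundary; the misalignment is absorbed when reading the tile. */
static void copy_unaligned_via_tile(const float *src, size_t n, size_t offset,
                                    float *dst, size_t count)
{
    float tile[2 * WG_SIZE];          /* stands in for __local memory */
    size_t shift = offset % WG_SIZE;  /* how far the request is misaligned */

    for (size_t group = 0; group * WG_SIZE < count; group++) {
        /* Aligned start of the block that covers this group's elements. */
        size_t base = offset - shift + group * WG_SIZE;

        /* Phase 1: aligned loads from "global" memory into the tile.
           Two blocks are needed because the shifted window straddles both. */
        for (size_t lid = 0; lid < 2 * WG_SIZE; lid++) {
            size_t g = base + lid;
            tile[lid] = (g < n) ? src[g] : 0.0f;
        }
        /* In a real kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here. */

        /* Phase 2: each work-item picks its shifted element from the tile. */
        for (size_t lid = 0; lid < WG_SIZE; lid++) {
            size_t i = group * WG_SIZE + lid;
            if (i < count)
                dst[i] = tile[shift + lid];
        }
    }
}
```

In an actual OpenCL kernel each work-item would perform two of the phase 1 loads, and the benefit would come from those loads being aligned and contiguous across the work-group. Whether this pays off on a given device would have to be measured.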
Comment #2
Posted on Jun 10, 2011 by Happy Dog
(No comment was entered for this change.)
Comment #3
Posted on Apr 16, 2012 by Happy Dog
I have added a few specific issues for further development of adda_ocl. It is already quite mature and almost ready for the ADDA 1.1 release, so I am closing this issue.
Status: Fixed
Labels:
Type-Enhancement
Priority-Medium
OpSys-All
Milestone-1.1
Component-Logic
Performance
OpenCL