Quote:
micro-parallel = algorithm cut up into its smallest independent units, so that the performance will be directly proportional to the number of processor cores up to the maximum number of independent units. Inter-thread communication needs to be close to instantaneous for this relation to hold.
CUDA would seem to be a good fit. If you go this route, I recommend that you use a card with a GPU that supports compute capability 1.3, such as the GeForce GTX 260.
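To put the quoted definition in concrete terms, here is a rough sketch of the scaling it describes. It assumes an idealized case: all units are the same size and communication is free. The helper names `ideal_speedup` and `passes_needed` are made up for illustration and do not come from CUDA or any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: idealized speedup over a single core.
// Assumes equal-sized units and zero communication cost, so the
// speedup grows with the core count until it hits the number of
// independent units, then stays flat.
std::size_t ideal_speedup(std::size_t cores, std::size_t independent_units) {
    assert(cores > 0 && independent_units > 0);
    return std::min(cores, independent_units);
}

// Hypothetical helper: how many rounds are needed when each core
// handles one unit per round (ceiling division).
std::size_t passes_needed(std::size_t cores, std::size_t independent_units) {
    assert(cores > 0);
    return (independent_units + cores - 1) / cores;
}
```

For example, with 1000 independent units, going from 4 to 240 cores takes the ideal speedup from 4 to 240, but with only 100 units the speedup stops at 100 no matter how many cores you add. Real hardware will fall short of this whenever threads have to wait on each other.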