I am looking to have a C++ and Python application developed that uses the processing power of at least one GPU.
We are already using Boost, so Boost.MPI comes to mind.
We already have some experience in developing C++ applications for multi-core CPUs by using threading.
However, since we have not developed sufficient expertise in MPI, we were wondering whether a Python solution would be more appropriate, because it would give the developers a better understanding of what the code does. In that case, something like mpi4py would come to mind.
Of course, having a GPU dedicated to simulation could mean that something like PyOpenCL may be even better.
I mention "simulation" because the machine will be calculating variations on what is largely the same problem, i.e. it will not be a general-purpose machine.
So in summary:
1- For raw processing speed with a multi-core CPU and a GPU, would an OpenCL solution or a Boost.MPI solution be recommended?
2- Should the Python solution be used only for high-level functionality, such as handling data upon return, or does Python compromise speed too much?