Well, whenever a process is dispatched to a CPU, some CPU state has to be reloaded every time, such as the VM translation cache (the TLB), even if it is the same CPU as on the last dispatch, because something else has been running in...
It takes time for a process to move from CPU to CPU. Its cached data has to be copied over, or perhaps re-fetched from RAM. Prevent it from moving and these losses are minimized.
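As a rough illustration of "prevent it from moving", here is a minimal sketch that pins the calling process to a single CPU. It assumes Linux, where Python exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the helper name `pin_to_cpu` is made up for this example, and on other platforms it simply does nothing.

```python
import os


def pin_to_cpu(cpu=None, pid=0):
    """Pin process `pid` (0 = the calling process) to a single CPU.

    Returns the CPU number used, or None where the affinity calls
    are unavailable (they are Linux-only in the standard library).
    `pin_to_cpu` is a hypothetical helper, not a library function.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # no affinity support on this platform

    allowed = os.sched_getaffinity(pid)  # CPUs we may currently run on
    if cpu is None:
        cpu = min(allowed)  # arbitrary choice: lowest-numbered allowed CPU
    elif cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")

    # Restrict the scheduler to exactly one CPU, so the process
    # stops migrating and keeps its caches warm on that CPU.
    os.sched_setaffinity(pid, {cpu})
    return cpu


if __name__ == "__main__":
    chosen = pin_to_cpu()
    if chosen is None:
        print("CPU affinity is not supported here")
    else:
        print("pinned to CPU", chosen, "affinity is now", os.sched_getaffinity(0))
```

Whether pinning actually helps depends on the workload: it avoids migration costs, but it also stops the scheduler from moving the process to an idle CPU.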
If a thread goes from one multicore chip to the other, the cache there is empty. Often, everything one writes, the other has to discard from its cache. There may be similar problems with the VM translation cache.