Quote:
Originally Posted by
gorga
Are you suggesting then, that if I used such an instruction relatively frequently (say once in a loop of maybe 100 execution statements, per core), I shouldn't notice a significant drop in throughput of the application?
Don't panic. 'lock' is not a mutex. A single memory bus always works as if it's "single-threaded", simply because it's physically impossible for it to do anything else: only one transaction can be on the bus at a time.
Imagine two threads running XCHG in an infinite loop. The order in which they're granted access to memory might end up like this:
Time     | 1    | 2    | 3     | 4     | 5    | 6
Thread A | Read | ...  | Write | ...   | Read | ...
Thread B | ...  | Read | ...   | Write | ...  | Read
Without LOCK, a read-modify-write like this isn't guaranteed to be atomic: the reads and writes can interleave as shown, or in any other order. (Strictly speaking, x86's XCHG with a memory operand asserts LOCK implicitly, so read the table above as any unlocked read-modify-write, like INC.)
Now, with LOCK XCHG:
Time     | 1    | 2     | 3    | 4     | 5    | 6
Thread A | Read | Write | ...  | ...   | Read | Write
Thread B | ...  | ...   | Read | Write | ...  | ...
The same amount of waiting happens either way; LOCK just forces each read-write pair to complete as one atomic unit.
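To make that concrete, here's a minimal sketch (mine, not anything from this thread) of a spinlock built on an atomic exchange. __sync_lock_test_and_set() is a GCC/Clang builtin that compiles down to an XCHG on x86; the names and loop counts are just illustrative.
Code:
/* Spinlock via atomic exchange. Assumes GCC/Clang builtins. */
#include <pthread.h>
#include <stdio.h>

static volatile int lock_word = 0;   /* 0 = free, 1 = held */
static long counter = 0;

static void spin_lock(volatile int *l)
{
    /* Atomically swap 1 in; if the old value was 1, someone else holds it. */
    while (__sync_lock_test_and_set(l, 1))
        ;                            /* spin until we saw a 0 */
}

static void spin_unlock(volatile int *l)
{
    __sync_lock_release(l);          /* store 0 with release semantics */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        spin_lock(&lock_word);
        counter++;                   /* the protected read-modify-write */
        spin_unlock(&lock_word);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expect 2000000)\n", counter);
    return 0;
}
Because the exchange is atomic, the two increments can't interleave and the final count is exactly 2000000; take the lock away and it usually isn't.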
Quote:
You'd expect that each core accessing the XCHG variable though would have to get the value from memory
Memory, or cache. On recent x86 systems, cores can update each other's caches directly; that's what HyperTransport is for: a shortcut between caches.
Quote:
(but I read that these atomic operations do create a memory barrier so a core cannot execute instructions either side of said barrier out of order).
Excellent.
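For what it's worth, the pattern that barrier buys you looks like this. A minimal sketch assuming C11 <stdatomic.h>; the struct and field names are made up for illustration:
Code:
#include <stdatomic.h>

struct task {
    int payload;                 /* ordinary data members */
    atomic_int state;            /* 0 = not ready, 1 = ready */
};

void publish(struct task *t, int value)
{
    t->payload = value;                          /* plain writes first */
    atomic_store_explicit(&t->state, 1,
                          memory_order_release); /* barrier, then flag */
}

int try_consume(struct task *t, int *out)
{
    if (atomic_load_explicit(&t->state, memory_order_acquire) != 1)
        return 0;                /* not ready yet, caller moves on */
    *out = t->payload;           /* safe: acquire pairs with release */
    return 1;
}
The release store guarantees every plain write to the task is visible before the flag flips, and the acquire load guarantees a reader that sees the flag also sees those writes.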
Quote:
The pthreads occasionally check the "value" of a task "state", when they reach that task in the queue, therefore if the "state" isn't "ready" they simply move on to the next task (hence the pthread has more work to do and isn't polling continuously).
And when they run out of jobs completely?
Quote:
You see what this means, as long as a pthread "eventually" discovers a task is "ready" that's okay, even if it's not asap. It seems like a lock would be unnecessary here then, but a pthread shouldn't detect that the task state is "ready" before its other data members have been updated (hence the need for a memory barrier).
Your readers would spend a lot of time scanning a mostly-empty list, and your writer would spend a lot of time scanning a mostly-full one. That's a lot of time wasted. You'd be much better off just using a normal one-writer-many-reader queue. It won't block when there's lots of work, and will actually put threads to sleep when there's none, instead of everything spinlocking forever.
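Something like this sketch, say: a pthread mutex plus condition variable, one writer pushing, any number of readers popping. Names are illustrative and error handling is omitted.
Code:
#include <pthread.h>
#include <stdlib.h>

struct node { struct node *next; void *job; };

struct queue {
    struct node *head, *tail;
    pthread_mutex_t mtx;
    pthread_cond_t  nonempty;
};

void queue_init(struct queue *q)
{
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->mtx, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Writer: enqueue a ready job and wake one sleeping reader. */
void queue_push(struct queue *q, void *job)
{
    struct node *n = malloc(sizeof *n);
    n->job = job;
    n->next = NULL;
    pthread_mutex_lock(&q->mtx);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mtx);
}

/* Reader: block (sleep, not spin) until a job is available. */
void *queue_pop(struct queue *q)
{
    pthread_mutex_lock(&q->mtx);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->mtx);
    struct node *n = q->head;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->mtx);
    void *job = n->job;
    free(n);
    return job;
}
pthread_cond_wait() atomically releases the mutex and puts the thread to sleep, so an idle reader burns zero cycles; the writer's signal wakes exactly one of them when work shows up.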
Quote:
If using these atomic operations isn't going to impact throughput, then great they solve the problem, but even that seems like overkill when I only need to ensure that a handful of statements are executed in a certain order.
Using spinlocks to handle a work queue, now that's overkill.