Quote:
Originally Posted by
Corona688
...and when you don't assume best-case, you could be going through the next few thousand not-ready tasks.
Bear in mind though that the "best" case is also the typical one here.
Quote:
Meanwhile every idle worker's doing the same thing. I begin to understand why you're concerned about contention for memory.
Not a problem here, the idle workers are searching through separate queues.
Quote:
But because it doesn't block, you'll be wasting time scanning the list anyway, time that could have been spent doing actual work. And since your system's as busy idle as it is when actually busy you'll have a difficult time guessing how much.
Fair enough; in rare cases it could behave like that. Perhaps the time gained by not blocking would be spent searching.
Quote:
If I read you correctly, the jobs are all tiny. How tiny? How much more work is it to do a job than to scan the list?
It varies. Some tasks are tiny and short-lived; others persist and involve a greater amount of work. At a rough guess, 50-50.
Quote:
Put jobs in the queue when they become ready, don't just stick them there in advance, that way threads won't block when picking up jobs unless you're actually out of jobs -- in which case you want them to block.
Okay, let's explore that for a minute. This queue would have to be able to grow and shrink dynamically, right? So not only would there be a lock for accessing the queue, there'd also be a lock for allocating the memory on the heap. In addition, repeatedly following pointers into allocated memory would likely cause repeated cache misses, which would also degrade performance.
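To make the objection concrete, here is a minimal sketch of that kind of queue, assuming pthreads and one heap-allocated node per job. All the names (job, job_queue, jq_push, jq_pop) are made up for illustration; this is not code from either of us. Note the two places a thread can stall: malloc, which may take the allocator's own lock, and the queue mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch of the "queue ready jobs" idea. */
struct job {
    struct job *next;          /* following this pointer may miss the cache */
    void (*run)(void *);
    void *arg;
};

struct job_queue {
    pthread_mutex_t lock;      /* lock no. 1: protects head and tail */
    pthread_cond_t  nonempty;  /* idle workers sleep here */
    struct job *head, *tail;
};

void jq_init(struct job_queue *q) {
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
}

/* Queue a job that has just become ready. Returns 0, or -1 if malloc fails. */
int jq_push(struct job_queue *q, void (*run)(void *), void *arg) {
    struct job *j = malloc(sizeof *j);  /* lock no. 2 may hide inside malloc */
    if (!j) return -1;
    j->next = NULL;
    j->run = run;
    j->arg = arg;

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = j; else q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->nonempty);  /* wake one sleeping worker */
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Blocks only while the queue is empty. The caller frees the returned node. */
struct job *jq_pop(struct job_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct job *j = q->head;
    q->head = j->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return j;
}
```

A worker would loop on jq_pop, call j->run(j->arg), then free(j). On older toolchains this needs -pthread at compile and link time.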
So let's say, instead of allocating a task at a time, I allocate a chunk of tasks at a time, say 100, using calloc for example. Now my tasks are contiguous in memory and I've reduced the heap contention to a hundredth, while improving the cache hit rate because I can bring multiple tasks into the cache in one go. Great, but many tasks don't persist for long, so why not recycle the memory in the list when a task has completed? Then I don't have to allocate more chunks quite so often.
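Roughly what I mean, as a sketch rather than my actual code: the names (task, task_pool, pool_get, pool_put) are invented here, and I'm assuming one pool per worker so the pool itself needs no lock. Chunks are never handed back to the heap in this version.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_TASKS 100        /* the "say 100" figure; tune to taste */

struct task {
    struct task *next_free;    /* links free slots; unused while the task is live */
    void (*run)(void *);
    void *arg;
};

struct task_pool {
    struct task *free_list;    /* recycled or never-used slots */
    size_t chunks;             /* number of calloc calls so far */
};

/* Grab one contiguous chunk and thread its slots onto the free list. */
int pool_grow(struct task_pool *p) {
    struct task *chunk = calloc(CHUNK_TASKS, sizeof *chunk);
    if (!chunk) return -1;
    /* Walk backwards so slots are later handed out in ascending address order. */
    for (size_t i = CHUNK_TASKS; i-- > 0; ) {
        chunk[i].next_free = p->free_list;
        p->free_list = &chunk[i];
    }
    p->chunks++;
    return 0;
}

/* Take a slot, touching the heap only when every existing slot is in use. */
struct task *pool_get(struct task_pool *p) {
    assert(p != NULL);
    if (!p->free_list && pool_grow(p) != 0)
        return NULL;
    struct task *t = p->free_list;
    p->free_list = t->next_free;
    t->next_free = NULL;
    return t;
}

/* Recycle a completed task's slot instead of freeing it. */
void pool_put(struct task_pool *p, struct task *t) {
    assert(p != NULL && t != NULL);
    t->next_free = p->free_list;
    p->free_list = t;
}
```

Start with a zeroed pool (struct task_pool p = {0};). The first pool_get allocates a chunk, and short-lived tasks then cycle through pool_put and pool_get without going near the heap.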
Now how do I achieve that? With a flag to say the task is ready, perhaps. It has to be atomic though, because it's shared between threads, and I have to use memory barriers to ensure instructions are not reordered.
But I'm not sure about memory barriers, I know, I'll ask the folks on the Unix forum...
You see where this is leading.