Quote:
Originally Posted by Corona688
Yes, and the overhead you were worried about was the same atomic operations you're hellbent on using now.
But with those atomic operations I can avoid sending threads to sleep and having to wake them up.
Quote:
I remain stolidly unconvinced that spinlocking is more efficient than blocking.
I would agree if no other progress were being made, but to reiterate, the thread pool I'm building is for dedicated use with a custom application that generates a high number of tasks. 95% of the time I would expect my threads to have tasks to execute, and plenty of them. The remaining 5% would be a situation where there's no work to do at all, so spinning then is not a problem.
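To make the idea concrete, here is a minimal sketch of the kind of worker loop I mean: workers claim tasks with a compare-and-swap and simply poll again when nothing is available, so nobody is ever put to sleep or woken up. All the names (spin_pool, pool_submit, run_demo and so on) are made up for illustration, it assumes a single submitting thread and a fixed-size task array, and it is not my actual implementation.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_TASKS   1024
#define MAX_WORKERS 16

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

/* Hypothetical pool: one producer publishes tasks, many workers claim them. */
struct spin_pool {
    struct task   tasks[MAX_TASKS];
    atomic_size_t published;   /* tasks made visible by the producer */
    atomic_size_t claimed;     /* next slot a worker may take */
    atomic_bool   stop;        /* set once no more tasks will arrive */
};

/* Single-producer submit: write the slot first, then publish it. */
static int pool_submit(struct spin_pool *pool, task_fn fn, void *arg) {
    size_t n = atomic_load(&pool->published);
    if (n >= MAX_TASKS) return -1;
    pool->tasks[n].fn = fn;
    pool->tasks[n].arg = arg;
    atomic_store(&pool->published, n + 1);
    return 0;
}

/* Worker: claim a slot with CAS; when idle, poll again instead of blocking. */
static void *spin_worker(void *p) {
    struct spin_pool *pool = p;
    for (;;) {
        size_t next = atomic_load(&pool->claimed);
        if (next < atomic_load(&pool->published)) {
            if (atomic_compare_exchange_weak(&pool->claimed, &next, next + 1))
                pool->tasks[next].fn(pool->tasks[next].arg);
        } else if (atomic_load(&pool->stop)) {
            return NULL;       /* queue drained and producer finished */
        }
        /* otherwise: no work right now, go round the loop again */
    }
}

/* Example task: bump a shared counter. */
static void count_task(void *arg) {
    atomic_fetch_add((atomic_int *)arg, 1);
}

/* Runs ntasks counting tasks on nthreads spinning workers; returns the count. */
static int run_demo(int nthreads, int ntasks) {
    struct spin_pool pool;
    pthread_t ids[MAX_WORKERS];
    atomic_int counter;
    int i, started = 0;

    atomic_init(&pool.published, 0);
    atomic_init(&pool.claimed, 0);
    atomic_init(&pool.stop, 0);
    atomic_init(&counter, 0);
    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;

    for (i = 0; i < nthreads; i++)
        if (pthread_create(&ids[started], NULL, spin_worker, &pool) == 0)
            started++;
    for (i = 0; i < ntasks; i++)
        pool_submit(&pool, count_task, &counter);
    atomic_store(&pool.stop, 1);
    for (i = 0; i < started; i++)
        pthread_join(ids[i], NULL);
    return atomic_load(&counter);
}
```

Calling run_demo(4, 100) should run all 100 tasks across 4 workers. The cost is that an idle worker burns a core while it polls, which is the trade-off I'm accepting for that 5% case.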
Quote:
Are you sure of that? You only discovered thread-specific data last week.
Pretty sure. I've already built prototypes of my system in Erlang, and I found a need for such control.
---------- Post updated at 12:41 AM ---------- Previous update was at 12:30 AM ----------
Quote:
Originally Posted by fpmurphy
NPTL has been proven in numerous benchmarks to satisfactorily scale to tens of thousands of threads of execution.
Are you saying it's possible to create tens of thousands of pthreads on Linux? I tried this some time ago and couldn't create anywhere near that number. Doesn't the kernel impose strict limits on the number of threads that can be created anyway?
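For anyone wanting to check this on their own box, here is a rough sketch of how I'd probe it. It reads the system-wide ceiling from /proc/sys/kernel/threads-max (Linux-specific) and then tries to create a requested number of threads with a reduced stack size, since the default per-thread stack reservation is one thing that can cap the count. The function names and the mutex "gate" that keeps all threads alive at once are my own invention for the example, not anything from NPTL.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Gate held by the creator so every thread stays alive until all are made. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void *parked(void *unused) {
    (void)unused;
    pthread_mutex_lock(&gate);     /* blocks until the creator opens the gate */
    pthread_mutex_unlock(&gate);
    return NULL;
}

/* System-wide thread ceiling on Linux, or -1 if it cannot be read. */
static long read_threads_max(void) {
    long value = -1;
    FILE *f = fopen("/proc/sys/kernel/threads-max", "r");
    if (!f) return -1;
    if (fscanf(f, "%ld", &value) != 1) value = -1;
    fclose(f);
    return value;
}

/* Tries to have `want` threads alive at once; returns how many were created. */
static int spawn_parked_threads(int want, size_t stack_bytes) {
    pthread_attr_t attr;
    pthread_t *ids;
    int made = 0, i;

    if (want <= 0) return 0;
    ids = malloc((size_t)want * sizeof *ids);
    if (!ids) return 0;

    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);   /* stack size rejected (too small?) */
        free(ids);
        return 0;
    }

    pthread_mutex_lock(&gate);
    while (made < want &&
           pthread_create(&ids[made], &attr, parked, NULL) == 0)
        made++;
    pthread_mutex_unlock(&gate);       /* release all parked threads */

    for (i = 0; i < made; i++)
        pthread_join(ids[i], NULL);
    pthread_attr_destroy(&attr);
    free(ids);
    return made;
}
```

Something like spawn_parked_threads(10000, 64 * 1024) then reports how many threads could actually coexist. If it comes back short, the limit being hit could be memory, the per-user process limit, or the system-wide ceiling, and this sketch doesn't tell you which.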
Quote:
Frankly if you wish to scale to tens of thousands of threads you have lots more problems to worry about than what you have discussed so far. See The C10K problem for example.
Thanks, I have encountered similar problems. But to explain, the "thread-pool" doesn't work at the abstraction level of pthreads. It works with very lightweight tasks of execution that involve little more than a function call and some memory, with no context switching. Something along the lines of protothreads. The pthreads underneath simply run these tasks in a continuous loop.
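Roughly speaking, a task of this kind looks like the sketch below: a struct holding a step function plus a little state, where "yielding" is just returning from the function and resuming is just calling it again. This is only an illustration in the spirit of protothreads, written as an explicit state machine; the names (ltask, sum_step, run_until_idle) are hypothetical, and the loop shown is what a single underlying pthread would run.

```c
#include <stddef.h>

enum task_status { TASK_YIELD, TASK_DONE };

/* A lightweight task: a function pointer and a few words of state. */
struct ltask {
    enum task_status (*step)(struct ltask *self);
    int  state;       /* resume point inside step() */
    int  remaining;   /* example payload: numbers still to add */
    long result;      /* example payload: running sum */
    int  done;        /* set by the scheduler loop */
};

/* Example task: sums remaining + (remaining - 1) + ... + 1, yielding after
   every addition. No stack is saved, so no context switch is needed. */
static enum task_status sum_step(struct ltask *t) {
    switch (t->state) {
    case 0:
        t->result = 0;
        t->state = 1;
        /* fall through */
    case 1:
        if (t->remaining > 0) {
            t->result += t->remaining--;
            return TASK_YIELD;     /* give the loop a chance to run others */
        }
        t->state = 2;
        /* fall through */
    default:
        return TASK_DONE;
    }
}

/* The loop one worker would run: round-robin over tasks until all finish.
   Returns the total number of step() calls made. */
static size_t run_until_idle(struct ltask *tasks, size_t n) {
    size_t steps = 0, live = 0, i;

    for (i = 0; i < n; i++)
        if (!tasks[i].done) live++;

    while (live > 0) {
        for (i = 0; i < n; i++) {
            if (tasks[i].done) continue;
            steps++;
            if (tasks[i].step(&tasks[i]) == TASK_DONE) {
                tasks[i].done = 1;
                live--;
            }
        }
    }
    return steps;
}
```

Switching between tasks here costs one indirect function call, which is why I can afford to have a very large number of them per pthread.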
Quote:
BTW, an alternative approach to scaling your application might be to use something like CUDA and offload the application to a GPU.
Thanks, I have examined CUDA, OpenMP, Threading Building Blocks, etc., but the nature of what I'm doing involves an expansion of state, whereas these libraries typically map many parallel elements onto parallel resources, for example by assigning a "thread" of execution to each iteration of a loop.
As mentioned in the previous post, I began my work with Erlang, which offers tens of thousands of lightweight processes communicating via message passing. But I found I needed more control than was on offer.