Interesting issue with pthread_mutex_lock and siglongjmp in AIX 5.3 (and no other OS)
Executive summary:
Code (posted below) cores in AIX 5.3, despite being compiled and run successfully on several other operating systems. Code is attempting to verify that pthread_mutex_lock can be successfully aborted by siglongjmp. I do not believe this is an unreasonable requirement.
If you could please compile the code below on any operating system supporting pthreads and report whether it runs to completion, I'd really appreciate it. Of course, I would appreciate it even more if someone could tell me I'm definitely doing something wrong.
OK, on with the long-winded post...
I have a simple application using siglongjmp and mutexes that is coring on AIX 5.3 and, thus far, on no other operating system. I have compiled and run it successfully on Red Hat Enterprise Linux (kernel 2.6.18, 32 bit), HP-UX 11, Compaq Tru64 V5.1B, and SunOS 5.7.
What seems to be happening is that when the code exits pthread_mutex_lock prematurely via the long jump, a subsequent call to pthread_mutex_lock causes the application to segfault (on AIX 5.3). Interestingly, this only seems to occur if the subsequent call is made before the thread holding the lock releases it, a condition that a real application could not reliably avoid. Replacing the subsequent pthread_mutex_lock with pthread_mutex_trylock in a spin loop also succeeds without coring. However, the spin is wasteful: unlike most spin locks, which spin for a bit and then block, this one has to keep spinning until the lock is acquired, because any attempt to call the blocking function (pthread_mutex_lock) causes the application to core.
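For illustration, the trylock workaround described above might look roughly like the sketch below. This is not the code from this post: the names lock_by_spinning, waiter, run_demo and demo_mutex are made up for the example, and calling sched_yield between attempts is an assumed way of making the spin slightly less wasteful. It does not involve siglongjmp at all; it only shows a lock being acquired without ever calling the blocking pthread_mutex_lock in the waiting thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Hypothetical mutex shared by the demo threads. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Acquire m using only pthread_mutex_trylock.
 * Returns 0 once the lock is held, or the trylock error code
 * if trylock fails with anything other than EBUSY. */
static int lock_by_spinning(pthread_mutex_t *m)
{
    int rc;
    while ((rc = pthread_mutex_trylock(m)) == EBUSY)
        sched_yield();          /* assumption: yield so the holder can run */
    return rc;
}

/* Thread that waits for the lock by spinning, then releases it. */
static void *waiter(void *arg)
{
    int *result = arg;
    *result = lock_by_spinning(&demo_mutex);
    if (*result == 0) {
        int rc = pthread_mutex_unlock(&demo_mutex);
        assert(rc == 0);
        (void)rc;
    }
    return NULL;
}

/* Hold the lock briefly while another thread spins for it.
 * Returns 0 if the spinning thread eventually got the lock. */
int run_demo(void)
{
    pthread_t t;
    int result = -1;
    struct timespec hold = {0, 50 * 1000 * 1000};   /* 50 ms */

    if (pthread_mutex_lock(&demo_mutex) != 0)
        return -1;
    if (pthread_create(&t, NULL, waiter, &result) != 0) {
        pthread_mutex_unlock(&demo_mutex);
        return -1;
    }
    nanosleep(&hold, NULL);             /* waiter spins during this time */
    pthread_mutex_unlock(&demo_mutex);
    pthread_join(t, NULL);              /* result is safe to read after join */
    return result;
}
```

Calling run_demo() from main and checking for a return value of 0 is enough to exercise it. The cost mentioned above is visible here: the waiter burns CPU for the whole time the lock is held instead of sleeping.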
When the core occurs, dbx shows an AIX library function at the top of the stack. Here is the stack according to dbx when it cores:
Segfault at _usched_dispatch_front stack is:
_usched_dispatch_front
_usched_swtch
_waitlock
_local_lock_common
_mutex_lock
main
What I am really looking for here is ammunition to determine whether my code (which so far works on 4 of 5 operating systems) or IBM's libraries are at fault. To that end, if people can compile and run this (successfully or not) and report their results, that would be awesome! Of course, if anyone has insight into something I am doing wrong, I'd love to hear it!
Here is the code, overly commented to explain the problem. The two options that make it run successfully on AIX 5.3 can be tested by compiling with either -DNO_CORE (which changes the order of operations to a successful one) or -DNO_CORE -DTRYSPIN (which replaces the failing pthread_mutex_lock with a spinning pthread_mutex_trylock). As I said, neither of these two options is, in my opinion, a viable "solution" to the problem, but I do find it interesting that either one averts the core.
Hello All,
Finally I am posting an issue, and its solution, that I faced last week. Let me explain it under headings.
Issue's background: It was a nice Tuesday for me, went to office as usual started checking emails and work assigned to me. Suddenly a gentleman reached out to me on my desk(in a... (2 Replies)
Currently the server is under load even though nothing heavy is running, just an Oracle database and Oracle application server. I don't understand why the server has such a heavy load; 22GB is in buffer. How can I clean the buffer/memory in AIX?
load averages: 9.42, 9.43, 9.68; 05:25:08
141 processes: 125 idle, 16... (12 Replies)
Hello,
I'm trying to set up an internet connection on an IBM RS/6000 7043-140 machine with AIX v 5.1. The problem is that no matter if it is setup to receive an IP address from another DHCP server or has a static IP set, it seems to act as a DHCP server that assigns a random IP address with a... (3 Replies)
I have a program which has 7-8 threads and lots of shared variables (not necessarily primitive types; they may be enums or structs), and these may be read and written by different threads at the same time.
Now, my design is like this,
typedef unsigned short int UINT16;... (14 Replies)
Hi All,
The code below works perfectly on an AIX machine but doesn't give the desired output on HP-UX.
Can someone please generalise the code so that it becomes platform independent?
awk 'NR == FNR {
/^*\47name/ && c++ # get the field number
if (/^*\47size/) {
split($0, t, ":") ... (2 Replies)
Hi,
Could any one help me to extract data from a report.
I would like to get the two lines which are just below the separations
I have a report like this
--------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual... (2 Replies)
Hello all,
One of the applications we ported to AIX from Linux segfaults when it exits. Here is part of the backtrace of the SEGV:
(dbx) where
splay(??, ??, ??) at
free_y(??, ??) at
free_common(??) at
....
exit(??) at
...
Application seem to perform everything expected well and... (1 Reply)
Please read my issue!
My old server using:
- AIX system operating (5300-05-CSP-0000)
- WebSphere 6.1.0.21 (Fix Pack 21)
After I've upgraded version AIX
- AIX system operating (5300-09-02-0849)
- WebSphere 6.1.0.21 (Fix Pack 21)
I have 1 issue when I access home page:
"Error... (0 Replies)
Hello,
I have AIX 5.2. I am trying to set the tcp_ephemeral_high port value to 5000 and the tcp_ephemeral_low value to 1024, but tcp_ephemeral_high cannot be set below 32769.
Please advise how to set tcp_ephemeral_high to 5000. (7 Replies)
I have been wondering what the difference between pthread_rwlock_lock and pthread_mutex_lock is. Both these routines acquire an exclusive rw lock on an enclosed region.
So I performed a simple experiment in which I execute both these routines multiple times in a loop. Here are the results:... (1 Reply)