Interesting issue with pthread_mutex_lock and siglongjmp in AIX 5.3 (and no other OS)
Executive summary:
Code (posted below) cores in AIX 5.3, despite being compiled and run successfully on several other operating systems. Code is attempting to verify that pthread_mutex_lock can be successfully aborted by siglongjmp. I do not believe this is an unreasonable requirement.
If you could please compile the code below on any operating system supporting pthreads and report whether it runs to completion, I'd really appreciate it. Of course, I would appreciate it even more if someone could tell me I'm definitely doing something wrong.
OK, on with the long-winded post...
I have a simple application using siglongjmp and mutexes that is coring in AIX 5.3 and, thus far, on no other operating system. I have compiled and run it successfully on Red Hat Enterprise Linux (kernel 2.6.18, 32 bit), HP-UX 11, Compaq Tru64 V5.1B, and SunOS 5.7.
What seems to be happening is that when the code prematurely exits the pthread_mutex_lock function via the long jump, a subsequent call to pthread_mutex_lock causes the application to segfault (in AIX 5.3). Interestingly, this only seems to occur if the subsequent call is made before the thread holding the lock releases it, a condition that could not be guaranteed in a real application. Further, replacing the subsequent pthread_mutex_lock with pthread_mutex_trylock (in a spin loop) also succeeds without coring. However, the spin lock is wasteful and, unlike most spin locks, which spin for a bit and then block, this one has to keep spinning until the lock is acquired, because any attempt to call the blocking function (pthread_mutex_lock) causes the application to core.
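To make the trylock workaround less wasteful, one possibility is to poll with a short, growing sleep between attempts and to have the signal handler set a flag rather than long jump. This is only a minimal sketch, not something I have run on AIX 5.3: lock_with_backoff and cancel_requested are hypothetical names that do not appear in the program below, and the 1 ms / 50 ms delays are arbitrary assumptions.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Hypothetical flag: a signal handler would set this to 1
   instead of calling siglongjmp. */
static volatile sig_atomic_t cancel_requested = 0;

/* Hypothetical helper: poll the mutex with pthread_mutex_trylock,
   sleeping between attempts with a capped, doubling delay.
   Returns 0 once the lock is held, ECANCELED if the flag was set
   while waiting, or any other error reported by trylock. */
static int lock_with_backoff(pthread_mutex_t *m)
{
    long delay_ns = 1000000L;            /* start at 1 ms (assumed) */
    const long max_delay_ns = 50000000L; /* cap at 50 ms (assumed) */
    int rv;

    while ((rv = pthread_mutex_trylock(m)) == EBUSY) {
        struct timespec ts;

        if (cancel_requested)
            return ECANCELED;

        ts.tv_sec = 0;
        ts.tv_nsec = delay_ns;
        /* nanosleep returns early if a handled signal is delivered
           to this thread, so the flag is re-checked promptly. */
        nanosleep(&ts, NULL);

        delay_ns *= 2;
        if (delay_ns > max_delay_ns)
            delay_ns = max_delay_ns;
    }
    return rv;
}
```

The trade-off is latency: after the other thread unlocks, the waiter may take up to the capped delay to notice, whereas a blocking pthread_mutex_lock would be woken directly.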
When the core occurs, dbx shows an AIX library function at the top of the stack. Here is the stack according to dbx when it cores:
Segfault at _usched_dispatch_front stack is:
_usched_dispatch_front
_usched_swtch
_waitlock
_local_lock_common
_mutex_lock
main
What I am really looking for here is ammunition to determine whether my code (which so far works on 4 of 5 operating systems) or IBM's libraries are at fault. To that end, if people can compile and run this (successfully or not) and report their results, that would be awesome! Of course, if anyone has insight regarding something I am doing wrong, I'd love to hear it!
Here is the code, overly commented to explain the problem. The two options that make it run successfully on AIX 5.3 can be tested by compiling with either -DNO_CORE (which changes the order of operations to a successful one) or -DNO_CORE -DTRYSPIN (which replaces the failing pthread_mutex_lock with a spinning pthread_mutex_trylock). As I said, neither of these two options is, in my opinion, a viable "solution" to the problem, but I do find it interesting that either one averts the core.
Code:
#include <pthread.h>
#include <sched.h>   /* sched_yield */
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>  /* sleep */

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_t tidMain;
static sigjmp_buf env;
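
static int t_printf(const char *fmt, ...)
{
    va_list va;
    char buf[256];
    int n;

    /* 4-character prefix: "[M] " in the main thread, "[I] " in the
       interference thread */
    sprintf(buf, "[%c] ",
            pthread_equal(tidMain, pthread_self()) ? 'M' : 'I');
    va_start(va, fmt);
    n = vsnprintf(buf+4, sizeof(buf)-4, fmt, va);
    va_end(va);
    /* fputs rather than printf(buf), so the formatted text is never
       re-interpreted as a format string */
    fputs(buf, stdout); fflush(stdout);
    return n;
}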

static void *thread_fn(void *data)
{
    int rv;

    t_printf("thread started\n");
    /* lock mutex so main thread must block waiting on us to release it */
    rv = pthread_mutex_lock(&mx);
    t_printf("mutex was %slocked\n",
             rv == 0 ? "" : "NOT ");
    /* sleep to "assure" main thread will be blocked in pthread_mutex_lock
       call when we signal it. */
    sleep(5);
    t_printf("signaling main\n");
    /* send signal to main thread causing it to long jump out of the
       pthread_mutex_lock function */
    pthread_kill(tidMain, SIGALRM);
    /* sleep to "assure" main thread will be attempting to reacquire the
       mutex when we unlock it. */
    sleep(2);
    rv = pthread_mutex_unlock(&mx);
    t_printf("mutex lock %sreleased\n",
             rv == 0 ? "" : "NOT ");
    return NULL;
}

static void alarm_fn(int sig)
{
    /* Since we directed this signal to the main thread via pthread_kill,
       verify that is where we get it! If we do not get it there, we
       may expect issues trying to long jump across a thread stack.
       Fortunately this is working as expected. */
    t_printf("signal received %s\n",
             pthread_equal(tidMain, pthread_self()) ? "OK" : "IN WRONG THREAD");
    siglongjmp(env, 1);
    puts("!!! LONG JUMP FAILED !!!");
    exit(-2);
}

int main(int argc, char **argv)
{
    pthread_t tid;
    int rv;
    int cancelled = 0;

    tidMain = pthread_self();
    signal(SIGALRM, alarm_fn);
    /* create "interference" thread */
    pthread_create(&tid, NULL, thread_fn, NULL);
    if (sigsetjmp(env, 1) == 0)
    {
        /* sleep a bit to "assure" the thread we created executes and
           grabs the lock before we can, so we have to block. */
        sleep(3);
        /* this is where we want to be when we get signalled, to test
           that we can be broken out of pthread_mutex_lock successfully
           by a long jump */
        t_printf("blocked locking mutex\n");
        rv = pthread_mutex_lock(&mx);
    }
    else
    {
        /* we expect the signal to be delivered, forcing us into this
           block of code. */
        rv = -1;
        cancelled = 1;
    }
    /* print rv and cancelled to show us what path we took above -- we
       expect to be cancelled with rv == -1 */
    t_printf("rv: %d; %scancelled\n", rv, cancelled ? "" : "!NOT! ");
    /* Verify that we can re-acquire the lock after pthread_mutex_lock was
       jumped out of by the long jump -- this is where we die in AIX 5.3
       but nowhere else! Oddly, we only core if the interference thread
       still has the mutex locked when we get here; if it unlocks first
       then this call succeeds. We can test this by sleeping for a bit
       before making this call (to allow the interference thread to
       unlock); define NO_CORE to demonstrate this behavior.
       It also happens that pthread_mutex_trylock succeeds if written
       in a spin loop, which is also odd. Define both NO_CORE and
       TRYSPIN to demonstrate this behavior. */
#ifdef NO_CORE
#ifdef TRYSPIN
    t_printf("attempting relock via trylock\n");
    while ((rv = pthread_mutex_trylock(&mx)) != 0)
    {
        sched_yield();
    }
#else
    sleep(5);
    t_printf("attempting relock after sleep\n");
    rv = pthread_mutex_lock(&mx);
#endif
#else
    t_printf("attempting relock\n");
    rv = pthread_mutex_lock(&mx);
#endif
    t_printf("lock was %sacquired\n",
             rv == 0 ? "" : "NOT ");
    sleep(1);
    t_printf("goodbye...\n");
    return EXIT_SUCCESS;
}