Interesting issue with pthread_mutex_lock and siglongjmp in AIX 5.3 (and no other OS)


 
# 1  
Old 06-12-2009

Executive summary:

The code posted below dumps core on AIX 5.3, although it compiles and runs successfully on several other operating systems. It attempts to verify that a blocked pthread_mutex_lock call can be aborted by a siglongjmp from a signal handler. I do not believe this is an unreasonable requirement.

If you could please compile the code below on any operating system supporting pthreads and report whether it runs to completion, I'd really appreciate it. Of course, I would appreciate it even more if someone could tell me I'm definitely doing something wrong.

OK, on with the long-winded post...

I have a simple application using siglongjmp and mutexes that dumps core on AIX 5.3 and, thus far, on no other operating system. I have compiled and run it successfully on Red Hat Enterprise Linux (kernel 2.6.18, 32 bit), HP-UX 11, Compaq Tru64 V5.1B, and SunOS 5.7.

What seems to be happening is that when the code prematurely exits the pthread_mutex_lock function via the long jump, a subsequent call to pthread_mutex_lock causes the application to seg fault (on AIX 5.3). Interestingly, this only seems to occur if the subsequent call is made before the thread holding the lock releases it, and a real application cannot guarantee that ordering. Further, replacing the subsequent pthread_mutex_lock with pthread_mutex_trylock in a spin loop succeeds without coring. However, the spin is wasteful: most spin locks spin for a bit and then block, but this one has to keep spinning until the lock is acquired, because any attempt to call the blocking function (pthread_mutex_lock) causes the application to core.
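
For anyone who wants to try the trylock workaround without a pure busy spin, here is a minimal sketch that sleeps between attempts. The helper name lock_with_backoff and the 1 ms to 64 ms delays are assumptions made for illustration; they are not part of the original test program, and the sketch has not been tested on AIX 5.3.

```c
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not from the original post): acquire mx using only
   pthread_mutex_trylock, sleeping between attempts instead of spinning.
   Returns 0 once the lock is held, or the trylock error code otherwise. */
static int lock_with_backoff(pthread_mutex_t *mx)
{
	struct timespec	delay = { 0, 1000000L };	/* start at 1 ms */
	int		rv;

	while ((rv = pthread_mutex_trylock(mx)) == EBUSY)
	{
		nanosleep(&delay, NULL);

		/* double the delay, capped at 64 ms (an arbitrary choice) */
		if (delay.tv_nsec < 64000000L)
			delay.tv_nsec *= 2;
	}

	return rv;
}
```

This still polls, so it trades wasted CPU for added latency; it never calls the blocking pthread_mutex_lock, which is the call that cores in the scenario described above.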

When the core occurs, dbx shows an AIX library function at the top of the stack. Here is the stack according to dbx when it cores:

Segfault at _usched_dispatch_front; stack is:
_usched_dispatch_front
_usched_swtch
_waitlock
_local_lock_common
_mutex_lock
main

What I am really looking for here is evidence as to whether my code (which works on four of five operating systems so far) or IBM's libraries are at fault. To that end, if people can compile and run this (successfully or not) and report their results, that would be awesome! Of course, if anyone has insight regarding something I am doing wrong, I'd love to hear it!

Here is the code, overly commented to explain the problem. The two options that make it run successfully on AIX 5.3 can be tested by compiling with either -DNO_CORE (which changes the order of operations to a successful one) or -DNO_CORE -DTRYSPIN (which replaces the failing pthread_mutex_lock with a spinning pthread_mutex_trylock). Neither option is, in my opinion, a viable "solution" to the problem, but I do find it interesting that either averts the core.

Code:
#include <pthread.h>
#include <sched.h>	/* sched_yield */
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>	/* sleep */

static pthread_mutex_t	mx = PTHREAD_MUTEX_INITIALIZER;

static pthread_t	tidMain;

static sigjmp_buf 	env;

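static int t_printf(const char *fmt, ...)
{
	va_list	va;
	char	buf[256];
	int	n;
	
	/* prefix is exactly 4 characters: "[M] " or "[I] " */
	sprintf(buf, "[%c] ",
		pthread_equal(tidMain, pthread_self()) ? 'M' : 'I');
	
	va_start(va, fmt);
	n = vsnprintf(buf+4, sizeof(buf)-4, fmt, va);		
	va_end(va);
	
	/* print via fputs so the formatted text is never used as a format */
	fputs(buf, stdout); fflush(stdout);
	
	return n;
}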

static void *thread_fn(void *data)
{
	int		rv;
	
	t_printf("thread started\n");
	
	/* lock mutex so main thread must block waiting on us to release it */
	
	rv = pthread_mutex_lock(&mx);
	
	t_printf("mutex was %slocked\n",
		rv == 0 ? "" : "NOT ");
	
	/* sleep to "assure" main thread will be blocked in pthread_mutex_lock
	     call when we signal it. */
	
	sleep(5);
	
	t_printf("signaling main\n");
	
	/* send signal to main thread causing it to long jump out of the
	     pthread_mutex_lock function */
	
	pthread_kill(tidMain, SIGALRM);
	
	/* sleep to "assure" main thread will be attempting to reacquire the
	     mutex when we unlock it. */
	
	sleep(2);
	
	rv = pthread_mutex_unlock(&mx);
	
	t_printf("mutex lock %sreleased\n",
		rv == 0 ? "" : "NOT ");
	
	return NULL;
}

static void alarm_fn(int sig)
{
	/* Since we directed this signal to the main thread via pthread_kill,
	     verify that is where we get it!  If we do not get it there, we
	     may expect issues trying to long jump across a thread stack.
	     Fortunately this is working as expected. */
	     
	t_printf("signal received %s\n",
		pthread_equal(tidMain, pthread_self()) ? "OK" : "IN WRONG THREAD");
	
	siglongjmp(env, 1);
	
	puts("!!! LONG JUMP FAILED !!!");
	exit(-2);
}

int main(int argc, char **argv)
{
	pthread_t	tid;
	int		rv;
	int		cancelled = 0;
	
	tidMain = pthread_self();
	
	signal(SIGALRM, alarm_fn);
	
	/* create "interference" thread */
	
	pthread_create(&tid, NULL, thread_fn, NULL);

	if (sigsetjmp(env, 1) == 0)
	{
		/* sleep a bit to "assure" the thread we created executes and
		     grabs the lock before we can so we have to block. */
		
		sleep(3);
		
		/* this is where we want to be when we get signalled to test
		     that we can be broken out of pthread_mutex_lock successfully
		     by a long jump */
			 
		t_printf("blocked locking mutex\n");
		
		rv = pthread_mutex_lock(&mx);
	}
	else
	{
		/* we expect the signal to be delivered forcing us into this
		     block of code. */
		
		rv = -1;
		cancelled = 1;
	}
	
	/* print rv and cancelled to show us what path we took above -- we
	     expect to be cancelled with rv == -1 */

	t_printf("rv: %d; %scancelled\n", rv, cancelled ? "" : "!NOT! ");
	
	/* Verify that we can re-acquire the lock after pthread_mutex_lock was
	     jumped out of by the long jump -- this is where we die in AIX 5.3
	     but nowhere else!  Oddly, we only core if the interference thread
	     still has the mutex locked when we get here; if it unlocks first
	     then this call succeeds.  We can test this by sleeping for a bit
	     before making this call (to allow the interference thread to
	     unlock); define NO_CORE to demonstrate this behavior.
	     
	   It also happens that a pthread_mutex_trylock is successful too if
	     written in a spin loop; which is also odd.  Define both NO_CORE
		 and TRYSPIN to demonstrate this behavior. */
	
	#ifdef NO_CORE
		#ifdef TRYSPIN
			t_printf("attempting relock via trylock\n");
		
			while((rv = pthread_mutex_trylock(&mx)) != 0)
			{
				sched_yield();
			}
		#else
			sleep(5);
			
			t_printf("attempting relock after sleep\n");
			
			rv = pthread_mutex_lock(&mx);
		#endif
	#else
		t_printf("attempting relock\n");
		
		rv = pthread_mutex_lock(&mx);
	#endif
	
	t_printf("lock was %sacquired\n",
		 rv == 0 ? "": "NOT ");
	
	sleep(1);
	
	t_printf("goodbye...\n");
	
	return EXIT_SUCCESS;
}

# 2  
Old 06-14-2009
I would try increasing the thread stack size and see what happens. You may have a yellow or red (guard) zone stack overflow.
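
One way to try this suggestion is to pass a thread attribute with a larger stack size to pthread_create. The following is a minimal sketch, assuming a hypothetical helper named init_attr_with_stack. It only affects threads created with the attribute; the main thread's stack is typically governed by process resource limits rather than by pthread attributes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread above): initialise attr and
   request a stack of stack_bytes for threads created with it.
   Returns 0 on success or a pthread error number; on failure the
   attribute object is destroyed again. */
static int init_attr_with_stack(pthread_attr_t *attr, size_t stack_bytes)
{
	int	rv;

	if ((rv = pthread_attr_init(attr)) != 0)
		return rv;

	if ((rv = pthread_attr_setstacksize(attr, stack_bytes)) != 0)
		pthread_attr_destroy(attr);

	return rv;
}
```

In the test program this would be used by calling init_attr_with_stack(&attr, 1024 * 1024), passing &attr instead of NULL as the second argument to pthread_create, and then calling pthread_attr_destroy(&attr). The 1 MB figure is an arbitrary example value.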