semaphore access speed

09-24-2008

Registered User

316, 33

Join Date: Sep 2008

Last Activity: 13 September 2020, 12:21 AM EDT

Location: US

Posts: 316

Thanks Given: 66

Thanked 33 Times in 31 Posts

My version of truss does not have a -c flag, or any other flag similar to the one I used with trace under Linux.

On Linux:
$ /sbin/sysctl -a |grep shm
... <snip error msgs>
vm.hugetlb_shm_group = 0
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 33554432

Q: Why would SHM parameters affect semaphore performance?

Also, just in case, I ran this:
$ /sbin/sysctl -a |grep sem
<snip error msgs>
kernel.sem = 250 32000 32 128

Last edited by migurus; 09-24-2008 at 08:55 PM. Reason: added info

migurus

09-25-2008

Registered User

2,157, 51

Join Date: Feb 2007

Last Activity: 6 September 2017, 5:43 AM EDT

Location: Innsbruck, Austria

Posts: 2,157

Thanks Given: 12

Thanked 51 Times in 48 Posts

Quote:

Originally Posted by migurus

Q: Why would SHM parameters affect semaphore performance?

*Possibly* because of the number of page tables and structures required to make use of all that memory. That might result in every call incurring two to three cache misses. If there are no cache misses, because there is less shmem, then maybe it runs three times as fast. Jim??

otheus

09-29-2008

I'd like to ask the gurus where else I can post my question. Could you recommend any other group or forum?
Your suggestions would be appreciated.

migurus

09-30-2008

As I previously suggested: LinuxQuestions.org.

Since the bottleneck appears to be within the system call, I suggest you also look at kernel-related bulletin boards (KernelTrap.org is a good one).

otheus

09-30-2008

Thanks, Otheus and Jim. I got a quite detailed answer here:
semaphore access speed | KernelTrap

So the 2.6.9 kernel is not the best for modern hardware.

I appreciate everybody's time!

migurus

10-01-2008

Kudos to you for your tenacity! However, I don't think this is the end of it.

I did a little research on strcmp's answer. Kernel 2.6.9 was released in 2004 and is standard with RHEL 4, which shipped with glibc 2.3.4. Pentium IIIs were already old in 2004. RHEL 5 ships with kernel 2.6.18 and glibc 2.5.12.

So I did some benchmarks.

I followed strcmp's suggestion and used a "falling timer" method: the measured loop starts and ends when time() reports a change in seconds. There's a 10 to 100 ms variance on either side of the fall, so I averaged several runs. Then I divided the CPU speed (cycles/s) by the ops/s number to get "tics per op" (cycles per op).

2.6.18 / P3 / 800 MHz:              548300 ops/s (avg, 19 runs) = 1459 tics/op
2.6.18 / AMD Opteron 285 / 2.6 GHz: 1689138 ops/s (avg, 6 runs) = 1539 tics/op
2.6.18 / AMD Opteron 270 / 1.0 GHz: 974228 ops/s (avg, 7 runs) = 1026 tics/op
2.6.9 / Xeon / 3.6 GHz:             917196 ops/s (avg, 4 runs) = 3925 tics/op
2.6.9 / P3 / 1.25 GHz:              733927 ops/s (avg, 5 runs) = 1703 tics/op
2.6.9 / Xeon / 2.3 GHz:             1127894 ops/s (avg, 10 runs) = 2608 tics/op

For tics/op, smaller is better. So the 2.6.18 kernel is indeed faster than the 2.6.9 kernels (the only like-for-like CPU comparison is the P3: 1459 vs. 1703 tics/op). The Xeons are MUCH slower. Presumably the kernels were compiled for a lowest common denominator. No optimization flags were enabled, but the compilers differed: the 2.6.9 hosts used gcc 3.4.6, while the newer ones were built with gcc 4.1.1. It should also be noted that we have neither an AMD running 2.6.9 nor a Xeon running 2.6.18.

It may very well be that these kernels were not compiled optimally for the various architectures. Why the Xeons are so much slower is quite surprising, given their typical use as HPC components.

Regardless, none of these results seem to explain the fundamental question: Why is SCO so much faster??

Last edited by otheus; 10-08-2008 at 06:57 AM. Reason: I said the Xeon is much faster when it was much slower.

otheus

10-01-2008

Here is my code:

Code:

#include <stdio.h>
#include <unistd.h>      /* usleep() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>
#define NSEMS   2

/* Linux leaves it to the caller to define union semun for semctl() */
union semun {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* change this per CPU to run between 8 and 12 s */
static const long maxloop = 10000000;

/* One timed "op": semget() followed by semctl(GETALL).
   Returns 0 on success, -1 on failure. */
static int sem_ops(key_t key, union semun arg)
{
    int sid;

    if ((sid = semget(key, NSEMS, IPC_CREAT | 0777)) == -1) {
        perror("Can Not Get Semaphore ID");
        return -1;
    }
    if (semctl(sid, NSEMS, GETALL, arg) == -1) {
        perror("Can Not Get Semaphore Values");
        return -1;
    }
    return 0;
}

int main(void)
{
    time_t start, last, stop;
    long i;
    const int estimate = 100;   /* check the clock every "estimate" ops */
    unsigned short vals[NSEMS] = { 0, 0 };
    union semun arg;
    key_t key;

    arg.array = vals;
    if ((key = ftok("/tmp", 99)) == -1) {
        perror("ftok");
        return 1;
    }

    /* wait for the second to "fall" so timing starts on a boundary */
    start = time(NULL);
    while ((last = time(NULL)) == start)
        usleep(10);
    start = last;

    /* the first 7/8 of the loop runs without looking at the clock */
    for (i = 0; i < maxloop - maxloop/8; i++) {
        if (sem_ops(key, arg) == -1)
            return 1;
    }

/* do the last 1/8th until the second changes.
    If your processor reaches the maxloop before that,
    change the maxloop or the divisor or the "estimate" */

    stop = last = time(NULL);
    for (; i < maxloop; ++i) {
        if (!(i % estimate)) {
            last = time(NULL);
            if (last > stop) break;
            stop = last;
        }

        /* repeat semaphore ops */
        if (sem_ops(key, arg) == -1)
            return 1;
    }
    stop = last;

    if (stop <= start) {
        fprintf(stderr, "No time elapsed; increase maxloop\n");
        return 1;
    }
    printf("%.2f semop/s (%ld/%ld) [%d]\n",
           (double)i / (stop - start), i, (long)(stop - start), estimate);
    return 0;
}

otheus
