Test program running taking much more time on high end server T5440 than low end server T5220

07-02-2013


Well, as processes get dispatched to CPUs, some per-process state must be reloaded every time, such as the VM translation cache (TLB), even if it is the same CPU as the last dispatch, because something else has been running there in the meantime, even if it is only some 'Idle Process'. RAM, however, is cached inside the CPU, possibly at two or more levels, by physical rather than virtual address, so that cache is process-insensitive. The farther away, CPU-wise, the next dispatch of that process is, the more cache misses occur until the cache is reloaded from RAM. Some CPUs have a variation on this scheme, where a VM translation miss is a first-level cache miss.

Furthermore, many cache snoopers remove lines from a cache when they are written by other CPUs running in parallel, so even if no other process has used a CPU since your process was last there, the cache hit rate is reduced for modified cache lines, which are often 16 or more bytes wide. If any byte on a line is modified, the whole line is deleted from every other cache as the modified word makes its way to RAM.
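To make the whole-line invalidation concrete, here is a minimal sketch of keeping per-thread data on separate cache lines. The 64-byte line size and the names `kAssumedLineSize`, `Packed`, `Padded` and `share_line` are assumptions for illustration, not something from this thread; the real line size depends on the CPU.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size. 64 bytes is common, but check your CPU.
constexpr std::size_t kAssumedLineSize = 64;

// Two counters packed next to each other: both fall on the same line,
// so a write to one by CPU A invalidates CPU B's cached copy of the other.
struct alignas(kAssumedLineSize) Packed {
    long a;
    long b;
};

// Each counter is aligned to its own assumed line, so a write to one
// does not knock the other out of a neighbouring CPU's cache.
struct Padded {
    alignas(kAssumedLineSize) long a;
    alignas(kAssumedLineSize) long b;
};

// True if both addresses fall within the same assumed cache line.
bool share_line(const void* p, const void* q) {
    const std::uintptr_t lp = reinterpret_cast<std::uintptr_t>(p) / kAssumedLineSize;
    const std::uintptr_t lq = reinterpret_cast<std::uintptr_t>(q) / kAssumedLineSize;
    return lp == lq;
}
```

With these definitions, `share_line(&p.a, &p.b)` is true for a `Packed p` and false for a `Padded q`. The padding costs memory, so it is usually worth it only for data that different threads write frequently.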

RAM is a lot slower than the first level cache, and caches get faster as you get closer to the CPU, so the cost of cache misses is huge in CPU cycles. That is why programs that run like lightning once started still take time to get loaded and produce the first loop's data.

Fetching from disk to RAM adds to that delay, since disk is also much slower than RAM. If it weren't, disk I/O could stop the CPUs dead.

DGPickett

07-02-2013


Quote:

Originally Posted by sanjay_singh85

I put the code you mentioned in the link into my test program.

Can you post your updated code?

jlliagre

07-03-2013


Please find below the updated code, which binds the process to a CPU. It takes around 107 seconds to complete the execution.

Code:

#include <cstdio>      // printf, perror
#include <iostream>    // cout, endl
#include <pthread.h>
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <time.h>
#include <unistd.h>
using namespace std;
#define NUM_OF_THREADS 20
struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Bind the calling LWP (thread) to the CPU it is currently running on.
void bindnow()
{
    processorid_t proc = getcpuid();
    if (processor_bind(P_LWPID, P_MYID, proc, 0))
        { printf("Warning: Binding failed\n"); }
    else
        { printf("Bound to CPU %i\n", proc); }
}

// Thread body: allocate and free a 2 KB block six million times.
void *start_func(void *)
{
    long long i = 6000000;
    //bindnow();
    while(i--)
    {
        ABCDEF* sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t tid[50];
    struct timespec tps, tpe;
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0) || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    bindnow();
    for(int i=0; i<NUM_OF_THREADS; i++)
    {
        pthread_create(&tid[i], NULL, start_func, NULL);
        cout<<"Creating thread " << i <<endl;
    }

    for(int i=0; i<NUM_OF_THREADS; i++)
    {
        pthread_join(tid[i], NULL);
        cout<<"Waiting for thread " << i <<endl;
    }
    clock_gettime(CLOCK_REALTIME, &tpe);
    // Prints the raw field differences; the nanosecond value is not normalized.
    printf("%lu s, %lu ns\n", tpe.tv_sec-tps.tv_sec,
        tpe.tv_nsec-tps.tv_nsec);
}

Code:

[root]kansparc54144:/ /usr/sfw/bin/g++ -g -Wno-deprecated ss2.cpp -lpthread -lrt -o ss2
[root]kansparc54144:/ ./ss2
Bound to CPU 64
Creating thread 0
Creating thread 1
Creating thread 2
Creating thread 3
Creating thread 4
Creating thread 5
Creating thread 6
Creating thread 7
Creating thread 8
Creating thread 9
Creating thread 10
Creating thread 11
Creating thread 12
Creating thread 13
Creating thread 14
Creating thread 15
Creating thread 16
Creating thread 17
Creating thread 18
Creating thread 19
start_funcWaiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Waiting for thread 4
Waiting for thread 5
Waiting for thread 6
Waiting for thread 7
Waiting for thread 8
Waiting for thread 9
Waiting for thread 10
Waiting for thread 11
Waiting for thread 12
Waiting for thread 13
Waiting for thread 14
Waiting for thread 15
Waiting for thread 16
Waiting for thread 17
Waiting for thread 18
Waiting for thread 19
107 s, 416364341 ns

Also, I commented out the bindnow() call in main and added a bindnow() call in start_func, as shown below. It takes around 486 seconds to complete the execution.

Code:

void *start_func(void *)
{
    long long i = 6000000;
    bindnow();
    while(i--)
    {
        ABCDEF* sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t tid[50];
    struct timespec tps, tpe;
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0) || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    //bindnow();
    for(int i=0; i<NUM_OF_THREADS; i++)
    {
        pthread_create(&tid[i], NULL, start_func, NULL);
        cout<<"Creating thread " << i <<endl;
    }
    ...

Code:

[root]kansparc54144:/ /usr/sfw/bin/g++ -g -Wno-deprecated ss2.cpp -lpthread -lrt -o ss2
[root]kansparc54144:/ ./ss2
Creating thread Bound to CPU 64
0
Creating thread 1
Bound to CPU 192
Creating thread 2
Bound to CPU 0
Creating thread Bound to CPU 129
3
Creating thread 4
Bound to CPU 211
Creating thread 5
Bound to CPU 101
Creating thread 6
Bound to CPU 19
Creating thread 7
Bound to CPU 142
Creating thread 8
Bound to CPU 192
Creating thread 9
Bound to CPU 110
Creating thread 10
Bound to CPU 0
Creating thread 11
Bound to CPU 147
Creating thread 12
Bound to CPU 229
Creating thread 13
Bound to CPU 119
Creating thread 14
Bound to CPU 9
Creating thread 15
Bound to CPU 147
Creating thread 16
Bound to CPU 101
Creating thread 17
Bound to CPU 247
Creating thread 18
Bound to CPU 19
Creating thread 19
Bound to CPU 147
Waiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Waiting for thread 4
Waiting for thread 5
Waiting for thread 6
Waiting for thread 7
Waiting for thread 8
Waiting for thread 9
Waiting for thread 10
Waiting for thread 11
Waiting for thread 12
Waiting for thread 13
Waiting for thread 14
Waiting for thread 15
Waiting for thread 16
Waiting for thread 17
Waiting for thread 18
Waiting for thread 19
486 s, 3873742799 ns

Moderator's Comments:

Please use [code] tags, not [icode] ones

Last edited by Corona688; 07-03-2013 at 12:26 PM..

sanjay_singh85

07-03-2013


So.... When you don't call bindnow() it takes many times longer?

Corona688

07-08-2013


Hi All,

Thanks a lot for the replies. They helped me a lot in finding the issue I was facing with my application. It was due to the multiple processors.
I bound my application to a processor set with the following code:

Code:

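#include <iostream>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/pset.h>
using namespace std;

psetid_t psid;  // processor set ID (its declaration was not shown in the original post)

void ProcessorSetAdd()
{
    if (pset_create(&psid) != 0)
    {
        cout<<"pset_create() Failed" <<endl;
        return;  // psid is not valid, so nothing below can work
    }
    /* Assign CPUs 8 through 15 to the processor set */
    //for(ci=0; ci < 63; ci++)
    for(processorid_t ci=8; ci < 16; ci++)
    {
        if (pset_assign(psid, ci, NULL) != 0)
        {
            cout<<"pset_assign() Failed for " << ci <<endl;
        }
    }
    /* Bind the current process to the processor set */
    if (pset_bind(psid, P_PID, P_MYID, NULL) != 0)
    {
        cout<<"pset_bind() Failed" <<endl;
    }
    /* Report the set's type and CPU count. noOfCPU starts at 0, so no CPU IDs are copied into cpuList. */
    int pType;
    unsigned int noOfCPU = 0;
    processorid_t cpuList;
    pset_info(psid, &pType, &noOfCPU, &cpuList);
    cout<< "No of CPU in List is " << noOfCPU <<endl;
    cout<< "TYPE OF CPU " << pType <<endl;
}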

It gave the same performance as the T5220 server.

Thanks a lot once again, everybody.

regards,
sanjay

sanjay_singh85

07-10-2013


Try using a multiple (2x is usually good) of the CPU core count for the child thread count, like 64. That way, the work divides equally among all CPU cores, whether there are 8 or 32, and if any thread blocks, there is another to use the CPU core.
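A rough sketch of deriving the thread count at run time instead of hard-coding it. The helper names `online_cpu_count` and `suggested_thread_count` are made up for illustration. Note the assumption that `sysconf(_SC_NPROCESSORS_ONLN)` is available; on chip-multithreaded machines such as the T-series it reports hardware threads (virtual CPUs), not cores, so you may want to divide by the threads per core first.

```cpp
#include <unistd.h>

// Number of CPUs the OS currently reports as online (may be -1 on error).
long online_cpu_count() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Thread count as a multiple of the CPU count, clamped so that a failed
// sysconf() call or a bad multiple still yields at least one thread.
long suggested_thread_count(long cpus, long multiple) {
    if (cpus < 1) cpus = 1;
    if (multiple < 1) multiple = 1;
    return cpus * multiple;
}
```

For example, `suggested_thread_count(32, 2)` gives 64, and `suggested_thread_count(online_cpu_count(), 2)` adapts to whatever machine the program lands on.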

DGPickett

07-10-2013


I have a couple of observations:

1. The only thing the test program is testing is the ability of the standard malloc()/free() implementation to repeatedly allocate then free then allocate again the same blocks of memory to multiple threads. I question the usefulness of such a test.

2. The calculation of time spent ignores nanosecond rollover.
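On the second point, a minimal sketch of a timespec difference that handles the rollover by borrowing one second when the nanosecond field goes negative. The function name `elapsed` is hypothetical, and the sketch assumes the end time is not earlier than the start time.

```cpp
#include <time.h>

// Returns end - start with the result normalised so that
// 0 <= tv_nsec < 1000000000. Assumes end is not earlier than start.
timespec elapsed(const timespec& start, const timespec& end) {
    timespec d;
    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0) {
        // Borrow one second to bring the nanosecond field back in range.
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}
```

The result could then be printed with something like `printf("%ld s, %ld ns\n", (long)d.tv_sec, (long)d.tv_nsec);` so that the nanosecond field never exceeds one second.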

achenle
