Test program running taking much more time on high end server T5440 than low end server T5220


 
# 8  
Old 07-02-2013
Well, as processes get dispatched to CPUs, some per-process state must be reloaded every time, like the VM translation cache (TLB), even if it is the same CPU as the last dispatch, because something else has been running there in the meantime, even if only some 'Idle Process'. RAM, however, is cached inside the CPU, possibly at two or more levels, by physical rather than virtual address, so that cache is process-insensitive; the farther away, CPU-wise, the next dispatch of that process is, the more cache misses occur until the cache is reloaded from RAM. Some CPUs have a variation on this scheme, where a VM translation miss is a first level cache miss.

Furthermore, many cache snoopers remove things from cache that are written by other parallel CPUs, so even if no other process has used a CPU since your process was last there, the cache hit rate is reduced for modified cache lines, which are often 16 or more bytes wide. Any modified byte on a line and the whole line is deleted from every other cache as that modified word makes its way to RAM.
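One common way to limit the line invalidation described above is to give each thread's frequently written data its own cache line. The following is only a minimal sketch: `PaddedCounter`, `bump`, `kSlots` and the 64-byte line size are all assumptions made for illustration (the real line size depends on the CPU), and `alignas` needs a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size. The real value is CPU-specific, so treat 64
// as a placeholder, not as a property of any particular server.
constexpr std::size_t kAssumedLineSize = 64;
constexpr std::size_t kSlots = 4;   // hypothetical number of threads

// Hypothetical per-thread counter, aligned (and therefore padded) so that
// it occupies one assumed-size line by itself. A write by one thread then
// does not invalidate the line that holds a neighbouring thread's counter.
struct alignas(kAssumedLineSize) PaddedCounter {
    unsigned long value;
};

static PaddedCounter counters[kSlots];  // one slot per thread, zero-initialized

// Each thread is expected to call this only with its own index.
static void bump(std::size_t thread_index)
{
    assert(thread_index < kSlots);
    ++counters[thread_index].value;
}
```

Without the alignment, several counters would typically share a line, and every increment by one thread could evict that line from the caches of the CPUs running the others.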

RAM is a lot slower than the first level cache, and caches get faster as you get closer to the CPU, so the cost of cache misses is huge in CPU cycles. That is why programs that run like lightning once started still take time to get loaded and produce the first loop's data.

Fetching from disk to RAM adds to that delay, since disk is also much slower than RAM. If it wasn't, disk I/O could stop the CPUs dead.
This User Gave Thanks to DGPickett For This Post:
# 9  
Old 07-02-2013
Quote:
Originally Posted by sanjay_singh85
I put the code you mentioned in the link into my test program.
Can you post your updated code ?
This User Gave Thanks to jlliagre For This Post:
# 10  
Old 07-03-2013
Please find below the updated code, which binds the process to a CPU. It takes around 107 seconds to complete the execution.

Code:
#include <cstdio>       // printf, perror
#include <iostream>     // cout, endl
#include <pthread.h>
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <time.h>
#include <unistd.h>
using namespace std;
#define NUM_OF_THREADS 20
struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Bind the calling LWP to the CPU it is currently running on.
void bindnow()
{
    processorid_t proc = getcpuid();
    if (processor_bind(P_LWPID, P_MYID, proc, 0))
        { printf("Warning: Binding failed\n"); }
    else
        { printf("Bound to CPU %i\n", proc); }
}

// Thread body: allocate and free a 2 KB struct six million times.
void *start_func(void *)
{
    long long i = 6000000;
    //bindnow();
    while (i--)
    {
        ABCDEF *sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t tid[50];
    struct timespec tps, tpe;
    // Record the start time (tpe is overwritten after the joins).
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0) || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    bindnow();
    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_create(&tid[i], NULL, start_func, NULL);
        cout << "Creating thread " << i << endl;
    }

    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_join(tid[i], NULL);
        cout << "Waiting for thread " << i << endl;
    }
    clock_gettime(CLOCK_REALTIME, &tpe);
    printf("%lu s, %lu ns\n", tpe.tv_sec-tps.tv_sec,
        tpe.tv_nsec-tps.tv_nsec);
    return 0;
}

Code:
[root]kansparc54144:/ /usr/sfw/bin/g++ -g -Wno-deprecated ss2.cpp -lpthread -lrt -o ss2
[root]kansparc54144:/ ./ss2
Bound to CPU 64
Creating thread 0
Creating thread 1
Creating thread 2
Creating thread 3
Creating thread 4
Creating thread 5
Creating thread 6
Creating thread 7
Creating thread 8
Creating thread 9
Creating thread 10
Creating thread 11
Creating thread 12
Creating thread 13
Creating thread 14
Creating thread 15
Creating thread 16
Creating thread 17
Creating thread 18
Creating thread 19
start_funcWaiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Waiting for thread 4
Waiting for thread 5
Waiting for thread 6
Waiting for thread 7
Waiting for thread 8
Waiting for thread 9
Waiting for thread 10
Waiting for thread 11
Waiting for thread 12
Waiting for thread 13
Waiting for thread 14
Waiting for thread 15
Waiting for thread 16
Waiting for thread 17
Waiting for thread 18
Waiting for thread 19
107 s, 416364341 ns

Also, I commented out the bindnow() call in main and added a bindnow() call in start_func instead, as shown below. It takes around 486 seconds to complete the execution.
Code:
void *start_func(void *)
{
    long long i = 6000000;
    bindnow();
    while (i--)
    {
        ABCDEF *sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return NULL;
}
int main(int argc, char* argv[])
{
    pthread_t tid[50];
    struct timespec tps, tpe;
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0) || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    //bindnow();
    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_create(&tid[i], NULL, start_func, NULL);
        cout << "Creating thread " << i << endl;
    }
    ...


Code:
[root]kansparc54144:/ /usr/sfw/bin/g++ -g -Wno-deprecated ss2.cpp -lpthread -lrt -o ss2
[root]kansparc54144:/ ./ss2
Creating thread Bound to CPU 64
0
Creating thread 1
Bound to CPU 192
Creating thread 2
Bound to CPU 0
Creating thread Bound to CPU 129
3
Creating thread 4
Bound to CPU 211
Creating thread 5
Bound to CPU 101
Creating thread 6
Bound to CPU 19
Creating thread 7
Bound to CPU 142
Creating thread 8
Bound to CPU 192
Creating thread 9
Bound to CPU 110
Creating thread 10
Bound to CPU 0
Creating thread 11
Bound to CPU 147
Creating thread 12
Bound to CPU 229
Creating thread 13
Bound to CPU 119
Creating thread 14
Bound to CPU 9
Creating thread 15
Bound to CPU 147
Creating thread 16
Bound to CPU 101
Creating thread 17
Bound to CPU 247
Creating thread 18
Bound to CPU 19
Creating thread 19
Bound to CPU 147
Waiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Waiting for thread 4
Waiting for thread 5
Waiting for thread 6
Waiting for thread 7
Waiting for thread 8
Waiting for thread 9
Waiting for thread 10
Waiting for thread 11
Waiting for thread 12
Waiting for thread 13
Waiting for thread 14
Waiting for thread 15
Waiting for thread 16
Waiting for thread 17
Waiting for thread 18
Waiting for thread 19
486 s, 3873742799 ns

Moderator's Comments:
Mod Comment Please use [code] tags, not [icode] ones

Last edited by Corona688; 07-03-2013 at 12:26 PM..
# 11  
Old 07-03-2013
So.... When you don't call bindnow() it takes many times longer?
# 12  
Old 07-08-2013
Hi All,

Thanks a lot for the replies. They helped me a lot in finding the issue I was facing with my application. It was due to the process being spread across multiple processors.
I bound my application to a processor set with the following code:
Code:
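#include <iostream>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/processor.h>
#include <sys/pset.h>
using namespace std;

// psid and ci are not declared in the posted fragment; file-scope
// declarations are assumed here.
psetid_t      psid;
processorid_t ci;

void ProcessorSetAdd()
{
    /* Create a new, empty processor set */
    if (pset_create(&psid) != 0)
    {
        cout << "pset_create() Failed" << endl;
    }
    /* Assign CPUs 8 through 15 to the processor set */
    //for(ci=0; ci < 63; ci++)
    for (ci = 8; ci < 16; ci++)
    {
        if (pset_assign(psid, ci, NULL) != 0)
        {
            cout << "pset_assign() Failed for " << ci << endl;
        }
    }
    /* Bind the current process to the processor set */
    if (pset_bind(psid, P_PID, P_MYID, NULL) != 0)
    {
        cout << "pset_bind() Failed" << endl;
    }
    /* Report how many CPUs are in the set and the set type */
    int pType;
    unsigned int noOfCPU = 0;
    processorid_t cpuList;
    pset_info(psid, &pType, &noOfCPU, &cpuList);
    cout << "No of CPU in List is: " << noOfCPU << endl;
    cout << "TYPE OF CPU: " << pType << endl;
}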

With this change the T5440 gave the same performance as the T5220.

Thanks a lot once again, everybody.

regards,
sanjay
# 13  
Old 07-10-2013
Try using a multiple (2x is usually good) of the CPU core count for the child thread count, like 64. That way, the work divides evenly across the CPU cores whether there are 8 or 32 of them, and if any thread blocks, there is another one ready to use that core.
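As a rough sketch of that suggestion, the thread count can be derived at run time instead of being hard-coded. `worker_count` is a hypothetical helper, not something from the test program. Note also that `sysconf(_SC_NPROCESSORS_ONLN)` reports online processors as the operating system sees them, which on a multithreaded chip is likely the number of hardware threads, not physical cores, so the multiplier may need adjusting.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper: choose a worker-thread count as a multiple of the
// number of online processors reported by the operating system.
static long worker_count(long multiplier)
{
    assert(multiplier >= 1);
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1)
        cpus = 1;            // fall back to one if the query fails
    return cpus * multiplier;
}
```

In the test program, `worker_count(2)` would take the place of `NUM_OF_THREADS`. The `tid` array there holds only 50 entries, so it would have to be sized from the same value.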
# 14  
Old 07-10-2013
I have a couple of observations:

1. The only thing the test program is testing is the ability of the standard malloc()/free() implementation to repeatedly allocate then free then allocate again the same blocks of memory to multiple threads. I question the usefulness of such a test.

2. The calculation of time spent ignores nanosecond rollover.
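The rollover in point 2 can be handled by borrowing one second whenever the nanosecond difference comes out negative. This is a minimal sketch, and `elapsed` is a hypothetical helper name, not part of the posted program.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: returns end - start with tv_nsec normalized into
// the range [0, 999999999]. Assumes end is not earlier than start.
static timespec elapsed(const timespec &start, const timespec &end)
{
    timespec diff;
    diff.tv_sec  = end.tv_sec  - start.tv_sec;
    diff.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (diff.tv_nsec < 0) {
        // The nanosecond field rolled over: borrow one second.
        diff.tv_sec  -= 1;
        diff.tv_nsec += 1000000000L;
    }
    assert(diff.tv_nsec >= 0 && diff.tv_nsec < 1000000000L);
    return diff;
}
```

The test program could then print `elapsed(tps, tpe)` with something like `printf("%ld s, %ld ns\n", (long)d.tv_sec, d.tv_nsec)`. The "486 s, 3873742799 ns" result above shows the symptom: the nanosecond field is larger than one second, which is what a negative difference looks like when printed with `%lu`, assuming `unsigned long` was 32 bits in that build.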