Test program running taking much more time on high end server T5440 than low end server T5220


 
# 1  
Old 06-26-2013

Hi all,
I wrote the following program and ran it on both a T5440 [1.4 GHz, 95 GB RAM, 32 cores, 256 logical (virtual) processors] and a T5220 [UltraSPARC-T2 (chipid 0, clock 1165 MHz), 8 GB RAM, 1 core, 8 virtual processors], both on the same OS version. I found that the T5440 takes more time than the T5220. Details are below.

test1.cpp

Code:
#include <iostream>
#include <pthread.h>

using namespace std;
#define NUM_OF_THREADS 20

struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Each thread allocates and frees a 2 KB object 6000 times.
void *start_func(void *)
{
    long long i = 6000;
    while (i--)
    {
        ABCDEF *sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t tid[NUM_OF_THREADS];
    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_create(&tid[i], NULL, start_func, NULL);
        cout << "Creating thread " << i << endl;
    }

    // Wait for every worker thread to finish.
    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_join(tid[i], NULL);
        cout << "Waiting for thread " << i << endl;
    }
    return 0;
}

On the T5440 the program takes:
real 0.78
user 3.94
sys 0.05

On the T5220 the program takes:
real 0.23
user 1.43
sys 0.03


So the T5440, which is the high-end server, takes more than 3 times as long as the T5220, which is the low-end server.

However, I have one more observation. I tried the following program :

test2.cpp
Code:
#include <iostream>
#include <pthread.h>

using namespace std;
#define NUM_OF_THREADS 20

struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Single-threaded version: the same allocate/free pair, 6,000,000 times.
int main(int argc, char* argv[])
{
    long long i = 6000000;
    while (i--)
    {
        ABCDEF *sdf = new ABCDEF;
        delete sdf;
        sdf = NULL;
    }
    return 0;
}

In this case the T5440 server is faster than the T5220 server.
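To compare the two cases on equal terms, the allocate/free loop can be timed inside one program for any thread count. The following is only a sketch: the names time_alloc_loop, alloc_loop, WorkerArg and now_seconds are made up for illustration and are not part of test1.cpp or test2.cpp, and the memset is an addition so that each block is actually written to.

```cpp
#include <pthread.h>
#include <sys/time.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Hypothetical helper type: work assigned to each thread.
struct WorkerArg {
    long iterations;   // allocate/free pairs this thread performs
};

// Same allocate/free pair as in the test programs, plus a memset that
// writes to the freshly allocated block.
static void *alloc_loop(void *p)
{
    long n = static_cast<WorkerArg *>(p)->iterations;
    while (n-- > 0) {
        ABCDEF *sdf = new ABCDEF;
        std::memset(sdf, 0, sizeof(ABCDEF));
        delete sdf;
    }
    return NULL;
}

// Wall-clock time in seconds.
static double now_seconds()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

// Run `threads` workers, each doing `iterations` allocate/free pairs.
// Returns elapsed wall-clock seconds, or -1.0 if a thread could not be created.
double time_alloc_loop(int threads, long iterations)
{
    assert(threads >= 0);
    std::vector<pthread_t> tid(threads);
    WorkerArg arg = { iterations };
    double start = now_seconds();
    int created = 0;
    for (; created < threads; ++created)
        if (pthread_create(&tid[created], NULL, alloc_loop, &arg) != 0)
            break;
    for (int i = 0; i < created; ++i)
        pthread_join(tid[i], NULL);
    double elapsed = now_seconds() - start;
    return created == threads ? elapsed : -1.0;
}
```

Calling time_alloc_loop(20, 6000) roughly corresponds to test1.cpp and time_alloc_loop(1, 6000000) to test2.cpp (here the single loop runs in a spawned thread rather than in main). Stepping the thread count from 1 up to 20 on each server would show at which point the T5440 stops scaling.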

Could anyone please help me find the exact reason for this behaviour? My application is slow on this T5440 server as well.

Thanks in advance !!!

regards,
Sanjay

Last edited by Scrutinizer; 06-27-2013 at 03:12 PM.. Reason: code tags
# 2  
Old 06-27-2013
Did you compile for maximum speed on this platform on the machine that runs slower? The SunWSPro compiler has a lot of optimizations, some of them very architecture-specific.

Multicore CPUs can be slower on a single thread than CPUs that have not been trimmed down to make room for so many cores. Gamers still like 1-2 core machines, as games parallelize poorly. If you run 32 or 64 copies at once, you might see the difference.

Then there is the question of what is running concurrently on each server. Another application may be competing for CPU, RAM, disk or network bandwidth. Some of this can be tricky to observe.
# 3  
Old 06-28-2013
Thanks a lot for the reply.

I compiled the test program on both servers and ran each binary on the server where it was built. Also, no other application is running on either of these servers.

Additionally, I tried one more experiment and found the following results.

The attached program (ABC.cpp) was compiled with: /usr/sfw/bin/g++ -g -Wno-deprecated ABC.cpp -lpthread and run with: time -p ./a.out

High Performance Architecture (kansparc54144) - root/labbws54144

4 socket(s)
32 core(s)
256 logical (virtual) processor(s)
The physical processor has 64 virtual processors (0-63)
UltraSPARC-T2+ (chipid 0, clock 1414 MHz)
The physical processor has 64 virtual processors (64-127)
UltraSPARC-T2+ (chipid 1, clock 1414 MHz)
The physical processor has 64 virtual processors (128-191)
UltraSPARC-T2+ (chipid 2, clock 1414 MHz)
The physical processor has 64 virtual processors (192-255)
UltraSPARC-T2+ (chipid 3, clock 1414 MHz)
Memory size: 98016 Megabytes

SunOS Generic_144488-17 sun4v sparc SUNW,T5440

Case 1: Memory operations (allocation, set, de-allocation), with line 129 of the test program commented out
real 0.78
user 3.94
sys 0.05

Case 2: Memory operations (allocation, set, de-allocation) and computation (matrix multiplication)
real 14.54
user 280.18
sys 0.07



Low Performance Architecture (kansparc6744) - root/6744@labbws

1 socket(s)
1 core(s)
8 logical (virtual) processor(s)
The physical processor has 8 virtual processors (0-7)
UltraSPARC-T2 (chipid 0, clock 1165 MHz)
Memory size: 8192 Megabytes

SunOS 5.10 Generic_144488-17 sun4v sparc SUNW,SPARC-Enterprise-T5220

Case 1: Memory operations (allocation, set, de-allocation), with line 129 of the test program commented out
real 0.23
user 1.43
sys 0.03

Case 2: Memory operations (allocation, set, de-allocation) and computation (matrix multiplication)
real 66.50
user 525.30
sys 0.44

MY CONCLUSION:
The high-performance architecture performs well in case 2 but badly in case 1 (???).

I don't understand this behaviour. Could you please provide some information about it?
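One way to read the timings above is to divide CPU time (user + sys) by real time, which gives the average number of virtual processors kept busy. The sketch below only does that arithmetic on the posted numbers; the names Timing, busy_cpus and slowdown are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// One "time -p" result, in seconds.
struct Timing {
    double real, user, sys;
};

// Numbers copied from the measurements posted above.
const Timing T5440_CASE1 = { 0.78, 3.94, 0.05 };
const Timing T5440_CASE2 = { 14.54, 280.18, 0.07 };
const Timing T5220_CASE1 = { 0.23, 1.43, 0.03 };
const Timing T5220_CASE2 = { 66.50, 525.30, 0.44 };

// Average number of virtual processors kept busy:
// CPU seconds consumed per second of wall-clock time.
double busy_cpus(const Timing &t)
{
    assert(t.real > 0.0);
    return (t.user + t.sys) / t.real;
}

// How many times longer `a` took than `b` in wall-clock time.
double slowdown(const Timing &a, const Timing &b)
{
    assert(b.real > 0.0);
    return a.real / b.real;
}

// Print one line of the comparison.
void report(const char *label, const Timing &t)
{
    std::printf("%-12s real %7.2f  busy CPUs %5.2f\n",
                label, t.real, busy_cpus(t));
}
```

With these numbers, case 2 keeps about 19.3 virtual processors busy on the T5440 against about 7.9 on the T5220 (which has only 8), and the T5220 takes about 4.6 times as long. In case 1 the figures are about 5.1 and 6.3 busy processors, and the T5440 takes about 3.4 times as long while also consuming more total CPU time for the same work.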


regards,
Sanjay
# 4  
Old 06-28-2013
Your program is CPU bound and doesn't do any I/O, so it doesn't take much advantage of the CMT architecture.

The difference in results might be due to threads migrating from one core to another.

Have a look at this blog for a piece of code you could add so that your threads stay bound to the same CPU during their execution:

https://blogs.oracle.com/d/entry/bin...rent_processor
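For readers who cannot open the link, here is a minimal sketch of what binding a thread to one CPU can look like. It is not the code from the blog: the function names bind_self_to_cpu and bound_start_func are made up, the Solaris branch assumes processor_bind() with P_LWPID and P_MYID, and the Linux branch is only there so the sketch also builds elsewhere.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#if defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Bind the calling thread to one virtual processor (cpu must be >= 0).
// Returns 0 on success and -1 on failure or on an unsupported platform.
int bind_self_to_cpu(int cpu)
{
    assert(cpu >= 0);
#if defined(__sun)
    // P_LWPID with P_MYID applies the binding to the calling LWP only.
    return processor_bind(P_LWPID, P_MYID, cpu, NULL) == 0 ? 0 : -1;
#elif defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0 ? 0 : -1;
#else
    (void)cpu;
    return -1;
#endif
}

// Hypothetical replacement for start_func: pthread_create passes a pointer
// to the CPU id this thread should stay on.
void *bound_start_func(void *arg)
{
    int cpu = *static_cast<int *>(arg);
    bind_self_to_cpu(cpu);   // a failed bind is ignored; the thread stays unbound
    // ... the allocate/free loop from start_func would go here ...
    return NULL;
}
```

In main, each thread would get its own CPU id (for example an int cpus[NUM_OF_THREADS] array with &cpus[i] passed as the last argument of pthread_create). Which ids are valid depends on the machine; on Solaris, psrinfo lists them.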
# 5  
Old 07-01-2013
Thanks a lot, jlliagre, for the reply!

Your suggestion was very helpful for my analysis. I put the code from the link into my test program. The results did change after that: the program took 107 seconds on the high-end server (32 cores, 4 sockets, 256 virtual CPUs, 95 GB RAM, 1414 MHz) and 130 seconds on the other server (8 cores, 1 socket, 64 virtual CPUs, 32 GB RAM, 1165 MHz).

However, I still don't understand, and in fact am more confused about, why binding my test program to a single CPU improves performance on the multi-core, multi-processor high-end server.

I also don't understand how thread migration degrades performance on a multi-core, multi-processor machine.
Could you please help me understand the reason?

Thanks a lot for your time.

regards,
Sanjay Singh
# 6  
Old 07-01-2013
If a thread moves from one multicore chip to another, the cache there is empty. Often, everything one chip writes, the other has to discard from its cache. There may be similar problems with the VM translation cache (TLB).

Last edited by DGPickett; 07-01-2013 at 02:12 PM..
# 7  
Old 07-02-2013
Quote:
Originally Posted by sanjay_singh85
However, I still don't understand, and in fact am more confused about, why binding my test program to a single CPU improves performance on the multi-core, multi-processor high-end server.
It takes time for processes to move around from CPU to CPU to CPU to CPU. Cache must be copied, RAM perhaps re-fetched. Prevent it from moving and these losses are minimized.
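To check whether threads really do move around, a thread can ask which CPU it is on after every iteration and count the changes. This is only a sketch under assumptions: current_cpu and count_migrations are made-up names, the Solaris branch assumes getcpuid() from <sys/processor.h>, and the Linux branch uses sched_getcpu(). Sampling once per iteration can miss moves that happen and reverse between two checks.

```cpp
#include <cassert>
#if defined(__sun)
#include <sys/processor.h>   // getcpuid()
#elif defined(__linux__)
#include <sched.h>           // sched_getcpu()
#endif

struct ABCDEF {
    char A[1024];
    char B[1024];
};

// Id of the CPU the calling thread is running on right now,
// or -1 if the platform offers no way to ask.
int current_cpu()
{
#if defined(__sun)
    return getcpuid();
#elif defined(__linux__)
    return sched_getcpu();
#else
    return -1;
#endif
}

// Run the allocate/free loop and count how often the thread is found on a
// different CPU than at the previous check.
long count_migrations(long iterations)
{
    assert(iterations >= 0);
    long migrations = 0;
    int last = current_cpu();
    for (long i = 0; i < iterations; ++i) {
        ABCDEF *sdf = new ABCDEF;
        sdf->A[0] = 1;          // touch the block so it is really used
        delete sdf;
        int cpu = current_cpu();
        if (cpu != last) {
            ++migrations;
            last = cpu;
        }
    }
    return migrations;
}
```

Running count_migrations(6000) inside each thread, once unbound and once after binding the thread to a CPU, would show whether the unbound run on the T5440 really involves many moves.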