Benchmarking a Beowulf Cluster
Hi guys. I am trying to test my university's cluster using the Intel Linpack benchmarking software. Let me say from the get-go that I am a Linux novice and only recently learnt some Linux commands to have a play around, so if you can, please keep the language simple.
I have run a Linpack test, as can be seen from the terminal copy and paste below. However, I believe the test I ran only measures the average GFLOPS of the node I am currently on. What if I want to test the strength of a cluster of 3 nodes all working together? Would anyone happen to know what I would need to type at the command line to run a test across the 3 nodes that I have logged into (all clustered by the university already)? The nodes are all Xeon 64s with 4 GB of RAM each. Would really appreciate any help.
Code:
    [******@beowulf linpack]$ ./xlinpack_xeon64
    Input data or print help ? Type [data]/help :

    Number of equations to solve (problem size): 20000
    Leading dimension of array: 20000
    Number of trials to run: 4
    Data alignment value (in Kbytes): 4
    Current date/time: Wed Apr 29 13:50:57 2009

    CPU frequency: 3.400 GHz
    Number of CPUs: 4
    Number of threads: 4

    Parameters are set to:

    Number of tests                             : 1
    Number of equations to solve (problem size) : 20000
    Leading dimension of array                  : 20000
    Number of trials to run                     : 4
    Data alignment value (in Kbytes)            : 4

    Maximum memory requested that can be used = 3200404096, at the size = 20000

    ============= Timing linear equation system solver =================

    Size   LDA    Align. Time(s)  GFlops   Residual      Residual(norm)
    Error: info returned = 1
    20000  20000  4      237.211  22.4869  4.455000e-10  3.943651e-02
    Error: info returned = 1
    20000  20000  4      236.686  22.5368  4.455000e-10  3.943651e-02
    Error: info returned = 1
    20000  20000  4      235.285  22.6710  4.455000e-10  3.943651e-02
    Error: info returned = 1
    20000  20000  4      237.404  22.4686  4.455000e-10  3.943651e-02

    Performance Summary (GFlops)

    Size   LDA    Align. Average  Maximal
    20000  20000  4      22.5408  22.6710

    End of tests

Last edited by Neo; 05-11-2009 at 04:15 PM. Reason: code tags
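For readers wondering where the GFlops column in the output above comes from, here is a rough sketch in Python. It assumes the benchmark uses the conventional LINPACK operation count of 2/3·n³ + 2·n² for an n×n solve; that formula is an assumption on my part, not something stated in the output, and the function names are made up for illustration. It also shows that an array of 20000 × 20000 doubles accounts for nearly all of the "Maximum memory requested" figure.

```python
def linpack_flops(n: int) -> float:
    """Nominal operation count for one dense n x n solve.

    Assumption: the conventional LINPACK count, 2/3*n^3 + 2*n^2.
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Billions of floating-point operations per second for one trial."""
    return linpack_flops(n) / seconds / 1e9


def matrix_bytes(n: int, lda: int) -> int:
    """Storage for an lda x n array of 8-byte doubles."""
    return 8 * lda * n


if __name__ == "__main__":
    n = 20000
    # Trial times copied from the Time(s) column of the post above.
    for seconds in (237.211, 236.686, 235.285, 237.404):
        print(f"{seconds:8.3f} s -> {gflops(n, seconds):.4f} GFlops")
    print(matrix_bytes(n, n), "bytes for the matrix alone")
```

Under that assumption the computed rates land very close to the reported ones, which suggests the figure describes one node's four cores and nothing more.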
|
|||||
|
First, you should install the ATLAS and ScaLAPACK libraries, and make sure they are on each node.
Second, you need to install one of the MPI packages (OpenMPI, LAM/MPI, MPICH, etc.); the run-times need to be on each node, and the compiler libraries and tools need to be on one node.
Third, you need to recompile for MPI and ATLAS. I believe linpack uses a configure script in which you tell it to use MPI or something like that.
Fourth, for these benchmarks, you should keep Linux from swapping; this ensures that linpack doesn't start swapping and killing performance. (Do this with "sysctl vm.swappiness=0" before the benchmark, and "=1" after. Strictly speaking, swappiness=0 only tells the kernel to avoid swapping; "swapoff -a" turns swap off altogether.) (If it runs out of memory, the problem size is too large, and the process fails.)
Next, start out with a simple test to make sure your hpl + mpi setup is working. You'll need a dummy config file like this: Code:
    Our cluster benchmark
    My university lab
    HPL.out
    6
    1
    400
    1
    50
    1
    1
    4
    3
    -1
    1 # of panel fact
    0
    1 # of recursive stopping criterium
    2
    1 # of panels in recursion
    2
    1 # of recursive panel fact.
    0
    1 # of Bcasts
    1
    1 # of Lookahead depths
    0
    2
    60
    0
    0
    0
    8 # alignment of double

Once you have that working, you're ready to tune the HPL suite: run a series of tests, each with a different configuration. One configuration file does this: the linpack program runs one test for every combination of the parameter values listed in the file. A guide to this format can be found here, but here's what I suggest you start with:
Code:
    Our cluster benchmark
    My university lab
    HPL.out
    6
    7
    100 200 400 800 1600 3200 6400
    5
    50 100 150 200 250
    1
    4
    12 4 3 1
    1 3 4 12
    -1
    3 # of panel fact
    0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
    4 # of recursive stopping criterium
    1 2 4 8 NBMINs (>= 1)
    3 # of panels in recursion
    2 3 4 NDIVs
    3 # of recursive panel fact.
    0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
    5 # of Bcasts
    0 1 2 3 4 5
    3 # Lookaheads
    0 1 2
    2 # SWAP type
    60 # SWAP=2 threshold
    0
    0
    1
    8

After this, look for the top 8 or 16 results, and refine the config file to use only the parameters that produced them. NOW you can start performance tuning the cluster. Most critically, you will want to (a) tune the TCP/IP kernel parameters, (b) disable all non-essential Linux processes on all nodes, and (c) tune the switch parameters for the cluster ports, i.e., disable auto-negotiate and maybe tune the messaging queues (some switches use different types of service and have small queues for each one; you want one large queue for all TOS).
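To get a feel for how many tests the tuning file above asks for, here is a small Python sketch that multiplies the declared counts together. The dictionary and function names are made up for illustration. Two assumptions: HPL runs every combination (as the post says), and it reads only as many values per line as the count line declares, so the Bcasts line is taken as 5 even though six values are listed.

```python
from math import prod

# Counts as declared on the "# of ..." lines of the tuning file above.
# Assumption: BCASTs is taken as 5 (the declared count), although the
# file lists six values (0 1 2 3 4 5).
DECLARED_COUNTS = {
    "Ns": 7,
    "NBs": 5,
    "process grids": 4,
    "PFACTs": 3,
    "NBMINs": 4,
    "NDIVs": 3,
    "RFACTs": 3,
    "BCASTs": 5,
    "DEPTHs": 3,
}


def count_runs(counts: dict) -> int:
    """Number of tests if one test is run per combination of values."""
    return prod(counts.values())


if __name__ == "__main__":
    print(count_runs(DECLARED_COUNTS), "combinations")
```

The total is large, which is why the problem sizes in that file are kept small and why the next step is to narrow the file down to the best-performing values.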
|
||||
|
Thanks for the response otheus.
Everything seems to be working except the tuning of the HPL.dat. I keep getting errors such as:
    HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
    >>> Need at least 8 processes for these tests <<<
    HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
    >>> Illegal input in file HPL.dat. Exiting ... <<<

That is from trying to run it on 8 cores across 2 nodes. I have also tried the HPL.dat you provided, and I get a similar error, except that it says "Need at least 12 processes". Do you know what causes these errors? I have a hosts file in the same directory with the names of the two nodes on which I wish to run the tests. At the command line I am typing:
    mpirun -np 8 -machinefile hosts xhpl_em64t

where the hosts file has the names:
    machine1
    machine2

Each machine is a 3 GHz QX6850 Core 2 Extreme (quad core) with 4 GB RAM. The dat file being used for two nodes is:
Code:
    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    8            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    29184        Ns
    1            # of NBs
    128          NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    2            Ps
    4            Qs
    16.0         threshold
    1            # of panel fact
    2            PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    4            NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    1            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    1            DEPTHs (>=0)
    2            SWAP (0=bin-exch,1=long,2=mix)
    64           swapping threshold
    0            L1 in (0=transposed,1=no-transposed) form
    0            U in (0=transposed,1=no-transposed) form
    1            Equilibration (0=no,1=yes)
    8            memory alignment in double (> 0)
    ##### This line (no. 32) is ignored (it serves as a separator). ######
    0            Number of additional problem sizes for PTRANS
    1200 10000 30000   values of N
    0            number of additional blocking sizes for PTRANS
    40 9 8 13 13 20 16 32 64   values of NB

Hoping someone could please help. Thanks.
Last edited by Neo; 05-11-2009 at 04:16 PM. Reason: code tags
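The "Need at least N processes" message relates to the process grid: each P x Q grid in HPL.dat needs P*Q MPI processes. The Python sketch below reads the grid lines from an HPL.dat and reports how many processes the largest grid needs, so the number can be compared with what is passed to `mpirun -np`. It assumes the layout of the file posted above (grid count on line 10, Ps on line 11, Qs on line 12); the function names are made up for illustration.

```python
# First twelve lines of the HPL.dat posted above.
HPL_DAT_HEAD = """\
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184        Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
4            Qs
"""


def process_grids(hpl_dat: str) -> list:
    """Return the (P, Q) pairs, assuming lines 10-12 hold count, Ps, Qs."""
    lines = hpl_dat.splitlines()
    count = int(lines[9].split()[0])
    ps = [int(tok) for tok in lines[10].split()[:count]]
    qs = [int(tok) for tok in lines[11].split()[:count]]
    return list(zip(ps, qs))


def required_processes(hpl_dat: str) -> int:
    """Processes needed by the largest grid in the file."""
    return max(p * q for p, q in process_grids(hpl_dat))


def enough_processes(hpl_dat: str, np: int) -> bool:
    """True if np MPI processes can fill every grid in the file."""
    return np >= required_processes(hpl_dat)


if __name__ == "__main__":
    print("grids:", process_grids(HPL_DAT_HEAD))
    print("processes needed:", required_processes(HPL_DAT_HEAD))
    print("is -np 8 enough?", enough_processes(HPL_DAT_HEAD, 8))
```

By the same arithmetic, the earlier test file with P = 4 and Q = 3 needs 12 processes, which matches the "Need at least 12 processes" message reported for it.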
|
||||
|
Quote:
    HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
    >>> Illegal input in file HPL.dat. Exiting ... <<<

I am using 2 nodes in total, including the head node. Each of these nodes is a quad-core system, so my machine file has this in it:
    machine1
    machine1
    machine1
    machine1
    machine2
    machine2
    machine2
    machine2

The command line I am typing is:
    mpirun -np 8 -machinefile hosts xhpl_em64t

p*q = 8 from my HPL.dat file, where p = 2 and q = 4. Yet I am still getting that error. Would you happen to know what else could be wrong? Thanks.
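A machine file in the style quoted above, with one line per core, is easy to generate rather than type by hand. This is only a sketch in Python: the host names and the figure of four cores per node come from the posts, while the function name is made up for illustration.

```python
def machinefile_lines(hosts, slots_per_host):
    """Repeat each host name once per core, in host order."""
    return [host for host in hosts for _ in range(slots_per_host)]


if __name__ == "__main__":
    # Two quad-core nodes, as described in the thread.
    lines = machinefile_lines(["machine1", "machine2"], 4)
    print("\n".join(lines))
    print("total slots:", len(lines))
```

The number of lines produced should equal the value given to `-np`, which in turn should be at least P*Q from HPL.dat.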
|
||||
|
Otheus, thank you so much for your responses. I can't wait to test that out when I get to university on Monday. I think my mistake is that I have not put a name in the machines file for each core.
Hopefully this will work. Sorry for the double post; I will post back to let you know how it goes.
|
|||||
|
You are getting a different error.
Code:
    HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
    >>> Illegal input in file HPL.dat. Exiting ... <<<