
Inconsistency with parallel run

Linux


#1  10-30-2017  Original discussion by arunkumar_mca (Registered User)

Hi All,

I am running a parallel job that aggregates data from a file. I split the work into 7 separate parallel processes. All of them read the same input file, and each of the 7 runs does the same processing. The problem is that the 1st process finishes first and in the shortest time, the 2nd finishes second, and so on, with a significant difference in completion time between processes.

I checked CPU usage with top while the processes were running. Every process is using about 97% CPU, so I am not sure why there is a difference between the parallel runs.

Is there a way to trace the processes and find out whether the problem is I/O, memory or CPU?

Note: each process reads the same file from a NAS mount and performs the aggregation. I am using Red Hat.

Thanks
Arun
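One way to tell CPU, I/O and memory apart without extra tooling is to launch the workers from a small wrapper. The kernel then reports each child's resource usage when it exits, because os.wait4() returns the reaped child's rusage.

The Python sketch below is an illustration only. The script name parallel_profile.py, and the convention of passing each worker as one quoted command line, are assumptions and not part of the original job.

How to read the output:
- If a worker's wall time is much larger than its user+sys time, the difference was spent off the CPU, waiting for I/O or for a free core.
- A large maxrss or many major faults points at memory pressure.
- The file-system input counter may not include reads served over the NAS (NFS) mount.

GNU time (/usr/bin/time -v) reports similar numbers for a single command.

```python
import os
import shlex
import sys
import time

def run_parallel(commands):
    """Start every command at once and return per-child timing and rusage.

    os.wait4() hands back the resource usage of the child it reaps, so each
    worker's CPU time, memory and file-system input can be set against its
    wall time.
    """
    running = {}
    for argv in commands:
        pid = os.posix_spawnp(argv[0], argv, os.environ)
        running[pid] = (argv, time.monotonic())

    results = []
    while running:
        pid, status, ru = os.wait4(-1, 0)       # reap whichever child exits next
        if pid not in running:                  # some other child of ours; skip it
            continue
        argv, start = running.pop(pid)
        wall = time.monotonic() - start
        results.append({
            "cmd": " ".join(argv),
            "pid": pid,
            "exit": os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1,
            "wall": wall,
            "user": ru.ru_utime,
            "sys": ru.ru_stime,
            # Time spent off the CPU: waiting for I/O or for a free core.
            "off_cpu": max(0.0, wall - ru.ru_utime - ru.ru_stime),
            "maxrss_kb": ru.ru_maxrss,          # peak resident set size (KiB on Linux)
            "inblock": ru.ru_inblock,           # times the file system had to perform input
            "majflt": ru.ru_majflt,             # page faults that required I/O
        })
    return results

def main(args):
    if not args:
        print('usage: parallel_profile.py "cmd1 arg..." "cmd2 arg..." ...', file=sys.stderr)
        return
    for r in run_parallel([shlex.split(a) for a in args]):
        print(f"pid {r['pid']} exit {r['exit']} wall {r['wall']:.1f}s "
              f"user {r['user']:.1f}s sys {r['sys']:.1f}s off-cpu {r['off_cpu']:.1f}s "
              f"maxrss {r['maxrss_kb']} KiB inblock {r['inblock']} majflt {r['majflt']} "
              f"[{r['cmd']}]")

if __name__ == "__main__":
    main(sys.argv[1:])
```

Example: python3 parallel_profile.py "./aggregate 1" "./aggregate 2" "./aggregate 3". Here ./aggregate is a hypothetical stand-in for the real worker command.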
#2  10-30-2017  RudiC (Forum Staff, Moderator)
More details, please.
That file will almost certainly be buffered locally when it is accessed from the NAS, so the first process should take longest. Will the file be updated or written back? Per process? Are the processes doing identical operations on the file? Do they influence each other? How do user and system times compare between processes? Do you have lock information available?
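For workers that are already running, the user/system comparison can be read from /proc. The sketch below is an illustration, and the script name proc_times.py is hypothetical.

It reads three fields from /proc/<pid>/stat, described in proc(5):
- utime (field 14)
- stime (field 15)
- starttime (field 22)

These allow user+sys to be compared with elapsed time. The sketch also reports the current process state; D (uninterruptible sleep) usually means the process is waiting on disk or NFS I/O. The delayacct_blkio_ticks field (42) may read 0 if the kernel's delay accounting is disabled.

```python
import os
import sys

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second

def proc_times(pid):
    """Return state plus user, sys, elapsed and block-I/O wait (seconds) for a PID."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so parse only
    # what follows the last ')'. rest[0] is field 3, so field n is rest[n - 3].
    rest = data[data.rindex(")") + 2:].split()
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    blkio = int(rest[39]) if len(rest) > 39 else 0   # absent on very old kernels
    return {
        "state": rest[0],                              # R run, S sleep, D uninterruptible (I/O)
        "user": int(rest[11]) / CLK_TCK,               # field 14: utime
        "sys": int(rest[12]) / CLK_TCK,                # field 15: stime
        "elapsed": uptime - int(rest[19]) / CLK_TCK,   # field 22: starttime since boot
        "blkio_wait": blkio / CLK_TCK,                 # field 42: delayacct_blkio_ticks
    }

def main(pids):
    if not pids:
        print("usage: proc_times.py PID [PID ...]", file=sys.stderr)
        return
    for pid in pids:
        t = proc_times(pid)
        print(f"{pid}: state {t['state']} user {t['user']:.1f}s sys {t['sys']:.1f}s "
              f"elapsed {t['elapsed']:.1f}s blkio-wait {t['blkio_wait']:.1f}s")

if __name__ == "__main__":
    main(sys.argv[1:])
```

Example: python3 proc_times.py 21899 21900 21901, run while those workers are still alive. Running it a few times shows whether the slower workers accumulate CPU time at the same rate as the fast one or mostly sit in state D.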
#3  10-30-2017  Corona688 (Forum Staff, Mead Rotor)
Why would they be identical? Especially if they're I/O bound. More details needed.
#4  10-30-2017  arunkumar_mca (Registered User)
No, we are not writing the file. We read the file from the NAS, compare it against the qualified records and do the aggregation.

The file on the NAS is the full set, with the customer details. The file it is compared against is on the SAN and holds the customers' transaction records. The NAS file is compared with the transaction records, and then the aggregation happens. We split this into 7 parallel processes to improve performance.
#5  10-30-2017  joeyg (Forum Staff, Moderator)
Are the seven splits identical? It sounds like you are breaking a transaction file into seven parts to look up against a master file.
I cannot fathom any reason to do the same thing seven times; my only guess is that you are doing it seven times BUT with different data elements.
Please provide more details.
#6  10-30-2017  arunkumar_mca (Registered User)
Below is what I traced. Basically, there is a huge file that we process in parallel. The file transaction_data.dat is compared with spend.dat, which is a small file. We match the transactions between these files and do the aggregation.
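The usual way to do this match-and-aggregate step is to load the small file into a hash table once and stream the large file past it, so each transaction costs a single lookup.

Below is a minimal Python sketch. It assumes, hypothetically, that spend.dat lines look like key|category and transaction_data.dat lines look like key|amount; the real record layouts are not given in the thread. The function names and the script name match_aggregate.py are illustrative.

```python
import sys
from collections import defaultdict

def load_lookup(lines):
    """Build {key: category} from the small file (assumed format: key|category)."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if line:
            key, category = line.split("|", 1)
            table[key] = category
    return table

def aggregate(transactions, lookup):
    """Total the amounts per category for transactions whose key is in `lookup`.

    Transaction lines are assumed to be key|amount. Keys missing from the
    small file are skipped, i.e. an inner join followed by a group-by sum.
    """
    totals = defaultdict(float)
    for line in transactions:
        line = line.rstrip("\n")
        if not line:
            continue
        key, amount = line.split("|", 1)
        category = lookup.get(key)          # one hash lookup per transaction
        if category is not None:
            totals[category] += float(amount)
    return dict(totals)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: match_aggregate.py spend.dat transaction_data.dat", file=sys.stderr)
    else:
        with open(sys.argv[1]) as small, open(sys.argv[2]) as big:
            for category, total in sorted(aggregate(big, load_lookup(small)).items()):
                print(f"{category}|{total:.2f}")
```

With this design the small file is read once per worker. The cost of each worker is then dominated by streaming its share of the large file.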

I put transaction_data.dat on the SAN. Even so, the first parallel process takes the least time, and the processing time increases with each subsequent split.

Below is the log from the processes. The file is split into almost equal parts, but I am not sure why the processing time differs between the parallel runs.
Quote:
Process 1:
(21899) Total process time = 102.550
(21899) Final Elapsed time = 103.000
(21899) Position Start 0, Position End 4700904

Process 2:
(21900) Total process time = 193.660
(21900) Final Elapsed time = 195.000
(21900) Position Start 4700904, Position End 9401808

Process 3:
(21901) Total process time = 300.220
(21901) Final Elapsed time = 303.000
(21901) Position Start 9401808, Position End 14102218

Process 4:
(21902) Total process time = 333.180
(21902) Final Elapsed time = 337.000
(21902) Position Start 14102218, Position End 18802628

Process 5:
(21903) Total process time = 379.340
(21903) Final Elapsed time = 383.000
(21903) Position Start 18802628, Position End 23504026

Process 6:
(21904) Total process time = 423.610
(21904) Final Elapsed time = 428.000
(21904) Position Start 23504026, Position End 28204436

Process 7:
(21905) Total process time = 411.130
(21905) Final Elapsed time = 415.000
(21905) Position Start 28204436, Position End 32905093

Process 8:
(21906) Total process time = 532.900
(21906) Final Elapsed time = 538.000

Last edited by arunkumar_mca; 10-31-2017 at 11:03 AM.
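The Position Start / Position End values in the log suggest that each worker is handed a byte range of roughly 4.7 MB. A common way to produce such ranges without cutting a record in half is to move each boundary forward to the start of the next line.

Below is a minimal Python sketch, assuming newline-terminated records. split_offsets() and read_range() are illustrative names, not the poster's code.

```python
import os

def split_offsets(path, parts):
    """Return `parts` (start, end) byte ranges covering the file, with every
    boundary moved forward to the start of a line so no record is split."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            target = size * i // parts
            if target == 0:                   # only happens for tiny files
                bounds.append(bounds[-1])
                continue
            # Seek one byte back and finish that line: if target is already
            # a line start, readline() consumes only the preceding newline.
            f.seek(target - 1)
            f.readline()
            bounds.append(max(f.tell(), bounds[-1]))
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def read_range(path, start, end):
    """Yield the lines that begin inside [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            yield line
```

Equal byte ranges only guarantee equal amounts of input per worker. They say nothing about how long each range takes to read or process, which is what the timings above are measuring.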
#7  11-02-2017  Corona688 (Forum Staff, Mead Rotor)
Quote:
Originally Posted by arunkumar_mca
Even so, the first parallel process takes the least time, and the processing time increases with each subsequent split.
Once you've maxed out your I/O bandwidth, adding more processes will just make a task slower. How many processes it takes to max out your I/O bandwidth could well be "one". Spinning disks especially lose a lot of bandwidth when split between competing tasks.

Beyond that, it's difficult to say what's happening. We still don't know what you're doing. "Processing" is a fine word but tells us little.
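One way to check the I/O-bandwidth point on a given box is to read the same file with 1, 2, ... concurrent readers and watch whether total throughput keeps rising.

Below is a rough Python sketch. The script name io_scaling.py is hypothetical. Threads are used because file reads release Python's GIL.

After the first pass, the file may be served from the page cache rather than the NAS or SAN. For device numbers, either use a file larger than RAM or, as root, drop the cache between runs with: sync; echo 3 > /proc/sys/vm/drop_caches

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read in 1 MiB pieces

def read_span(path, start, end):
    """Read bytes [start, end) of the file; return the number of bytes read."""
    done = 0
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        while done < end - start:
            data = f.read(min(CHUNK, end - start - done))
            if not data:                      # unexpected EOF (file shrank)
                break
            done += len(data)
    return done

def throughput(path, readers):
    """Read the whole file with `readers` concurrent threads.

    Returns (bytes_read, megabytes_per_second)."""
    size = os.path.getsize(path)
    bounds = [size * i // readers for i in range(readers + 1)]
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=readers) as pool:
        total = sum(pool.map(read_span, [path] * readers, bounds[:-1], bounds[1:]))
    elapsed = time.perf_counter() - t0
    return total, total / 1e6 / max(elapsed, 1e-9)

def main(args):
    if len(args) != 2:
        print("usage: io_scaling.py FILE MAX_READERS", file=sys.stderr)
        return
    path, max_readers = args[0], int(args[1])
    for n in range(1, max_readers + 1):
        _, rate = throughput(path, n)
        print(f"{n} reader(s): {rate:.1f} MB/s")

if __name__ == "__main__":
    main(sys.argv[1:])
```

If the MB/s figure stops growing (or drops) after some reader count, running more workers than that mainly makes each of them wait longer for data.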
All times are GMT -4.