Making things run faster


 
# 1  
Old 10-17-2008

I am processing some terabytes of information on a computer with 8 processors (4 cores each), 16 GB of RAM, and a 5 TB hard drive implemented as a RAID. The processing doesn't seem to be blazingly fast, probably because of an I/O limitation.

I am basically running a Perl script that reads some data, either modifies it a little or greps something out of it, and writes it back to disk. Could someone please tell me if there is a better method I could use to improve performance?
# 2  
Old 10-17-2008
There is no "warp 9" button. At least I haven't seen one yet.

That depends on your code's efficiency and your OS settings. Different OSes have different tuning options. No offense, but I guess it's primarily the code, since the hardware sounds fairly powerful. There are a lot of people here in the forum who are good at Perl - maybe post your code (if it is not tons of pages) with the fancy [ code ] and [ /code ] tags so they can give a small hint.

You could also give a snippet of the input file and the desired output. Maybe people can suggest an alternative.

For your current setup, write down the run time (put the "time" command in front of the line that starts the script) so you can compare it after tuning or trying alternatives.
# 3  
Old 10-17-2008
Sure, thank you so much! I am open to any advice, as I am mostly interested in learning. Please let me know if there is some obvious mistake I am making.

My Perl code is:

Code:
open(FILE, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
my %hTmp;
my %saw;
my $flag = 0;

while (my $fileLine = <FILE>) {

        if ($fileLine =~ /PREFIX/) {
                if ($fileLine !~ /[:]{2}/) {    # skip prefixes containing "::" (IPv6)
                        $flag = 1;
                }
        }

        if ($flag == 1) {
                if ($fileLine =~ /ASPATH/) {
                        chomp $fileLine;
                        my @myarray = ($fileLine =~ m/([0-9]{3,5}\s)/g);

                        # Following removes prepending. Remove if you do not want it.
                        undef %saw;
                        my @out = grep(!$saw{$_}++, @myarray);

                        my $temp = join("", @out);
                        $temp =~ s/^\s+//;
                        print $temp . "\n" unless ($hTmp{$temp}++);
                        $flag = 0;
                }
        }
}
close(FILE);

And a sample from the input file is:

Code:
TIME: 12/01/07 00:40:57
TYPE: TABLE_DUMP/INET
VIEW: 0
SEQUENCE: 1
PREFIX: 0.0.0.0/0
FROM:213.140.32.148 AS12956
ORIGINATED: 11/28/07 09:12:40
ORIGIN: IGP
ASPATH: 12956
NEXT_HOP: 213.140.32.148
STATUS: 0x1

TIME: 12/01/07 00:40:57
TYPE: TABLE_DUMP/INET
VIEW: 0
SEQUENCE: 2
PREFIX: 3.0.0.0/8
FROM:208.51.134.246 AS3549
ORIGINATED: 11/30/07 17:06:53
ORIGIN: IGP
ASPATH: 3549 701 703 80
NEXT_HOP: 208.51.134.246
MULTI_EXIT_DISC: 12653
COMMUNITY: 3549:2355 3549:30840
STATUS: 0x1

TIME: 12/01/07 00:40:57
TYPE: TABLE_DUMP/INET
VIEW: 0
SEQUENCE: 3
PREFIX: 3.0.0.0/8
FROM:209.161.175.4 AS14608
ORIGINATED: 11/30/07 13:43:49
ORIGIN: IGP
ASPATH: 14608 19029 3356 701 703 80
NEXT_HOP: 209.161.175.4
COMMUNITY: no-export
STATUS: 0x1

I want the ASPATHs corresponding to the IPv4 prefixes in the input data. Please let me know of any obvious improvements.
# 4  
Old 10-17-2008
Code:
 if($fileLine =~ /PREFIX/) {
                if($fileLine =~ /ASPATH/) {


This is something I noticed when going through the code:

Code:
 if($fileLine =~ /^PREFIX/) {
                if($fileLine =~ /^ASPATH/) {

Help the regex to help us.

In the input file, both the literals PREFIX and ASPATH appear at the start of the line (at least in the examples provided), so hint the Perl regex engine that the match is always at the start.

Though it's trivial, this should definitely improve performance.

If the text can appear anywhere in the line, please ignore the tip.
# 5  
Old 10-17-2008
And I am interested to see by what percentage the run time comes down, if it does.


I have one more suggestion: since there is more processing power available, the best thing is to exploit it.

What could be done is a master script whose only job is to read through the file, split it into chunks, and hand each chunk to the script you have written.

With this, multiple processes would be doing the job instead of waiting for one process to complete the whole task.

I assume there are no dependencies forcing the file to be processed in sequential order, and that the only aim is to process it quickly.

If possible, I will post sample code tonight.
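In the meantime, here is a rough sketch of that master script. It assumes the records are separated by blank lines, as in the sample, and uses a grep as a stand-in worker - replace it with the actual Perl script:

```shell
# split_and_parse FILE N: split FILE into N chunks at record boundaries,
# run one worker per chunk in parallel, concatenate the results.
# Assumes blank-line-separated records and N <= 10 (the chunk.? glob).
split_and_parse() {
    in=$1
    n=$2
    # awk paragraph mode (RS="") reads whole records, so no record is
    # ever cut in half; records go round-robin into chunk.0 .. chunk.N-1
    awk -v n="$n" 'BEGIN { RS = ""; ORS = "\n\n" }
                   { print > ("chunk." (NR % n)) }' "$in"
    for f in chunk.?; do
        # stand-in worker; replace with: perl yourscript.pl "$f"
        grep '^ASPATH' "$f" > "$f.out" &
    done
    wait    # block until every worker has finished
    cat chunk.?.out
}
```

One caveat: the %hTmp duplicate suppression in the script above is per-process, so duplicates that land in different chunks would survive; piping the combined output through `sort -u` restores global de-duplication.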

Last edited by matrixmadhan; 10-17-2008 at 09:16 AM. Reason: Exploit processing power
# 6  
Old 10-17-2008
Hmm... that's an interesting idea! I'd love to try it out. Actually, by the time I woke up, it had processed some 1 TB (so that makes it 7 hours). I don't know how to formally write down the whole thing, but I'll try:

Total Data Size: 2.2 TB (currently handling around 1TB though)
Special Info: The data formats were slightly different. There were a total of four data sets (let's call them DS):
DS1 & DS2: Format 1
  • Size of DS1: 556G RAW
  • Size of DS2: 105G Gzip Compressed

DS3 & DS4: Format 2
  • Size of DS3: 157G RAW
  • Size of DS4: 109G RAW

Further, there were two other tasks (extracting a 36G archive and copying some 10G worth of data) running on the disk of this main computer, handled by a different processor (another computer, in fact).

Tasks running Simultaneously:
  • Parsing DS1, DS2, DS3, DS4 and writing the result onto disk again
  • Extracting an archive on the same disk using a different computer on which the disk is mounted as a remote drive
  • Copying the gzipped files back onto the main computer (if anyone has seen my other threads - yes, these are in fact the huge archives I was talking about converting into individual smaller archives)

As of now, I have finished parsing DS1, DS2, DS3 and DS4, but I am left with extracting the huge archives and then parsing the last data set, DS5, which will be around 1.7 TB uncompressed. I will perhaps run the optimization then.

Thanks for the advice; looking forward to a post from you.
# 7  
Old 10-17-2008
Added to that, I have a small question (not sure if it's silly, but I can't seem to understand it completely)...

If I have four datasets, as in the problem above, and all I have to do is grep some text out of them, does it really make a difference whether the jobs run in parallel on all the datasets or one after the other? To be more precise, the argument goes something like this:

Four datasets are stored on the disk. The CPU has to fetch data every time for the four processes to process and write back to the disk. Now, if the disk has to feed all four processes, shouldn't the head keep moving around to serve them, as opposed to just one process where it simply keeps reading sequentially (provided there is no fragmentation)? Sorry if my question seems silly; I just want to clear up some basic concepts.