Efficiently altering and merging files in perl


 
# 1  

I have two files

Code:
fileA
HEADER LINE A
CommentLine A
Content A
....
....
....
TAILER A

Code:
fileB
HEADER LINE B
CommentLine B
Content B
....
....
....
TAILER B

I want to merge these two files as
Code:
HEADER LINE A
CommentLine A
Content A
....
....
....
Content B
....
....
....
TAILER B

i.e. skip the TAILER line of fileA and skip the HEADER and Comment lines of fileB.

I am able to do it using the perl code below:
Code:
        open ( FA, '<', $fileA ) || die("can't open fileA $!");
        open ( FB, '<', $fileB ) || die("can't open fileB $!");
        # '>' rather than '>>' so a leftover tmp_file is not appended to
        open ( TMP, '>', "tmp_file" ) || die("can't open tmp_file $!");

        #reading both files in array
        my @fileA = <FA>;
        my @fileB = <FB>;

        #getting rid of HEADER, Comment line, in fileB
        shift @fileB;
        shift @fileB;

        #getting rid of TAILER in fileA
        pop @fileA;

        my @tmp_file=(@fileA,@fileB);

        foreach ( @tmp_file ){
            print TMP $_;
        }

        close(FA);
        close(FB);
        close(TMP);

        rename("tmp_file", $fileA) || die("can't rename tmp_file to fileA $!");

This code works fine; however, I doubt its efficiency if fileA and fileB are going to be millions of lines long (which is the case),
i.e. why read the whole files into arrays just to get rid of three lines? (That will end up using lots of memory.)

Can someone suggest a more efficient way of doing this?
(Answers in perl only.)

Last edited by sam05121988; 06-12-2014 at 04:15 AM..
# 2  
Do not load the files into memory; read them one line at a time. Try:
Code:
open ( FA, '<', $fileA ) || die("can't open fileA $!");
open ( FB, '<', $fileB ) || die("can't open fileB $!");
open ( TMP, '>', "tmp_file" ) || die("can't open tmp_file $!");

# while (<FA>) reads one line at a time;
# foreach (<FA>) would first build a list of every line in the file
my $line;
while (<FA>) {
    #getting rid of TAILER in fileA
    print TMP $line if defined $line;
    $line = $_;
}

my $counter = 0;
while (<FB>) {
    #getting rid of HEADER, Comment line, in fileB
    next if (++$counter <= 2);
    print TMP $_;
}

close(FA);
close(FB);
close(TMP);
rename("tmp_file", $fileA) || die("can't rename tmp_file to fileA $!");
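
For reference, the same streaming idea can be wrapped in a small sub with lexical filehandles. This is only a sketch: the name merge_streaming, its argument order, and the default of two skipped lines are assumptions made for illustration, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $file_a without its last line, followed by
# $file_b without its first $skip lines, into $out. Only one line of each
# file is held in memory at a time.
sub merge_streaming {
    my ($file_a, $file_b, $out, $skip) = @_;
    $skip = 2 unless defined $skip;    # HEADER + Comment line by default

    open(my $fa,  '<', $file_a) or die "can't open $file_a: $!";
    open(my $fb,  '<', $file_b) or die "can't open $file_b: $!";
    open(my $tmp, '>', $out)    or die "can't open $out: $!";

    # Print each line one iteration late, so the final line (the TAILER)
    # is held in $prev but never printed.
    my $prev;
    while (my $cur = <$fa>) {
        print {$tmp} $prev if defined $prev;
        $prev = $cur;
    }

    # $. is the current line number of the handle last read from (here $fb).
    while (my $cur = <$fb>) {
        next if $. <= $skip;
        print {$tmp} $cur;
    }

    close($fa);
    close($fb);
    close($tmp) or die "can't close $out: $!";
    return;
}
```

It could then be called as merge_streaming($fileA, $fileB, "tmp_file"); followed by the same rename as above.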

# 3  
Thanks KlashXX

One more request: could you please explain the snippet below?

Code:
my $line;
while (<FA>) {
    #getting rid of TAILER in fileA
    print TMP $line if defined $line;
    $line = $_;
}

# 4  
Of course. The $line variable only receives a value at the end of the first iteration of the loop, so the first line is printed during the second iteration, the second line during the third, and so on.

Finally, the penultimate line is written during the last iteration. The last line (the TAILER) is stored in $line but never printed.
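
To see the one-step delay in isolation, here is a tiny sketch that applies it to a short in-memory list. The list is used only to keep the example small, and the sub name drop_last is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every element except the last, using the same
# "emit one step late" idea as the loop over FA.
sub drop_last {
    my @in = @_;
    my @out;
    my $held;                                # previous element, not yet emitted
    for my $cur (@in) {
        push @out, $held if defined $held;   # emit what the previous pass held back
        $held = $cur;                        # hold the current element back
    }
    return @out;                             # the final $held is never emitted
}

# Prints the first two lines only; "TAILER A" is dropped.
print drop_last("HEADER LINE A\n", "Content A\n", "TAILER A\n");
```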
# 5  
Thanks again
