Post 302905511 by sam05121988 on Thursday 12th of June 2014 03:06:51 AM

Efficiently altering and merging files in perl

I have two files

Code:
fileA
HEADER LINE A
CommentLine A
Content A
....
....
....
TAILER A

Code:
fileB
HEADER LINE B
CommentLine B
Content B
....
....
....
TAILER B

I want to merge these two files into:
Code:
HEADER LINE A
CommentLine A
Content A
....
....
....
Content B
....
....
....
TAILER B

i.e. skip the TAILER line of fileA, and skip the HEADER and comment line of fileB.

I am able to do it using the Perl code below:
Code:
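        use strict;
        use warnings;

        # (assumption: the two file names arrive on the command line)
        my ($fileA, $fileB) = @ARGV;

        open( my $fa,  '<', $fileA )     || die("can't open $fileA: $!");
        open( my $fb,  '<', $fileB )     || die("can't open $fileB: $!");
        # '>' truncates; '>>' would append to a tmp_file left over from an earlier run
        open( my $tmp, '>', 'tmp_file' ) || die("can't open tmp_file: $!");

        #reading both files in array
        my @fileA = <$fa>;
        my @fileB = <$fb>;

        #getting rid of HEADER, Comment line, in fileB
        shift @fileB;
        shift @fileB;

        #getting rid of TAILER in fileA
        pop @fileA;

        # print both arrays directly instead of copying them into a third array
        print {$tmp} @fileA, @fileB;

        close($fa);
        close($fb);
        close($tmp) || die("can't write tmp_file: $!");

        # quoted names and 'or': with '||' the die would bind to the second argument
        rename( 'tmp_file', $fileA ) or die("can't rename tmp_file to $fileA: $!");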

This code works fine; however, I doubt its efficiency when fileA and fileB run to millions of lines (which is the case).
Why read each whole file into an array just to get rid of three lines? It will end up using lots of memory.

Can someone suggest a more efficient way of doing this?
(Answers in Perl only.)
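Streaming both files line by line avoids holding either one in memory. A minimal sketch, not a tested drop-in: the sub name `merge_into_a` is hypothetical, and it assumes fileA has at least one line and that HEADER LINE B and CommentLine B are exactly the first two lines of fileB. It keeps one line of lookbehind while copying fileA, so the last line (TAILER A) is never written, and builds the result in a File::Temp file in fileA's directory before renaming it over fileA.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite $file_a as (fileA minus its last line)
# followed by (fileB minus its first two lines), one line at a time.
sub merge_into_a {
    my ($file_a, $file_b) = @_;

    open(my $fa, '<', $file_a) or die "can't open $file_a: $!";
    open(my $fb, '<', $file_b) or die "can't open $file_b: $!";

    # Temp file in fileA's directory, so the final rename stays on one filesystem
    my ($tmp, $tmp_name) = tempfile('mergeXXXXXX', DIR => dirname($file_a));

    # tempfile() creates the file with mode 0600; give it fileA's permissions
    my $mode = (stat $fa)[2] & 07777;
    chmod($mode, $tmp_name) or die "can't chmod $tmp_name: $!";

    # One line of lookbehind: a line is printed only once another line
    # follows it, so fileA's last line (TAILER A) is never written.
    my $prev;
    while (my $line = <$fa>) {
        print {$tmp} $prev if defined $prev;
        $prev = $line;
    }
    close($fa);

    my $header  = <$fb>;            # HEADER LINE B (discarded)
    my $comment = <$fb>;            # CommentLine B (discarded)
    print {$tmp} $_ while <$fb>;    # copy the rest of fileB unchanged
    close($fb);

    close($tmp) or die "can't write $tmp_name: $!";
    rename($tmp_name, $file_a) or die "can't rename $tmp_name to $file_a: $!";
}

merge_into_a(@ARGV) if @ARGV == 2;
```

Run it as `perl merge.pl fileA fileB` (the script name is only an example). Memory use stays at about one line per file, but fileA is still read and rewritten in full.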

Last edited by sam05121988; 06-12-2014 at 04:15 AM.
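If rewriting fileA is itself too slow, fileA can instead be modified in place: find where its last line starts by reading backwards from the end, truncate() the file at that offset, and append fileB from its third line onward. Only the tail of fileA is read and none of it is rewritten. A minimal sketch; the sub names `last_line_offset` and `append_in_place` are hypothetical, and it assumes fileA ends with its TAILER line. Unlike the temp-file-and-rename approach it is not atomic: if the script dies after the truncate, fileA is left without its TAILER and only partly extended. fileB is opened first so that a missing fileB fails before anything is changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);

# Hypothetical helper: byte offset at which the last line of $fh starts.
# Reads backwards from the end in 4 KiB chunks, touching only the file's tail.
sub last_line_offset {
    my ($fh) = @_;
    my $size = -s $fh;
    return 0 unless $size;
    my $pos = $size - 1;    # ignore the final byte (normally the last line's "\n")
    while ($pos > 0) {
        my $len = $pos < 4096 ? $pos : 4096;
        $pos -= $len;
        seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
        read($fh, my $buf, $len) == $len or die "short read: $!";
        my $nl = rindex($buf, "\n");
        return $pos + $nl + 1 if $nl >= 0;    # last line starts just after that newline
    }
    return 0;    # no earlier newline: the file is a single line
}

# Hypothetical helper: drop fileA's last line in place, then append
# fileB minus its first two lines.
sub append_in_place {
    my ($file_a, $file_b) = @_;

    open(my $fb, '<', $file_b)  or die "can't open $file_b: $!";
    open(my $fa, '+<', $file_a) or die "can't open $file_a: $!";
    binmode($fa);

    truncate($fa, last_line_offset($fa)) or die "can't truncate $file_a: $!";
    seek($fa, 0, SEEK_END) or die "seek failed: $!";    # write at the new end of file

    my $header  = <$fb>;            # HEADER LINE B (discarded)
    my $comment = <$fb>;            # CommentLine B (discarded)
    print {$fa} $_ while <$fb>;     # copy the rest of fileB unchanged

    close($fb);
    close($fa) or die "can't write $file_a: $!";
}

append_in_place(@ARGV) if @ARGV == 2;
```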
 

bup-margin(1)                  General Commands Manual                  bup-margin(1)

NAME
bup-margin - figure out your deduplication safety margin

SYNOPSIS
bup margin [options...]

DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.

For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits.

The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits with far fewer objects.

If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if you're getting dangerously close to 160 bits.

OPTIONS
--predict
    Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm.

--ignore-midx
    don't use .midx files, use only .idx files. This is only really useful when used with --predict.

EXAMPLE
    $ bup margin
    Reading indexes: 100.00% (1612581/1612581), done.
    40
    40 matching prefix bits
    1.94 bits per doubling
    120 bits (61.86 doublings) remaining
    4.19338e+18 times larger is possible

    Everyone on earth could have 625878182 data sets like yours, all in one repository, and we would expect 1 object collision.

    $ bup margin --predict
    PackIdxList: using 1 index.
    Reading indexes: 100.00% (1612581/1612581), done.
    915 of 1612581 (0.057%)

SEE ALSO
bup-midx(1), bup-save(1)

BUP
Part of the bup(1) suite.

AUTHORS
Avery Pennarun <apenwarr@gmail.com>.

Bup unknown-                                                      bup-margin(1)
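The DESCRIPTION and --predict text describe two simple scans over sorted object ids. The Perl sketch below only illustrates the idea under stated assumptions; it is not bup's code (bup is written in Python and reads its .idx/.midx files directly). It takes SHA-1 ids as lowercase hex strings and relies on the fact that, once the ids are sorted, the longest prefix shared by any two of them always occurs between neighbours. For the prediction it assumes a plain linear guess from each id's first 32 bits, which may differ from what bup actually does. All sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of leading bits two hex ids have in common.
sub common_prefix_bits {
    my ($x, $y) = map { unpack('B*', pack('H*', $_)) } @_;    # hex -> bit strings
    my $n = 0;
    $n++ while $n < length($x) && substr($x, $n, 1) eq substr($y, $n, 1);
    return $n;
}

# Largest number of prefix bits shared by any two ids (the "n" of bup margin).
# After sorting, only neighbouring ids need to be compared.
sub margin_bits {
    my @ids = sort @_;
    my $max = 0;
    for my $i (1 .. $#ids) {
        my $bits = common_prefix_bits($ids[$i - 1], $ids[$i]);
        $max = $bits if $bits > $max;
    }
    return $max;
}

# Largest distance between each id's real position in the sorted list and a
# linear guess from its first 32 bits (an assumed stand-in for --predict).
sub max_prediction_error {
    my @ids = sort @_;
    my $n   = @ids;
    my $max = 0;
    for my $i (0 .. $n - 1) {
        my $guess = int(hex(substr($ids[$i], 0, 8)) * $n / 2**32);
        my $err   = abs($i - $guess);
        $max = $err if $err > $max;
    }
    return $max;
}

if (@ARGV) {    # one 40-digit hex id per command-line argument
    my $n = margin_bits(@ARGV);
    printf "%d matching prefix bits, %d bits remaining\n", $n, 160 - $n;
    printf "max prediction error: %d of %d\n", max_prediction_error(@ARGV), scalar @ARGV;
}
```

The EXAMPLE output follows the same arithmetic: 40 shared bits leave 160 - 40 = 120 bits; at 1.94 bits per doubling that is 120 / 1.94 ≈ 61.86 doublings, and 2^61.86 ≈ 4.19 × 10^18. Dividing 4.19338e+18 by 625878182 gives 6.7 × 10^9, roughly the world population. Likewise 915 / 1612581 ≈ 0.057%.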
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.