Delete block of text in one file based on list in another file


 
# 1  
Old 09-01-2009

Hi all

I currently use the following shell script.
Code:
#!/bin/sh

while read LINE
do
  perl -i -ne "$/ = ''; print if !m'Using archive: ${LINE}'ms;" "datafile"
done < "listfile"

NOTE the single-quote delimiters in the match expression. It's highly likely that LINE will contain characters perl would otherwise try to interpolate, for example the '@' (see the sample data).

I would like to reduce the overhead of the multiple perl calls and do both loops in a single perl invocation from within the shell.

So, inspired by this thread: Removing Lines if value exist in first file

And this bit of code from that thread:
Code:
my (@a, %exclude);                # parentheses so both variables are declared
my $file = shift;
open(EXCLUDE_LIST, "<", $file) or die "$file: $!";
chomp( @a = <EXCLUDE_LIST> );
close(EXCLUDE_LIST);
@exclude{@a} = @a;                # hash slice: build the exclude lookup

while (<>) {
    print unless exists $exclude{ (split(/,/))[3] };
}

I have been attempting to hack it into submission, without success!

HELP!

I like the idea of the hash, however that is way above my head, and after many hours of poring over this site and the perl man pages I have yet to come close to figuring out how to use it!

If I understand the above, the use of the hash would limit the loop to one iteration even when there are multiple matches!
Correct or incorrect?

And if needed, The following is sample datafile, listfile and results.

Sample datafile: (first line is blank, last line is not)
Code:
Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Sun Aug 23 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090823@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090821@051500.tbz
Backup completed: 224,477,184 bytes in 100 seconds at 05:16:40 EDT

Backup started: Mon Aug 24 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090824@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090822@051500.tbz
Backup completed: 224,307,734 bytes in 99 seconds at 05:16:39 EDT

Backup started: Tue Aug 25 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090825@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090823@051500.tbz
Backup completed: 237,993,204 bytes in 104 seconds at 05:16:44 EDT

Sample listfile: (no blank lines)
Code:
/mnt/Raid/test/Backup_20090823@051500.tbz
/mnt/Raid/test/Backup_20090825@051500.tbz

Target Results: (first line is blank, last line is not)
Code:
Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Mon Aug 24 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090824@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090822@051500.tbz
Backup completed: 224,307,734 bytes in 99 seconds at 05:16:39 EDT

Thanks

-Enjoy
fh : )_~

Last edited by Festus Hagen; 09-04-2009 at 12:13 AM..
# 2  
Old 09-02-2009
The following implementation may help you out...

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Load the list of archives to delete ("t1" here is the listfile).
my %hash;
open my $list, '<', 't1' or die "t1: $!";
while (<$list>) {
	chomp;
	$hash{$_} = 1;
}
close $list;

# Collect each blank-line-separated record, then print it
# unless its "Using archive:" path is in the list.
my @block;
while (<>) {
	if ( $_ !~ /^$/ ) {
		push @block, $_;
		next unless eof;                       # flush the final record at EOF
	}
	if (@block) {
		my $key = (split / /, $block[1])[3];   # path on the "Using archive:" line
		chomp $key;
		print "\n", @block unless exists $hash{$key};
		@block = ();
	}
}

# 3  
Old 09-03-2009
Or in awk, this seems to produce the desired output with the given sample files:

Code:
awk 'FNR>=NR{a[" Using archive: "$0]=1;next}{RS=ORS="\n\n";FS="\n"}!a[$2]{print}' listfile datafile

I don't much like the {RS=ORS="\n\n";FS="\n"} block, which executes on every line of the second file, but I don't know how to avoid it. The variable assignment can't be done in the BEGIN block, as the first file would then not be parsed correctly. Any ideas?


Edit
The only thing I could think of is to flag the variable assignment so that, once it has been done, it is skipped on subsequent lines:
Code:
awk 'FNR>=NR{a[" Using archive: "$0]=1;next}!f{RS=ORS="\n\n";FS="\n";f=1}!a[$2]{print}' listfile datafile
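Another possibility (a sketch, only tried against the sample data from this thread): awk processes var=value assignments when it reaches them in the argument list, so the separators can be switched between the two file names instead of inside an action. Setting RS to empty (paragraph mode) also makes awk skip the leading blank line, though it adds a trailing one:

```sh
#!/bin/sh
cd "${TMPDIR:-/tmp}"

cat > listfile <<'EOF'
/mnt/Raid/test/Backup_20090823@051500.tbz
/mnt/Raid/test/Backup_20090825@051500.tbz
EOF

cat > datafile <<'EOF'

Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Sun Aug 23 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090823@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090821@051500.tbz
Backup completed: 224,477,184 bytes in 100 seconds at 05:16:40 EDT
EOF

# The var=value arguments take effect only when awk reaches them, so the
# listfile is still read line by line with the default separators, and the
# datafile is read in paragraph mode with one line per field.
awk 'FNR==NR { a[" Using archive: " $0] = 1; next } !($2 in a)' \
    listfile RS= FS='\n' ORS='\n\n' datafile
```

With the sample files this prints only the Backup_20090822 record; the 20090823 record is dropped because its "Using archive:" line is in the list.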


Last edited by ripat; 09-03-2009 at 03:59 AM..
# 4  
Old 09-04-2009
Hi all,

Thanks for the responses ...

I have accomplished this with the following methods; I have also gone a step further with a third method...

Hopefully they help the next person in need!
The 3rd one is pretty specific to my needs.

Method 1, based on post #4 by Azhrei in Removing Lines if value exist in first file. Thanks, Azhrei!
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
  use strict;
  use warnings;
  my @a;
  my %excludehash;
  my $file = shift;
  open(my $excludelist, "<", $file) or die "$file: $!";
  chomp( @a = <$excludelist> );
  close($excludelist);
  @excludehash{@a} = @a;
  {
    local($/) = "";    # paragraph mode: read one record at a time
    while (<>) {
      my ($key) = m/YOURKEY:\s+(.*)$/m;    # avoid testing a stale $1
      print unless defined $key && exists $excludehash{ $key };
    }
  }' "excludefile" "datafile"

Method 2, my own brew.
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
  use strict;
  use warnings;
  my %excludehash;
  my $file = shift;
  open(my $excludelist, "<", $file) or die;
  while(<$excludelist>) {
    chomp;
    next if /^$/;
    $excludehash{ $_ } = $_;
  }
  close($excludelist);
  {
    local($/) = "";
    while (<>) {
      next if ( m/^YOURKEY:\s+(.*)$/m && $excludehash{ $1 } );
      print
    }
  }' "excludefile" "datafile"

Just for giggles, I created a dummy datafile of ~26M with 80,743 records, each consisting of between 7 and 30 lines of text. After generating an excludefile of 20,271 records to be removed, I ran both methods across it.

The speed is freaking incredible!
I didn't time them accurately, but each finishes in under 15 seconds. I was/am blown away by that!
Especially on my FBSD7.1R PIII-866!

Now, armed with that education, I did the following!

What follows is in production for my needs...
Take a look at the sample data above and you will notice that archive maintenance removes old archives and logs them as "Removed archive: ..." ... There is the exclude list!

The following code reads the log file in the first while loop, adding every "Removed archive:" path to a hash (removehash).
It then seeks back to the beginning of the log, and in the second while loop walks through the records, checking each record's "Using archive:" path against removehash... If there is a match, the record is skipped!

It even copes with multiple "Removed archive:" entries per record...
And it is incredibly fast.
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
    my %removehash;
    local($/) = "";    # paragraph mode: one blank-line-separated record per read
    # Pass 1: collect every "Removed archive:" path in the log.
    while (<>) {
      while (m/^ Removed archive:\s+(.*)$/mg) {
        $removehash{ $1 } = $1;
      }
      last if (eof);   # stop before <> tries to reopen the input
    }
    # Pass 2: rewind and reprint only the records whose
    # "Using archive:" path was never removed.
    seek(ARGV, 0, 0);
    while (<>) {
      my ($key) = m/^ Using archive:\s+(.*)$/m;    # avoid a stale $1
      print unless defined $key && exists $removehash{ $key };
    }' "logfile"

-Enjoy
fh : )_~

Last edited by Festus Hagen; 09-04-2009 at 09:14 AM..