Delete block of text in one file based on list in another file


 
# 1  
Old 09-01-2009

Hi all

I currently use the following shell script.
Code:
#!/bin/sh

while read LINE
do
  perl -i -ne "$/ = ''; print if !m'Using archive: ${LINE}'ms;" "datafile"
done < "listfile"

NOTE the single-quote delimiters in the match expression. It's highly likely that LINE will contain characters perl would otherwise try to interpolate, for example the '@' (see the sample data).

I would like to reduce the overhead of the multiple perl calls and do both loops in a single perl invocation from within the shell.

So, inspired by this thread: Removing Lines if value exist in first file

And this bit of code from that thread:
Code:
my (@a, %exclude);                # parentheses so both variables are declared
my $file = shift;
open(EXCLUDE_LIST, "<", $file) or die "$file: $!";
chomp( @a = <EXCLUDE_LIST> );
close(EXCLUDE_LIST);
@exclude{@a} = @a;                # hash slice: build the exclude lookup

while (<>) {
    print unless exists $exclude{ (split(/,/))[3] };
}

I have been attempting to hack it into submission, without success!

HELP!

I like the idea of the hash, however that is way above my head, and after many hours of poring over this site and the perl man pages I have yet to come close to figuring out how to use it!

If I understand the above, the use of the hash would limit the loop to one iteration even when there are multiple matches!
Correct or incorrect?

And if needed, The following is sample datafile, listfile and results.

Sample datafile: (first line is blank, last line is not)
Code:
Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Sun Aug 23 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090823@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090821@051500.tbz
Backup completed: 224,477,184 bytes in 100 seconds at 05:16:40 EDT

Backup started: Mon Aug 24 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090824@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090822@051500.tbz
Backup completed: 224,307,734 bytes in 99 seconds at 05:16:39 EDT

Backup started: Tue Aug 25 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090825@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090823@051500.tbz
Backup completed: 237,993,204 bytes in 104 seconds at 05:16:44 EDT

Sample listfile: (no blank lines)
Code:
/mnt/Raid/test/Backup_20090823@051500.tbz
/mnt/Raid/test/Backup_20090825@051500.tbz

Target Results: (first line is blank, last line is not)
Code:
Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Mon Aug 24 05:15:00 EDT 2009, MyBackup v3.1.0
 Using archive: /mnt/Raid/test/Backup_20090824@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090822@051500.tbz
Backup completed: 224,307,734 bytes in 99 seconds at 05:16:39 EDT

Thanks

-Enjoy
fh : )_~

Last edited by Festus Hagen; 09-04-2009 at 12:13 AM..
# 2  
Old 09-02-2009
The following implementation may help you out...

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Load the list of archives to delete ("t1" here is the listfile).
my %hash;
open my $list, '<', 't1' or die "t1: $!";
while (<$list>) {
	chomp;
	$hash{$_} = 1;
}
close $list;

# Collect each blank-line-separated record, then print it
# unless its "Using archive:" path is in the list.
my @block;
while (<>) {
	if ( $_ !~ /^$/ ) {
		push @block, $_;
		next unless eof;                       # flush the final record at EOF
	}
	if (@block) {
		my $key = (split / /, $block[1])[3];   # path on the "Using archive:" line
		chomp $key;
		print "\n", @block unless exists $hash{$key};
		@block = ();
	}
}

# 3  
Old 09-03-2009
Or in awk, this seems to produce the desired output with the given sample files:

Code:
awk 'FNR>=NR{a[" Using archive: "$0]=1;next}{RS=ORS="\n\n";FS="\n"}!a[$2]{print}' listfile datafile

I don't much like the {RS=ORS="\n\n";FS="\n"} block, which executes on every line of the second file, but I don't know how to avoid it. The variable assignment can't be done in the BEGIN block, as the first file would then not be parsed correctly. Any ideas?


Edit
The only thing I could think of is to flag the variable assignment so that, once it has been done, it is skipped on subsequent lines:
Code:
awk 'FNR>=NR{a[" Using archive: "$0]=1;next}!f{RS=ORS="\n\n";FS="\n";f=1}!a[$2]{print}' listfile datafile
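Another possibility (a sketch, only tried against the sample data from this thread): awk processes var=value assignments when it reaches them in the argument list, so the separators can be switched between the two file names instead of inside an action. Setting RS to empty (paragraph mode) also makes awk skip the leading blank line, though it adds a trailing one:

```sh
#!/bin/sh
cd "${TMPDIR:-/tmp}"

cat > listfile <<'EOF'
/mnt/Raid/test/Backup_20090823@051500.tbz
/mnt/Raid/test/Backup_20090825@051500.tbz
EOF

cat > datafile <<'EOF'

Backup started: Sat Aug 22 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090822@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090820@051500.tbz
Backup completed: 293,437,440 bytes in 131 seconds at 05:17:11 EDT

Backup started: Sun Aug 23 05:15:00 EDT 2009, MyBackup v3.0.8
 Using archive: /mnt/Raid/test/Backup_20090823@051500.tbz
 Removed archive: /mnt/Raid/test/Backup_20090821@051500.tbz
Backup completed: 224,477,184 bytes in 100 seconds at 05:16:40 EDT
EOF

# The var=value arguments take effect only when awk reaches them, so the
# listfile is still read line by line with the default separators, and the
# datafile is read in paragraph mode with one line per field.
awk 'FNR==NR { a[" Using archive: " $0] = 1; next } !($2 in a)' \
    listfile RS= FS='\n' ORS='\n\n' datafile
```

With the sample files this prints only the Backup_20090822 record; the 20090823 record is dropped because its "Using archive:" line is in the list.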


Last edited by ripat; 09-03-2009 at 03:59 AM..
# 4  
Old 09-04-2009
Hi all,

Thanks for the responses ...

I have accomplished this with the following methods; I have also gone a step further with a third method...

Hopefully they help the next person in need!
The 3rd one is pretty specific to my needs.

Method 1, based on post #4 by Azhrei in Removing Lines if value exist in first file. Thanks, Azhrei!
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
  use strict;
  use warnings;
  my @a;
  my %excludehash;
  my $file = shift;
  open(my $excludelist, "<", $file) or die "$file: $!";
  chomp( @a = <$excludelist> );
  close($excludelist);
  @excludehash{@a} = @a;
  {
    local($/) = "";    # paragraph mode: read one record at a time
    while (<>) {
      my ($key) = m/YOURKEY:\s+(.*)$/m;    # avoid testing a stale $1
      print unless defined $key && exists $excludehash{ $key };
    }
  }' "excludefile" "datafile"

Method 2, my own brew.
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
  use strict;
  use warnings;
  my %excludehash;
  my $file = shift;
  open(my $excludelist, "<", $file) or die;
  while(<$excludelist>) {
    chomp;
    next if /^$/;
    $excludehash{ $_ } = $_;
  }
  close($excludelist);
  {
    local($/) = "";
    while (<>) {
      next if ( m/^YOURKEY:\s+(.*)$/m && $excludehash{ $1 } );
      print
    }
  }' "excludefile" "datafile"

Just for giggles, I created a dummy datafile of ~26M with 80,743 records, each consisting of between 7 and 30 lines of text. After generating an excludefile of 20,271 records to be removed, I ran both methods across it.

The speed is freaking incredible!
I didn't time them accurately, but each finishes in under 15 seconds. I was/am blown away by that!
Especially on my FBSD7.1R PIII-866!

Now, armed with that education, I did the following!

What follows is in production for my needs...
Take a look at the sample data above and you will notice that archive maintenance removes old archives and logs them as "Removed archive: ..." ... There is the exclude list!

The following code reads the log file in the first while loop, adding every "Removed archive:" path to a hash (removehash).
It then seeks back to the beginning of the log, and in the second while loop walks through the records, checking each record's "Using archive:" path against removehash... If there is a match, the record is skipped!

It even copes with multiple "Removed archive:" entries per record...
And it is incredibly fast.
Code:
perl -i~ -e '  # -i~ for in-place editing with tilde backup file
    my %removehash;
    local($/) = "";    # paragraph mode: one blank-line-separated record per read
    # Pass 1: collect every "Removed archive:" path in the log.
    while (<>) {
      while (m/^ Removed archive:\s+(.*)$/mg) {
        $removehash{ $1 } = $1;
      }
      last if (eof);   # stop before <> tries to reopen the input
    }
    # Pass 2: rewind and reprint only the records whose
    # "Using archive:" path was never removed.
    seek(ARGV, 0, 0);
    while (<>) {
      my ($key) = m/^ Using archive:\s+(.*)$/m;    # avoid a stale $1
      print unless defined $key && exists $removehash{ $key };
    }' "logfile"

-Enjoy
fh : )_~

Last edited by Festus Hagen; 09-04-2009 at 09:14 AM..