Removing multiple lines from input file, if multiple lines match a pattern.

09-28-2015

GM,

I have an issue at work, which requires a simple solution. But, after multiple attempts, I have not been able to hit on the code needed.

I am assuming that sed, awk or even perl could do what I need.

I have an application that adds extra blank page feeds, for multiple reports, when sending these print jobs to CUPS.

Unfortunately, it would be very time consuming to have each of the developers go back and remove these extra page feeds, through hundreds of different reports, not to mention that would require QA, Release Management, and official patches for all of our customers.

Here's what I've done already.

I am intercepting the raw print jobs when they come into CUPS, as plain text PostScript files.

I have identified the actual lines that need to be removed from each job.

If I edit the file manually, and remove the lines manually, the subsequent printout works perfectly, with the blank pages suppressed.

But I've been trying to use normal KSH or BASH to programmatically remove the offending lines.

Unfortunately, in doing so, the script removes ALL of the necessary hidden formatting / special characters.

If I edit the file by hand, and just remove these offending lines, the document prints perfectly.

If I run the PostScript file through my "filter" script, it removes all of the special characters, and the raw PostScript data itself is sent to the printer.

Let's say hypothetically that the following 4 lines are found. These same 4 lines can occur at different places inside the PostScript job, but normally they will have data inserted between them.

If these same 4 lines occur, sequentially, right after each other, without any other data, then I want to remove them:

abc
def
ghi
jkl

It is important that I do not remove any special formatting or characters from any other lines within the file.

Thanks in advance,

JCF

jxfish2

09-28-2015

A real, if abbreviated, sample would be most helpful.

RudiC

09-28-2015

Not sure if I understood it exactly - did you try something like:

Code:

grep -vE "abc|def|ghi|jkl" infile
# or more precise
grep -vE "^(abc|def|ghi|jkl)$" infile

? Should the file be edited in place or is a temp file ok?

zaxxon

09-28-2015

Zaxxon reply

Hi Zaxxon,

Unfortunately, you did not understand the issue.

I am not looking for 4 different strings, each on its own line. grep -e, or egrep, would work fine for that.

I am searching for these 4 lines, together, when they appear back to back.

In order for the condition to be true, all 4 lines must exist, exactly as seen below.

In the pattern match, I need to search for something like this:

Code:

  sed -e s/"abc\ndef\nghi\njkl\n"/""/g

I also tried:

Code:

sed -e s/"abc\rdef\rghi\rjkl\r"/""/g

I also thought about using "tr" to delete the matching strings, but I'm still having an issue matching the 4 lines, to include their special characters. i.e. Line Feeds or Carriage Returns.

Unfortunately, either I'm using the wrong carriage return characters, or something is wrong with my syntax.
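A likely reason neither attempt matches, shown as a small sketch assuming a POSIX sed: sed applies its script to one input line at a time, with the trailing newline already stripped, so a pattern that spans a newline has nothing to match until more lines are pulled into the pattern space with N. The function names below are hypothetical, and "abc" and "def" only stand in for the real job lines. The last helper uses the l command to show whether the job really contains carriage returns.

```shell
#!/bin/sh
# Hypothetical helper names; "abc" and "def" stand in for the real job lines.

# Attempt in the style of the commands above: sed holds one line at a time,
# so a pattern that spans a newline never matches and nothing is removed.
naive_two_line_delete() {
    sed 's/abc\ndef//'
}

# N appends the next input line to the pattern space, separated by a newline,
# so \n in the pattern now has something to match. $!N skips N on the last line.
joined_two_line_delete() {
    sed '/^abc$/{
$!N
/^abc\ndef$/d
}'
}

# The l command prints each line with control characters made visible
# (a carriage return shows as \r, the end of the line as $).
show_hidden() {
    sed -n l
}

printf 'abc\ndef\nxyz\n' | naive_two_line_delete    # all three lines survive
printf 'abc\ndef\nxyz\n' | joined_two_line_delete   # only xyz survives
printf 'abc\r\n' | show_hidden                      # prints abc\r$
```

If the last check shows \r before the $ on the real job, the lines end in CRLF, and any exact match has to allow for that trailing carriage return.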

Basically, each time the above 4 lines occur, back to back, on separate lines, I need to remove all 4 lines.

There will be times when the 4 lines will appear, where they have some other entries in the middle, such as:

Code:

     abc
          e2c422 a12652 
     def
     ghi
     jkl

Note that if there are ANY characters or data of any kind between, or in the middle of the 4 line pattern, those are valid data lines, and must not be removed.

Only when the 4 lines appear, back to back, with NOTHING else between them, or appended to them, do they need to be removed.

I hope this helps to clarify the issue.

JCF

Moderator's Comments:

Use code tags, thanks.

Last edited by zaxxon; 09-28-2015 at 11:04 AM.. Reason: code tags

jxfish2

09-28-2015

Try

Code:

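awk '
# On a line whose first field is "abc", read the next three lines.
$1 == "abc"     {getline L1
                 getline L2
                 getline L3
                 # Print all four lines back unless the three are def, ghi, jkl.
                 if (L1!="def" ||
                     L2!="ghi" ||
                     L3!="jkl") {print
                                 print L1
                                 print L2
                                 print L3
                                }
                 next
                }
# Every other line is printed unchanged.
1
' file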
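A variant of the same idea, offered as a sketch and not a tested production filter. It buffers lines instead of using getline, so a block that starts inside a failed partial match (for example abc, abc, def, ghi, jkl) is still caught, and a file that ends in the middle of a partial match does not gain blank lines. The function name strip_block is hypothetical, the four strings are the placeholders from the question, and ignoring a trailing carriage return during comparison is an assumption about the job's line endings.

```shell
#!/bin/sh
# strip_block: hypothetical helper. Copies stdin to stdout and drops every
# run of four consecutive lines that are exactly abc, def, ghi, jkl.
strip_block() {
    LC_ALL=C awk '
    BEGIN { want[1] = "abc"; want[2] = "def"; want[3] = "ghi"; want[4] = "jkl"; need = 4 }

    # True when the buffered lines match the first n lines of the pattern.
    function is_prefix(   i) {
        for (i = 1; i <= n; i++)
            if (key[i] != want[i]) return 0
        return 1
    }

    {
        raw[++n] = $0              # line exactly as read, used for output
        key[n] = $0
        sub(/\r$/, "", key[n])     # assumption: ignore a trailing CR when comparing

        # Print lines from the front of the buffer until what remains
        # could still be the start of the block.
        while (n > 0 && !is_prefix()) {
            print raw[1]
            for (i = 1; i < n; i++) { raw[i] = raw[i + 1]; key[i] = key[i + 1] }
            n--
        }

        # All four lines matched back to back: discard them.
        if (n == need) n = 0
    }

    # Flush an unfinished partial match at end of input.
    END { for (i = 1; i <= n; i++) print raw[i] }
    '
}
```

Usage would look like `strip_block < job.ps > job.filtered.ps`, where both file names are made up. LC_ALL=C asks awk to treat the input as plain bytes, but whether a given awk passes every binary byte (NUL in particular) through untouched depends on the implementation, so compare the output against a hand-edited job before relying on it.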

RudiC

09-28-2015

Try:

Code:

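# sed works on one line at a time, so join the next three lines with N
# before matching; $!N skips the join on the last line of the file.
sed -e '/^abc$/{
$!N
$!N
$!N
/^abc\ndef\nghi\njkl$/d
}' file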

hth

sea

09-28-2015

Thank you for the potential solutions.

Unfortunately, this has been a very hectic morning so far (Mondays), and I have not had a chance to test either of the proposed solutions yet.

I promise, I will try to test with both solutions very, very soon.

Thanks again for your help and input.

JCF

jxfish2
