Remove sections based on duplicate first line

01-16-2015

Hi,

I have a file with many sections in it. Each section is separated by a blank line.
The first line of each section determines whether the section is a duplicate or not.
If a section is a duplicate, the entire section should be removed from the file.

Below is an example of the input and output. The lines starting with *& are the first lines of the sections, and there are 2 sections with the same first line. I need to delete one of them.

Code:

Input:
*& abc def
1
2
3
4
5

*& cde efg
1
2
3

*& abc def
1
2
3
4
5

Code:

Output:
*& cde efg
1
2
3

*& abc def
1
2
3
4
5

Thanks for your help!!

ahmedwaseem2000

01-16-2015

Hello,
If the output order of the sections is not important, with (gnu) awk:

Code:

awk 'BEGIN{RS=""};{A[$0]=1};END{for (h in A) print h "\n"}' file
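
If the original order of the sections does matter, a possible variant is sketched below. It assumes POSIX paragraph mode (RS="") and compares only the first line of each section, keeping the first occurrence. The temporary file and the names `input`, `result`, and `seen` are illustrative only.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
input=$(mktemp)
printf '%s\n' '*& abc def' 1 2 3 4 5 '' '*& cde efg' 1 2 3 '' '*& abc def' 1 2 3 4 5 > "$input"

# RS=""   : paragraph mode, records are separated by blank lines.
# FS="\n" : each line of a section is one field, so $1 is the section's first line.
# ORS     : put a blank line back after every printed section.
# !seen[$1]++ is true only the first time a given first line appears.
result=$(awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" } !seen[$1]++' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because the array is only used as a lookup table and printing happens as the records are read, the sections come out in input order.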

Regards.

disedorgue

01-16-2015

That works if DOS <CR> line terminators are removed from the input file. Try also

Code:

awk '/^\*\&/ {STOP=($0 in T); T[$0]} /^ *$/ {STOP=0} !STOP' file
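
If the file does carry DOS line endings, one way to deal with them is to strip the carriage returns before awk sees the data. The sketch below assumes `tr` is available and uses a shortened, made-up sample with \r\n terminators. The `&` is left unescaped because it is not special in an awk regular expression.

```shell
# Shortened sample with DOS (\r\n) line endings; the first section appears twice.
input=$(mktemp)
printf '%s\r\n' '*& abc def' 1 2 '' '*& cde efg' 1 '' '*& abc def' 1 2 > "$input"

# tr -d '\r' deletes every carriage return, so blank lines are really empty
# and the /^ *$/ pattern can re-enable printing after a skipped section.
result=$(tr -d '\r' < "$input" |
         awk '/^\*&/ {STOP=($0 in T); T[$0]} /^ *$/ {STOP=0} !STOP')

printf '%s\n' "$result"
rm -f "$input"
```

Without the `tr` step, a "blank" line would still contain a carriage return and would not match /^ *$/, so printing would stay switched off after the first duplicate header.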

Last edited by RudiC; 01-16-2015 at 04:17 PM.. Reason: removed the "4" from file name

RudiC

01-16-2015

Quote:

Originally Posted by disedorgue

Hello,
If order out of sections is not important, with (gnu) awk:

Code:

awk 'BEGIN{RS=""};{A[$0]=1};END{for (h in A) print h "\n"}' file

Regards.

Thanks for your help. Your code worked fine. I had already tried similar code, but the difference was that I didn't set RS, and instead of A[$0]=1 I assigned A[$0]=$0, and the array was getting jumbled up. Do you know the reason?

RudiC - I don't quite understand this code. Can you please help me understand it?

Code:

awk '/^\*\&/ {STOP=($0 in T); T[$0]} /^ *$/ {STOP=0} !STOP' file4

Thank you both for your help!!

ahmedwaseem2000

01-16-2015

Code:

awk '/^\*\&/ {STOP=($0 in T)            # if header (identified by *&) is known, stop the printing
              T[$0]                     # remember the header line next time
             } 
     /^ *$/  {STOP=0}                   # empty line: reenable printing
     !STOP                              # use default action: print, if NOT STOPped
    ' file
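
The filter above keeps the first section for each header, while the expected output in the question keeps the last one. If that distinction matters, a two-pass sketch along the following lines might do it. It assumes paragraph mode (RS="") and that the file can be read twice. The names `input`, `result`, and `last` are illustrative.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
input=$(mktemp)
printf '%s\n' '*& abc def' 1 2 3 4 5 '' '*& cde efg' 1 2 3 '' '*& abc def' 1 2 3 4 5 > "$input"

# The same file is passed twice.
# Pass 1 (NR == FNR): remember, per header line, the number of the last section carrying it.
# Pass 2: print a section only if it is that last occurrence.
result=$(awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
              NR == FNR { last[$1] = FNR; next }
              last[$1] == FNR' "$input" "$input")

printf '%s\n' "$result"
rm -f "$input"
```

For the sample input this prints the "cde efg" section first and the second "abc def" section after it, which is the order shown in the question.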

RudiC

01-16-2015

By default, the record separator (RS) is a single '\n', which represents the end of a line. If RS is set to '\n\n', awk treats a record as terminated by '\n\n', that is, by a blank line.
This way, one record is one section.
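
The difference can be made visible by counting records. The sketch below uses RS="" (POSIX paragraph mode) as a stand-in for the blank-line separator, since POSIX leaves the behaviour of a multi-character RS unspecified. The temporary file and variable names are illustrative.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
input=$(mktemp)
printf '%s\n' '*& abc def' 1 2 3 4 5 '' '*& cde efg' 1 2 3 '' '*& abc def' 1 2 3 4 5 > "$input"

# Default RS: every line is its own record.
lines=$(awk 'END { print NR }' "$input")

# RS="" (paragraph mode): every blank-line-separated section is one record.
sections=$(awk 'BEGIN { RS = "" } END { print NR }' "$input")

echo "records with default RS: $lines"
echo "records in paragraph mode: $sections"
rm -f "$input"
```

With the default RS, $0 is a single line, so an array indexed by $0 holds individual lines rather than sections, and `for (h in A)` visits its keys in an unspecified order. That would explain output that looks jumbled.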

Regards.

disedorgue
