Search for duplicates and delete them, but keep the first one, based on a specific pattern


 
# 1  
Old 07-27-2013
Search for duplicates and delete them, but keep the first one, based on a specific pattern

Hi all,

I have been trying to delete duplicates based on a certain pattern, but I have failed to make it work. More than one pattern is duplicated, but I only want to remove the duplicates of a single pattern and keep the rest. I cannot use awk '!x[$0]++' inputfile.txt, sed '/pattern/d', or the uniq and sort commands, as those delete all the duplicated patterns in the file. A sample follows:

inputfile.txt
Code:
;;  
;;
ID    701
NAME    701
FUNC    Null
FUNC    Null
FUNC    Null
CC    27749
PRO    A
NO    NO:3676
NO    NO:3677
NO    NO:3723
NO    NO:3964
COMMENT    Nothing is impossible
@@
ID    702
NAME    702
FUNC    Null
FUNC    Null
FUNC    Null
FUNC    Null
PRO    A
NO    NO:3676
NO    NO:3677
COMMENT    Need to change
@@
ID    706
NAME    706
FUNC    Null
PRO    A
NO    NO:6301
NO    NO:6310
NO    NO:6450
NO    NO:6647
NO    NO:6812
@@

I want to remove the duplicates of the "FUNC" pattern only, so the output should look like this:

output.txt
Code:
;;  
;;
ID    701
NAME    701
FUNC    Null
CC    27749
PRO    A
NO    NO:3676
NO    NO:3677
NO    NO:3723
NO    NO:3964
COMMENT    Nothing is impossible
@@
ID    702
NAME    702
FUNC    Null
PRO    A
NO    NO:3676
NO    NO:3677
COMMENT    Need to change
@@
ID    706
NAME    706
FUNC    Null
PRO    A
NO    NO:6301
NO    NO:6310
NO    NO:6450
NO    NO:6647
NO    NO:6812
@@

I have thousands of records like this, and I may need to delete duplicates of a different pattern each time. I also tried specifying the column number, but that affects other duplicated values that I don't want touched. I appreciate your help on this. Thanks.
# 2  
Old 07-27-2013
Code:
awk '$1!="FUNC" || $2!="Null" || $0!=prev {print} {prev=$0}' inputfile.txt
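
If the pattern to deduplicate changes from run to run (as mentioned in the original post), the same logic can take the pattern as a variable. This is only an illustrative variation on the command above, under that assumption, not part of the original answer:

Code:
# Hypothetical variation: pass the field to match via -v instead of hard-coding "FUNC".
# A line is printed unless its first field equals pat AND it is identical to the previous
# line, so only consecutive duplicates of that pattern are dropped; the first one is kept.
awk -v pat="FUNC" '$1 != pat || $0 != prev { print } { prev = $0 }' inputfile.txt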


# 3  
Old 07-27-2013
Hi MadeInGermany,

Thanks so much!! It works perfectly. By the way, could you please explain the code? Especially this part:
Code:
$0!=prev {print} {prev=$0}'


# 4  
Old 07-27-2013
Another way:

Code:
awk 'l==$0&&/FUNC/{next}{l=$0}1' file

# 5  
Old 07-27-2013
Hi ripat,

Yeah, I tried yours and it worked great too! But if you don't mind, could you please explain the code? Thanks.
# 6  
Old 07-27-2013
The idea is to store every line in a buffer variable: {l=$0}

For every line that is equal to the previous line stored in the buffer (l==$0) and that contains FUNC (&&/FUNC/), skip that line ({next}) and start over with the next line.

If the line is not skipped, it is caught by the 1 at the end, which is awk shorthand for print. It is the same as: l==$0&&/FUNC/{next}{l=$0;print}
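
Written out over several lines with comments, it is the same one-liner, just spaced out for readability (a restatement, not a different solution):

Code:
awk '
    l == $0 && /FUNC/ { next }   # identical to the previous line and contains FUNC: skip it
    { l = $0 }                   # otherwise remember the current line for the next comparison
    1                            # always-true pattern with no action: shorthand for { print }
' file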
# 7  
Old 07-27-2013
Got it, thanks!