Search for duplicates and delete them, but keep the first one, based on a specific pattern


 
# 1  
Old 07-27-2013
Search for duplicates and delete them, but keep the first one, based on a specific pattern

Hi all,

I have been trying to delete duplicates based on a certain pattern, but I have failed to make it work. More than one pattern is duplicated, but I only want to remove the duplicates of a single pattern and keep the rest. I cannot use awk '!x[$0]++' inputfile.txt, sed '/pattern/d', or the uniq and sort commands, as those delete all the duplicated patterns in the file. A sample follows:

inputfile.txt
Code:
;;  
;;
ID    701
NAME    701
FUNC    Null
FUNC    Null
FUNC    Null
CC    27749
PRO    A
NO    NO:3676
NO    NO:3677
NO    NO:3723
NO    NO:3964
COMMENT    Nothing is impossible
@@
ID    702
NAME    702
FUNC    Null
FUNC    Null
FUNC    Null
FUNC    Null
PRO    A
NO    NO:3676
NO    NO:3677
COMMENT    Need to change
@@
ID    706
NAME    706
FUNC    Null
PRO    A
NO    NO:6301
NO    NO:6310
NO    NO:6450
NO    NO:6647
NO    NO:6812
@@

I want to remove the duplicates of the "FUNC" pattern only, so the output should look like this:

output.txt
Code:
;;  
;;
ID    701
NAME    701
FUNC    Null
CC    27749
PRO    A
NO    NO:3676
NO    NO:3677
NO    NO:3723
NO    NO:3964
COMMENT    Nothing is impossible
@@
ID    702
NAME    702
FUNC    Null
PRO    A
NO    NO:3676
NO    NO:3677
COMMENT    Need to change
@@
ID    706
NAME    706
FUNC    Null
PRO    A
NO    NO:6301
NO    NO:6310
NO    NO:6450
NO    NO:6647
NO    NO:6812
@@

I have thousands of records like this, and I may need to delete duplicates of a different pattern each time. I also tried specifying the column number, but that affects other duplicated values that I don't want touched. I appreciate your help on this. Thanks.
# 2  
Old 07-27-2013
Code:
awk '$1!="FUNC" || $2!="Null" || $0!=prev {print} {prev=$0}' inputfile.txt
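
If the pattern to deduplicate changes from run to run (as mentioned in the original post), the same logic can take the pattern as a variable. This is only an illustrative variation on the command above, under that assumption, not part of the original answer:

Code:
# Hypothetical variation: pass the field to match via -v instead of hard-coding "FUNC".
# A line is printed unless its first field equals pat AND it is identical to the previous
# line, so only consecutive duplicates of that pattern are dropped; the first one is kept.
awk -v pat="FUNC" '$1 != pat || $0 != prev { print } { prev = $0 }' inputfile.txt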


# 3  
Old 07-27-2013
Hi MadeInGermany,

Thanks so much!! It works perfectly. By the way, could you please explain the code? Especially this part:
Code:
$0!=prev {print} {prev=$0}'


# 4  
Old 07-27-2013
Another way:

Code:
awk 'l==$0&&/FUNC/{next}{l=$0}1' file

# 5  
Old 07-27-2013
Hi ripat,

Yeah, I tried yours and it worked great too! But if you don't mind, could you please explain the code? Thanks.
# 6  
Old 07-27-2013
The idea is to store every line in a buffer variable: {l=$0}

For every line that is equal to the previous line stored in the buffer (l==$0) and that contains FUNC (&&/FUNC/), skip that line ({next}) and start over with the next line.

If the line is not skipped, it is caught by the 1 at the end, which is awk shorthand for print. It is the same as: l==$0&&/FUNC/{next}{l=$0;print}
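
Written out over several lines with comments, it is the same one-liner, just spaced out for readability (a restatement, not a different solution):

Code:
awk '
    l == $0 && /FUNC/ { next }   # identical to the previous line and contains FUNC: skip it
    { l = $0 }                   # otherwise remember the current line for the next comparison
    1                            # always-true pattern with no action: shorthand for { print }
' file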
# 7  
Old 07-27-2013
Got it, thanks!