Remove very first pair of duplicate words


 
# 1  
Old 03-21-2012

I have a file that looks roughly like this:
Code:
                                                                           MMIT
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

First, it should look for the very first place where the same word occurs on two consecutive lines. In this example those words are MMIT and ISS, so one copy of each should be removed (marked below), and the file should then look like this:
Code:
                                                                            MMIT -removed
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS --removed
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012
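
This first step on its own can be sketched in awk roughly as follows. It is only an illustration under some assumptions: the function name `drop_adjacent_dups` is hypothetical, "word" is taken to mean the first whitespace-separated field of a line, and a line is dropped whenever the next line starts with the same word (so consecutive blank lines would also collapse into one).

```shell
# Hypothetical helper: drop a line when the next line starts with the same word.
drop_adjacent_dups() {
    awk '
        # Emit the previous line only if its first word differs from this one.
        NR > 1 && prev_key != $1 { print prev_line }
        # Remember the current line and its first word for the next comparison.
        { prev_key = $1; prev_line = $0 }
        # The last line has no successor, so it is always kept.
        END { if (NR > 0) print prev_line }
    ' "$@"
}
```

It can be called as `drop_adjacent_dups infile` or used at the end of a pipeline. Because whole lines are printed, the original indentation is preserved.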

Next, it should look for the first place where the same pair of words (a first word followed by a second word) is immediately repeated, as in our example:
Code:
                                                                             MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE

and 
	                                                                          MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE 
and 
                                                                                     ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE

It should remove one pair and keep the other. After removing these words, the final output should look like this:
Code:
                                                                           MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012


Last edited by manas_ranjan; 03-21-2012 at 08:11 AM..
# 2  
Old 03-25-2012
Hi, try:
Code:
awk '{R[NR]=$0;F[NR]=$1} END{for(i=1;i<=NR;i++)if(F[i]!=F[i+1]){if(F[i]==F[i+2] && F[i+1]==F[i+3])i++; else print R[i]}}' infile
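
For anyone who finds the one-liner dense, the same two rules can be spelled out roughly as below. This is a sketch rather than a drop-in replacement: the function name `remove_first_pairs`, the array names and the explicit bounds checks are assumptions added for illustration, and it still buffers the whole file in memory before printing.

```shell
# Hypothetical wrapper name; the awk logic follows the two rules of the one-liner.
remove_first_pairs() {
    awk '
        # Buffer every line together with its first word.
        { line[NR] = $0; key[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                # Rule 1: the next line starts with the same word, so drop this copy.
                if (i < NR && key[i] == key[i + 1])
                    continue
                # Rule 2: lines i and i+1 repeat as i+2 and i+3, so drop the first pair.
                if (i + 3 <= NR && key[i] == key[i + 2] && key[i + 1] == key[i + 3]) {
                    i++
                    continue
                }
                print line[i]
            }
        }
    ' "$@"
}
```

Run it as `remove_first_pairs infile`. Only the first field of each line is compared, so differences in indentation do not matter, and the lines that survive are printed unchanged.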

# 3  
Old 03-25-2012
Also try

Code:
awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }'  infile

Just be aware that this code will not print the line highlighted in red in the original post (the first VAR_10D_DATA_TYPE in the excerpt below), because the same word appears again at line n+2:
Code:
                                                                                 [...]
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                                MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 [...]


or
Code:
awk '{z=y;y=x;x=w;w=$1}(z==y||t){t=0;next}(z==x&&y==w){t=1;next}{print z}END{print (y!=x?y RS:z) (x!=w?x RS w:w)}' infile


Last edited by ctsgnb; 03-25-2012 at 11:34 AM..