Remove very first pair of duplicate words


# 1  
Old 03-21-2012

I have a file that looks roughly like this:
Code:
                                                                           MMIT
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

First, find the very first two occurrences of each repeated word. In this example the repeated words are MMIT and ISS, so one copy of each should be removed, and the file will then look like this:
Code:
                                                                            MMIT -removed
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS --removed
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

Next, look for the first place where the same pair of lines (first word, then second word) occurs twice in a row, as in our example:
Code:
                                                                             MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE

and 
	                                                                          MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE 
and 
                                                                                     ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE

One pair should be removed and the other kept. After removing these lines, the final output should look like this:
Code:
                                                                           MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012


Last edited by manas_ranjan; 03-21-2012 at 07:11 AM..
# 2  
Old 03-25-2012
Hi, try:
Code:
awk '{R[NR]=$0;F[NR]=$1} END{for(i=1;i<=NR;i++)if(F[i]!=F[i+1]){if(F[i]==F[i+2] && F[i+1]==F[i+3])i++; else print R[i]}}' infile
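
For anyone who finds the one-liner hard to follow, here is the same logic spread over several lines and wrapped in a shell function. It is only a sketch: the function name dedup_pairs and the array names line and key are made up here and are not part of the command above.

```shell
# dedup_pairs: hypothetical name for an expanded form of the one-liner above
# usage: dedup_pairs infile
dedup_pairs() {
    awk '
    { line[NR] = $0; key[NR] = $1 }    # keep every line and its first word
    END {
        for (i = 1; i <= NR; i++) {
            # same word on the next line: drop this copy, keep the next one
            if (key[i] == key[i+1])
                continue
            # lines i,i+1 repeat as i+2,i+3: skip the first pair
            if (key[i] == key[i+2] && key[i+1] == key[i+3]) {
                i++
                continue
            }
            print line[i]
        }
    }' "$@"
}
```

The whole file is held in memory because the test needs to look up to three lines ahead. Comparisons are made on the first word only, so differences in indentation do not matter.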

# 3  
Old 03-25-2012
Also try

Code:
awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }'  infile

Just be aware that this code will not print the first VAR_10D_DATA_TYPE line in the excerpt below (shown in red in the original post), because the same word appears again at line n+2:
Code:
                                                                                 [...]
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 [...]
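
To see that caveat on a small input, the sketch below wraps the one-liner above in a function (first_try is just an illustrative name, not something from the original command) and feeds it the first words of five consecutive lines from the sample file. Note that the command prints $1 rather than $0, so the original indentation is not preserved.

```shell
# first_try: hypothetical wrapper around the first one-liner in this post
first_try() {
    awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }' "$@"
}

# The first VAR_10D_DATA_TYPE matches the line two below it, so it is skipped
# and only the second one reaches the output.
printf '%s\n' 15-03-2012 VAR_10D_DATA_TYPE MMIT VAR_10D_DATA_TYPE 19-03-2012 | first_try
```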


or
Code:
awk '{z=y;y=x;x=w;w=$1}(z==y||t){t=0;next}(z==x&&y==w){t=1;next}{print z}END{print (y!=x?y RS:z) (x!=w?x RS w:w)}' infile


Last edited by ctsgnb; 03-25-2012 at 10:34 AM..