

Shell Programming and Scripting

Remove very first pair of duplicate words



#1 03-21-2012, manas_ranjan

I have a file that looks roughly like this:


Code:
                                                                           MMIT
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

First, it should find the first two consecutive occurrences of the same word. In this example those are MMIT and ISS, so one of each is removed and the new file looks like this:


Code:
                                                                            MMIT -removed
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS --removed
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012
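Step one on its own can be sketched in awk with a one-line lookbehind. This is a minimal sketch, assuming that every pair of consecutive lines starting with the same word should lose its first line (in the sample, those pairs are exactly the MMIT and ISS ones) and that only the first word of each line is compared, so indentation is ignored for the comparison but kept in the output. The function name drop_adjacent_dups is hypothetical.

```shell
# Hypothetical helper: drop a line when the next line starts with the same word.
# Reads the files given as arguments, or standard input if there are none.
drop_adjacent_dups() {
  awk '
    # Print the previous line only if the current line starts with a different word
    NR > 1 && prev_key != $1 { print prev_line }
    # Remember the current line (whole text) and its first word for the next record
    { prev_line = $0; prev_key = $1 }
    # The last line has no successor, so it is always kept
    END { if (NR > 0) print prev_line }
  ' "$@"
}
```

For example, `drop_adjacent_dups infile > step1.txt` (step1.txt is just an illustrative name). Of two equal lines, the later one is kept, which matches where "-removed" is marked above.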

Next, it should look for a pair of lines (a first and a second word) that is immediately repeated, as in these parts of the example:


Code:
                                                                             MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE

and 
	                                                                          MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE 
and 
                                                                                     ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE

It should remove one copy of each such pair and keep the other. After these removals, the output should look like this:


Code:
                                                                           MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012


Last edited by manas_ranjan; 03-21-2012 at 07:11 AM.
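Step two can be sketched the same way, assuming it runs on a file that has already been through step one and, again, that only the first word of each line is compared. It holds every line in memory, then skips the first copy of any two-line pair that is repeated immediately (A B A B becomes A B). The function name drop_repeated_pairs is hypothetical.

```shell
# Hypothetical helper: when lines i,i+1 are repeated as lines i+2,i+3
# (compared by first word), drop the first copy of the pair.
drop_repeated_pairs() {
  awk '
    # Keep every line and its first word, indexed by line number
    { line[NR] = $0; key[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        if (i + 3 <= NR && key[i] == key[i+2] && key[i+1] == key[i+3]) {
          i++        # skip line i here; the loop increment then skips line i+1
          continue
        }
        print line[i]
      }
    }
  ' "$@"
}
```

On the sample in the question, this leaves the VAR_10D_DATA_TYPE pair alone, because the first MMIT of that pair shares a line with 15-03-2012; that is consistent with the expected output shown above.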
#2 03-25-2012, Scrutinizer
Hi, try:


Code:
awk '{R[NR]=$0;F[NR]=$1} END{for(i=1;i<=NR;i++)if(F[i]!=F[i+1]){if(F[i]==F[i+2] && F[i+1]==F[i+3])i++; else print R[i]}}' infile
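For readers who find the one-liner dense, here is the same program laid out over several lines with comments, wrapped in a hypothetical shell function named dedup. It is intended to behave identically: each line is stored in R and its first word in F, and the END block walks the lines, skipping a line whose word repeats on the next line and skipping both lines of a pair that repeats immediately.

```shell
# Hypothetical wrapper around the one-liner above, laid out with comments.
dedup() {
  awk '
    # Store every line (R) and its first word (F), indexed by line number
    { R[NR] = $0; F[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        # Same word on the next line: drop this copy and keep the next one
        if (F[i] == F[i+1])
          continue
        # Lines i,i+1 repeat as lines i+2,i+3: skip both lines of this copy
        if (F[i] == F[i+2] && F[i+1] == F[i+3])
          i++
        else
          print R[i]
      }
    }
  ' "$@"
}
```

Because the whole file is kept in arrays until END, memory use grows with the file size. Array slots past the last line compare as empty strings, so if the file ends with a blank line, that line is dropped as well.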

#3 03-25-2012, ctsgnb
Also try



Code:
awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }'  infile

Just be aware that this code will not print the line that was shown in red in the original post, because the same word appears again two lines further down (line n+2):


Code:
[...]
MMIT
VAR_1D_DATA_TYPE
15-03-2012 MMIT
VAR_10D_DATA_TYPE
MMIT
VAR_10D_DATA_TYPE
[...]


or


Code:
awk '{z=y;y=x;x=w;w=$1}(z==y||t){t=0;next}(z==x&&y==w){t=1;next}{print z}END{print (y!=x?y RS:z) (x!=w?x RS w:w)}' infile
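The first one-liner in this reply avoids storing the whole file: it keeps only the first words of the last three lines in y, x and w, and decides about the oldest one (y) once its two successors are known. Below is a commented sketch of that sliding window. It keeps the same two drop rules (y equals x, or y equals w), so the n+2 caveat above still applies, but it is a variant rather than a copy: it remembers whole lines so indentation survives (the one-liner prints only $1), and it prints nothing until the window holds three real lines. The function name window_dedup is hypothetical.

```shell
# Hypothetical helper: sliding-window variant of the three-variable one-liner.
window_dedup() {
  awk '
    {
      # Shift the window: oldest (y) <- middle (x) <- newest (w) <- current line.
      # *k holds the first word used for comparison, *l the whole line to print.
      yk = xk; yl = xl
      xk = wk; xl = wl
      wk = $1; wl = $0
    }
    # Until three lines have been read, y is not a real line yet
    NR < 3 { next }
    # Drop y if the same word appears on the next line or two lines later
    yk == xk || yk == wk { next }
    { print yl }
    END {
      # Flush the last two lines: x is kept unless it matches w; w is always kept
      if (NR >= 2 && xk != wk) print xl
      if (NR >= 1) print wl
    }
  ' "$@"
}
```

Because only three lines are held at a time, memory use stays constant, whereas the array-based approach in the previous reply keeps the whole file until END.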


Last edited by ctsgnb; 03-25-2012 at 10:34 AM.

All times are GMT -4.