Remove very first pair of duplicate words

#1  manas_ranjan, 03-21-2012

I have a file that looks roughly like this:

Code:
                                                                           MMIT
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

First, it should look for the very first place where the same word occurs on two consecutive lines. In this example those words are MMIT and ISS, so one copy of each should be removed, and the file will then look like this:

Code:
                                                                            MMIT -removed
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS --removed
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012
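
A minimal sketch of this first step, assuming "word" means the first whitespace-separated field of a line. The function name `dedupe_adjacent` is made up for illustration. Note that it collapses every run of consecutive lines with the same first word, not only the first such run, and that a run of blank lines is collapsed too.

```shell
# Hypothetical helper: for each run of consecutive lines sharing the same
# first field, keep only the last line of the run (the earlier copies are
# the ones marked "removed" above).
dedupe_adjacent() {
  awk '
    NR > 1 && $1 != word { print line }   # previous line ended its run: emit it
    { line = $0; word = $1 }              # remember the current line and its first field
    END { if (NR) print line }            # emit the final buffered line
  ' "$@"
}
```

Usage: `dedupe_adjacent infile`, or pipe the data into it.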

Next, it should look for a pair of lines (first word, second word) that is immediately repeated, as in our example:

Code:
                                                                             MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE

and 
	                                                                          MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE 
and 
                                                                                     ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE

so it should remove one pair and keep the other. Finally, after removing these words, the output should look like this:

Code:
                                                                           MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012


Last edited by manas_ranjan; 03-21-2012 at 08:11 AM..
#2  Scrutinizer (Moderator), 03-25-2012
Hi, try:

Code:
awk '{R[NR]=$0;F[NR]=$1} END{for(i=1;i<=NR;i++)if(F[i]!=F[i+1]){if(F[i]==F[i+2] && F[i+1]==F[i+3])i++; else print R[i]}}' infile
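
For readers who want to follow the logic, here is the same one-liner spread over several lines with comments. This is only my reading of the posted code. The wrapper name `remove_first_pairs` and the array names `line` and `word` are made up for illustration.

```shell
# Hypothetical wrapper around the one-liner above, expanded for readability.
remove_first_pairs() {
  awk '
    { line[NR] = $0; word[NR] = $1 }     # store every line and its first field
    END {
      for (i = 1; i <= NR; i++) {
        if (word[i] == word[i + 1]) {
          continue                       # same first field on the next line: drop this copy
        }
        if (word[i] == word[i + 2] && word[i + 1] == word[i + 3]) {
          i++                            # lines i, i+1 repeat at i+2, i+3: skip both
        } else {
          print line[i]                  # otherwise keep the line unchanged
        }
      }
    }
  ' "$@"
}
```

Usage: `remove_first_pairs infile`. The whole file is held in memory until the END block, which should be fine for files of the size shown here.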

#3  ctsgnb, 03-25-2012
Also try


Code:
awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }'  infile

Just be aware that this code will not display a line whose first word appears again at line n+2, such as the first VAR_10D_DATA_TYPE line in this excerpt:

Code:
                                                                                 [...]
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 [...]
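
To see that caveat on a small scale, the first one-liner can be wrapped in a function and fed a five-line input modelled on the excerpt. The function name `window_filter` and the sample input are assumptions made for this illustration. Also note that the code prints `$1`, so only the first word of each line is output and the original indentation is lost.

```shell
# Hypothetical wrapper around the first one-liner above (code unchanged).
window_filter() {
  awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }' "$@"
}

# Sample input: VAR_10D_DATA_TYPE occurs on lines 2 and 4 (n and n+2).
printf '%s\n' '15-03-2012   MMIT' 'VAR_10D_DATA_TYPE' 'MMIT' \
  'VAR_10D_DATA_TYPE' '19-03-2012' | window_filter
```

Only one VAR_10D_DATA_TYPE line should come out, which is the behaviour described above.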


or

Code:
awk '{z=y;y=x;x=w;w=$1}(z==y||t){t=0;next}(z==x&&y==w){t=1;next}{print z}END{print (y!=x?y RS:z) (x!=w?x RS w:w)}' infile


Last edited by ctsgnb; 03-25-2012 at 11:34 AM..