Remove very first pair of duplicate words

# 1  
Old 03-21-2012

I have a file that looks roughly like this:
Code:
                                                                           MMIT
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

First, check for the very first two occurrences of the same word on consecutive lines. In this example the repeated words are MMIT and ISS, so one of each should be removed and the new file will look like this:
Code:
                                                                            MMIT -removed
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS --removed
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012

Next, check for the first place where the same first and second words repeat as a pair, as in our example:
Code:
                                                                             MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                      MMIT
                                                                                 VAR_1D_DATA_TYPE

and 
	                                                                          MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE 
and 
                                                                                     ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE

So remove one pair and keep the other. After removing these words, the final output should look like this:
Code:
                                                                           MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 14-03-2012
                                                                                                                                                                         														
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 19-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 16-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 15-03-2012
                                                                                     MMIT
                                                                                 STRESSED_VAR_1D_DATA_TYPE
                                                                                 14-03-2012
                                                                                     
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 19-03-2012
                                                                                      ISS
                                                                                 CB_VAR_DATA_TYPE
                                                                                 16-03-2012


Last edited by manas_ranjan; 03-21-2012 at 07:11 AM..
# 2  
Old 03-25-2012
Hi, try:
Code:
awk '{R[NR]=$0;F[NR]=$1} END{for(i=1;i<=NR;i++)if(F[i]!=F[i+1]){if(F[i]==F[i+2] && F[i+1]==F[i+3])i++; else print R[i]}}' infile
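
The one-liner above is dense, so here is the same idea spelled out as a sketch. Like the original, it assumes that only the first word of each line matters. The function name `dedupe_pairs` and the bounds checks (`i < NR`, `i + 3 <= NR`) are my additions, not part of the post; they keep the last lines from being compared against entries past the end of the file.

```shell
# dedupe_pairs [FILE...]: hypothetical helper name, not from the thread.
dedupe_pairs() {
  awk '
    # Remember every line and its first word.
    { line[NR] = $0; key[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        # Rule 1: the next line starts with the same word, so drop this line.
        if (i < NR && key[i] == key[i+1])
          continue
        # Rule 2: lines i, i+1 repeat as lines i+2, i+3, so drop lines i and i+1.
        if (i + 3 <= NR && key[i] == key[i+2] && key[i+1] == key[i+3]) {
          i++
          continue
        }
        print line[i]
      }
    }' "$@"
}
```

Usage would be `dedupe_pairs infile > outfile`. Note that consecutive blank lines also count as duplicates here, because their first word is the same empty string.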

# 3  
Old 03-25-2012
Also try

Code:
awk '{y=x;x=w;w=$1}(y==x||y==w){next}{print y}END{print x!=w?x RS w:w }'  infile

Just be aware that this code will not print the line shown in red, because the same word appears again on line n+2:
Code:
                                                                                 [...]
                                                                                     MMIT
                                                                                 VAR_1D_DATA_TYPE
                                                                                 15-03-2012                                                                                                                                                               														 MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                     MMIT
                                                                                 VAR_10D_DATA_TYPE
                                                                                 [...]
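
To make the n+2 caveat concrete, here is a small sketch of the same three-word window. The function name `drop_if_repeats_within_two` is hypothetical. The `NR < 3` guard and the `NR` checks in `END` are also my additions: without the guard, the window is printed before it is full, which appears to emit one empty line at the top.

```shell
# drop_if_repeats_within_two [FILE...]: hypothetical name, not from the thread.
# Prints first words only, like the one-liner it is based on.
drop_if_repeats_within_two() {
  awk '
    { y = x; x = w; w = $1 }      # slide the window: y is oldest, w is newest
    NR < 3 { next }               # added guard: window not full yet
    y == x || y == w { next }     # y reappears one or two lines later: drop it
    { print y }
    END {
      # Flush the last two words, collapsing them if they are equal.
      if (NR == 1) print w
      else if (NR >= 2) print (x != w ? x RS w : w)
    }' "$@"
}

# The first A is dropped because A appears again two lines later,
# even though "A B" is not followed by a second "A B" pair.
printf 'A\nB\nA\nC\n' | drop_if_repeats_within_two
```

So the rule is broader than "remove a repeated pair": any word that comes back within two lines loses its first occurrence.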


or
Code:
awk '{z=y;y=x;x=w;w=$1}(z==y||t){t=0;next}(z==x&&y==w){t=1;next}{print z}END{print (y!=x?y RS:z) (x!=w?x RS w:w)}' infile
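
Both sliding-window one-liners print the saved first word, so leading whitespace and anything after the first word are lost. If the whole line should survive, a four-line buffer can carry `$0` along with the first word. This is only a sketch under that assumption; the names `dedupe_stream`, `drop`, `step`, `line`, `key` and `len` are all mine, not from the thread.

```shell
# dedupe_stream [FILE...]: hypothetical helper name, not from the thread.
dedupe_stream() {
  awk '
    # Remove the n oldest buffered lines and shift the rest down.
    function drop(n,   j) {
      for (j = 1; j + n <= len; j++) {
        line[j] = line[j+n]
        key[j]  = key[j+n]
      }
      len -= n
    }
    # Decide what happens to the oldest buffered line.
    function step() {
      if (len >= 2 && key[1] == key[2])
        drop(1)                       # same word on the next line
      else if (len >= 4 && key[1] == key[3] && key[2] == key[4])
        drop(2)                       # the pair repeats right after itself
      else {
        print line[1]
        drop(1)
      }
    }
    # Buffer each line; decide on the oldest one once four are available.
    { len++; line[len] = $0; key[len] = $1; if (len == 4) step() }
    # Drain whatever is still buffered at end of input.
    END { while (len > 0) step() }
  ' "$@"
}
```

Because at most four lines are held at a time, this also avoids reading the whole file into memory.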


Last edited by ctsgnb; 03-25-2012 at 10:34 AM..