Code to exclude lines with similar values | Unix Linux Forums | Shell Programming and Scripting

#1  Tzole, 03-06-2013

Hi!

I have a problem with a text file. For example:

File:


Code:
CATEGORY OF XXX
  AAA    1          XXX     BBB     CCC
  AAA    1          XXX     DDD     EEE
  AAA    1          XXX     FFF     GGG
  AAA    1          XXX     KKK     LLL
  AAA    1          XXX     MMM     NNN
  
CATEGORY OF YYY
  AAA    1          YYY     OOO    PPP
  AAA    1          YYY     DDD    EEE
  AAA    1          YYY     QQQ    RRR

When I analyze category XXX, I want to exclude the lines whose values (apart from the category field) also appear in category YYY.
So the output will be:


Code:
CATEGORY OF XXX
  AAA     1          XXX     BBB     CCC
  AAA     1          XXX     FFF     GGG
  AAA     1          XXX     KKK     LLL
  AAA     1          XXX     MMM     NNN

(that is, without the second line, the one with DDD EEE).

Any suggestions? Thank you in advance.
#2  rdrtx1, 03-06-2013
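Try:

Code:
awk '
# Pass 1: remember the key of every line that belongs to another category.
NR==FNR {if ($3!=cat) a[$1$2$4$5]=$0; next}
# Pass 2: print lines whose last field is the category name (the header line),
$NF==cat
# and each line of the category whose key was not stored in pass 1.
$3==cat {if (!a[$1$2$4$5]) print }
' cat="XXX" infile infile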

#3  Scrutinizer (Moderator), 03-06-2013
@rdrtx1: It is better to separate the fields in the array index with commas, so that awk joins them with SUBSEP:

Code:
a[$1,$2,$4,$5]

In the sample the fields all happen to have the same length, but if they vary in length, one value may "blur" into the next and produce unexpected results.
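To illustrate the "blur", here is a minimal sketch using made-up two-field input (the values `ab c` and `a bc` and the function name `collide` are assumptions for the demonstration, not part of the original data). It counts how many distinct keys each indexing style produces:

```shell
# Hypothetical input: the pairs ("ab","c") and ("a","bc") are different,
# but plain concatenation turns both into the single key "abc".
collide() {
  printf 'ab c\na bc\n' | awk '
    { plain[$1 $2]++; safe[$1, $2]++ }   # concatenated key vs SUBSEP-joined key
    END {
      np = 0; for (k in plain) np++      # number of distinct concatenated keys
      ns = 0; for (k in safe) ns++       # number of distinct SUBSEP keys
      print "concatenated keys: " np
      print "SUBSEP keys: " ns
    }'
}
collide
```

With the concatenated index the two lines share one key, so one line would wrongly count as a duplicate of the other; with the comma form they stay distinct.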
#4  hanson44, 03-06-2013
Here is a possibility. Instead of going through machinations with complex scripts, improve the file format first. The "Category of XXX" header lines are redundant, because the category is already in field #3, and they make the file hard to deal with. I know you didn't ask for a different file format, but I think it is the better solution because it makes the file easier to process. Suggested new data file format:

Code:
  AAA    1          XXX     BBB     CCC
  AAA    1          XXX     DDD     EEE
  AAA    1          XXX     FFF     GGG
  AAA    1          XXX     KKK     LLL
  AAA    1          XXX     MMM     NNN
  AAA    1          YYY     OOO    PPP
  AAA    1          YYY     DDD    EEE
  AAA    1          YYY     QQQ    RRR

- sort on field #4 (BBB).
- run uniq with the option that limits the comparison to fields #4 and #5. The uniq step will get rid of the "DDD EEE" duplication.
- sort on field #3, to put the categories back in order.
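The three steps above could be sketched roughly as follows. This is only one reading of them, with several assumptions of mine: the function name `dedupe_pairs` is hypothetical; an extra awk step squeezes the blanks first, because `uniq -f` compares the remaining text including its leading blanks and the sample lines are not spaced identically; and `uniq -u` is used so that every line of a repeated pair is dropped, whereas plain `uniq` would keep the first line of each pair (here the XXX one).

```shell
# Assumption: the input is the header-less format suggested above,
# supplied on standard input.
dedupe_pairs() {
  awk '{ $1 = $1; print }' |   # rebuild each line with single spaces
    LC_ALL=C sort -k4,5 |      # bring equal (field 4, field 5) pairs together
    uniq -u -f 3 |             # skip 3 fields, drop every repeated pair
    LC_ALL=C sort -k3,3        # put the categories back in order
}
```

It would be called as `dedupe_pairs < infile`. Filtering the result on field #3 then gives the lines of a single category.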
#5  Tzole, 03-06-2013
@rdrtx1 it works! Thank you so much!

I will also look at the other useful suggestions from @Scrutinizer and @hanson44.