Code to exclude lines with similar values


 
# 1  
Old 03-06-2013

Hi!

I have a problem with a text file. For example:

File:

Code:
CATEGORY OF XXX
  AAA    1          XXX     BBB     CCC
  AAA    1          XXX     DDD     EEE
  AAA    1          XXX     FFF     GGG
  AAA    1          XXX     KKK     LLL
  AAA    1          XXX     MMM     NNN
  
CATEGORY OF YYY
  AAA    1          YYY     OOO    PPP
  AAA    1          YYY     DDD    EEE
  AAA    1          YYY     QQQ    RRR

When I am analyzing the category of XXX, I don't want the lines that have the same values as a line in the category of YYY.
So the output should be:

Code:
CATEGORY OF XXX
  AAA     1          XXX     BBB     CCC
  AAA     1          XXX     FFF     GGG
  AAA     1          XXX     KKK     LLL
  AAA     1          XXX     MMM     NNN

(without the second line, "DDD EEE", which also appears under the category of YYY).

Any suggestions? Thank you in advance.
# 2  
Old 03-06-2013
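Try:
Code:
awk '
# Pass 1 (first read of infile): store the key of every line outside category cat
NR==FNR {if ($3!=cat) a[$1$2$4$5]=$0; next}
# Pass 2: print lines whose last field is cat (here, the CATEGORY OF XXX header)
$NF==cat
# Pass 2: print lines of category cat whose key was not stored in pass 1
$3==cat {if (!a[$1$2$4$5]) print }
' cat="XXX" infile infile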

This User Gave Thanks to rdrtx1 For This Post:
# 3  
Old 03-06-2013
@rdrtx1: It is better to use SUBSEP to separate the fields in the index of the array.
Code:
a[$1,$2,$4,$5]

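In the sample the fields all happen to have the same length, but if they vary in length, one value may "blur" into the next, so that different field combinations produce the same key and create unexpected results. For example, "AB" followed by "C" and "A" followed by "BC" both concatenate to "ABC".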
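A minimal sketch of this point, assuming a POSIX shell and awk. The function names `collision_demo` and `filter_category` and the array name `seen` are made up for illustration and do not come from the posts in this thread. The first function counts the distinct keys produced by the field pairs ("AB", "C") and ("A", "BC"), once with plain concatenation and once with the comma (SUBSEP) form. The second is one possible way to write the filter with comma-separated keys; the `NF` checks are an added assumption that data lines have exactly five fields and header lines exactly three.

```shell
# Hypothetical helper: show how concatenated keys can collide.
collision_demo() {
  printf 'AB C\nA BC\n' | awk '
    { plain[$1 $2]++; sep[$1, $2]++ }   # same two fields, two key styles
    END {
      np = 0; for (k in plain) np++     # count distinct concatenated keys
      ns = 0; for (k in sep) ns++       # count distinct SUBSEP keys
      print "concatenated keys: " np
      print "SUBSEP keys: " ns
    }'
}

# Hypothetical helper. Usage: filter_category CATEGORY FILE
# Assumes 5-field data lines and 3-field "CATEGORY OF ..." header lines.
filter_category() {
  awk -v cat="$1" '
    # Pass 1: remember the keys of data lines that belong to other categories
    NR==FNR { if (NF==5 && $3!=cat) seen[$1,$2,$4,$5]; next }
    # Pass 2: print the header of the chosen category
    NF==3 && $NF==cat
    # Pass 2: print data lines of the chosen category whose key was not seen
    NF==5 && $3==cat && !(($1,$2,$4,$5) in seen)
  ' "$2" "$2"
}

collision_demo
```

Here `collision_demo` reports one concatenated key but two SUBSEP keys, which is the collision described above. `filter_category XXX infile` is meant to behave like the earlier command on the sample data, while testing membership with `in` so that lookups do not create new array entries.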
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 03-06-2013
Here is a possibility. Instead of going through machinations with complex scripts, improve the file format first. The "CATEGORY OF XXX" header lines are redundant, because the category is already in field #3, and they make the file harder to deal with. I know you didn't ask for a different file format, but I think this is a better solution because it makes the file easier to deal with. Suggested new data file format:
Code:
  AAA    1          XXX     BBB     CCC
  AAA    1          XXX     DDD     EEE
  AAA    1          XXX     FFF     GGG
  AAA    1          XXX     KKK     LLL
  AAA    1          XXX     MMM     NNN
  AAA    1          YYY     OOO    PPP
  AAA    1          YYY     DDD    EEE
  AAA    1          YYY     QQQ    RRR

- sort on field #4 (BBB).
- run uniq with an option to limit the comparison to fields #4 and #5. The uniq step will get rid of the "DDD EEE" duplication.
- sort on field #3, to put the categories back in order.
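
The three steps above could be sketched as follows, assuming a POSIX shell with the standard sort and uniq utilities and a file already in the suggested format. The function name `dedupe_by_value` is made up for illustration. Two details are assumptions added here and are not part of the suggestion: whitespace is squeezed first, because `uniq -f 3` compares the rest of each line literally and the sample uses different spacing in the XXX and YYY lines; and field #3 is used as a reverse tie-breaker in the first sort so that the YYY copy of a duplicate comes first. Note that uniq keeps the first line of each group of duplicates; it does not drop the whole group.

```shell
# Hypothetical helper. Usage: dedupe_by_value FILE
# Steps:
#   1. awk squeezes runs of blanks to single spaces (added assumption)
#   2. sort groups lines by fields 4-5; within a group, field 3 is in
#      reverse order, so YYY sorts before XXX (added assumption)
#   3. uniq -f 3 skips the first three fields and keeps the first line
#      of each run of lines that match from field 4 onward
#   4. the final sort puts the categories (field 3) back in order
dedupe_by_value() {
  awk '{ $1 = $1; print }' "$1" |
    LC_ALL=C sort -k4,5 -k3,3r |
    uniq -f 3 |
    LC_ALL=C sort -k3,3 -k4,4
}
```

On the sample data this should leave the four XXX lines without "DDD EEE" and all three YYY lines. The reverse tie-breaker only works here because YYY happens to sort after XXX, so treat it as a sketch and not a general rule for choosing which copy survives.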
This User Gave Thanks to hanson44 For This Post:
# 5  
Old 03-06-2013
@rdrtx1 it works! Thank you so much!

I will also look at the other useful suggestions from @Scrutinizer and @hanson44.