Filter values by Awk


 
# 1
08-06-2009

INPUT

Code:
A    EEEE
A    EEEE
A    EEEE
B    FFFF
B    GGGG
B    FFFF
C    GGGG
C    FFFF
C    FFFF
D    HHHH
D    XXXX
D    YYYY
D    YYYY
E    EEEEE
E    EEEEE
E    EEEEE

OUTPUT

Code:
A    EEEE
B    GGGG
C    GGGG
D    HHHH
E    EEEEE

Group the rows by the key in column 1 and look at the column 2 values.
As you can see, A has three EEEE values, while B and C each have one GGGG, and so on.
The task is to find the common value for each key, but this changes whenever GGGG or HHHH appears: in that case GGGG or HHHH takes priority and becomes the output value.

I tried something like the code below, but it gives the wrong output because it simply keeps the last column 2 value seen for each key, so a GGGG or HHHH that is not on the key's last line is lost.
Could you please suggest a better way to solve this?

Code:
awk '{a[$1]=a[$1]?a[$1]:$1;b[$1]=$NF}END{for(i in a) print a[i],b[i]}' OFS="\t" file | sort

# 2
08-06-2009
Hi.

It's not clear to me, based on your output, how you got to that, or what exactly you are looking for.

What exactly do you mean by "common key"? Nothing in your output demonstrates that. For example, shouldn't the common value for B and C be FFFF (my understanding of "common" being the value that appears most often)? Am I missing something?

Thanks
# 3
08-06-2009
Hi,

GGGG and HHHH have first priority in the filtering.

For example, in the first case of the INPUT, the key A has three identical values (EEEE), so the output should be A EEEE.
In the second case, key B has one GGGG value and the others are FFFF. Here we ignore everything except GGGG, so the output should be B GGGG.
In the third case, key C is like the one above: it has a GGGG, so the output should be C GGGG.
In the fourth case, key D is similar but has an HHHH among other values, so the output should be D HHHH.
In the fifth case (E), all the values are identical, so the output is E EEEEE.

In short, if all the values for a key are the same, the output is that value. Whenever GGGG or HHHH appears in the second column, the output value changes to that one, as I described in my previous post.

I hope that's clearer now.

Thanks
# 4
08-06-2009
Hi.

Thank you for the clearer description (unless it was already clear, in which case I apologise!). I think I understand!

Code:
awk '!ARR[$1] || ($2 ~ /[GGGG|HHHH]/)  {ARR[$1] = $2} END {for( A in ARR ) print A, ARR[A] }' file
 
Output:
A EEEE
B GGGG
C GGGG
D HHHH
E EEEEE
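A note on the pattern in this reply, offered as a hedged sketch: inside the slashes, `[GGGG|HHHH]` is a bracket expression, so it matches any value that contains at least one G, H or `|` character anywhere, not the strings GGGG or HHHH themselves. It gives the right answer for this sample only because no other value contains those letters. Also, `for (A in ARR)` visits keys in an unspecified order, so the sorted output above is not guaranteed; piping through `sort`, as in the first post, makes it deterministic. The snippet below compares the bracket expression with the anchored alternation `/^(GGGG|HHHH)$/`. The function name `compare_patterns` is hypothetical, and DRANGE is borrowed from the data in the next post.

```shell
#!/bin/sh
# compare_patterns is a hypothetical helper for this sketch.
# For each input value, report whether the bracket expression from the
# reply and an anchored alternation each match it.
compare_patterns() {
    awk '{
        loose  = ($1 ~ /[GGGG|HHHH]/)   ? "match" : "no"   # any G, | or H anywhere
        strict = ($1 ~ /^(GGGG|HHHH)$/) ? "match" : "no"   # exactly GGGG or HHHH
        print $1, loose, strict
    }'
}

# Example run: each line prints the value, the loose result, the strict result.
printf '%s\n' EEEE FFFF GGGG HHHH YYYY DRANGE | compare_patterns
```

With the anchored form, a value such as DRANGE no longer counts as a priority value, while GGGG and HHHH still do.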


Last edited by Scott; 08-06-2009 at 03:14 PM.
# 5
08-07-2009
It works like the previous script, but it gives a wrong result here: it prints A8 DRANGE, while the correct output is A8 HRANGE.
I adapted the code to my requirements like this:

Code:
awk '!ARR[$1] || ($4 ~ /[GGGG|HHHH]/)  {ARR[$1] = $4} END {for( A in ARR ) print A, ARR[A] }' file

Input
Code:
$ cat ou3.txt
A1      1 2                     ARANGE
A1      2 4             ARANGE
A2      120 130                 BRANGE
A2      136 140         BRANGE
A3      2 1                     CRANGE
A3      4 2             CRANGE
A4      130 120                 DRANGE
A4      140 136         DRANGE
A5      15 20                   ERANGE
A5      50 60           ERANGE
A5      94 98           ERANGE
A6      20 15                   FRANGE
A6      60 50           FRANGE
A6      98 94           FRANGE
A7      1 2                     ARANGE
A7      2 4             ARANGE
A7      33 36           GRANGE
A8      88 84                   HRANGE
A8      130 120         DRANGE
A8      140 136         DRANGE

Output

Code:
$ awk -f script3.awk ou3.txt
A4      DRANGE
A5      ERANGE
A6      FRANGE
A7      GRANGE
A8      DRANGE
A1      ARANGE
A2      BRANGE
A3      CRANGE
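Why A8 comes out as DRANGE: every value in column 4 ends in RANGE, which contains a G, so `/[GGGG|HHHH]/` matches every line, the assignment always runs, and the last value seen for each key wins (A8's last value is DRANGE). A minimal sketch of a fix, assuming the priority values are exactly GRANGE and HRANGE: anchor the pattern, keep the first value seen as the default, and print keys in first-seen order instead of relying on `for (k in arr)`. The function name `pick_range` is hypothetical.

```shell
#!/bin/sh
# pick_range is a hypothetical helper for this sketch.
# Assumption: the priority values are exactly GRANGE and HRANGE.
pick_range() {
    awk '
    !($1 in val) {              # first row for this key
        order[++n] = $1         # remember first-seen key order
        val[$1] = $4            # default: first column-4 value seen
    }
    $4 ~ /^[GH]RANGE$/ {        # anchored: whole field must be GRANGE or HRANGE
        val[$1] = $4
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], val[order[i]]
    }' "$@"
}

# Example run on the ou3.txt data from the post above.
pick_range <<'EOF'
A1 1 2 ARANGE
A1 2 4 ARANGE
A2 120 130 BRANGE
A2 136 140 BRANGE
A3 2 1 CRANGE
A3 4 2 CRANGE
A4 130 120 DRANGE
A4 140 136 DRANGE
A5 15 20 ERANGE
A5 50 60 ERANGE
A5 94 98 ERANGE
A6 20 15 FRANGE
A6 60 50 FRANGE
A6 98 94 FRANGE
A7 1 2 ARANGE
A7 2 4 ARANGE
A7 33 36 GRANGE
A8 88 84 HRANGE
A8 130 120 DRANGE
A8 140 136 DRANGE
EOF
```

If a key had both GRANGE and HRANGE, this sketch would keep whichever appears last; the thread does not say which one should win.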


Last edited by repinementer; 08-07-2009 at 04:16 AM.
# 6
08-07-2009
Not sure whether I got it right, but instead of one awk doing everything, I split it up:
Code:
awk '{ print $1, $4 }' ou3.txt | sort -u | \
awk '
{
        a[$1]=$2
        if ( $2 ~ /^[GH]/ ) {
                A[$1]=$2
        }
}
END {
        for ( i in a ) {
                if ( A[i] != "" ) {
                        print i, A[i]
                } else {
                        print i,a[i]
                }
        }
}'

Output:
Code:
A1 ARANGE
A2 BRANGE
A3 CRANGE
A4 DRANGE
A5 ERANGE
A6 FRANGE
A7 GRANGE
A8 HRANGE
