Extract columns where header matches a given string


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extract columns where header matches a given string
# 1  
Old 03-17-2011
Extract columns where header matches a given string

Hi,

I'm having trouble pulling out columns where the headers match a file of key ID's I'm interested in and was looking for some help.

file1.txt
Code:
I
Name
34
56
84
350
790
1215
1919
7606
9420

file2.txt
Code:
I Name 1 1 2 2 3 3 ... 34 34... 56 56... 84 84... 350 350... 
M 1      A A A A B B    A   A      A  B     B  A       A    A
M 2      A A B A B B    B   A      A  B     B  A       A    A
M 3      A A A A B B    A   A      A  B     A  A       B    A
M 4      A B A A B B    A   B      A  A     B  A       A    A


I would like to create a new file where the columns from file2 match ids from file 1, so:

Code:
I Name 34 34 56 56 84 84 350 350...9420 9420
M 1      A   A   A  B  B  A    A    A  ...  B     A
M 2      B   A   A  B  B  A    A    A  ...  B     A
M 3      A   A   A  B  A  A    B    A  ...  B     B

I initially tried the following:

Code:
% awk '{print $1}'  file1.txt | while read i; do awk 'NR=1;{for (j=0;j<=NF;j++) {if ($j == $i) break; }} {printf "%s ",$i}' file2.txt;done > file3.txt

But I'm obviously doing something wrong as I get the entire matrix "X" printed first then on subsequent lines I get the whole matrix printed twice

Code:
% vi file3.txt

X
X X
X X
X X
...
X

Thanks in advance.
Moderator's Comments:
Mod Comment Please use code tags when posting data and code samples!

Last edited by vgersh99; 03-17-2011 at 05:09 PM.. Reason: code tags, please!
# 2  
Old 03-17-2011
nawk -f flot.awk file1.txt file2.txt

flot.awk:
Code:
FNR==NR {f1[$1];next}
FNR==1{
       for(i=1;i<=NF;i++) {
         f2[i]=$i
         if ($i in f1)
            printf("%c%s", c++?OFS:"", $i)
       }
       print ""
       next
}
{
  b=0
  for(i=1;i<=NF;i++)
    if (f2[i] in f1)
       printf("%c%s", b++?OFS:"",$i)
  print ""
}


Last edited by vgersh99; 03-17-2011 at 06:19 PM.. Reason: ooops - misunderstood the requirements...
# 3  
Old 03-17-2011
it may be done with this trick too, you could try:

save following statement into a file, let's say, myawk.sh
Code:
awk '{for (i=1;i<=NF;i++)a[i,NR]=$i; }END{for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" ";print ""} }'

then execute this statement: assume file1 & file2 are your two files
Code:
sh myawk.sh<file2 |awk 'NR==FNR{a[$1]=1;} NR>FNR{if ($1 in a )print $0}' file1 - |sh myawk.sh

# 4  
Old 03-17-2011
Thanks for the quick reply unfortunately your example doesn't work.

I tried to run flot.awk and got the following

A single entry for the column header ~100 columns and then the entire matrix ~10000 columns * n rows below the header. What I was hoping to get was a matrix of 200 columns * n rows.


Any idea why your script may be only matching the header once instead of the two times I expect and why it's printing the entire matrix following that? The input file2 is separated by whitespace, does it need -F ' ' somewhere?

Thanks again
# 5  
Old 03-17-2011
Hmm..... strange - worked with my sample files.
Could you provide the sample files you were using?
Tailor it down to be small enough and be representative as well - say 5 records in file1 and 20 columns in file2.
# 6  
Old 03-17-2011
Thanks for the replies

@sk1418
I tried your example, but never received anything back from stdout after running the scripts. I've tried to run the first awk command inputing file2.txt and received some output but from that point on the computer just hangs.

@vgersh99
Here's an example of the files. And I appologise, I'm not sure if a file should be code tagged?

file1.txt:
Code:
I
Name
104069
109706
113889
115561
115979

file2.txt
Code:
I Name 021071 021071 021215 021215 104069 104069 124236 124236 212439 212439 109706 109706 215398 215398 113889 113889 272728 272728 360976 360976 
M 10080_CO B B B B B B B B B B B B B B B B A A B B
M 10068_CO A A A A A A B B A A A A A A A A A A A A
M 12187_ND B B B B B B B B B B B B B B B B B B B B
M GGA_0061 A A A A A B B B A B B B A B A A A B A B
M GGA_0013 A B A B A B A B A B A B A B B B A A A B
M GGA_0024 A A A B A A A A A A A B A A A A A B A A
M GGA_0025 B B B B B B B B B B B B B B B B B B B B

Moderator's Comments:
Mod Comment Please use code tags when posting data and code samples!
# 7  
Old 03-17-2011
hi, i tried your examples:
Code:
ArchT60 23:39:39 /tmp
kent$ sh tawk.sh<f2 |awk 'NR==FNR{a[$1]=1;} NR>FNR{if ($1 in a )print $0}' f1 - |sh tawk.sh 
I Name 104069 104069 109706 109706 113889 113889 
M 10080_CO B B B B B B 
M 10068_CO A A A A A A 
M 12187_ND B B B B B B 
M GGA_0061 A B B B A A 
M GGA_0013 A B A B B B 
M GGA_0024 A A A B A A 
M GGA_0025 B B B B B B

if the myawk.sh doesn't work there, try this:

Code:
awk '{for (i=1;i<=NF;i++)a[i,NR]=$i; }END{for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" ";print ""} }' f2 |awk 'NR==FNR{a[$1]=1;} NR>FNR{if ($1 in a )print $0}' f1 - |awk '{for (i=1;i<=NF;i++)a[i,NR]=$i; }END{for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" ";print ""} }'

actually the myawk.sh just for save some words. as you can see above statement, same part of codes duplicated at the beginning and end.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk add all columns if column 1 name matches

Hi - I want to add all columns if column1 name matches. TOPIC1 5 1 4 TOPIC2 3 2 1 TOPIC3 7 2 5 TOPIC1 6 3 3 TOPIC2 4 1 3 TOPIC3 9 5 4 . . . . . . . . . . . . Result should look like TOPIC1 11 4 7 TOPIC2 7 3 4 (1 Reply)
Discussion started by: oraclermanpt
1 Replies

2. UNIX for Beginners Questions & Answers

Matches columns from two different files in shell script

Hi friends, i want to compare first columns from two different files ,if equal print the file2's second column else print the zero.Please help me... file1: a b c d efile2: a 1 c 20 e 30 desired output: 1 0 20 0 30 Please use CODE tags as required by forum rules! Please post in... (1 Reply)
Discussion started by: bhaskar illa
1 Replies

3. UNIX for Beginners Questions & Answers

Extract the whole set if a pattern matches

Hi, I have to extract the whole set if a pattern matches.i have a file called input.txt input.txt ------------ CREATE TABLE ABC ( A, B, C ); CREATE TABLE XYZ ( X, Y, Z, P, Q ); (6 Replies)
Discussion started by: raju2016
6 Replies

4. UNIX for Dummies Questions & Answers

Print Matches to New Columns

Hi all, I have a problem that I'm struggling to resolve. I have two files that look like this: File 1 654654654 3 987987987 2 321321321 1 File 2 14NS0064 654654654 14NS0054 654654654 14NS0032 654654654 14NS0090 987987987 14NS0093 987987987 14NS0056 321321321 As you may notice,... (2 Replies)
Discussion started by: winkleman
2 Replies

5. Shell Programming and Scripting

Blocks of text in a file - extract when matches...

I sat down yesterday to write this script and have just realised that my methodology is broken........ In essense I have..... ----------------------------------------------------------------- (This line really is in the file) Service ID: 12345 ... (7 Replies)
Discussion started by: Bashingaway
7 Replies

6. Shell Programming and Scripting

Extract columns based on header

Hi to all, I have two files. File1 has no header, two columns: sample1 A sample2 B sample3 B sample4 C sample5 A sample6 D sample7 D File2 has a header, except for the first 3 columns (chr,start,end). "sample1" is the header for the 4th ,5th ,6th columns, "sample2" is the header... (4 Replies)
Discussion started by: aec
4 Replies

7. Shell Programming and Scripting

Merge two columns from two files into one if another column matches

I have two text files that look something like this: A:B:C 123 D:E:F 234 G:H:I 345 J:K:L 123 M:N:O 456 P:Q:R 567 A:B:C 456 D:E:F 567 G:H:I 678 J:K:L 456 M:N:O 789 P:Q:R 890 I want to find the line where the first column matches and then combine the second columns into a single... (8 Replies)
Discussion started by: pbluescript
8 Replies

8. Shell Programming and Scripting

Need awk help to print specific columns with as string in a header

awk experts, I have a big file of 4000 columns with header. Would like to print the columns with string value of "Commands" in header. File has "," separator. This file is on ESX host with Bash. Thanks, Arv (21 Replies)
Discussion started by: arv_cds
21 Replies

9. Shell Programming and Scripting

Joining columns from two files, if the key matches

I am trying to join/paste columns from two files for the rows with matching first field. Any help will be appreciated. Files can not be sorted and may not have all rows in both files. Thanks. File1 aaa 111 bbb 222 ccc 333 File2 aaa sss mmmm ccc kkkk llll ddd xxx yyy Want to... (1 Reply)
Discussion started by: sk_sd
1 Replies

10. Shell Programming and Scripting

Extract if pattern matches

Hi All, I have an input below. I tried to use the awk below but it seems that it ;s not working. Can anybody help ? My concept here is to find the 2nd field of the last occurrence of such pattern " ** XXX ccc ccc cc cc ccc 2007 " . In this case, the 2nd field is " XXX ". With this "XXX" term... (20 Replies)
Discussion started by: Raynon
20 Replies
Login or Register to Ask a Question