Help with awk program


 
# 1  
Old 10-22-2013

I have two files.
The first looks like this (file1):
Code:
novelMiR_892    novelMiR_891,
novelMiR_852    
novelMiR_893    
novelMiR_1661    
novelMiR_854    
novelMiR_1210    
novelMiR_1251    
novelMiR_855    
novelMiR_1252    
novelMiR_897    novelMiR_2336,novelMiR_2335,

The second looks like this (file2):
Code:
>novelMiR_891
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD

Now I want to rename every ">header" in file2 that is listed in file1 as an alias, replacing it with the first name on the same line of file1. This is what I want (file3):
Code:
>novelMiR_892 (renamed)
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD

The first header is renamed because, according to file1, 891 is the same as 892.

My solution (which DOES NOT WORK) is:
Code:
awk 'NR==FNR{n[$1]=$1","$2;next} { $1 ~ ">" ;
name=substr($1,2,length($1)-1); getline seq; 
{for (i in n) if(n[i] ~ /'"$name"'/) names=i} print names "\n" seq > "file3" }' file1 file2

Explanation:

First I create an array that holds all names of a line joined by "," and is indexed by the name I want to use.
Code:
n[novelMiR_892] = novelMiR_892,novelMiR_891,

Then, line by line, I read each name (without the ">") and its sequence, and check whether the name occurs in one of the entries of the n array. If it does, the index of that entry should be kept and printed.

But I always get only the first name for all sequences:
Code:
>novelMiR_892
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_892
HHHHGGGFFFDD

Where is my mistake?
# 2  
Old 10-22-2013
Try


Code:
$ cat file1
novelMiR_892    novelMiR_891,
novelMiR_852    
novelMiR_893    
novelMiR_1661    
novelMiR_854    
novelMiR_1210    
novelMiR_1251    
novelMiR_855    
novelMiR_1252    
novelMiR_897    novelMiR_2336,novelMiR_2335,

Code:
$ cat file2
>novelMiR_891
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD

Code:
$ awk -F'[ ,]' 'FNR==NR{a=$1;$1="";gsub(" ",">");Arr[">"$0]=a;next}{for(i in Arr)$1=(i~$1)?">"Arr[i]:$1}1'  file1 file2

Resulting in:
Code:
>novelMiR_892
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD


Last edited by Akshay Hegde; 10-22-2013 at 10:43 AM..
# 3  
Old 10-22-2013
@OP: You are not using a next statement, and it seems you have the two files reversed. Also, /'"$name"'/ uses the shell variable "$name", not the awk variable name.
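
To illustrate the last point: inside a single-quoted awk program, '"$name"' closes the quotes and lets the shell expand its own variable, which is most likely unset, so the test becomes n[i] ~ // and matches every entry. Below is a minimal sketch of the original idea that uses the awk variable instead. It assumes the file layout shown in post #1; the temporary files and the split()-based exact comparison are illustrative additions, not the original poster's code.

```shell
# Recreate (part of) the sample data from post #1 in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'novelMiR_892    novelMiR_891,' 'novelMiR_852    ' \
    'novelMiR_897    novelMiR_2336,novelMiR_2335,' > "$tmp/file1"
printf '%s\n' '>novelMiR_891' 'AAAABBBCCCDDD' '>novelMiR_892' 'BBBCCCDDDEEEF' \
    '>novelMiR_852' 'HHHHGGGFFFDD' > "$tmp/file2"

result=$(awk '
    # file1: store "first name,alias,alias," under the first name.
    NR == FNR { n[$1] = $1 "," $2; next }
    # file2: header lines start with ">".
    /^>/ {
        name = substr($1, 2)              # awk variable, no shell quoting
        for (i in n) {
            cnt = split(n[i], part, ",")
            for (j = 1; j <= cnt; j++)
                if (part[j] == name)      # exact comparison, not a regex
                    $1 = ">" i
        }
    }
    { print }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data, the first header becomes >novelMiR_892 and the other lines pass through unchanged.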

Alternatively try:
Code:
awk 'NR==FNR{for(i=2; i<=NF; i++) if($i)A[">" $i]=$1; next} $1 in A{$1=A[$1]}1' FS='[ \t]*|,' file1 file2
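
A side note on the lookup in this one-liner: $1 in A tests for an exact array key, whereas the ~ operator treats its right-hand side as a regular expression and therefore also succeeds on partial matches. Here is a small sketch of the difference; the ID novelMiR_89 is made up for illustration and does not occur in the sample files.

```shell
regex_result=$(awk 'BEGIN {
    key = ">novelMiR_891"; hdr = ">novelMiR_89"    # hdr is a hypothetical header
    print ((key ~ hdr) ? "match" : "no match")     # hdr is used as a regex
}')
exact_result=$(awk 'BEGIN {
    A[">novelMiR_891"] = ">novelMiR_892"; hdr = ">novelMiR_89"
    print ((hdr in A) ? "match" : "no match")      # exact key lookup
}')
echo "regex: $regex_result"   # regex: match
echo "exact: $exact_result"   # exact: no match
```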


Last edited by Scrutinizer; 10-22-2013 at 09:58 AM..
# 4  
Old 10-22-2013
nearly working!

Scrutinizer's solution works nearly perfectly; only the ">" is missing on all renamed headers.

As I don't understand the solution at all, I can't add it myself!

What does the 1 at the end mean? Is it there to print the line?


Akshay's solution nearly works.

These are only test data, but the structure is exactly the same as in the real files.

The renamed headers start with an unusual character and look like this:
Code:
 >novelMiR_11    novelMiR_10

# 5  
Old 10-22-2013
Quote:
Originally Posted by Scrutinizer
Alternatively try:
Code:
awk 'NR==FNR{for(i=2; i<=NF; i++) if($i)A[">" $i]=$1; next} $1 in A{$1=A[$1]}1' FS='[ \t]*|,' file1 file2

I think you missed the ">". Suppose we take one more file, say file3:

Code:
$ cat file3
>novelMiR_891
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD
>novelMiR_2336
Test1 - Check
>novelMiR_2335
Test2 -Check

This results in:

Code:
novelMiR_892
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD
novelMiR_897
Test1 - Check
novelMiR_897
Test2 -Check

Modified version of Scrutinizer's code

Code:
$ awk 'NR==FNR{for(i=2; i<=NF; i++) if($i)A[">" $i]=$1; next} $1 in A{$1=">"A[$1]}1' FS='[ \t]*|,' file1 file3

# 6  
Old 10-22-2013
Thanks, yes I left out a ">", so Akshay posted a correction. This would be an alternative:
Code:
awk 'NR==FNR{for(i=2; i<=NF; i++) if($i)A[">" $i]=">" $1; next} $1 in A{$1=A[$1]}1' FS='[ \t]*|,' file1 file2

Perhaps this is a bit clearer:
Code:
awk 'NR==FNR{for(i=2; i<=NF; i++) if($i)A[$i]=$1; next} $2 in A{$2=A[$2]}1' FS='[ \t]*|,' file1 FS=\> OFS=\> file2
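
For readers who find the one-liners hard to follow, the same lookup-table idea can be written out as a commented script. This is only a sketch under a few assumptions: the field separator is simplified to [ \t,]+, the sample data from the thread is recreated in temporary files, and headers are matched exactly.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'novelMiR_892    novelMiR_891,' 'novelMiR_852    ' \
    'novelMiR_897    novelMiR_2336,novelMiR_2335,' > "$tmp/file1"
printf '%s\n' '>novelMiR_891' 'AAAABBBCCCDDD' '>novelMiR_892' 'BBBCCCDDDEEEF' \
    '>novelMiR_852' 'HHHHGGGFFFDD' '>novelMiR_2336' 'Test1 - Check' > "$tmp/file2"

renamed=$(awk -F'[ \t,]+' '
    # First file (NR == FNR): map every alias in fields 2..NF to field 1.
    NR == FNR {
        for (i = 2; i <= NF; i++)
            if ($i != "")                 # skip the empty field after a trailing comma
                A[">" $i] = ">" $1
        next                              # do not fall through to the rules below
    }
    # Second file: replace a header only if it is a known alias.
    $1 in A { $1 = A[$1] }
    # A pattern of 1 is always true; with no action it prints the line.
    1
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$renamed"
rm -rf "$tmp"
```

Lines whose first field is not a key of A are never modified, so sequence lines containing spaces are printed exactly as they were read.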



--
Quote:
Originally Posted by dietmar13
[..]
What does the 1 at the end mean? Is it there to print the line?
[..]
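Yes. The 1 is a condition that is always true, and a condition without an action performs the default action, which is to print the record (in this case the entire line).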
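
To make that concrete: an awk rule has the form pattern { action }, and when the action is omitted awk prints the current record. A minimal sketch showing that the three spellings below behave the same (the two input lines are just sample data):

```shell
input=$(printf '%s\n' '>novelMiR_891' 'AAAABBBCCCDDD')
out_one=$(printf '%s\n' "$input" | awk '1')                 # pattern only, default action
out_print=$(printf '%s\n' "$input" | awk '{ print }')       # action only, runs for every line
out_full=$(printf '%s\n' "$input" | awk '1 { print $0 }')   # both spelled out
printf '%s\n' "$out_one"
```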

Last edited by Scrutinizer; 10-22-2013 at 11:21 AM..