Challenging Awk array problem


 
# 15  
Old 05-22-2010
Quote:
Originally Posted by alister
What do you mean you should not type "paste"? Or that you should not use \\n? "paste" is the name of the command. You must type it. "\\n" is an option argument to paste that tells it to use a newline when merging the two files; it is crucial that it is used. You should enter the code exactly as posted: "paste -d\\n file1 file2 | awk....". If the data files aren't named file1 and file2, then change the filenames to point to the correct locations, but nothing else.
I tried pasting the code exactly, but it says the command paste is not recognized as an external or internal command. Then I changed the single quotes to double quotes and that message no longer shows up, but I get the same s=$11^unexpected newline error.
# 16  
Old 05-22-2010
Here's a simpler approach that only uses AWK (and which I should've suggested from the beginning):
Code:
awk '{chr=$6; min=$7; max=$8; s=$11" "$12" "$13; getline < f2; if (chr==$4 && $5>=min && $5<=max) print $0, s;}' f2='file2' file1

You can change the file names (file2 and file1, shown in red in the original post) to suit your needs, but leave the rest as is.

Changing the quotes as you say you did completely changes the meaning of the code. Using double quotes at the outer level causes every instance of a dollar sign followed by a digit to be expanded by the shell instead of being passed literally to AWK for its use; AWK will never see them. Removing the single quotes also changes what is quoted and what is not: the double quotes that were embedded inside the single quotes now create unintended quoted strings.
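
To illustrate the quoting difference, here is a minimal sketch for a POSIX shell. The sample input "a b c" and the positional parameters set at the top are made up for the example; they are not part of the original command.

```shell
#!/bin/sh
# Hypothetical setup: give the shell its own positional parameters,
# so that the shell's $2 is the string "3".
set -- unused 3

# Single quotes: the shell passes $2 through untouched,
# so AWK sees $2 and prints the second field of the input line.
single=$(printf 'a b c\n' | awk '{ print $2 }')

# Double quotes: the shell replaces $2 with 3 before AWK starts,
# so AWK receives the program { print 3 } and never sees a field reference.
double=$(printf 'a b c\n' | awk "{ print $2 }")

echo "single quotes: $single"   # single quotes: b
echo "double quotes: $double"   # double quotes: 3
```

If the shell's $2 were empty instead, AWK would receive { print  } and print the whole line, which is still not what the single-quoted program does.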

Alister

Last edited by alister; 05-22-2010 at 01:01 AM..
# 17  
Old 05-22-2010
Alister, thanks for the code. But when I run it, it gives me an error message "the system cannot find the file specified" even though the files are in the installation directory.

Also, when running other scripts I always use double quotes and they work fine. Single quotes don't work on my Windows laptop. They work fine on my Ubuntu office computer, though.

It might be easier to run this as a script with the -f option, but then the code might have to be changed a bit at the end so that the files are given as input at the command prompt. Here file1 can be removed from the script and put on the command line, but I am not sure how file2 should be accommodated.

Last edited by polsum; 05-22-2010 at 01:50 PM..
# 18  
Old 05-22-2010
If you prefer it that way, sure.

Code:
awk -f dna.awk f2='file2' file1

Where dna.awk contains:
Code:
{chr=$6; min=$7; max=$8; s=$11" "$12" "$13; getline < f2; if (chr==$4 && $5>=min && $5<=max) print $0, s;}

If you still have problems, copy-paste the commands exactly as you ran them and the error messages exactly as you see them. Perhaps that will enable someone with experience dealing with unix tools on windows to help out.
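
As a debugging aid, here is a hedged sketch of the same pairing logic with getline's return value checked, so that an unreadable f2 is reported instead of silently producing no output. The function name pair_lines and the error message text are hypothetical; the matching logic is unchanged from dna.awk.

```shell
#!/bin/sh
# Sketch only: pair_lines is a made-up wrapper name.
# $1 is file1 (the main input), $2 is file2 (read via getline).
pair_lines() {
    awk '{
        chr = $6; min = $7; max = $8; s = $11 " " $12 " " $13
        status = (getline < f2)      # 1 = line read, 0 = end of file, -1 = error
        if (status < 0) {
            # gawk handles /dev/stderr itself; most Unix systems provide it as a file
            print "cannot read f2: " f2 > "/dev/stderr"
            exit 1
        }
        if (status == 0) exit        # f2 has no more lines
        if (chr == $4 && $5 >= min && $5 <= max) print $0, s
    }' f2="$2" "$1"
}
```

It can be called as pair_lines file1 file2. If the quotes end up as part of the value of f2 (as can happen in shells where single quotes are not quote characters, such as the Windows command prompt), the message shows the name with the quotes included, which makes that kind of mistake easier to spot.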

Alister

---------- Post updated at 01:19 PM ---------- Previous update was at 01:14 PM ----------

Or, simpler still, you can put it into a shell script:

Code:
#!/bin/sh

awk '{chr=$6; min=$7; max=$8; s=$11" "$12" "$13; getline < f2; if (chr==$4 && $5>=min && $5<=max) print $0, s;}' f2="$2" "$1"

Assuming the shell script file is named dna.sh, exists in the current directory, and is made executable, it can be run thusly:
Code:
$ ./dna.sh file1 file2
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1  3000072  TTTATCGTCATCGTC L1_Mur2 LINE L1

Or, if not an executable file:
Code:
$ sh dna.sh file1 file2
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1  3000072  TTTATCGTCATCGTC L1_Mur2 LINE L1

# 19  
Old 05-22-2010
I tried this, but the program just runs, prints nothing, and goes back to the command prompt. No error messages whatsoever.

I ran this:
awk -f dna.awk f2='2.txt' 1.txt

I use gawk, is there a significant difference between regular awk and gawk?



Aaahh! Now it's working. OK, I removed the single quotes around '2.txt' and it's working. So the code that worked is awk -f dna.awk f2=2.txt 1.txt

I knew there was a simple mistake I had been making.

Thank you very much you all, particularly Alister. You rock.

Last edited by polsum; 05-22-2010 at 03:06 PM..
# 20  
Old 05-24-2010
OK, now I have run into another problem (life is tough!). Maybe I did not explain this properly, and I apologize for it. The code here seems to assume line-by-line matching of file 1 and file 2. But my actual files (which are very big) do not match line by line. For example, let me reframe the original files.

file 1 (this is the same as the original)

Code:
607    687    174    0    0    chr1    3000001    3000156    -194195276    -    L1_Mur2    LINE    L1    -4310    1567    1413    1
607    917    214    114    45    chr1    3000237    3000733    -194194699    -    L1_Mur2    LINE    L1    -4488    1389    913    1
607    215    31    0    30    chr1    3000733    3000766    -194194666    +    (TTTG)n    Simple_repeat    Simple_repeat    2    33    0    2
607    845    233    76    114    chr1    3000766    3000792    -194194640    -    L1_Mur2    LINE    L1    -6816    912    887    1
607    621    250    65    37    chr1    3001287    3001583    -194193849    -    Lx9    LINE    L1    -1596    6048    5742    3
607    1320    197    332    7    chr1    3001722    3002005    -194193427    -    RLTR25A    LTR    ERVK    0    1028    625    4

file 2

Code:
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1  3000072  TTTATCGTCATCGTC
28|3721 + gi|149352351|ref|NC_000069.5|NC_000069  chr3  154935392 GAGTTTTACAGTCCA
28|3721 +  gi|149288852|ref|NC_000067.5|NC_000067 chr1  152633707 GAGTTTTACAGTCCA
28|3721  + gi|149361432|ref|NC_000073.5|NC_000073 chr1  3000073 GAGTTTTACAGTCCA
34|3145  - gi|149321426|ref|NC_000084.5|NC_000084 chr1 3000767 ACGGCTTACGA
34|3145  - gi|149354224|ref|NC_000071.5|NC_000071 chr5  37676290 ACGGCTTACGA

So the output should be:

Code:
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074  chr1  3000072 TTTATCGTCATCGTC L1_Mur2    LINE    L1
28|3721  + gi|149361432|ref|NC_000073.5|NC_000073 chr1  3000073  GAGTTTTACAGTCCA     L1_Mur2    LINE    L1
34|3145  - gi|149321426|ref|NC_000084.5|NC_000084 chr1 3000767 ACGGCTTACGA     (TTTG)n    Simple_repeat    Simple_repeat

The code here seems to be matching the two files line by line. I tried editing the code for this purpose, but to no avail. Please help me. Thanks.
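
One possible approach, sketched under assumptions rather than taken from the thread: read all of file 1 into AWK arrays first, then test each line of file 2 against every stored range. The function name match_ranges is hypothetical, and the sketch reports only the first matching range for each file 2 line. Note that with the sample data above, position 3000767 lies in the 3000766 to 3000792 range, so this sketch would print L1_Mur2 LINE L1 for that line rather than (TTTG)n.

```shell
#!/bin/sh
# Sketch only: match_ranges is a made-up name.
# $1 is file1 (ranges and annotations), $2 is file2 (positions to look up).
match_ranges() {
    awk '
    NR == FNR {                          # first file: remember every range
        n++
        chr[n] = $6
        min[n] = $7 + 0                  # + 0 forces a numeric value
        max[n] = $8 + 0
        ann[n] = $11 " " $12 " " $13
        next
    }
    {                                    # second file: scan the stored ranges
        pos = $5 + 0
        for (i = 1; i <= n; i++)
            if ($4 == chr[i] && pos >= min[i] && pos <= max[i]) {
                print $0, ann[i]
                break                    # keep only the first matching range
            }
    }' "$1" "$2"
}
```

It can be called as match_ranges file1 file2. Every file 2 line is compared with every file 1 range, so this may be slow on very big files; grouping the ranges by chromosome would be a natural refinement.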