awk calculation with zero as N/A


 
# 8  
Old 05-23-2016
But isn't he or she calculating (1-f1[$1]/$3)? With $1 == ABHD12, f1["ABHD12"] == 10 (from file1), so the result should be around 1 - 0.015 = 0.985, shouldn't it?
This User Gave Thanks to RudiC For This Post:
# 9  
Old 05-23-2016
I need to clean up the data a bit more and will post back tomorrow. Thank you very much.
# 10  
Old 05-23-2016
I think part of my problem is that, with the attached files and the awk below, I am getting the correct counts for most of the ids. However, in cases like RYK I get an output of 250 in $2, but if I manually look at each of the files I count 259 in $2.

Code:
awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' NA12878_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt (file1) NS12911_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt(file2) > all_genes_bases.txt
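
A minimal sketch of how a total like the one for RYK could be cross-checked. The two tiny input files, their column layout (count in $2, id in $3), and the numbers in them are made up for illustration; they are not the poster's data.

```shell
# Hypothetical miniature inputs: count in column 2, gene id in column 3.
tmp=$(mktemp -d)
printf 'chr6 100 RYK\nchr6 50 RYK\nchr7 7 AASS\n' > "$tmp/file1"
printf 'chr6 109 RYK\nchr7 3 AASS\n' > "$tmp/file2"

# Sum column 2 per id in column 3 across both files.
# "for (i in A)" visits keys in an unspecified order, so sort the result.
totals=$(awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' "$tmp/file1" "$tmp/file2" | sort)
echo "$totals"

# Cross-check a single id: list every contributing line, then the total.
ryk_check=$(awk '$3 == "RYK" {print FILENAME ": " $0; sum += $2} END{print "total", sum}' "$tmp/file1" "$tmp/file2")
echo "$ryk_check"

rm -r "$tmp"
```

If the per-id listing shows fewer lines than a manual count, the mismatch is likely in the input (for example a differently spelled id or a shifted column) rather than in the summing itself.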

# 11  
Old 05-24-2016
Quote:
Originally Posted by cmccabe
I think part of my problem is that, with the attached files and the awk below, I am getting the correct counts for most of the ids. However, in cases like RYK I get an output of 250 in $2, but if I manually look at each of the files I count 259 in $2.

Code:
awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' NA12878_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt (file1) NS12911_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt(file2) > all_genes_bases.txt

I'm confused...

You ask us to download over a megabyte of data in a gzipped tar file that contains two files:
Code:
NA12878_newheader_base_counts_lessthan_30reads_perbase.bed
NS12911_newheader_base_counts_lessthan_30reads_perbase.bed

Neither of which is referenced by the above code (even after removing the parenthetical elements from the file list).
And, if we change the script above to:
Code:
awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' NA12878_newheader_base_counts_lessthan_30reads_perbase.bed NS12911_newheader_base_counts_lessthan_30reads_perbase.bed > all_genes_bases.txt

the output produced is never going to have anything with an alphabetic string in the 1st output field, because neither of these input files contains any alphabetic characters in its third field.

Please explain what is going on here!
This User Gave Thanks to Don Cragun For This Post:
# 12  
Old 05-24-2016

I apologize for the confusion and will post back in a bit with a better example. Part of the issue I am having, besides the zero line after most cases, is that some of the initial calculations are incorrect. The awk posted works for most but not all ids. Again, I apologize and will post better examples with an explanation. Thank you.

---------- Post updated at 08:03 AM ---------- Previous update was at 05:16 AM ----------

I believe I found my error on the miscalculation issue in the confusing post above. I am not sure why there are leading and trailing zeros in the output, or how to fix that. As you suspected, that is happening, but why is a mystery to me. Thank you.

current output
Code:
0
0
0
0
0
0
0
0
0
0
0
AASS 99.26%
0
0
ABCA10 97.61%
0
ABCA12 99.97%

desired output
Code:
AASS 99.26%
ABCA10 97.61%
ABCA12 99.97%

files used to calculate those:
file1
Code:
AASS 24
ABCA10 103
ABCA12 3

file2
Code:
AASS 23 3241
ABCA10 28 4301
ABCA12 52 8804

calculation math:
$1 in file1 is matched to $1 in file2, then file1 $2 / file2 $3 x 100 = x
100 - x = %

example
Code:
AASS  24/3241 x 100 = 0.74
100 - 0.74 = 99.26%
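
The same arithmetic for all three sample rows can be sketched in a single awk BEGIN block. The numbers are copied from the file1 and file2 samples above; packing them into one string is only a convenience for this illustration.

```shell
# Each entry is: id, file1 $2, file2 $3 (values from the samples above).
pct=$(awk 'BEGIN {
    n = split("AASS 24 3241,ABCA10 103 4301,ABCA12 3 8804", rows, ",")
    for (r = 1; r <= n; r++) {
        split(rows[r], f, " ")
        x = f[2] / f[3] * 100                 # file1 $2 as a percentage of file2 $3
        printf "%s %.2f%%\n", f[1], 100 - x   # "%%" prints a literal percent sign
    }
}')
echo "$pct"
```

This prints the three lines listed under "desired output".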

# 13  
Old 05-24-2016
Hello cmccabe,

Could you please try the following and let me know if this helps.
Code:
awk 'FNR==NR{A[$1]=$2;next} ($1 in A){X=(A[$1]/$3)*100;printf("%s %.2f\n",$1,  100-X)}' file1 file2

Output will be as follows.
Code:
AASS 99.26
ABCA10 97.61
ABCA12 99.97

Thanks,
R. Singh
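
A possible variant of the one-liner above, sketched under two assumptions that are not confirmed in this part of the thread: that the trailing % sign from the desired output is wanted, and that an id whose file2 $3 is zero should print N/A instead of being divided. The GENEX rows are hypothetical and exist only to exercise that guard.

```shell
tmp=$(mktemp -d)
# Sample rows from the thread plus one hypothetical id (GENEX) with a zero in file2 $3.
printf 'AASS 24\nABCA10 103\nABCA12 3\nGENEX 5\n' > "$tmp/file1"
printf 'AASS 23 3241\nABCA10 28 4301\nABCA12 52 8804\nGENEX 0 0\n' > "$tmp/file2"

result=$(awk '
    FNR == NR { A[$1] = $2; next }                   # first file: remember the count per id
    ($1 in A) {                                      # second file: only ids also seen in file1
        if ($3 + 0 == 0) { print $1, "N/A"; next }   # assumed rule: zero divisor becomes N/A
        printf "%s %.2f%%\n", $1, 100 - (A[$1] / $3) * 100
    }' "$tmp/file1" "$tmp/file2")
echo "$result"

rm -r "$tmp"
```

Because a line is printed only when $1 is found in the array, lines in file2 without a match in file1 produce no output at all.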
This User Gave Thanks to RavinderSingh13 For This Post:
# 14  
Old 05-24-2016
Works great... thank you very much.