Round up -FASTA file


 
# 1  
Old 10-14-2015

I have the following script:
Code:
 awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'

and the following file:
Code:
>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 5
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 3
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

What I need is to calculate the percentage using the Freq values, round up the figures and anything below 1 should be entered as 1. Thus, I will end up with the following file:
Code:
>P39PT-1224 Freq 99
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 1
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 1
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 1
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

As of now, the script is applied to every line, even if I use NR % 2. For the rounding I was hoping to use %.0f, but I haven't gotten the desired output.
Any help will be greatly appreciated.

---------- Post updated at 10:16 AM ---------- Previous update was at 08:52 AM ----------

I came up with the following script, but I am still not getting the desired outcome:
Code:
awk 'FNR==NR{s+=$3;next;} { print $1 , $2, int(100*$3/s+0.9) }'
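
A note on the `int(x+0.9)` idea: it is not a true round-up, because any fraction below 0.1 still truncates to 0. On the sequence lines `$3` is empty, so the expression also evaluates to 0 there. The sketch below compares the two approaches on a few made-up values; `ceil` and `ceil_demo` are illustrative names, not awk built-ins.

```shell
# Compare int(x + 0.9) with a true ceiling on a few sample values.
# ceil_demo and ceil are hypothetical names chosen for this example.
ceil_demo() {
    awk 'function ceil(x,  i) { x += 0; i = int(x); return (i < x) ? i + 1 : i }
         BEGIN {
             n = split("0.05 0.22 98.9 3", v, " ")
             # Columns: value, int(value + 0.9), ceil(value)
             for (k = 1; k <= n; k++)
                 print v[k], int(v[k] + 0.9), ceil(v[k])
         }'
}
ceil_demo
```

The first row is where the two columns differ: 0.05 becomes 0 with the `+0.9` trick but 1 with the ceiling.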
# 2  
Old 10-14-2015
It's incredibly difficult to know what exactly is going wrong with your script without you telling us. Try - based on wild guesses - :
Code:
awk 'FNR==NR{s+=$3;next;} {X=int(100*$3/s+.5); print $1 , $2, $3?(X?X:1):"" }' file file

# 3  
Old 10-14-2015
Hello Xterra,

Could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{s+=$3;next;} {val=100*$3/s;if(val ~ /[0-9]+\.[0-9]+/ || val ~ /\.[0-9]+/){val++}; print $1 , $2, int(val)}'  Input_file Input_file

Output will be as follows.
Code:
>P39PT-1224 Freq 99
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg  0
>P39PT-784 Freq 1
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg  0
>P39PT-678 Freq 1
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg  0
>P39PT-22 Freq 1
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg  0

Thanks,
R. Singh
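
Both suggestions above still run the arithmetic on the sequence lines, which is where the stray `0` and trailing blanks come from. A minimal sketch that only touches the header lines might look like the following. It assumes every header has exactly the form `>ID Freq N`, that at least one count is non-zero (otherwise awk stops with a division-by-zero error), and that the file is passed twice so the total is known before anything is printed. The function name `freq_to_percent` is made up for this example.

```shell
# Rewrite the Freq counts as percentages, rounded up, with a minimum of 1.
# freq_to_percent is a hypothetical helper; headers are assumed to be ">ID Freq N".
freq_to_percent() {
    awk '
        # First pass over the file: sum the counts from header lines only.
        FNR == NR { if (/^>/) total += $3; next }
        # Second pass: recompute the third field on header lines.
        /^>/ {
            pct = 100 * $3 / total
            rounded = int(pct)
            if (rounded < pct) rounded++     # round up to the next integer
            if (rounded < 1) rounded = 1     # anything below 1 becomes 1
            print $1, $2, rounded
            next
        }
        # Sequence lines are printed unchanged.
        { print }
    ' "$1" "$1"
}
```

Usage would be `freq_to_percent file.fasta > file_percent.fasta`. With the sample file from post #1 the total is 910, so the headers come out as 99, 1, 1 and 1. Note that rounding everything up means the percentages can add up to more than 100.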
 