Round up -FASTA file

10-14-2015


I have the following script:

Code:

 awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'

and the following file:

Code:

>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 5
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 3
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

What I need is to calculate each record's percentage from the Freq values, round the figures up, and enter anything below 1 as 1. That is, I want to end up with the following file:

Code:

>P39PT-1224 Freq 99
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 1
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 1
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 1
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

As of now, the script applies to all lines even if I use NR % 2. For the rounding up I was hoping to be able to use %.0f, but I haven't gotten the desired output.
Any help will be greatly appreciated.

---------- Post updated at 10:16 AM ---------- Previous update was at 08:52 AM ----------

I came up with the following script, but I am still not getting the desired outcome:

Code:

awk 'FNR==NR{s+=$3;next;} { print $1 , $2, int(100*$3/s+0.9) }'
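For reference, here is a minimal sketch of how the two rounding approaches mentioned in this post behave on the four Freq values (900, 2, 5, 3; total 910). The pipeline and the names `freq`, `total`, `pct`, and `rounded` are illustrative assumptions, not part of the original script.

```shell
# Compare two rounding approaches on the percentages from this thread.
rounded=$(printf '900\n2\n5\n3\n' | awk '
    { freq[NR] = $1; total += $1 }            # collect the counts and their sum
    END {
        for (i = 1; i <= NR; i++) {
            pct = 100 * freq[i] / total
            # %.0f rounds to nearest; int(pct + 0.9) only approximates rounding up
            printf "%d %.0f %d\n", freq[i], pct, int(pct + 0.9)
        }
    }')
printf '%s\n' "$rounded"
```

Each output line shows the count, the `%.0f` result, and the `int(pct + 0.9)` result. With these counts, `%.0f` turns the three small percentages into 0, while `int(pct + 0.9)` gives 1. Note that `int(pct + 0.9)` stops rounding up once the fractional part drops below 0.1 (2.05 becomes 2, for example), so it is not a true ceiling.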

Xterra


10-14-2015


It's incredibly difficult to know what exactly is going wrong with your script unless you tell us. Based on wild guesses, try:

Code:

awk 'FNR==NR{s+=$3;next;} {X=int(100*$3/s+.5); print $1 , $2, $3?(X?X:1):"" }' file file
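The file is named twice so that awk reads it once to sum the Freq column and a second time to print. A hedged variant of the same idea is sketched below: it keeps the round-to-nearest logic but touches only header lines (those starting with `>`), so sequence lines pass through unchanged. The temporary file, the shortened sequences, and the names `sample`, `total`, `pct`, and `result` are assumptions made for illustration.

```shell
# Hypothetical sample file: the four headers from the thread, sequences shortened.
sample=$(mktemp)
printf '%s\n' \
    '>P39PT-1224 Freq 900' 'cccctacg' \
    '>P39PT-784 Freq 2'    'accgtcgt' \
    '>P39PT-678 Freq 5'    'cccctacg' \
    '>P39PT-22 Freq 3'     'cccctagg' > "$sample"

# Pass 1 sums Freq over header lines; pass 2 rewrites the headers only.
result=$(awk '
    FNR == NR { if (/^>/) total += $3; next }   # first read: accumulate counts
    /^>/ {                                      # second read: header lines
        pct = int(100 * $3 / total + .5)        # round to nearest, as in the one-liner above
        $3 = (pct ? pct : 1)                    # anything that rounds to 0 becomes 1
    }
    { print }                                   # sequence lines pass through unchanged
' "$sample" "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

On this sample the headers should come out as 99, 1, 1, and 1. Keep in mind that adding .5 rounds to the nearest integer rather than up; it matches the requested output here only because the lower bound of 1 covers the small values.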

RudiC


10-14-2015


Hello Xterra,

Could you please try the following and let me know if it helps?

Code:

awk 'FNR==NR{s+=$3;next;} {val=100*$3/s;if(val ~ /[0-9]+\.[0-9]+/ || val ~ /\.[0-9]+/){val++}; print $1 , $2, int(val)}'  Input_file Input_file

Output will be as follows.

Code:

>P39PT-1224 Freq 99
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg  0
>P39PT-784 Freq 1
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg  0
>P39PT-678 Freq 1
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg  0
>P39PT-22 Freq 1
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg  0

Thanks,
R. Singh

RavinderSingh13
