Round up -FASTA file Post: 302957722

UNIX for Dummies Questions & Answers: post 302957722 by Xterra on Wednesday 14th of October 2015 10:16:06 AM

I have the following script:

Code:

 awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'

and the following file:

Code:

>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 5
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 3
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

What I need is to convert each Freq value to a percentage of the total, round the result up, and enter anything below 1 as 1. Thus, I will end up with the following file:

Code:

>P39PT-1224 Freq 99
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 1
accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagcagatcg
>P39PT-678 Freq 1
cccctacgacggcactggtaatgaccgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccgagg-cccccagcagaacatccagctgatcg
>P39PT-22 Freq 1
cccctacgacggcattggtagtggctcagcggac---accgtcgtcact-gggggacatgccgctcgctccgtacaggggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctggtcg

As of now, the script applies to all lines, even if I use NR % 2. For the rounding up I was hoping to use %.0f, but I haven't gotten the desired output.
Any help will be greatly appreciated.
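A note on the %.0f idea: that format rounds to the nearest integer rather than up, so a percentage such as 0.22 would print as 0. The sketch below compares the two behaviours for a single value; `round_compare` is a hypothetical helper name used only for illustration, and the ceiling is built from `int()` because awk has no built-in ceiling function.

```shell
# Hypothetical helper: print "nearest up" for one positive number.
round_compare() {
    awk -v x="$1" 'BEGIN {
        nearest = sprintf("%.0f", x)   # %.0f rounds to the nearest integer
        up = int(x)                    # int() truncates toward zero
        if (up < x) up++               # bump truncated value to get a ceiling
        print nearest, up
    }'
}
```

For example, `round_compare 0.22` shows the two results diverging, while `round_compare 98.9` shows them agreeing.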

---------- Post updated at 10:16 AM ---------- Previous update was at 08:52 AM ----------

I came up with the following script, but I am still not getting the desired outcome:

Code:

awk 'FNR==NR{s+=$3;next;} { print $1 , $2, int(100*$3/s+0.9) }'
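One possible way to get the requested output, offered as a sketch rather than a definitive answer. It assumes every header line has the form `>ID Freq N`, that the same file is passed to awk twice (first pass to total the Freq values, second pass to rewrite the headers), and that matching on a leading `>` is acceptable instead of relying on line parity. The function name `fasta_freq_pct` is made up for this example. Note that `int(x + 0.9)` is not a true ceiling: a value such as 0.05 becomes `int(0.95)`, which is 0, so the sketch uses an explicit ceiling plus a minimum of 1.

```shell
# Hypothetical helper: rescale Freq values in FASTA headers to rounded-up percentages.
# Usage assumption: the single argument is a regular file that can be read twice.
fasta_freq_pct() {
    awk '
        # Pass 1 (FNR == NR only while reading the first copy): total the
        # Freq column over header lines, then skip to the next record.
        FNR == NR { if (/^>/) total += $3; next }

        # Pass 2: rewrite header lines only.
        /^>/ {
            pct = 100 * $3 / total
            rounded = int(pct)
            if (rounded < pct) rounded++   # ceiling for positive values
            if (rounded < 1) rounded = 1   # anything below 1 is entered as 1
            $3 = rounded
        }

        # Print every line of pass 2; sequence lines pass through unchanged.
        { print }
    ' "$1" "$1"
}
```

It would be called as `fasta_freq_pct input.fasta > output.fasta`, where both file names are placeholders.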
