UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Post 302918637 by Akshay Hegde on Wednesday 24th of September 2014 01:46:37 PM
Hi Marion, welcome to the forums. Could you post the expected output as well, please?
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

select distinct row from a file

Hi, buddies out there. I have a text file ( only one column ) which I created using vi editor. The file contains duplicate rows and I would like to select distinct rows, how to go on it using unix command: file content = apple apple orange watermelon apple orange Can it be done... (7 Replies)
Discussion started by: merry susana

2. Shell Programming and Scripting

Select distinct values from a flat file

Hi , I have a similar problem. Please can anyone help me with a shell script or a perl. I have a flat file like this fruit country apple germany apple india banana pakistan banana saudi mango india I want to get a output like fruit country apple ... (7 Replies)
Discussion started by: smalya

3. Shell Programming and Scripting

Select distinct rows in a file by last column

Hi, I have the following file: LOG:015608::ERR:2310:map_spsrec:Invalid parameter LOG:015608::ERR:2471:map_dgdrec:Invalid parameter LOG:015608::ERR:2487:map_nnmrec:Invalid number LOG:015608::ERR:2310:map_nmrec:Invalid number LOG:015608::ERR:2438:map_nmrec:Invalid number As a delimiter I... (2 Replies)
Discussion started by: apenkov

4. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I am having a file of dna sequences in fasta format which look like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many such thousands of sequences in a single file. I want to the replace the accession Id "admin_1_45" similarly in following sequences to... (5 Replies)
Discussion started by: margarita

5. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans

6. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Discussion started by: alexypaul

7. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can i get only unique sequences from the file. For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2... (3 Replies)
Discussion started by: Ibk

8. UNIX for Beginners Questions & Answers

How to count the length of fasta sequences?

I could calculate the length of entire fasta sequences by following command, awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta But, I need to calculate the length of a particular fasta sequence specified/listed in another txt file. The results to to be... (14 Replies)
Discussion started by: dineshkumarsrk

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Discussion started by: jerrild

10. UNIX for Beginners Questions & Answers

How to add specific bases at the beginning and ending of all the fasta sequences?

Hi, I have to add 7 bases of specific nucleotide at the beginning and ending of all the fasta sequences of a file. For example, I have a multi fasta file namely test.fasta as given below test.fasta >TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1... (1 Reply)
Discussion started by: dineshkumarsrk
CFETOOLCHECK(8) 					User Contributed Perl Documentation					   CFETOOLCHECK(8)

NAME
       cfetoolcheck - Check a new value against the averages currently in the database

SYNOPSIS
       cfetool check name --value|-V value [--path|-p directory name] [--time|-t seconds] [--daily|-d] [--weekly|-w] [--yearly|-y] [--histograms|-H] [--verbose|-v] [--help|-h]

DESCRIPTION
       Takes a new value and checks it against the averages currently in the database specified by name, located at the path specified by the -p argument, or the current working directory if the -p argument is omitted. The value will be associated with the current time, unless the -t option is given.

       The output indicates how much higher or lower the new value is compared to the averages in the database, in terms of the number of standard deviations.

       The -d, -w and -y options specify the databases to check the new value against. If all three options are omitted, only the weekly database will be accessed.

OPTIONS
       --value|-V value
           Specifies the new value to check against the database averages.

       --path|-p directory name
           The directory in which the database specified by name can be found.

       --time|-t seconds
           The time the value was collected, in seconds since epoch (January 1st, 1970). If this argument is omitted, the current time will be used.

       --daily|-d
           Check the new value against the daily averages database.

       --weekly|-w
           Check the new value against the weekly averages database.

       --yearly|-y
           Check the new value against the yearly averages database.

       --histograms|-H
           Check which histogram bucket the new value would fall into. The histogram is divided into 64 buckets, which represent distances from the mean value. Bucket 64 represents two standard deviations above the expected value, and bucket 0 represents two standard deviations below the expected value.

       --verbose|-v
           Print details of the command's execution to the standard output stream.

       --help|-h
           Print a short help message and then exit.

OUTPUT
       Before exiting, "cfetool check" will print one line to the standard output stream, in the following format:

           yrly=ynum,bkt=ybkt;wkly=wnum,bkt=wbkt;dly=dnum,bkt=dbkt

       ybkt, wbkt and dbkt represent the histogram bucket the given value falls into, and will be 0 for databases that are not being checked against, and if there is no histogram file or the -H option was not specified.

       ynum, wnum and dnum will be either the number 0 if the corresponding database was not updated, or a code indicating the state of the given statistic, as compared to an average of equivalent earlier times, as specified below:

           code  high|low|normal  meaning
           -------------------------------------------------------------
            -2   -                no sigma variation
           -------------------------------------------------------------
            -4   low              within noise threshold, and within
            -5   normal           2 standard deviations from
            -6   high             expected value
           -------------------------------------------------------------
           -14   low              microanomaly: within noise
           -15   normal           threshold, but 2 or more standard
           -16   high             deviations from expected value
           -------------------------------------------------------------
           -24   low              normal; within 1 standard deviation
           -25   normal           from the expected value
           -26   high
           -------------------------------------------------------------
           -34   low              dev1; more than 1 standard
           -35   normal           deviation from the expected
           -36   high             value
           -------------------------------------------------------------
           -44   low              dev2; more than 2 standard
           -45   normal           deviations from the expected
           -46   high             value
           -------------------------------------------------------------
           -54   low              anomaly; more than 3 standard
           -55   normal           deviations from the expected
           -56   high             value

       Where "low" indicates that the current value is below both the expected value for the current time position, and the global average value. "high" indicates that the current value is above those values. "normal" indicates that the current value is within the range of expected values.
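For scripts that consume this output, the line format and code table above can be decoded mechanically. The following is a minimal sketch in Python, not part of cfetool itself: the names `Check`, `parse_check_line` and `describe` are hypothetical, and it assumes the line has exactly the three `yrly`, `wkly` and `dly` fields shown in the format above.

```python
import re
from typing import Dict, NamedTuple


class Check(NamedTuple):
    """One database's result: the state code and the histogram bucket."""
    code: int
    bucket: int


# Meanings taken from the table above, keyed by the tens digit of abs(code).
CATEGORY = {
    0: "within noise threshold, and within 2 standard deviations from expected value",
    1: "microanomaly: within noise threshold, but 2 or more standard deviations from expected value",
    2: "normal; within 1 standard deviation from the expected value",
    3: "dev1; more than 1 standard deviation from the expected value",
    4: "dev2; more than 2 standard deviations from the expected value",
    5: "anomaly; more than 3 standard deviations from the expected value",
}
# The last digit of abs(code) selects the low/normal/high column.
LEVEL = {4: "low", 5: "normal", 6: "high"}

FIELD = re.compile(r"(yrly|wkly|dly)=(-?\d+),bkt=(-?\d+)")


def parse_check_line(line: str) -> Dict[str, Check]:
    """Split 'yrly=..,bkt=..;wkly=..,bkt=..;dly=..,bkt=..' into Check tuples."""
    result = {}
    for field in line.strip().split(";"):
        match = FIELD.fullmatch(field)
        if match is None:
            raise ValueError(f"unrecognised field: {field!r}")
        result[match.group(1)] = Check(int(match.group(2)), int(match.group(3)))
    return result


def describe(code: int) -> str:
    """Translate a state code into the wording of the table above."""
    if code == 0:
        return "database not updated"
    if code == -2:
        return "no sigma variation"
    tens, ones = divmod(-code, 10)
    if tens not in CATEGORY or ones not in LEVEL:
        raise ValueError(f"unknown code: {code}")
    return f"{LEVEL[ones]}: {CATEGORY[tens]}"
```

Applied to the line `yrly=0,bkt=0;wkly=-6,bkt=51;dly=0,bkt=0`, `parse_check_line` yields a weekly entry with code -6 and bucket 51, and `describe(-6)` reports a "high" value within the noise threshold.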
       "cfetool check" also exits with a code corresponding to the above table. If more than one database is being checked against, the most negative result from all checks is returned, and the individual results must be obtained from the standard output stream, as described above.

EXAMPLE
           % cfetool check temperature --path /my/path --value 20 --histograms
           yrly=0,bkt=0;wkly=-6,bkt=51;dly=0,bkt=0

       Checks the value 20 against the weekly temperature database and histogram files located in /my/path/ using the current time. The output indicates that the new value given was within cfetool's noise threshold, and also within 2 standard deviations of the previous average stored in the weekly database.

AUTHORS
       The code and documentation were contributed by Stanford Linear Accelerator Center, a department of Stanford University.

       This documentation was written by Elizabeth Cassell <e_a_c@mailsnare.net> and Alf Wachsmann <alfw@slac.stanford.edu>

COPYRIGHT AND DISCLAIMER
       Copyright 2004 Alf Wachsmann <alfw@slac.stanford.edu> and Elizabeth Cassell <e_a_c@mailsnare.net>

       All rights reserved.

perl v5.8.4                              2004-09-21                              CFETOOLCHECK(8)