Outputting characters after a given string and reporting the characters in the row below --sed
Post 303028777 by Don Cragun on Monday 14th of January 2019 10:51:13 PM
Hi Xterra,
Maybe something like:
Code:
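awk -v genes="gene-a gene-b" \
    -v lengths="3 6" \
    -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	# Split the space-separated lists into parallel arrays indexed from 1.
	nString = split(strings, String)
	split(lengths, SLen)
	split(genes, Id)
	# Cache the length of each search string.
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
	# Sort the report by gene, then codon, then quality score.
	sort_cmd = "sort -k3,3 -k5,5 -k8,8"
	print "\n\t\t\t\tSummary"
	print "#############################################################" \
	    "################"
	print "Count\t\tGene\t\tCodon\t\t\tQuality Score\n"
}
# A record starts with an "@" line; read the sequence line after it, skip
# the third line, and read the quality line.
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	# For each search string found in the sequence line, count the SLen[i]
	# characters that follow it, paired with the quality characters in
	# the same columns.
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			count[sprintf( \
			    "Gene:\t%s\tCodon:\t%s\tQuality Score:\t%s",
			    Id[i],
			    substr(CodonLine, spot + StringLen[i], SLen[i]),
			    substr(QualityLine, spot + StringLen[i], SLen[i])) \
			]++
}
# Print each distinct combination with its count, sorted by sort_cmd.
END {	for(i in count)
		printf("%d\t%s\n", count[i], i) | sort_cmd
}' test.txt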

which produces the output:
Code:
				Summary
#############################################################################
Count		Gene		Codon			Quality Score

1	Gene:	gene-a	Codon:	AAC	Quality Score:	,ED
2	Gene:	gene-a	Codon:	AAC	Quality Score:	GCC
1	Gene:	gene-a	Codon:	TTT	Quality Score:	+GG
2	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	DGGCGG
1	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	G7DCGG

I'm sure you could write this as a 1-liner, but I much prefer something I can see on a screen (and debug).
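If you want to debug one piece at a time, the extraction step can be tried on its own. This is only a sketch: the sequence and quality strings below are made up for illustration (they are not from test.txt), and the variable names s, n, seq, and qual are my own. It shows how index() finds the search string and how substr() then picks the next 3 characters from the sequence and from the same columns of the quality line:

```shell
# Hypothetical data for illustration only; not taken from test.txt.
seq='TTGCATGAAAACATACAAACGG'
qual='ABCDEFGHIJKLMNOPQRSTUV'
result=$(awk -v s="GCATGAAAACATACA" -v n=3 -v seq="$seq" -v qual="$qual" '
BEGIN {
	# index() returns the 1-based start of s in seq, or 0 if s is absent.
	spot = index(seq, s)
	if (spot)
		# Skip past the match, then take the next n characters
		# from both strings.
		print substr(seq, spot + length(s), n), \
		    substr(qual, spot + length(s), n)
}')
echo "$result"
```

With these made-up strings the match starts at column 3 and is 15 characters long, so both substr() calls start at column 18 and the command prints AAC RST.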

If there's anything here you can't figure out, ask questions about what you don't understand.
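The counting step may be the least obvious part: the whole formatted report line is used as the index of an associative array, and the sorting is left to an external sort command at the end. A small sketch of that pattern, assuming three made-up input lines (hypothetical, not from test.txt) and a shortened report format. The close() call is not in the script above; I added it here so the sorted lines are written before awk goes on:

```shell
# Hypothetical input for illustration only: gene name and codon per line.
report=$(printf '%s\n' 'gene-b TCC' 'gene-a AAC' 'gene-a AAC' |
awk '
{	# Use the formatted line itself as the array index.
	count[sprintf("Gene:\t%s\tCodon:\t%s", $1, $2)]++
}
END {	sort_cmd = "sort -k3,3 -k5,5"
	for (key in count)
		printf("%d\t%s\n", count[key], key) | sort_cmd
	# Closing the pipe makes awk wait for sort to finish.
	close(sort_cmd)
}')
printf '%s\n' "$report"
```

This prints the gene-a line with a count of 2 first, followed by the gene-b line with a count of 1, because the sort keys are the gene name (field 3) and then the codon (field 5).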

Hope this helps,
Don