Visit The New, Modern Unix Linux Community


Outputting characters after a given string and reporting the characters in the row below --sed


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Outputting characters after a given string and reporting the characters in the row below --sed
# 8  
Making a few wild guesses... If the output you're trying to produce is:
Code:
Codon:	AAC	Quality Score:	,ED
Codon:	TCCAAG	Quality Score:	G7DCGG
Codon:	AAC	Quality Score:	GCC
Codon:	TCCAAG	Quality Score:	DGGCGG
Codon:	AAC	Quality Score:	GCC
Codon:	TCCAAG	Quality Score:	DGGCGG
Codon:	TTT	Quality Score:	+GG

you could try something like:
Code:
awk -v lengths="3 6" -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, OutLen)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			printf("Codon:\t%s\tQuality Score:\t%s\n",
			    substr(CodonLine, spot + StringLen[i], OutLen[i]),
			    substr(QualityLine, spot + StringLen[i], OutLen[i]))
}' file

These 2 Users Gave Thanks to Don Cragun For This Post:
# 9  
It was interesting to try to implement this algorithm on the sed
Code:
#!/bin/sed -nrf
2~4 h
4~4 {
H;x
s/(.*GCATGAAAACATACA.{3})(.*)/\1\r\2/
}
/\r/ {
:1
s/^.(.{2}[^\r].*)/\1/
T2
s/(\n).(.*)/\1\2/
t1
:2
s/^(.{3}).*/\1/mg
s/(.*)\n(.*)/Codon:\t\1\tQuality Score:\t \2/p
}


Last edited by nezabudka; 01-14-2019 at 09:08 AM..
These 2 Users Gave Thanks to nezabudka For This Post:
# 10  
Don
I modified a bit your script to output the total count and give some format:
Code:
awk -v gene="gene-a gene-b" -v lengths="3 6" -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, OutLen)
	split(gene, Id)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			printf("Gene:\t"Id[i]"\tCodon:\t%s\t\tQuality Score:\t%s\t\n",
			    substr(CodonLine, spot + StringLen[i], OutLen[i]),
			    substr(QualityLine, spot + StringLen[i], OutLen[i]))
}' test.txt | awk '{ count[$0]++ } END {{ print "\n\t\t\t\tSummary\n#############################################################################\nCount\t\tGene\t\tCodon\t\t\tQuality Score\n" } {for (gene in count ) print count[gene] "\t" gene | "sort -k 3"}}'

With the above script I am getting the desired output:
Code:
                                Summary
#############################################################################
Count           Gene            Codon                   Quality Score

1       Gene:   gene-a  Codon:  AAC             Quality Score:  ,ED
2       Gene:   gene-a  Codon:  AAC             Quality Score:  GCC
1       Gene:   gene-a  Codon:  TTT             Quality Score:  +GG
2       Gene:   gene-b  Codon:  TCCAAG          Quality Score:  DGGCGG
1       Gene:   gene-b  Codon:  TCCAAG          Quality Score:  G7DCGG

However, I tried to include the END step in your awk script fail miserably. How can I modify the script so I don't have to "stitch" together the two scripts as shown above?
Thanks!
This User Gave Thanks to Xterra For This Post:
# 11  
Hi Xterra,
Maybe something like:
Code:
awk -v genes="gene-a gene-b" \
    -v lengths="3 6" \
    -v strings="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	nString = split(strings, String)
	split(lengths, SLen)
	split(genes, Id)
	for(i = 1; i <= nString; i++)
		StringLen[i] = length(String[i])
	sort_cmd = "sort -k3,3 -k5,5 -k8,8"
	print "\n\t\t\t\tSummary"
	print "#############################################################" \
	    "################"
	print "Count\t\tGene\t\tCodon\t\t\tQuality Score\n"
}
/^@/ {	getline CodonLine
	getline
	getline QualityLine
	for(i = 1; i <= nString; i++)
		if(spot = index(CodonLine, String[i]))
			count[sprintf( \
			    "Gene:\t%s\tCodon:\t%s\tQuality Score:\t%s",
			    Id[i],
			    substr(CodonLine, spot + StringLen[i], SLen[i]),
			    substr(QualityLine, spot + StringLen[i], SLen[i])) \
			]++
}
END {	for(i in count)
		printf("%d\t%s\n", count[i], i) | sort_cmd
}' test.txt

which produces the output:
Code:
				Summary
#############################################################################
Count		Gene		Codon			Quality Score

1	Gene:	gene-a	Codon:	AAC	Quality Score:	,ED
2	Gene:	gene-a	Codon:	AAC	Quality Score:	GCC
1	Gene:	gene-a	Codon:	TTT	Quality Score:	+GG
2	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	DGGCGG
1	Gene:	gene-b	Codon:	TCCAAG	Quality Score:	G7DCGG

I'm sure you could write this as a 1-liner, but I much prefer something I can see on a screen (and debug).

If there's anything here you can't figure out, ask questions about what you don't understand.

Hope this helps,
Don
These 2 Users Gave Thanks to Don Cragun For This Post:

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #402
Difficulty: Medium
The term 3D printing originally referred to a powder bed process employing standard and custom inkjet print heads.
True or False?

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Reporting characters after string

I have a file that looks like this: >ID 1 AATAATTCCGGATCGTGC >ID 2 TTTGACAGTAGAC >ID 3 AGACGATGACGAT I am using the following script to report if AATTCCGGATCG is present in any sequence: awk 'FNR==1{n=substr(FILENAME,1,index(FILENAME,".")-1)} { print n "\t"... (10 Replies)
Discussion started by: Xterra
10 Replies

2. Shell Programming and Scripting

sed replace nth characters with string

Hi, I hope you can help me out please? I need to replace from character 8-16 with AAAAAAAA and the rest should stay the same after character 16 gtwrhtrd11111111rjytwyejtyjejetjyetgeaEHT wrehrhw22222222hytekutkyukrylryilruilrGEQTH hrwjyety33333333gtrhwrjrgkreglqeriugn;RUGNEURGU ... (4 Replies)
Discussion started by: stinkefisch
4 Replies

3. Shell Programming and Scripting

Help with sed command - find a string between two characters

Hi, I have a xml file (Config.xml) <Header name="" TDate="" PDate=""> <Config> {"config" { "Nation" "Pri:|Sec:"}} </Config> </Header> Now I wanted to printed all the strings between "". I tried the following cat Config.xml | sed -n 's/.*\.*//p' ... (8 Replies)
Discussion started by: vivek_damodaran
8 Replies

4. Shell Programming and Scripting

Trouble with sed and substituting a string with special characters in variable

Hey guys, I know that title is a mouthful - I'll try to better explain my struggles a little better... What I'm trying to do is: 1. Query a db and output to a file, a list of column data. 2. Then, for each line in this file, repeat these values but wrap them with: ITEM{ ... (3 Replies)
Discussion started by: ampsys
3 Replies

5. Shell Programming and Scripting

sed cut characters of string

helloo I wonder if there's a way to cut characters out of a string and keep only the last 2 by using sed. For example if there's the todays' date: 2012-05-06 and we only want to keep the last 2 characters which are the day. Is there a quick way to do it with sed? (2 Replies)
Discussion started by: vlm
2 Replies

6. Shell Programming and Scripting

sed replacing specific characters and control characters by escaping

sed -e "s// /g" old.txt > new.txt While I do know some control characters need to be escaped, can normal characters also be escaped and still work the same way? Basically I do not know all control characters that have a special meaning, for example, ?, ., % have a meaning and have to be escaped... (11 Replies)
Discussion started by: ijustneeda
11 Replies

7. Shell Programming and Scripting

Delete row if a a particular column has more then three characters in it

Hi i have a data like hw:dsfnsmdf:39843 chr2 76219829 51M atatata 51 872389 hw:dsfnsmdf:39853 chr2 76219839 51M65T atatata 51 872389 hw:dsfnsmdf:39863 chr2 76219849 51M atatata 51 872389 hw:dsfnsmdf:39873 chr2 ... (3 Replies)
Discussion started by: bhargavpbk88
3 Replies

8. Shell Programming and Scripting

Want to remove the last characters from each row of csv using shell script

Hi, I've a csv file seperated by '|' from which I'm trying to remove the excess '|' characters more than the existing fields. My CSV looks like as below. HRLOAD|Service|AddChange|EN PERSONID|STATUS|LASTNAME|FIRSTNAME|ITDCLIENTUSERID|ADDRESSLINE1 10000001|ACTIVE|Testazar1|Testore1|20041|||... (24 Replies)
Discussion started by: rajak.net
24 Replies

9. Shell Programming and Scripting

SED help delete characters in a string

Hi Please help me to refine my syntax. I want to delete the excess characters from the out put below. -bash-3.00$ top -b -n2 -d 00.20 |grep Cpu|tail -1 | awk -F ":" '{ print $2 }' | cut -d, -f1 4.4% us now i want to delete the % and us. How wil i do that to make it just 4.4. Thanks (7 Replies)
Discussion started by: redtred
7 Replies

10. UNIX for Dummies Questions & Answers

outputting selected characters from within a variable

Hi all, if for example I had a variable containing the string 'hello', is the any way I can output, for example, the e and the 2nd l based on their position in the string not their character (in this case 2 and 4)? any general pointers in the right direction will be much appreciated, at... (3 Replies)
Discussion started by: skinnygav
3 Replies

Featured Tech Videos