Outputting characters after a given string and reporting the characters in the row below --sed


 
# 1  
Old 01-13-2019

I have this fastq file:
Code:
@M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA
+test-1
GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8
@M04961:22:000000000-B5VGJ:1:1121:9280:7106 1:N:0:86
GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT
+test 2
GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;@8F
@M04961:22:000000000-B5VGJ:1:1151:9280:7106 1:N:0:86
GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT
+more tests
GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;+8F
@M04961:22:000000000-B5VGJ:1:1101:26069:7790 1:N:0:86
CAGAACGTGAAAAAGCGTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTAACCATAAGGCCACGTATTTTGCAAGCTGGCATGAAAACATACATTTTT
+few more
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG+GGGG

And I am using this script to output the three characters after the string GCATGAAAACATACA:
Code:
sed -n '/^@/{n;s/.*GCATGAAAACATACA\(...\).*/Codon:\t\1\tQuality Score: /p}'

This is my current output:
Code:
Codon:  AAC     Quality Score:
Codon:  AAC     Quality Score:
Codon:  AAC     Quality Score:
Codon:  TTT     Quality Score:

However, I would also like to output the three characters at the same position in the quality line, two rows below each sequence, something like this:
Code:
Codon:  AAC     Quality Score: ,ED
Codon:  AAC     Quality Score: GCC
Codon:  AAC     Quality Score: GCC
Codon:  TTT     Quality Score: +GG

Is there a way to accomplish this with sed?
Operating system: Windows 10; shell: Cygwin.
# 2  
Old 01-13-2019
There might be a way to do it with sed, but it would be a lot easier with awk. Don't you have access to awk in cygwin?

P.S. Note that cygwin is not a shell; it is a collection of tools that mimic several common tools found on many BSD-, Linux-, and UNIX-systems. The default shell used when running cygwin is usually bash.
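For the record, here is a rough sketch of how this might be done with sed alone. It assumes GNU sed (for \t in the replacement and for . matching an embedded newline), well-formed four-line FASTQ records, and at least three characters after the match; the temporary file and the variables tmp and out are only there to make the example self-contained. It joins the sequence and quality lines, strips one leading character from each until the sequence starts with the search string, then picks the three characters that follow it on both lines. Unlike the original greedy .* pattern, it reports the first occurrence of the string rather than the last.

```shell
# Two records from the sample data, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA
+test-1
GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8
@M04961:22:000000000-B5VGJ:1:1101:26069:7790 1:N:0:86
CAGAACGTGAAAAAGCGTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTAACCATAAGGCCACGTATTTTGCAAGCTGGCATGAAAACATACATTTTT
+few more
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG+GGGG
EOF

out=$(sed -n '
/^@/{
	# Load the sequence, separator and quality lines, then drop the separator.
	n
	N
	N
	s/\n.*\n/\n/
	# Give up on this record unless the string is in the sequence line.
	/GCATGAAAACATACA.*\n/!b
	# Strip one leading character from both lines until the
	# sequence line starts with the search string.
	:a
	/^GCATGAAAACATACA/!{
		s/^.\(.*\n\)./\1/
		ta
	}
	# The string is 15 characters long, so skip 15 quality characters too.
	s/^GCATGAAAACATACA\(...\).*\n.\{15\}\(...\).*/Codon:\t\1\tQuality Score:\t\2/p
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

With these two records it should print the AAC / ,ED and TTT / +GG lines requested in the first post. The awk approach is still likely to be easier to read and maintain.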
# 3  
Old 01-13-2019
I do have access to awk. I would be interested in seeing a solution with awk too.
# 4  
Old 01-13-2019
There are more efficient ways to do this, but this seems to do what you want:
Code:
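awk '
BEGIN {	String = "GCATGAAAACATACA"	# string to search for
	StringLen = length(String)
}
/^@/ {	# Header line: the sequence is on the next line and the
	# quality scores are three lines down.
	matchline = NR + 1
	qualityline = NR + 3
	next
}
NR == matchline {
	# Print the 3 characters that follow String, if it is present.
	if(spot = index($0, String))
		printf("Codon:\t%s\tQuality Score:\t",
		    substr($0, spot + StringLen, 3))
	else	qualityline = 0	# no match: skip this quality line
	next
}
NR == qualityline {
	# Print the 3 quality characters at the same position.
	printf("%s\n", substr($0, spot + StringLen, 3))
}' file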

With the sample data you provided stored in a file named file, this produces the output:
Code:
Codon:	AAC	Quality Score:	,ED
Codon:	AAC	Quality Score:	GCC
Codon:	AAC	Quality Score:	GCC
Codon:	TTT	Quality Score:	+GG

These 2 Users Gave Thanks to Don Cragun For This Post:
# 5  
Old 01-13-2019
This is probably easier to read and produces the same output:
Code:
awk '
BEGIN {	String = "GCATGAAAACATACA"
	StringLen = length(String)
}
/^@/ {	getline CodonLine	# sequence line
	getline			# "+" separator line (ignored)
	getline QualityLine	# quality scores
	if(spot = index(CodonLine, String))
		printf("Codon:\t%s\tQuality Score:\t%s\n",
		    substr(CodonLine, spot + StringLen, 3),
		    substr(QualityLine, spot + StringLen, 3))
}' file

These 2 Users Gave Thanks to Don Cragun For This Post:
# 6  
Old 01-13-2019
Don
Thanks!
PS. If I would like to search for more than one string (GCATGAAAACATACA and TTTCCAGAAATTGT) and report a different number of characters for each (3 and 6), I should be able to do it by passing the strings and the numbers of characters as variables, right?

Code:
awk -vlen="3 6" -vstr="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	for (MX=n=split (str, TMP); n>0; n--) SRCH[TMP[n]] = n
	String = n
	StringLen = length(String)
}
/^@/ {	matchline = NR + 1
	qualityline = NR + 3
	next
}
NR == matchline {
	if(spot = index($0, String))
		printf("Codon:\t%s\tQuality Score:\t",
		    substr($0, spot + StringLen, len))
	else	qualityline = 0
	next
}
NR == qualityline {
	printf("%s\n", substr($0, spot + StringLen, len))
}' test.txt

This User Gave Thanks to Xterra For This Post:
# 7  
Old 01-14-2019
Quote:
Originally Posted by Xterra
Don
Thanks!
PS. If I would like to search for more than one string (GCATGAAAACATACA and TTTCCAGAAATTGT) and report a different number of characters for each (3 and 6), I should be able to do it by passing the strings and the numbers of characters as variables, right?

Code:
awk -vlen="3 6" -vstr="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {	for (MX=n=split (str, TMP); n>0; n--) SRCH[TMP[n]] = n
	String = n
	StringLen = length(String)
}
/^@/ {	matchline = NR + 1
	qualityline = NR + 3
	next
}
NR == matchline {
	if(spot = index($0, String))
		printf("Codon:\t%s\tQuality Score:\t",
		    substr($0, spot + StringLen, len))
	else	qualityline = 0
	next
}
NR == qualityline {
	printf("%s\n", substr($0, spot + StringLen, len))
}' test.txt

I'm pretty sure that the code you have shown us above won't do what you want to do, but from your new description I'm not sure what it is that you're trying to do.

The String variable in my code needs to be a string, not a number representing an index into the TMP[] and/or SRCH[] arrays. If you're looking for multiple strings on the 2nd line of each group of input lines you're processing, you need to perform more than one index() operation to search for those strings.

If you could give us a clearer description and show us the output you hope to produce from your new requirements, maybe we could help.
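Purely as a guess at what is wanted, the "more than one index() operation" idea could look something like the sketch below. It assumes that each search string is paired with the length in the same position of the len list (3 for the first string, 6 for the second) and that one output line per string found is acceptable. The output format, the array names STR and LEN, and the temporary file are invented for illustration and are not from the posts above.

```shell
# One record from the sample data, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA
+test-1
GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8
EOF

out=$(awk -v len="3 6" -v str="GCATGAAAACATACA TTTCCAGAAATTGT" '
BEGIN {
	# Split both lists on whitespace; STR[i] is paired with LEN[i].
	n = split(str, STR, " ")
	split(len, LEN, " ")
}
/^@/ {
	# Read the sequence line, skip the "+" line, read the quality line.
	if ((getline seq) <= 0) exit
	if ((getline) <= 0) exit
	if ((getline qual) <= 0) exit
	# One index() call per search string.
	for (i = 1; i <= n; i++) {
		spot = index(seq, STR[i])
		if (spot == 0)
			continue
		start = spot + length(STR[i])
		printf("%s:\t%s\tQuality Score:\t%s\n", STR[i],
		    substr(seq, start, LEN[i]), substr(qual, start, LEN[i]))
	}
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

For this one record it should report AAC with quality ,ED for the first string and TCCAAG with quality G7DCGG for the second. If the real requirement is different (for example, all results on one line per record), the printf would need to change accordingly.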
This User Gave Thanks to Don Cragun For This Post: